More optimization!

It’s Saturday morning, cats outside woke me up at 0615, everyone else is in bed. Time for some CC2 modding.

In my previous posts about profiling and CPU usage I made use of a primitive call timer. I’d forgotten at the time that I’d hacked up a basic call count profiler from the lua.org examples.

The lua.org example I started from used the os module for the clock, but we don’t have that in CC2, so I put it to one side and forgot about it.

Now, I want to focus my efforts better. There isn’t a great deal of worth in optimising a function that only gets called once every few seconds and doesn’t take very long anyway. But a function called a lot? My hacky profiler does record call counts using the Lua debug library! So I have something I can look at!
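For anyone who wants to try the same thing, here is a minimal sketch of a call counter in the style of the lua.org example. It assumes debug.sethook and debug.getinfo are available in the host; the names call_counts, profiler_start and profiler_stop are mine, and it only counts calls, it does not time them.

```lua
-- Call-count profiler sketch (hypothetical names, not the actual mod code).
local call_counts = {}   -- key: "source:linedefined", value: number of calls

local function on_call()
    -- Inside a hook, level 2 is the function that has just been called.
    local info = debug.getinfo(2, "S")
    if info == nil or info.what == "C" then
        return   -- skip C functions, they have no useful source position
    end
    local key = info.source .. ":" .. info.linedefined
    call_counts[key] = (call_counts[key] or 0) + 1
end

local function profiler_start()
    call_counts = {}
    debug.sethook(on_call, "c")   -- "c" mask: fire on every function call
end

local function profiler_stop()
    debug.sethook()               -- no arguments: remove the hook
    return call_counts
end
```

Keying on source and line rather than on a name means every function gets an entry, even one the debug library cannot name.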

Here we go, this is the total call count from calling the screen_vehicle_control script’s “update()” function in a tight loop for 20 seconds.

@scripts/library_vehicle.lua:1554      29445	2

That caught my eye! It’s called 29 thousand times and even with our low-res clock timer it accounts for 2 whole seconds of call time.

But why doesn’t it have a function name? hmm..

Hm. doesn’t look too scary.. oh…

It’s being called via a protected call! These are not fast. I originally added these so that bugs wouldn’t cause the whole screen to stop working. But pcalls are EXPENSIVE.
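To make the trade-off concrete, here is a small sketch of the two shapes. process_vehicle and both update functions are hypothetical stand-ins, not the real code from library_vehicle.lua.

```lua
-- Hypothetical stand-in for the per-vehicle work done inside update().
local function process_vehicle(v)
    return v.id * 2
end

-- Protected version: an error in process_vehicle is caught, so one bad
-- vehicle cannot stop the rest of the screen from updating.
local function update_protected(vehicles)
    local total = 0
    for _, v in ipairs(vehicles) do
        -- A new anonymous function is created on every iteration. Because it
        -- is invoked from pcall (a C function), the debug library typically
        -- cannot infer a name for it, only a file and line.
        local ok, result = pcall(function() return process_vehicle(v) end)
        if ok then
            total = total + result
        end
    end
    return total
end

-- Direct version: less overhead, but any error propagates to the caller.
local function update_direct(vehicles)
    local total = 0
    for _, v in ipairs(vehicles) do
        total = total + process_vehicle(v)
    end
    return total
end
```

With well-formed input both return the same total. With a vehicle that has no id, the protected version skips it and carries on, while the direct version raises an error.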

Without the call counter we get to measure the current update() performance with this pcall still in place for a 20 second loop:

calls	11436
calls/sec	571.8
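A loop along these lines would produce figures like the ones above. This is only a sketch: measure_calls is a hypothetical name, and the clock is passed in as a function (now) because the os module is not available in CC2 and I am not assuming any particular replacement.

```lua
-- Calls-per-second harness sketch.
-- fn:       the function under test, e.g. update
-- now:      hypothetical clock function returning seconds
-- duration: how long to keep calling fn, in seconds
local function measure_calls(fn, now, duration)
    local started = now()
    local calls = 0
    while now() - started < duration do
        fn()
        calls = calls + 1
    end
    return calls, calls / duration
end
```

For the run above, 11436 calls over 20 seconds gives 11436 / 20 = 571.8 calls/sec.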

Ok, let’s remove the pcall wrapper and see what we get..

calls	11480
calls/sec	574.0

Not much difference.. boo.. It’s still a bit better, but I had hoped for more.. What else can we look at? There are some more pcalls still there, so let’s kill those off.

calls	11639
calls/sec	581.95

A bit better!

Hmm. To reduce the _overall_ use of revolution, I added some randomisation and caching, so the scanning/search/mapping functions don’t all get called every time. In fact, most of the time the expensive ones aren’t called.

So, for timing only, we force the fog-of-war and radar refresh to happen every time. Here is the timing where we do everything in the update call:
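As a rough illustration of that idea, here is a sketch of a randomised refresh with a force switch for timing runs. Every name in it (FORCE_REFRESH, REFRESH_CHANCE, refresh_radar, get_radar) is hypothetical, the 1-in-10 chance is an arbitrary assumption, and it assumes math.random is available.

```lua
-- Sketch: skip an expensive refresh on most updates, reuse the cached result.
local FORCE_REFRESH = false   -- set to true for timing runs so nothing is skipped
local REFRESH_CHANCE = 0.1    -- assumed: refresh on roughly 1 update in 10

local cached_radar = nil
local refresh_count = 0       -- only here so the behaviour can be checked

-- Stand-in for an expensive scan.
local function refresh_radar()
    refresh_count = refresh_count + 1
    return { scanned = true }
end

local function get_radar(force)
    if force or FORCE_REFRESH or cached_radar == nil
        or math.random() < REFRESH_CHANCE then
        cached_radar = refresh_radar()
    end
    return cached_radar
end
```

Calling get_radar(true) always does the full scan, which is the behaviour you want when measuring the worst case.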

------- > 
start timer	3	1748090317	10033
done timer	1748090337
calls	5513
calls/sec	275.65
timer armed
------- > 
start timer	3	1748090351	10433
done timer	1748090371
calls	5510
calls/sec	275.5

With this (and on the map I have) we have a stable speed of between 270 and 276 calls per second.

Let’s look at the call counts now.

Right, we have a bit more to think about. There are maybe some things we can cache.

  • get_is_vehicle_air – 12770
  • _get_radar_attachment – 10496
  • get_vehicle_team_id – 10676
  • get_vehicle_docked – 69743

Ok, let’s tackle get_vehicle_docked(). This is an interesting one, because it has to work in the HUD and in the other scripts, where there are two different ways of detecting that a unit is docked:

Left is before, and right is after. We use a global table to cache for a few seconds (2 sec – 60 ticks) the docked-state of each vehicle we check.
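In text form, the caching idea looks roughly like this. It is a sketch under assumptions: g_docked_cache, get_vehicle_docked_cached and the compute_docked callback are hypothetical names, and the caller is assumed to supply the current tick.

```lua
-- Sketch of a short-lived per-vehicle cache for the docked state.
local DOCKED_CACHE_TICKS = 60            -- about 2 seconds at 30 ticks per second
g_docked_cache = g_docked_cache or {}    -- global table, keyed by vehicle id

-- compute_docked(id) stands in for whichever docked test the current script
-- supports (the HUD and the other scripts detect it differently).
local function get_vehicle_docked_cached(id, tick, compute_docked)
    local entry = g_docked_cache[id]
    if entry ~= nil and tick - entry.tick < DOCKED_CACHE_TICKS then
        return entry.docked              -- still fresh, skip the real check
    end
    local docked = compute_docked(id)
    g_docked_cache[id] = { docked = docked, tick = tick }
    return docked
end
```

The price is that a vehicle’s docked state can be up to 2 seconds out of date.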

With these changes we are up to..

------- > 
start timer	3	1748092065	31645
done timer	1748092085
calls	5576
calls/sec	278.8
timer armed
------- > 
start timer	3	1748092100	32070
done timer	1748092120
calls	5568
calls/sec	278.4
timer armed
------- > 
start timer	3	1748092134	32467
done timer	1748092154
calls	5590
calls/sec	279.5

So we’ve gained close to 3 more calls per sec.

Ok.. let’s look at “get_is_vehicle_air()” which is a very simple function:

function get_is_vehicle_air(definition_index)
    return definition_index == e_game_object_type.chassis_air_wing_light
        or definition_index == e_game_object_type.chassis_air_wing_heavy
        or definition_index == e_game_object_type.chassis_air_rotor_light
        or definition_index == e_game_object_type.chassis_air_rotor_heavy
end

At first glance, we can’t do much to make this go faster. It’s between 1 and 4 integer comparisons.

Maybe we can..

If we look at “library_enum.lua” we have:

e_game_object_type = {
	chassis_carrier = 0,
	chassis_carrier_broken = 1,
	chassis_land_wheel_light = 2,
	chassis_land_wheel_light_broken = 3,
	chassis_land_wheel_medium = 4,
	chassis_land_wheel_medium_broken = 5,
	chassis_land_wheel_heavy = 6,
	chassis_land_wheel_heavy_broken = 7,
	chassis_air_wing_light = 8,
	chassis_air_wing_light_broken = 9,
	chassis_air_wing_heavy = 10,
	chassis_air_wing_heavy_broken = 11,
	chassis_air_rotor_light = 12,
	chassis_air_rotor_light_broken = 13,
	chassis_air_rotor_heavy = 14,
	chassis_air_rotor_heavy_broken = 15,
	chassis_sea_barge = 16,
<snip>

The scripts never ever get to see the _broken values, so we can make this go a bit faster by returning true for any value from 8 to 14.

This makes our script look like..

local chassis_air_min = e_game_object_type.chassis_air_wing_light
local chassis_air_max = e_game_object_type.chassis_air_rotor_heavy

function get_is_vehicle_air(definition_index)
    return definition_index >= chassis_air_min and definition_index <= chassis_air_max
end

We have gone from up to 4 global table lookups and up to 4 comparisons to 2 local reads and at most 2 comparisons. I don’t expect this to have made a huge difference, but let’s see if it helps.
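One thing worth checking is where the two versions disagree. The sketch below compares them over the values 0 to 16, using a cut-down copy of the enum with only the air entries (values taken from the listing above). The names is_air_original and is_air_range are mine.

```lua
-- Cut-down copy of the enum: only the entries the two functions need.
local e_game_object_type = {
    chassis_air_wing_light = 8,
    chassis_air_wing_heavy = 10,
    chassis_air_rotor_light = 12,
    chassis_air_rotor_heavy = 14,
}

local function is_air_original(definition_index)
    return definition_index == e_game_object_type.chassis_air_wing_light
        or definition_index == e_game_object_type.chassis_air_wing_heavy
        or definition_index == e_game_object_type.chassis_air_rotor_light
        or definition_index == e_game_object_type.chassis_air_rotor_heavy
end

local chassis_air_min = e_game_object_type.chassis_air_wing_light
local chassis_air_max = e_game_object_type.chassis_air_rotor_heavy

local function is_air_range(definition_index)
    return definition_index >= chassis_air_min and definition_index <= chassis_air_max
end

-- Collect every value where the two versions give different answers.
local mismatches = {}
for i = 0, 16 do
    if is_air_original(i) ~= is_air_range(i) then
        mismatches[#mismatches + 1] = i
    end
end
-- mismatches now holds 9, 11 and 13: the three _broken air values in range.
```

So the range version is only equivalent as long as the scripts really never see the _broken values.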

Huh.. maybe it helped more!

start timer	3	1748092857	33063
done timer	1748092877
calls	5713
calls/sec	285.65
timer armed
------- > 
start timer	3	1748092892	33491
done timer	1748092912
calls	5667
calls/sec	283.35
timer armed
------- > 
start timer	3	1748092932	34073
done timer	1748092952
calls	5616
calls/sec	280.8

We’re now at 280-286 calls/sec.

Ok, let’s do “_get_radar_attachment()”.

Right. timing..

calls	5648
calls/sec	282.4
calls	5417
calls/sec	270.85
calls	5580
calls/sec	279.0

Hm.. not obviously faster.. If anything, worse. Hmm.

Hmm.. I will investigate further!
