1.31: Quit shining that light in my face

Oh, wait a second – I think that’s the light at the end of the tunnel!

Yes, I’m pleased to announce – although Rafter is going to flay me alive – that Battleground Europe v1.31 is likely to release within the next Two Weeks!

I think this is going to be a fun weekend: 99% of the CTDs turned out to have a single cause, so once Ramp fixed those, 1.31 shifted into its final gear. There was also a connectivity issue between two host processes that was causing almost all of the host issues, so now that’s fixed…

But the fun is playing 1.31, albeit on the Open Beta server. Having some really great battles there. I think even Toto had fun today.

I’ve spent most of the last couple of months struggling away with the symptoms of that connectivity issue, mostly in the bowels of the cell host code. By last weekend, my head was on the verge of exploding from the fact that every angle I took just broke things differently, as has always tended to be the case with the affected cell host code.

So bad that, last Sunday – for recreation – I rewrote the update system.

Oh – that’s not going into 1.31, that’d delay it another 6 months easily. Rather, I coded it up and did some preliminary testing on it.

It’s using similar concepts to area chat’s “grid system”. The current system uses almost all of its CPU in the process of selecting whom to send updates to, building a vis list for them, preparing that list for culling and polishing the list. The actual sending of the list is trivial by comparison.

At every juncture, the code turned out to be resistant to parallelization without significant work, and the last thing I wanted to do at the end of a dev cycle was … rewrite a major system.

It’s actually a pretty simple bean-counting operation. It’s only complex because it was written over-complexly, back when fewer third-party tools (such as the STL) were commonly and reliably available.

Sadly, it’s also written to self-optimize for circa-2001 hardware. When you put it on modern hardware, the code kicks the CPU’s ass in every conceivable way.

My approach migrates the overheads to suit modern architecture.

For example: the spaces in the grid will probably be about 1-2km on a side. Under normal circumstances, no vehicle is going to traverse these spaces more than once every few updates. So it is relatively safe to transfer work from the per-vehicle observation of the world to management of the spaces.

Each space in the grid self-associates with its neighbors when it becomes active (populated) and disassociates itself when it becomes inactive (empty).

While it’s unlikely a player will transition between cells on every update, it’s not impossible: a player running in an ordinal direction right on the boundary may “wobble” between cells from one update to another.

So the work of recovering inactive cells is deferred to a periodic “garbage collection” task.
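To make that concrete, here’s a minimal sketch of the deferred-reclamation idea. All the names and structure here are mine for illustration, not the actual cell host code: a space springs into existence when the first vehicle enters it, and empty spaces are only torn down by the periodic sweep, so a player wobbling across a boundary never thrashes allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical sketch: one grid space, activated on first use and
// reclaimed lazily by a periodic "garbage collection" pass.
struct Space {
    int population = 0;
};

using Key = std::uint64_t;  // packed (x, y) grid coordinate

Key makeKey(std::uint32_t x, std::uint32_t y) {
    return (Key(x) << 32) | y;
}

struct Grid {
    std::unordered_map<Key, Space> spaces;
    std::unordered_set<Key> maybeEmpty;  // candidates for the GC sweep

    Space& enter(Key k) {
        Space& s = spaces[k];            // activates on first use
        ++s.population;
        return s;
    }
    void leave(Key k) {
        Space& s = spaces.at(k);
        if (--s.population == 0)
            maybeEmpty.insert(k);        // defer reclamation, don't free now
    }
    // Periodic sweep: only spaces still empty when we get here are freed.
    void sweep() {
        for (Key k : maybeEmpty) {
            auto it = spaces.find(k);
            if (it != spaces.end() && it->second.population == 0)
                spaces.erase(it);
        }
        maybeEmpty.clear();
    }
};
```

If the wobbling player re-enters the space before the sweep runs, the deferred entry is simply skipped – the space never gets destroyed and rebuilt.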

I also found that for every vehicle that every person sees on their vis list, the host does an “is it alive?” check. Not doing the check can leave ghost vehicles around, and skipping it had all kinds of nasty side effects. So, I relocated that work to a management step, reducing the work done per-update-per-observed-vehicle by a small amount that adds up over the entire loop (0.01ms per loop, or 1ms of CPU time per second).

By writing from the top-down, from scratch, and by judicious use of const and restrict markup, I can also be confident that it is reasonably thread safe – which means I’ll be able to execute steps in parallel.

One of the more annoying aspects of the old code was its re-use of a “holder” concept. Throughout most of the code, a xxx_HOLDER is used for linking an entry within a list: it has a prev, a next and an item pointer. In the old cell/grid system it got padded out with a number of similar(ish) variants, all of which assumed they were exactly the same as this-or-that.

The main flavor was the “GRID_HOLDER” which contains prev, next and vehicle pointers. This is used for maintaining vis lists. When it gets around to building your next update, it checks to see if its own copy of the vehicle ID matches that at the vehicle pointer. If it doesn’t, then the vehicle has been freed!

AAAARRRGGGHHH!!!

This structure also contained several other fields, designed to help track what state you had last seen the vehicle in. Unfortunately, all of that information gets overwritten when the holder is first accessed. Since this is a rabbit hole guaranteed to have an angry badger at the bottom, the rest of the host code just works around the lack of delta information :(

I was able to discard the rest of the HOLDER structures and use Standard Template Library containers for managing the data sets instead. This provides much faster access to elements of the structures at the cost of extra management overhead – however, that is safely delegated away from the list generation.

The remaining holder, I renamed “Impression”: it captures an impression of the vehicle when last observed, so that it can provide delta information the host can use to reduce the amount of data that has to be sent.
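A bare-bones sketch of what an Impression might look like – the field names are invented, and the real structure surely tracks much more, but the principle is just “remember what you last sent, so the next update can be a delta”:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: a snapshot of a vehicle's state as last observed.
struct VehicleState {
    std::uint32_t id;
    std::int32_t  x, y, z;   // world position (units are a guess)
    std::uint8_t  damage;
};

struct Impression {
    VehicleState last{};     // state as of the previous update
    bool valid = false;

    // Returns true if anything changed since last observation (i.e. a
    // delta needs sending), then records the new state for next time.
    bool observe(const VehicleState& now) {
        bool changed = !valid ||
                       now.x != last.x || now.y != last.y ||
                       now.z != last.z || now.damage != last.damage;
        last  = now;
        valid = true;
        return changed;
    }
};
```

The payoff is that an unchanged vehicle costs a comparison instead of a full re-send – exactly the delta information the old holder threw away.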

By far, the biggest crime of the old update system is in-place sorting. Everything it does uses linked lists that are sorted in-place.

The loop starts by iterating over every vehicle in the game, including those that aren’t on this host – so it can do its “is it alive?” check.

Optimization #1: stay on task, only iterate over the active vehicles for this loop.

For each vehicle, it checks to see if it is ready for another update, assigns it a priority based on time since last update, and then in-place sorts it onto a linked list.

Optimization #2: take the weight and a pointer to the vehicle, and stuff them into a simple array.

Optimization #3: when the list is built, if it is larger than the max updates allowed per iteration, sort it using a fast, parallel sort algorithm.
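Optimizations #2 and #3 together look something like this sketch (my own minimal version, not the host code): collect (weight, vehicle) pairs into a flat array with no sorting at all, then do one sort at the end – and only if the list exceeds the per-iteration budget. With C++17 that final sort could use `std::sort(std::execution::par, …)`; here I use `std::partial_sort`, which only orders the entries we’ll actually keep.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vehicle { int id; };     // stand-in for the real vehicle type

struct Candidate {
    float    weight;            // priority: time since last update
    Vehicle* vehicle;
};

// Keep only the top `maxUpdates` candidates, highest weight first.
void selectForUpdate(std::vector<Candidate>& c, std::size_t maxUpdates) {
    if (c.size() > maxUpdates) {
        std::partial_sort(c.begin(), c.begin() + maxUpdates, c.end(),
                          [](const Candidate& a, const Candidate& b) {
                              return a.weight > b.weight;
                          });
        c.resize(maxUpdates);
    }
}
```

Compared to N in-place linked-list insertions (O(N²) worst case, pointer-chasing throughout), this is one cache-friendly pass plus a single sort.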

Once we have the list of vehicles to update, it starts iterating over each vehicle and doing a series of checks on the vehicle’s data and status (sending EWS info to strat, sending area chat info to the chat host, etc).

Optimization #4: none of these tasks needs to run every update – none runs more than once per second – so move them to the once-a-second “is it alive?” loop that checks for ghosts/timed-out vehicles.

Now we get to the “World Update” phase. This is where we determine which vehicles the player can see and any strat updates they may need.

Every vehicle in the game world has a flag, “inUpdate”, which tells the code whether or not it is in the update currently being generated (thread-unsafe, causes writes to ~1000 pages of memory).

Then we check the player’s previous vis list: First, the HOLDERS are all removed from the player’s list and moved to a temporary (again, in-place sorted) list. We then extract each HOLDER from the temporary list and check to see if it’s still valid.

Next we perform a subset of the main vis-list rules: Is this the vehicle I’m riding (so I have to see it)? Is this my vehicle (if I’m multi-crewed)? Is this vehicle still in range?

Based on this, a score is assigned, and then the vehicle is placed onto a “biasing” list that tries to further prioritize vehicles based on the viewer and the subject.

Then we go through the vis list and set all the vehicles’ inUpdate flags to TRUE.

Any vehicles that were deemed no-longer visible during this process have their HOLDERs shuffled onto a different (global) list so that we can tell the client which vehicles to stop drawing.

Optimization #5: Since there is a hard-limit to the number of vehicles you can see, there’s also a hard-limit to the number you might stop seeing in any one update. So add a simple array of vehicle IDs to the world update structure and simply add each removed vehicle to the end of the list and put the HOLDER directly onto the free list.
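A sketch of that fixed-size removal array – the 128 cap matches the current vis limit mentioned later, but treat both the size and the names as illustrative:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxVis = 128;  // assumption: current vis-list cap

// Since at most kMaxVis vehicles can be visible, at most kMaxVis can
// stop being visible in one update: a plain array suffices, with no
// list juggling and no allocation in the hot path.
struct RemovalList {
    std::array<std::uint32_t, kMaxVis> ids{};
    std::size_t count = 0;

    void add(std::uint32_t id) {
        if (count < ids.size())
            ids[count++] = id;
    }
    void clear() { count = 0; }   // reset once the removals are sent
};
```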

Optimization #6: Check the Impression list to see if a vehicle occurs; guaranteed to be faster than writing 1000 pages of memory every time. Also makes the operation thread-safe, so that the cost can later be amortized by running several iterations at once.

Optimization #7: Don’t in-place sort. The vis list is now an unsorted std::vector of Impressions (std::vector rather than std::array, since std::array is fixed-size and has no push_back). If we remove vehicles, I use the “tail-swap trick”:

// needs <vector> and <algorithm>
std::vector<int> foo;
foo.push_back(3);
foo.push_back(4);
foo.push_back(5);

std::vector<int>::iterator it = std::find(foo.begin(), foo.end(), 4);

if (it != foo.end())
{
    // tail-swap trick: overwrite the found element with the back,
    // then drop the now-duplicate tail -- no shifting, no sorting
    *it = foo.back();
    foo.pop_back();
}

The cost of this is the copy of foo.back() into *it plus decrementing the size counter for foo.

The host now has to look up where the player’s vehicle is. I’m really not sure why this information isn’t kept. But every time an update is sent, the host has to spend nearly 5000 CPU cycles figuring out where in the world you are.

Optimization #8: figure out where I am when I move into a grid space; the cost is about the same, but the frequency is vastly reduced.

Knowing where you are is only half the story. The host now has to calculate which other spaces might be in your vis range. It uses a fairly efficient hash-based system that spirals out from your current grid location and places the address of each space into a list. (This flattening of the spiral is called the “GRID_STRIP”.)

Then, the code iterates through the list and looks up the space matching the coordinates. For all this effort, what we get is a pointer to a holder. The holder has a prev and next, which provide the internal chaining of the grid itself (the whole grid is one huge, pre-allocated doubly linked list).

It also has a pointer to another type of holder, which contains a prev-and-next for iterating over the contents of the space, and a pointer to a vehicle (or a vehicle holder – I was never entirely clear which it actually was; it was always safest to dereference its vehicle pointer and know you definitely had a vehicle at that point).

Optimization #9: use an unordered_map to hold the grid: this does away with constraint checking, and we create a grid per instance so we won’t have to check every vehicle we might see for being in the same instance.

Optimization #10: give each element in the grid a concept of a “neighborhood” – that is, the list of nearby spaces that are active (not empty). When a space becomes active, register it with its neighbors. When it becomes inactive, unregister it. Thus the “what’s nearby” evaluation is done at a significantly decreased frequency. Per-update CPU saving: ~60,000 cpu cycles.
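The register/unregister dance in Optimization #10 can be sketched like this (again my own illustrative version – the real code presumably derives the neighbor set from grid coordinates rather than explicit links):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// An active grid space that keeps a stored list of its active
// neighbors, so "what's nearby" is a lookup, not a spiral walk.
struct Space {
    std::vector<Space*> neighborhood;

    // Called when this space becomes active next to an active neighbor.
    void linkWith(Space& other) {
        neighborhood.push_back(&other);
        other.neighborhood.push_back(this);
    }

    // Called when this space becomes inactive: remove ourselves from
    // every neighbor's list (erase-remove idiom), then forget them.
    void unlink() {
        for (Space* n : neighborhood) {
            auto& v = n->neighborhood;
            v.erase(std::remove(v.begin(), v.end(), this), v.end());
        }
        neighborhood.clear();
    }
};
```

The expensive neighborhood computation now happens only on the (rare) activate/deactivate transitions, not on every update.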

This is partly because the “spiral” system used by the old system is absolutely guaranteed to conflict with CPU caching. The old code is almost designed to hurt a modern CPU :(

Optimization #11: Use arrays (std::vectors) where possible so that data is spatially coherent in memory, improving cache performance.

The host now goes through the vehicles in each grid space it looked up. It allocates a new HOLDER, copies vehicle information into it, and then performs a variety of tests to see if the vehicle is suitable, if it is already in the update list, if the vehicle has timed out and should be kicked, etc.

See Optimization #4: we do less of these checks per-viewer and instead do them once per vehicle at the appropriate frequency.

See Optimization #6: here we actually have to pay an increased price, checking all the entries in the current vis-list for a matching vehicle ID. Depending on further testing, I may actually maintain an unordered set of in-list IDs to reduce the cost of this in high-density areas where there are a lot of vehicles you wind up not seeing.
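If that unordered set does prove worthwhile, it would amount to something like this (a sketch of the idea only – whether it beats the linear Impression scan depends on vis-list sizes and cache behavior, which is exactly what the further testing would measure):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Side index of vehicle IDs already on the vis list: turns the
// "already seen?" test into an O(1) hash probe instead of a scan.
struct VisMembership {
    std::unordered_set<std::uint32_t> ids;

    bool add(std::uint32_t id)        { return ids.insert(id).second; }
    void remove(std::uint32_t id)     { ids.erase(id); }
    bool contains(std::uint32_t id) const { return ids.count(id) != 0; }
};
```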

Each vehicle that is a candidate for being seen gets placed onto a biasing list. This involves looking through my vehicle’s biasing preferences: How many friendlies do I want to allow for? How many troops? How many boats? Etc.

Each addition to the biasing lists performs an in-place sort. If a list grows too long, the last element is removed. If it was on our previous vis list, then it gets added to the remove list.

Optimization #12: don’t allocate holders … yet.

Optimization #13: take pointers to the previous vis list and new vehicles being considered for addition, and store them unsorted.

Optimization #14: only if the resulting list is bigger than the player’s vis list should we apply biasing.

Optimization #15: consider each rule in my biasing list once, allowing me to:

Optimization #16: produce a simple biasing operator specific to each rule that can be applied with extreme efficiency to all vehicles.

Optimization #17: instead of writing the weights to the vehicle/holder, store them along with the pointer.

Optimization #18: pre-sort the list based on their general vis-desirability weight, then apply the bias-rules rule-at-a-time to the list, transferring the bias-favored vehicles onto the waiting vis-list, using an incremental “bias number” index to assign them weight.

Optimization #19: once all the biased vehicles have been transferred to the vis list, we can tell if there is room left for the remaining vehicles. If there is, re-sort the remaining “to be biased” pointer/weight list, and copy as many entries as will fit. Remaining entries are checked for transfer to the removal list.
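Pulling Optimizations #17 through #19 together, the shape of the new biasing pass might look like this. This is a heavily simplified sketch under my own naming – the real rules, weights and limits are obviously richer – but it shows the pattern: one pre-sort by desirability, one sweep per rule, then a plain fill of whatever room is left.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Veh { int id; bool isAircraft; };  // stand-in vehicle type

struct Weighted {
    float weight;   // general vis-desirability, stored with the pointer
    Veh*  vehicle;
};

using BiasRule = std::function<bool(const Veh&)>;

std::vector<Veh*> buildVisList(std::vector<Weighted> cand,
                               const std::vector<BiasRule>& rules,
                               std::size_t limit) {
    // One pre-sort by general desirability (Optimization #18).
    std::sort(cand.begin(), cand.end(),
              [](const Weighted& a, const Weighted& b) {
                  return a.weight > b.weight;
              });
    std::vector<Veh*> vis;
    std::vector<bool> taken(cand.size(), false);
    // Each rule sweeps the list once, pulling its favorites onto the
    // vis list in pre-sorted order (Optimizations #15/#16).
    for (const BiasRule& rule : rules) {
        for (std::size_t i = 0; i < cand.size() && vis.size() < limit; ++i)
            if (!taken[i] && rule(*cand[i].vehicle)) {
                vis.push_back(cand[i].vehicle);
                taken[i] = true;
            }
    }
    // Fill remaining room by plain desirability (Optimization #19).
    for (std::size_t i = 0; i < cand.size() && vis.size() < limit; ++i)
        if (!taken[i]) {
            vis.push_back(cand[i].vehicle);
            taken[i] = true;
        }
    return vis;
}
```

Note there is no per-insertion sorting anywhere: the only sort is the single up-front one, and the bias rules just consume it in order.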

The old system has to merge all of the bias lists together, which is again done with an in-place sorting linked-list insertion and potential trim. Garbage collection has to be performed since we are using globals out of the wazoo.

Optimization #20: our list is already good to go :)

We now have a full vis list for the player to see.

Except, we don’t send updates for the entire vis list; because we don’t have deltas, we can’t quite fit enough data into the old 522 byte packets to describe everything. As it is, each update is between 58-73 bytes, averaging around 65 bytes. And we have a bunch of other data we’re going to stuff into this packet, including strat data (building states).

To work around this and squeeze more updates into each packet, we use 3 different levels of update. Players further away are sent lower-resolution data, with some data omitted entirely on some updates, to squeeze more data per packet.

So we create a new HOLDER list. We go through the vis list and, based on the weights we calculated before, we create a new HOLDER, copy the holder information across, and calculate a weight based on how long since this person last sent you an update. Then we use another sorted-list insertion into the “level” list. If a holder pops off the end of the list, we put it back onto the free list.

Once we’re done, this means we have 3 holder lists that form the list of vehicles we’re going to update at different fidelity levels.

This is essentially just a re-sort of the list, but a very expensive one.

Optimization #21: use the same pointer/weight trick as used for biasing.

Optimization #22: since we don’t support Dialup with its 576 byte MTU, use larger packets (1400 bytes including tcp/ip headers etc), allowing us to send more vehicles at higher fidelity per packet. This saves having to store/calculate the alternate fidelities of data.
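Back-of-envelope arithmetic for Optimization #22, using the figures above (~65 bytes per update on average; the per-packet overhead figure here is my guess, not a measured number):

```cpp
#include <cassert>

constexpr int kHeaderGuess = 60;  // assumed per-packet overhead (headers etc.)
constexpr int kAvgUpdate   = 65;  // average vehicle-update size from above

// How many average-sized vehicle updates fit in one packet.
constexpr int updatesPerPacket(int packetBytes) {
    return (packetBytes - kHeaderGuess) / kAvgUpdate;
}
```

Under those assumptions, the old 522-byte packets carry roughly 7 updates each, while 1400-byte packets carry roughly 20 – nearly three times as many vehicles per packet before any delta savings.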

Optimization #23: calculate the “resend” priority back when we were either checking the previous vis list or adding a new Impression.

We’re almost there!

The host now goes through each of the selected vehicles and calculates how much data will be needed to describe it. It uses this to determine how many it will actually write.

Now it goes through them again and writes them into the pending update packet. After each one, it checks to see if there is enough space left – because the calculations have sometimes been wrong in the past. Sigh. And because deltas (if they were working) would change the actual bytes used.

Now our world update is just about ready. To finish off, we want to cram in any strat data. When we supported dialup, this was crucial. The world update packets had to be extremely efficient bandwidth wise. Reducing the number of packets sent ensured smoother traffic flow (in theory).

So the host must keep a record of strat data that hasn’t been sent to this client yet. This used to be done in a separate process. Both the cell and the strat-updater processes ran on the same machine and talked to the player through a proxy that also ran on the same machine. This added an extra “stack” (layer of communications) that increased the cost of every message sent and received (roughly double+). I removed that quite a long time ago and merged the two processes. A net improvement in overall host performance.

However… For every update that has to be sent to a player, this strat updater grid has to be checked for any events the player has yet to see.

Optimization #24: re-use the new grid system to immediately queue events to players and dispatch them immediately; when a player changes spaces, send them the latest state for that space.

Optimization #25: since event-changes are comparatively rare, track a much larger radius for players and send them updates for spaces they have previously visited. Trickle them states for further-out cells separately.

Optimization #26: sending removals and strat data separately, after the update loop is complete, increases the “freshness” of vehicle data in all levels of cache and of the update arriving at the client machine.

The two next big steps of this process are to saturate this new system with unit test hooks (I’m going to write a unit test module that simply #includes the main code to avoid making it unreadable) and then to rewrite the code for generating the updates.

Our current method squeezes data for troops, tanks, trucks, guns, aircraft and ships into the same data structures. I’m working with Ramp to create vehicle-specific updates. In a sense, this is a de-optimization since it means different vehicles run different code paths. But it also means that the data can be treated more specifically, allowing less data to be sent most of the time, meaning more useful data is sent per packet, which means fewer vehicles competing for packet space.

That may potentially allow us to separate “vis limit” from “bandwidth” – so players with kick-ass bandwidth can get more and more updates per second (while seeing the same number of vehicles). That will pan out as an optimization since the same vehicle will have less list reprocessing.

We’ve also been discussing ways to more closely tie the client/host data through all this to produce much better 3rd person representations.

Lastly… A key assumption in my approach is that all cell<->client updates will be sent via NetCode2 and not with our old network API. This offers a ton of advantages, not least of which is that the NC2 method uses 30% less CPU, is far more cache friendly, and is threaded.

That is the change that opens the door to dynamic mission origins and targets (doing all your mission stuff while spawned, being able to respawn after dying without having to go back to the map, etc) for another release down the line…

I’m really hoping we can put this together very quickly for a really nice 1.32 update, if not a 1.31 point release.

14 Comments

WOW really impressive stuff mate, I hope you guys can really pull off what you just described. Sounds like a lot of very good updates and improved systems which could help WWIIOL in the long term quite a bit.

Your brain must be huge man, I am seriously impressed with all the work you get done, just WOW

so this could potentially make it so that when I fly (fighters at least) I get less infantry updates (maybe ground unit updates altogether), giving me less lag when getting closer to the ground?

An original task for this system was to allow the use (by the player) of bias commands. These existed in them other flying games we’d previously worked on back in…ya know…1997, but never got implemented here.

A basic one would just let you set say, , which populates your vis list with all in range air units 1st…then following the present ruleset for how that list gets sorted depending on what side they’re on, who may be shooting at you, etc.

A more sophisticated one would let you specify an order.

So, add units to my list in the unit class order specified. I’m flying, so I always want air units added 1st, and fighters before bombers…but if there are not enough air units present to fill my list, start adding the ground unit (armor in this case) that I’m most concerned with seeing…and finally add infantry if there’s room.

An even more sophisticated one would add the ability to sort by unit, side, and overall unit class, overcoming the need to reserve a specific set of slots for friendlies and fill with the desired class/side combination the user prefers for the given battle state.

I will concede that a more n0ob friendly GUI can be used rather than . commands.

So, when I’m flying cap in my fighter, my bias list might look like:

0. Units scoring hits on me.
1. Enemy fighters
2. Enemy bombers
3. Friendly fighters
4. Friendly bombers.
5. Enemy armor
6. Enemy unarmored vehicles.
7. Enemy guns.
8. Enemy ground units. (catchall where an enemy ground unit class not previously specified in my list is added)
9. Friendly ground units.
10. Easter Egg flying monster vehicles and mobile sheep.
11. Player [xxx] (specified by the command).

In the vein of 11, you’d also have to add the GM command, which would place the playername [xxx] at the top of everyone’s vis list, with a bright enemy-visible tag. Useful for those…problem….players who just need a little smacking around to encourage more socially friendly gameplay habits.

Hmm. My comment above seems to have removed every instance where I used greater-than – less-than symbols…as well as all the text contained within those symbols. Re-edit to make it make sense, or leave it be and let Ollie deduce my meaning from what’s left? Meh…there’s enough there, he’s a bright lad.

Interesting. When I currently test total lag on my two accounts, by pulling the trigger on one and listening for the shot on the other, it seems the total lag is about .5 s. My machines are showing a solid ping of 80ms. How much of the extra 350ms is host processing and how much is due to the packet heartbeat? Is this something that is likely to be decreased with this new architecture?

Also, why not increase the vis list for people with high bandwidth and fast machines? Has the vis list been constant at 64 since 1999? It seems there should be plenty of headroom for increasing that with today’s hardware, too.

Interesting stuff as always.

And Lutorm, no the vis limit hasn’t been the same since 1999. It’s been increased at least once during that time. From what to what, I can’t remember.

If I remember correctly, 1.31 will change the player vislimit again?

It definitely is 128 players atm with 1.30.

Unless ten-player polycrewing suddenly shows up and the naval game gets to Gophur’s 1K-simultaneous-player threshold, naval gameplay seems unlikely to approach even the 1.30 vis-list limit in most likely naval combat contexts that don’t involve coastal proximity to ground unit concentrations.

Given this predictability and “unit specific updates”, is there a possibility that “unit specific interaction radii” also might become possible, so that at least some of the time, naval units could interact at realistic daytime ranges?

Snail: They don’t show up when I go to edit the comment. I’m guessing it now strips out anything it thinks is malformed HTML :(

sniff: We already have something like that (see Snail’s comments). It’s done vehicle-on-vehicle, and what the old system does is reserve slots. Each vehicle type has a bias list, which says things like “Reserve 1 slot for a friendly naval vessel, 1 for a friendly truck, 5 for friendly anything, 5 naval vessels, 10 infantry and 10 tanks”.

The friendly naval vessel/truck is so you can see your ride or the ride you would have if you could see it (probably needs updating with Ju52s :)

It tries to fill up each bias type with the closest N matching vehicles (if it doesn’t say friendly, it means enemy). Then it just fills the rest of the list based on distance.

Lut: a small portion of it is the packet heartbeat (we don’t have fire messages, we have “is firing” states that get sent in the heartbeat).

Currently your Player Vis Limit also controls your heartbeat rate. If you are using high vis, I think it’s 125ms between updates vs 175ms for Low.

In 1.31, it’s 100ms for high and 150ms for low.

The main reason now for having an upper limit is down to how many the client can handle. It can handle a hell of a lot of infantry, but vehicles still put a bit of a hurt on it. I’m hoping that’ll get addressed in 1.32.

I don’t know why, but Jaeger is convinced that all of the FPS issues are/were terrain-rendering related. But when I load up the cell host with a couple hundred vehicles, there is a large FPS hit from vehicles (inc. troops) being loaded and unloaded. FPS clearly drops as the vehicle count goes up, and rendering non-troop vehicles hurts FPS a lot more than troops.

It’s possible that drawing the world (out to 10km) is inherently more costly than drawing and animating and running the vehicles, but all my tests (and AHWulf’s and Ramp’s and Martini’s) seem to point to the vehicle system as the biggest (stable) FPS hurt, and if we can tighten that up, we can afford to render more vehicles for higher-bandwidth players.

Since we no-longer support dialup, the difference isn’t going to be as drastic anymore, either.

hchris: We removed the low-end vis limit, we didn’t change the higher end vis limits; basically “/um 1” is now “/um 0”, and “/um 3” is the highest instead of “/um 4”.

JW: Each unit type does have an LOD range in its vehicle data, which the host uses (you don’t see infantry out beyond 1.5km or something). Naval LOD is constrained by some other factors and – without looking, which I’m too tired to fire everything up to do right now – I think it’s either 6 or 8km.

– Oliver

I agree with you, I don’t think it’s the terrain rendering. The reason is simple: fps is noticeably higher offline than online, esp. in large battles. Since the terrain is the same in all cases, it has to be the extra vehicles (either rendering or something else).

–Each unit type does have an LOD range in its vehicle data, which the host uses (you don’t see infantry out beyond 1.5km or something). Naval LOD is constrained by some other factors and – without looking, which I’m too tired to fire everything up to do right now – I think it’s either 6 or 8km.–

Yes, understood. The point, in reference to discussions elsewhere, is that while the ground and air games generate excellent immersiveness within a 6 km bubble, the daytime/clear-weather naval game can’t get there. By and large, historical ship-gunnery engagements took place at plunging-fire ranges, and warships are designed with that damage mechanism in mind. A 10km bubble would be vastly better, and 15 km better still from gameplay-design and physics perspectives.

It’s none of my business…I was just curious as to whether perhaps code evolution eventually might enable such a development direction.

Hmmm…I was unclear above. That’s a 6km-**radius** bubble, of course, in the present code.

6km, that used to be closer to 8km.

Terrain draws out to 8km, buildings and vehicles 6km. I think – could be wrong.
