Rewriting the grid.

Two of the core components of our game server systems are the poorly named “World Grid” and “Update System” (the names reflect what the systems do, but not their overall role in the game server cluster). The combination of these two is alternately called “The Grid” and “The Update System”.

In short: they divide the world up into small parcels of virtual space, track the players moving around in those spaces, and generate data packets describing movements between one player and another.

Written by coders not happy to co-operate together, hodged and podged over 10 years, sprinkled with genuine pixie dust(*); neither component lends itself to maintenance, study or peace of mind.

They were developed for 1999 hardware and lovingly hand-optimized to running with magnificent efficiency on an old Pentium III 800. But you would really have to go out of your way to make them less hostile to a modern CPU like the Dual Xeon Quad-Core 3ghzs our servers currently run on.

For 7 years+, they have been the bane of my existence.

(* There are portions of the update system that break arbitrarily and that have mind-blowing dependencies on undocumented system behavior that make them as close to working by magic as is possible in a computer system)

Several years ago, I wrote a prototype “Grid” system to replace one of these two components. This became the basis of the area chat system. It has worked happily, efficiently and flawlessly now for those years.

Two years ago, I took a shot at porting that prototype to the world grid system, but other priorities intervened and it never made it beyond personal experimentation.

Two months ago, on a Saturday evening, a lightbulb mental-supernova moment lead to my re-creating a world-grid system rewrite from scratch, only to rediscover the two-year old version and a fantastic degree of similarity between the two. The advantage to the newer code was that it was written with test-driven design so I actually also had a guide to it’s level of operation and QA.

About the same time, I got core work ticketed that included grid and update system rewrites.

Last week, things came round to where I could make a branch and actually start on that work.

The new grid is in-place in my test branch, along with the first player-discernible change: full map updates while spawned (no more having to click things to see what their state really is).

This rewrite grants me several opportunities that I’m not yet pursuing (but keeping in mind).

The old version starts by going through every vehicle it knows about – both those talking directly to it and those on remote servers – and seeing if they look like they might want an update.

If they do, it starts building a “World Update” to be sent to that vehicle.

Step 1. Build the viewer’s new vis list.

  1. Find a grid that might be within range of the vehicle that hasn’t yet been processed for this vehicle for this update,
  2. Find a vehicle in that grid that hasn’t yet been processed for this update,
  3. Perform a suite of tests on the vehicle that may result in the vehicle being despawned/removed from the grid system (1700 -31,000 CPU cycles),
  4. In some cases, remove the subject from the vehicle and grid systems, that’ll save us having to generate updates for him and save us having to evaluate him again later on, right! Back to step 1 if this happens,
  5. Calculate the “bias” (desire to see) of this subject for the viewer (possibly upto 8000 cpu cycles, usually only a few hundred),
  6. Sort the resulting subject into the viewers new vis list,
  7. Trim the end of the new vis list if it gets too big,
  8. Repeat from 1.

Step 2. Redo the old vis list.

  1. Somehow, the code retrieves the old vis list. None of us who’ve looked at the code aren’t entirely sure how. *GULP*,
  2. Calculate biases for all of the old entries and sort it,
  3. Check vehicles that aren’t in the new list but are in the old list, and decide if we want them to stay,
  4. Generate “remove” events for vehicles that we’ve definitely decided won’t be in the new list,

Step 3. Build a new new vis list.

  1. For each vehicle in the new and old vis list, determine if a vehicle is “mandatory” (such as the vehicle you’re riding on),
  2. Or “friendly” (upto X friendly vehicles are tracked on a temporary list so we can ensure you always see X friendlies),
  3. Or “enemy” (sort the enemies onto their own temporary vis list),
  4. Once the three resulting lists have been compiled, calculate a “priority” for each vehicle on each list,
  5. Sort and combine the 3 lists into one final “vis list”
  6. Cleanup everything but the new-new-vis-list.

Step 4. Detour.

  1. Check to see if the player has recently moved.
  2. Update various subsystems, such as sending a position update to the area chat grid system.

Step 5. Assemble the world update boiler plate.

For bandwidth/client efficiency, we cram several different things into a world update packet:

  1. Game-time synchronization.
  2. Vehicle removals.
  3. Various bits of client state information.
  4. Numbers of vehicle updates and information about their packing formation.

Step 6. Assemble the World Update!

  1. Going thru the final vis list, each subject is re-evaluated (including tests to make sure they haven’t some how gotten removed for the system
  2. If this is a new vehicle, inject a “first time” update that describes the vehicle in more detail than normal,
  3. Run through around 7,500 lines of code to turn the vehicles information into tiny bits and bytes of magic,

That’s pretty much everything we need, just as long as we don’t suddenly go off accessing some other huge grid of memory stuff

Step 7. Check for strat updates!

These hosts have a mirror of the strat system with a twist. Whenever strat events arrive, they are immediately processed as elsewhere in the cluster, but then they are shunted into a subsystem which organizes them into a grid system – exactly unlike the one used by the vehicle system – to arrange them spatially in memory so that vehicles can quickly look up events happening in their neck of the woods.

The intent was to ensure you only got strat events for stuff within your vis range. It’s a good concept until you start moving around, and then it actually increases the amount of bandwidth used because things that leave and return to vis range get resent.

It’s also a separate, discrete system, with its own grid-node size, it’s own grid node structure (that just happens to be very very similar to the others so that for some usages they are interchangeable, but only if you know exactly which fields line up between the two structures – for instance both have “prev” and “next” list members, but they are reversed between the two; more frighteningly, this is because both structures were lining up members with a third structure that subsequently mostly went away).

The strat-grid is optimized around the cost of inserting events into it. My speculation is that the author started to realize he was introducing a rather drastic overhead, and so chose a path which would allow him to say “Not it!” when performance started to be an issue.

Retrieving a list of the strat updates needing to be sent to a given player is expensive, and worse still, if there are 1000 players fighting in a town, every one of them will perform the search every time we generate a world update for them (between 10-12 times a second), and this actually accounts for about 1/1000th of the total CPU usage of the cell host, the net result of which is largely finding that “nuthin” has changed.

After all that, I’m going to summarize this step as:

  1. Do some complex, memory intensive stuff, using a large, spatially mapped area of memory so as to guarantee that L1, L2 and probably L3 cache are all purged of the world-update data we were just working on.

Step 8. Send it (kinda).

  1. For each player attached to a vehicle (to allow for multi-crew), prune the strat events we may (or may not) have accumulated,
  2. Pack the data into a buffer for sending,
  3. Call a proprietary network API for sending the data reliably,
  4. [network api] pack the data in the buffer we were passed into a buffer for sending,
  5. [network api] allocate a packet to send the data,
  6. [network api] copy the packed data into the packet,
  7. [network api] send the packet,
  8. [network api] since we’re sending reliable data, pass the data to a “reliability layer”
  9. [reliability api] copy the packet data to retransmit buffer,
  10. [reliability api] put the retransmit buffer on the back of the retransmit queue,
  11. [reliability api] check that the retransmit queue isn’t bigger than the allowed size:
    since we are sending with TCP which does its own reliability, retransmit queue size is 0
    therefore, we can discard the retransmit buffer

You may laugh, but a few years ago – when I first discovered the reliability layer, the real kicker was that a typo in it meant that if it ever actually came into effect it would retransmit all the data utterly out of order causing general badness. Fortunately, this never happened because TCP reconnects are never matched with the prior connection.

Why did I not completely disable the reliability layer then? I did, but this somehow caused the World Update system to break. Fairy dust :( The above workaround, however, reduced the overhead and memory usage and didn’t break the world update system. This is one of the reasons that a rewrite was scheduled.

Step 9. Repeat for the next vehicle.

Enough of the old. How about the new?

Firstly, I’ve delegated a lot of the grunt work of the implementation to STL and Boost library functionality.

Then I’ve delegated a lot of the “if” work to grid-management level events: When a player moves into a grid cell (a box of X by X meters within the world space), a structure is created to describe that cell. The Grid::Cell object registers itself with a master object and checks for extant cells that would be within its vis range. These nodes then all mutually register themselves so that each active Grid::Cell has a list of currently active neighbors (that is, neighbors with one or more vehicles in them).

I’ve extracted all the per-vehicle evaluation code into a thread-safe module that produces a list of vehicles that need to go away, such that I can happily thread the result, taking better advantage of the 8 cpu cores we have.

One of my changes has been live now since 1.30: The list of ready-to-update vehicles is easily obtained using a couple of function calls.

Then I’ve taken the thousands of lines of code for generating, merging, sorting, fudging, teasing and terrifying the various lists into a “vis list” into an encapsulation (i.e. a class) so that all of the data is either discretely encpasulated or thread-safe. Meaning, I can build as many vis-lists as I want simultaneously.

The old biasing code was written for single pass efficiency, and because of where it was placed in the system, it didn’t have a convenient means of retaining any state. By encapsulating this into the vis list encapsulation, the bulk of the work it had to do is trivialized to once-per-vehicle. Because I’m doing the whole sweep, I spend a few extra one-time cpu cycles (less than 100) to make the overall pass more efficient. Net result: biasing one entry is a few dozen cycles more expensive, but each additional vehicle requires less than 40 additional cpu instructions and is cache friendly. After the first vehicle, that’s a saving of up to 800 cpu cycles per vehicle.

A simple std::sort on this list produces the biased vis list. Again, slightly more expensive for a single vehicle, but savings build up rapidly as you add vehicles.

Resulting code is roughly:

  # Recalculate priorities for vehicles on my previous
  # vis list.
  recalculateVisListWeights()

  # Get the gridcell's pre-built list of nearby vehicles.
  neighborhood = vehicle.gridcell.neigbhorhood

  # Add in any new vehicles
  addNewVehiclesToVisList(neighborhood)

  # Can we see enough vehicles to make
  # biasing an issue?
  if vislist.size() > maxVisibleVehicles:
  {
    # Adjust priorities based on biasing
    biasWeightVisList()
  }

  # Sort the final list
  sort(vislist.begin(), vislist.end(), BiasOrdering)

  # Prevent addition of too many vehicles per update
  # (stutter curbing)
  if vislist.newEntries() > config.update.maxNewVehiclesPerUpdate:
    cullAdditions()
  # If the list is too long, move excess to the remove list.
  trimVisList()
  packet = generatePackedData(vislist)

  result = send(packet)

(The original isn’t written in Python, I just find Python fairly useful for pseudo code)

There’s a degree of swings and roundabouts. When a vehicle moves between grid cells, there is some expense in maintaining the neighborhoods. But that’s OK. The size of the grid cells is big enough that it happens rarely. It means not regenerating an equivalent list for every single world update that’s going out. So a significant increase in complexity of the management part is paid off many many times in the more performance critical update loop itself.

Almost all of the above is working and tested. Right now I am working thru adapting the code that creates the individual vehicle update (that is viewer->subject delta) to be similarly encapsulated so that I can properly thread all of this. I’m thinking I’m going to use one CPU for the vislist generation and the other for generating updates, so that each CPU can maintain cache focus. But that will only happen after I’m happy with the functionality of the single threaded version. And most of the code has good test coverage so far (this first pass at redoing the packers I have yet to introduce coverage for, because it’s a temporary refactor).

The strat updates now get passed thru more-or-less directly to the players: they’re stashed on a simple queue for processing so that they can be batched. CP (that’s town to you) and Facility updates (which are trivial and infrequent) get sent to everyone. Building level changes now use the vehicle grid system to get the list of vehicles within range of the event an call a broadcast function I’ve added to the network API which dispatches them with maximum 2010 CPU efficiency.

So far: Only a single real benefit to players that is pretty minor. Once this work is done, though, Ramp and I can sit down and use this new infrastructure to revamp updates for infantry to be as streamlined as possible and thus more frequent. We’re also going to try and better tie the update system to the simulation engine so that you see less spikiness in 3rd person actions etc.

The coder in me is pretty damned excited about all of this :)

 

 

16 Comments

Very interesting stuff with some nice potential benefits.

Just out of interest is any updating of the multicrew system being talked about?
At the moment it’s not the most reliable system in the world and could use a bit of KFS1 attention!

Anyway, have a GREAT Christmas and New Years.

!S MW.

Does this mean you can introduce the vis-bias features from way back?

…@/

Fused ordnance. BTW, still in EQ2? I’ve thought about re-upping. The free crap didn’t really ring my bell…but the older stuff might be fun.

fascinating stuff. did i smell strat war updates and enhanced multi-crewing in all that or was that wishful sniffing on my part? :D

Easting: Yes, playing the free servers, running a little guild (well, it runs itself, but ykwim). It’s the same game as runs on the live server, only differences are in the market place items. If you go with a free account, it has the advantage that you’re not paying when you’re not playing, and so you only pay for things you actually want/need like extra toon slots, etc, etc.

mw/mad: I wasn’t alluding to them specifically, but the grid system and the persistent map connection are the two big obstacles to so many things. Cleaning up the grid stuff will afford a greater deal of flexibility needed for some features we do have planned, and stuff that’s needed to be done for a long long time.

Snail: Not sure what you mean, we have vis-biasing in now, it’s just a resource hog on the server that severely limits the upper capacity of a given cell host. Largely because:

for each connected player:
  for each possibly visible subject vehicle:
    for each vehicle already on the vis list:
      compare vias values and order accordingly

So in a fight like the 200-on-200 at Roermond, where all those players are in the same grid area, the cell hosts are expending a horrendous amount of CPU on biasing to obtain vis lists.

With the new system, I’m able to “compile” the viewers biasing so that each vehicle in the vislist requires as little as 20 cpu instructions to bias.

The old system sorted the lists on the fly, which is expensive and frequently entirely unnecessary.

you do good work oli, we need to clone you like dolly and get 3x the win :D

I mean letting players choose the biasing they want.

Preset bias settings, and a custom setting.

I’m flying a ground attack bird, select air-to-mud preset bias, which shows me enemy air, enemy aaa, friendly air, friendly aaa, enemy ground vehicles, enemy inf, friendly….ect ect.

I’m in a tank, select armor preset bias which shows me blah blah blah.

I’m hard to please, so let me custom select which type I want from top priority to bottom priority from a list of all types.

Fill my vis list by type according to my settings until it’s full, and update accordingly.

Let me exclude types as well in the custom setting…so that, for instance, when I’m flying my db7 dropping eggs on a town under attack, I can deselect everything but enemy vehicles and enemy aaa, and only fill my list with those…keeping my frame rate high by not trying to show me stuff I have no need to see.

That sort of thing.

…@/

i would imagine you would want to keep some very tight controls over any custom setting like that. like your db7 example why wouldnt you put enemy fighters at the bottom of your list? because you’re a good sport? ok cool, but what about all the other players?

its my understanding that the only units that can see you are also seen by you meaning if you gave the option to custom bias down, in this example, enemy fighters then in a heated battle german infantry is going to be visible to you at the expense of your natural predators the german fighters.

So will the improvements to management and processing of stuff happening within the grid, make it more feasible to change the grid itself?

There’s been occasional discussion over the past nine years or so regarding the original strat grid not including river bridges as significant locations.

More recently, the lack of strat differentiation between a link joining two nearby towns with no intervening obstructions and a link joining the Belgian and English coasts has come up in discussions about FRU functionality.

And, the occasional discussions about the relatively undeveloped nature of the naval game often include consideration of some sort of strat mesh functionality in water areas, to provide for sea control, sea force location, sea spawning and/or some other gameplay system.

All of these, we’ve been told in the past, are impractical because they’d require changes to the strat mesh.

mad: our vis-list system isn’t amicable, it doesn’t ensure that whom you see is who sees you.

JW: When I talked about “strat grid”, that’s to distinguish it from the strat system itself (what you call at the end, the strat mesh). The only purpose of the “strat grid” (which exists on cell hosts, not the strat host) is to strat events received from the strat server to player locations so that they can be retrieved while building a world update for a given vehicle.

It should never have been separate from the vehicle grid system, because that was what it was attempting to relate to. Player at X/Y => Vehicle Grid { X/Gv, Y/Gv } & Strat Grid { X/Gs, Y/Gs }

The strat system itself is not spatial at all, it is a relational system based on the ID of a given strat entity and the relationships (parent, neighbor) that entity has.

and here i thought i knew everything. could have sworn a rat at some point said who you see sees you or i misunderstood.

interesting, that could explain some ‘wtf’ occasions i’ve had though where i swear to god i’ve scanned everywhere and didn’t see anything and then dove in and something behind me materialized like a klingon bird of prey and shot me down. its not common but has happened.

If it’s still working like it used to, if someone hits you, they go to the top of your vis list. But it’s always been entirely possible for someone to see you when you were not seeing them, update wise. That’s just the nature of the beast when dealing with finite vis lists.

If there’s a crap load of enemies close to that Tiger, and you’re on a hill way off in an atg with very few people around you and he’s the closest enemy to you…then you can see where it can happen pretty easily. His vis list is full of a bunch of ei. In a better world, if you shot a round that came anywhere near him, you’d hop to the top of his vis list…but that’s not in there :P

This is why I look for a biasing system as I described above. Still not perfect, but at least the tiger driver has a choice about what to look for, and can plan and position accordingly.

What we strove for originally, that you may be referring to, is more concerned with cover/concealment. That is…we’d never draw a tree that some guy would then hide behind, and not have that tree draw for some feller #2 a ways off who’d see guy #1 looking like he’s hiding behind a tree, when there’s clearly (to feller #2) no tree there and guy #1 is completely exposed. This was a common problem with mp games back in 1999.

…@/

yeah i remember that. not that it mattered back then as that one single bush on the golf course wasn’t much cover/concealment from the char :D. i think it was the first or second major patch which solved that problem but still … the single bush was still the only bush in the field just shoot at it :D.

ah the memories.

Is this a step towards making player recognition in the “Strat Grid” more efficient so that CRS can proceed with further developments of the “Strat System” and make use of geographic anomalies, link and add more impactable tactical targets, or introduce a functioning visual supply grid? Or is this just fine tuning of what we already have?

No, the strat grid is more like a fishing net to catch updates from the strat system and funnel them to players. If the analogy seems a bit odd, that’s because the strat grid itself was whack.

– Oliver

Any work going on with the strat system that may provide additional targets?

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title="" rel=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: