1.31 (or the lack thereof, as yet) has been almost as frustrating for us as it has for our players. It is likely to become known as the “if I’d known that” patch.
I’ve been wholly mired in cell host performance issues for a while, stuck in the maw of code that I really, really, really hate. Code that resists or springs a leak at every turn. It is the code that any sober refactorer would say “rewrite” to.
But it’s not just some subsystem, or some corner of code. It’s the fundamental basis for the cell host.
In some areas, it relies on insider knowledge of the malloc() and free() systems. Uncommented. It uses 2001 compiler tricks which have gone from super-smart to irrelevant to hindrance. The whole premise for the design is based around year 2000 CPU architecture and works in the most contrary way possible to the sorts of CPUs that 1.31 requires :(
If I’d known at the start of 1.31 that I was going to have to spend significant time on it, I could have sat down with a cool head and implemented the design(s) I’ve experimented with (remember, the chat grid system was a prototype for such a concept). But it’s categorically not something I’m going to start working on when we’re seemingly a significant way towards code freeze, and that has sort of been the case for most of 1.31… Grr.
The sad fact is that the current grid system is fairly efficient at closing up the updates it sends and dispatching them. Most of the overhead is spent in the selection of a player to be sent an update, the magical recovery of the previous update, testing it for validity and preparing to store the list of vehicles that need updating.
After that, it doesn’t matter how many players are going to be in your update, it rips through the actual data population. And yet the startup time takes some 15x as many CPU cycles as the actual preparation and dispatch of the data…
I actually wrote a cell-ready grid implementation, or most of it, a few weeks ago. But when none of the other coders wanted to review it with me, I lost the courage to go ahead with it =( Even though I discovered that after this post, in 2007, I wrote virtually the same code (I mean, a diff between the two only shows a 4% difference between two multi-thousand-line files!)
Our switch to subversion has finally begun creaking the way I said it would and several of you insisted it wouldn’t. SVN has been a damned hindrance this dev cycle. SVN is fine if you are a big corporate development team with the assurance of a one way flow of changes. But if stuff starts feeding in multiple directions, the old conflict monster starts to rear its head in the worst ways. And last week we got into a state where SVN can’t even reverse merge one set of changes…
No denying that we are better off with SVN than if we were still using VSS. But I’ve started looking at other options, in particular Mercurial. Git is already off the list: I can see why Ahwulf hates it. Worse than Perforce, IMHO.
Subversion has somehow made us even more reluctant to check stuff in and branch than even VSS did. I haven’t really figured out why. The upshot is that our SVN check-ins tend to have multiple items in them, which deals Subversion an unfair hand. It also makes issue tracking and resolution a PITA.
Mercurial looks like it might really offer us some potential in having several developers co-operate on transitory branches (local clones), and also promote more early checkins (which makes it easier to separate specific changes and back track things). With Mercurial your local working copy becomes a master repository in its own right. So have at it with the commits, and only push them to the master repo when you’re done.
But you didn’t come here to hear me grumble, I lured you in with talk of 1.32…
All this work on the cell hosts, some of the infrastructure changes and the work Ramp has been doing on the infantry and vehicle systems have opened up a number of doors. Ramp’s increasing familiarity with these relatively untended systems has propped those doors open, ripe for work in 1.32.
Most significantly, the discontinuation of dial-up support. The problem isn’t the total amount of bandwidth but the maximum amount of bandwidth available at any given moment. While we had to support dial-up, we couldn’t splurge and we had to try and cram all of our data down into incredibly tiny little packets.
We’ve still got to be wary of bandwidth usage overall, but we can now afford to slam more specific data down to you when it’s really pertinent. For example, we can afford to be sloppy about the accuracy of turret positions. Until the turret fires. And then you really need to know exactly where the turret was pointing so that you can accurately draw the round coming out of it (we don’t send a “fire message” describing the round, except for grenades, which is why sometimes it looks like the round landed nowhere near you but it actually hits you).
The increase in speed also means it is finally practical to more strongly couple the simulation with the network. Under dial-up support, the time between updates was over the threshold for human perception, so the client merrily goes about its business at X fps while sending updates to the host at significantly less than X updates per second.
Updates to the host are thus snapshots. Instead of saying “this trooper is going into a sprint” they say “at T+025ms I was moving at 0.5m/s”. Across the network, the client of the guy watching you has to guess what it is you are doing. And lo, much warping was born.
Our plan for 1.32 is to get way more aggressive. To send updates even faster. Somewhere around every 50ms. At that speed, we can afford to make a fundamental change to the client by delaying state changes until they are going to get sent to the network. In most online FPS games, this is programming 101. But when I started on WWII Online, it was up to 250ms between updates to the host. That’s 1/4 of a second. That kind of delay between pressing a movement key and seeing your trooper react or your tank’s gun fire is just unacceptable.
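To make the "delay state changes until they are sent" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `TickedClient` class, the `TICK_MS` constant and the `sprinting` field are hypothetical names, not anything from the actual client. Input is recorded when the key is pressed but only applied, and sent, on the next 50ms boundary, so the local simulation and the network see the same change at the same time.

```python
TICK_MS = 50  # assumed send interval, taken from the "around every 50ms" target


class TickedClient:
    """Hypothetical client that applies input only on network ticks."""

    def __init__(self):
        self.state = {"sprinting": False}
        self.pending = {}            # requested changes, not yet applied
        self.next_tick_ms = TICK_MS  # time of the next network send
        self.sent = []               # stand-in for packets sent to the host

    def request(self, key, value):
        # A key press is recorded, but the simulation state is left alone.
        self.pending[key] = value

    def advance(self, now_ms):
        # For every tick boundary that has passed, apply and send what is pending.
        while now_ms >= self.next_tick_ms:
            if self.pending:
                self.state.update(self.pending)
                self.sent.append((self.next_tick_ms, dict(self.pending)))
                self.pending.clear()
            self.next_tick_ms += TICK_MS


client = TickedClient()
client.request("sprinting", True)  # key pressed at t=10ms
client.advance(10)
held_back = client.state["sprinting"]  # still False: change is being held
client.advance(50)
applied = client.state["sprinting"]    # True: applied at the 50ms tick
```

With a 50ms tick the worst-case wait between key press and reaction is 50ms. With a 250ms interval the same scheme would mean waiting up to a quarter of a second, which is why it was never an option before.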
We can’t afford to just send 20 updates a second, because the updates are a bit large, and the update system makes no attempt to perform deltas. Given the reasons above, the update system is quite independent from the simulation. This is a system that was prematurely optimized. Diffs and deltas were semi-expensive 10 years ago, so no real depth of thought was given to how a differential system might be achieved (if it were appropriately tied into the simulation itself).
That tie, rather than increasing complexity and computational overhead, actually reduces it.
When I outlined my concept for this queue system to Ramp and Killer, they were immediately sceptical. I had to man up vs Killer to get him to let me finish and explain how the self-same system provides the means of identifying what has changed without needing to add some kind of “what changed” layer.
So the concept is to retain the same basic update system as a keyframe/heartbeat, but to allow events to trigger more frequent, smaller delta updates. That means we can keep most of the current update system while deconstructing parts of it down to a much simpler future-proofing system.
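A rough Python sketch of how a queue can double as the "what changed" record, with a full keyframe as the heartbeat and small deltas in between. This is a guess at the shape of the idea, not the actual design: `UpdateQueue`, `apply_update` and the field names are all hypothetical.

```python
class UpdateQueue:
    """Hypothetical queue: every state change passes through set()."""

    def __init__(self, state):
        self.state = dict(state)
        self.changed = {}  # fields modified since the last update was built

    def set(self, field, value):
        # Only real changes are queued, so no separate diff pass is needed.
        if self.state.get(field) != value:
            self.state[field] = value
            self.changed[field] = value

    def keyframe(self):
        # Heartbeat: the full state, which also resynchronises the receiver.
        self.changed.clear()
        return ("key", dict(self.state))

    def delta(self):
        # Event-driven update: only the queued changes, or None if idle.
        if not self.changed:
            return None
        update = ("delta", dict(self.changed))
        self.changed.clear()
        return update


def apply_update(remote, update):
    """Receiver side: a keyframe replaces the state, a delta patches it."""
    kind, fields = update
    if kind == "key":
        remote.clear()
    remote.update(fields)


queue = UpdateQueue({"speed": 0.0, "stance": "standing"})
remote = {}
apply_update(remote, queue.keyframe())

queue.set("stance", "sprinting")  # a real change: queued
queue.set("speed", 0.0)           # same value as before: not queued
update = queue.delta()
apply_update(remote, update)
```

The delta here carries only the `stance` field, and the receiver still ends up with the same state as the sender. A lost delta is repaired by the next keyframe, which is the reason for keeping the heartbeat at all.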
Ramp and I have also discussed how we can phase in the stages to achieving proper (in-flight) multi-crew, starting with a highly-efficient (because it’s incredibly simple!) swap-to-any-position option (yes, you can switch to the driver’s position if he moves to the commander’s seat first; perhaps even allow the driver to usurp any position).
We’re also (gorilla) looking at how we exchange turret orientations. Right now, 3rd person turret sighting is only accurate to within 1.41 degrees (360/255). ARGH! With dial-up out of the picture, we want to crank that up to 0.01 degree accuracy (360 / 32400). We tested this out briefly the last couple of nights and it makes quite a radical difference. Ramp had to look into code that has – in places – gone untouched since 1999, and found a number of factors that have been royally screwing 3rd person perception of rounds being exchanged.
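The arithmetic behind those two numbers can be sketched in a few lines of Python. The function names are made up for illustration, and rounding to the nearest step is an assumption about how the value gets packed; with rounding, the worst-case error is half a step rather than a full one.

```python
def quantize(angle_deg, steps):
    """Map an angle in degrees to an integer code in [0, steps)."""
    return round((angle_deg % 360.0) / 360.0 * steps) % steps


def dequantize(code, steps):
    """Recover the angle in degrees that a code stands for."""
    return code * 360.0 / steps


def angular_error(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


turret = 123.456  # example turret heading in degrees
results = {}
for steps in (255, 32400):
    code = quantize(turret, steps)
    step_size = 360.0 / steps
    error = angular_error(dequantize(code, steps), turret)
    results[steps] = (code, step_size, error)
    print(steps, "steps:", "step size", step_size, "error", error)
```

At 255 steps the code fits in a single byte, while 32400 steps needs 15 bits, so it still fits comfortably in a 16-bit field.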
If Rafter reads this, he’s probably trying to decide whether to strangle or shoot me. I shouldn’t be working on stuff like that right now. But when you’re stymied on code that is getting nowhere, you need the odd little victory to get you rolling again. And we didn’t put a great deal of effort into this (in fact, Ramp did what little work was involved, all I did was apply his patch to the host and restart it ;-P).
I’d really like it to make 1.31, it’s virtually a no-brainer, but remember what I said about “since 1999”? Yeah, that makes a “no-brainer?”.