Programmers often revel in the abstract nature of what they do: the ethereal, almost magical qualities of written words reaching that critical mass where they come to life and become a functioning apparatus… Gandalfs and Merlins, masters of secret magics: we like building black boxes. It's often a matter of pride: FooBar does exactly what it says on the box, it works and it just works.
It is often taken as an insult to our artistry and our qualifications that you think you might be able to divine when it is nearly done. Managers! Go draw some Gantt charts.
And testing begins when the application is complete, right? I mean, all FooBar::Add(FooBar a, int b) does is add a and b. What are you implying by suggesting that it may need testing?
Trouble is, as I implied in my confession of sin, we programmers don’t tend to use our code, we maybe just test it a bit (but I won’t beat that horse right now). So we miss out on a chance for a double whammy: earning money for writing code, and earning money for what we write.
Well, each of us gets paid for writing code – we clock in, write code, clock out, rinse and repeat until the deadline, hand over the application or API, get someone to sign off on it, go to the next product scheduling meeting, get our next assignment, ad nauseam.
But not so many of us try to get paid for what we write. Have a good long think about that. I’m not talking about bonuses, I’m talking about using the code we are writing to make the product and company healthier and more profitable, to keep our managers off our backs and the suits smiling.
It certainly never occurred to me when I reached the level of manager at Demon Internet; all I could see was the need to write the next bit of code that would allow my staff to process more orders in less time. Only when I started asking myself why Cliff promoted someone else to “help” me did I realize that I’d left him completely in the dark as to how much we were costing, how much we were improving and how much we were earning, how effective my code was and – most importantly – how cost-effective I was. I just figured he’d check the corporate bank balance and see It Was Good.
But you don’t even need to become that good a capitalist. The payoff occurs even earlier. That monkey riding your back, the one not good enough to understand the magic you lay down? He doesn’t actually want to ride your back. Did you know that?
Other than looking over your shoulder and humping your leg, he’s not good enough to understand your magic.
So feed him.
You really hate it when you’ve handed something off, and they come back and say “When Fred tried to amalgamate the income returns on a Friday with compound derivatives and a customer with seven accounts, the system would only let Kirsty input anything that was blue“. You know that absolutely none of that is relevant to what went wrong, but you’re going to have to try and recreate the exact scenario because these idiots couldn’t debug their way out of an open doorway, and it’s going to take days. Besides, testing isn’t even what you’re supposed to do. You get paid to write code, not to go back and iron out stupid wrinkles.
You just aren’t being lazy enough. You need to learn to instrument your code.
Instrumentation isn’t hard, and if you learn the habit, it actually becomes easier than writing code without instrumentation. How often do you wind up retrofitting code with printfs or log entries or dummy variables to let you inspect state?
Often enough that if you picked up the instrumentation habit, you’d have a good solid system that was there all the time to let you flick a switch and do it orders of magnitude more easily. Heck, you could instrument Fred and Kirsty’s builds so that they could repeat their ludicrous menage-a-deux and unknowingly diagnose it for you. Sweet, no?
For you C/C++ types, I’m not even talking about the obscene, all-singing, all-dancing, reflection based systems that Java and C# developers drool over and overburden their applications (and budgets) with. That is a whole (and admittedly lucrative) venture of its own.
What I’m talking about is building your code in such a fashion that you could easily run your Add(…) function through a quick, offline, out-of-band test case that proves it’s not the problem and lets you move on. Oh, god, are you whingeing about testing already?
It takes a little getting used to, but once you start developing the habit, you’ll start writing your functions and classes with just that little twist of subtlety that makes you master of the magic rather than slave to it. And yes, you will wind up testing your code sometimes, but if you learn the habit it won’t really be like testing, it will be more in the spirit of artistry, simple validation.
WWII Online/Battleground Europe’s host systems have a very nice feature that provides for a pretty good instrumentation system. Unfortunately only one of the developers really availed themselves of it, and in the long term even he got lazy about it.
Instrumentation begins with constants. Diagnosing issues – such as users (or other coders using your code) pushing it outside of its scope or intent – becomes much easier, when you do have to resort to the debugger, if your constants have identities. i=5 isn’t going to help you nearly as much as i=(COUNTRY)ITALY when you have to come back to line 3201 in a year.
But this isn’t about making it easier for you to inspect your own work. We’re talking about feeding the monkey.
Learning the instrumentation habit – encapsulating functions around results, breaking your magic up into steps, and applying a pinch of sequential programming to your object-oriented development and design – provides something magical in itself: quantification.
During my first two implementations of TOEs, I made a very serious error in judgement. I was writing a complete system from scratch with only minimal need to tie into the existing system. In fact, its creation would obliterate much of the existing system. So I developed it in terms of itself, with no frame of reference. With heavy pressure to deliver and – let’s be honest – a little arrogance, I decided to forgo some of the more “frippery” elements of my normal instrumentation.
It was like modelling with clay in a dark, wet room. There was still instrumentation in there, but I didn’t have any real sense of how far along I was, or how meaningful my current codebase was to what I was finally going to have to deliver. There was certainly no means for the monkey to tell how I was doing. The monkey saw status updates from me, describing what code I had written or tested or worked on. Since I get paid for writing code, I was doing my job; the monkey remained ignorant that something was causing me to write the same code over and over. After all, he wouldn’t understand the magic anyway.
Now, the producers probably couldn’t have understood the problem I was encountering, but give the guys some credit. What they are good at is working around problems like this on a meta-scale. Their zen is in dealing with large-scale abstract issues, especially those relating to timeframes, and working out alternatives like tasking you with something else briefly or scheduling a server upgrade.
In the end, it was the monkey that realized I had come to a halt, and it was the monkey that devised a solution. If, instead of having to ride my back asking “is it done yet”, I had put him in a position to say “this seems to have stalled”, TOE development would have happened sooner.
I was so confident in TOEs being deliverable as a whole unit that I scrimped on some of my normal development practices – the very ones I’d picked up against just such a situation. Sure, I was testing this and that, but I didn’t put out; I didn’t see the need to expose any of what I was doing until I judged it was ready.
The reality is that very few monkeys survive in their jobs if they do try to judge code; what they have to be good at is judging productivity, which is something we coders generally just don’t understand – it’s too tangible for us. What the monkeys need is a solid way to see progress towards completion. It allows us to work our magic while they exercise their own magical ability to sense trouble brewing. All we have to do is feed them.
In the case of TOEs, I ran into insurmountable coding issues – brick walls: a compiler bug, a compiler installation issue and OS version issues. Being a coder, I did what I get paid to do – I tried to code my way out of the hole. The monkey carried on humping my leg, and I carried on telling the monkey I was writing code.
Here is the contradiction. We allow “the monkey” to hand us specs and designs, but then we defy them to understand what we do with them. In fact, if my monkeys had been able to track where I was at, they would have adjusted those designs. It’s freaky, but the monkey is actually capable of making decisions based on data. Did I just shock you?
I’d fallen into the common mindset amongst many developers of dismissing the monkey’s ability to participate in my work by focusing on his inability to write code and thus judge mine. That doesn’t matter: the monkey sees in terms of designs and specifications, and while he may act as though that initial document is the law, he’s usually capable of accepting that coders can’t walk through walls.
I came to my senses with the third round of TOEs – perhaps not in a fashion I can hold up as exemplary for this post, due to time constraints, but I did, at least, apply the lessons I’d learned prior to CRS and incorporate instrumentation into my concepts and implementation.
One example would be the “dumpresupply” command, which went in right near the start. The implementation of the supply queues is heavily weighted in favor of such a command – something virtually impossible with the game’s earlier supply systems.
A trivial command, yet it made a universe of difference to development, because I was able to hand over unfinished code for testing and get useful feedback. Not “something is wrong with supply” but “steps a + b + c do not produce the expected result of d. Output after A, output after B, output after C, ERROR”.
Version 1.27 of Battleground Europe has been greeted with an excellent reception in all but two areas. First was post-release stability of the hosts. Yep. I ran into issues. I knew I would because of attempts #1 and #2, and we failed to get the server upgrades performed in time for release. I’ve had undue praise from players for my dedication to getting things fixed, but it wasn’t nearly as difficult as it might have seemed. The host changes were suitably instrumented so that finding the cause of these otherwise ghostly problems was almost painless.
The third time the hosts crashed after 1.27, I was hit by a wave of panic: there was nothing in the logs to point to a cause, and no consistency between the three crashes. I decided against banging my head on the desk and crying, and went and played with the cat for 10 minutes to muster the strength even to face the ominous prospect in front of me. When I came back to my machine, I remembered that my code was instrumented. Ninety seconds in a database client and I had the cause pinpointed.
The second failure-to-please in 1.27 is performance. The client has some specific instrumentation, with varying levels of precision, most of which has been added to try and track specific problems. The client is complex, and it runs billions of operations a second on pretty much a single task. When something goes awry, the old “attach it to a debugger” notion is ludicrous. Invariably Martini has to take a rather sledgehammer approach to finding a performance bottleneck.
We’ve looked at various performance/defect tracking solutions, but the cost of integrating them to an existing project of our client’s scope is just intimidating and impractical. And that’s just to integrate it – after that you still have to apply the tools to finding, solving and redesigning solutions.
Our client is developed for optimum performance. The sort of frippery that might let us diagnose issues in it just doesn’t exist.
In the early days a fear gripped CRS that if we built diagnostic systems into the engine, they would leak into the hands of people who might realize a way to exploit the client through them and kill our project stone-cold dead: when you wanted to test something in WWII Online version 1.1, there were no shortcuts.
If you wanted Ciney turned French to start a test, you spawned in French at Anhee, ran over there on foot or fired up a second client and spawned a truck to transport yourself, and then you captured each facility manually, one at a time. And you hoped that nobody spawned in and shot you or recaptured something.
I shudder to think of the number of hours programmers spent carrying out this kind of test, precisely because they were averse to incorporating testing and instrumentation into their code.
The point here is that client functions are generally very all-inclusive, which means you can’t easily isolate them and work on them independently. Many of the new systems that we have developed – and I say we here because Martini, Rickb and Ramp have all been tending to do this automatically – are developed in an isolatable fashion that can be compiled into a trivial client. On the host I regularly pull out modules and compile them into a simple testing harness without the overhead of an entire host.
But in both systems it’s not always possible – once you start to touch older systems you find yourself cornered into requiring the entire application. I doubt Martini could build a “headless” instance of the effects system for stand-alone benchmarks, and it would probably take me some effort to build a stand-alone instance of the TOE supply system because of its ties to the strat system.
Without instrumentation, we have no means by which to track the conditions under which the client performs poorly, which makes finding this loss of performance a scary and nebulous task. It’s not as simple as, say, pointing at STOs and saying “they cost FPS” – they probably do, but are they what’s causing the problem? Removing them might fix the issue here and now, but they might not be the cause; somewhere else there might be some simple defect, bug or oversight, or even some piece of unoptimized code that has become overused … something that is hogging CPU cycles and pushing STOs out of the bed, so to speak.
Programmers are often shy of developing instrumentation because they picture themselves being reduced to mere operators, running code again and again and crunching the resulting numbers.
It’s strange that it’s usually the monkey who has the inspired notion of writing some kind of tool or utility that does that work and lets the monkey run it himself. As though the monkey would rather not have to ride on the programmers’ backs.
At the end of the day, when code doesn’t work, someone has to figure out (1) where and (2) why, and someone has to work out (3) the cause and (4) a solution. Programmers profess that 3 and 4 are their domain and speciality, and yet they perpetually leave themselves saddled with 1 and 2 by failing to offload that burden to the code and the system as often as possible.
Teach yourself to instrument, train yourself to automate: delegate the task of defect tracking to the code, because the monkey isn’t a programmer, so bug-finding is always going to wind up on your plate sooner or later.