A few tools go a long way

I need to start working on my sales pitch/technique. As Motor pointed out, 1.31 is getting close. You want to know how close? Join the queue. Now, isn’t that just wrong? Hey, hands up, mea culpa, I’ve been a significant cause of 1.31 delays. But that just makes the was-once-a-manager in me even more irate.

How can we possibly not know how close we are? Aren’t we seasoned software developers?

I’ve railed on most of the topics herein before. Automation, tools, issue tracking, etc.

Well, we have Trac. Trouble is, some of our coders (hand still up) aren’t very good at keeping their tickets updated (I’m ashamed to admit that petulance took me the other day and I went through all my open tickets and commented ‘No update’ on them).

Our Trac isn’t tied to our version control system (the version we have just doesn’t want to integrate a repository on another machine; after all, Trac is supposed to be a Subversion front-end that adds issue tracking).

TracExplorer might help, except our trac isn’t https based, and TracExplorer needs it to be.

Even if we got Trac properly integrated, it’s … noddy. It does not provide that Big Brother meta view that you need for managing programmers.

We have no build automation systems; the ones I cobbled together with FinalBuilder died a death once the immediate crisis had passed and some of our team went back to “we will hardly ever use this information” :(

We don’t branch like our lives depended on it. So we regularly break each other’s stuff (the rest of us have to make our code work under Windows before we check it in, but not vice-versa). And (hand still up) that leads to breakages when we do branch because it’s become so irregular that you forget…

No code-review or change tracking, like FishEye.

We have forums for our closed beta testers to report in-development issues and feedback, but those forums sit on a shonky old box that probably won’t survive its next reboot. For a couple of years now, I’ve had problems just logging in to them, and lately, it’s gotten so bad that I give up trying after a few minutes. That doesn’t stop the producers posting Trac tickets with no contents other than a link to those forums.

Wikis: we have wikis! Oh god, we have wikis :( We had a spate of wiki installs at some point in the past, and somehow settled on 3 of them. Yes, 3 different wiki systems – and more wiki instances than that. Oddly, nobody is ever quite sure which wiki a given piece of information is on (maybe we should have a separate wiki for that).

None of the management seem to think that’s a big deal; they assume we just search, find the documentation and proceed. rofl. S.O.P. is to call Bloo and, if he doesn’t know, quietly forget about whatever you were going to do.

Even when we do get tools, we never afford the research time to actually find and implement one that will meet our needs.

Why did it take so long to make the move from VSS to Subversion? I stagger at the memory of this, but I was told: Well, come in one weekend and switch us over. (And ultimately, I did; my raised hand is starting to get tired now).

The worst sin of all: there is absolutely zero infrastructure for across-the-board software testing.  Major portions of the host systems are under my own self-rolled test harness; but even I have succumbed and turned some of them off when the testing methodology broke. There’s no time allocation to fix them and nobody but me seems to care anyway…

“If it ain’t broken, don’t fix it”.  But few people discuss whether that means “wait till it has collapsed” vs “make sure someone’s keeping an eye on it”. In software, there are very rarely signs to indicate a piece of software is going to go nuclear when the next person changes an unrelated line of code…

“Nose to the stone” syndrome. “We have a deadline. If we stop to play about with bug tracking, that’s time we won’t spend developing”. So, instead you gamble that you won’t have many bugs to track from your cowboy efforts to meet the deadline.

This all adds up to a lack of – not supervision – but supervisability. If Gophur or Rafter want to know where one of us is at, it requires an interruption from the work to stop and report.

I can’t speak for the other coders, but it’s not that I mind having to report on where I am at and account for myself; it’s the fact that they have to take my word for it. If I don’t tell ’em, they are completely in the dark.

If we actually had a system in place, the workflow would go something like this:

  • Someone creates a ticket for the work;
  • Developer begins making changes;
  • Developer reaches a checkpoint and does a commit:
    • Ticket #, time spent, and annotation included in the commit.
  • Automated build system picks up the changes and runs its basic tests;
    • An error is detected and the coder is notified.
  • Developer fixes his error, commits, and proceeds with work;
  • Finally the developer is done and forwards the ticket to testing;
  • Automated build system picks up the completed work and runs its extended tests;
    • “Unable to find font arial.font” is forwarded to the developer.
  • Developer resolves the issue and commits;
  • Automated build system validates and assigns a tester to the ticket;
  • Tester runs the resulting build through its steps;
    • Finds a problem, annotates the ticket and returns it to the developer.

(Obviously there are some loops in the workflow that I didn’t draw)
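For what it’s worth, the “ticket #, time spent, annotation” step doesn’t even need fancy integration to police: a pre-commit hook just has to parse the log message. A Python sketch – the message format here is entirely my own invention, not anything our Trac mandates:

```python
import re

# The sort of message a pre-commit hook could insist on, e.g.:
#   "#1234 (time: 2h) Clamp turret elevation properly"
# The format is made up for illustration.
COMMIT_RE = re.compile(
    r"^#(?P<ticket>\d+)\s+\(time:\s*(?P<time>[\dhm.]+)\)\s+(?P<note>.+)$"
)

def parse_commit_message(message):
    """Return (ticket, time_spent, annotation), or None if non-compliant."""
    match = COMMIT_RE.match(message.strip())
    if match is None:
        return None
    return match.group("ticket"), match.group("time"), match.group("note")
```

Wire that up to `svnlook log` in Subversion’s pre-commit hook and a non-compliant commit simply bounces before it ever reaches the repository.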

This system would provide a management overview of activity: tickets that have been worked on/commented on, tickets flowing to test, tickets flowing back.

If you’re running a restaurant, you care about how quickly the food goes out, but you also care about how quickly that food comes back. The same concept should be applied to software development. As far as I know, Trac – at least ours – has no way to show you this kind of flow in both directions.
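Measuring the “food coming back” doesn’t take much either, once ticket transitions are recorded somewhere machine-readable. A sketch, with an invented, minimal event format:

```python
def bounce_rate(events):
    """Fraction of tickets sent to test that came back at least once.

    `events` is an ordered list of (ticket_id, action) pairs, where
    action is 'to_test' or 'returned' -- a made-up format for illustration.
    """
    sent, bounced = set(), set()
    for ticket, action in events:
        if action == "to_test":
            sent.add(ticket)
        elif action == "returned" and ticket in sent:
            bounced.add(ticket)
    return len(bounced) / len(sent) if sent else 0.0
```

One number, updated automatically, and management gets the restaurant view without interrupting anyone.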

One obstacle to getting any company to make the leap to integrated tools is that everyone knows programmers don’t like to be watched over. They get antsy if you ask them to come to meetings, they glower if you ask them to write documentation, etc, etc.

Reality check: Programmers don’t like additional work that seems a distraction at best.

We could, kind of, achieve the above workflow with what we have, with a few minor flaws:

None of the tools are integrated for us, which means each step is an inconvenience. Update a ticket? Fire up a web browser, go to the trac site, log in, find the ticket, enter the details.

Does it take very long? WRONG QUESTION.

The point is: I can forget. And nobody is going to monitor this level of detail. So I can get away with it. It may be purely accidental, or it may be laziness. Both are equally welcome.

About half of our code checkins have absolutely no description whatsoever. Because the setup allows us to, and the chances of anyone reading our commits are relatively low (I’ve only seen Gophur question an uncommented checkin once or twice, when he stumbles on one in the svn commit mails).

I suspect that a fair portion of these are because the commit has more than one piece of work in it, and the author has gone through and updated the trac tickets and so doesn’t bother with commenting the checkin. But I almost never see anyone put a revision number in their trac comments. So I can’t actually tell…
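Even with zero integration, you could at least audit that gap from the Subversion side alone. A sketch that scans log messages for Trac-style ticket references (the (revision, message) pair format is a simplification of what you’d scrape out of `svn log`):

```python
import re

# Matches Trac-style references: "#123" or "ticket:123".
TICKET_REF = re.compile(r"#(\d+)|ticket:(\d+)")

def tickets_mentioned(log_entries):
    """Map ticket number -> list of revisions whose log message mentions it.

    `log_entries` is a list of (revision, message) pairs.  Tickets that
    never appear, and revisions that mention nothing, stand out instantly.
    """
    refs = {}
    for rev, message in log_entries:
        for match in TICKET_REF.finditer(message or ""):
            ticket = match.group(1) or match.group(2)
            refs.setdefault(ticket, []).append(rev)
    return refs
```

Run that over a week of commit mail and you’d actually be able to tell, instead of suspecting.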

Let’s review: a checkin that modifies multiple files for several disparate purposes gets no comments. Kind of a least-desirable case. If you want to know what was done, have at reading the changes in source-code form.

And did I mention the lack of automation? I can’t tell you how often one of us has checked in a one line trivial change without so much as a compile. Neither can anyone else.

This is one of the things that really cracks me up. Playnet’s mistrust of automation results in a manic insecurity about anything that hasn’t been visually inspected, which manifests first as the belief that it is the programmers’ responsibility to visually and thoroughly inspect the effects of any change they make.

I’m sure that if you asked any one of us, they’d say that’s ridiculous. But watch us work for a week, and you’ll see it right there. “Didn’t you test that?” rolls off the tongues of certain people.

What a bizarre question.

Yes, it is the programmers’ responsibility to ensure the functionality of the code they deliver, but that’s not the same thing. Coders make lousy testers, especially for code they’ve just written; like a new mother, you can’t see that your baby is ugly – all you can see is the miracle of creation.

The best employment of a programmer in these situations is for them to provide the means of validating their work: unit tests, for example, and testing methodology writeups.

These are both things that programmers don’t generally like to do. That’s because they’re usually done terribly. The programmer is left to manage and operate the tests, which means self-assessment. And any good programmer knows whether code is good as soon as they’ve written it! [*g*]

What programmers really don’t like is, again, distraction. Having to leave their IDE to log into a web site and find and update some archaic web-style markup to annotate testing methodologies which probably won’t get used … is a pure PITA.

Again: integration and automation… Writing unit tests in-line, knowing that your code won’t be accepted without them and their validation results – well, that’s less of a burden, because programmers are all about making work for computers.
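To be concrete about “in-line”: the test lives in the same file as the code, and the automated build runs it on every commit. A trivial Python-flavoured sketch (our code isn’t Python; this is just the shape of the thing):

```python
def clamp(value, low, high):
    """The unit under test: clamp value into [low, high]."""
    return max(low, min(high, value))

# The tests live right next to the code; an automated build runs them on
# every commit and rejects the change when they fail.
def test_clamp():
    assert clamp(5, 0, 10) == 5      # in range: unchanged
    assert clamp(-1, 0, 10) == 0     # below range: clamped up
    assert clamp(99, 0, 10) == 10    # above range: clamped down

if __name__ == "__main__":
    test_clamp()
    print("all tests passed")
```

No web browser, no markup, no leaving the IDE – the validation travels with the code.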

And being able to comment testing methodologies in-line kills two birds with one stone – if you know something will come along, extract that data, and translate it into whatever format the QA guys need, it’s more of what programmers are best at: writing instructions for a compiler/interpreter to worry about.
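The “something will come along and extract that data” part is not rocket science, either. A sketch that lifts QA notes out of docstrings – the `@qa:` marker is my own invention, purely to show the shape:

```python
import ast

def extract_test_notes(source):
    """Collect '@qa:' lines from function docstrings in Python source.

    The '@qa:' marker is invented for this sketch; the point is only
    that an automated pass can pull testing notes out of the code and
    re-shape them into whatever format the QA guys need.
    """
    notes = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node) or ""
            qa = [line.split("@qa:", 1)[1].strip()
                  for line in doc.splitlines() if "@qa:" in line]
            if qa:
                notes[node.name] = qa
    return notes
```

The programmer writes the note once, next to the code it describes, and a build step does the tedious translation forever after.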

The last few months have been really painful. Right now people seem open to the idea that we need “something”. But just when I was starting to hope … those words “who will have time?” came up.

Long time ago, Rickb made a command-line version of the client (no GUI or graphics) which could be used for logging in to hosts to test host functionality.

When I needed it most, it quit working. After scrabbling around with it on his own time for a few nights, Rick finally checked it in so that I could fix it. I did try, but what he checked in only worked under MacOS, so I lost a few nights getting it to work under Linux. Then the pressure to fix what I was working on got too great and I dropped the command line client.

Rick eventually restored it to a mostly working state, but we both lost interest in maintaining it (at various times, both of us were explicitly asked to stop ‘fucking about’ with it and focus on what we were supposed to be working on).

That was some 2, nearly 3 years ago. And we just haven’t needed it since then. Sure, I might have used it now and then, but I’ve been able to futz around its absence. And few of the times I’ve felt the absence have taken me longer to futz around than it would have taken to stop and get it working then and there.

A few weeks ago, it became clear that we had a major performance issue with one of our host processes. I futzed. I futzed some more. I posted about tearing my hair out on my blog. I futzed my futzes. It looked good. Testers logged in. It fell apart. Futz-futz-futz. Looked good. Fell apart.

These issues began casting a very ominous shadow, and then it ran right into one of our side contracts. Seriously, I nearly peed myself just a little, laughing, when their email more-or-less insisted we use some kind of client to simulate hundreds of players logging in to a cluster. And then I actually did leak just a little when our internal reply was “we can do that with the command line client, right?”.

It took the best part of a week, but Ramp was able to resurrect it and – with a minuscule amount of input from me – get it working decently enough. It’s not a command-line client any more, unfortunately; he took shortcuts to get it working. And it’s quite a lot of revisions short of Rick’s last version, which – apparently – he never checked in (and I don’t blame him).

So – the “pseudo client” has had just enough work done to it to get it to solve this particular problem. Actually, it only works for 1.30. It could probably be made to work for 1.31, but I ain’t doing it unless I see a ticket allocating time for it.

Any of my colleagues who read this far might be a bit surprised. I’ve been doing plenty of guerrilla work just lately: for instance, despite a towering workload and glaring producers, I found office time to screw about with virtualization despite being strictly told not to :)

I had actually told people what I wanted to do, and was told specifically not to do it.

So I went ahead anyways. I talked Killer into letting me install VMware ESXi on a dual quad-core in the machine room, onto which I slipped an experimental Ubuntu box. And then I played around with the “VMware converter” a bit. And abracadabra, virtual dev cluster running on the box (which I’d actually built using my own license of VMware Server at home).

Why virtual though? Because virtualization was part of my performance solution: rather than dragging out my flirtation with parallelization at a code level, simply run 2-4 virtual machines per physical dual quad core and parallelize at the process level.

I also virtualized our internal trac/web/tools box, which drew some puzzled looks. More “dicking around”. Except, it was previously running on the only box with a similar hardware configuration to our cell hosts. Hey – can you see where this is going?

Correct: A virtual dev cluster accompanied by one physical machine… On a stable and maintainable platform.

So I’m gonna give it one more try, post 1.31. But I’m really gonna have to work on how to get through to them that these are not whimsical niceties but the fundamental requirements of solid software development; that the lack of any investment in provisioning ourselves with the barest of these facilities is downright negligent.

Our current modus operandi does achieve one thing: perpetual motion. Some would refer to that as “freefall”, with the implication of an eventual splatter. If we’re skydivers, we have no watch or altimeter – just a note that says “deploy chute before it’s too late” – and we’re operating on the premise that pulling the cord will tell us whether it was packed properly, without any of that pansy checking before jumping…

19 Comments

have a look at hudson (http://wiki.hudson-ci.org/display/HUDSON/Meet+Hudson)
Not sure how useful it’ll be – but might be useful to you

We have tools?

Public wiki doesn’t count. So, it’s only 2 development wikis.

doh, forgot trac’s wiki. I hate that one.

You’ve never visited the forums? We’ve got lots of tools.

;)

Oh, and:

TracExplorer might help, except our trac isn’t https based, and TracExplorer needs it to be.

Do you guys need me to help out on this? Should be trivial enough to make a self-signed cert.

I know the frustration of all this. I’m having a similar situation at my work. No automatic testing and an overall lack of control of the whole process. But at least we’re slowly getting better; we’ve got hudson up and running. We’re thinking about setting up virtualized test sites to deploy our newly built systems onto. The only way to fix it is to fight your way through it. Step-by-step improvements of the process.

Just deleted several paragraphs in response.

It’s hard to have a discussion about particulars…and an abstracted response just doesn’t get it done.

What you see as an institutional fear of automation is not that, however. And that misunderstanding on your part sort of ruins the flow of how to get from A to Z in what you’re trying to do.

Meh. I’ll just stop here.

Was not expecting responses :) How DARE you guys read that much WALL’O’TEXT!

Mike: Actually, Andrew had mentioned that. I didn’t see anything terribly wrong with it, but I know that Bloo and I have our eyes on a particular integrated toolset… *cough*jira*cough*.

Bloo: There’s also one sitting on one of Dana’s Macs too, isn’t there?

Krenn: [lots of tools] ROFLMAO.

Krenn: It’s not quite that simple. Trac is running on an ancient RedHat 6/Fedora Core 1/Fedora Core 3 operating system. It is a Trac 0.9/0.10 experiment that Ramp dropped in that just sort of started getting used, running under Apache{not quite 1 OR 2}. The same box also runs my kfsone tools, and they are slightly co-dependent in really bad ways :( As well as freeing up the physical machine, I wanted to virtualize it so that it will be easier to try out virtual appliances against it and ultimately swap it out.

Samuel: Be wary of the step-by-step. You’re a few steps behind us :( Integration is paramount, otherwise people will start dodging parts of the process and management will latch on to the reluctances they see as proof that it is making things worse.

Snail: Toss me an email, if you like, but the early simple mistrust of automation is definitely, solidly wedged into several folks as a firmly-gripping fear these days. I’ll concede that I didn’t elaborate fully, and there’s more to it than just that. It’s impossible to get time allocated for the implementation of tools, because the ghost of the post-release fire-fighting sweat shop has never quite left the building.

This “fear” manifests itself when you can get them to agree a tool is needed, but run into a steel-reinforced concrete wall when you try to get time allocated for implementing and training. If the man-hour cost to the company isn’t a flick of a switch, “the look” creeps onto people’s faces.

Even Killer is affected by it. He came into my office a few weeks ago while I was talking to Gophur; he’d spent the entire day fighting with whether or not to try installing something on a spare box. After 15 minutes of rambling, I got up, walked into the machine room, popped the CD into a drive and powered the machine on. 30 seconds later, we had his answer: it wouldn’t run on the ancient machine. I popped another CD in, and off it went. Killer said “well I didn’t want to spend all day messing about with installs” and Gophur suddenly had to go for a smoke :)

Well damn it if those last 2 paragraphs don’t tell you all you need to know.

What you described there does not = fear.

There’s another word for it. Swat Killer about the head and shoulders with said word and tell him I said to get his lazy arse in gear :P

Honestly, you’d have to see it at play… Remember Doc’s old “Every tool we have is broken” slogan? It’s basically that, morphed and absorbed. It used to be simple mistrust. It’s not terror, it’s not conscious avoidance. It’s a fear or phobia. They’re willing to consider new tools at a discussion level because they want rid of the ones they have, not – I don’t think – because they truly recognize they’re eating soup with chopsticks :) Hence this has been discussed several times but the traction never really happens.

kfsone :

Mike: Actually, Andrew had mentioned that. I didn’t see anything terribly wrong with it, but I know that Bloo and I have our eyes on a particular integrated toolset… *cough*jira*cough*.

JIRA is actually fairly freaking good – not very impressed with confluence or crowd – but jira is good – except don’t try to connect it to another JIRA instance (we’ve ended up writing custom Java code to act as a bridge, and it’s totally prone to #FAIL).

Really? Ugh. That’s actually what Bloo is all horny over.

Now that we have a few ESXi boxes, though, I can relatively painlessly roll vm boxes for serving various solutions (e.g. I need to make a Mercurial server; I ran into some unpleasant weirdness when trying to access a Windows mount simultaneously from Windows and CIFS … just like the Wiki/man pages/book said I would *slap*)

Institutional fear of reading.

http://www.heroengine.com

There’s a good sales pitch on the power of tools.

All for the low, low price of a million dollars!
(When they first released that, I think it was actually 1.5 M. They were very explicit about only wanting a handful of customers).

Considering the budget of a AAA MMO title these days, that’s a bargain for what it does and what it’d save you building from scratch.

But I’m not suggesting someone buy it. I’m putting it there purely for the reason I said I was putting it there…in light of a suggested fear of tools. Say fer instance, Doc vs terrain.

We are using JIRA and Confluence at work, and one of the good things, apart from the shared login etc., is that you can easily extend Confluence to pull some information automatically.

For example, these days I’ve been working on a plugin to automatically display the interesting parts of the configuration of our Apache servers and application-server instances. I grew bored of being told to update the information as an excuse for not reaching some agreement in meetings, so now it is always updated. The best documentation is the one that writes itself :).

Oh, and don’t discard Hudson because of JIRA. Hudson is a Continuous Integration server, not an issue tracker, so that’s the piece you want for automatically detecting changes in the repository, updating to the last revision, running tests and notifying people when something is broken by the last commit from Mr. Whatever.

In any case, as you have noticed, the difficult thing in this is not technical but “organizational”, so it requires management support and behavioral changes. My recommendation in these cases is to aim first for the “low-hanging fruit”, meaning the things that reward the developers most, so they start getting “in the mood”.
It took me some months to get my organization to use a repository, some years to get them to use JIRA and even more to use a wiki… Still unlucky with automatic regression tests :).

In any case, good luck.

About 3 years ago my work started the arduous process of switching over to test driven development, continuous integration, and a switch from VSS to subversion. Honestly it took us 2 years and we are still quibbling over branching and intra-project/domain object/service references (though we are getting very close to a happy place on this front).

I can’t over-stress the importance and benefits of CI and automated tests. After we switched to MVC (from ASP.NET WebForms) we now require 100% test coverage and run builds on every commit. It sounds daunting at first, but remember it took us 2 years to get to that point, but I can tell you the code we produce now, in terms of quality, performance, and maintainability is light years from what we were doing just 6 months ago.

JIRA and WIKIs etc are all nice – but are minor tools compared to the incredible benefits of tests and CI.

Oh and subversion rocks. One of the best things we ever did was get out of VSS and move to subversion.
