Quit that already

3 years ago, we had an issue with the auth server. It would run fine for a while, and then the database would just stop responding.

I was told it “happened” now and again. 30 minutes later it “happened” again, and repeated itself every 30-90 minutes for the next 6 hours.

We couldn’t find a single cause, so we solved the problems we were seeing right there. We were all set to upgrade the SQL server. But then the problem stopped happening. We spent another couple of days looking for a cause but never found one.

2006, Thanksgiving. Ramp is on a river boat with his family and nowhere near the ’net unless ’gators have Bluetooth. Killer is in Houston. Gophur is out of town and out of booze. Doc is swimming in poosville (sewage back-up issue). Beep goes the pager. Beep, beep, beep, beep, beep, beep. “Uh” goes the host guy.

Absolutely nothing seems untoward. For 15 minutes I look at log files and messages. I can’t find a hint of a whiff of anything wrong: people are spawning, capturing, dying, respawning. There are even people logging into the game.

“beep, beep, beep, beep, BEEP” goes the pager.

Finally, I notice that one of the auth processes is still checking on the same customer it has been checking every time I’ve looked. This guy is either logging in like a freak or something’s up. I restart auth, whoosh. Everything goes green.

Well, it had been running for 7 months. Processes get tired.

Long story medium-length: after a 3-year hiatus, our problem is back. With help from Ramp, we did the database upgrade that’s been waiting for 3 years; we ran table checks, we ran hardware checks, we ran software checks. After 3 outages and a couple of aspirin, things stabilized again last night. And I was just dozing off this afternoon when BEEEP. And then it just kept doing it all afternoon.

Finally, Ramp and I investigate further. There’s absolutely no indication of a problem. Things work totally normally, but this one query just doesn’t ever come back. Nor does it time out.

After much greying and pulling of hair, I notice a rather strange coincidence. 20030910 – when the problem went away 3 years ago – a new table was created, and all of the auth connection logs were copied into it and the primary log cleared out.

Now, 3 years and many hundreds of millions of connections later, there’s rather a lot of data in there.
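For anyone wondering what that 2003 cleanup amounts to, here is a minimal sketch of the idea: copy the connection log into a freshly created table, then clear out the primary. Everything in it is an assumption made for illustration. It uses Python with an in-memory SQLite database rather than our actual SQL server, and the names `auth_log`, `auth_log_20030910`, `archive_auth_log` and the two columns are hypothetical, not the real schema.

```python
import sqlite3


def archive_auth_log(conn, archive_table):
    """Copy every row of auth_log into a new table, then empty auth_log.

    Returns the number of rows that ended up in the archive table.
    """
    # Table names cannot be bound as SQL parameters, so only accept a
    # plain identifier before formatting it into the statements below.
    if not archive_table.isidentifier():
        raise ValueError(f"unsafe table name: {archive_table!r}")

    # Step 1: create the new table as a copy of the current log.
    conn.execute(f"CREATE TABLE {archive_table} AS SELECT * FROM auth_log")
    moved = conn.execute(f"SELECT COUNT(*) FROM {archive_table}").fetchone()[0]

    # Step 2: clear out the primary log so it starts small again.
    conn.execute("DELETE FROM auth_log")
    conn.commit()
    return moved


# Hypothetical schema and rows, just enough to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_log (customer_id INTEGER, logged_at TEXT)")
conn.executemany(
    "INSERT INTO auth_log VALUES (?, ?)",
    [(1, "2003-09-09 23:58:00"), (2, "2003-09-10 00:01:00")],
)
conn.commit()

moved = archive_auth_log(conn, "auth_log_20030910")
remaining = conn.execute("SELECT COUNT(*) FROM auth_log").fetchone()[0]
print(f"archived {moved} rows, {remaining} left in auth_log")
```

On a live server you would want to be sure no new log rows arrive between the copy and the delete, for example by doing both inside one transaction or pausing logins briefly. The sketch skips that. The real lesson is to run something like this on a schedule instead of once every 3 years.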

As it happens, Ramp recently built me a replacement auth box so that I can build an up-to-date auth host contemporary to the current game engine. It just got reprioritized to the top of the list.

5 Comments

So the log file was simply getting too big?

^ sounds right.

but it would be nice with a better ending after reading a page of something that doesn’t mean more to me than blah… =)

The words I’m saying now
Mean nothing more than “meow”
To an animal.

I could finally log in about 12 am GMT last night.
Ripping my hair out yesterday. I was looking forward to getting rat arsed playing WWII.
Instead I got mega rat arsed and lost my voice screaming at the TV when the rugby was on :P

Fun. Sounds like this stupid problem I had a couple jobs ago.

I worked for a place doing a real estate website, and it was really nice, running under Oracle with a nicely normalized database.

Well, almost all under Oracle.

There was this one little part, where some MBA type with just enough ColdFusion smarts decided to graft in a log of the searches that were being done. He didn’t know Oracle though…

so he stuck it in Access.

ACCESS. On a PRODUCTION SERVER.

That thing was a bitch and a half. There’s a bug in the Access ODBC drivers: if you fire a query string that’s too big through, one of the connections goes away, silently. The user gets a timeout, and then… searches again. Another connection dies. Repeat until all 16 (or whatever) are dead, and then the entire site locks up because it can’t log search criteria.

It also had a 1 GB filesize limit, being Access. God help you if you didn’t notice it was getting close and perform a file swap-out before it hit the limit.
