Simplified Chinese in 12 bits.

Unicode contains some 20,934 characters in the “Chinese” namespace, combining Simplified and Traditional characters along with all kinds of other East Asian characters into one set. GB2312, on the other hand, puts the bulk of everyday Simplified Chinese (PRC) into its first level of 3,755 characters.

For callsigns (character names) we use a restricted subset anyway, one that isn’t supposed to include punctuation, etc. If the number of Chinese characters is less than 4058, then each character can be expressed as a 12-bit number and translated back to its original Unicode (UTF-8 or UTF-16) character through a simple mapping table. (12 bits give 4096 codes; capping the Chinese set at 4058 leaves 38 codes for a-z, 0-9 and some special characters.)
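A minimal sketch of such a mapping table, in Python rather than the host's own language. The five-character `HANZI_SAMPLE` is a placeholder I made up for illustration; the real table would list the chosen Simplified Chinese subset, and the names `to_codes` and `from_codes` are hypothetical.

```python
# Hypothetical alphabet: the 36 ASCII symbols first, then the Chinese subset.
ASCII_SYMBOLS = "abcdefghijklmnopqrstuvwxyz0123456789"
HANZI_SAMPLE = "中国人大小"  # placeholder only, NOT the real character list

TABLE = ASCII_SYMBOLS + HANZI_SAMPLE
assert len(TABLE) <= 4096, "every code must fit in 12 bits"

# Reverse lookup: character -> 12-bit code (its index in TABLE).
CODE_OF = {ch: code for code, ch in enumerate(TABLE)}


def to_codes(text):
    """Translate each character to its table index.

    Raises KeyError for any character outside the restricted subset.
    """
    return [CODE_OF[ch] for ch in text]


def from_codes(codes):
    """Translate table indices back to the original Unicode characters."""
    return "".join(TABLE[code] for code in codes)
```

Because the table is just an ordered string, the code for a character is its position, and decoding is a plain index lookup.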

If I can get it down to 12 bits, then I can store a 5-character gamename in a single 64-bit integer (5 × 12 = 60 bits). Which is kind of important :) I’d estimate about 10% of the host code expects these native 64-bit integer values to contain compressed gamenames.
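To make the packing concrete, here is a rough Python sketch. Five 12-bit codes use 60 bits, which leaves 4 bits spare; storing the name length in those spare bits is my own assumption for the sake of a round trip, not something the host code is known to do. `pack` and `unpack` are hypothetical names.

```python
BITS_PER_CHAR = 12
MAX_CHARS = 5                       # 5 * 12 = 60 bits of payload
PAYLOAD_BITS = BITS_PER_CHAR * MAX_CHARS
CHAR_MASK = (1 << BITS_PER_CHAR) - 1


def pack(codes):
    """Pack up to five 12-bit codes into one integer below 2**64.

    Assumption: the top 4 bits hold the number of characters.
    """
    if len(codes) > MAX_CHARS:
        raise ValueError("at most 5 characters fit in 60 bits")
    payload = 0
    for code in codes:
        if not 0 <= code <= CHAR_MASK:
            raise ValueError("code does not fit in 12 bits")
        payload = (payload << BITS_PER_CHAR) | code
    return (len(codes) << PAYLOAD_BITS) | payload


def unpack(word):
    """Recover the list of 12-bit codes from a packed word."""
    length = word >> PAYLOAD_BITS
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    return [
        (payload >> (BITS_PER_CHAR * i)) & CHAR_MASK
        for i in range(length - 1, -1, -1)
    ]
```

Keeping the length out of band would work just as well; the point is only that the whole name travels as one native integer.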

Sure, we could come up with a different way to store the game names, and sure, I could introduce a “gamename” type with more than 64 bits; but that means re-tuning all of our older, custom, hand-tuned packet formats and message layouts, which is likely to be a significant and time-consuming process.

Feels kind of weird having a 5 character callsign limit though.

Lots more research for me to do.

11 Comments

If I am not mistaken the Chinese characters are more like small words than actual letters. So one character could stand for three letters or something. That’s why they have so many characters.

So you have 5-15 letters in a game name, depending on what Chinese characters you use.

Well, look at it this way, they’ll have 1100424404906276768 possible game names. We only have 208827064576.

Oops, correction, we only have 2821109907456, forgot to include dem digits :)
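For anyone checking the figures, they fall out of simple powers: 4058 possible characters over 5 positions, against 26 letters (or 36 letters and digits) over 8 positions. A quick Python check, counting only names of exactly full length:

```python
# Number of distinct full-length names for each alphabet.
chinese_names = 4058 ** 5        # 5 characters, 4058 choices each
letters_only = 26 ** 8           # 8 characters, a-z
letters_and_digits = 36 ** 8     # 8 characters, a-z plus 0-9

print(chinese_names)
print(letters_only)
print(letters_and_digits)

# The Chinese count still fits in the 60 bits that five 12-bit codes occupy.
fits_in_60_bits = chinese_names < 2 ** 60
```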

Just a database/table with all the gamenames in it (it will have to be updated every time you launch the game).

Instead of sending the nametag, send a user-id, and look up the nametag for this userid in the table.

Userid #1 will be the first name on the table, #2 the second and so on.

You can even reduce it to 32 bits or 24 bits (there are more Chinese people than fit in 24 bits, but it would be a cool problem to have!)

You can split the database into blocks of 16-bit userids, so really only the block with the latest users has to be added/updated.

The database also can be protected/encrypted for extra security, although it only contains a large file full of name-tags.

The system can be used in the future for storing personal decals (userid-1.bmp,…).

In Mechwarrior 4, every time you enter a map/server, the game loads all the decals of the players on that map/server and stores them in /decals. Next time you enter a map, it checks whether it already has each decal before asking the server to send it.

I had high hopes that the Chinese patch would catch an annoying bug/misbehaviour in the game map.

It seems the 5 char limit leads to confusion in one of the map/strat/w.e. server processes when you have several players who share the first 5 characters of their callsign in one mission (e.g. gelbe1 and gelbe13, rote1 and rote13). One of the player icons vanishes from the map, and the other gets the designation of the vanished player. It is reproducible; we have lots of rotes and gelbes in our squad. Once they are all in one mission, utter icon chaos ensues.

My knowledge of Chinese is very limited, so I can’t tell how likely these collisions will be with PRC callsigns.

I know db lookups are expensive and you don’t want to sift through your complete code base. If at all possible, I would prefer a unique id (some hash value of the callsign) used consistently in all server processes. Since this would require changes in the server and network code, it is very likely a NOGO atm. Maybe you could consider it for later changes.

Just name every user MonkBasher, problem solved!

Erh, ampos, did you miss the part where I said “Sure, we could come up with a different way to store the game names”?

Ampos wrote: “In Mechwarrior 4” well that’s all you needed to say. If it works in a 16×16 game with fixed maps, how could it fail to work on a 24x7x365 persistent, online game with thousands of players? :)

“/report my game is laggy with all these free trial players logging in and updating my game name database”.

When callsigns are well-compressible, it’s infinitely more efficient to send them as IDs rather than having to continuously look up displayable names from a table. Player names, squad names, contact report IDs, waypoint names, channel names… All use “callsigns”.

Callsigns = unique. If we can fit them comfortably into 48 or 64 bits, maintaining an active database of all the various mappings on your client is a waste of bandwidth and CPU; we might as well just send you the compressed callsign.
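As an illustration of how an ASCII callsign can double as a compact numeric ID, here is a Python sketch that treats the callsign as a number in base 37. This is an assumed scheme for illustration only, not the actual encoding the game uses; the alphabet, the reserved zero digit and the function names are all hypothetical.

```python
ALPHABET36 = "abcdefghijklmnopqrstuvwxyz0123456789"
BASE = len(ALPHABET36) + 1  # digit 0 is never used, so "a" and "aa" stay distinct


def callsign_to_id(callsign):
    """Encode a callsign as an integer (ValueError on unsupported characters)."""
    value = 0
    for ch in callsign.lower():
        value = value * BASE + ALPHABET36.index(ch) + 1
    return value


def id_to_callsign(value):
    """Decode the integer back to the (lower-case) callsign."""
    chars = []
    while value:
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET36[digit - 1])
    return "".join(reversed(chars))
```

Under this scheme the largest 8-character callsign encodes to 37^8 - 1, which is below 2^48, and a 10-character one stays well below 2^64, so the ID can be sent as-is and decoded without any table of players.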

Erh, rote7, there is no 5 character limit yet. When there is, it will only be callsigns containing Chinese characters. It has nothing to do with ASCII callsigns.

And I’m having to work around this problem in the first place because, in ASCII, callsigns provide a superbly compressible unique ID – see above.

I dunno Oli, perhaps I’ve just done too much RDBMS work on fully normalized schema design, but I can’t help but feel you’re wrong about the data storage.

Yes, having to map back and forth adds some complexity on one place, but it simplifies so many other areas. You don’t even need to have the client maintain the mapping; it could all be presented to the client in denormalized form… but that would still present you with your 12 bit problem.

The client systems already have to maintain lots of data about other players – unit type, vector, orientation, weapon state, etc etc etc. One more bit of data isn’t that much, for those in the visible player limit. The comm system names could be part of the comm text, instead of constrained to a userID field on the packet. Even if you have to have the client maintain a mapping for 5000 people online at once, that’s a paltry lookup table for modern systems, and you just update it when people log on.

I know you have more familiarity with the system, but I’ve also known many cases where somebody too close to the issue insists It Just Isn’t Possible… right up until it’s shown otherwise (and yes, just as many the other way where an outsider insists it is, but can’t be convinced otherwise because they can’t see the code).

/database wonk

Hmm, I came up with the idea of the character limit because the icon problem arises only if almost identical names/callsigns are in the same mission.

Maybe the source of the problem is somewhere else. All the clues I could get pointed towards limited storage space for callsigns in the network packets (I read somewhere that some bits are reserved for rank, squad, etc.) and towards the map/strat/chat hosts not having the same degree of information available about players.

That would start to touch the whole design of the client which they are not wanting to touch or redesign.

Embedded databases on the client side for gaming are coming. Just not right now. I see it as part of the whole multi-threaded issue of gaming.

This game does turn 7 in a few weeks so the foundation for the game, server and client is based on ideas that are close to 10 years old.

Krenn:

Trouble is you’re talking about an RDBMS, I’m not.

We don’t use the callsign as an index in storage, we use the PlayerID.

We would never want to expose PlayerID – that is our internal Unique ID. We would always want a secondary Unique ID.

The fact that the callsign can be expressed as a sub-64-bit number makes it a superb secondary unique key: a) it’s already unique, b) we’re going to have to pass it around anyway, c) it provides a human-readable association with activity in the game.

I realize that it comes with some unfortunate constraints, and I’m not disagreeing that I might have said “but that will limit us to 8 or 10 characters per callsign”.

But I’m telling you: it was an otherwise excellent decision. Inevitably you sometimes still have to do mappings, but the amount of CPU time you have to spend doing lookups is massively reduced when the primary index you exchange between clients/servers is one that is deemed publicly exposable, and not only publicly exposable but readily human readable.

If you really sit and think about it, the callsign is an absolutely fantastic unique index for most purposes.

Krenn wrote: “Even if you have to have the client maintain a mapping for 5000 people online at once”

You just said “at once” – same kind of atomic thinking Ampos erred into.

Incidentally, the upshot of this work for non-PRC players will be an increase in callsign length from 8 to 10 characters.
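One plausible reading of that number, and this is my arithmetic rather than anything stated above: the same 60 bits that hold five 12-bit Chinese characters also hold ten 6-bit characters, and 6 bits is enough for an alphabet of a-z, 0-9 and a couple of specials. A small Python sketch, where `max_chars` is a hypothetical helper and whole bits per character are assumed:

```python
def max_chars(alphabet_size, bit_budget):
    """Longest name that fits when each character takes a whole number of bits."""
    bits_per_char = (alphabet_size - 1).bit_length()
    return bit_budget // bits_per_char


chinese_limit = max_chars(4096, 60)   # 12 bits per character
ascii_limit = max_chars(38, 60)       # 6 bits per character
```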
