Unicode…

I always knew I would have to learn about wide characters and internationalization; I was just never in a hurry to. The client uses UTF-8, and unfortunately UTF-8 requires 3 bytes for most Chinese characters, which presents me with some fairly significant issues.
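Where the 3 bytes come from: UTF-8 spends 1 byte on ASCII, 2 bytes on code points up to U+07FF, 3 bytes on the rest of the Basic Multilingual Plane (where the common CJK Unified Ideographs, U+4E00 to U+9FFF, live) and 4 bytes on the supplementary planes. A minimal sketch in C; the function name `utf8_len` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs to encode one Unicode code point.
 * Returns 0 for values that are not Unicode scalar values
 * (surrogates U+D800..U+DFFF, or anything above U+10FFFF). */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;                 /* 0xxxxxxx: ASCII */
    if (cp < 0x800)
        return 2;                 /* 110xxxxx 10xxxxxx */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                 /* surrogates cannot be encoded */
    if (cp < 0x10000)
        return 3;                 /* 1110xxxx 10xxxxxx 10xxxxxx: common CJK */
    if (cp <= 0x10FFFF)
        return 4;                 /* supplementary planes, e.g. CJK Extension B */
    return 0;
}
```

So a 10-letter ASCII callsign is 10 bytes, while 10 common Chinese characters are 30 bytes.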

Since we will be supporting Traditional Chinese, maybe I can remap the characters down to, essentially, their own code page that uses fewer than 16 bits per character.
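One way to picture that remapping: keep a sorted table of exactly the characters the game will accept, and use each character's position in the table as its code. Up to 8,192 entries fit in 13 bits per character; up to 32,768 fit in 15. A minimal sketch, assuming a hypothetical table `code_page` (the five entries below are placeholders, not a real character list) and hypothetical helpers `to_local`/`from_local`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical private code page: supported characters, sorted by
 * Unicode code point. A real table would be generated offline. */
static const uint32_t code_page[] = {
    0x4E2D, /* 中 */
    0x5929, /* 天 */
    0x738B, /* 王 */
    0x98DB, /* 飛 (traditional "fly") */
    0x9F8D, /* 龍 (traditional "dragon") */
};
#define CODE_PAGE_SIZE (sizeof code_page / sizeof code_page[0])

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Unicode code point -> local index, or -1 if the character
 * is not in the code page (i.e. not allowed in callsigns). */
static int to_local(uint32_t cp)
{
    const uint32_t *hit = bsearch(&cp, code_page, CODE_PAGE_SIZE,
                                  sizeof code_page[0], cmp_u32);
    return hit ? (int)(hit - code_page) : -1;
}

/* Local index -> Unicode code point, e.g. for display in the client. */
static uint32_t from_local(int idx)
{
    assert(idx >= 0 && (size_t)idx < CODE_PAGE_SIZE);
    return code_page[idx];
}
```

The catch is that any character missing from the table has to be rejected when the callsign is created.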

Since we use the player’s callsign as an index all over the place, it has to be a value that can be stored in a basic integer type: that sets a limit of 64 bits (unsigned long long) unless I convert the code to 64-bit (although I’m not sure GNU C/C++ have a 128-bit integer type defined yet?)

Neither prospect makes me rub my hands together in delight, but… that’s why they call it “work” :)

9 Comments

A less-than-16-bit encoding for Chinese characters sounds a bit difficult, considering that, according to Wikipedia, even 16 bits isn’t enough for all the characters in the latest dictionary.

UCS-2 might be fine, though. I wonder how many Chinese players want a callsign longer than four characters?
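The arithmetic behind the four-character limit: 64 bits hold four 16-bit UCS-2 units, but only two complete 3-byte UTF-8 characters. A minimal sketch; `pack_ucs2`/`unpack_ucs2` and the first-character-in-the-top-bits layout are assumptions. UCS-2 cannot represent code points above U+FFFF, so those are rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to four BMP code points into one 64-bit key. The first
 * character goes in the most significant 16 bits, so comparing keys
 * as integers compares callsigns in code-point order. Unused slots
 * are 0. Returns 0 if the name is too long or not representable. */
static int pack_ucs2(const uint32_t *cps, size_t n, uint64_t *out)
{
    uint64_t key = 0;
    if (n > 4)
        return 0;
    for (size_t i = 0; i < 4; i++) {
        uint32_t cp = (i < n) ? cps[i] : 0;
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;   /* outside the BMP, or a surrogate */
        key = (key << 16) | cp;
    }
    *out = key;
    return 1;
}

/* Unpack into out[]; returns the character count (stops at a 0 slot). */
static size_t unpack_ucs2(uint64_t key, uint32_t out[4])
{
    size_t n = 0;
    for (int shift = 48; shift >= 0; shift -= 16) {
        uint32_t cp = (uint32_t)((key >> shift) & 0xFFFF);
        if (cp == 0)
            break;
        out[n++] = cp;
    }
    return n;
}
```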

I also wonder how difficult it would be to rename the current callsign to “user-id” and add a separate, arbitrary-length callsign?

Also, wouldn’t it be enough to require that the first 64 bits of callsign-encoded-in-UTF-8 are unique, and use that as index where necessary? Or are the data fields for an actual printable callsign in your protocols actually limited to 64 bits?
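The prefix idea could look something like this sketch (the name `utf8_prefix_key` is hypothetical). Packing the bytes big-endian keeps integer order equal to byte order, and UTF-8 byte order matches code-point order. Note that 8 bytes hold two full 3-byte Chinese characters plus two bytes of a third, and uniqueness of the prefix would have to be enforced when a callsign is created.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit index from the first 8 bytes of a UTF-8 callsign,
 * packed big-endian and zero-padded for shorter names. The cut at
 * byte 8 may split a multi-byte character; that is harmless for a
 * key, but two names sharing their first 8 bytes collide. */
static uint64_t utf8_prefix_key(const char *utf8)
{
    uint64_t key = 0;
    size_t len = strlen(utf8);
    for (size_t i = 0; i < 8; i++) {
        unsigned char b = (i < len) ? (unsigned char)utf8[i] : 0;
        key = (key << 8) | b;
    }
    return key;
}
```

For example, "ABCDEFGHIJ" and "ABCDEFGHXY" produce the same key, which is exactly the collision the uniqueness rule would have to forbid.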

So the past comes up again. When the game was designed to be played over modems, bandwidth was precious. Because callsigns were unique, you didn’t have to send the ID (which you do have) as well; you only passed a single piece of information.

How much more bandwidth will the call sign take up?

BTW, this game played well over a modem compared to other MMOs.

Sounds like an excellent time to use the spare data area you need for China to give the rest of us the ability to choose different uniforms! :)

we use the player’s callsign as an index all over the place

This seems like a particularly bizarre choice to me… Is this something you inherited from the original team that is now too embedded to clean up with a proper “player UID”, one that never changes even if the player’s callsign does?

What you call it doesn’t matter; the player’s name is built into the servers as their identification. You could change it to a numeric unique ID, but then that would be your name, if you see what I mean, No. 18304723?

Krenn, each player has a unique ID: their record number in the player table. Since our processes are distributed, not every server has every user loaded, so using the playerID would mean spending an awful lot of time doing playerID <-> username conversions. But given that the username is unique, and since it was small enough to fit into a 64-bit integer (actually, 10 characters would fit, but a long time ago we were using the upper 16 bits for something else)…
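For illustration only: the post doesn’t say how callsigns were encoded, but one scheme under which 10 characters fit in 64 bits is a 6-bit alphabet, since 10 × 6 = 60 bits. With the upper 16 bits reserved, the remaining 48 bits would hold 8 such characters. The alphabet, `MAX_CHARS`, `pack6` and `unpack6` below are all assumptions, and a scheme like this has no room for Chinese at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alphabet of at most 63 symbols; code 0 means padding. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
#define MAX_CHARS 10   /* 10 * 6 = 60 bits; the top 4 bits stay free */

/* Pack name into *out, first character in the highest 6-bit slot.
 * Returns 0 if the name is too long or uses an unsupported char. */
static int pack6(const char *name, uint64_t *out)
{
    size_t len = strlen(name);
    uint64_t key = 0;
    if (len > MAX_CHARS)
        return 0;
    for (size_t i = 0; i < MAX_CHARS; i++) {
        uint64_t code = 0;
        if (i < len) {
            const char *p = strchr(ALPHABET, name[i]);
            if (p == NULL)
                return 0;
            code = (uint64_t)(p - ALPHABET) + 1;
        }
        key = (key << 6) | code;
    }
    *out = key;
    return 1;
}

/* Decode into buf, which must hold at least MAX_CHARS + 1 bytes. */
static void unpack6(uint64_t key, char *buf)
{
    size_t n = 0;
    for (int shift = 6 * (MAX_CHARS - 1); shift >= 0; shift -= 6) {
        unsigned code = (unsigned)((key >> shift) & 0x3F);
        if (code == 0 || code > sizeof ALPHABET - 1)
            break;   /* padding, or not a valid symbol */
        buf[n++] = ALPHABET[code - 1];
    }
    buf[n] = '\0';
}
```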

Couple that with the fact that we wanted a “unique ID” that wasn’t the player’s account ID. We already had the complication of PlayNet Account IDs, and nobody wanted to add a fourth unique ID for identifying players (too easy to mix things up). And because the callsign ID is a UINT64 rather than a UINT32 like our playerIDs, you get compiler warnings if you confuse the two…

So they decided that the callsign-ID was “safe” to send over the network. Obviously, a discrete unique ID would have been too, but then you have to look it up all the time to find out what you actually want to know, which is “what’s his name?”

There’s always the possibility I could convert it to a “bignum”, but then we blow away a lot of efficiency :( The 64-bit port might be the easier route.

typedef CALLSIGNID unsigned long long long ;

– Oliver
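On the 128-bit question from the post: GCC (and Clang) provide `unsigned __int128` as an extension on targets that support it, typically 64-bit ones such as x86-64, and define `__SIZEOF_INT128__` when it is available. A compiling version of the joke typedef might look like this sketch; `make_id`, `id_hi` and `id_lo` are hypothetical helpers. GCC has no literal syntax for 128-bit constants on these targets and `printf` has no conversion for them, which is why the sketch builds and splits values via 64-bit halves.

```c
#include <stdint.h>

#if defined(__SIZEOF_INT128__)
/* Compiler extension, not ISO C; -pedantic will warn about it. */
typedef unsigned __int128 CALLSIGNID;
#else
#error "no 128-bit integer type on this target"
#endif

/* Build a 128-bit ID from two 64-bit halves (e.g. eight UCS-2 units). */
static CALLSIGNID make_id(uint64_t hi, uint64_t lo)
{
    return ((CALLSIGNID)hi << 64) | lo;
}

/* Split an ID back into its halves, e.g. for logging or the wire. */
static uint64_t id_hi(CALLSIGNID id) { return (uint64_t)(id >> 64); }
static uint64_t id_lo(CALLSIGNID id) { return (uint64_t)id; }
```

Comparisons and map lookups work on the whole value as usual; only I/O and constants need the halves.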

Not every Chinese character is used as a “Last name / First Name”.

As with Smith or Jones, there are many likely combinations that recur.

First names are difficult, and even the westernization of Chinese first names is not standardized.

Unfortunately for coders, Chinese has many homophones (like “for”/“four” or “too”/“two”/“to” in English), and characters that sound the same have widely different meanings, which hinders easy westernization.

Somebody, somewhere has already solved this problem to the extent that it can be solved.

I suggest not reinventing the wheel: research the solutions already discovered… and then decide where and how to spend the most valuable resource of all: Programmer Time Available.

:)

Oh, I have no intention of doing so, but that means the producers are chomping at the bit with nothing tangible from me… I’m trying to catch up on an awful lot of other people’s past work, etc., but because of the existing body of software and the fact that we’re not starting from scratch, I can’t just pick up a book and start from there, so it’s slow going. You can read dozens of pages of an article before you realize “crap, this can’t apply to the special case I have” :(

For all we know, you could write “The Dragon Eating Parachuting Pooches” in 4 Chinese characters.

Maybe it’s not a problem.

That would be 7 characters. “pilot” is 3 characters.

We’re talking about Simplified Chinese. “hot pilot” and “extreme pilot” are 5 characters, “skilled pilot” is 6 characters.

Granted, that’s just Babel Fish translation.

But even if we’re just looking at a couple of hundred thousand players, you need room for meaningful callsigns.
