A far more painless Unicode experience

Poor Ramp: when he had to localize our web sites and tools, he had to wrestle with Apache, Tomcat, Perl, MySQL 3.x, the billing system and an assortment of other tools to try to get everything talking UTF-8.

I still use Roxen's webserver for our internal management tools and systems, and by and large it just does the right thing. All the fancy features that Apache/Tomcat have added in the last 4-5 years – like JSP Tags – are basic stuff in Roxen, and very much matured.

And UTF-8? Almost totally a walk in the park, which was surprising given the Roxen server I was running has an uptime of 20 months and the install itself is actually 5 years old…

Roxen did go through a bit of a quiet phase as a product where, frankly, I thought it was about to keel over or vanish into the software ether, but it seems a resurgence of interest in their Content Management System by news/media organizations has breathed fresh life into them.

I worked with Roxen’s CMS when I was at Granada TV and loved it to bits. It’s what I used to build the DAoC Player Wishlist. The “Personal Edition” is focused more on the Content than the Management; the CMS itself is primarily a web/graphical workflow and revision management system.

For instance, the sites were served via a “CVS” (revision control) filesystem that is brilliantly abstracted away. Let’s say you have the site up and running and you want to mess with the style sheets. Log in to the CMS, click the branch option, make your changes, save them. Nobody sees anything. If, however, you view the site while logged in as your CMS user, you can see the changes.

It also had a nice FTP interface and so on, making it really easy for groups of people to work on major or minor revisions to the website and test them out thoroughly – a complete workflow system. Granada were prepping to spend a lot of money on one of two really expensive other CMSs and get some bespoke features added to them. Roxen CMS did all of it and more, without the same nasty hardware requirements.

With the first alphas getting ready to ship to China, we finally needed to start testing our management tools with UTF-8. It was all going well until we pulled UTF-8 data from the database.

It seems that Roxen is doing some kind of on-the-fly conversion of the data coming from MySQL: as though it assumes the bytes are Latin1 encoded and translates them from Latin1 to UTF-8, producing garbage.
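To show the kind of garbage I mean, here is a minimal sketch in Python. It is not Roxen or MySQL code, and the helper names `double_encode` and `repair` are made up for illustration. It simply assumes the failure is the classic one: bytes that are already UTF-8 get read as Latin1 and are then encoded to UTF-8 a second time.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes misread as Latin1, then re-encoded."""
    utf8_bytes = text.encode("utf-8")        # what the database actually holds
    misread = utf8_bytes.decode("latin-1")   # wrong assumption: one byte == one character
    return misread.encode("utf-8")           # the "helpful" conversion to UTF-8


def repair(garbled: bytes) -> str:
    """Undo exactly one round of the double encoding above."""
    return garbled.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "é"
    garbled = double_encode(original)
    print(garbled)                   # four bytes instead of the original two
    print(garbled.decode("utf-8"))   # what ends up on the page
    print(repair(garbled) == original)
```

The round trip in `repair` only works because Python's `latin-1` codec maps every byte value to a character. A real server or database driver may treat "latin1" slightly differently, so take this as an illustration of the symptom rather than a fix.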

Unlike Ramp, I only had to find out why one app was having issues. The likely cause was the various Latin1 characters secreted in our data sets and pages, combined with my efforts to “encourage” UTF-8 encoding, which left several translations going on at once.

In the end, I just decided to roll it back to a blank slate and …

Oh… It just seems to do the right thing.

I just have to figure out what I didn’t do to make it work now :) I was expecting pain; using Unicode is supposed to hurt. An application that seems to do the right thing? Unheard of :)
