Database “stuff”

Amongst the many internal changes to 1.30 is a fairly radical overhaul of how players get from an account to logged in with a set of privileges and access rights on a server. It used to be a wholly personal experience – a table with every individual user’s access permission-set with each individual permission.

Two of the downfalls of this approach were (1) that it was frakking insane, (2) that you couldn’t tell why this player had ALL_SPAWN.

When I overhauled the chat host a few years ago I introduced a concept of chat_groups which associated a set of chat commands with a set of players through a meaningful name, e.g. “GMs” or “staff” or “sure we didn’t notice you trying out that speed hack”.

Unfortunately, it required that for most groups players be manually added or removed from a group. Each group could only have one game-privilege and so we wound up with lots of groups to maintain and no coherent system for saying “this player should be in those 5 groups and have these commands and those abilities”.

And then the China project brought certain todo list items to the top of the list. The priority assigned to this work comes from the China project, but the work we are currently doing is something that has been often discussed.

The 1.30 overhaul associates chat groups, multiple privileges and even vehicle restrictions to “subscriptions” (that terminology blows: subscriptions are simply a way to indicate that a particular account has access to a particular product/feature set). So now the auth/host side of the accounting system finally begins to approach something I won’t deny all knowledge of.

These changes also require a shift from embedded SQL in the host executable to calls to stored procedures which means we have a much cleaner and better defined API and less bits of random SQL that may well treat the same data differently.

But we’re also pushing the database a wee bit harder now, which has meant – on top of everything else – SQL tuning and tweaking. I am not a DBA. I am not a DBA. I am not a DBA. Please god, I am not a DBA so let me out of this room?

Some of the new queries were generating a lot of transactions with lots of data and I was seeing a major spike in text-processing time. I have several wrapper classes for creating MySQL query strings, including one which uses a pool of mutex-locked 4k memory blocks, grabs one for itself, builds the query in this buffer and then puts itself onto a queue for consumption by a set of database worker threads.

But the process of marshaling, formatting and then sending and parsing all this data suddenly went thru the roof.

I’ve previously bitched about MySQL prepared statemement handling on auto reconnect – they store everything you need to know about a statement … except the SQL itself. If they just added a strdup of the source query … statements could actively recover themselves after a connection reset. Oh well.

I now have a reasonably robust class that encapsulates prepared statements and keeps them happy.

I can’t share the full thing with you but I can share the “scratch pad” version I originally knocked up for POC. It may be a useful reference for writing your own.

 * $Id$
 * PreparedStatement class for MySQL 5.0/5.1/5.2
 * Copyright (C) Oliver ‘kfsone’ Smith 2009
 * Permission to use/redistribute this code is granted provided
 * this statement remains in-tact.
 * Description:
 * Provides a wrapper for MySQL’s C API MYSQL_STMT structure that
 * handles auto-reconnection gracefully and sprintf style format
 * masks for generating the BIND list dynamically.
 * Construct the PreparedStatement object with the connection handle
 * it will be used on and keep in-scope until you no-longer require
 * the object.
 * To populate the parameters use the appropriate setXXXX function
 * with the index of the field you wish to populate.
 * Wrap calls to functions with try { … } catch ( const std::exception& e ) { … }
* Example use:
 *  // Add points to a user record and record where they
 *  // came from.
 *  bool myFunction(int userID, int addPoints, const char* source)
 *  {
 *    try
 *    {
 *      // You provide ‘MYSQL* getMYSQL()’ which returns the database connection…
 *      // Declaring this member static means it is only constructed once.
 *      //  Parameter 1 (%d) is the points to add
 *      //  Parameter 2 (%120s) is a string (upto 120 characters)
 *      //  Parameter 3 (%u) is the user ID (unsigned, not fully implemented here)
 *      static PreparedStatement stmt(getMYSQL(),
 *       “UPDATE user SET points = points + %d, last_source = %120s WHERE userid = %u”) ;
 *      // Populate the bind fields with current data
 *      stmt.setInt32(0, addPoints) ;
 *      stmt.setString(1, source) ;
 *      stmt.setInt32(2, userID) ;
 * // Send data to server and execute statement
 *      stmt.execute() ;
 *      // Operation successful.
 *      return true ;
 *    }
 *    catch ( const std::exception & error )
 *    {
 *      fprintf(stderr, “%s(%d, %d, %s): %s”
 *                      , __FUNCTION__, userID, addPoints, source, error.what()) ;
 *      return false ;
 *    }
 *  }


class PreparedStatement : public MYSQL_STMT
    // ‘conn’ is the MySQL connection to use; expected to auto-reconnect
    // This implementation provides no way to change what connection
    // you want the prepared statement to be on…
    // ‘query’ is an SQL query with sprintf-style format mask placeholders,
    // supported are:
    //  %d, %u, %ld, %lu => long integers (unsigned work not complete)
    //  %lld, %llu => long long integers (64 bit, unsigned work not complete)
    //  %f => float
    //  %g => double
    //  %#s => character string of upto # bytes long (e.g. %1024s)
    //  %% => escape for actual % character.
    PreparedStatement(MYSQL* conn, const char* query) ;

    // Destructor
    virtual ~PreparedStatement()
        if ( m_sth != NULL )
            mysql_stmt_close(m_sth) ;
            m_sth = NULL ;

        for ( unsigned int i = 0 ; i < m_bind.size() ; ++i )         {             if ( m_bind[i].length ) { delete m_bind[i].length ; m_bind[i].length = NULL ; }             if ( m_bind[i].buffer )             {                 delete [] (char*)(m_bind[i].buffer) ;                 m_bind[i].buffer = NULL ;             }             m_bind[i].buffer_type = MYSQL_TYPE_NULL ;         }     }     // Execute the statement with the current values of the BIND structures.     void execute() ;     MYSQL_BIND* bind(unsigned int index)     {         if ( m_bind.empty() || index >= m_bind.size() )
            throw std::runtime_error(“parameter index out of bounds”) ;
        return &m_bind[index] ;

    void setInt32(unsigned int index, int value)
        MYSQL_BIND* entry = bind(index) ;
        assert( entry->buffer_type == MYSQL_TYPE_LONG ) ;
        *(int*)entry->buffer = value ;

    void setInt64(unsigned int index, long long value)
        MYSQL_BIND* entry = bind(index) ;
        assert( entry->buffer_type == MYSQL_TYPE_LONGLONG ) ;
        *(long long*)entry->buffer = value ;

    void setString(unsigned int index, const char* value)
        MYSQL_BIND* entry = bind(index) ;
        assert( entry->buffer_type == MYSQL_TYPE_STRING ) ;
        unsigned long len = strlen(value) ;
        if ( len >= entry->buffer_length )
            len = entry->buffer_length – 1 ;
        strncpy((char*)entry->buffer, value, len) ;
        *entry->length = len ;

    void setFloat32(unsigned int index, float& value)
        MYSQL_BIND* entry = bind(index) ;
        assert( entry->buffer_type == MYSQL_TYPE_FLOAT ) ;
        *(float*)entry->buffer = value ;

    void setFloat64(unsigned int index, double& value)
        MYSQL_BIND* entry = bind(index) ;
        assert( entry->buffer_type == MYSQL_TYPE_DOUBLE ) ;
        *(double*)entry->buffer = value ;

    void _prepare()
        // m_sth will be null on initialization or after
        // connection has been reset.
        if ( m_sth == NULL )
            m_sth = mysql_stmt_init(m_conn) ;
            if ( m_sth == NULL )
                throw std::runtime_error(“NULL from mysql_stmt_init”) ;
        if ( mysql_stmt_prepare(m_sth, m_query.c_str(), m_query.size()) )
            printf(“- prepare failed\n”) ;
            throw std::runtime_error(mysql_stmt_error(m_sth)) ;
        if ( mysql_stmt_bind_param(m_sth, &m_bind[0]) )
            printf(“- bind failed\n”) ;
            throw std::runtime_error(mysql_stmt_error(m_sth)) ;

    typedef std::vector Binds ;

    MYSQL*         m_conn ;
    MYSQL_STMT*         m_sth ;
    Binds        m_bind ;
    std::string         m_query ;
} ;

PreparedStatement::PreparedStatement(MYSQL* conn, const char* query)
    : m_conn(conn)
    , m_sth(NULL)
    , m_bind()
    , m_query(query)
    MYSQL_BIND constructor ;        // We’ll construct binds here
    memset(&constructor, 0, sizeof(constructor)) ;

    // Parse the query string for sprintf-style formats like ‘%d’ etc.
    unsigned int pos = 0 ;                    // String offset
    while ( (pos = m_query.find(‘%’, pos)) != m_query.npos )
        int charsAdvanced = 1 ;        // We have advanced by the ‘%’
        if ( pos + charsAdvanced >= m_query.size() )
            break ;
        char modifier = m_query[pos + charsAdvanced++] ;
        if ( modifier == ‘%’ )
            m_query.erase(pos++, 1) ;
            continue ;
        unsigned int size = 0 ;
        while ( isdigit(modifier) )
            size = (size * 10) + (modifier – ‘0’) ;
            modifier = m_query[pos + charsAdvanced++] ;
        bool isLong = false ;
        if ( modifier == ‘l’ )
            if ( pos + charsAdvanced >= m_query.size() )
                throw std::domain_error(“truncated ‘%l’ parameter modifier”) ;
            if ( m_query[pos + charsAdvanced] == ‘l’ )
                isLong = true ;
                ++charsAdvanced ;
                if ( pos + charsAdvanced >= m_query.size() )
                    throw std::domain_error(“truncated ‘%ll’ parameter modifier”) ;
            modifier = m_query[pos + charsAdvanced++] ;
        // Replace the format mask with a parameter place holder (?)
        m_query.replace(pos, charsAdvanced, “?”) ;

        constructor.length = NULL ;

        switch ( modifier )
        case ‘d’:
        case ‘u’:
            constructor.buffer_type = (isLong ? MYSQL_TYPE_LONGLONG : MYSQL_TYPE_LONG) ;
            size = isLong ? sizeof(long long) : sizeof(int) ;
        break ;
        case ‘f’:
            constructor.buffer_type = MYSQL_TYPE_FLOAT ;
            size = sizeof(float) ;
        break ;
        case ‘g’:
            constructor.buffer_type = MYSQL_TYPE_DOUBLE ;
            size = sizeof(double) ;
        break ;
        case ‘s’:
            if ( size == 0 )
                throw std::domain_error(“Must specify size to allocate for strings when using %s”) ;
            ++size ;
            constructor.buffer_type = MYSQL_TYPE_STRING ;
            constructor.length = new unsigned long ;
        break ;
            throw std::domain_error(“Unknown parameter”) ;
        break ;
        constructor.buffer = new char[size] ;
        constructor.buffer_length = size ;
        memset(constructor.buffer, 0, size) ;
        m_bind.push_back(constructor) ;

    _prepare() ;

    if ( !mysql_stmt_execute(m_sth) )
        return ;

    printf(“– got mysql error #%u\n”, mysql_stmt_errno(m_sth)) ;

    // Connection got reset.
    if ( mysql_stmt_errno(m_sth) != ER_UNKNOWN_STMT_HANDLER )
        throw std::runtime_error(mysql_stmt_error(m_sth)) ;

    printf(“– trying reset\n”) ;
    mysql_stmt_reset(m_sth) ;
    mysql_stmt_close(m_sth) ;

    m_sth = NULL ;

    _prepare() ;

    if ( mysql_stmt_execute(m_sth) )
            throw std::runtime_error(mysql_stmt_error(m_sth)) ;

_exampleUsage(MYSQL* connection)
        // t_foo should look something like:
        //  CREATE TABLE t_foo (first int, second text)
        PreparedStatement stmt(connection, “INSERT INTO t_foo (first, second) VALUES (%d, %80s)”) ;

        // Round 1.
        stmt.setInt32(0, 1234) ;                // Set parameter 0 to integer value 1234.
        stmt.setString(1, “hello world”) ;      // Set parameter 1 to string value “hello world”
        stmt.execute() ;

        // Round 2.
        stmt.setInt32(0, 2345) ;                // Set parameter 0 to integer value 2345.
        stmt.setString(1, “second entry”) ;     // Set parameter 1 to string value of “second entry”
        stmt.execute() ;

        // Pause for the user to restart the database simulating
        // a crash that would cause a connection reset.
        printf(“———-\n”) ;
        printf(“– Please restart your database now…\n”) ;
        system(“sleep 30”) ;                    // So you can ctrl-c it to proceed without waiting
        printf(“– Resuming\n”) ;
        printf(“———-\n”) ;

        stmt.setInt32(0, 9999) ;                // Set parameter 0 to integer value 9999.
        stmt.setString(1, “reconnected!”) ;     // Set parameter 1 to string value of “reconnected”
        stmt.execute() ;

        printf(“Yay us!\n”) ;
    catch ( const std::exception& e )
        fprintf(stderr, “exception caught: %s\n” , e.what()) ;

    printf(“Really, yay us: the destructor didn’t explode either!\n”) ;


This might have been asked before, but why MySQL? It is not a real database, and it sure doesn’t scale well. If you’re already going to do work on stored procedures, why not PostgreSQL? Would help a lot on performance.

MySQL has sooo many convenient features that after a while sql becomes very unlike any other sql dialect out there.

Also, IIRC there are alot of oddities with dates.

I bet wading through all queries to look for those special constructs.

Hmm that last sensense was not really complete.

I think he meant ‘hate’ instead of ‘bet’.

Unfortunately, Kallek, you’re operating on too little information (and an apparent reluctance to use the search function).

Frankly, when we start pushing transactions at the sort of rate that will make MySQL the wrong solution for our needs it will be a choice between Microsoft SQL and Oracle – we actually own several Oracle licenses.

Myth: People only use MySQL because it’s free.
Fact: We own several Oracle licenses. We use MySQL as the backend for the game.
Fact: MySQL makes a great embeddable SQL server.

Myth: MySQL isn’t very stable.
Fact: That was true of MySQL versions 1 and 2. Some of our MySQL 3.x servers have uptimes of several years.

Myth: MySQL doesn’t perform very well.
Fact: Microsoft SQL server can generate serious disk IO problems under certain operation patterns; Oracle would do a lousy job as the back end for your family blog. However, MSQL can provide a pretty awesome corporate, multi-application backend; Oracle’s performance endures as the nascent “bloat” pays for itself by coping with what you throw at it as you scale up; PostgreSQL is robust and performs nicely with a good deal of two-way throughput and has some great features like “listen” and the ability to perform table optimization on busy tables with a minimal performance hit.
Fact: MySQL performance is in the asynchronous access pattern sector – it is awesome for scenarios where reading and writing trends are unrelated.

MySQL makes for a fantastic local cache for proxies, it makes for an awesome startup/exit storage medium with SQL-like access capabilities.

We run at around 2-10 transactions a second on our primary game database, and most of the time only one of those is a read. Elsewhere on the network, the CS&R database runs at a lot more transaction, but 99.97% of those transactions are simple reads.

On the stats box, MySQL runs at a peak combined cpu/disk usage of 28%; under the same conditions Postgres runs at 57% and services queries upto 40% slower. That said, the Postgres version of the database receives new stats injections 5-8% faster and with a high of 5% less cpu/disk overhead committing the incoming data.

So no, PostgreSQL wouldn’t help performance in general.

Unfortunately, this is particular feature we’re implementing is one case where I would be much happier with Oracle, Postgres or Microsoft at the back end – but in the light of the bigger picture it’s not worth switching to a database less suited to the overall operation for the sake of making a couple of queries more efficient.

You are probably right, and I try to be as pragmatic as possible in choosing the right language/os/dbms for the job.
Much of my aversion for MySql does not come from performance issues, but more from features that do not work like expected.
There was also some fuss over the innoDb engine that had me worried, but that seems to have settled down.

But you’re absolutely right, it’s about the right tool for the job, and don’t think you’ve chosen MySQL for the pretty dolphin. Hell, I’ve even used SQLite on occation.

That link kinda emphasizes the point – our database workload is low enough that we’re not even having to worry about using prepared statements until now. Likewise with stored procedures.

It’s basically the far end of the Oracle stick: Oracle is huge, drinks CPU, memory and disk, can take weeks or months to configure and is then performs like it’s running on a P3 800.

But all that bloat and baggage starts to pay off when you give it real work to do: when you’ve got 150 client processes generating 3000 queries a second and your average table-row counts are in the tens of millions.

If we were dealing with lots of interactive content for our community, then probably PostgreSQL – it does well at low- to mid-level asymmetrical access patterns it’s just not as good as MySQL at being a dedicated reader or writer. Meanwhile MySQL lags behind in features – that’s how MySQL gets its performance – the way command line versions of tools usually outperform the pants off GUI versions – but they are slowly getting the features in there.

Though – I still don’t understand why they were such pricks about completing the auto-reconnect support for prepared statements by retaining a copy of the query that was used…

If they were worried about an extra pointer per statement … big whoop. They could have had an (optional) map per MYSQL which listed the source for all the current STHs.

The “Binds” typedef is supposed to be MYSQL_BIND. For some reason, though, WordPress’s sourcecode formatter goes ahead and converts it to lowercase tho. (probably because its in angled brackets).

Yeah, the problem with Stored Procedures and pushing logic on the DB is that it becomes even more the bottleneck, and load become also less easier to “distribute”. It sure is nice and the benefit of having sql statements in one place and “compiled against the current scheme” are big pluses.

Not saying you should not use them, we do use them quite a bit, just don’t get too carried away ;). We have one project where they started encapsulating everything in procedures that called procedured which used views upon views upon tables… the end result is that the system is slow as a dog… now that it uses ~300 connections in 6 app hosts and on a dedicated DB server. The previous version was “not that fancy” and ran in a single shared host agains a shared db with less than 15 connections. Way to go! :)

I’m sure you won’t get that far, just trying to say one has to be careful when putting more eggs in the same basket.

PD: If code snippets could also be made a bit darker, the new theme would be easier on the eyes. I find the contrast between regular text and code snippets too sharp.

Actualy, I’ve worked on more than a few systems where all database access was through restricted views and procedures. Microsoft SQL is particularly good at handling scenarios like this – Oracle was a close second last time I looked at it seriously.

However it requires excellent DBAs and a good coder DBA human interface. Coders will want stored procedures that meet their exact needs but will use a view with a rowset list for all kinds of stuff – and that’s not the right pattern.

If you have really good DBAs, they will take authority for data presentation /and performance/ which means they will say to the coder “Ok, you want a procedure that will give you name, squad, name, rank and points and then you want a procedure that will give you name, squad and points. How often do you call each of these? Oh, so you call the one with more fields most of the time and the one with the lesser fields ‘in emergencies’? In that case I’ll give you one SP that returns all the fields and you can just skip the ones you don’t need”.

Also, programmers are usually the worst people to have write stored procedures or SP patterns because SP languages behave and perform entirely differently than regular code. Consider loops and cursors: Coders will write loops in an SP thinking the overhead will be lower, while in reality cursors are the native “loop” and in many scenarios the overhead of tearing up/down a cursor won’t factor in compared to the improved efficiency of using a cursor over a loop (since loops are often implemented by SQL in terms of a cursor, duh!)

And good DBAs track the performance of everything, its instinctive for them, so they will start their SPs with performance knowledge of the database and work out from there. Programmers will tend to develop object-oriented SPs and probably forget it’s not going to be compiled into executable code so nothing is going to come along and optimize out bad stuff.

Lastly – a good database engine is going to want money thrown at it. The bloat in Oracle makes it very good at being distributed allowing the DBAs to do some very cool stuff in terms of balancing loads – basically the work that people reinvent when they build centralized servers.

It sounds like the project took the right approach but didn’t do it quite right in the delivery. (Been there, doing, I mean done that)

Well, actually the project didn’t take the right approach, but not for using SPs and views and such, but as you hint by being developed without taking performance into consideration. 3 layers of SPs and 3-4 layers of views so everything is “theoretically sound” (they are not from the OOP crowd) and then, for example, receiving the requests, building XML with JavaScript on the server and then sending such XML to the SPs to be parsed in PLSQL so the SPs just deal with XML (they return XML) is not the most performant solution.

Using database users as logical users so you cannot make proper use of connection pools is something also that hurts performance.

I mean, each one of the techniques might have it’s place in some systems, but put them all together in an application that has to be performant and you are up for a nasty surprise. Not performing a single stress test helped as well :). They built the architecture from an academic point of view, and then reality came and showed them a few tricks.

Oracle does well on the database side, bloat and prize apart, but I’ve learnt not to trust them in anything else they touch. Application servers, JDBC drivers or connection pools… not on the same league :(.

Yeah, that’s been my experience too – it’s like certain hardware manufacturers and their “wtf did you hire a 3 year old to write these” device drivers :)

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title="" rel=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: