Time to get my head into [1.32] gear

Oorah: The production team have given me the go-ahead to do the core work on the cell host grid system that I wanted to do. I can’t tell you how giddy this makes me.

It should be utterly transparent to our players, so I want to keep the development time short.

To achieve that, my plan is to rigorously instrument the code and give it a complete set of unit tests. I want the code to have this self-QA when I first check it in.

I’m using my own Unit Test system. I’m not 100% satisfied with it: it’s simplicity results in Unit Tests that aren’t as clean as I’d like. So I recently bought a Unit Testing book for my Kindle and it sounds like what I developed is quite close to CUnit. But, then, I don’t like the look of CUnit either – I’d prefer something more … automated.

Then there’s the matter of what to do with the resulting data. At the moment, I handle all of my instrumentation data pretty much manually. I’d like to start putting it into some kind of testing system that could give producers/other devs an oversight of where code is at, and the ability to investigate data more easily.

Lastly… I really want to get the Mac build done with Intel C++, so that we’re all on the same page, and so we can start using some of the C++0x features: initializer lists, auto, static assertions, etc…

I thought that std::array sounded promising: std::vector<> has a lot of overhead, sometimes you want the convenience of working with a vector but you need the efficiency of a static array.

std::array is missing key functionality of a vector like insert, push_back etc.

I’m guessing the reason is that some of the functionality would be debatable:

// Array of 5 integers, fully populated.
std::array<int, 5> my5ints = { 1,  2, 3, 4, 5 } ;

// Find where ‘3’ is in the array.
auto it = std::find(my5ints.begin(), my5ints.end(), 3) ;

// Insert another number here — what happens to the value of ‘5’?
my5ints.insert(it, 10) ;

I guess I could inherit array and add my own routines like push_back, etc.

(Ooops, I just went off and created my own inplace_vector<type, size> for that very reason, hehe; it’s basically std::array with a size field added).


Great to hear! Do you expect, that you already can implement new features with it in 1.32 or will we need some more patches, until we as players see the results of those changes? :)

Only one correction about your post: std::vector has no more overhead than a dynamic array, if you use it right. It is a very old myth, that std::vector has a big overhead. It couldn’t get proven even once. The compiler will often inline nearly everything, when accessing the std::vector. You only need to use things like reserve, resize, constructors etc. wisely.
std::array has no push_back function, because it shall be a non-volatile C array with a C++ container interface. Also you get the possibility to create copies of the array in the normal C++ way of doing copies: operator =.

std::vector does have hidden intrinsic overheads. Firstly, there is the construction overhead:

The data is stored in “reftype* _M_instance”. That has to be allocated separately. So,

struct WithVector {
  int i ;
  vector<int> v ;
} ;
struct WithArray {
  int i ;
  int v[64] ;
} ;

int foo() {
  withVectorInstance wvi ;
    // Pops a wvi structure onto the stack and calls constructor,
    // which must then call the std::vector constructr,
    // which must call "_M_instance = new int[some-minimum-value];"
  withArrayInstance wai ;
    // Pops a wai onto the stack and calls constructor,
    // no additional allocations required.

The second issue is one of data coherence. In the above example, “wvi” is on the stack, however *_M_instance will be on the heap.

But consider if you do the following:

struct ChildObject {
  ... some data ... ;
  vector<stuff> v ;
  ... more data ... ;

struct AncestorObject {
  vector<ChildObject*> children ;
} ;

int earlyOn(AncestorObject& ao, size_t n) {
  ChildObject* co = new ChildObject[n] ;
  for ( size_t i = 0 ; i < n ; ++i ) {
    children.push_back(&co[i]) ;

int atALaterPoint1(ChildObject& co, stuff& bar) {
  co.v.push_back(bar) ;

int atALaterPoint2(ChildObject& co) {
  co.v.pop_back() ;

The children are in one nice, coherent block of memory, so iterating over them later will be cache-friendly. However, the “v”s will not be coherent, instead it will probably be fragmented and not in a nice segmented order.

Further still, the “v”s will continue to fragment over time whenever they resize. You can avoid this issue by ensuring that you construct them with a good default size, but people tend to try and avoid playing with reserve/resize and the behavior of the default constructor is not well observed: GCC and CLang will treat

std::vector foo(5);

as a request to reserve 5 entries, while other compilers including MSVC will see it as a request to resize, meaning you need to do:

    : v(5)
      v.clear() ;


  auto co = new ChildObject ;

is two calls to new (and thus alloc), not one.

It would be really bad if the compiler amortized these into one…

Wait, let me make that clear: I said it is the same like a dynamic array.
But I think I did missunderstand you. Because you were talking about array and vector, I though you want such a container. But what you really want, seems to be a static size buffer? That isn’t an array or vector. At least not in my terminology :)

Hehe, ok: What I want is a fixed-capacity, dynamic-content std::container-compatible storage. This is:

const size_t capacity ;
mutable size_t size ;
iterator begin() ;
iterator end() ;
void push_back(element) throws std::domain_error() ;
void pop_back(element) ;

I’m tempted to add a

move_to_end_and_erase(iterator pos)

For implementing the tail-swap trick for efficiently deleting vector entries.

I rather suggest:

template<typename C>
void tail_swap_erase(C& container, typename C::iterator iter)
  using std::swap;

  swap(*iter, container.back());

Now you can use it with every container you want.

May I ask, why you made size mutable?

I didn’t – I’m just saying that it can be changed :)

Also, the tail swap erase is more efficient when implemented inside the container :)

– Oliver

Why should it be more efficient when implemented inside the container? What is iterator? typedef of a pointer. What does the function back do? Gives back the last element (one line of code!). What does pop_back do? Decrease size and destroys (calls the destructor) the last element (two lines of code!). With full optimization a compiler should inline all those function calls.
And at the end the question arise, how much would you even gain, when the compiler wouldn’t optimize the code, compared to the situation that you will need to write a separate function for every container you use. And if you want to use tail swap erase technic on a std::vector, you end with a free function again and have a non-consistent interface …

I would say a typical case of PMO :)

Actually, gcc does interpret “std::vector foo(5);” as creating a vector with 5 elements (like the standard says), not just with 5 reserved places. At least that’s what my 4.1.2 just did when I tried it. I’d hope that something that simple would be standards compliant!

I’m also wondering why the tail swap erase would be more efficient inside the container. The only difference would seem to be that instead of “C& container”, you have the implicit “C* this”.

Drave: The only other container you could use it with is std::vector. “How much would you gain”.

      // Fast, unordered erasure.
      // Invalidates iterators.
      swap_to_back_and_pop(iterator pos)
        // You can't erase 'end'
        if ( pos == end() ) return ;
        // ASSUMPTION: You have an iterator that isn't end so size must be > 0
        --_M_size ;
        // If you were deleting end - 1, it will now be end and the swap
        // would no-longer be necessary.
        if ( pos != end() )
          *pos = *end() ;

With both ICC and GCC I found the resulting assembler was significantly more efficient. And, uhm, you may or may not have noticed I tend to need every last clock cycle…

Lutorm: Maybe I had that the wrong way around, I just know that when we tried using the foo(5) method in some of our code we kept running into inconsistencies between platforms/builds. Maybe I haven’t re-tested it since pre-4 GCC.

You could use it with any container you create. And from the std container, at least with std::deque it would also be helpful.
You need every last clock cycle … hummm … and you already optimized everywhere else, so you’re left to this container getting the last single clock cycle out of it?

Well, I don’t know the context of this optimisation. But I made a lot of bad experiences, when people tried to optimise small areas of code and also favored code, that was optimised to the last clock cycle, over maintainability. At the end they ofteh had a lot of more problems (not only bugs) and the code wasn’t really faster, because they only otpimised small areas of code and forgot, that a compiler can also optimise over large areas.

I think you need to rewind a little there. If performance wasn’t an issue, then I wouldn’t be looking away from std::vector.

Switching my function from a friend to a member reduced the cost of the particular loop I’m trying to improve by 81 microseconds per second (on a 3Ghz CPU). And switching from std::vector to my in-place vector freed nearly 20 milliseconds per second.

And given that these loops run continuously, every microsecond counts.

– Oliver

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: