FakeVector

Martini’s benchmarking seemed to be pointing a big hairy performance finger at some std::vector operations, leaving us in need of a trivial array-based reference implementation of the vector to play with. Martini was all set to rewrite the surrounding code, but I persuaded him to give me 3 minutes to rustle up a FakeVector template he could sub in for the STL version… Et voilà…

// Copyright (C) Oliver ‘kfsone’ Smith 2009
// Permission to use or redistribute this code in part or in full
// as-is or modified so long as the original author retains credit. 

template <typename Type>
class FakeVector
{
public:
    FakeVector() : m_size(0) {}
    ~FakeVector() {}

    unsigned int capacity() const
        { return 32768 ; }
    unsigned int size() const
        { return m_size ; }

    void reserve(unsigned int newsize) const {}   // storage is fixed; nothing to do
    void resize(unsigned int newsize)
        { m_size = newsize ; }
    const Type& operator [](unsigned int position) const
        { return m_data[position] ; }
    Type& operator [](unsigned int position)
        { return m_data[position] ; }
    void clear() { m_size = 0 ; }

private:
    Type   m_data[32768] ;
    unsigned int m_size ;
} ;

It’s evil in many ways, but it provided a handy baseline for Martini’s test which seemed to be pointing the finger at a vector.resize() operation.


if ( vector.size() + newEntries >= vector.capacity() )
  vector.reserve(vector.capacity() + 4096) ;
int oldSize = vector.size() ;
vector.resize(oldSize + newEntries) ;
memcpy(&vector[oldSize], &newData, totalDataSize) ;
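The pattern above can be sketched as a small helper; appendFloats, the 4096-element growth step, and the float element type are stand-ins here for whatever the real code uses:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch of the append pattern above: grow capacity in 4096-element
// chunks, bump the logical size, then memcpy the new POD entries into
// the freshly exposed slots. (resize() will still reallocate correctly
// if newEntries ever exceeds the chunk size; it would just be slower.)
void appendFloats(std::vector<float>& stream,
                  const float* newData, std::size_t newEntries)
{
    if (stream.size() + newEntries >= stream.capacity())
        stream.reserve(stream.capacity() + 4096);
    std::size_t oldSize = stream.size();
    stream.resize(oldSize + newEntries);
    std::memcpy(&stream[oldSize], newData, newEntries * sizeof(float));
}
```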

In this usage, size is merely tracking how much of the space allocated by reserve() we have used. We tried to figure out what the MS-STL implementation of size was doing – if it was actually growing the array or if it was just incrementing the size property … but we got lost about 30 leaps in.
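The reserve/resize distinction being probed here can be demonstrated directly; the function name and the specific sizes are illustrative only:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of the distinction above: reserve() only grows the raw
// allocation (capacity), while resize() changes the logical size.
// Returns {size after reserve, size after resize}.
std::pair<std::size_t, std::size_t> reserveThenResize()
{
    std::vector<float> v;
    v.reserve(1024);                          // capacity >= 1024, size unchanged
    std::size_t sizeAfterReserve = v.size();  // still 0
    v.resize(100);                            // size becomes 100; space was
                                              // already allocated, so no realloc
    std::size_t sizeAfterResize = v.size();
    return { sizeAfterReserve, sizeAfterResize };
}
```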

And when we benchmarked against my FakeVector, the difference was barely discernible. Which would seem to suggest something else is happening … perhaps, an instrumentation bug…

10 Comments

Don’t get me wrong – this isn’t supposed to be a std::vector alternative, it is a temporary benchmarking substitute. It doesn’t do bounds checking, and the actual storage size never changes. It could probably use begin and end iterators but we didn’t need them for this case.

I wouldn’t suspect vector is the real culprit. Generally the STL is pretty decent. One thing you can check is to set the capacity to some big number on initialization and see if it makes any performance difference.

Did you already try using std::vector::insert(pos, first, last) instead of std::vector::resize & memcpy?
I’ve heard of a lot of problems with reserve and resize one after the other.

Or could it be that newEntries was sometimes bigger than 4096, causing the resize to allocate even more memory than reserve just had done?

Just some ideas that you probably have already checked :)

Drave

Kfs1, if you are trying to incorporate NetCheck into the new Playgate replacement as Twitter says…

Don’t forget to include your teultest functionality.

I don’t know if you are familiar with Windows Scripting Host, but you can use WMI to check a lot of things about the network using pre-existing classes and attributes.

For example, take a look on MSDN at the classes:

Win32_PerfFormattedData_Tcpip_NetworkInterface

and

Win32_PerfFormattedData_Tcpip_TCP

You can get a lot of valuable info, and it doesn’t matter if the client is WinXP, Vista or even Windows 7… I use them on a box with Windows 7 and they work to retrieve network performance info.

Drave: yes, but as expected it’s less efficient. What we’re doing is taking either 54 or 60 floats (18 or 20 vectors) that describe an object and putting them into a stream (the vector) for dispatch to the GPU. And there are often thousands per stream, and this is done every frame.

What you’re describing would require

for ( object in objects )
{
    if ( object.candidate() )
    {
        for ( v in vectors )
        {
            object.stream().push_back(v.x()) ;
            object.stream().push_back(v.y()) ;
            object.stream().push_back(v.z()) ;
        }
    }
}

which is potentially 90,000 calls to push_back. Where push_back is implemented somewhat like:

template <typename T>
void vector<T>::push_back(const T& value)
{
    this->resize( size() + 1, value ) ;
}

So we reduce the number of calls by either 53 or 59 per object…
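The two approaches can be compared side by side; Vec3, the two-element array, and the function name are hypothetical stand-ins for the real vertex types, and the bulk copy assumes the vertex struct is a tightly packed POD as in the post:

```cpp
#include <cstring>
#include <vector>

struct Vec3 { float x, y, z; };   // hypothetical tightly packed POD vertex

// Compares the two approaches discussed above: three push_back calls
// per vertex versus one bulk memcpy of the contiguous float block.
// Assumes Vec3 has no padding, so two Vec3s are exactly six floats.
bool perComponentMatchesBulk()
{
    Vec3 verts[2] = { {1, 2, 3}, {4, 5, 6} };

    std::vector<float> a;                 // six push_back calls
    for (const Vec3& v : verts) {
        a.push_back(v.x);
        a.push_back(v.y);
        a.push_back(v.z);
    }

    std::vector<float> b(6);              // one bulk copy
    std::memcpy(b.data(), verts, 6 * sizeof(float));

    return a == b;
}
```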

Ah! This is why you use memcpy, so the source is a POD array of Vectors with three float elements and nothing else?

But to be honest, push_back is often implemented in a better way than this. At least in the Dinkumware library it is ;)
And the compiler also inlines it, so there are no more calls. But I still think that the memcpy will perform better on those PODs than a push_back would, and a range insert is impossible because of the different types.

Do you want a BOOST solution? j/k :D

Don’t you want to rename that class at least? Name it FixedBuffer or something like this :)

Drave

my head hurts after reading that, thanks!!!

Drave, it’s not a replacement for std::vector; it isn’t intended to be a cool kick-ass new class. It’s purely a quick baseline benchmark that saves you having to swap out your vector code for static array code while optimizing/debugging. The STL code can be hard to read, so optimizing it can be fiddly.

As for better classes – as I said, using the FakeVector showed virtually no improvements. As codist said, that eliminates the vector and should put the optimizing focus on the candidate selection/iteration/generation processes. My hunch is that something has broken the candidate code and too many objects are getting queued – especially since Martini had to crank up the size of the static array to 32768*3 floats…

I don’t understand the second code listing, the one with memcpy. Is that your code or part of the vector implementation? It looks like what the std::vector should do when it runs out of capacity, except that the increase in capacity should be exponential to preserve the amortized constant insertion time. If you only increase capacity linearly, you get linear insertion time, I thought?
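The amortization point here is easy to observe empirically; this sketch (function name and counts are illustrative) counts reallocations during repeated push_back, which geometric growth keeps logarithmic in the element count, whereas a fixed +4096 step would make it linear:

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back reallocates while inserting n elements.
// With the standard geometric capacity growth this is O(log n); a fixed
// +4096 step would instead give O(n / 4096) reallocations and linear
// amortized insertion cost.
int countReallocations(int n)
{
    std::vector<int> v;
    int reallocations = 0;
    std::size_t lastCap = v.capacity();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCap) {
            ++reallocations;
            lastCap = v.capacity();
        }
    }
    return reallocations;
}
```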

Yeah, the FakeVector is for benchmarking against a particular STL implementation tho — the Microsoft one /seems/ to be doing a bunch more but that’s just debug stuff that gets optimized out. Having a benchmark class made it trivial to diagnose and determine that the profiling code was picking up the wrong thing /or/ we were getting resource contention at the time of the vector operations.

Like I said – FakeVector is not a shit-hot new vector implementation – its purpose in life is to be trivial, not robust or useful, while saving you from having to replace any code. Once you’ve used it to figure out where your code is bottlenecking, you simply swap out
FakeVector
with
std::vector
