FakeVector

Martini’s benchmarking seemed to be pointing a big hairy performance finger at some std::vector operations, leaving us in need of a trivial array-based reference implementation of the vector to play with. Martini was all set to rewrite the surrounding code, but I persuaded him to give me 3 minutes to rustle up a FakeVector template he could sub in for the STL version… Et voilà…

// Copyright (C) Oliver ‘kfsone’ Smith 2009
// Permission to use or redistribute this code in part or in full
// as-is or modified so long as the original author retains credit. 

template <typename Type>
class FakeVector
{
public:
    FakeVector() : m_size(0) {}
    ~FakeVector() {}

    unsigned int capacity() const
        { return 32768 ; }
    unsigned int size() const
        { return m_size ; }

    void reserve(unsigned int newsize) const {}   // storage is fixed; nothing to do
    void resize(unsigned int newsize)
        { m_size = newsize ; }
    const Type& operator [](unsigned int position) const
        { return m_data[position] ; }
    Type& operator [](unsigned int position)
        { return m_data[position] ; }
    void clear() { m_size = 0 ; }

private:
    Type   m_data[32768] ;
    unsigned int m_size ;
} ;

It’s evil in many ways, but it provided a handy baseline for Martini’s test which seemed to be pointing the finger at a vector.resize() operation.


if ( vector.size() + newEntries >= vector.capacity() )
  vector.reserve(vector.capacity() + 4096) ;
int oldSize = vector.size() ;
vector.resize(oldSize + newEntries) ;
memcpy(&vector[oldSize], &newData, totalDataSize) ;
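The pattern above can be sketched as a small helper; appendFloats, the 4096-element growth step, and the float element type are stand-ins here for whatever the real code uses:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch of the append pattern above: grow capacity in 4096-element
// chunks, bump the logical size, then memcpy the new POD entries into
// the freshly exposed slots. (resize() will still reallocate correctly
// if newEntries ever exceeds the chunk size; it would just be slower.)
void appendFloats(std::vector<float>& stream,
                  const float* newData, std::size_t newEntries)
{
    if (stream.size() + newEntries >= stream.capacity())
        stream.reserve(stream.capacity() + 4096);
    std::size_t oldSize = stream.size();
    stream.resize(oldSize + newEntries);
    std::memcpy(&stream[oldSize], newData, newEntries * sizeof(float));
}
```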

In this usage, size is merely tracking how much of the space allocated by reserve() we have used. We tried to figure out what the MS-STL implementation of size was doing – if it was actually growing the array or if it was just incrementing the size property … but we got lost about 30 leaps in.
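The reserve/resize distinction being probed here can be demonstrated directly; the function name and the specific sizes are illustrative only:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of the distinction above: reserve() only grows the raw
// allocation (capacity), while resize() changes the logical size.
// Returns {size after reserve, size after resize}.
std::pair<std::size_t, std::size_t> reserveThenResize()
{
    std::vector<float> v;
    v.reserve(1024);                          // capacity >= 1024, size unchanged
    std::size_t sizeAfterReserve = v.size();  // still 0
    v.resize(100);                            // size becomes 100; space was
                                              // already allocated, so no realloc
    std::size_t sizeAfterResize = v.size();
    return { sizeAfterReserve, sizeAfterResize };
}
```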

And when we benchmarked against my FakeVector, the difference was barely discernible. Which would seem to suggest something else is happening … perhaps, an instrumentation bug…

10 Comments

Don’t get me wrong – this isn’t supposed to be a std::vector alternative, it is a temporary benchmarking substitute. It doesn’t do bounds checking, and the actual storage size never changes. It could probably use begin and end iterators but we didn’t need them for this case.

I wouldn’t suspect vector is the real culprit. Generally the STL is pretty decent. One thing you can check is to set the capacity to some big number on initialization and see if it makes any performance difference.

Did you already try using std::vector::insert(pos, first, last) instead of std::vector::resize & memcpy?
I’ve heard of a lot of problems with reserve and resize one after the other.

Or could it be that newEntries was sometimes bigger than 4096, causing the resize to allocate even more memory than reserve just had done?

Just some ideas that you probably have already checked :)

Drave

Kfs1, if you are trying to incorporate NetCheck into the new Playgate replacement as Twitter says…

Don’t forget to include your teultest functionality.

I don’t know if you are familiar with Windows Scripting Host, but you can use WMI to check a lot of things about the network using pre-existing classes and attributes.

For example, take a look on MSDN at the classes:

Win32_PerfFormattedData_Tcpip_NetworkInterface

and

Win32_PerfFormattedData_Tcpip_TCP

You can get a lot of valuable info, and it doesn’t matter if the client is WinXP, Vista or even Windows 7… I use them on a box with Windows 7 and they work to retrieve network performance info.

Drave: yes, but as expected it’s less efficient. What we’re doing is taking either 54 or 60 floats (18 or 20 vectors) that describe an object and putting them into a stream (the vector) for dispatch to the GPU. And there are often thousands per stream, and this is done every frame.

What you’re describing would require

for ( object in objects )
{
    if ( object.candidate() )
    {
        for ( v in vectors )
        {
            object.stream().push_back(v.x()) ;
            object.stream().push_back(v.y()) ;
            object.stream().push_back(v.z()) ;
        }
    }
}

which is potentially 90,000 calls to push_back. Where push_back is implemented somewhat like:

template <typename T>
void vector<T>::push_back(const T& value)
{
    this->resize( size() + 1, value ) ;
}

So we reduce the number of calls by either 53 or 59 per object…
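The two approaches can be compared side by side; Vec3, the two-element array, and the function name are hypothetical stand-ins for the real vertex types, and the bulk copy assumes the vertex struct is a tightly packed POD as in the post:

```cpp
#include <cstring>
#include <vector>

struct Vec3 { float x, y, z; };   // hypothetical tightly packed POD vertex

// Compares the two approaches discussed above: three push_back calls
// per vertex versus one bulk memcpy of the contiguous float block.
// Assumes Vec3 has no padding, so two Vec3s are exactly six floats.
bool perComponentMatchesBulk()
{
    Vec3 verts[2] = { {1, 2, 3}, {4, 5, 6} };

    std::vector<float> a;                 // six push_back calls
    for (const Vec3& v : verts) {
        a.push_back(v.x);
        a.push_back(v.y);
        a.push_back(v.z);
    }

    std::vector<float> b(6);              // one bulk copy
    std::memcpy(b.data(), verts, 6 * sizeof(float));

    return a == b;
}
```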

Ah! This is why you use memcpy, so the source is a POD array of Vectors with three float elements and nothing else?

But to be honest, push_back is often implemented in a better way than this. At least in the Dinkumware library it is ;)
And the compiler also inlines it, so there are no more calls. But I still think that the memcpy will perform better on those PODs than a push_back would, and a range insert is impossible because of the different types.

Do you want a BOOST solution? j/k :D

Don’t you want to rename that class at least? Name it FixedBuffer or something like this :)

Drave

my head hurts after reading that, thanks!!!

Drave, it’s not a replacement for std::vector; it isn’t intended to be a cool kick-ass new class. It’s purely a quick baseline benchmark that saves you having to swap out your vector code for static array code while optimizing/debugging. The STL code can be hard to read, so optimizing it can be fiddly.

As for better classes – as I said, using the FakeVector showed virtually no improvements. As codist said, that eliminates the vector and should put the optimizing focus on the candidate selection/iteration/generation processes. My hunch is that something has broken the candidate code and too many objects are getting queued – especially since Martini had to crank up the size of the static array to 32768*3 floats…

I don’t understand the second code listing, the one with memcpy. Is that your code or part of the vector implementation? It looks like what the std::vector should do when it runs out of capacity, except that the increase in capacity should be exponential to preserve the amortized constant insertion time. If you only increase capacity linearly, you get linear insertion time, I thought?
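The amortization point here is easy to observe empirically; this sketch (function name and counts are illustrative) counts reallocations during repeated push_back, which geometric growth keeps logarithmic in the element count, whereas a fixed +4096 step would make it linear:

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back reallocates while inserting n elements.
// With the standard geometric capacity growth this is O(log n); a fixed
// +4096 step would instead give O(n / 4096) reallocations and linear
// amortized insertion cost.
int countReallocations(int n)
{
    std::vector<int> v;
    int reallocations = 0;
    std::size_t lastCap = v.capacity();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCap) {
            ++reallocations;
            lastCap = v.capacity();
        }
    }
    return reallocations;
}
```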

Yeah, the FakeVector is for benchmarking against a particular STL implementation tho — the Microsoft one /seems/ to be doing a bunch more but that’s just debug stuff that gets optimized out. Having a benchmark class made it trivial to diagnose and determine that the profiling code was picking up the wrong thing /or/ we were getting resource contention at the time of the vector operations.

Like I said – FakeVector is not a shit-hot new vector implementation – its purpose in life is to be trivial, not robust or useful, while saving you from having to replace any code. Once you’ve used it to figure out where your code is bottlenecking, you simply swap out
FakeVector
with
std::vector
