QList vs QSet - containment testing with QBENCHMARK

Sources: Qt5 compatible

benchmark/testcontainer.h
benchmark/testcontainer.cpp
benchmark/main.cpp
benchmark/qbenchmark_example.pro

This article compares QSet and QList for the method contains(). It examines the performance of contains() in non-optimized debug mode for different sizes of containers, namely different orders of magnitude of the container sizes (1, 10, 100, 1000, etc.). The performance is examined with the help of the QBENCHMARK macro from QTestLib.

Expected behavior

We expect QList to have a O(n) complexity for containment testing and QSet potential O(1) complexity for containment testing.

Searching through an unsorted sequential container such as QList requires examing every single element and results in a runtime complexity O(n). On the other hand, QSet is a hash-table based container. Unlike sequential containers, sets do not keep elements in the order they were inserted. This has some advantages. For example, containment testing is faster for sets with potential O(1) complexity, as we do not have to perform a linear search.

Sets have this capability thanks to a technique called hashing. A hash function computes an integer value (hash code) from an object. A hash function does not necessarily produce a unique hash code for every object. Normally, it maps a key to a bucket that may contain one or more elements (the number of these buckets can be exposed with QSet::capacity()). The container then iterates over the contents of the bucket until it finds a match. In the best case, with either no collisions and all buckets either empty or with a single element, checking for containment takes constant O(1) time.

Note: The choice of the sequential container (QList, QVector or QLinkedList) will have little effect when compared to QSet for containment testing for the sizes of containers investigated in this article. Nevertheless, we should mention, that QList stores the pointers to the items rather than the items themselves. It is the QVector that stores its items consecutively in memory. You can read more about this in this Qt Quarterly - Inside the Qt 4 Containers article. You are also welcome to see for yourself and change the type of container in the test.

Testing with QBENCHMARK

We are going to measure the performance of our code with QBENCHMARK, which is an extension to the QTestLib framework. To measure the performance of code all we need to do is to place it into curly brackets preceded by the macro QBENCHMARK. The performance can be measured based on walltime (default), CPU tick counter, valgrind/callgrind or an event counter. In this example, we use the default method based on walltime.

QBENCHMARK { <code here> }

Our aim is to identify the size of the container for which containment testing could affect the performance of an application in such a way that it is noticeable to the user. To this end, we create containers with sizes that represent different orders of magnitude such as 1, 10, 100, 1000, 10000, 100000 and 1000000.

Note: To use QTestLib, we have to place the following line into our project file:
QT += testlib

Implementation specifics

In order to perform the test, we create a class called TestContainer. The test itself will be executed by the private slot testContainment(). We perform the test with the use of a data set, which is nothing else than a table of container types (QSet, QList) and the different sizes of containers (1, 10, 100, 1000, 10000, 100000 and 1000000) that we would like to test. The data set is created by the method testContainment_data(). The data method has to have the same name as the test method appended by '_data()'. We also introduce an enum ContainerType to distinguish among different container types.

class TestContainer : public QObject
{
    Q_OBJECT
public:
    enum ContainerType {
        Container_Set,
        Container_List
    };

private slots:
    void testContainment_data();
    void testContainment();
};

Q_DECLARE_METATYPE(TestContainer::ContainerType)

Now let's look at our data method testContainment_data() that constructs the test table. We use the method addColumn() to add a column for the container type and the container size. The for loop creates the individual test rows with QTest::newRow(). The name of the test row, i.e. the argument in the parentheses is created from the name of the container and the size of the container.

void TestContainer::testContainment_data(){
    QTest::addColumn("type");
    QTest::addColumn("size");

    for (int size = 1; size <= 1000000; size *= 10) {

        const QByteArray sizeString = QByteArray::number(size);

        QTest::newRow(QByteArray("QSet" + sizeString).constData())
                << Container_Set << size;
        QTest::newRow(QByteArray("QList" + sizeString).constData())
                << Container_List << size;
    }
}

The data method pretty much creates the table below:

	type	size
QSet1	Container_Set	1
QList1	Container_List	1
QSet10	Container_Set	10
QList10	Container_List	10
QSet100	Container_Set	100
QList100	Container_List	100
...	...	...
QSet1000000	Container_Set	1000000
QList1000000	Container_List	1000000

The test method itself, i.e. testContainment() can be seen below. The QFETCH macro creates a local variable 'ContainerType type' and 'int size'. These local variables get populated with the data from the test table located within the testContainment_data(). Once we have the data, we call the helper function testContains().

void TestContainer::testContainment() {
    QFETCH(ContainerType, type);
    QFETCH(int, size);

    if (type == Container_Set)
        testContains<QSet<int> >(size);
    else
        testContains<QList<int> >(size);
}

The role of testContains() is to populate the container (QList or QSet) with data. Since we want the function to work with any container, we made testContains() a template function. The template function is both declared and defined in the header file.

Once the container is populated with data, we perform the actual test with the macro QBENCHMARK which tests containment of a single value. The integer value that we check is an arbitrary negative value. Since we populated the container with non-negative values only, we can be sure that the negative value won't be present in the container. This represents the worst-case scenario for QList when the entire QList will have to be searched. The assertion that comes after the QBENCHMARK closing braces is there to make sure that 'val' is actually used and not optimized away by the compiler.

const int negValue = -1;

template <typename T>
void testContains(int size)
{
    T container;

    for (int i = 0; i < size; ++i)
        container << i;

    bool val;

    QBENCHMARK {
            val = container.contains(negValue);
    }

    Q_ASSERT(val == false);
}

We run the test from main.cpp with QTest::qExec():

int main(int argc, char *argv[])
{
    TestContainer cTest;
    return QTest::qExec(&cTest, argc, argv);
}

The test produces the following result:

********* Start testing of TestContainer ********* Config: Using QTest library 4.8.0, Qt 4.8.0 PASS : TestContainer::initTestCase() RESULT : TestContainer::testContainment():"QSet1": 0.000038 msecs per iteration (total: 81, iterations: 2097152) RESULT : TestContainer::testContainment():"QList1": 0.000040 msecs per iteration (total: 84, iterations: 2097152) RESULT : TestContainer::testContainment():"QSet10": 0.000038 msecs per iteration (total: 81, iterations: 2097152) RESULT : TestContainer::testContainment():"QList10": 0.000086 msecs per iteration (total: 91, iterations: 1048576) RESULT : TestContainer::testContainment():"QSet100": 0.000049 msecs per iteration (total: 52, iterations: 1048576) RESULT : TestContainer::testContainment():"QList100": 0.00055 msecs per iteration (total: 73, iterations: 131072) RESULT : TestContainer::testContainment():"QSet1000": 0.000049 msecs per iteration (total: 52, iterations: 1048576) RESULT : TestContainer::testContainment():"QList1000": 0.0051 msecs per iteration (total: 84, iterations: 16384) RESULT : TestContainer::testContainment():"QSet10000": 0.000049 msecs per iteration (total: 52, iterations: 1048576) RESULT : TestContainer::testContainment():"QList10000": 0.050 msecs per iteration (total: 52, iterations: 1024) RESULT : TestContainer::testContainment():"QSet100000": 0.000049 msecs per iteration (total: 52, iterations: 1048576) RESULT : TestContainer::testContainment():"QList100000": 0.50 msecs per iteration (total: 65, iterations: 128) RESULT : TestContainer::testContainment():"QSet1000000": 0.000038 msecs per iteration (total: 81, iterations: 2097152) RESULT : TestContainer::testContainment():"QList1000000": 5.1 msecs per iteration (total: 83, iterations: 16) PASS : TestContainer::testContainment() PASS : TestContainer::cleanupTestCase() Totals: 3 passed, 0 failed, 0 skipped ********* Finished testing of TestContainer *********

If we summarize the results in a table, we can see that the QSet has O(1) complexity for containment testing, in other words, the performance of the contains() method is independent of the number of entries in the QSet. On the other hand, QList has O(n) complexity for containment testing, thus the time is linearly growing with the number of elements in the list.

Container size	QSet - walltime [ms]	QList - walltime [ms]
1	0.000038	0.000040
10	0.000038	0.000086
100	0.000049	0.00055
1000	0.000049	0.0051
10000	0.000049	0.050
100000	0.000049	0.50
1000000	0.000038	5.1

Conclusion: To put the results into perspective - in non-optimized Debug mode, checking whether a single integer value is present in a QList container of 1,000,000 entries might take up to around 5 ms, which means that checking 200 integer values in the list of the same size might take up to 1 second! The same check for QSet would take 200*0.000038 ms = 0.0076 ms, i.e. 0.0000076s.

Tagged: Qt