Index64
Index64 is a concurrent key-value store for Intel architectures. It indexes key-value pairs on single and multi core environments using lock free algorithms and allowing simultaneous reads-writes. By 2014, Index64 is the only concurrent key-value store available for download. The underlying technology is based on specific Intel 64 CPU instructions.
History
In 2011, a first version was developed in order to respond to specific needs of a data-intensive program running localization features on a large scale. This localization software developed under iOS was finally never launched.
In 2012, the version 2.0 was released and distributed for free under the name "Data-Parallel". This version was enhanced with on-the-fly data-partitioning capabilities. Although well suited for massive batch operations, this technology was not versatile enough to meet most of the indexing needs.
In 2013, a new multithread approach has been implemented. The data structure of the index was altered to accept concurrent reads-writes. The key-value store has been renamed "Index64 version 3.0".
Features
Some features of the 3.0 Index64 key-value store are:
- Concurrent accesses. An application can update a directory while reading this directory to serve requests.
- Scalability. Any index size is possible.
- Any-core dynamic resizing. The number of cores used to perform bulk operations can be changed from one call to another; this is a way to adapt programs to varying loads.
- Memory management. Since allocation and deallocation are managed from within the key-value store, the application doesn't need to perform regular memory cleanup that could alter performance.
- Scan methods. Index64 allows forward and backward scanning among the sorted data.
- Non unique keys and composite keys. Index64 handles keys with multiple values and keys made up of several concatenated data.
Examples
Here is a trivial example demonstrating basic string indexing:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <index64mt.h>
class I64MtString index;
int main(int, char**, char**) {
// Set maximum strings length in char including trailing 0
index.setBytes(6);
// Index some null-terminated strings
const char* data[3] = { "zero", "one", "two", };
index.insert(data[0], &data[0]);
index.insert(data[1], &data[1]);
index.insert(data[2], &data[2]);
// Retrieve a value
tValue value;
if (index.select("one", value) == eReturnCodeOk)
printf("Key one has value %p", value);
else
printf("Key one is not indexed.");
}In the following example the integrated thread pool is used to make concurrent writes:
#include <index64mt.h>
class I64MtSigned32 index;
struct sData { int K; int Col1; int Col2; } Table[1000000];
// Build the index assuming that the table is loaded
void BuildIndex() {
// Start the integrated thread pool
index.bulkAllocateThreads(6);
// Engage the 6 threads in indexing the table
index.bulkInsert(0, 6, 1000000, &Table[0].K, sizeof(struct sData));
}AbOUT concurrent indexing
Increasing the number of working threads increases the number of operations executed at the same time, however some factors may affect the speed resulting from this parallelization. Let's tackle two of them, which cap the overall performance of concurrent indexing: the memory bandwidth and the tasks' interdependency.
Memory bandwidth
With a memory bandwidth of 10 GB/s for writings, the performance peak of index64 is reached with 8 threads. With more threads the performance stabilizes since the memory bandwidth is already fully used.
Because memory bandwidth has a huge impact on parallel computing, Intel's roadmap includes dramatic memory bandwidth improvements over each generation of processors. For example, the Intel Xeon Phi processor Knights Landing owns a 500 GB/s bandwidth integrated memory. That is a 3 times factor improvement over the previous generation of Xeon Phi processor Knights Corner. As a result, the performance of concurrent indexing should improve over the time.
Tasks interdependency
Parts of program that access to shared data have to be done sequentially to ensure the integrity of the shared data. These sequential parts define the critical path of performance that corresponds to the optimum execution time that could result from parallelization. This performance cap is a direct application of the Amdahl's law.
See also
- Concurrency
- Non-blocking algorithm
- NoSQL
- Compound key
- Memory bandwidth
- Xeon Phi
- Amdahl's law