Threading Model

Overview

TileDB allocates two thread pools per context: a compute thread pool and an IO thread pool. As the names suggest, the compute thread pool executes compute-bound tasks while the IO thread pool executes IO-bound tasks. By default, each thread pool has a concurrency level equal to the number of hardware threads. For example, on a system with 12 hardware threads, both thread pools will have a concurrency level of 12. The motivation for this configuration is to ensure that each hardware thread has one TileDB software thread performing CPU-heavy work. The TileDB software threads for IO-bound tasks are kept in their own thread pool so that the programmer does not need to worry about overloading a single shared thread pool with IO-bound tasks that sit idle-waiting and block pending CPU-bound tasks from performing useful work.

The default concurrency level for each thread pool can be modified with the following configuration parameters:
"sm.compute_concurrency_level"
"sm.io_concurrency_level"

We use the term "concurrency level" so that we have the flexibility to adjust the number of allocated software threads in our thread pool implementation independent of the user configuration. At the time of writing, a concurrency level of N allocates N-1 software threads.
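For example, the following sketch sets both levels explicitly when creating a context through the public C++ API (tiledb::Config and tiledb::Context); the values 8 and 4 are arbitrary choices for illustration:

#include <tiledb/tiledb>

void configure_context() {
  // Sketch: create a context whose compute and IO thread pools use
  // explicit concurrency levels instead of the hardware-thread default.
  tiledb::Config config;
  config.set("sm.compute_concurrency_level", "8");
  config.set("sm.io_concurrency_level", "4");

  tiledb::Context ctx(config);
}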

Thread Pool Usage

Synchronous Usage

The following example illustrates typical synchronous usage. The code snippet does the following:

  • The calling thread creates a thread pool with a concurrency level of 2.
  • The calling thread executes two tasks on the thread pool.
  • The calling thread synchronously waits for both tasks to complete.
void foo() {
  ThreadPool tp;
  tp.init(2);

  std::vector<ThreadPool::Task> tasks;

  std::atomic<int> my_int(0);

  tasks.emplace_back(tp.execute([&my_int]() {
    my_int += 1;
  }));

  tasks.emplace_back(tp.execute([&my_int]() {
    my_int += 2;
  }));

  tp.wait_all(tasks);

  assert(my_int == 3);

  std::cout << "DONE" << std::endl;
}

The above snippet is valid for any concurrency level >= 1. Initializing the thread pool with a concurrency level of 2 allows up to two tasks to execute concurrently.

Asynchronous Usage

The following example illustrates typical asynchronous usage. The code snippet does the following:

  • The calling thread creates a thread pool with a concurrency level of 2.
  • The calling thread executes two tasks on the thread pool, where each task asynchronously invokes a callback bar.
  • The calling thread does not wait for the tasks to complete. Instead, the completed state is reached when the last callback is executed.
std::atomic<int> g_my_int(0);

void bar(int my_int) {
  int prev = g_my_int.fetch_add(my_int);
  if (prev + my_int == 3)
    std::cout << "DONE" << std::endl;
}

void foo() {
  ThreadPool tp;
  tp.init(2);

  tp.execute([]() {
    bar(1);
  });

  tp.execute([]() {
    bar(2);
  });

  // Assume `tp` remains in-scope until both tasks have completed.
}

Thread Pool Architecture

The ThreadPool class was designed with the following goals:

  • Prefer throughput to latency.
  • Optimize for synchronous queue-and-wait usage.
  • Allow for recursive usage of a single ThreadPool instance within its own tasks (see the sketch after this list).
  • Allow for arbitrary, recursive nesting among different ThreadPool instances.
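
The following is a minimal sketch of the recursive-usage goal. It uses only the init/execute/wait_all interface from the examples above and assumes, as that goal implies, that wait_all may safely be called from inside a task running on the same thread pool:

void baz() {
  ThreadPool tp;
  tp.init(2);

  std::vector<ThreadPool::Task> outer_tasks;

  outer_tasks.emplace_back(tp.execute([&tp]() {
    // This task schedules more work on the same thread pool and waits
    // on it, relying on the recursive-usage guarantee described above.
    std::vector<ThreadPool::Task> inner_tasks;

    inner_tasks.emplace_back(tp.execute([]() {
      std::cout << "inner task" << std::endl;
    }));

    tp.wait_all(inner_tasks);
  }));

  tp.wait_all(outer_tasks);
}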

TODO

Parallel Routines

TODO

