Multithreading in C++ for Quantitative Finance

by Clement D.

Multithreading in C++ is one of those topics that every developer eventually runs into, whether they’re working in finance, gaming, or scientific computing. The language gives you raw primitives, but it also integrates with a whole ecosystem of libraries that scale from a few threads on your laptop to thousands of cores in a data center.

Choosing the right tool matters: what are the right libraries for your quantitative finance use case?


1. Standard C++ Threads (Low-Level Control)

Since C++11, <thread>, <mutex>, and <future> have been part of the standard library. You manage threads directly, which keeps the code portable and dependency-free, at the cost of handling synchronization and lifetimes yourself.

Example: Parallel computation of moving averages in a trading engine

#include <iostream>
#include <thread>
#include <vector>

// Prints a 3-point moving average for indices [start, end).
// The vector is shared but read-only, so no locking is needed.
void moving_average(const std::vector<double>& data, int start, int end) {
    for (int i = start; i < end; i++) {
        if (i >= 2) {
            double avg = (data[i] + data[i-1] + data[i-2]) / 3.0;
            std::cout << "Index " << i << " avg = " << avg << "\n";
        }
    }
}

int main() {
    std::vector<double> prices = {100,101,102,103,104,105,106,107};
    int n = static_cast<int>(prices.size()); // avoid size_t-to-int narrowing

    // std::cref passes the vector by const reference instead of copying it.
    // Note: output lines from the two threads may interleave.
    std::thread t1(moving_average, std::cref(prices), 0, n / 2);
    std::thread t2(moving_average, std::cref(prices), n / 2, n);

    t1.join();
    t2.join();
}


2. Intel oneTBB (Task-Based Parallelism)

oneTBB (oneAPI Threading Building Blocks) provides parallel loops, pipelines, and task graphs, and is well suited to HPC workloads such as financial risk simulations.

Example: Monte Carlo option pricing

#include <tbb/parallel_for.h>
#include <cmath>
#include <random>
#include <vector>

int main() {
    const int N = 1'000'000;
    std::vector<double> results(N);

    tbb::parallel_for(0, N, [&](int i) {
        // A single shared generator would be a data race under parallel_for;
        // thread_local gives each worker thread its own RNG state.
        thread_local std::mt19937 gen(std::random_device{}());
        thread_local std::normal_distribution<> dist(0, 1);
        double z = dist(gen);
        results[i] = std::exp(-0.5 * z * z); // toy payoff
    });
}

3. OpenMP (Loop Parallelism for HPC)

OpenMP is widely used in scientific computing. You add pragmas, and the compiler generates parallel code.

#include <vector>
#include <omp.h>

int main() {
    const int N = 500;
    std::vector<std::vector<double>> A(N, std::vector<double>(N, 1));
    std::vector<std::vector<double>> B(N, std::vector<double>(N, 2));
    std::vector<std::vector<double>> C(N, std::vector<double>(N, 0));

    // Parallelize the outer loop: each thread computes a block of rows of C
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

4. Boost.Asio (Async Networking and Thread Pools)

Boost.Asio is ideal for low-latency servers, networking, and I/O-heavy workloads (e.g. trading gateways).

#include <boost/asio.hpp>
#include <functional>
#include <memory>
#include <string>
using boost::asio::ip::tcp;

int main() {
    boost::asio::io_context io;
    tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 12345));

    std::function<void()> do_accept = [&]() {
        auto socket = std::make_shared<tcp::socket>(io);
        acceptor.async_accept(*socket, [&, socket](boost::system::error_code ec) {
            if (!ec) {
                // The buffer must outlive the async read, so keep it on the
                // heap and capture it in the completion handler
                auto buf = std::make_shared<std::string>();
                boost::asio::async_read_until(*socket,
                    boost::asio::dynamic_buffer(*buf), '\n',
                    [socket, buf](boost::system::error_code, std::size_t) {
                        boost::asio::write(*socket, boost::asio::buffer("pong\n", 5));
                    });
            }
            do_accept(); // queue the next accept
        });
    };

    do_accept();
    io.run();
}


5. Parallel STL (<execution>)

C++17 added execution policies for standard algorithms, making parallelism a one-argument change. Note that on GCC/libstdc++ the parallel policies are backed by TBB, so you link with -ltbb.

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<int> trades = {5,1,9,3,2,8};
    // std::execution::par lets the library partition the sort across threads
    std::sort(std::execution::par, trades.begin(), trades.end());
}



6. Conclusion

Multithreading in C++ offers many models, each fit for a different workload:

- std::thread for low-level control of system tasks.
- oneTBB or OpenMP for data-parallel HPC simulations.
- Boost.Asio for async networking and trading engines.
- CUDA/SYCL for GPU acceleration in Monte Carlo or ML.
- Parallel STL (<execution>) for easy speed-ups in modern code.
- Actor frameworks (CAF/HPX) for distributed, message-driven systems.

Compiler flags and build settings also make a big difference in multithreaded performance:

- Build with -O3 -march=native (or /O2 on MSVC).
- Use -fopenmp, or link TBB's scalable allocator, when relevant.
- Prevent false sharing with alignas(64) and prefer thread_local scratchpads.
- Mark non-aliasing pointers with __restrict__ to help vectorization.
- Consider specialized allocators (jemalloc, TBB) in multi-threaded apps.
- Build test runs with -fsanitize=thread to catch race conditions early.

The key: match the concurrency model + compiler setup to your workload for maximum speed.
