Mercury Concurrency Engine

Licensed under the Apache 2.0 License: You may not use this file except in compliance with the License. Software distributed under the License is distributed on an "AS IS" BASIS and WITHOUT WARRANTY, either expressed or implied.
- High Level Cpp Coroutine Concurrency Summary
- Summary of Concurrent Operations
- Summary of Data Communication Operations
Doxygen documentation can be generated locally for this library. Install the `doxygen` and `graphviz` packages, then run `doxygen`. The generated `doc/` directory should contain both `html` and `rtf` documentation. For `html`, open `index.html` in a browser to view the documentation.
C++ concurrency is too complicated. Many modern languages implement similar functionality through simpler and/or more expressive mechanisms, often executed at a comparable, or better, speed.
1) Data Communication: It is too hard to share data between different places in code, threaded or otherwise. This library implements channel mechanisms for communicating data in a simple, threadsafe, and templated manner.
2) Concurrency: It is too hard to set up an efficient and easy-to-use concurrency model where `std::async` is insufficient. Similarly, coroutines are not being leveraged in most C++ applications as a generic way to improve code efficiency. This project implements coroutine mechanisms for concurrency that utilize both coroutines and a generic worker thread backend to efficiently execute tasks in parallel.
This library attempts to address the above concerns by providing a concurrency model that:
Definition from Wikipedia: "Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed."
Coroutines are very useful tools in concurrency because they are both lightweight and extremely fast when context switching. In fact, unit testing in this project reveals edge cases where coroutines are more than ~10x faster than threads (and in some extreme edge cases, MUCH faster than that). These properties allow for efficient concurrent algorithms which are otherwise very difficult to write cleanly.
This library makes heavy use of coroutines (provided by Boost.Context and Boost.Coroutine2) to implement its features.
There are a few reasons to use this library instead of other projects like Boost.Fiber, libdill, or C++20 coroutines. First of all, each of those frameworks has some very inconvenient drawbacks:
Boost.Fiber is more of a "base library", intended to be used by other libraries, and it requires a significant investment from the user to integrate and use. Because it has no built-in solution for blocking code and no convenient mechanism for communicating between standard system threads and coroutines, it demands a high degree of knowledge before implementation can even begin, placing a heavy burden on general developers. It even requires the user to consider writing their own scheduling algorithms.
libdill, a C library, does not implement its framework with multithreading in mind, presupposing that programs will be singlethreaded with coroutines providing concurrency. Its documentation does suggest that multithreading is possible but does not provide much assistance in making it work. No doubt this is ideal for many programs which desire to write their code in C. However, the lack of default multithreaded executors limits the usability and potential scope of the library for many applications (especially ones that don't want to write and debug their own!). Furthermore, because libdill is a C library, it eschews much of the convenience and simplicity possible in a well designed C++ library due to language limitations.
Finally, C++20 coroutines are... an intensely complicated topic. They are extremely fast and lightweight, but just as extremely low level. They are highly structured in a way that limits what can be done with them, and their requirements force any program that utilizes them to be designed around C++20 coroutines at some level rather than simply integrating them into existing code. Most of all, they are anything but easy to use. Frankly, I consider them, at least in their current form, a topic only approachable by C++ experts, and not something easily utilized by a broader team.
In comparison, this library attempts to address the various issues found in the other frameworks:

The mechanisms in this library are designed for simplicity and usability first, rather than as a framework which must provide maximum design flexibility. Mechanisms make use of C++ templates to reduce user boilerplate as much as possible.
Scheduling coroutines on backend worker threads is simple, fast, flexible, safe, and set up by default. No extra work is required to efficiently schedule coroutine tasks in a multithreaded pool of workers. Simply use `mce::parallel()` (to prefer multi cpu-core scheduling) or `mce::concurrent()` (to prefer scheduling on the current cpu if possible): example_009 source

Terminal output:
I have found that writing scheduling algorithms is always a dangerous undertaking. Having scheduling and backend worker thread executors set up by default, with a simple API, makes coroutines accessible to programmers.
- `mce::concurrent()`: launch a coroutine, preferring to be launched on the current thread if possible (when the caller is running on a worker thread) and on a background worker thread otherwise. That is, the launched coroutine is concurrent but not guaranteed to be parallel, and typically has the shortest communication latency. This is the recommended default scheduling algorithm.
- `mce::parallel()`: launch a coroutine distributed efficiently across 1 or more background worker threads based on scheduling load. The launched coroutine is concurrent and parallel.
- `mce::balance()`: launch a coroutine distributed efficiently across 1 or more background worker threads based on scheduling load. This is a best effort algorithm which prefers to schedule on the calling thread for best-case communication latency, but will schedule on other threads if CPU workload becomes too imbalanced.

As a generalized rule:

- if communication latency matters most, prefer `mce::concurrent()`
- if parallel CPU throughput matters most, prefer `mce::parallel()`
- if a best-effort balance of the two is desired, prefer `mce::balance()`

It should be noted that in many or most cases the bottleneck is actually not CPU throughput. In fact, program bottlenecks are often not the real reason threads are used; the user may be leveraging threads for asynchronous behavior in general. In those circumstances, the best results will probably be found with `mce::concurrent()`.
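For illustration, here is a minimal sketch (not the actual example_009) of launching coroutines with both functions. It assumes only what is described above: that `mce::concurrent()` and `mce::parallel()` accept an arbitrary Callable. The `sleep_for` at the end is a crude simplification to keep `main` alive; a real program would block on one of the library's communication mechanisms instead.

```cpp
#include <chrono>
#include <iostream>
#include <thread>
#include <mce/mce.hpp>

int main() {
    // Schedule a coroutine, preferring the current CPU (lowest latency).
    mce::concurrent([]{ std::cout << "hello from a concurrent coroutine" << std::endl; });

    // Schedule a coroutine, preferring distribution across CPU cores.
    mce::parallel([]{ std::cout << "hello from a parallel coroutine" << std::endl; });

    // Crude synchronization for this sketch only.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return 0;
}
```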
The user can manually create and manage the underlying `mce::scheduler` objects to schedule and execute coroutines. If the user desires, they can even create custom `mce::coroutine` objects to manage manually.
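As a rough sketch of manual management, under the assumption that schedulers are created through a `make()` factory returning a `std::shared_ptr` (the factory name is hypothetical here; `run()`, `schedule()`, and `halt()` are referenced elsewhere in this document):

```cpp
#include <chrono>
#include <iostream>
#include <thread>
#include <mce/mce.hpp>

int main() {
    auto sch = mce::scheduler::make(); // assumed factory

    // Run the scheduler on a dedicated thread; this sketch assumes run()
    // blocks until halt() is called.
    std::thread worker([sch]{ sch->run(); });

    // Schedule a coroutine on the manually managed scheduler.
    sch->schedule([]{ std::cout << "running on a manual scheduler" << std::endl; });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    sch->halt(); // stop coroutine execution
    worker.join();
    return 0;
}
```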
See summary of concurrent operations for documentation on all of the above features.
All concurrent mechanisms in `mce` are designed to behave correctly whether they are called from a standard thread (`std::thread`, generally a `pthread` under the hood) or from within a coroutine (`mce::coroutine`, a managed `boost::coroutines2::coroutine` under the hood). This is not a trivial feature, and not all competing C++ frameworks attempt it. In particular, none that I know of correctly synchronize between coroutines and code running outside of coroutines like this library does by default.

This makes it very easy to integrate this library into existing C++ code.
A `std::thread` calling a `mce::buffered_channel`'s `send()` will behave in a similar way to a `mce::coroutine` object (executing inside a `mce::scheduler`) calling `send()`, even though their blocking mechanisms are completely different underneath.
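Here is a sketch of that interoperability. `mce::buffered_channel` and `send()` are named above; the `make()` factory, the `recv()` counterpart, and channels being cheap copyable handles are assumptions made for this sketch.

```cpp
#include <iostream>
#include <mce/mce.hpp>

int main() {
    auto req = mce::buffered_channel<int>::make();  // assumed factory
    auto resp = mce::buffered_channel<int>::make();

    // The coroutine blocks on recv() by suspending, leaving its worker
    // thread free to execute other coroutines.
    mce::concurrent([=]{
        int i;
        req.recv(i);       // assumed counterpart to the documented send()
        resp.send(i * 2);
    });

    // The main thread (a standard thread context) uses the identical API;
    // underneath, its blocking uses normal thread synchronization.
    req.send(21);
    int result;
    resp.recv(result);
    std::cout << "result: " << result << std::endl; // prints "result: 42"
    return 0;
}
```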
List of objects and mechanisms which share this interoperability:
A variety of flexible and easy-to-use communication mechanisms are provided out of the box to handle 99% of communication needs within a single process:
One of the inherent pitfalls of coroutines is that their blocking behavior is managed by the program instead of by the operating system. This means you can cause unexpected deadlock and inefficiency when calling any function that blocks a worker thread where coroutines are running. Boost.Fiber specifies that the user should either implement non-blocking IO calls or set up an asynchronous callback mechanism with Boost.Asio to deal with this limitation.

I think this is far too much work to expect of the average programmer (OR an architect, who has many other important things to do) when dealing with something as complex as coroutine concurrency, with far too little gained in efficiency in the average case.
Instead, `mce` provides a generic `mce::await()` function which will provide a `std::thread` and execute the blocking code there instead. When the user's function finishes, `mce::await()` will resume the `mce::coroutine` on the original thread it was running on and return the value of the asynchronously executed function.

That is, there's no need to manage futures and promises when dealing with `mce::await()` calls; you just get the result when it completes: example_001 source

Terminal output:
It should be noted that in the above example the function passed to `mce::await()` accesses values on the coroutine's stack by reference. This is totally safe because while `mce::await()` is executing:

This means it is generally safe to read/write to memory in the caller's stack by reference. You can think of functions passed to `mce::await()` as "executing on the stack" of the caller, even though they are actually executing somewhere else.
If `mce::await()` is called outside of a coroutine running in a `mce::scheduler`, OR `mce::await()` is called inside another call to `mce::await()`, the argument Callable is executed immediately on the current thread instead of on another thread. Otherwise, the calling coroutine will be blocked and the operation passed to `mce::await()` will be executed on another, dedicated thread inside another `mce::scheduler`.
Calls (or implicit calls) to `this_scheduler()`, `this_scheduler_ref()`, `this_threadpool()`, or `this_threadpool_ref()` made within a call to `mce::await()` will return the values they would have if they had been called outside of `mce::await()`. That is, code running in `mce::await()` will be able to schedule operations as if it was running in its original execution environment. IE, calls to `mce::concurrent()`, `mce::parallel()`, and `mce::balance()` will function "normally".
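A brief sketch of this behavior, with the same channel assumptions as above:

```cpp
#include <iostream>
#include <mce/mce.hpp>

int main() {
    auto done = mce::buffered_channel<bool>::make(); // assumed factory

    mce::concurrent([done]{
        mce::await([done]{
            // This Callable runs on an await worker thread, yet
            // mce::concurrent() schedules as if it had been called from the
            // original execution environment.
            mce::concurrent([done]{
                std::cout << "scheduled from within mce::await()" << std::endl;
                done.send(true);
            });
        });
    });

    bool b;
    done.recv(b);
    return 0;
}
```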
A common feature in coroutine frameworks is the presence of ipv4/ipv6 support; said frameworks want to provide a coroutine-compatible API for those operations. This is because networking is, by design, a blocking task, and blocking tasks running in a coroutine scheduling environment can introduce latency or deadlock.

However, it should be pointed out that the real problem is not networking functionality itself but the fact that networking calls block. As discussed in the previous section, blocking tasks can be passed to `mce::await()` for execution where they won't affect the scheduling of any other coroutines. Additionally, because communication channels work in both coroutines and standard threads, the task executing in `mce::await()` (or any other `std::thread`) can communicate back to running coroutines using said channels. This allows the user to use whatever standard networking implementation they desire and route the results back into the coroutine environment when they become available.
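A sketch of this pattern, with the same channel assumptions as above. A POSIX pipe stands in for a connected network socket; any blocking, descriptor-based read works the same way.

```cpp
#include <unistd.h>
#include <iostream>
#include <string>
#include <mce/mce.hpp>

int main() {
    int fds[2];
    if(pipe(fds)) { return 1; } // stand-in for a connected socket

    auto ch = mce::buffered_channel<std::string>::make(); // assumed factory

    mce::concurrent([ch, fds]{
        // The blocking read() occupies only an await worker thread.
        std::string msg = mce::await([&]{
            char buf[256];
            ssize_t n = read(fds[0], buf, sizeof(buf)); // blocks
            return std::string(buf, n > 0 ? n : 0);
        });
        ch.send(msg); // route the result back into the coroutine environment
    });

    ssize_t w = write(fds[1], "ping", 4); // simulate data arriving "from the network"
    (void)w;

    std::string received;
    ch.recv(received); // the main thread consumes the routed result
    std::cout << "received: " << received << std::endl;
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```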
The main consideration the user may need to make when executing blocking networking operations in `mce::await()` calls is the number of await worker threads available by default. `mce::await()` will spawn additional threads as necessary, but it tries to keep a preconfigured number of reusable worker threads in existence. If the user is consistently executing many blocking calls, they can consider spawning more reusable worker threads to improve efficiency. This count of worker threads can be modified in this library's toplevel `CMakeLists.txt` by modifying the `MCEMINAWAITPROCS` variable.
This project is designed for a POSIX linux environment.
The following POSIX libraries are required:
This library directly utilizes the following boost libraries:
As well as the following boost header only libraries (and dependencies):
It should be noted that the boost features are automatically downloaded and built from source for this project, as necessary.
If you want to build unit tests you will need to check out the submodules for the googletest framework:
Building on linux:
Which produces the static library:
The library build `make` targets will download and build `boost` from source if the specified version of the boost source code is not available. The version of boost can be configured by modifying the following variables in the `CMakeLists.txt`:

If the boost download url becomes outdated, consider either updating the `MCE_BOOST_DOWNLOAD_URL` `CMake` variable or downloading boost manually as described below.
You can manually download and extract boost in the root of this repository to `boost/boost_major_minor_patch` (replace `major`, `minor`, and `patch` with the appropriate version values, and be sure to update the `CMakeLists.txt` `MCE_BOOST_MAJOR_VERSION`, `MCE_BOOST_MINOR_VERSION`, and `MCE_BOOST_PATCH_VERSION` variables). Doing this will skip the `boost` download step.
It is also possible to use linux symlinks to point to a prebuilt `boost` version. After making the symlink, you will need to create a file called `boost.built` in the project's root (with a command doing something like: `touch boost.built`).

WARNING: If you use a prebuilt version of `boost`, you MUST ensure that the following features are built:

You can do this by passing the following arguments to `b2` when building `boost`:
The following features can be configured `ON`/`OFF` to enable or disable compile time defines which configure or disable default background threads:
The general design of a default compilation of this framework is to provide the following:
There will also be the main thread the process spawns on, so the total default threadcount after initialization is N+3.
In many cases, having a minimal count of background worker threads is ideal. This is because many (most?) programs are not bottlenecked by CPU throughput, and will benefit instead from minimal coroutine communication latency.
The `mce_minimal` `make` target is the equivalent of setting:

This limits the default threadpool to 1 worker thread. When heavy parallelization is not required, and the user only needs the efficiency gains and design simplicity of coroutines, this is a useful target to link against, as it will enforce singlethreaded coroutine execution.
Building on linux:
Which produces the static library:
Doing this will set the background executor to build with only 1 worker thread. Leaving all other options at their defaults, this results in your process launching with access to 4 default threads:

The 3 non-main threads will block and take no further resources until needed. This pattern is surprisingly useful, as the main thread can launch all other operations on the coroutine background worker thread using `mce::concurrent()`. If interprocess communication is necessary, the main thread can begin listening for messages at this point; otherwise it can block.
A program designed for `mce` multithreaded execution (IE, explicit calls to `mce::parallel()` and `mce::balance()`) will still function in this environment. In fact, assuming the user is not manually using custom `mce::threadpool` objects, `mce::parallel()` and `mce::balance()` become nearly indistinguishable from `mce::concurrent()` when operating with a single executor thread; they "Just Work".

It should be noted that scheduling exclusively with `mce::concurrent()` will provide similar behavior to compiling against `mce_minimal`, so if your program is already written with singlethreaded concurrency in mind, it may be unnecessary to change link targets (though doing so will save a small amount of memory).
To link against and use this library you will need to add the boost root (IE, `boost/boost_1_85_0/`) and the `mce` `inc/` directory to your program's include flags. Similarly, you will need to instruct the linker about the boost lib and `mce` project root directories so it can find the relevant libraries (like `libmce.a`/`libmce_minimal.a`).
Given some shell environment variable named `MCE_ROOT` (which points to the `mce` project's root directory), here is a trivial Makefile illustrating how to link against the dependencies when using this library:

You can then `#include <mce/mce.hpp>` (or any other header in `inc/mce/`). Example yourprogram.cpp: example_009 source
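The linked example source shows the real program; purely as an illustration, a minimal `yourprogram.cpp` might look like the following (channel assumptions as in the earlier sketches):

```cpp
// yourprogram.cpp
#include <iostream>
#include <string>
#include <mce/mce.hpp>

int main() {
    auto ch = mce::buffered_channel<std::string>::make(); // assumed factory

    // Launch a coroutine on a background worker thread (or the current
    // thread, if the caller is itself a worker).
    mce::concurrent([ch]{ ch.send(std::string("hello mce")); });

    std::string s;
    ch.recv(s); // main blocks until the coroutine sends
    std::cout << s << std::endl;
    return 0;
}
```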
Integrating any framework into an existing codebase is an art unto itself. However, this library is simpler to integrate compared to others of its kind because most code will function as normal when running in a coroutine or can be easily wrapped. For any code you wish to migrate to this framework, the following is suggested:
Based on your program requirements, specify values in the `CMakeLists.txt` before building, if necessary:

- `MCEMAXPROCS`
- `MCEMINAWAITPROCS`
For all code that needs to be migrated to this library, complete the following checklist:

- Replace `std::thread`s or `pthread`s with `mce::concurrent()`/`mce::parallel()` etc.
- Replace `std::mutex` or `pthread_mutex_t` with `mce::mutex`
- Replace `std::condition_variable` or `pthread_cond_t` with `mce::condition_variable` (a before/after sketch follows this section)
- Wrap blocking calls with `mce::await()`
- Use `mce::threadpool::suspend()`/`mce::threadpool::resume()`/`mce::threadpool::halt()` and/or `mce::scheduler::run()`/`mce::scheduler::suspend()`/`mce::scheduler::resume()`/`mce::scheduler::halt()` to properly pause/unpause/stop coroutine execution during process lifecycle changes. The default threadpool `mce::default_threadpool()` only needs `suspend()`/`resume()` to be called (`run()`/`halt()` are called by the library in this case).

If the user encounters deadlock, that is an indication that some blocking code was not properly addressed. Pay careful attention to the behavior of internal library code, and consider wrapping its blocking calls with `mce::await()`. If some code proves particularly difficult to wrap with `mce::await()`, consider leaving that code running on a separate system thread.
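As a before/after illustration of the locking items in the checklist above, here is a hedged sketch. `mce::mutex` and `mce::condition_variable` are named in this document; that they mirror the `std::` APIs closely enough for `std::unique_lock` and `wait()`/`notify_one()` to work as shown is an assumption of this sketch.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <mce/mce.hpp>

mce::mutex mtx;             // was: std::mutex
mce::condition_variable cv; // was: std::condition_variable
bool ready = false;

int main() {
    // was: std::thread consumer([]{ ... });
    mce::concurrent([]{
        std::unique_lock<mce::mutex> lk(mtx); // assumes mce::mutex is BasicLockable
        while(!ready) { cv.wait(lk); } // suspends the coroutine, not the thread
        std::cout << "consumer woke" << std::endl;
    });

    // was: std::thread producer([]{ ... });
    mce::concurrent([]{
        std::unique_lock<mce::mutex> lk(mtx);
        ready = true;
        cv.notify_one();
    });

    // Crude wait for this sketch only; a real program would use a channel.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return 0;
}
```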
Assuming the user has relevant stress testing implemented, the above should be sufficient to demonstrate categorical improvements to user code efficiency.
Here are other efficiency improvement steps that can be taken when integrating this library (ordered from most to least important):

- If many `mce::await()` calls are being made, rebuild this framework with `MCEMINAWAITPROCS` set greater than `1`, as needed, so fewer threads need to be regularly spawned & destroyed.
- Use `mce::timer()`/`mce::default_timer_service()->timer()` where timer functionality is required.
- Prefer `mce::concurrent()` or `mce::this_scheduler()->schedule()` to reduce communication latency introduced by unnecessarily running code on separate CPU cores.

From the repository toplevel, run the unit tests:
Tests are written using gtest, and can therefore filter which tests actually execute using `--gtest_filter="your regular expression here"` as an argument provided to `mce_ut`.
From the repository toplevel, run the examples (replace NNN with a specific example number):
Below is the code for a standalone program from the example code included in `ex/src`.
The python script `continuous_integration.py` is to be run with `python3 ./script/continuous_integration.py`. It builds and runs all enabled unit tests and examples to generally verify everything is working. Running this may take a long time, and it may heavily tax your system's cpu resources.
Continuous integration tests require both the `gcc/g++` and `clang/clang++` toolchains to be installed on the system in their normal locations (`/usr/bin/`). Testing attempts to be fairly exhaustive.
If you are having an issue with the code that is not solved by the help provided in the documentation:
- High Level Cpp Coroutine Concurrency Summary
- Summary of Concurrent Operations
- Summary of Data Communication Operations
Then you are welcome to report an issue in the Issues tab. In your issue please include:
Contributions to this project are welcome; please read this primer for instructions on how to contribute.