Multi-Core and Embedded Software Webinar
- Tue Aug 20, 2024
Multi-core and Embedded Software: Optimize Performance by Resolving Resource Contention
An on-demand Webinar from McObject.
Deployment on multi-core CPUs should make software faster. But processes running in parallel often contend for system software resources, actually reducing overall performance. Achieving multi-core’s promised linear performance gains hinges on resolving this issue.
This Webinar looks at two such conflicts in depth: contention in updating a shared data store, and threads vying for access to the C/C++ memory manager. It addresses them with techniques that can serve as the basis for resolving other multi-core resource conflicts.
In the Webinar above, McObject CEO Steve Graves explains the mechanics of these problems and the benefits of the solutions.
Problems:
- Threads vying for the standard C runtime memory allocator
- Contention for shared data
Solutions covered in this Webinar:
- Thread-local Allocators
- Multi-version Concurrency Control (MVCC)
Problem: Threads vying for the standard C runtime memory allocator
Solution: Custom Per-thread Allocator
A custom memory manager, or thread-local allocator, gives each thread its own allocator, so that each allocator manages memory independently of the others.
- Based on a block allocator
- Similar to thread local storage
Dramatic performance improvements are obtained by replacing the standard allocation mechanism with thread-local allocators in multi-threaded, multi-core applications.
Benefits of being based on a block allocator:
- eXtremeDB is a hybrid-storage DBMS built on an in-memory database system (IMDS), and a block allocator fits that design well
- Eliminates fragmentation
- No CPU cycles spent coalescing free blocks, as a list allocator must
- Less overhead than list allocators, since all free blocks are the same size
- Block allocators perform allocations faster than list allocators
How the thread-local allocator works (a code sketch follows this list):
- The allocator creates and maintains a number of linked lists (chains) of same-sized “small” blocks carved out of “large” pages
- To allocate memory, the allocator simply “unlinks” a block from the appropriate chain and returns the pointer to the block
- When a new large page is needed, the allocator uses a general-purpose memory manager (standard malloc) to allocate the page
- As long as all objects are allocated and de-allocated locally (i.e., by the same thread), this algorithm requires no synchronization mechanism at all
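The fast path of this scheme can be sketched in a few lines of C. The sketch below is illustrative only: it assumes C11 `_Thread_local` storage and a single fixed block size, and the names (`tl_alloc`, `tl_free`, `PAGE_SIZE`, `BLOCK_SIZE`) are hypothetical rather than McObject's API.

```c
#include <stdlib.h>

#define PAGE_SIZE  4096  /* "large" page obtained from the general allocator */
#define BLOCK_SIZE 64    /* "small" fixed-size blocks; must hold a pointer   */

/* Head of this thread's chain of free blocks. Because the variable is
 * thread-local, the fast path needs no synchronization at all. */
static _Thread_local void *free_chain = NULL;

/* Carve a fresh page into same-sized blocks and link them into the chain. */
static void tl_refill(void)
{
    char *page = malloc(PAGE_SIZE);  /* the only call that may synchronize */
    if (!page)
        return;
    for (size_t off = 0; off + BLOCK_SIZE <= PAGE_SIZE; off += BLOCK_SIZE) {
        void **block = (void **)(page + off);
        *block = free_chain;         /* push the block onto the chain */
        free_chain = block;
    }
}

/* Allocate: simply unlink the first block from the chain. */
void *tl_alloc(void)
{
    if (!free_chain)
        tl_refill();
    void **block = free_chain;
    if (block)
        free_chain = *block;
    return block;
}

/* Free (same-thread case): push the block back onto the chain. */
void tl_free(void *p)
{
    *(void **)p = free_chain;
    free_chain = p;
}
```

Because `free_chain` is per-thread, `tl_alloc` and `tl_free` touch no shared state; the only synchronization cost left is inside `malloc` when a fresh page is needed. (For brevity, the sketch never returns pages to `malloc` and supports one block size rather than one chain per size class.)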
If the objects are not freed locally:
- Pending-free request lists (PRLs) are maintained for each thread: when an object allocated in one thread is de-allocated by another, the de-allocating thread links the object into the owning thread’s PRL.
- Access to each PRL is protected by a mutex.
- Each thread periodically drains its PRL, de-allocating all of the pending objects at once (a code sketch follows this list).
- The number of synchronization requests is reduced significantly:
→ Often the object is freed by the same thread that allocated it
→ When the object is de-allocated by a different thread, it does not interfere with all other threads, but only with those that need to use the same PRL
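A PRL can be sketched as a mutex-protected linked list per thread, building on the `tl_free` from the previous sketch; again, `prl_t`, `prl_push`, and `prl_drain` are hypothetical names assumed for illustration, not eXtremeDB's interface.

```c
#include <pthread.h>

/* One PRL per thread: blocks freed on this thread's behalf by other
 * threads. Initialize as: prl_t p = { PTHREAD_MUTEX_INITIALIZER, NULL }; */
typedef struct prl {
    pthread_mutex_t lock;
    void           *head;  /* blocks link through their first word */
} prl_t;

void tl_free(void *p);     /* local free from the allocator sketch above */

/* Cross-thread free: link the block into the owning thread's PRL.
 * Only threads freeing into the SAME PRL ever contend on this mutex. */
void prl_push(prl_t *owner_prl, void *block)
{
    pthread_mutex_lock(&owner_prl->lock);
    *(void **)block = owner_prl->head;
    owner_prl->head = block;
    pthread_mutex_unlock(&owner_prl->lock);
}

/* Called periodically by the owning thread: detach the whole list in one
 * short critical section, then de-allocate everything locally, lock-free. */
void prl_drain(prl_t *my_prl)
{
    pthread_mutex_lock(&my_prl->lock);
    void *batch = my_prl->head;      /* grab all pending objects at once */
    my_prl->head = NULL;
    pthread_mutex_unlock(&my_prl->lock);

    while (batch) {
        void *next = *(void **)batch;
        tl_free(batch);              /* local free; no synchronization */
        batch = next;
    }
}
```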
Problem: Contention for shared data
Solution: Multi-version Concurrency Control (MVCC)
MVCC is an optimistic model in which no task or thread is ever blocked by another, because each is given its own copy (version) of objects in the database to work with during a transaction. When a transaction commits, its copies of the objects it has modified replace what is in the database. Because no explicit locks are required, and no task is ever blocked by another task holding locks, MVCC can provide significantly faster aggregate performance and greater utilization of multiple CPUs/cores.
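The optimistic commit at the heart of this model can be illustrated on a single versioned object, assuming C11 atomics; `txn_begin` and `txn_commit` are hypothetical names, and a production engine adds snapshot isolation across many objects plus garbage collection of superseded versions, which this sketch omits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct record { int data; } record;

/* The database slot holds a pointer to the latest committed version. */
typedef struct { _Atomic(record *) current; } mvcc_slot;

typedef struct {
    record *snapshot;  /* version observed when the transaction began */
    record *copy;      /* the transaction's private working copy      */
} txn;

/* Begin: take a private copy of the object; neither readers nor
 * writers are ever blocked while the transaction works on it. */
void txn_begin(txn *t, mvcc_slot *s)
{
    t->snapshot = atomic_load(&s->current);
    t->copy = malloc(sizeof *t->copy);
    memcpy(t->copy, t->snapshot, sizeof *t->copy);
}

/* Commit: install the copy only if the slot still holds the version the
 * transaction started from; otherwise another transaction committed
 * first, and the caller retries with a fresh copy. */
bool txn_commit(txn *t, mvcc_slot *s)
{
    record *expected = t->snapshot;
    if (atomic_compare_exchange_strong(&s->current, &expected, t->copy)) {
        /* The modified copy has replaced what was in the database. A real
         * engine reclaims the old version once no reader still uses it. */
        return true;
    }
    free(t->copy);  /* conflict detected: discard the copy */
    return false;
}
```

A conflicting writer simply retries with a fresh copy, so no transaction ever waits on a lock held by another; the price is the copy and the occasional retry, which is the overhead noted below.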
- MVCC carries greater overhead, but the increased concurrency for read/write operations quickly outweighs it, as demonstrated in McObject’s tests. Please see the results below.