Multi-Core and Embedded Software Webinar
- Tue Aug 20, 2024
Multi-core and Embedded Software: Optimize Performance by Resolving Resource Contention
An on-demand Webinar from McObject.
Deployment on multi-core CPUs should make software faster. But processes running in parallel often contend for system software resources, actually reducing overall performance. Achieving multi-core’s promised linear performance gains hinges on resolving this issue.
This Webinar looks at two such conflicts in depth: contention in updating a shared data store, and threads vying for access to the C/C++ memory manager. It addresses them with techniques that can serve as the basis for resolving other multi-core resource conflicts.
In the Webinar above, McObject CEO Steve Graves explains the mechanics of these problems and the benefits of the solutions.
Problems:
- Threads vying for the standard C runtime memory allocator
- Contention for shared data
Solutions covered in this Webinar:
- Thread-local Allocators
- Multi-version Concurrency Control (MVCC)
Problem: Threads vying for the standard C runtime memory allocator
Solution: Custom Per-thread Allocator
A custom memory manager, or thread-local allocator, gives each thread its own allocator, so that each allocator manages memory independently of the others.
- Based on a block allocator
- Similar to thread local storage
Dramatic performance improvements are obtained by replacing the standard allocation mechanism with thread-local allocators in multi-threaded, multi-core applications.
Benefits of being based on a block allocator:
- eXtremeDB is a hybrid-storage DBMS built on an in-memory database system (IMDS), and a block allocator fits that design well
- Eliminates fragmentation
- No CPU cycles spent coalescing free blocks, as a list allocator must
- Less overhead than list allocators, since all free blocks are the same size
- Block allocators perform allocations faster than list allocators
How the thread-local allocator works (a code sketch follows this list):
- The allocator creates and maintains a number of linked lists (chains) of same-sized “small” blocks carved out of “large” pages
- To allocate memory, the allocator simply “unlinks” a block from the appropriate chain and returns the pointer to the block
- When a new large page is needed, the allocator uses a general-purpose memory manager (standard malloc) to allocate the page
- As long as all objects are allocated and de-allocated locally (i.e., by the same thread), this algorithm requires no synchronization mechanism at all
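The fast path of this scheme can be sketched in a few lines of C. The sketch below is illustrative only: it assumes C11 `_Thread_local` storage and a single fixed block size, and the names (`tl_alloc`, `tl_free`, `PAGE_SIZE`, `BLOCK_SIZE`) are hypothetical rather than McObject's API.

```c
#include <stdlib.h>

#define PAGE_SIZE  4096  /* "large" page obtained from the general allocator */
#define BLOCK_SIZE 64    /* "small" fixed-size blocks; must hold a pointer   */

/* Head of this thread's chain of free blocks. Because the variable is
 * thread-local, the fast path needs no synchronization at all. */
static _Thread_local void *free_chain = NULL;

/* Carve a fresh page into same-sized blocks and link them into the chain. */
static void tl_refill(void)
{
    char *page = malloc(PAGE_SIZE);  /* the only call that may synchronize */
    if (!page)
        return;
    for (size_t off = 0; off + BLOCK_SIZE <= PAGE_SIZE; off += BLOCK_SIZE) {
        void **block = (void **)(page + off);
        *block = free_chain;         /* push the block onto the chain */
        free_chain = block;
    }
}

/* Allocate: simply unlink the first block from the chain. */
void *tl_alloc(void)
{
    if (!free_chain)
        tl_refill();
    void **block = free_chain;
    if (block)
        free_chain = *block;
    return block;
}

/* Free (same-thread case): push the block back onto the chain. */
void tl_free(void *p)
{
    *(void **)p = free_chain;
    free_chain = p;
}
```

Because `free_chain` is per-thread, `tl_alloc` and `tl_free` touch no shared state; the only synchronization cost left is inside `malloc` when a fresh page is needed. (For brevity, the sketch never returns pages to `malloc` and supports one block size rather than one chain per size class.)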
If the objects are not freed locally:
- Pending-free request lists (PRLs) are maintained for each thread: when an object allocated in one thread is de-allocated by another, the de-allocating thread links the object into the owning thread’s PRL.
- Access to each PRL is protected by a mutex.
- Each thread periodically drains its PRL, de-allocating all of the pending objects at once (a code sketch follows this list).
- The number of synchronization requests is reduced significantly:
→ Often the object is freed by the same thread that allocated it
→ When the object is de-allocated by a different thread, it does not interfere with all other threads, but only with those that need to use the same PRL
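A PRL can be sketched as a mutex-protected linked list per thread, building on the `tl_free` from the previous sketch; again, `prl_t`, `prl_push`, and `prl_drain` are hypothetical names assumed for illustration, not eXtremeDB's interface.

```c
#include <pthread.h>

/* One PRL per thread: blocks freed on this thread's behalf by other
 * threads. Initialize as: prl_t p = { PTHREAD_MUTEX_INITIALIZER, NULL }; */
typedef struct prl {
    pthread_mutex_t lock;
    void           *head;  /* blocks link through their first word */
} prl_t;

void tl_free(void *p);     /* local free from the allocator sketch above */

/* Cross-thread free: link the block into the owning thread's PRL.
 * Only threads freeing into the SAME PRL ever contend on this mutex. */
void prl_push(prl_t *owner_prl, void *block)
{
    pthread_mutex_lock(&owner_prl->lock);
    *(void **)block = owner_prl->head;
    owner_prl->head = block;
    pthread_mutex_unlock(&owner_prl->lock);
}

/* Called periodically by the owning thread: detach the whole list in one
 * short critical section, then de-allocate everything locally, lock-free. */
void prl_drain(prl_t *my_prl)
{
    pthread_mutex_lock(&my_prl->lock);
    void *batch = my_prl->head;      /* grab all pending objects at once */
    my_prl->head = NULL;
    pthread_mutex_unlock(&my_prl->lock);

    while (batch) {
        void *next = *(void **)batch;
        tl_free(batch);              /* local free; no synchronization */
        batch = next;
    }
}
```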
Problem: Contention for shared data
Solution: Multi-version Concurrency Control (MVCC)
MVCC is an optimistic model in which no task or thread is ever blocked by another, because each is given its own copy (version) of objects in the database to work with during a transaction. When a transaction commits, its copies of the objects it has modified replace what is in the database. Because no explicit locks are required, and no task is ever blocked by another task holding locks, MVCC can provide significantly faster aggregate performance and greater utilization of multiple CPUs/cores.
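The optimistic commit at the heart of this model can be illustrated on a single versioned object, assuming C11 atomics; `txn_begin` and `txn_commit` are hypothetical names, and a production engine adds snapshot isolation across many objects plus garbage collection of superseded versions, which this sketch omits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct record { int data; } record;

/* The database slot holds a pointer to the latest committed version. */
typedef struct { _Atomic(record *) current; } mvcc_slot;

typedef struct {
    record *snapshot;  /* version observed when the transaction began */
    record *copy;      /* the transaction's private working copy      */
} txn;

/* Begin: take a private copy of the object; neither readers nor
 * writers are ever blocked while the transaction works on it. */
void txn_begin(txn *t, mvcc_slot *s)
{
    t->snapshot = atomic_load(&s->current);
    t->copy = malloc(sizeof *t->copy);
    memcpy(t->copy, t->snapshot, sizeof *t->copy);
}

/* Commit: install the copy only if the slot still holds the version the
 * transaction started from; otherwise another transaction committed
 * first, and the caller retries with a fresh copy. */
bool txn_commit(txn *t, mvcc_slot *s)
{
    record *expected = t->snapshot;
    if (atomic_compare_exchange_strong(&s->current, &expected, t->copy)) {
        /* The modified copy has replaced what was in the database. A real
         * engine reclaims the old version once no reader still uses it. */
        return true;
    }
    free(t->copy);  /* conflict detected: discard the copy */
    return false;
}
```

A conflicting writer simply retries with a fresh copy, so no transaction ever waits on a lock held by another; the price is the copy and the occasional retry, which is the overhead noted below.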
- MVCC carries greater overhead, but the increased concurrency for read/write operations quickly outweighs it, as demonstrated in McObject’s tests. Please see the results below.