Transactional memory (TM) offers the promise to simplify concurrent programming. Lock-based programming, the de-facto standard, is indeed hard to get right. Common problems with lock-based programming are deadlock, priority inversion, and lack of composition. Lack of composition refers to the fact that two structures correctly synchronized do not necessary compose to a new structure itself correctly synchronized. Transferring amount between two accounts is an example: even if the account is synchronized correctly, the transfer is not safe.
TM is however not a silver bullet neither. One major open point of TM implementations is how to ensure progress (e.g. no livelock) of the system. TM is indeed like optimistic locking. The problems of contention and starvation are still present. Non-blocking algorithm can offer different level of guarantee regarding the system-wide progress. Also depending on the particular implementation, one process may still read inconsistent state, which could lead to inappropriate decisions. Some TM use multi-version concurrency control to solve this issue.
As TM have become more and more mature, we see now a shift from distributed system area to the language area. How can TM be integrated into existing language nicely? What are the abstractions that are required?
Here are a few questions related to language support:
- Operational – what is the exact semantics of the TM? Does it provide snapshot isolation? What is the effect of exception with transactions? How can we facilitate the usage of blocking IO (normally prohibited) in transactions? How non-transacted and transacted code interoperate?.
- Transaction demarcations – How transactions can be controlled (e.g. start, stop, but also retry)? Are there other parameters to support (e.g. retry timeout)?
- State manipulation – How shared state is accessed and modified? What are the mechanisms and data structure?
State modification – conceptual views
I tend to think there is two ways to see shared state manipulation. One is the Clojure way, were you deal with immutable data but mutable references. Mutating shared data is always done through a reference, and also part of a transaction.
Another view, is the object database view (also the ORM view), were a transaction is a snapshot of the objects which can then be manipulated as usual.
Let’s consider a very simple example: a shared
map <String, Integer> is supposed to count the number of occurrence of a particular word. While indexing, document are split in smaller part and processed concurrently. The map needs then to be updated concurrently.
|immutable state + mutable ref||object|
State modification – contention
As mentioned already, TM is not silver bullet. We still need to think of the intent of our code from a multi-threaded perspective. In the example above, the complete map was considered as one piece of shared data. The concurrent modificatin of the occurence count for two different words would conflict and one of the transactions would be rolled back. To have finer grain control and reduce contention (Disjoint-access parallelism), we can consider the occurrence count not as a primitive, but as a reference to a counter. The map is altered only when a new word is inserted and a new counter starting at zero is created. Incrementing the occurrence of a word is a modification of the counter, which means that two counter can be incremented concurrently.
|immutable state + mutable ref||objects|
TM is indeed a promising way for concurrent programming. There are open points to watch, and more evidence is needed on the performance of TM-based system in real case. Understanding of the concurrency at the applicative level is still required even with TM. Poorly designed application may result in system with high contention even with TM. The transaction facilities are there but the application must still use them in a consistent way to implement its very own logic. Different conceptual views exist to access shared data with TM. Further experiment are needed to decide which one is better.