Obix pragmatics

Design for concurrency!

We speak of concurrency whenever a system can execute more than one task at the same time. Concurrency can be achieved in hardware or in software: by installing several processors in one machine, or by dividing processor time among several software processes.

Concurrency is important because it can increase performance tremendously, to the point that a system not designed for concurrency may be useless. Today, concurrency is an implicit, state-of-the-art user requirement. Nobody who uses a multi-tasking operating system like Windows, Linux or Unix, with several applications running at the same time and print and backup jobs executing in the background, would ever agree to go back to a system without this comfort, like DOS.

Therefore, most software should be designed to run in a multi-tasking environment. It is really important to keep this in mind from the very beginning, because transforming single-tasking software into multi-tasking software is very, very hard in most cases. Concurrency problems can be extremely difficult to investigate and repair. The reason is that they typically occur randomly, depend on environmental conditions (number of users, available resources, system configuration, platform, ...) and are therefore difficult to reproduce for debugging.

Let's take, for example, a kind of blackboard with different messages shared by different tasks. One message could be:

Next 'customer is king' meeting: 1.7.2003

Suppose now that the task responsible for updating this message wants to communicate the next date: 25.7.2003. It does this by first deleting the '1', then inserting the '2' and the '5'. Hence, the message would pass through the following states:

  1. Next 'customer is king' meeting: 1.7.2003
  2. Next 'customer is king' meeting: .7.2003
  3. Next 'customer is king' meeting: 2.7.2003
  4. Next 'customer is king' meeting: 25.7.2003

Clearly, a problem would arise if another task read the message at step 2 (invalid date) or step 3 (incorrect day).
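The four states above can be reproduced with a small sketch. It is written in Python purely for illustration (it is not Obix code), and the function name update_in_steps is hypothetical. Each yielded string is what another task would see if it read the blackboard at that moment.

```python
def update_in_steps(message, old_day, new_day):
    """Yield every state a reader could observe while the day is edited in place."""
    prefix, rest = message.split(": ", 1)
    tail = rest[len(old_day):]                 # the part after the day, e.g. ".7.2003"
    yield message                              # state 1: original message
    yield f"{prefix}: {tail}"                  # state 2: old day deleted
    for i in range(1, len(new_day) + 1):       # states 3..n: new day typed in
        yield f"{prefix}: {new_day[:i]}{tail}"


states = list(update_in_steps("Next 'customer is king' meeting: 1.7.2003", "1", "25"))
for number, state in enumerate(states, start=1):
    print(number, state)
```

Only the first and the last state are valid messages. Replacing the whole message in a single step would avoid exposing the states in between.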

Another example: a list contains the history of accounting movements. Task 1 appends an operation by adding two lines, one credit amount and one debit amount. Task 2 scans the list in order to create a balance. Suppose the scan happens after task 1 has added the first line but before it has added the second. Oops! Our balance will be unbalanced, because only the first line was included in the scan. If this happens in production, so much for our company's reputation: our accounting package violates one of the most elementary rules of accounting.
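The unlucky timing can be replayed deterministically. The following Python sketch (an illustration only, not Obix code; the amounts and the balance helper are assumptions) performs the steps of both tasks in exactly the problematic order, without using real threads.

```python
ledger = []  # shared, mutable history of (kind, amount) lines


def balance(lines):
    """Return credits minus debits; 0 means the books balance."""
    return sum(amount if kind == "credit" else -amount for kind, amount in lines)


ledger.append(("credit", 100))      # task 1 adds its first line
seen_by_task_2 = balance(ledger)    # task 2 scans here, too early
ledger.append(("debit", 100))       # task 1 adds its second line
final_balance = balance(ledger)

print(seen_by_task_2, final_balance)
```

Task 2 sees a balance of 100 although the finished operation balances to 0. In a real system the interleaving depends on scheduling, which is why such an error shows up only occasionally.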

While the above problems are easy to understand, they would most likely be difficult to detect and fix in a big, real-world multi-user application. And eliminating them can get extremely expensive, not only because of the repair costs (finding the problem's source, correcting the code, testing, deploying, updating), but also because of indirect costs such as the damage suffered by our customers and their loss of trust in the correctness and reliability of the software.

From the above examples we can conclude that concurrency problems may arise if the following two conditions are fulfilled:

  1. the object's state changes
  2. the object is shared

Knowing the root of concurrency problems, we can easily name the possible solutions:

  1. the object's state doesn't change
  2. the object isn't shared
  3. the object's state changes, but it is shared in a synchronized way, so that concurrency problems are eliminated

Obix supports these solutions in the following ways.

1. Immutable objects

Immutable objects are objects whose state (their attribute values) is defined once, at the object's creation. After creation the state can no longer change. Any attempt to write an instruction that would change the state (e.g. an assignment to an attribute) is rejected by the compiler. Immutable objects are much easier to handle than mutable ones. They are inherently thread-safe and can therefore be shared freely, without any risk of concurrency problems. For that reason, objects in Obix are immutable by default.

Immutable objects are also more appropriate in situations where security plays an important role. Because immutable objects can never change their state, every accidental or deliberate (hacker) attempt at modification is prevented. For example, when we pass an immutable object as an input argument to a command, we can be sure that the command will not change the object.
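As a rough analogy in Python (not Obix code), a frozen dataclass behaves like an immutable object. One important difference should be kept in mind: Obix refuses the assignment at compile time, whereas Python only raises an error at run time. The Meeting type and the try_to_reschedule function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Meeting:
    title: str
    date: str


def try_to_reschedule(meeting):
    """A 'command' that attempts to modify its input argument."""
    try:
        meeting.date = "25.7.2003"   # assignment to an attribute of a frozen object
        return True
    except FrozenInstanceError:
        return False                 # the modification was refused


meeting = Meeting("customer is king", "1.7.2003")
changed = try_to_reschedule(meeting)
print(changed, meeting.date)
```

The caller can rely on the meeting keeping its original date, whatever the called code tries to do with it.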

Unfortunately, immutable objects cannot always be used, for two reasons.

First, every time the state changes, a new object has to be created, which can be too expensive in time and space. Consider, for example, a list containing all accounting movements. Every time a new operation is added, a new list has to be created. Obviously, in a real application with thousands or millions of operations, memory would quickly be exhausted and performance would drop to an unacceptable level. In such a case, we are forced to use a mutable object.
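The cost can be estimated with a small Python sketch (an illustration under simple assumptions, not a measurement of Obix). An immutable tuple stands in for the immutable list, and the counter copied records how many existing elements have to be copied to build each new version.

```python
def append_immutable(history, line):
    """Return a new tuple containing the old lines plus one; the old tuple is untouched."""
    return history + (line,)


history = ()
copied = 0
for n in range(1000):
    copied += len(history)               # existing elements copied into the new tuple
    history = append_immutable(history, n)

print(len(history), copied)
```

Adding 1000 lines this way copies 499500 elements in total, because the work grows with the square of the list length. A mutable list would simply perform 1000 in-place appends.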

Secondly, immutable objects are inherently impossible in the case of mutual dependencies between two objects. Consider, for example, the following type:


type person
   attribute best_friend type:person
end type

Now, suppose that Albert's best friend is Isaac and Isaac's best friend is Albert. When the first object (Albert) is created, the second one (Isaac) doesn't exist yet, and therefore Albert cannot hold a reference to Isaac. We can only make Albert refer to Isaac after Isaac has been created, which means changing Albert's state after its creation.
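The same dilemma can be shown in Python (as an analogy only; this is not Obix code). The name attribute and the two class names are assumptions added for readability. With the frozen variant the circle cannot be closed, with the mutable variant it can.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrozenPerson:
    name: str
    best_friend: Optional["FrozenPerson"]


@dataclass
class Person:
    name: str
    best_friend: Optional["Person"] = None


# Immutable attempt: Isaac does not exist yet when Albert is created.
albert = FrozenPerson("Albert", None)
isaac = FrozenPerson("Isaac", albert)
# albert.best_friend stays None forever; assigning to it would raise an error.

# Mutable version: the missing reference is filled in afterwards.
albert_m = Person("Albert")
isaac_m = Person("Isaac", best_friend=albert_m)
albert_m.best_friend = isaac_m

print(albert.best_friend, albert_m.best_friend.name)
```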

For some types in the standard library, Obix provides an immutable version as well as a mutable one. For example, for a string that changes constantly we can use the mutable version, mutable_string, instead of the default immutable one, string, in order to get better performance. But the rule is that the immutable version should be preferred, except in cases where efficiency or other constraints rule it out.


2. Unshared objects

Encapsulation (also called data hiding) is one of the fundamental properties of object-oriented programming, and is extensively supported in Obix. The idea behind it is very simple: make every object as inaccessible as possible.

Several levels of accessibility can be defined.

For example, attributes defined in a type, factory or service can be public or private. If they are private they can only be accessed by the component itself. If they are public then read and write access for other components can be defined individually. The same principles are valid for commands and events.

The accessibility of variables and constants used in a script can be restricted to specific sections of the source code.
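The attribute-level access control described above can be approximated in Python as follows. This is only an analogy, not Obix syntax: Python marks private attributes by naming convention (a leading underscore) rather than enforcing them, and the Account class with its members is a hypothetical example.

```python
class Account:
    def __init__(self, owner):
        self._owner = owner        # private by convention
        self._movements = []       # private, mutable, never handed out directly

    @property
    def owner(self):
        """Public read access; no setter is defined, so there is no write access."""
        return self._owner

    @property
    def movements(self):
        """Public read access to an immutable snapshot of the private list."""
        return tuple(self._movements)

    def add_movement(self, amount):
        """The only public way to change the private list."""
        self._movements.append(amount)


account = Account("ACME")
account.add_movement(100)
snapshot = account.movements

try:
    account.owner = "somebody else"
    write_allowed = True
except AttributeError:
    write_allowed = False

print(snapshot, write_allowed)
```

Because other components only ever receive a snapshot, the mutable list itself is never shared.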


3. Synchronized access to objects

Synchronized access means that while one task modifies an object, other tasks are unable to modify it, or even unable to read its data. For example, in the above case of accounting history, task 2 could not scan the list until task 1 had finished adding the two lines.
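A minimal Python sketch of this idea, assuming a lock from the standard threading module as the synchronization mechanism (Obix's own mechanism is not shown here, and all names are hypothetical). The writer adds both lines while holding the lock, and the scanner may only compute the balance while holding the same lock.

```python
import threading

ledger = []                 # shared, mutable accounting history
lock = threading.Lock()     # guards every access to the ledger
unbalanced_scans = []       # balances seen by the scanner that were not zero


def balance(lines):
    return sum(amount if kind == "credit" else -amount for kind, amount in lines)


def writer(operations):
    for _ in range(operations):
        with lock:                          # both lines are added as one unit
            ledger.append(("credit", 100))
            ledger.append(("debit", 100))


def scanner(scans):
    for _ in range(scans):
        with lock:                          # never runs between the two appends
            result = balance(ledger)
        if result != 0:
            unbalanced_scans.append(result)


tasks = [
    threading.Thread(target=writer, args=(1000,)),
    threading.Thread(target=scanner, args=(1000,)),
]
for task in tasks:
    task.start()
for task in tasks:
    task.join()

print(len(ledger), len(unbalanced_scans))
```

The scanner never observes a half-finished operation, so the list of unbalanced scans stays empty regardless of how the two threads are scheduled.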

But synchronized access mechanisms that work on a single object are not sufficient. It must also be possible to apply synchronization across several objects of different types. For example, saving a printed invoice in an accounting application would require not only adding lines to the table of accounting movements, but also saving the invoice for later reprinting and updating the customer's turnover. Synchronization over the whole update sequence is needed in order to guarantee consistent data for other tasks. Moreover, the modified objects could reside on different machines, so that synchronization between these machines would also be required. Finally, data integrity must be guaranteed if an update sequence is interrupted (commit/rollback). Such mechanisms are commonly called transaction processing. Obviously, transaction processing is not trivial to implement, but luckily it is supported by many database products.
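The invoice example can be sketched with a database transaction. The following Python code uses the sqlite3 module of the standard library with an in-memory database. It is an illustration only: the table layout, the customer name and the save_invoice function are assumptions, and it covers a single machine, not the distributed case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movements (invoice_id INTEGER, kind TEXT, amount INTEGER);
    CREATE TABLE invoices  (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER);
    CREATE TABLE customers (name TEXT PRIMARY KEY, turnover INTEGER);
    INSERT INTO customers VALUES ('ACME', 0);
""")
conn.commit()


def save_invoice(conn, invoice_id, customer, total, fail=False):
    """Apply the whole update sequence, or nothing at all."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("INSERT INTO movements VALUES (?, 'debit', ?)", (invoice_id, total))
            conn.execute("INSERT INTO movements VALUES (?, 'credit', ?)", (invoice_id, total))
            conn.execute("INSERT INTO invoices VALUES (?, ?, ?)", (invoice_id, customer, total))
            if fail:
                raise RuntimeError("simulated interruption")
            conn.execute(
                "UPDATE customers SET turnover = turnover + ? WHERE name = ?",
                (total, customer),
            )
        return True
    except RuntimeError:
        return False


interrupted = save_invoice(conn, 1, "ACME", 100, fail=True)   # rolled back
completed = save_invoice(conn, 2, "ACME", 250)                # committed

movement_rows = conn.execute("SELECT COUNT(*) FROM movements").fetchone()[0]
invoice_rows = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
turnover = conn.execute("SELECT turnover FROM customers WHERE name = 'ACME'").fetchone()[0]
print(interrupted, completed, movement_rows, invoice_rows, turnover)
```

The interrupted sequence leaves no trace: only the two movement lines, the invoice and the turnover of the completed invoice are stored.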


October 2004