
Object Manager


Description

The object manager module is responsible for maintaining a consistent, distributed data partition. Availability cannot always be guaranteed as well (see the CAP theorem), but through node/data locality organization, developers can place data at the location where it is most commonly needed.

Architecture

The object manager module attaches to application nodes, as described in RCAT's application layer. Every application instance should be attached to a local object manager module (OM). Each OM is connected to the other OMs located at the different application nodes across RCAT. The OMs exchange data in the form of objects.

Objects are UUID-identified blocks of data, logically tagged with an object type. Each object is unique and exists authoritatively in exactly one location (caching is allowed). This way, consistency and distribution are always achieved. To achieve availability as well, an object has to be accessed locally, i.e. the application node requests data that is held by its local OM. Naturally this will not always be possible, and a round trip to fetch the data from the remote OM node is then necessary.
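
To make this concrete, here is a minimal sketch of the idea in Python. The class and method names (OMObject, ObjectManager, locator) are illustrative assumptions, not RCAT's actual API; network transport and caching are elided.

    import uuid

    class OMObject(object):
        """A UUID-identified block of data, tagged with a logical object type."""
        def __init__(self, obj_type, data):
            self.id = uuid.uuid4()       # globally unique identity
            self.obj_type = obj_type     # logical type, e.g. "player" or "item"
            self.data = data             # opaque payload

    class ObjectManager(object):
        def __init__(self, locator):
            self.local = {}              # id -> OMObject held authoritatively here
            self.locator = locator       # maps object ids to owning OM nodes

        def get(self, obj_id):
            if obj_id in self.local:     # local hit: available without coordination
                return self.local[obj_id]
            owner = self.locator.lookup(obj_id)
            return self._fetch_remote(owner, obj_id)   # round trip to the remote OM

        def _fetch_remote(self, owner, obj_id):
            raise NotImplementedError("network transport not shown")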

Concerns and Solutions

For the Object Manager to function properly, it needs to maintain perfect consistency about where each object is currently located, while remaining scalable. For good performance and high availability, it should be implemented in a way that requires little or no coordination between nodes.

Solution 1: (currently used, as of May 2013) Use a centralized service to keep track of object locations. That service can be full-fledged software that indexes object locations, or simply a database that stores them.

Problem: the service requires replication and, due to its centralized nature, may become overloaded. The load can be reduced by a good policy of caching object locations on the data nodes.
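
A minimal sketch of this solution, assuming an in-process dictionary stands in for the central service or database; the interface (lookup, move) and the cache without an invalidation policy are illustrative assumptions.

    class LocationService(object):
        """Centralized index: the single authoritative object-id -> node map."""
        def __init__(self):
            self._where = {}

        def lookup(self, obj_id):
            return self._where.get(obj_id)

        def move(self, obj_id, node):
            self._where[obj_id] = node   # one writer keeps the map consistent

    class CachingLocator(object):
        """Per-node cache that absorbs repeated lookups and eases the load on
        the central service; invalidation (the hard part) is omitted."""
        def __init__(self, service):
            self.service = service
            self.cache = {}

        def lookup(self, obj_id):
            if obj_id not in self.cache:           # miss: ask the central service
                self.cache[obj_id] = self.service.lookup(obj_id)
            return self.cache[obj_id]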

Solution 2: broadcast/multicast requests. In this solution, nodes request the objects they wish to read/modify/own through a message on a multicast channel. Multicast channels can be created per object type, so that different data nodes can own different types of objects, if that serves the application well.

Problem 1: multicast is essentially broadcast, and could overwhelm the network with requests. This can be managed by using multicast only for object location, and unicast for the actual read/write/transfer requests.
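
A sketch of that mitigation using standard UDP multicast; the group address, port, and message format are assumptions, and per-object-type channels would simply use distinct groups.

    import socket
    import struct

    MCAST_GRP, MCAST_PORT = "224.1.1.1", 5007   # assumed channel for one object type

    def locate(obj_id):
        """Multicast only the 'where is object X?' question; the answer, and the
        subsequent read/write/transfer, go over unicast."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        sock.sendto(("LOCATE %s" % obj_id).encode(), (MCAST_GRP, MCAST_PORT))

    def answer_locates(local_objects, my_addr):
        """Each OM listens on the channel and replies, unicast, only for objects
        it owns authoritatively."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", MCAST_PORT))
        mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        while True:
            msg, sender = sock.recvfrom(1024)
            obj_id = msg.decode().split(" ", 1)[1]
            if obj_id in local_objects:
                sock.sendto(("AT %s" % my_addr).encode(), sender)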

Problem 2: consistency is harder to maintain. If two transfer requests are issued simultaneously, it may not be easy to guarantee data consistency and a correct location record.

Another issue is transactions: the current OM does not support transactional operations (read-and-swap, compare-and-add, etc.).

Related Work

Two important works provide good inspiration for an OM protocol: the Darkstar and Sinfonia projects. Both are mostly concerned with consistency, atomicity, and data integrity. RCAT's current object manager (May 2013) provides only single operations; complex transactions (read-and-swap, for instance) are still not possible atomically. Merging and adapting ideas from both projects could yield a protocol that works under heavy load and gives the developer control over node/data locality, without loss of consistency.

Darkstar

Darkstar is a meta-service that controls database transactions based on unique object ids. Applications submit "tasks" to the Darkstar meta-service, which attempts to retrieve the data from the database, performs the transaction, and releases its lock on the data. If a conflict occurs, Darkstar forces one of the tasks to fail while the other carries on. The failed task reruns after a certain amount of time.
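
This fail-and-rerun behavior is essentially optimistic concurrency with retry. A minimal sketch of that pattern (the names and the exponential backoff are illustrative, not Darkstar's actual API):

    import random
    import time

    class ConflictError(Exception):
        """Raised when two tasks contend for the same objects and one must fail."""

    def submit_task(task, max_retries=5, base_delay=0.05):
        """Run a task; on conflict, fail it and rerun after a delay, as Darkstar
        does with the losing task."""
        for attempt in range(max_retries):
            try:
                return task()            # retrieve data, operate, release lock
            except ConflictError:
                time.sleep(base_delay * 2 ** attempt * random.random())
        raise ConflictError("task still conflicting after %d attempts" % max_retries)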

Darkstar is highly dependent on the database. If two nodes compete for an object, Darkstar does not provide a clear way to increase application-to-data locality, e.g. by caching objects locally. RCAT provides user migration (which increases application-to-data locality), but Darkstar still does not provide a clear pattern for migrating objects; such migration could greatly increase availability under heavy load.

To provide ordering, Darkstar offers a total order over each user's events: the order of a user's own events is always respected, but there is no ordering guarantee between users. This can be an issue, and must be handled at the application level.
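
One way to picture this guarantee is a FIFO queue per user: a single worker drains each queue, so a user's events keep their order while events of different users interleave arbitrarily. A hypothetical sketch, not Darkstar's implementation:

    import queue
    import threading

    class PerUserOrdering(object):
        """Total order within each user's events, no order across users."""
        def __init__(self):
            self.queues = {}

        def submit(self, user_id, event):
            self.queues.setdefault(user_id, queue.Queue()).put(event)

        def start_worker(self, user_id, handler):
            q = self.queues.setdefault(user_id, queue.Queue())
            def drain():
                while True:
                    handler(q.get())     # single consumer => per-user total order
            threading.Thread(target=drain, daemon=True).start()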

"In the worst case, multiple lockers of the same objects will cause potential deadlock situations that while RDS will detect and break are computationally more expensive to resolve and can result in processing delays (additional latency)."

"Although the RedDwarf will detect and resolve contentions for write access to objects between parallel events such that your code never actually stops processing, this deadlock avoidance can significantly load the CPU and slow response time to the users. In the worst case, contention can actually prevent parallel processing, seriously impacting the scalability of your code. A classic example of a deadlock is two combatants who have to modify both their own data and that of their enemies when resolving an attack. If all the data is on a combatant Managed Object, then both combat tasks need to lock both objects, typically in the opposite order, which is a formula for deadlock. One solution is to divide the combatants' combat statistics into two groups, the ones they modify themselves and the ones modified by an attacker, and store them on separate Managed Objects. The combatant object would then retain Managed References to these objects, and only lock them as needed. Another practical solution is partial serialization of the problem. This is especially appropriate if you want the combatants attacks to be processed in a fixed order. Here you have a single “combat clock” task (generally a repeating task) that processes in sequence all the attacks from all the combatants in a given battle. Not all contention situations are this easy to spot and design around. The RedDwarf Server gives you feedback at runtime as to what objects are deadlocking. Use this to tune your application."

Sinfonia

Sinfonia proposes a new paradigm for data access in a distributed system. In Sinfonia there are application nodes and memory nodes; an application node and a memory node may or may not reside on the same physical machine.

Application nodes are the equivalent of RCAT's application nodes in a distributed system. A memory node is the equivalent of RCAT's OM (in its currently implemented form, as of May 2013), with some additional features.

Sinfonia proposes a simple transactional primitive called the minitransaction. Minitransactions may only compare (against constants), read, and write. By restricting the power of the transaction, minitransactions can guarantee consistency through a two-phase commit. A minitransaction proceeds as follows:

  1. The coordinator starts the transaction by creating a transaction id. It determines all participants involved in the compare, read, and write items, and sends each of them the entire transaction.

  2. Each participant applies its compare and read items only (if any). If all compares succeed, the participant performs the reads and returns their results, together with a vote of yes (commit) or no (abort) depending on the success of the compares and reads.

  3. If the coordinator receives a yes vote from all participants, it sends them a message to commit. Each participant then commits the write items and releases the locks held for the minitransaction.
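
A compact sketch of these three steps, with in-process participants and the locking and failure handling elided; the names (Minitransaction, Participant, coordinate) are illustrative assumptions, not Sinfonia's API.

    import collections
    import itertools

    Vote = collections.namedtuple("Vote", "vote reads")
    _tids = itertools.count()

    class Minitransaction(object):
        """Compare items (against constants), read items, write items."""
        def __init__(self, compares, reads, writes):
            self.compares = compares    # [(addr, predicate on the current value)]
            self.reads = reads          # [addr]
            self.writes = writes        # [(addr, new value)]

    class Participant(object):
        """A memory node holding part of the address space."""
        def __init__(self, store):
            self.store = store          # addr -> value on this node
            self.pending = {}           # tid -> writes staged for commit

        def prepare(self, tid, tx):
            # Phase 1: run the local compares; on success, perform the local
            # reads and stage the local writes (locking elided).
            mine = lambda items: [i for i in items if i[0] in self.store]
            if not all(pred(self.store[a]) for a, pred in mine(tx.compares)):
                return Vote("no", {})
            reads = {a: self.store[a] for a in tx.reads if a in self.store}
            self.pending[tid] = mine(tx.writes)
            return Vote("yes", reads)

        def commit(self, tid):
            for a, v in self.pending.pop(tid, []):
                self.store[a] = v       # apply writes, release locks

        def abort(self, tid):
            self.pending.pop(tid, None)

    def coordinate(tx, participants):
        """Phase 2 of the two-phase commit: commit only on unanimous yes."""
        tid = next(_tids)
        votes = [p.prepare(tid, tx) for p in participants]
        if all(v.vote == "yes" for v in votes):
            for p in participants:
                p.commit(tid)
            merged = {}
            for v in votes:
                merged.update(v.reads)
            return merged               # read results of the committed transaction
        for p in participants:
            p.abort(tid)
        return None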

To illustrate how this can be used, consider an example: in a virtual environment, when two particular users are in the same room, each is rewarded with a sum of money. The minitransaction could be as follows:

    read u1.money
    read u2.money
    cmp 50 < u1.x < 100
    cmp 50 < u1.y < 100
    cmp 50 < u2.x < 100
    cmp 50 < u2.y < 100
    write u1.money += 100
    write u2.money += 100

The node holding user 1 will receive the transaction, compare (and lock) the x and y values of u1, and return yes together with the current value of u1.money. The node holding user 2 will do the same for u2. When the coordinator receives yes from both, it sends them a commit message and returns the values of u1.money and u2.money. Both nodes then apply money += 100 to their respective users.

It is important to observe that there is no ordering among the items of a minitransaction. The order is always: perform the compares, read the data, and, after commit, write the data. Thus a read cannot observe a write from the same minitransaction; this keeps locking to a minimum and guarantees consistency.

Yet, since the reward is a constant increase, by reading the previous values of u1.money and u2.money it is still possible to know the current values once the transaction commits.
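
With the sketch above, where write items carry constant values chosen by the coordinator, the += 100 has to be staged as two minitransactions: one checks the positions and reads the balances, and a second compares that the balances are unchanged and writes the incremented constants (a compare-and-swap). A hypothetical run, with assumed data placement and values:

    # Assumed placement: u1 lives on one memory node, u2 on another.
    p1 = Participant({"u1.money": 500, "u1.x": 60, "u1.y": 70})
    p2 = Participant({"u2.money": 300, "u2.x": 55, "u2.y": 90})
    in_room = lambda v: 50 < v < 100

    # Minitransaction 1: compare both positions, read both balances.
    old = coordinate(Minitransaction(
        compares=[("u1.x", in_room), ("u1.y", in_room),
                  ("u2.x", in_room), ("u2.y", in_room)],
        reads=["u1.money", "u2.money"],
        writes=[]), [p1, p2])           # -> {"u1.money": 500, "u2.money": 300}

    # Minitransaction 2: write the incremented constants, re-comparing the
    # balances so a concurrent change forces an abort.
    if old is not None:
        coordinate(Minitransaction(
            compares=[("u1.money", lambda v: v == old["u1.money"]),
                      ("u2.money", lambda v: v == old["u2.money"])],
            reads=[],
            writes=[("u1.money", old["u1.money"] + 100),
                    ("u2.money", old["u2.money"] + 100)]), [p1, p2])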

To keep track of where objects are located, Sinfonia uses a mechanism similar to RCAT's OM: a centralized service that maps physical ids to logical ids. In the OM, a database table associates UUIDs with IPs. Whether this centralized approach scales remains an open question.

A few questions arise from Sinfonia and its minitransactions:

  1. Is such a strict transactional system required for the consistency needs of an MMVE? Could it be expanded to objects with foreign keys, and at what cost?

  2. Is it fast enough?
