How to Write Fault-Tolerant RAs

The following notes are aimed at developers who have already written a resource adaptor (RA) and now need to make it fault-tolerant.

Making your RA fault-tolerant

To make an RA fault-tolerant, you need to address four areas:

  • activity handle replication: ensuring activity handle properties, implementing marshaler methods, and setting a config property so Rhino can replicate activity handles
  • activity state replication: replicating state using SLEE Profiles or the ReplicatedStorageFacility, accessing replicated state in a transaction, and cleaning up replicated state
  • activity failover: implementing the ReplicatingResourceAdaptor interface
  • SBB replication: including an oc-service.xml file in the service's jar so that SBB entities and ActivityContexts on a failed node can automatically continue on surviving nodes
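
Activity handles identify activities, so a replicated handle must be rebuilt on another node and still compare equal to the original. The sketch below illustrates, in plain Java, two properties that matter for that: equals() and hashCode() based only on immutable identifying fields, and an explicit binary marshal/unmarshal pair. CallActivityHandle and its fields are hypothetical. To stay self-contained it does not implement javax.slee.resource.ActivityHandle, and it does not show Rhino's marshaler interface or the config property; take their exact names and signatures from the Rhino documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical activity handle for a call-based RA (not a Rhino or SLEE class).
// Equality and hashing use only immutable identifying fields, so a handle rebuilt
// from its marshaled bytes on another node equals the original.
final class CallActivityHandle {
    private final String originNode; // node that created the activity
    private final long callId;       // call identifier, unique per origin node

    CallActivityHandle(String originNode, long callId) {
        this.originNode = Objects.requireNonNull(originNode);
        this.callId = callId;
    }

    // Write the identifying fields in a compact, explicit binary form.
    void marshal(DataOutput out) throws IOException {
        out.writeUTF(originNode);
        out.writeLong(callId);
    }

    // Rebuild a handle from the bytes written by marshal(), in the same order.
    static CallActivityHandle unmarshal(DataInput in) throws IOException {
        return new CallActivityHandle(in.readUTF(), in.readLong());
    }

    // Convenience wrappers for an in-memory round trip.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            marshal(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static CallActivityHandle fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return unmarshal(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CallActivityHandle)) return false;
        CallActivityHandle other = (CallActivityHandle) o;
        return callId == other.callId && originNode.equals(other.originNode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(originNode, callId);
    }

    @Override
    public String toString() {
        return "CallActivityHandle[" + originNode + ":" + callId + "]";
    }
}
```

A round trip through toBytes() and fromBytes() yields a handle that is equal to, and hashes the same as, the original, which is what a surviving node needs in order to look the activity up again.
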

Performance and fault tolerance (FT)

Beware of the trade-off between performance and fault tolerance: the more resources you replicate, the lower the performance. The Rhino documentation includes benchmarks showing the high-availability (HA) and FT performance of sample SIP and IN services.

In an FT RA you will probably want to use a thread pool, if possible. Accessing a replicated entry requires acquiring a distributed lock, which takes time: perhaps a few milliseconds, or more if there is contention. With more worker threads, the RA can keep doing useful work while some threads wait for these locks. Also make sure you are not holding Java locks (for example, inside a synchronized block) while accessing replicated state; otherwise every other thread that needs the same Java lock queues behind the waiting thread, creating a bottleneck.
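
The following is a minimal plain-Java sketch of the threading advice above, assuming a replicated read is slow because it waits for a distributed lock. SlowReplicatedState and EventProcessor are hypothetical stand-ins, not Rhino APIs, and Thread.sleep only simulates the lock wait. The point is the shape: work is spread over a fixed-size pool, and the Java lock guards only quick local bookkeeping and is released before the replicated read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for replicated state (not a Rhino API): every read
// sleeps to mimic the time spent acquiring a distributed lock.
final class SlowReplicatedState {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final long lockDelayMillis;

    SlowReplicatedState(long lockDelayMillis) {
        this.lockDelayMillis = lockDelayMillis;
    }

    void write(String key, String value) {
        data.put(key, value);
    }

    String read(String key) throws InterruptedException {
        Thread.sleep(lockDelayMillis); // simulated distributed-lock wait
        return data.get(key);          // null if the key is absent
    }
}

// Hypothetical RA-side processor that spreads work over a worker pool.
final class EventProcessor {
    private final Object localLock = new Object();
    private int inFlight = 0; // guarded by localLock
    private final SlowReplicatedState state;
    private final ExecutorService workers;

    EventProcessor(SlowReplicatedState state, int threads) {
        this.state = state;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    private String handle(String key) throws InterruptedException {
        synchronized (localLock) { // short critical section: local bookkeeping only
            inFlight++;
        }
        try {
            // No Java lock is held here, so other workers keep running while
            // this thread waits on the (simulated) distributed lock.
            return state.read(key);
        } finally {
            synchronized (localLock) {
                inFlight--;
            }
        }
    }

    // Submit one task per key and collect results in submission order.
    List<String> handleAll(List<String> keys) {
        List<Future<String>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(workers.submit(() -> handle(key)));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    int inFlightCount() {
        synchronized (localLock) {
            return inFlight;
        }
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

Because no Java lock is held during read(), several workers can wait on their (simulated) distributed locks at the same time; had handle() held localLock across the read, the reads would be serialized and the extra threads would buy nothing.
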
