Whenever a method on an RMI stub is called, there is always
a possibility that a RemoteException
could be thrown.
There are many reasons that the exception could be thrown; do not assume
that the error is necessarily on the remote system's end! The
reasons include but are certainly not limited to:
- The RMI server machine (callee) has disconnected from the network due to networking issues on its end.
- The remote RMI application has stopped or locked up.
- Networking problems on the caller's end.
- Network infrastructure problems between the two machines.
Not handling a RemoteException
can cause a cascade of problems,
so RemoteExceptions
must ALWAYS BE CAUGHT and handled in a manner that keeps the system
operational.
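As a minimal sketch of catching the exception at the call site, consider the following. The remote interface `IChatPeer` and the helper `SafeSender.trySend` are hypothetical names made up for illustration; they are not part of RMI or of any particular system described here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface, for illustration only.
interface IChatPeer extends Remote {
    void receiveText(String text) throws RemoteException;
}

// Hypothetical helper that wraps one remote call.
class SafeSender {
    /**
     * Attempts a single remote call on the given stub.
     * Returns true if the call completed, false if a RemoteException was caught.
     */
    static boolean trySend(IChatPeer stub, String text) {
        try {
            stub.receiveText(text);
            return true;
        } catch (RemoteException e) {
            // The cause could be on either end or in the network between them,
            // so only report the failure here and let the caller decide what to do.
            System.err.println("Remote call failed: " + e.getMessage());
            return false;
        }
    }
}
```

The point of returning a flag rather than rethrowing is that the calling code keeps running and can apply whatever recovery policy the system uses.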
The problem of RemoteExceptions
becomes more acute in systems
where multiple RMI clients are relying on communicating with the same RMI
server. This situation occurs both in client-server star
topologies and in peer-to-peer connected-graph topologies.
The difficult issue here is that any given RMI client does not necessarily know
if any other RMI client is having difficulty communicating with a particular RMI
server. As the above list of possible reasons for RemoteExceptions
illustrates, the problem might not actually be on the RMI server end.
The problem could be isolated to the RMI client end, and if that RMI client
takes unilateral action, it could cause problems for the overall system when
other RMI clients are not experiencing the same issues.
There are many possible techniques for handling RemoteExceptions, all of which
have both pros and cons. Here are a few options to consider, but no one should
feel limited to only these possibilities!
The following options all assume that each RMI client has a list of possible
RMI stubs with which they are all communicating (referred to as a "room" below).
"Authoritative server" refers to systems where there is a centralized controller
of room membership. Not all systems have such centralized control.
- "Hard-core" -- When a single RemoteException is caught, the generating RMI
  stub is completely removed from the room locally and, for example, a message
  is sent to all other stubs in the room telling them to remove the offending stub.
  - Pros:
  - Cons:
    - Does not allow for any sort of temporary error condition.
    - Assumes that the error is on the RMI server end.
  - Notes:
    - The global removal process must be coordinated with an authoritative
      server if it exists.
    - This can cause a problem if the authoritative server is required to
      define the removal and the offending stub is that of the authoritative
      server.
- "Quarantine" -- The offending stub is put into a local quarantine list but
  left in the room temporarily. After N exceptions in a row without success,
  the offending stub is removed from the room and the other clients are
  notified of the removal.
  - Pros:
    - Allows for recovery from temporary issues.
  - Cons:
    - Requires continual checking of all stubs' quarantine status.
    - Assumes that the error is on the RMI server end.
  - Notes:
    - The global removal process must be coordinated with an authoritative
      server if it exists.
    - This can cause a problem if the authoritative server is required to
      define the removal and the offending stub is that of the authoritative
      server.
- "Consensus/Voting" -- A special warning message is sent out to the room when
  an exception is caught. When a consensus or majority of RMI clients indicate
  that a particular RMI stub is having problems (e.g. a certain percentage of
  the total), that stub is globally removed. In authoritative server systems,
  the authoritative server would determine when to remove an offending stub
  from the room.
  - Pros:
    - Allows RMI client end problems to be handled.
      - For instance, a "repeat complainer" client could be removed if its
        warnings are unsubstantiated by other clients.
    - In authoritative server systems, the more complex operations can be
      encapsulated in the server.
    - Potentially less checking than the "Quarantine" technique.
  - Cons:
    - Requires a special message to be added to the system.
    - Can be difficult to achieve a robust convergent process for
      non-authoritative, i.e. pure peer-to-peer, systems.
    - In authoritative systems, it is very difficult to handle the situation
      when the offending stub is the authoritative server itself.
      - This is a fundamental problem with authoritative server systems.
  - Notes:
    - May require quarantine-like processes to handle a non-responsive
      authoritative server, negating some of the advantages of this technique.
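
The bookkeeping behind the "Quarantine" option could be sketched roughly as follows. `FailureTracker`, its method names, and the generic key type `S` (whatever the system uses to identify a stub in the room) are hypothetical and chosen only for illustration; the threshold N is passed to the constructor. This is one possible arrangement under those assumptions, not a definitive implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts consecutive failed calls per stub.
class FailureTracker<S> {
    private final int maxConsecutiveFailures;                   // the "N" of the Quarantine option
    private final Map<S, Integer> quarantine = new HashMap<>(); // stub -> failures in a row

    FailureTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Records one caught RemoteException for the stub.
     * Returns true when the stub has reached the threshold and should be
     * removed from the room; its quarantine entry is cleared at that point.
     */
    synchronized boolean recordFailure(S stub) {
        int count = quarantine.merge(stub, 1, Integer::sum);
        if (count >= maxConsecutiveFailures) {
            quarantine.remove(stub);
            return true;
        }
        return false;
    }

    /** A successful call ends the run of failures and releases the stub from quarantine. */
    synchronized void recordSuccess(S stub) {
        quarantine.remove(stub);
    }

    /** True if the stub currently has at least one unresolved failure. */
    synchronized boolean isQuarantined(S stub) {
        return quarantine.containsKey(stub);
    }
}
```

A caller would invoke `recordSuccess` after each remote call that returns normally and `recordFailure` in the `catch` block for `RemoteException`; a `true` result is the cue to remove the stub from the room and notify the other clients, coordinated with the authoritative server if one exists. With a threshold of 1, the same class behaves like the "Hard-core" option.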
© 2020 by Stephen Wong