COMP 310
Distributed Systems
Topics:
Server vs. Microservices
[PRELIMINARY DRAFT -- This page is still
under construction! Check back often for updates.]
There are two common ways in which people see the interconnected pieces
of a distributed system (or possibly a hybrid of the two):
- Collection of "servers" -- A server represents a
collection of network connection endpoints
that access a shared operational entity which coordinates the
endpoint behaviors through shared internal operations, resources, and state.
This view sees the whole system as being made up of relatively
large, more monolithic components.
- Collection of "microservices" -- A microservice
represents a single network connection
endpoint that performs a single functional operation. The
microservices are maximally decoupled from each other and are generally
designed to be called independently. This view sees the
whole as being composed of relatively small, independent functions.
All the major cloud service providers now have both server and microservice
offerings, and some, e.g. Microsoft Azure, offer services that are positioned
between the two ideas.
The classic example of the "server" architecture viewpoint would be Amazon
EC2 virtual servers. The classic microservice example would be
the Google AppEngine Standard Environment.
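The two viewpoints can be contrasted in miniature. The sketch below uses invented names (`ChatServer`, `post_service`, etc.) and is only an illustration of the concepts, not any real cloud API: the "server" is one entity whose endpoints share in-memory state, while the "microservices" are independent stateless functions whose shared state must live in an external store passed in explicitly.

```python
class ChatServer:
    """'Server' view: one entity whose endpoints share in-memory state."""
    def __init__(self):
        self.messages = []          # shared internal state

    def post(self, text):           # endpoint 1: writes the shared state
        self.messages.append(text)
        return len(self.messages)

    def history(self):              # endpoint 2: reads the same state
        return list(self.messages)


# 'Microservice' view: each endpoint is an independent, stateless function.
# Any shared state must be held externally and handed in on every call.
def post_service(store, text):
    store.append(text)
    return len(store)

def history_service(store):
    return list(store)
```

The functional style trades the convenience of implicit shared state for the freedom to run each operation anywhere, on any instance.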
Pros and Cons
Server architecture:
- Pros:
- Easy to conceptualize because it is most similar to a ground-based
server architecture -- the distributed system in fact may consist solely
of ground-based machines.
- Easy to migrate from ground-based system to a cloud-based system.
- The cloud infrastructure mostly just adds load balancing
and networking, shared databases, and automatic system scaling.
- Shared state is more easily conceptualized and implemented.
- Synchronizing data between operations is thus easier so long as
that data doesn't cross server boundaries.
- Can be faster for operations that involve shared data.
- Cons:
- Has trouble scaling due to shared state.
- That is, in a load-balanced system of multiple identical
servers, the next request isn't guaranteed to go to the same server
and thus will not access the same state.
- This also affects the ability of multiple systems to perform
parallelized operations if shared data is involved.
- Also, the scaling is at the server level, so if there is a high
load on a single operation, the entire server must be scaled, not
just that one operation.
- Maintaining any part of the system requires that the entire system
be redeployed. For instance, if a single endpoint's
functionality needs to be upgraded, the entire server must be updated.
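The shared-state scaling problem above can be made concrete with a toy round-robin load balancer (invented names; a sketch, not a real balancer): two identical server instances each keep their own in-memory counter, so consecutive requests land on different instances and neither instance's state reflects the true total.

```python
import itertools

class Counter:
    """One 'server' instance with its own in-memory state."""
    def __init__(self):
        self.count = 0

    def hit(self):
        self.count += 1
        return self.count

def round_robin(servers):
    """Toy load balancer: requests alternate among identical instances."""
    pool = itertools.cycle(servers)
    def dispatch():
        return next(pool).hit()
    return dispatch

# Two identical instances behind the balancer: each sees only half the
# requests, so neither counter equals the system-wide total of 4.
```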
Microservice architecture:
- Pros:
- Well suited for maximum scalability and parallelization.
- Microservices are often designed as stateless functional
processes, which maximizes decoupling.
- Microservices can be scaled independently so compute resources
can be more efficiently targeted.
- Easy to maintain because individual microservices can be updated
without disturbing the others.
- Cons:
- More difficult to conceptualize as an operational system than a
conventional, more monolithic server.
- Needs significant surrounding infrastructure.
- Microservices are not typically implemented on ground systems
because of the large infrastructure needed to route calls to
specific endpoint implementations.
- Transitioning from a ground-based system to a cloud-based system
is generally more difficult due to having to re-architect some or
all of the system.
- Accessing shared data is more difficult and potentially slower due
to being restricted to using shared databases and/or memcaches instead of
in-memory shared state.
- Synchronizing data between operations requires accessing shared
data storage, which may be subject to coherency issues.
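The shared-storage pattern that microservices are restricted to can be sketched as follows (invented names; the `SharedStore` class stands in for an external database or memcache): the handler itself holds no state, so any instance of it can serve any request, but every piece of state it touches requires a trip to the store.

```python
import threading

class SharedStore:
    """Stand-in for an external database/memcache. Locked because many
    concurrent handler instances may access it at once."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

def count_hits(store, key):
    """Stateless handler: all state lives in the external store, so two
    different 'instances' of this function see the same data."""
    return store.incr(key)
```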
Servers and Microservices in ChatApp and the Final Project
The following discussion is typically more applicable to the Final
Project but technically is applicable to ChatApp as well.
These projects are designed to introduce the students to a wide range of
issues surrounding distributed systems. The architecture being used
in the projects actually involves both server
and microservice architectures.
"Game Server"
The Final Project has a well-defined notion of a "game server" but even in
ChatApp, the "game server" is really just any message sender that is
sending out messages whose type and processing are initially unknown to the
message receiver.
Since all the messages associated with a game are known a priori to
the game server, the server can internally treat the commands that process those
messages as well-known. That is, all the commands installed on the
game server can have intimate access to any shared game state as well as to each
other. (Note: The commands that a game server sends to the game clients
are NOT necessarily the same as those installed on the game server itself!)
Thus, the game server application most closely matches the "server"
architecture described above. It is able to gain the advantages of a
server without most of the disadvantages because there is only a single game
server instance in a typical Final Project implementation.
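The "intimate access to shared game state" idea can be sketched like this (hypothetical names, not the actual project API): because the server's message types are all known up front, its commands can be written together, closed over the same state object.

```python
class GameState:
    """Shared state that all server-side commands may touch directly."""
    def __init__(self):
        self.scores = {}

def make_server_commands(state):
    """Well-known commands, all closed over the same shared state -- the
    'server' style: intimate coupling is safe because the commands are
    designed together."""
    def add_score(player, pts):
        state.scores[player] = state.scores.get(player, 0) + pts

    def leader():
        return max(state.scores, key=state.scores.get)

    return {"ADD_SCORE": add_score, "LEADER": leader}
```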
"Game Client"
The Final Project has a well-defined notion of a "game client" but even in
ChatApp, the "game client" is really just the receiver of any initially
unknown message type.
Since none of the game-specific message types are known a priori to
a game client, the game client must treat all game-specific messages as
unknown types and thus the commands to process those message types (sourced
originally from the game server) are subject to the sandbox created by the
ICmd2ModelAdapter. Each command processes only a single type
of message, and the commands are naturally decoupled from each other unless a connection is
made through the use of shared data storage (e.g. mixed data dictionary
services).
Thus, the game client application most closely matches the "microservices"
architecture described above. Because every game involves many game
clients, it makes sense to take advantage of the scaling advantages of the
microservices architecture here.
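The client-side dispatch idea can be sketched as follows. The names here are invented for illustration (the real projects define their own interfaces, including ICmd2ModelAdapter); the sketch only shows the shape of the technique: each unknown message type maps to its own command, installed at runtime, and each command sees the model only through a narrow sandbox adapter.

```python
class SandboxAdapter:
    """Stand-in for the restricted model access a command is allowed."""
    def __init__(self):
        self.log = []

    def display(self, text):
        self.log.append(text)

class Client:
    def __init__(self, adapter):
        self.adapter = adapter
        self.cmds = {}              # message type -> command

    def install(self, msg_type, cmd):
        # Commands arrive at runtime (conceptually, from the game server).
        self.cmds[msg_type] = cmd

    def receive(self, msg_type, payload):
        cmd = self.cmds.get(msg_type)
        if cmd is None:
            return "unknown"        # would trigger a request for the command
        return cmd(payload, self.adapter)
```

Because every command is keyed to exactly one message type and can only reach the model through the adapter, the commands are as decoupled as independent microservices.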
Synchronization of Distributed Data
Trying to maintain coherent copies of data across a distributed system is a
very difficult problem. While it is tempting to simply put all of the
shared data onto a single, central server, there are many reasons that this may
not be desirable, such as but not limited to:
- Performance -- working on local copies of the data is typically much
faster than having to access it remotely.
- Robustness -- if one holder of the data fails, then the data can be
reconstructed from the remaining holders.
- Consistency -- if processing of the data is potentially unreliable, e.g.
failing hardware, then consistency checks against other systems can be
performed.
There are different approaches to solving the problem of synchronizing
distributed data:
- Purely distributed protocols -- each holder of the data
acts identically, sending messages to/from each other to come to consensus
about the current state of the replicated data.
- Pros:
- A distributed system fundamentally affords a level of fault
tolerance.
- Every holder in the system is on a par with the others, which gives the
system more flexibility and extensibility.
- Can be more performant for certain situations where processes
can be parallelized and communications traffic can be spread out.
- Cons:
- Synchronization algorithms are not guaranteed to converge in any
given time frame, though certain limits can be deduced.
- Protocols are often very complicated and difficult to implement
properly.
- Can be slow if a lot of communications traffic is required.
- Very difficult to debug when something goes wrong.
- Can present multiple attack surfaces from a security standpoint.
- Authoritative server protocols -- a single holder
defines the current state of the data and all other holders synchronize to
that single holder.
- Pros:
- Easier to implement
- Less prone to difficult bugs or corner cases.
- Typically requires less communications traffic than distributed
protocols
- Usually more provably convergent over a given time frame
- Cons:
- Single point of failure at the authoritative server
- Can present a performance bottleneck
- May not scale well to large numbers of data holders
- Hybrid protocols -- Attempt to gain the advantages of
both distributed and authoritative protocols while minimizing the
disadvantages.
- Techniques employed:
- Add fault tolerance to an authoritative server
- Replicating the authoritative server, often behind
encapsulation barriers.
- Add data verification steps.
- Utilize protocols to transfer the authoritative server to
another machine in the event of failure.
- More difficult to implement than a simplistic, single-machine
authoritative server but less difficult than a purely distributed
system.
- Transfer of authority protocols are non-trivial when done in a
robust manner.
- Fault tolerance and data verification can be difficult to design
and implement.
- Client implementation difficulties can be minimized by making
the system look like a simpler authoritative server system to a
normal data host.
- Complications can be more confined to the authoritative
server's end of the system.
- Performance bottlenecking could still exist unless non-trivial load
balancing infrastructure is included.
- Controlling server -- a single central server
creates and manages all synchronized data. Clients must always access the central server for any operations involving the synchronized data.
- Pros:
- Arguably provides definitive synchronization of the data because the singularity of the data precludes the need for synchronization.
- Relatively easy to set up and run.
- Cons:
- Not scalable because everything must be routed through the central server.
- Tightly couples all of the system's participants together because they must all be tied to the central server.
- This restriction can greatly reduce the flexibility and extensibility of the system.
- Can cause the system to be very slow because all data access must involve a remote network call.
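The first two approaches above can be contrasted in a toy model (invented names; synchronous, reliable communication is assumed, which real protocols cannot): a purely distributed "max gossip" on a ring topology, where each node only hears from one neighbor per round, versus a single authoritative holder whose copy simply defines the data.

```python
def ring_round(values):
    """One gossip round: node i hears only its right neighbor (ring
    topology) and keeps the larger value."""
    n = len(values)
    return [max(values[i], values[(i + 1) % n]) for i in range(n)]

def gossip_converge(values):
    """Repeat rounds until every replica agrees; return state and round
    count. On a ring of n nodes this takes at most n - 1 rounds -- an
    example of a deducible (but topology-dependent) convergence limit."""
    rounds = 0
    while len(set(values)) > 1:
        values = ring_round(values)
        rounds += 1
    return values, rounds

class Authority:
    """Authoritative holder: its copy *defines* the current data."""
    def __init__(self, value):
        self.value = value

def authoritative_sync(authority, replicas):
    """Every replica converges in a single exchange with the authority."""
    return [authority.value for _ in replicas]
```

The gossip version needs multiple rounds and per-node logic but has no single point of failure; the authoritative version converges in one step but dies with its server.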
There are many systems, both commercial and open-source, that tackle this
very difficult problem of synchronizing data and operations across a distributed
system.
Here are a few examples:
- RAFT Consensus Algorithm -- distributed process with a "strong leader"
notion.
- Apache ZooKeeper -- centralized server.
- PAXOS family of consensus algorithms -- assumes a network of unreliable
processors.
© 2017 by Stephen Wong