COMP 310
Distributed Systems
Topics:
Server vs. Microservices
[PRELIMINARY DRAFT -- This page is still
under construction! Check back often for updates.]
There are two common ways in which people see the interconnected pieces
of a distributed system (or possibly a hybrid of the two):
- Collection of "servers" -- A server represents a
collection of network connection endpoints
that access a shared operational entity which coordinates the
endpoint behaviors through shared internal operations, resources, and state.
This view sees the whole system as being made up of relatively
large, more monolithic components.
- Collection of "microservices" -- A microservice
represents a single network connection
endpoint that performs a single functional operation. The
microservices are maximally decoupled from each other and are generally
designed to be called independently. This view sees the
whole as being composed of relatively small, independent functions.
All the major cloud service providers now have both server and microservice
offerings, and some, e.g. Microsoft Azure, offer services that are positioned
between the two ideas.
The classic example of the "server" architecture viewpoint would be Amazon
EC2 virtual servers. The classic microservice example would be
the Google AppEngine Standard Environment.
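The two viewpoints can be contrasted in miniature. The sketch below uses invented names (`ChatServer`, `post_service`, etc.) and is only an illustration of the concepts, not any real cloud API: the "server" is one entity whose endpoints share in-memory state, while the "microservices" are independent stateless functions whose shared state must live in an external store passed in explicitly.

```python
class ChatServer:
    """'Server' view: one entity whose endpoints share in-memory state."""
    def __init__(self):
        self.messages = []          # shared internal state

    def post(self, text):           # endpoint 1: writes the shared state
        self.messages.append(text)
        return len(self.messages)

    def history(self):              # endpoint 2: reads the same state
        return list(self.messages)


# 'Microservice' view: each endpoint is an independent, stateless function.
# Any shared state must be held externally and handed in on every call.
def post_service(store, text):
    store.append(text)
    return len(store)

def history_service(store):
    return list(store)
```

The functional style trades the convenience of implicit shared state for the freedom to run each operation anywhere, on any instance.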
Pros and Cons
Server architecture:
- Pros:
- Easy to conceptualize because it is most similar to a ground-based
server architecture -- the distributed system in fact may consist solely
of ground-based machines.
- Easy to migrate from ground-based system to a cloud-based system.
- The cloud infrastructure mostly just adds load balancing
and networking, shared databases, and automatic system scaling.
- Shared state is more easily conceptualized and implemented.
- Synchronizing data between operations is thus easier so long as
that data doesn't cross server boundaries.
- Can be faster for operations that involve shared data.
- Cons:
- Has trouble scaling due to shared state.
- That is, in a load-balanced system of multiple identical
servers, the next request isn't guaranteed to go to the same server
and thus will not access the same state.
- This also affects the ability of multiple systems to perform
parallelized operations if shared data is involved.
- Also, the scaling is at the server level, so if there is a high
load on a single operation, the entire server must be scaled, not
just that one operation.
- Maintaining any part of the system requires that the entire system
be redeployed. For instance, if a single endpoint's
functionality needs to be upgraded, the entire server must be updated.
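The shared-state scaling problem above can be made concrete with a toy round-robin load balancer (invented names; a sketch, not a real balancer): two identical server instances each keep their own in-memory counter, so consecutive requests land on different instances and neither instance's state reflects the true total.

```python
import itertools

class Counter:
    """One 'server' instance with its own in-memory state."""
    def __init__(self):
        self.count = 0

    def hit(self):
        self.count += 1
        return self.count

def round_robin(servers):
    """Toy load balancer: requests alternate among identical instances."""
    pool = itertools.cycle(servers)
    def dispatch():
        return next(pool).hit()
    return dispatch

# Two identical instances behind the balancer: each sees only half the
# requests, so neither counter equals the system-wide total of 4.
```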
Microservice architecture:
- Pros:
- Well suited for maximum scalability and parallelization.
- Microservices are often designed as stateless functional
processes, which maximizes decoupling.
- Microservices can be scaled independently so compute resources
can be more efficiently targeted.
- Easy to maintain because individual microservices can be updated
without disturbing the others.
- Cons:
- More difficult to conceptualize as an operational system than a
conventional, more monolithic server.
- Needs significant surrounding infrastructure.
- Microservices are not typically implemented on ground systems
because of the large infrastructure needed to route calls to
specific endpoint implementations.
- Transitioning from a ground-based system to a cloud-based system
is generally more difficult due to having to re-architect some or
all of the system.
- Accessing shared data is more difficult and potentially slower due
to being restricted to using shared databases and/or memcaches instead of
in-memory shared state.
- Synchronizing data between operations requires accessing shared
data storage, which may be subject to coherency issues.
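The shared-storage pattern that microservices are restricted to can be sketched as follows (invented names; the `SharedStore` class stands in for an external database or memcache): the handler itself holds no state, so any instance of it can serve any request, but every piece of state it touches requires a trip to the store.

```python
import threading

class SharedStore:
    """Stand-in for an external database/memcache. Locked because many
    concurrent handler instances may access it at once."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

def count_hits(store, key):
    """Stateless handler: all state lives in the external store, so two
    different 'instances' of this function see the same data."""
    return store.incr(key)
```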
Servers and Microservices in ChatApp and the Final Project
The following discussion is typically more applicable to the Final
Project but technically is applicable to ChatApp as well.
These projects are designed to introduce the students to a wide range of
issues surrounding distributed systems. The architecture being used
in the projects actually involves both server
and microservice architectures.
"Game Server"
The Final Project has a well-defined notion of a "game server" but even in
ChatApp, the "game server" is really just any message sender that is
sending out messages whose type and processing are initially unknown to the
message receiver.
Since all the messages associated with a game are known a priori to
the game server, the server can internally treat the commands that process those
messages as well-known. That is, all the commands installed on the
game server can have intimate access to any shared game state as well as to each
other. (Note: The commands that a game server sends to the game clients
are NOT necessarily the same as those installed on the game server itself!)
Thus, the game server application most closely matches the "server"
architecture described above. It is able to gain the advantages of a
server without most of the disadvantages because there is only a single game
server instance in a typical Final Project implementation.
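The "intimate access to shared game state" idea can be sketched like this (hypothetical names, not the actual project API): because the server's message types are all known up front, its commands can be written together, closed over the same state object.

```python
class GameState:
    """Shared state that all server-side commands may touch directly."""
    def __init__(self):
        self.scores = {}

def make_server_commands(state):
    """Well-known commands, all closed over the same shared state -- the
    'server' style: intimate coupling is safe because the commands are
    designed together."""
    def add_score(player, pts):
        state.scores[player] = state.scores.get(player, 0) + pts

    def leader():
        return max(state.scores, key=state.scores.get)

    return {"ADD_SCORE": add_score, "LEADER": leader}
```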
"Game Client"
The Final Project has a well-defined notion of a "game client" but even in
ChatApp, the "game client" is really just the receiver of any initially
unknown message type.
Since none of the game-specific message types are known a priori to
a game client, the game client must treat all game-specific messages as
unknown types and thus the commands to process those message types (sourced
originally from the game server) are subject to the sandbox created by the
ICmd2ModelAdapter. Each command processes only a single type
of message, and the commands are naturally decoupled from each other unless a connection is
made through the use of shared data storage (e.g. mixed data dictionary
services).
Thus, the game client application most closely matches the "microservices"
architecture described above. Because every game involves many game
clients, it makes sense to take advantage of the scaling advantages of the
microservices architecture here.
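The client-side dispatch idea can be sketched as follows. The names here are invented for illustration (the real projects define their own interfaces, including ICmd2ModelAdapter); the sketch only shows the shape of the technique: each unknown message type maps to its own command, installed at runtime, and each command sees the model only through a narrow sandbox adapter.

```python
class SandboxAdapter:
    """Stand-in for the restricted model access a command is allowed."""
    def __init__(self):
        self.log = []

    def display(self, text):
        self.log.append(text)

class Client:
    def __init__(self, adapter):
        self.adapter = adapter
        self.cmds = {}              # message type -> command

    def install(self, msg_type, cmd):
        # Commands arrive at runtime (conceptually, from the game server).
        self.cmds[msg_type] = cmd

    def receive(self, msg_type, payload):
        cmd = self.cmds.get(msg_type)
        if cmd is None:
            return "unknown"        # would trigger a request for the command
        return cmd(payload, self.adapter)
```

Because every command is keyed to exactly one message type and can only reach the model through the adapter, the commands are as decoupled as independent microservices.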
Synchronization of Distributed Data
Trying to maintain coherent copies of data across a distributed system is a
very difficult problem. While it is tempting to simply put all of the
shared data onto a single, central server, there are many reasons that this may
not be desirable, such as but not limited to:
- Performance -- working on local copies of the data is typically much
faster than having to access it remotely.
- Robustness -- if one holder of the data fails, then the data can be
reconstructed from the remaining holders.
- Consistency -- if processing of the data is potentially unreliable, e.g.
failing hardware, then consistency checks against other systems can be
performed.
There are different approaches to solving the problem of synchronizing
distributed data:
- Purely distributed protocols -- each holder of the data
acts identically, sending messages to/from each other to come to consensus
about the current state of the replicated data.
- Pros:
- A distributed system fundamentally affords a level of fault
tolerance.
- Every holder in the system is on a par with the others, which gives the
system more flexibility and extensibility.
- Can be more performant for certain situations where processes
can be parallelized and communications traffic can be spread out.
- Cons:
- Synchronization algorithms are not guaranteed to converge in any
given time frame, though certain limits can be deduced.
- Protocols are often very complicated and difficult to implement
properly.
- Can be slow if a lot of communications traffic is required.
- Very difficult to debug when something goes wrong.
- Can present multiple attack surfaces from a security standpoint.
- Authoritative server protocols -- a single holder
defines the current state of the data and all other holders synchronize to
that single holder.
- Pros:
- Easier to implement
- Less prone to difficult bugs or corner cases.
- Typically requires less communications traffic than distributed
protocols
- Usually more provably convergent over a given time frame
- Cons:
- Single point of failure at the authoritative server
- Can present a performance bottleneck
- May not scale well to large numbers of data holders
- Hybrid protocols -- Attempt to gain the advantages of
both distributed and authoritative protocols while minimizing the
disadvantages.
- Techniques employed:
- Add fault tolerance to an authoritative server
- Replicating the authoritative server, often behind
encapsulation barriers.
- Add data verification steps.
- Utilize protocols to transfer the authoritative server to
another machine in the event of failure.
- More difficult to implement than a simplistic, single-machine
authoritative server but less difficult than a purely distributed
system.
- Transfer of authority protocols are non-trivial when done in a
robust manner.
- Fault tolerance and data verification can be difficult to design
and implement.
- Client implementation difficulties can be minimized by making
the system look like a simpler authoritative server system to a
normal data host.
- Complications can be more confined to the authoritative
server's end of the system.
- Performance bottlenecking could still exist unless non-trivial load
balancing infrastructure is included.
- Controlling server -- a single central server
creates and manages all synchronized data. Clients must always access the central server for any operations involving the synchronized data.
- Pros:
- Arguably provides definitive synchronization of the data because the singularity of the data precludes the need for synchronization.
- Relatively easy to set up and run.
- Cons:
- Not scalable because everything must be routed through the central server.
- Tightly couples all of the system's participants together because they must all be tied to the central server.
- This restriction can greatly reduce the flexibility and extensibility of the system.
- Can cause the system to be very slow because all data access must involve a remote network call.
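The first two approaches above can be contrasted in a toy model (invented names; synchronous, reliable communication is assumed, which real protocols cannot): a purely distributed "max gossip" on a ring topology, where each node only hears from one neighbor per round, versus a single authoritative holder whose copy simply defines the data.

```python
def ring_round(values):
    """One gossip round: node i hears only its right neighbor (ring
    topology) and keeps the larger value."""
    n = len(values)
    return [max(values[i], values[(i + 1) % n]) for i in range(n)]

def gossip_converge(values):
    """Repeat rounds until every replica agrees; return state and round
    count. On a ring of n nodes this takes at most n - 1 rounds -- an
    example of a deducible (but topology-dependent) convergence limit."""
    rounds = 0
    while len(set(values)) > 1:
        values = ring_round(values)
        rounds += 1
    return values, rounds

class Authority:
    """Authoritative holder: its copy *defines* the current data."""
    def __init__(self, value):
        self.value = value

def authoritative_sync(authority, replicas):
    """Every replica converges in a single exchange with the authority."""
    return [authority.value for _ in replicas]
```

The gossip version needs multiple rounds and per-node logic but has no single point of failure; the authoritative version converges in one step but dies with its server.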
There are many systems, both commercial and open-source, that tackle this
very difficult problem of synchronizing data and operations across a distributed
system.
Here are a few examples:
- RAFT Consensus Algorithm -- distributed process with a "strong leader"
notion.
- Apache ZooKeeper -- centralized server.
- PAXOS family of consensus algorithms -- assumes a network of unreliable
processors.
© 2017 by Stephen Wong