Comp405: Data Storage in GAE

COMP 405 Spring 2014	Data Storage in GAE

There are three basic ways to store data in GAE:

Cloud Storage -- a file-based storage facility that is technically separate from GAE.
Cloud SQL -- a relational database system (RDBS) based on MySQL and queried with standard SQL.
App Engine Datastore -- a table-oriented, schema-less storage system. (GQL, the SQL-like query language, belongs to the Datastore rather than to Cloud SQL.)

Consistency issues with distributed DBs....

App Engine Datastore

Here, the data storage is much like a giant, distributed key-value dictionary. The value contains multiple pieces of data, all related to the same key.

What corresponds to a traditional key-value pair is called an Entity.

The key of an Entity has 3 parts:

Kind - a string that defines a class of Entities
Identifier - a string (user assigned) or an integer (GAE assigned) that uniquely identifies this instance of the data
Ancestor path - an optional reference to another Entity that serves as the "parent" of this Entity. A root Entity and all its "children" form an "Entity group".
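
To make the three-part key concrete, here is a minimal plain-Java sketch. It does not use the GAE SDK: the class EntityKeySketch and all of its methods are hypothetical names invented for illustration, and the real Datastore key class differs in detail.

```java
import java.util.Objects;

/** Hypothetical model of a Datastore-style key; not the GAE SDK's key class. */
final class EntityKeySketch {
    private final EntityKeySketch parent; // ancestor path; null for a root Entity
    private final String kind;            // class of Entities, e.g. "User"
    private final String name;            // user-assigned string identifier, or null
    private final long id;                // integer identifier (0 when a name is used)

    private EntityKeySketch(EntityKeySketch parent, String kind, String name, long id) {
        this.parent = parent;
        this.kind = Objects.requireNonNull(kind);
        this.name = name;
        this.id = id;
    }

    /** Key with a user-assigned string identifier. */
    static EntityKeySketch of(EntityKeySketch parent, String kind, String name) {
        return new EntityKeySketch(parent, kind, Objects.requireNonNull(name), 0L);
    }

    /** Key with an integer identifier, as the datastore would assign one. */
    static EntityKeySketch of(EntityKeySketch parent, String kind, long id) {
        return new EntityKeySketch(parent, kind, null, id);
    }

    /** Walks up the ancestor path to the root Entity's key. */
    EntityKeySketch root() {
        EntityKeySketch k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    /** Two keys are in the same Entity group when they share a root. */
    boolean sameEntityGroup(EntityKeySketch other) {
        return root().equals(other.root());
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EntityKeySketch)) return false;
        EntityKeySketch k = (EntityKeySketch) o;
        return Objects.equals(parent, k.parent) && kind.equals(k.kind)
                && Objects.equals(name, k.name) && id == k.id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, name, id);
    }

    @Override
    public String toString() {
        String self = kind + "(" + (name != null ? "\"" + name + "\"" : Long.toString(id)) + ")";
        return parent == null ? self : parent + "/" + self;
    }
}
```

For example, with alice = EntityKeySketch.of(null, "User", "alice"), the key EntityKeySketch.of(alice, "Photo", 42L) has the string form User("alice")/Photo(42) and is in the same Entity group as alice.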

There are several ways to access data from Java:

Low Level Datastore API
Java Data Objects (JDO) -- an Object Relational Mapping (ORM) standard that is independent of the underlying DB implementation.
Java Persistence API (JPA) - an ORM standard geared towards use on RDBS, but adapted to be used on GAE's NoSQL Datastore.

An easier approach than the above 3 techniques is to use Objectify, a package that makes it easier to use the datastore. Objectify is not an industry standard like JDO or JPA, but it is geared towards getting novices up and running faster.

Objectify Home
- Concepts -- nice intro to both Objectify as well as GAE Datastore
- User Manual
  - Setup
    - Download the objectify-X.Y.Z.jar (e.g. objectify-4.0b2.jar) into your war/WEB-INF/lib folder
    - Set your Java build path to include this new JAR file.
    - Note: Objectify's setup directions on how to get Eclipse to auto-generate the static import for ofy() neglect to mention that to enter the class name, you need to click the "New Member..." button.
    - Add the following lines to your app.yaml (this auto-generates what Objectify's setup instructions say to add to the web.xml file):
  - Defining Entities
    - Registering Entities -- all classes being used for data storage must be explicitly "registered" during the initialization of the servlet that uses them!
    - Annotation Reference
  - Basic Operations
    - Warning: Do NOT retain a reference to the object returned by ofy()! Always call this factory method to ensure that you get the proper Objectify object for your program's current context.

Guaranteeing Uniqueness

It is very common to require that a new Entity being stored in the DB be unique, e.g. a new user in a system. In a typical RDBS, one does this by declaring a key to be unique. But GAE's Datastore doesn't quite work that way.

GAE Datastore has two important "features" to be aware of:

There is no built-in operation to test if an Entity with the same Id exists in the database.
There are no checks on whether you are creating a new Entity in the DB or overwriting an existing one.

To test if something is in the DB, one essentially must try to read it and then react if the requested Entity does not yet exist.

Since the process of making sure an Entity is unique involves multiple DB interactions, the use of "transactions" is in order. A transaction is a unit of work on a DB that ensures that the operations in that work form an atomic process on the DB.

In pseudo-code:

Start a transaction
Make the new Entity
Try to get the new Entity from the DB -- note that the Entity is retrieved on the basis of its Id, not the rest of its data values.
- If no Entity found, save the entity. Success!
- If the Entity is found, then a duplicate exists.
If the transaction fails, then it means that someone saved an Entity with the same Id between when the Entity was unsuccessfully read from the DB and when the new Entity was committed into the DB.
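
The steps above, including the failure case, can be simulated without GAE at all. The following is a single-threaded plain-Java sketch under stated assumptions: TinyStoreSketch, Txn, and createIfAbsent are hypothetical names, and the "transaction" is a simple optimistic version check over the whole store, which is a simplification and not how the Datastore or Objectify is actually implemented.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a datastore with optimistic transactions. */
final class TinyStoreSketch {
    private final Map<String, String> data = new HashMap<>();
    private long version = 0; // bumped on every committed write

    /** A transaction remembers the store version at its start and buffers one write. */
    final class Txn {
        private final long startVersion = version;
        private String pendingId;
        private String pendingValue;

        /** Reads by Id only; returns null if no Entity with that Id exists. */
        String get(String id) {
            return data.get(id);
        }

        /** Buffers a write; nothing is stored until commit(). */
        void put(String id, String value) {
            pendingId = id;
            pendingValue = value;
        }

        /** Fails if anything was committed to the store after this transaction began. */
        void commit() {
            if (version != startVersion) {
                throw new ConcurrentModificationException("store changed since transaction began");
            }
            if (pendingId != null) {
                data.put(pendingId, pendingValue);
                version++;
            }
        }
    }

    Txn begin() {
        return new Txn();
    }

    /** Returns true if the Entity was created, false if the Id was already taken. */
    boolean createIfAbsent(String id, String value) {
        Txn txn = begin();            // Start a transaction
        if (txn.get(id) != null) {    // Try to get the Entity by Id
            return false;             // Found, so a duplicate exists
        }
        txn.put(id, value);           // Not found, so save it
        txn.commit();                 // May throw if someone else wrote in between
        return true;
    }
}
```

If a second writer commits an Entity between another transaction's unsuccessful read and its commit, that other transaction's commit() throws, which mirrors the failure case described in the last step of the pseudo-code.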

Example: (IResult is an abstract result type used to represent the outcome of an operation: success, failure, etc.)

		try {
			return ofy().transact(new Work<IResult>(){  // Start a transaction

				@Override
				public IResult run() {
					User newUser = new User(name, groups);   // Make the desired Entity
					
					if (null == ofy().load().entity(newUser).getValue()) {   // Try to read the Entity from the DB
						ofy().save().entity(newUser);   // Nothing found, so safe to save
						return new SuccessResult("Successfully created new user "+name);
					}
					else {  // Entity with same Id found in DB, so error.
						return new FailResult("User "+name+" already exists!");
					}
				}
			});
		}
		catch(Exception e){  // Exception will be thrown if concurrent modification occurs.
			return new FailResult("Exception: "+e.getMessage());  
		}

References:

Database transactions -- Wikipedia article
Objectify documentation on transactions
Is there such thing as @Unique to mark field with unique value -- discussion about working with unique values in GAE
Best way to see if entity exist -- longer discussion of how to save unique values.
