The popularity of the internet has shown us that there are lots of things in the world. Websites, people, pictures, movies – the list goes on. While the internet is notoriously ‘unstructured’, many of these things are identifiable by a simple addressing scheme – a URL. The rise of social networking sites is but one example of the value of storing the relationships between things. While these sites use exciting technology to manage those relationships at internet scale, their stores are invariably application-specific.
In a previous blog post we introduced the Windows Azure Graph Store, or WAGS, as a general-purpose store offered as a managed service, and we described the REST interface that serves as the API to the service. That post described the how of WAGS but not the why. In this follow-up post, we will attempt to describe why a service like this is important for solving some of the challenges faced by application developers in the cloud.
If you’ve read the previous post you will know that WAGS stores tuples of data. Tuples are a great way of representing relationships between things that can be identified by strings. In the internet of things, all things are identified by URLs, and since URLs are just strings, a tuple in WAGS can represent a relationship between any of those things.
What can I do with this? Well, imagine that I want to write an application that, for a set of people, tracks the things (other people, photos, interesting websites, etc.) that they’re interested in (sound familiar?). The computer scientists amongst us will recognize that the data structure required to store this kind of information is known as a graph. A graph is simply a set of nodes (data entities) connected by edges, which represent the relationships between nodes. This kind of data structure is extremely flexible and is helpful for a wide variety of applications.
OrgShare.net – An ‘Interests’ Application
Let’s dive into our ‘interests’ sample application, OrgShare.net. We can see that people, pictures, websites, etc. can be stored as nodes and the interests between them can be stored as edges. I can use WAGS to store the nodes and edges of my graph (see the Quick Start Guide for details on the POST operation to store nodes and edges) and that is sufficient to provide a simple data model for the application. Once the interest relationships have been stored, the application can traverse from a person to the websites that they are interested in and in turn all of the people that are also interested in those same websites.
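To make the traversal concrete, here is a minimal in-memory sketch in Python. It does not call WAGS at all; the example.com URLs, the ‘interestedIn’ label and the helper function are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical URLs used only for illustration; they are not WAGS identifiers.
ALICE = "http://example.com/people/alice"
BOB = "http://example.com/people/bob"
CAROL = "http://example.com/people/carol"
NEWS = "http://example.org/news"
PHOTOS = "http://example.org/photos"

# Each edge is a (subject, relationship, object) tuple of strings.
edges = [
    (ALICE, "interestedIn", NEWS),
    (ALICE, "interestedIn", PHOTOS),
    (BOB, "interestedIn", NEWS),
    (CAROL, "interestedIn", PHOTOS),
]

# Index the edges in both directions so the graph can be walked either way.
outgoing = defaultdict(set)
incoming = defaultdict(set)
for subject, _relationship, obj in edges:
    outgoing[subject].add(obj)
    incoming[obj].add(subject)


def people_with_shared_interests(person):
    """Walk person -> interests -> everyone else interested in the same things."""
    shared = set()
    for thing in outgoing[person]:
        shared |= incoming[thing]
    shared.discard(person)
    return shared


print(sorted(people_with_shared_interests(ALICE)))
```

In a real application the tuples would live in the graph store rather than in a Python list, but the two-hop walk is the same idea.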
We can now enhance the data model for OrgShare.net by adding attributes to the interest relationships. This style of application may have many different scenarios that demand attributed relationships, but the most obvious is the ability to store whether a person ‘Likes’ or ‘Dislikes’ a website. Other attributes may include the dates a relationship is effective from or to, or a rating. Because WAGS allows us to store any set of attributes on any node or edge, enhancing the data model and adding these features is very simple. WAGS actually allows me to store any attribute at all on the relationship, of both primitive and complex (including collection) types. I may choose to standardize the set of attributes that I store for a given class of relationship, or I may apply arbitrary and varying attribute sets. The store won’t know beforehand what the schema of the relationships I want to store is, but it is able to accept these payloads.
WAGS provides the ability to store fully featured relationships between entities located anywhere on the internet. Suppose I now want to store a relationship to a relationship. An example of this may be a marriage between two people (the first relationship), which a third person may wish to record that they ‘Like’ (a relationship to a relationship!). Given that the marriage relationship has a URL that uniquely identifies it, and relationships can be constructed between any two URLs, we now have the ability to store these types of relationships as well.
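As a rough illustration of attributed relationships, the following Python sketch keeps a few edges as plain dictionaries whose attribute sets differ from one edge to the next. The field names (‘from’, ‘to’, ‘attributes’, ‘sentiment’ and so on) are assumptions for the example, not a WAGS schema.

```python
# Hypothetical edge records; the attribute names are illustrative only.
interests = [
    {
        "from": "http://example.com/people/alice",
        "to": "http://example.org/news",
        "attributes": {"sentiment": "Like", "rating": 4},
    },
    {
        "from": "http://example.com/people/bob",
        "to": "http://example.org/news",
        "attributes": {"sentiment": "Dislike", "effectiveFrom": "2012-01-01"},
    },
    {
        "from": "http://example.com/people/carol",
        "to": "http://example.org/news",
        # A collection-valued attribute sits alongside primitive ones.
        "attributes": {"sentiment": "Like", "tags": ["daily", "work"]},
    },
]


def who_likes(url, edges):
    """Return the people whose edge to `url` carries sentiment == 'Like'."""
    return [
        edge["from"]
        for edge in edges
        if edge["to"] == url and edge["attributes"].get("sentiment") == "Like"
    ]


likers = who_likes("http://example.org/news", interests)
print(likers)
```

Note that the query uses `.get`, so edges that never recorded a sentiment are simply skipped rather than causing an error, which is the kind of tolerance a varying attribute set calls for.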
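The idea can be sketched in a few lines of Python. This is an in-memory model under stated assumptions: the relationship URLs, the `add_edge` helper and the ‘marriedTo’ and ‘likes’ labels are all hypothetical and say nothing about how WAGS assigns URLs to relationships.

```python
# Every edge is keyed by its own URL, so an edge can be the target of another edge.
edges = {}


def add_edge(edge_url, source, target, kind):
    """Record an edge under its own URL and return that URL."""
    edges[edge_url] = {"from": source, "to": target, "kind": kind}
    return edge_url


# The first relationship: a marriage between two people.
marriage = add_edge(
    "http://example.com/relationships/1",
    "http://example.com/people/alice",
    "http://example.com/people/bob",
    "marriedTo",
)

# The second relationship points at the first one's URL, not at a person.
like = add_edge(
    "http://example.com/relationships/2",
    "http://example.com/people/carol",
    marriage,
    "likes",
)


def targets_a_relationship(edge_url):
    """True when the edge's target is itself the URL of another edge."""
    return edges[edge_url]["to"] in edges


print(targets_a_relationship(like), targets_a_relationship(marriage))
```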
Breaking Out of the Data Silos
While there are clearly internet applications that embrace storing their data as graphs, these stores are usually very application-specific. Often, they require all possible nodes to reside in the same storage, and even if they do provide extensibility for third parties to add their own data, that extensibility usually serves the main application rather than the vendor of the third-party app.
The first constraint, requiring all nodes to exist in the same store, is particularly restrictive. Given the disparate nature of the internet, entities exist in many, many stores and domains with no converging schema or shared set of attributes (nor should there be!). Applications that are driven by the relationships between things typically do not own all, or even any, of these things; they just want to store the relationships. Stores or domains that own things are known as silos. Silos exist for a variety of reasons: data ownership and governance, application performance, a lack of ability to store things in a shareable space… Silos of data will continue to exist for a long time yet, so the ability to ‘break out’ of the silo is an important one if the data is to be leveraged more broadly. Given that the one feature common to all things is a URL that uniquely identifies the thing, the ability provided by WAGS to store a relationship between two URLs allows us to ‘break out’.
Another characteristic of breaking out of data silos is the ability to extend existing entities. An example of this may be the ability to store a payroll number for an employee in the corporate directory. If the employee has a URL that uniquely identifies her in the corporate directory, then instead of taking the traditional approach of paying your directory vendor to extend the schema or pushing central IT to add the extension into the directory, an application developer can simply store the data in a graph store and hoist the relationship between that data and the directory entity. There’s no requirement to synchronize the directory with the graph store and the employee entity is seamlessly extended.
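A minimal Python sketch of this extension pattern follows. Both the directory and the graph store are stand-in dictionaries, and the employee URL, the `payrollNumber` attribute and the `extended_view` helper are assumptions for illustration only.

```python
EMPLOYEE_URL = "http://directory.example.com/employees/42"

# Stand-in for the corporate directory; the application never writes to it.
DIRECTORY = {
    EMPLOYEE_URL: {"name": "Jane Doe", "department": "Sales"},
}

# Stand-in for the graph store: extension data keyed by the directory entity's URL.
extensions = {
    EMPLOYEE_URL: {"payrollNumber": "PR-0042"},
}


def extended_view(employee_url):
    """Combine the directory's attributes with any extension attributes."""
    view = dict(DIRECTORY[employee_url])  # copy, so the directory stays untouched
    view.update(extensions.get(employee_url, {}))
    return view


employee = extended_view(EMPLOYEE_URL)
print(employee)
```

The only thing the two stores share is the employee’s URL, which is why no synchronization between them is needed.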
Semi-Structured Data
As we previously mentioned, the relationships in a graph store are semi-structured: they have structure, but it is neither known in advance nor consistent. This has traditionally been a problem both for the store holding the data and for clients discovering the structure of the data.
Relational stores such as SQL databases typically rely on a well-known schema to provide performant query and update capabilities as well as an efficient layout of their data storage. The switch to cloud-based storage services means that data layout is less important (to the application writer at least), and the simple keying of graph nodes and edges means that query plans do not require sophisticated analysis by a query engine.
The ability for client applications to consume semi-structured data is greatly enhanced by serialization formats employed by the service. JSON (JavaScript Object Notation) is a very simple and easily discoverable format that is based on an implicit understanding that the client may not have any prior knowledge about the payload that it is receiving. This enables a client application to marshal unknown data structures into its own address space and reflect over them. The client still needs to be able to understand the semantics of any given payload, but at least it can do this without deserialization issues.
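As a small illustration of reflecting over a payload with no prior schema, the Python sketch below decodes a JSON document with the standard `json` module and lists the path and type of every leaf value. The payload itself is made up for the example and is not an actual WAGS response.

```python
import json

# A made-up payload; the client is assumed to know nothing about its shape.
payload = """
{
  "sentiment": "Like",
  "rating": 4,
  "tags": ["daily", "work"],
  "effective": {"from": "2012-01-01", "to": null}
}
"""


def describe(value, path=""):
    """Walk a decoded JSON value and list (path, type name) for every leaf."""
    if isinstance(value, dict):
        found = []
        for key, child in value.items():
            found += describe(child, f"{path}/{key}")
        return found
    if isinstance(value, list):
        found = []
        for index, child in enumerate(value):
            found += describe(child, f"{path}/{index}")
        return found
    return [(path, type(value).__name__)]


shape = describe(json.loads(payload))
for leaf_path, type_name in shape:
    print(leaf_path, type_name)
```

Discovering the shape this way tells the client what is in the payload, but deciding what a field such as `rating` means is still up to the application.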
What’s Next?
In my next post, I will discuss how we use WAGS to extend entities within Azure Active Directory (AAD) and the Windows Azure Enterprise Graph.
Please note that at this stage WAGS is only available in public preview mode. This means that although we are supporting it as a real service, we are not enforcing any SLAs.
As always we encourage you to give the service a try. Give us your feedback on how simple you find it to use and how well it fits your needs.
For a sample app that leverages WAGS, check out http://www.orgshare.net/premier-lumber.com (sign in as joe@premier-lumber.com, gustavo@premier-lumber.com or ivo@premier-lumber.com; the password for all three is P@ssword). See what the simplicity of storing relationships between things can do!
Additionally, WAGS has a simple web interface for browsing and managing graphs. Go to https://graphstore.windows.net/default.aspx and start entering queries and issuing updates in the app, as described in the Quick Start Guide. Sign on using your AAD/Office 365 credentials to start managing the permissions of your graphs (if you’re a company administrator).