Why the world is moving towards Graph Databases
We live in a connected world, and making sense of most domains requires processing rich sets of connections to understand what’s really happening. Often, we find that the connections between items are as important as the items themselves.
While existing relational databases can store these relationships, they navigate them with expensive JOIN operations or cross-lookups, often tied to a rigid schema. It turns out that “relational” databases handle relationships poorly. In a graph database, there are no JOINs or lookups. Relationships are stored natively alongside the data elements (the nodes) in a much more flexible format. Everything about the system is optimized for traversing data quickly: millions of connections per second, per core.
What is a Graph Database?
A graph database stores nodes and relationships instead of tables or documents. Data is stored just as you might sketch ideas on a whiteboard. Your data is stored without restricting it to a pre-defined model, allowing a very flexible way of thinking about and using it.
Graph databases address big challenges many of us tackle daily. Modern data problems often involve many-to-many relationships among heterogeneous data, which creates the need to:
- Navigate deep hierarchies,
- Find hidden connections between distant items, and
- Discover inter-relationships between items.
Whether it’s a social network, a payment network, or a road network, you’ll find that everything is an interconnected graph of relationships. And when we want to ask questions about the real world, many of those questions are about the relationships rather than about the individual data elements.
The Property Graph Model
Graph databases portray the data as it is viewed conceptually. This is accomplished by mapping the data items to nodes and their relationships to edges.
A graph database is a database that is based on graph theory. It consists of a set of objects, each of which is either a node or an edge.
- Nodes represent entities or instances such as people, businesses, accounts, or any other item to be tracked. They are roughly the equivalent of a record, relation, or row in a relational database, or a document in a document-store database.
- Edges, also termed relationships, are the lines that connect nodes to other nodes, representing the relationship between them. Meaningful patterns emerge when examining the connections and interconnections of nodes, properties, and edges. Edges can be either directed or undirected. In an undirected graph, an edge connecting two nodes has a single meaning. In a directed graph, the edges connecting two different nodes have different meanings depending on their direction. Edges are the key concept in graph databases, representing an abstraction that is not directly implemented in a relational model or a document-store model.
- Properties are information associated with nodes. For example, if Wikipedia were one of the nodes, it might be tied to properties such as website, reference material, or words that start with the letter w, depending on which aspects of Wikipedia are germane to a given database.
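To make the three building blocks concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database is implemented. The class names (Node, Edge, PropertyGraph), the FOLLOWS relationship type, and the sample people are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity such as a person or a business."""
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed relationship stored as source -> target."""
    source: int
    target: int
    rel_type: str


class PropertyGraph:
    """Hypothetical toy container, for illustration only."""

    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, rel_type):
        self.edges.append(Edge(source, target, rel_type))

    def neighbors(self, node_id, rel_type, directed=True):
        """Nodes connected to node_id by edges of rel_type.

        With directed=True only outgoing edges are followed;
        with directed=False an edge counts in either direction.
        """
        found = []
        for edge in self.edges:
            if edge.rel_type != rel_type:
                continue
            if edge.source == node_id:
                found.append(self.nodes[edge.target])
            elif not directed and edge.target == node_id:
                found.append(self.nodes[edge.source])
        return found


graph = PropertyGraph()
graph.add_node(1, "Person", name="Alice")
graph.add_node(2, "Person", name="Bob")
graph.add_edge(1, 2, "FOLLOWS")  # Alice follows Bob, not the reverse

alice_follows = [n.properties["name"] for n in graph.neighbors(1, "FOLLOWS")]
bob_follows = [n.properties["name"] for n in graph.neighbors(2, "FOLLOWS")]
bob_connected = [
    n.properties["name"] for n in graph.neighbors(2, "FOLLOWS", directed=False)
]
print(alice_follows, bob_follows, bob_connected)
```

Read as a directed edge, FOLLOWS leads from Alice to Bob only. Read as undirected, the same edge connects Bob back to Alice, which is the distinction described in the list above.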
Why graph databases are different from relational databases
The data models for relational versus graph are very different. The straightforward graph structure results in much simpler and more expressive data models than those produced using traditional relational or other NoSQL databases.
If you are used to modeling with relational databases, remember the ease and beauty of a well-designed, normalized entity-relationship diagram – a simple, easy-to-understand model you can quickly whiteboard with your colleagues and domain experts. A graph is exactly that – a clear model of the domain, focused on the use cases you want to efficiently support.
Let’s compare the two data models to show how the structure differs between relational and graph.
Relational – Person and Department tables
In the above relational example, we search the Person table on the left (potentially millions of rows) to find the user Alice and her person ID of 815. Then, we search the Person-Department table (orange middle table) to locate all the rows that reference Alice’s person ID (815). Once we retrieve the 3 relevant rows, we go to the Department table on the right to search for the actual values of the department IDs (111, 119, 181). Now we know that Alice is part of the 4Future, P0815, and A42 departments.
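The three lookups described above can be sketched with Python's built-in sqlite3 module. The table and column names, the extra Bob row, and the pairing of each department ID with a department name are assumptions for illustration. The text gives the IDs and the names but not which ID belongs to which name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_department (
        person_id INTEGER REFERENCES person(id),
        department_id INTEGER REFERENCES department(id)
    );
    -- Sample rows; the ID-to-name pairing is assumed.
    INSERT INTO person VALUES (815, 'Alice'), (816, 'Bob');
    INSERT INTO department VALUES (111, '4Future'), (119, 'P0815'), (181, 'A42');
    INSERT INTO person_department VALUES (815, 111), (815, 119), (815, 181), (816, 119);
""")

# Step 1: search the person table for Alice's ID.
(person_id,) = conn.execute(
    "SELECT id FROM person WHERE name = ?", ("Alice",)
).fetchone()

# Step 2: search the join table for rows that reference that ID.
department_ids = [
    row[0]
    for row in conn.execute(
        "SELECT department_id FROM person_department "
        "WHERE person_id = ? ORDER BY department_id",
        (person_id,),
    )
]

# Step 3: resolve the department IDs to their names.
placeholders = ",".join("?" * len(department_ids))
department_names = [
    row[0]
    for row in conn.execute(
        f"SELECT name FROM department WHERE id IN ({placeholders}) ORDER BY id",
        department_ids,
    )
]

print(person_id, department_ids, department_names)
```

In practice the three steps would usually be written as one query with two JOINs, but the database still has to match rows across all three tables.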
Graph – Alice and 3 Departments as nodes
In the above graph version, we have a single node for Alice with a label of Person. Alice belongs to 3 different departments, so we create a node for each one, each with a label of Department. To find out which departments Alice belongs to, we would search the graph for Alice’s node, then traverse all of the BELONGS_TO relationships from Alice to find the Department nodes she is connected to. That’s all we need – a single hop with no lookups involved.
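The same question can be sketched as a traversal over plain Python dictionaries. This only illustrates the idea of following relationships stored next to a node and says nothing about how a real graph database lays out its storage. The node keys (alice, d111, and so on) and the helper name traverse are made up for this example.

```python
# Nodes keyed by a made-up identifier; each carries a label and properties.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "d111": {"label": "Department", "name": "4Future"},
    "d119": {"label": "Department", "name": "P0815"},
    "d181": {"label": "Department", "name": "A42"},
}

# Outgoing relationships are kept with the node they start from,
# so there is no separate join table to search.
relationships = {
    "alice": [
        ("BELONGS_TO", "d111"),
        ("BELONGS_TO", "d119"),
        ("BELONGS_TO", "d181"),
    ],
}


def traverse(start, rel_type):
    """Follow every rel_type relationship that leaves the start node."""
    return [
        nodes[target]
        for kind, target in relationships.get(start, [])
        if kind == rel_type
    ]


departments = [node["name"] for node in traverse("alice", "BELONGS_TO")]
print(departments)
```

Once Alice's node is found, reaching her departments is one step along each BELONGS_TO relationship.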
To further illustrate, imagine a relational model with two tables: a people table (which has a person_id and a person_name column) and a friend table (with friend_id and person_id, which is a foreign key from the people table). In this case, searching for all of Jack’s friends would require the following SQL query.
SELECT p2.person_name
FROM people p1
JOIN friend ON (p1.person_id = friend.person_id)
JOIN people p2 ON (p2.person_id = friend.friend_id)
WHERE p1.person_name = 'Jack';
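To try the SQL query above, here is a small sketch using Python's built-in sqlite3 module. The sample rows (Jill, Sam, Ann) are invented, and an ORDER BY clause is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE friend (
        friend_id INTEGER REFERENCES people(person_id),
        person_id INTEGER REFERENCES people(person_id)
    );
    -- Invented sample data.
    INSERT INTO people VALUES (1, 'Jack'), (2, 'Jill'), (3, 'Sam'), (4, 'Ann');
    -- Jack (1) has friends Jill (2) and Sam (3); Ann (4) has friend Jill (2).
    INSERT INTO friend (person_id, friend_id) VALUES (1, 2), (1, 3), (4, 2);
""")

query = """
    SELECT p2.person_name
    FROM people p1
    JOIN friend ON (p1.person_id = friend.person_id)
    JOIN people p2 ON (p2.person_id = friend.friend_id)
    WHERE p1.person_name = 'Jack'
    ORDER BY p2.person_name
"""
jacks_friends = [row[0] for row in conn.execute(query)]
print(jacks_friends)
```

The people table is joined twice: once to find Jack, and once to turn each friend_id back into a name.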
The same query may be translated into the following languages.
Cypher, a graph database query language
MATCH (p1:person {name: 'Jack'})-[:FRIEND_WITH]-(p2:person)
RETURN p2.name
SPARQL, an RDF graph database query language standardized by W3C and used in multiple RDF Triple and Quad stores
- Long form
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?s a foaf:Person .
        ?s foaf:name "Jack" .
        ?s foaf:knows ?o .
        ?o foaf:name ?name .
}
- Short form
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?s foaf:name "Jack" ;
           foaf:knows ?o .
        ?o foaf:name ?name .
}
SPASQL, a hybrid database query language that extends SQL with SPARQL
SELECT people.name
FROM (
SPARQL PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?s foaf:name "Jack" ;
foaf:knows ?o .
?o foaf:name ?name .
}
) AS people ;
The above examples are a simple illustration of a basic relationship query. They hint at how query complexity in the relational model increases with the total amount of data. In comparison, a graph database query can easily traverse the relationship graph to present the results.
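One way to see the difference is to ask for friends of friends. In SQL, each extra hop typically means another self-join on the friend table or a recursive query, while a traversal simply keeps following edges. The sketch below shows such a traversal as a breadth-first search in plain Python. The friendship data and the function name within_hops are invented for illustration, and this is not how a graph database engine is actually implemented.

```python
from collections import deque

# Invented, symmetric friendship data.
friends = {
    "Jack": ["Jill", "Sam"],
    "Jill": ["Jack", "Ann"],
    "Sam": ["Jack"],
    "Ann": ["Jill", "Lee"],
    "Lee": ["Ann"],
}


def within_hops(graph, start, max_hops):
    """Map each person reachable from start in at most max_hops to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand past the hop limit
        for other in graph.get(person, []):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    del distance[start]  # the start person is not their own friend
    return distance


one_hop = within_hops(friends, "Jack", 1)
two_hops = within_hops(friends, "Jack", 2)
three_hops = within_hops(friends, "Jack", 3)
print(one_hop)
print(two_hops)
print(three_hops)
```

Going one hop deeper only changes the max_hops argument. The traversal code itself stays the same.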
There are also results indicating that the simple, condensed, and declarative queries of graph databases do not necessarily provide good performance in comparison to relational databases. While graph databases offer an intuitive representation of data, relational databases offer better results when set operations are needed.
Types of graph databases available in the market
The following is a list of graph databases:
Types of graph query-programming languages
- AQL (ArangoDB Query Language): a SQL-like query language used in ArangoDB for both documents and graphs
- Cypher Query Language (Cypher): a declarative graph query language for Neo4j that enables ad hoc and programmatic (SQL-like) access to the graph
- GQL: an ISO standard graph query language (ISO/IEC 39075, published in 2024)
- GSQL: a SQL-like, Turing-complete graph query language designed and offered by TigerGraph
- GraphQL: an open-source data query and manipulation language for APIs. Dgraph implements a modified GraphQL language called DQL (formerly GraphQL+-)
- Gremlin: a graph programming language that is part of the Apache TinkerPop open-source project
- SPARQL: a query language for RDF databases that can retrieve and manipulate data stored in RDF format