February 14, 2012

Introduction to Graph Database and OrientDB

A graph database is a kind of NoSQL database. NoSQL is simply a term for database management systems that are not relational. As scalability has become a very big concern for many companies, this kind of database is getting more and more attention. People usually refer to this need for data scalability as "web scale".

Look at the data needs and inherent characteristics of web applications and you will see that the web usually comprises many resources that link to each other. This characteristic is inherent in a graph, the kind of structure mathematicians care about. A discussion of graph theory is outside the scope of this article; readers interested in the theoretical background of graph databases should find a good book on discrete mathematics, and on graph theory in particular. Graph theory is one of the prime objects of study in discrete mathematics, and it has been a prominent theory ever since Leonhard Euler published his paper on the Seven Bridges of Königsberg problem. A graph consists of vertices (the nodes) and edges (the links that connect pairs of vertices).


Graph theory has a very broad range of applications, not just on the web or in computer science. That said, I will not repeat what is already written on Wikipedia; just have a look at the Wikipedia articles and find out for yourself.
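To make the idea of vertices and edges a bit more concrete, here is a minimal sketch that stores the Königsberg bridges as a plain edge list and counts how many edges touch each vertex (its degree). The vertex names A to D and the helper functions are my own labels for illustration; they have nothing to do with OrientDB.

```shell
#!/bin/sh
# Edge list for the Seven Bridges of Königsberg: four land masses
# (labelled A, B, C, D here) joined by seven bridges.
# One edge per line, written as "vertex vertex".
edges='A B
A B
A C
A C
A D
B D
C D'

# Print "vertex degree" for every vertex, sorted by vertex name.
vertex_degrees() {
    printf '%s\n' "$edges" |
        awk '{ deg[$1]++; deg[$2]++ } END { for (v in deg) print v, deg[v] }' |
        sort
}

# Count the vertices that have an odd degree.
odd_vertex_count() {
    vertex_degrees | awk '$2 % 2 == 1 { n++ } END { print n + 0 }'
}

vertex_degrees
echo "odd-degree vertices: $(odd_vertex_count)"
```

All four vertices end up with an odd degree, which is the heart of Euler's argument: a walk that crosses every bridge exactly once needs a connected graph with either zero or two odd-degree vertices.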

Some companies and open source projects have realized the importance of graph theory and started developing graph databases. To make things more confusing, there are several products you can choose from, for example Neo4j, OrientDB, and Mulgara (a specialized graph database that deals with RDF). Benchmarking them and deciding which graph database is better is outside the scope of this article, and I am personally not interested in all of those lies, so I just use OrientDB for all my graph database needs. :p

OrientDB comes as a small zip file. OrientDB alone takes only around 2.7 MB, or you can get OrientDB bundled with the whole Tinkerpop "standard" stack in around 8.6 MB.
...
-rw-r--r-- 1 bpdp users 2689843 Feb  3 06:06 orientdb-1.0rc8.zip
-rw-r--r-- 1 bpdp users 8584056 Feb  3 06:07 orientdb-graphed-1.0rc8.zip
...

You may ask what the difference between those two zip files is. orientdb-version contains only the OrientDB graph database software, while orientdb-graphed also includes the Tinkerpop stack alongside the database. Tinkerpop builds open source software related to graphs. Use the graphed edition if you want to enable Tinkerpop capabilities in your OrientDB database: it provides a wrapper for OrientDB so that you can use the Tinkerpop APIs in your software. OrientDB supports four of them: Blueprints, Gremlin, Rexster, and Pipes.

In this article, we will use the graphed version. Installation is pretty easy: as long as you have a JDK installed, it should not be a problem at all. All you need to do is unzip the file and set some environment variables.
$ unzip orientdb-graphed-1.0rc8.zip
$ cd orientdb-graphed-1.0rc8

I put this in my $HOME/.bashrc file:
export ORIENTDB_HOME=/home/bpdp/software/orientdb/orientdb-graphed
export PATH=$PATH:$ORIENTDB_HOME/bin
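
After reloading .bashrc, a quick sanity check can save some head scratching. The sketch below is only an illustration: orientdb_home_ok is a hypothetical helper of mine, not something shipped with OrientDB, and the demo runs against a throwaway directory so it works even without OrientDB installed. It only assumes what this article already uses, namely that server.sh and console.sh live in the bin directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of OrientDB): succeeds only if the given
# directory contains executable bin/server.sh and bin/console.sh scripts.
orientdb_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/server.sh" ] && [ -x "$1/bin/console.sh" ]
}

# Demo on a throwaway directory that mimics the expected layout.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/bin"
touch "$demo_home/bin/server.sh" "$demo_home/bin/console.sh"
chmod +x "$demo_home/bin/server.sh" "$demo_home/bin/console.sh"

if orientdb_home_ok "$demo_home"; then
    echo "looks like an OrientDB home: $demo_home"
fi

# Clean up the throwaway directory.
rm -rf "$demo_home"

# Real usage, in a new shell after editing ~/.bashrc:
#   orientdb_home_ok "$ORIENTDB_HOME" || echo "ORIENTDB_HOME looks wrong"
```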

One more thing: we should also change the configuration in $ORIENTDB_HOME/config/orientdb-server-log.properties to reflect the location of the log file:
java.util.logging.FileHandler.pattern=/home/bpdp/software/orientdb/orientdb-graphed/log/orient-server.log
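
If you prefer not to edit the file by hand, the change can be scripted. This is a minimal sketch under a few assumptions: set_log_pattern is a hypothetical helper of mine, the demo works on a temporary stand-in file whose initial content is made up, and the new path must not contain the characters | or &, because they are special in the sed replacement used here.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the FileHandler pattern line in a
# properties file. $1 = properties file, $2 = new log file path.
set_log_pattern() {
    tmp=$(mktemp) || return 1
    sed "s|^java\.util\.logging\.FileHandler\.pattern=.*|java.util.logging.FileHandler.pattern=$2|" "$1" > "$tmp" &&
        cat "$tmp" > "$1"   # write back in place, keeping the file's permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Stand-in for $ORIENTDB_HOME/config/orientdb-server-log.properties.
# The initial value below is invented for this demo.
props=$(mktemp)
printf '%s\n' \
    '# demo properties file' \
    'java.util.logging.FileHandler.pattern=old/location/orient-server.log' > "$props"

set_log_pattern "$props" "/home/bpdp/software/orientdb/orientdb-graphed/log/orient-server.log"
cat "$props"

# Clean up the stand-in file.
rm -f "$props"
```

For the real thing, you would point the first argument at the properties file under $ORIENTDB_HOME/config instead of the temporary file.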

You may of course change the location to any other place in the filesystem. To start the server, use the $ORIENTDB_HOME/bin/server.sh shell script:
$ server.sh 
           .                                              
          .`        `                                     
          ,      `:.                                      
         `,`    ,:`                                       
         .,.   :,,                                        
         .,,  ,,,                                         
    .    .,.:::::  ````                                   
    ,`   .::,,,,::.,,,,,,`;;                      .:      
    `,.  ::,,,,,,,:.,,.`  `                       .:      
     ,,:,:,,,,,,,,::.   `        `         ``     .:      
      ,,:.,,,,,,,,,: `::, ,,   ::,::`   : :,::`  ::::     
       ,:,,,,,,,,,,::,:   ,,  :.    :   ::    :   .:      
        :,,,,,,,,,,:,::   ,,  :      :  :     :   .:      
  `     :,,,,,,,,,,:,::,  ,, .::::::::  :     :   .:      
  `,...,,:,,,,,,,,,: .:,. ,, ,,         :     :   .:      
    .,,,,::,,,,,,,:  `: , ,,  :     `   :     :   .:      
      ...,::,,,,::.. `:  .,,  :,    :   :     :   .:      
           ,::::,,,. `:   ,,   :::::    :     :   .:      
           ,,:` `,,.                                      
          ,,,    .,`                                      
         ,,.     `,                     GRAPH-DB Server   
       ``        `.                                       
                 ``                                       
                 `                                        

2012-02-11 10:34:02:864 INFO [OLogManager] OrientDB Server v1.0rc8 (build @BUILD@) is starting up...
2012-02-11 10:34:03:336 INFO [OLogManager] -> Loaded memory database 'temp'
2012-02-11 10:34:03:372 INFO [OLogManager] Listening binary connections on 0.0.0.0:2424
2012-02-11 10:34:03:375 INFO [OLogManager] Listening cluster connections on 0.0.0.0:2434
2012-02-11 10:34:03:377 INFO [OLogManager] Listening http connections on 0.0.0.0:2480
2012-02-11 10:34:03:402 INFO [OLogManager] Installing GREMLIN language v.1.4
2012-02-11 10:34:03:437 INFO [OLogManager] OrientDB Server v1.0rc8 is active.
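
The startup log tells you which ports the server listens on and when it is ready. If you capture that output somewhere, a few lines of shell can pull the information out. This is just a sketch: list_listeners and server_is_active are hypothetical helpers of mine, and they assume the log lines look exactly like the ones printed by this version.

```shell
#!/bin/sh
# Sample lines copied from the startup log of OrientDB 1.0rc8.
log='2012-02-11 10:34:03:372 INFO [OLogManager] Listening binary connections on 0.0.0.0:2424
2012-02-11 10:34:03:375 INFO [OLogManager] Listening cluster connections on 0.0.0.0:2434
2012-02-11 10:34:03:377 INFO [OLogManager] Listening http connections on 0.0.0.0:2480
2012-02-11 10:34:03:437 INFO [OLogManager] OrientDB Server v1.0rc8 is active.'

# Print "protocol port" for each listener reported on standard input.
list_listeners() {
    sed -n 's/.*Listening \([a-z]*\) connections on [0-9.]*:\([0-9][0-9]*\).*/\1 \2/p'
}

# Succeed if the log on standard input contains the "is active" line.
server_is_active() {
    grep -q 'OrientDB Server .* is active'
}

printf '%s\n' "$log" | list_listeners
if printf '%s\n' "$log" | server_is_active; then
    echo "server is up"
fi
```

With a real server you would feed the captured output into the two functions instead of the sample text.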

From another terminal, we can use the $ORIENTDB_HOME/bin/console.sh shell script to access the server:
$ console.sh 
OrientDB console v.1.0rc8 (build @BUILD@) www.orientechnologies.com
Type 'help' to display all the commands supported.

Installing extensions for GREMLIN language v.1.4

> 

The database is usually located in $ORIENTDB_HOME/database. Although a connection URL can also point to the filesystem or to memory, this time we will only use remote to connect to the database. Here is how to connect and how to display information about the database:
> connect remote:localhost/tinkerpop admin admin
Connecting to database [remote:localhost/tinkerpop] with user 'admin'...OK

> info
Current database: tinkerpop (url=remote:localhost/tinkerpop)

Total size: 1.48Mb

Cluster configuration: {
    }

CLUSTERS:
----------------------------------------------+------+---------------------+-----------+-----------+
 NAME                                         |  ID  | TYPE                | RECORDS   | SIZE      |
----------------------------------------------+------+---------------------+-----------+-----------+
 index                                        |     1| PHYSICAL            |        11 |Not supported |
 orole                                        |     3| PHYSICAL            |         3 |Not supported |
 ouser                                        |     4| PHYSICAL            |         3 |Not supported |
 default                                      |     2| PHYSICAL            |         0 |Not supported |
 ographvertex                                 |     6| PHYSICAL            |       813 |Not supported |
 ographedge                                   |     7| PHYSICAL            |      8051 |Not supported |
 orids                                        |     5| PHYSICAL            |      3080 |Not supported |
 internal                                     |     0| PHYSICAL            |         3 |Not supported |
----------------------------------------------+------+---------------------+-----------+-----------+
 TOTAL                                                                               0 |        0b |
--------------------------------------------------------------------------------------- -----------+

CLASSES:
----------------------------------------------+---------------------+-----------+
 NAME                                         | CLUSTERS            | RECORDS   |
----------------------------------------------+---------------------+-----------+
 ORIDs                                        | 5                   |      3080 |
 OUser                                        | 4                   |         3 |
 OGraphEdge                                   | 7                   |      8051 |
 ORole                                        | 3                   |         3 |
 OGraphVertex                                 | 6                   |       813 |
----------------------------------------------+---------------------+-----------+
 TOTAL                                                                    11950 |
--------------------------------------------------------------------------------+

INDEXES:
----------------------------------------------+------------+-----------------------+----------------+-----------+
 NAME                                         | TYPE       |         CLASS         |     FIELDS     | RECORDS   |
----------------------------------------------+------------+-----------------------+----------------+-----------+
 edges                                        | NOTUNIQUE  |                       |                |        89 |
 vertices                                     | NOTUNIQUE  |                       |                |       977 |
 dictionary                                   | DICTIONARY |                       |                |         0 |
----------------------------------------------+------------+-----------------------+----------------+-----------+
 TOTAL = 3                                                                                                 1066 |
----------------------------------------------------------------------------------------------------------------+


> 

To shut down the server, run the shutdown.sh command from the shell:
$ shutdown.sh 
Sending shutdown command to remote OrientDB Server instance...
Shutdown executed correctly

You will see this on the server console:
2012-02-14 10:47:50:750 INFO [OLogManager] Received shutdown command from the remote client /127.0.0.1:57411
2012-02-14 10:47:50:750 INFO [OLogManager] Remote client /127.0.0.1:57411 authenticated. Starting shutdown of server...
2012-02-14 10:47:50:751 INFO [OLogManager] OrientDB Server is shutdowning...
2012-02-14 10:47:51:497 INFO [OLogManager] Shutdowning handler graph...
2012-02-14 10:47:51:498 INFO [OLogManager] Shutdowning handler default...
2012-02-14 10:47:51:499 INFO [OLogManager] Shutdowning handler automaticBackup...
2012-02-14 10:47:51:499 INFO [OLogManager] Shutdowning connection listener 'ONetworkProtocolBinary /0.0.0.0:2424:'...
2012-02-14 10:47:51:499 INFO [OLogManager] Shutdowning connection listener 'OClusterNetworkProtocol /0.0.0.0:2434:'...
2012-02-14 10:47:51:500 INFO [OLogManager] Shutdowning connection listener 'ONetworkProtocolHttpDb /0.0.0.0:2480:'...
2012-02-14 10:47:51:500 INFO [OLogManager] OrientDB Server shutdown complete

To conclude, there are still some areas that may need more attention. People in IT love standardization, although sometimes they create more than one standard for the same purpose. UnQL, a proposed standard for document-oriented databases, is already under discussion, while the graph database world is still somewhat in flux. Maybe someday people will start to discuss this issue and come up with a result. Well, hopefully :)

1 comment:

  1. Take a look at DEX graphDB: http://www.sparsity-technologies.com/dex
