Thank you all for your replies to my post regarding the choice of a graph database engine for my big data project.
Regarding the replies about Cassandra: yes, Cassandra is an option as a storage back-end for Titan. But Titan on HBase will be my choice for my prototype because of learning-curve limitations. What I hope, but haven't proven yet, is that I will be able to query HBase directly and make sense of Titan's database model inside HBase. Graphs offer great advantages over the relational model, but (No)SQL often offers simplicity and results.
Continuous learning: two new learning curves, Groovy and Titan/Gremlin
Currently I am at the beginning of the learning curve for Titan and Groovy. I haven't implemented my graph database in Titan yet. I have studied the blogs about importing data; for my database I will have to write a Groovy script. My Groovy REPL in a terminal is already working, but I would like the assistance of an IDE like IntelliJ IDEA. Finding out how to connect my OS X-based IntelliJ installation to the remote JVM of my VirtualBox/Ubuntu/Hortonworks/Java environment is still in progress. It is one of the things on my “to do” list.
Titan Graph database design
Another issue I am thinking of is the design of my Titan-based graph database. Labels of vertices cannot be altered once assigned, but labels offer advantages such as improved query speed. So which attributes should I choose to implement as labels? In my logical database model I have two types of entities/vertices: master data and production data. A distinction at the “master”/“production” level alone will not give me the advantages of labels. Therefore I would like to implement labels at a lower level of granularity, but what should I use?
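As far as I understand the Titan 0.5 management API, vertex labels have to be defined up front and are fixed at vertex creation time, which is why the granularity choice matters. A minimal sketch from the Gremlin shell (the properties file path and the label names are just my own placeholders, not a finished design):

```groovy
// Sketch: defining vertex labels via the Titan 0.5 management system.
// Assumes a Titan-on-HBase setup; 'conf/titan-hbase.properties' is a placeholder path.
g = TitanFactory.open('conf/titan-hbase.properties')

mgmt = g.getManagementSystem()
// One label per finer-grained entity type, since labels are immutable and a
// coarse master/production split would waste their query-speed benefits.
mgmt.makeVertexLabel('organization').make()
mgmt.makeVertexLabel('productionRecord').make()
mgmt.commit()

// Vertices then get their label fixed at creation time:
org = g.addVertexWithLabel('organization')
org.setProperty('name', 'Acme')   // hypothetical example property
g.commit()
```

Since labels cannot be changed afterwards, whatever granularity I pick here I am stuck with, which is exactly why I am hesitating.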
Secondly: suppose I define, at the master data level, two types of organizations, “public” and “private”. Implementing these organization types as two master-data vertices would lead to many edges from my production data toward those two vertices. This could produce a few heavily connected vertices (supernodes), degrading the performance of the graph database in Titan. I have to give my data model a lot of thought.
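One alternative I am considering, sketched below under the same Titan 0.5 assumptions: keep the organization type as an indexed property on each vertex instead of as edges to two master-data hub vertices, which avoids the supernode fan-in (the key and index names are my own invented examples):

```groovy
// Sketch: model "public"/"private" as an indexed property rather than
// as edges to two master-data hub vertices.
mgmt = g.getManagementSystem()
orgType = mgmt.makePropertyKey('orgType').dataType(String.class).make()
// A composite index makes equality lookups on the property fast.
mgmt.buildIndex('byOrgType', Vertex.class).addKey(orgType).buildCompositeIndex()
mgmt.commit()

v = g.addVertexWithLabel('organization')
v.setProperty('orgType', 'public')
g.commit()

// The lookup stays an index hit instead of a traversal over a supernode:
g.V.has('orgType', 'public')
```

The trade-off, as I understand it, is that I lose the explicit master-data vertex that the rest of my model is built around, so this is not a decision I have made yet.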
Still, I am convinced of my master data/production data concept. It will give me the model to implement my functionalities. I also find Gremlin's backward-traversal possibilities interesting for the master data/production data option. But although we have GUIDs for identifying global objects, I haven't found a global classification system. So I am still thinking about how my master data/production data model maps onto Titan labels, and I will still have to implement my own master data/production data model.
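The backward traversals I mean look roughly like this in the Gremlin 2 shell that Titan ships with (the property and edge names here are hypothetical, just to illustrate the pattern):

```groovy
// Gremlin 2-style back traversal: mark production vertices with as(),
// step out to their master-data classification, filter there, and then
// jump back to the production vertex that satisfied the filter.
g.V.has('dataKind', 'production').as('p').
    out('classifiedAs').has('name', 'public').
    back('p')

// Plain reverse-direction traversal from a master-data vertex uses in():
masterVertex.in('classifiedAs')  // all production vertices linked to it
```

It is this ability to filter via the master data and land back on the production data that makes the two-level model attractive to me.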
Extending my own capacity using AI
I realize that my personal and free-time capacity might not be enough to implement a full software stack and an end-to-end process in a prototype myself.
I have therefore invested a few hundred dollars in a first-release eBrain license from ebrain.technologies. They offer cognitive web browsing. Expanding my own graph database with deep linking using this AI functionality is worth the investment to me. Ebrain.technologies promises to offer the user a team of eHumans that will improve the knowledge and capacity of the user. I will point an eHuman at my web browser bookmarks on big data technologies and software development; these bookmarks are becoming a database in themselves. I hope to develop an experienced software-developer eHuman through learning based on cognitive computing.
Take Java, for example: Java is the lingua franca of the open-source software development community. Social media vendors like Facebook, Twitter and LinkedIn offer example code for using their APIs in Java. Languages such as Scala and Groovy are based on the JVM. Java is a mandatory standard. Period. I hope that I can train an ebrain.technologies eHuman to a Java-developer level using its cognitive capacity. I will point the eHuman to sites with Java syntax and semantics information, but also to sites covering Java software architectures, design patterns, Git best practices and Stack Overflow Q&As under the Java tag. I hope to train the eHuman to produce Java code, at least at CRUD level. If I am not able to reach that coding level, then I at least expect that an eHuman will bring me better and faster answers on Java software development than I can get myself using Google and the search engines of technology sites.
And if I get an eHuman to a satisfying level, I will use the same approach to develop a team of eHumans: one for Java, but also one for Pig scripting, one for Hive, one for HBase NoSQL, and one for… etc.
So I am currently studying ebrain.technologies' eBrain as much as possible, trying to understand the 100+ screenshots they currently offer as “documentation”. The release of “retail” version 1 is expected on May 25th. I can't wait to download it.
Unfortunately the first release of eBrain is Windows-only. I have invested in an OS X-based iMac with a 6-core processor, 32 GB of memory and a 3 TB disk to assist me in my big data journey, so I guess I will have to buy a Windows 8.1 or 10 license to install in a VirtualBox virtual machine (sigh). I do have a company laptop running Windows, and I am entitled to install eBrain on three interconnected machines, but I fear that the capacity of the company laptop is not enough to run eBrain satisfactorily.
The journey continues
So my current focus is Artificial Intelligence (AI), and specifically eBrain's implementation of cognitive computing, learning and decision making. I feel that eBrain offers me the next level to improve myself, at least at the level of system design and development.
I will keep you posted. Thanks for your interest. Reply if you feel like it.
I’ve been studying big data technologies for one year now. I do this for 3 reasons:
- Big data is the future, starting in my own profession, ICT, but mainly impacting businesses in all other segments: from government to retailers, from FMCG manufacturers to healthcare. In short: everywhere. Cloud technologies are important, mobile is the most important user interface now and in the future, and social media will gain more importance for consumers and businesses, but all applications are or will be driven by big data. If I want to continue my career in ICT, I had better build knowledge and some experience with big data, right?
- I have been managing a graph database myself since 2009. It is built in the mind-mapping application TheBrain and it is my own marketing database. One could consider the underlying technology of TheBrain a graph database engine, but the functionality of TheBrain is focused…