My Big Data journey continues, hopefully with the help of Cognitive Computing and Artificial Intelligence

Thank you all for your replies to my post regarding the choice of a graph database engine for my big data project.

Cassandra

Regarding the replies about Cassandra: yes, Cassandra is an option as a storage back-end for Titan. But Titan on HBase will be my choice for my prototype, because of learning curve limitations. What I hope, but haven't proven yet, is that I will be able to query HBase directly (NoSQL style) and still make sense of the Titan database model in HBase. Graphs offer great advantages over the relational model, but (No)SQL often offers simplicity and results.
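
I haven't tried this yet, but based on the Titan documentation I expect that opening a Titan graph on the HBase back-end from the Gremlin console will look roughly like the sketch below. The hostname and table name are placeholders for my own sandbox, not tested settings.

// Sketch only: open a Titan graph that stores its data in HBase (Gremlin/Groovy console)
conf = new org.apache.commons.configuration.BaseConfiguration()
conf.setProperty('storage.backend', 'hbase')
conf.setProperty('storage.hostname', 'sandbox.hortonworks.com')   // ZooKeeper quorum host, placeholder
conf.setProperty('storage.hbase.table', 'titan')                  // HBase table Titan will create and use
g = com.thinkaurelius.titan.core.TitanFactory.open(conf)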

Continuous learning by adding two learning curves: Groovy and Titan/Gremlin.

Currently I am at the beginning of the Titan and Groovy learning curves. I haven't implemented my graph database in Titan yet, but I have studied the blog posts about importing data. For my database I will have to write a Groovy script. My Groovy REPL in a terminal is already working, but I would like the assistance of an IDE like IntelliJ IDEA. Finding out how to connect my Apple OSX based IntelliJ installation to the remote JVM of my VirtualBox/Ubuntu/Hortonworks/Java environment is still in progress. It is one of the things on my "to do" list.
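
The import script itself is still no more than a sketch in my head. Assuming a simple CSV file with one vertex per line (the file name, layout and property names below are my own placeholders, nothing final), a first Groovy attempt could look like this:

// Sketch of a Groovy import script for Titan; untested, file layout and property names are assumptions
g = com.thinkaurelius.titan.core.TitanFactory.open('conf/titan-hbase.properties')
new File('vertices.csv').eachLine { line ->
    def (name, type) = line.split(',')     // e.g. "Acme,organization"
    def v = g.addVertex(null)              // Titan assigns its own (64-bit) vertex id
    v.setProperty('name', name)
    v.setProperty('type', type)
}
g.commit()                                 // flush the transaction to the storage back-end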

Titan Graph database design

Another issue I am thinking about is the design of my Titan based graph database. Vertex labels cannot be altered afterwards, but labels offer advantages such as improved query speed. So which attributes should I choose to implement as labels? In my logical database model I have 2 types of entities/vertices: master data and production data. A distinction at label level between "master" and "production" alone will not give me the advantages of labels. Therefore I would like to implement labels at a lower level of granularity, but what should I use?
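
Whatever I choose, the labels have to be defined up front through Titan's management API. Assuming for a moment that "organization" and "person" would be useful labels (examples only, not my final model), I understand from the Titan documentation that defining and using them looks roughly like this:

// Sketch: defining vertex labels with the Titan management API; label names are just examples
mgmt = g.getManagementSystem()
mgmt.makeVertexLabel('organization').make()
mgmt.makeVertexLabel('person').make()
mgmt.commit()
v = g.addVertexWithLabel('organization')   // create a vertex that carries one of the predefined labels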

Secondly: suppose I define, at master data level, two types of organizations, "public" and "private". Implementing these organization types as two master data vertices would lead to many edges from the organization vertices in my production data towards those two vertices. This might result in a huge number of edges on a single vertex, reducing the capacity of the graph database in Titan. I have to give my data model more thought.
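
One mitigation I keep running into in the Titan documentation is the vertex-centric index, which indexes the incident edges of such "supernodes" on a property so that traversals over them stay manageable. This is not something I have tried yet; the edge label and property name below are placeholders of my own, and the calls assume the Gremlin console where the Titan and Blueprints classes are pre-imported:

// Sketch: a vertex-centric index on the 'type' property of 'isA' edges; names are placeholders
mgmt = g.getManagementSystem()
type = mgmt.makePropertyKey('type').dataType(String.class).make()
isA  = mgmt.makeEdgeLabel('isA').signature(type).make()
mgmt.buildEdgeIndex(isA, 'isAByType', Direction.BOTH, Order.DESC, type)
mgmt.commit()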

Still, I am convinced of my master data/production data concept. It offers me the model to implement my functionality. I also find the back-traversal possibilities of Gremlin interesting with regard to the master data/production data option. But although we have GUIDs for coding global objects, I haven't found a global classification system. So I am still thinking about how to map my master data/production data model onto Titan labels, and I still have to implement that model myself.
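
To get a feel for those back-traversals I have been experimenting in the Groovy REPL with the Gremlin 2 as/back steps. A one-liner along these lines (the property values and edge label are placeholders for my model, not working code against my data):

// Sketch: start at production data, hop to the master data vertex, then jump back (Gremlin 2 syntax)
g.V('type', 'production').as('p').out('isA').has('name', 'organization').back('p').name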

Extending my own capacity using AI

I realize that my personal and free time capacity might not be enough to implement a full software stack and an end-to-end process in a prototype myself.

I have therefore invested a few hundred bucks in the first release of an ebrain.technologies eBrain license. They offer cognitive web browsing. Expanding my own graph database with deep linking using this AI functionality is worth the investment. Ebrain.technologies promises to offer the user a team of eHumans that will improve the knowledge/capacity of the user. I will point an eHuman to the bookmarks/favorites in my web browser regarding big data technologies and software development. These bookmarks are becoming a database in themselves. I hope to develop an experienced software developer eHuman through learning on the basis of cognitive computing.

For example Java: Java is the lingua franca of the open source software development community. Social media vendors like Facebook, Twitter and LinkedIn offer example code to use their APIs from Java. Languages such as Scala and Groovy are based on the JVM. Java is a mandatory standard. Period. I hope that I can train an ebrain.technologies eHuman to a Java developer level using "its" cognitive capacity. I will point the eHuman to sites with Java syntax and semantics information, but also to sites covering Java software architectures, design patterns, Git best practices and Stack Overflow Q&As under the Java tag. I hope to train the eHuman to be able to produce Java code, at least at CRUD level. If I am not able to reach this developer coding level, then at least I expect that an eHuman will bring me better and faster answers than I can find myself using Google and the search engines of technology sites regarding Java software development.

And if I have implemented an eHuman to a satisfactory level I will use the same approach to develop a team of eHumans: one for Java, but also one for Pig scripting, one for Hive, one for HBase NoSQL, and one for… etc.
So I am currently studying the ebrain.technologies eBrain as much as possible, trying to understand the 100+ screenshots they currently offer as "documentation". Their release of "retail" version 1 is expected on May 25th. Can't wait to download.

Unfortunately the first release of ebrain.technologies eBrain is for Windows. I have invested in an OSX based iMac with a 6 core processor, 32GB of memory and a 3TB disk to assist me in my big data journey, so I guess I will have to buy a Windows 8.1 or 10 license/CD and install it in a VirtualBox virtual machine (sigh). I do have a company laptop running Windows, and I am entitled to install eBrain on 3 interconnected machines, but I fear that the capacity of the company laptop is not enough to run eBrain satisfactorily.

The journey continues

So my current focus is Artificial Intelligence (AI), and specifically the eBrain implementation of cognitive computing, learning and decision making. I feel that eBrain offers me the next level to improve myself, at least at the level of system design and development.

I will keep you posted. Thanks for your interest. Reply if you feel like it.

Posted in Big Data

Why I left Apache Spark GraphX and returned to HBase for my graph database

Introduction

I’ve been studying big data technologies for one year now. I do this for 3 reasons:

  1. Big data is the future, starting in my own profession, ICT, but mainly impacting businesses in all other segments, from government to retail, from FMCG manufacturers to healthcare. In short: everywhere. Cloud technologies are important, mobile is the most important user interface now and in the future, and social media will gain more importance for consumers and businesses, but all applications are or will be driven by big data. If I want to continue my career in ICT I had better build knowledge and some experience with big data, right?
  2. I have been managing a graph database myself since 2009. It is built in the mind mapping application TheBrain and it is my own marketing database. One could consider the underlying technology of TheBrain a graph database engine, but the functionality of TheBrain is focused on personal or group based knowledge management. It doesn't offer query functionality, it cannot be extended; it's a mind mapping tool, not a scalable graph database engine. So I had been toying for a long time with the idea of migrating to a real graph database. To support my first objective, studying big data, I decided to migrate my graph database to big data technology.
  3. You never know, maybe I am going to launch a startup myself. I have a business idea that could grow into another social media platform. The business model is scalable; it can grow towards multiple target audiences and multiple geographic regions. Most importantly: I haven't seen an implementation of my business idea yet. But it needs big data technologies, another reason for my study.

In search of a Graph Database Engine

The core of the applications described in the previous section is based on a graph database. I am enthusiastic about graph databases. I dare to say that graph databases are one of the most important developments in the NoSQL arena. All major social media organizations, including Google, Facebook and LinkedIn, are using graph databases to support their business. My TheBrain database is also a graph database. I therefore want to store my information in vertices and edges, so I was in search of a graph database engine.

Neo4j, awesome graph database functionality but limited to one server

When I Googled "graph database", Neo Technology with their Neo4j graph database immediately popped up. I became acquainted with Neo4j when they were leaving their graph traversal API in favor of their Cypher query language, so I encountered a learning curve twice. But I noticed some excellent example graph database applications built on Neo4j, including graph database visualizations. I learned to value Neo4j as a benchmark for storing and querying information in graph databases. Their textbook "Graph Databases", published by O'Reilly, is still lying in my restroom. Considering Neo4j a benchmark turned out to be a mistake, as I learned later whilst studying Apache Spark GraphX. But although Neo4j is NoSQL, it is not big data. It is not horizontally scalable. Sure, I tried an implementation of Neo4j at Heroku, and I was able to scale vertically by growing the virtual machine. But big data is all about horizontal scalability and Neo4j couldn't offer that flexibility. Secondly, Neo4j uses an ID formatted as a 64-bit Long to identify vertices and edges. 64 bits offers a large addressing space, but for my application I would rather use a 128-bit identifier such as a GUID, because I don't know yet what addressing space I will need in the future. Later I discovered that Apache Spark GraphX also uses Longs to identify vertices and within its edge definitions, limiting GraphX's addressing space to the equivalent of Neo4j's. However, within the Apache Spark development community an issue has been registered to allow any field definition as a vertex identifier, including GUIDs, so my future addressing space problem would have been solved by Apache Spark GraphX. Third, I want to stick to open source technologies and industry standards. I consider big data query languages such as Pig Latin and Hive industry standards; they have been widely adopted by the big data community. Unfortunately Cypher hasn't reached that status yet. So Neo4j is not an option for my application.

Apache Spark

Whilst I was in search of a graph database engine for big data applications I ran into Apache Spark. I became acquainted with Apache Spark when I was looking at Apache Mahout for machine learning. Apache Mahout stopped supporting Map/Reduce on Hadoop: Mahout turned to Apache Spark as its underlying compute engine. Apache Spark seemed to offer all the functionality I needed: SparkSQL, a Hive lookalike in Spark to query rows and columns, GraphX as a graph database engine and MLlib for machine learning. And with their latest release Spark Streaming is also included. I already use streaming in my application through an implementation of SpringXD. And Spark is horizontally scalable, suitable for big data, it offers fast response times because of in-memory computing and, most important of all, it is growing very fast into a coming industry standard in big data technologies. It all sounded very promising, what more could I need? So I decided to go for Apache Spark as the base technology for my application, and I managed to migrate and store my graph database in Apache Spark GraphX, running on top of YARN in my Hadoop system. But I learned that Apache Spark GraphX would not be suitable for my application, for 3 reasons:

  1. Apache Spark is heavily based on transformations of data. It is still a batch process, although it runs very fast in memory. But my application will include an interactive web based user interface; response times have to be fast and bounded. I doubt whether Apache Spark GraphX could offer me the necessary performance, it's another ballgame. Publications on the web confirm my doubts.
  2. Apache Spark GraphX is meant to execute graph statistical functions such as PageRank. It is not meant to store and query information such as names and addresses, and it doesn't offer a graph query language such as Neo4j's Cypher. This was the mistake I made when I left Neo4j looking for a big data graph database: I assumed that all graph databases would support graph traversal and/or graph querying. It was a mistake, not all of them do. One can traverse a GraphX graph, but the implementation works by sending and testing messages to adjacent vertices. I consider that too complex for my application. I also studied other GraphX functionalities such as subgraph, map and join. They all looked promising for my application, but I miss in GraphX a kind of "union" function to join 2 subgraphs.
  3. Apache Spark GraphX is based on Spark's RDDs. RDDs are immutable, and I need mutable vertices and edges in my application. My thought was to solve this issue by modifying the source data of the graph, for example in HBase, because HBase can be used as a data source in Apache Spark. But still, immutable RDDs do not fit the architecture of my application.

Back to the future: Apache HBase

For the purposes of my big data study I had already looked at HBase, so I already knew that major social media organizations use HBase to store and query their graphs. And originally the only information in my graph is a list of vertices, including their properties, and a list of "from-to" edges, including their properties. I could imagine storing this information in an HBase database. Why didn't I store my information in an HBase database before? Because my ultimate goal was Apache Spark, including GraphX, and Spark doesn't make a distinction between source data. For my application HCatalog, on top of my Pig Latin ETL implementation, offered me enough functionality to store data for usage in Apache Spark and to inherit information such as field names and types. And my learning curve was already steep enough without HBase as an additional big data technology. Apache HBase offers me the graph functionality I need:

  1. It is able to store graph databases, others have proven it successfully.
  2. It supports mutable data, including multiple, time related, versions of that data which I could use in my application.
  3. It suits big data including online response times. As you know I need response times suitable for online applications on web and mobile.
  4. It’s an industry standard that is widely adopted.

I haven't given thought yet to my HBase schema design, e.g. what I will store in the column families and how I will implement (indexed) row keys. I also have to develop the logic to store and query data in my HBase table; I hope to find some example implementations on GitHub. At least I have enough confidence that HBase will fulfill my graph database requirements, including a further implementation of Apache Spark GraphX as a graph computing engine.
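
Just to make the idea concrete for myself: one possible schema, and it is nothing more than a sketch, would be one HBase row per vertex with the GUID as row key, one column family for the vertex properties and one for the outgoing edges. Against the plain HBase client API, in Groovy, that could look roughly like this (table name, family names and GUIDs are made up):

// Sketch only: one HBase row per vertex, GUID as row key, families 'p' (properties) and 'e' (edges)
import org.apache.hadoop.hbase.*
import org.apache.hadoop.hbase.client.*
import org.apache.hadoop.hbase.util.Bytes

def conf  = HBaseConfiguration.create()
def admin = new HBaseAdmin(conf)
def desc  = new HTableDescriptor(TableName.valueOf('graph'))
desc.addFamily(new HColumnDescriptor('p'))       // vertex properties
desc.addFamily(new HColumnDescriptor('e'))       // outgoing edges, qualifier = label + target GUID
admin.createTable(desc)

def table = new HTable(conf, 'graph')
def put = new Put(Bytes.toBytes('2c9a4d6e-0000-0000-0000-000000000001'))   // vertex GUID (made up)
put.add(Bytes.toBytes('p'), Bytes.toBytes('name'), Bytes.toBytes('Acme'))
put.add(Bytes.toBytes('e'), Bytes.toBytes('isA|2c9a4d6e-0000-0000-0000-000000000002'), Bytes.toBytes(''))
table.put(put)
table.close()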

Posted in Apache Spark GraphX, HBase

Gephi 0.8.2 on Apple OSX Yosemite

Intro

In my opinion big data is the most important development in the ICT industry. It was time for me to get a grasp on big data.

I want to get acquainted with the whole big data stack, "vertically" from HDFS via YARN to applications such as Hive and HBase, and "horizontally" from ETL to analytics and visualization in an end-to-end process. In my opinion graph databases offer much value in the NoSQL domain by escaping the constraints of the relational database. So I started my NoSQL and big data journey experimenting with Neo4j. Neo4j graph database visualization is supported natively in Gephi. So that's how I got acquainted with Gephi.

How I installed Gephi 0.8.2 on OSX Yosemite:

Before I executed the following steps 1-6 I had already installed Oracle's Java 8 JRE and JDK on my Apple iMac running OSX Mavericks and later Yosemite. After downloading the latest version of Gephi (0.8.2) I suffered the same problems as many others: the Gephi Java app started, stopped and disappeared. At a certain moment I had Gephi 0.8.1 running on OSX Yosemite, including a fully functioning UI, but I wasn't satisfied. Using the following procedure I managed to get Gephi 0.8.2 working on OSX Yosemite:

I used the post by Sumnous at http://sumnous.github.io/blog/2014/07/24/gephi-on-mac/ as a starting point.

But:

  1. I removed all (Oracle) Java 7 and 8 JRE and JDK installations on my iMac before I installed Java 1.6 using the link in Sumnous’ post.
  2. I removed all previous Gephi installations and all related directories and files, including .dmg downloads, using http://www.freemacsoft.net/appcleaner/ in accordance with advice I found on Stack Overflow.
  3. After the download and installation of Java 1.6 I downloaded Gephi 0.8.2 once again.
  4. I started the .dmg installation file and I dragged and dropped Gephi.app to the Applications folder in OSX Finder.
  5. Following another Stack Overflow post on the subject of Gephi on Mavericks, I was advised to remove a "Gephi application support directory" on my iMac. I couldn't find this directory; I guess it was already removed by AppCleaner in step 2.
  6. I edited the OSX file /Applications/Gephi.app/Contents/Resources/gephi/etc/gephi.conf following the advice of Sumnous. I use TextWrangler or Xcode to edit such text based configuration files on my iMac. For non-experienced *nix and OSX users: please note the double quotes in Sumnous' post. The jdkhome entry in my gephi.conf file looks like this:

# default location of JDK/JRE, can be overridden by using --jdkhome <dir> switch
jdkhome="$(/usr/libexec/java_home -v 1.6)"

Then I started Gephi 0.8.2 by double-clicking the Gephi app. It started normally, resulting in a fully functioning UI including menus.

I also installed the Oracle Java 8 JDK using the link http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html. A "java -version<enter>" command in a terminal window results in:

iMac:~ Luc$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

I am using the Java 8 JDK within Spring Tool Suite (STS), which is my default (Java) IDE. I didn't run into issues or problems developing/building Java applications using STS.

Hope this helps.

Posted in Big Data

Lack of industry standards for Big Data

In my opinion big data is the most important development in the ICT industry and will have the biggest impact on businesses and society. I dare to say that big data will be as impactful and disruptive as the introduction of the Internet at the end of the last century. But the longer I dig into the world of big data, the clearer it becomes to me that industry standards haven't settled yet.

One might say "that statement is an open door you are trying to kick in". That might be true, but let me take you back to the early days of RDBMS engines from vendors such as Oracle, Sybase, Informix, Microsoft, Ingres and Progress. From day 1, all vendors supported the same relational model, including a dialect of ANSI/ISO SQL. The SQL "select" statement was universal, even in those days. I also remember that one vendor supported triggers or stored procedures in their database whilst others did not. Still, there was a kind of industry standard, and it is still applicable today. This is not the case regarding big data.

Sure, Hadoop is mainstream technology regarding big data, including distributions from Hortonworks, Cloudera and Pivotal. But there is new life beyond Hadoop, like Apache Spark: the same kind of distributed big data computing, offering features like streaming, machine learning, an SQL interface and even a graph database. Hadoop's functionality might be reduced in the future to one of its core functionalities: HDFS, large file storage.

Personally I love graph databases, because they manage relationships between entities. Apache Spark's GraphX uses Bagel, a Pregel dialect. Mind you, graph database companies like Neo4j offer a useful NoSQL graph database engine, although their query language Cypher is yet another implementation of a DDL and DML for graph databases, let alone Facebook's API for their own graph database. There is currently no standard for NoSQL graph databases or related languages.

To support streaming data for big data applications, multiple solutions exist, such as Apache Spark and SpringXD. Also in this case: the same kind of streaming data capture, (pre)processing and output options, but completely different implementations of the same functionality.

Regarding machine learning (ML): Apache Spark brings MLlib, whilst I was focusing on learning Apache Mahout. MLlib on Spark appears to run faster than Mahout on Hadoop using Map/Reduce. But today Mahout has also been ported to Spark. So which ML implementation should one use, and which architecture is future-proof?

My conclusion: the open source big data projects and big data technology vendors are still striving for market dominance and haven't settled on industry standards yet. Big data is still very much in a development stage, although many applications are already running in production.

Regarding graph databases: I forecast that technologies like GraphX will eventually conquer the graph database market, because the data doesn't need to be structured, which is in my opinion the most important feature of NoSQL.

Posted in Uncategorized

Disruptive times

I believe we are living in disruptive times. Things will never be the same again, due to what Gartner calls the Digital Industrial Economy and what my employer Capgemini calls the Digital Transformation of our society in a global economy. During this era a lot of household incomes are already at stake, and I forecast that many more will follow. Other journalists and bloggers have published about the vaporizing middle class in the US, but those publications relate to the export of manufacturing from the US to countries like China, or, as in the EU, to the economic crisis. You haven't seen anything yet: we will have to cope with the Digital Transformation of society.

The pace of Digital Transformation

Mind you, I do not wish to sound pessimistic. Society will survive; it has survived other disruptive events, like large-scale manufacturing using steam engines and the invention of the computer. In the first case society changed from agricultural to industrial. In the second case a lot of administrative clerks were not needed anymore. Those changes took a while. This time it will be different. We not only have to adopt and digest the Digital Transformation, we have to do it at a pace we have never experienced before.

What is Digital Transformation?

Well, the Digital Transformation is the result of what is so popular today: Social Media, Mobile, Big Data and Cloud technologies. These technologies are changing the world for a lot of professions. In many cases these professions are not needed anymore. It’s a matter of time. Let me give you some examples.

Automotive

Being a petrol head myself I am well aware of the fact that during my life I have experienced the growth of the car for the masses based on petrol engines. And hopefully I will also experience the result of oil wells drying up. Sure, we will exchange our petrol cars for cars with other, environmentally friendly engines that can run without petrol. But there's more: Google and some car manufacturers are testing driverless cars. These tests will be successful. Cars with driverless technology will be sold and become mainstream. And because these cars will have a positive impact on the number of road accidents, governments will eventually outlaw human drivers. These laws will not only apply to personal cars but also forbid human truck drivers. Transportation companies will love this development: driverless trucks don't have to rest. They will keep rolling on, offering a better business case. But complete professions, including truck drivers, bus drivers and taxi drivers, will not be necessary any more. And all hotels, motels, restaurants, convenience stores and other facilities supporting the current driver population will also be out of customers. Passengers, including "driving dad", will also make less use of these facilities; they can take a nap or have a simple dinner while they are on their way. Driverless technology is a result of the Digital Transformation. Its foundation is Big Data.

Central and local government

When a Dutch citizen needs a new passport, driver's license or birth certificate today, he or she has to go to a government office to obtain such an official document from a civil servant. When do you think these documents will be distributed electronically? We have had the necessary technology, like public key encryption, XML and chip cards, for some time. My guess is that it won't take long. The official policy of the Dutch government is to become a Digital Government. Many Dutch citizens already have a DigiD, a Digital Identity, to identify themselves when submitting their income tax forms. One of the results of the Digital Transformation of governments will be that the profession of civil servant will be decimated. They will not be necessary any more.

Retail

Retail in the Netherlands has been losing revenue for some years. Many shops have already closed. Some argue that this is a result of the economic crisis hitting the Netherlands the hardest in the EU. That might be true, but Etail is still growing, also in the Netherlands; part of Retail revenues were already obtained as Etail revenues in 2013. I forecast that the Etail percentage will grow. Let's assume the growth of Etail will stop at 50% of all Retail revenues. What are we going to do with 50% of the shops? This development is already visible in the Netherlands. Cloud services like Spotify killed the business of a local but famous retail chain of CD stores. A lifestyle department store decided in 2013 to close many of its Retail locations and withdraw to its initial footprint in the largest cities in the Netherlands. Some forecast that 30% of all Retail outlets in the Netherlands will close in 2015. In my opinion, once again a profession is at stake, from the shop manager to the supermarket shelf filler. We will no longer need their services to the extent we used to.

ICT

A lot of people are employed in the ICT industry, also in the Netherlands. Many are still employed at an information technology company or in an ICT organization of companies and organizations in other branches, or they are self-employed. They still fulfill roles such as system manager, hardware engineer, software developer or project manager. But we will migrate most systems to the cloud in the next era. We will drop local or server installations of Microsoft Office and migrate to cloud based office solutions. Next decade we will look at Microsoft Office the way we look today at WordPerfect, or Wang, or Windows 98, or the Macintosh. The result? The ICT profession will be at stake. We will not need those engineers, consultants and managers any more to install, implement and maintain local systems like we used to. Next decade we will obtain ICT services the way we obtain energy or telecommunication services today: from a socket in the wall or via the air.

Education

Obtain any knowledge, location independent, whenever you want it. College classes will be presented like TED presentations. You will take your exams online and you will immediately know if you have passed the test, a fact that will be automatically published on your LinkedIn CV if you authorize the app. Cloud, Mobile and Big Data will take care of the distribution of knowledge. Coursera is a pretty good start. It will be common education practice in, again, the coming era. Universities have already started with distance learning. Colleges will follow. Students will select their favorite teachers online and on the fly, because there's no limitation to a digital classroom. We will not need many teachers anymore.

Press and publishing

Anybody can be an online publisher. WordPress and Google+ will grow into global newspapers. Advertisers will choose in which posts of which bloggers, including professional journalists, they want to advertise. Recently a Dutch publisher decided to spin off or stop its business in many paper magazines, including titles such as Playboy. As a result of the Digital Transformation we will no longer need paper publishing the way we used to. No print, no physical distribution, no retail sales. We don't need it anymore.

Aging society

You might argue that the Digital Transformation of society will only impact the professions mentioned before. Again: I don't think so. I believe that the Digital Transformation of society will impact all countries, governments, businesses, citizens, professions, jobs and finally every life. As a result a large portion of the middle class will vaporize.

One of the Dutch government's priorities is focusing policy on the aging society. Many fear that when the baby boomers retire within the next decade, we will not have enough youngsters to fill the baby boomers' jobs and match their productivity. I don't think so. Most of these jobs will be made obsolete by a Digital Transformation to Cloud, Mobile and Big Data. Baby boomer teachers will be replaced by online conference meetings. Baby boomer civil servants and their managers will be replaced by Mobile application forms.

I don't see any "Digital Transformation" awareness in our western governments. They don't make and implement any policies anticipating the future impact of the Digital Transformation. My question is: will the Digital Transformation wait for the retirement of the baby boomers? What do you think?

Posted in Big Data, Cloud, ERP market developments, Mobile, Social Media

Does personality prediction using big data make sense?

This week I watched a television documentary regarding Big Data from the Dutch broadcaster VPRO. In this Big Data documentary thought leaders like Alex (Sandy) Pentland, Brian Dalessandro, Jaron Lanier, Matthew Hogan, Michal Kosinski, Stephen Wolfram and Viktor Mayer-Schönberger were interviewed. Their insights and thoughts were very interesting.

In this Big Data documentary Michal Kosinski, deputy director of The Psychometrics Centre of the University of Cambridge, stated that they could predict the Big Five factors of a Facebook user on the basis of his or her Facebook Likes. He even stated that they could also predict characteristics such as gender, sexual orientation and whether the parents of these users are divorced.
Bold statements, don't you think? If our "Likes", in relation to each other and in relation to the "Likes" in our network and the "Likes" in the rest of Facebook, disclose these personal details, then it's time to use this knowledge.

Luckily Michal Kosinski gave us a tool to play with. The tool predicts your personality, expressed in the Big Five factors, on the basis of your Facebook Likes. So I used the tool and this is my result: YouAreWhatYouLike.
I must say that I disagree with 3 out of 5 factors. I am a strong believer in the Myers-Briggs Type Indicator. My Type Indicator is ENTJ, so you might understand why. My partner even disagrees with an additional factor regarding my personality; do I need to say more?

Of course this mismatch doesn't prove anything. I have submitted no more than 25 "Likes" on Facebook. Maybe this number is too small to generate a solid prediction. Maybe there is another reason.

So I would like to give Michal Kosinski some empirical proof. I would therefore like to ask you to run YouAreWhatYouLike using your own Facebook login. Do you agree with the result? Please answer the following poll. And please also invite your Facebook friends to run YouAreWhatYouLike and to answer the poll. The results of the poll are displayed in this post. Have fun.

Posted in Big Data, Social Media

What is the future partner business model of SAP Business ByDesign?

SAP Business ByDesign (ByD) hasn’t been launched in the Netherlands yet. Although ByD has been presented by SAP during the most recent Dutch SUG (SAP User Group) VNSG conference in April 2012, to my knowledge ByD is not officially available with a Dutch user interface and it is not listed in the Dutch SAP price list. My peers at SAP stated that ByD will be launched in the Netherlands in Autumn this year (2012).

I already see a lot of opportunities for ByD in the Netherlands, both in the SME (Small and Medium Enterprises) and the Enterprise market segments. As I stated before in a tweet: I forecast a prosperous life for ByD in the Netherlands. I have 3 main arguments to support this forecast:

  1. CFOs, in general a dominant group of decision makers for ERP investments, will be attracted by the financial benefits of an ERP solution delivered from the Cloud. They will like the opportunity to obtain an ERP solution with a "pay for use" charging model that enables them to convert CAPEX (Capital Expenditure) for such ERP solutions into OPEX (Operational Expenditure). This financial advantage is increasing in importance in an era in which investments are hard to finance. After the Credit Crunch of late 2007 and the "Double Dip" monetary developments in the EU, investment money, like bonds, is expensive for corporations and even harder to obtain.
  2. Dutch Enterprise grade companies, running on SAP's enterprise solutions such as R/3, Business Suite 7 and/or ECC6, are currently already running projects that could be described as "process harmonization and system rationalization" projects. The decision makers at these companies no longer believe that they can realize a positive business case by implementing their own variants of generic (secondary) business processes such as finance, supply chain management, logistics and human resource management. Nor do they believe that they will be able to realize a competitive advantage by maintaining their own implementations of these business processes. The costs to implement and maintain these specific business processes are too high in comparison with the tangible business benefits that these specific business process implementations will bring. That's why these decision makers are all starting projects to re-implement their ERP systems with business processes based on best practices implemented by vendors such as SAP and system integrators like Capgemini. ByD is to be regarded as a natural development of this trend.
  3. Cloud and SaaS (Software as a Service) solutions are already being used in corporations. Examples are Ariba, Capgemini IBX, Hubwoo, SuccessFactors, SalesForce.Com and others. I expect that corporations will eventually move other business processes to the Cloud when they gain confidence in the SaaS based delivery model.

In my opinion it is only a matter of time before almost all decision makers in SME and Enterprise grade corporations come to the same 3 conclusions.

So the future is ERP from the Cloud on the basis of a SaaS delivery model. I forecast that in the next era, 2020 and beyond, ERP from the Cloud will be "business as usual" and the dominant ERP solution for SME and Enterprise grade corporations. To anybody defending another view on this matter, for reasons such as security or resilience and risks for business continuity, I would like to present a mirror from the past: these arguments were also used last century, when enterprises were using mainframe based systems running ERP-like functionality. In 2012 hardly anybody is discussing generic ERP solutions, client/server and Internet based architectures and so on as a threat to their business continuity or a risk. Time has proven that the arguments of the defenders of the mainframe based architectures of those days are no longer valid. History will repeat itself: the same will be applicable to Cloud and SaaS based ERP solutions such as SAP ByD.

So will SAP ByD be the "Holy Grail" ERP solution for every enterprise? Currently I must admit that this is not the case. Why? ByD does not yet offer a solution for all business processes, especially primary business processes. I will give you 2 examples:

  1. The business of a major Dutch company (in terms of annual revenues) is based upon large service contracts with their clients. The management and fulfillment of these contracts is one of the most important business processes within this company. They won't be able to manage these contracts in ByD. Either ByD has to be extended in functionality, or they should integrate a third party solution if they decide to base their next generation ERP on ByD.
  2. Another company uses specific equipment in order to fulfill their services at their clients. Management of this equipment is to be considered a critical success factor for their business; at the very least their profitability is related to equipment management. They cannot manage their equipment with ByD. ByD doesn't offer the required functionality.

So there’s room for improvement. In other words: ByD can be extended in order to fulfill the functional requirements of these companies. But who will extend ByD? SAP and/or SAP’s business partners such as system integrators like Capgemini? What will be SAP’s strategy and business model for such extensions and enhancements?

SAP's current SME ERP solution is Business All-in-One. This ERP solution is based upon SAP's enterprise ERP flagship R/3 / Business Suite 7 / ECC6 technology, but with another license model which enables system integrators to resell this ERP solution together with industry or market segment specific enhancements. To my knowledge such a partnership business model does not exist yet for ByD. Mind you, I am not referring to standard FIRE/RICEF (Forms, Interfaces, Reports, Enhancements) customizations that will be company/client specific. In this context I am referring to large functional enhancements that should be considered IP (Intellectual Property) of the implementing ByD partner and that can be reused and sold to other organizations in a specific market segment. Organizations that develop such IP and solutions want to realize an ROI (Return On Investment). But how will these organizations be enabled to charge their clients in order to realize their ROI?

  • Will these organizations/ByD partners charge their clients on the basis of a traditional model, e.g. a one-time license fee plus a maintenance contract for their solutions? This model undermines the benefits of converting CAPEX into OPEX and will become a barrier to the success of ByD and industry/market segment specific enhancements.
  • Will SAP start a ByD price list in which ByD partner specific solutions are sold and charged by SAP within their ByD license payment scheme, with kickback revenues towards the originating ByD partner? In other words: will SAP introduce a price list for such solutions like the one that currently exists for OpenText technology, which is an integral part of SAP's Invoice Management solution for ECC6, but then based on a "pay per use" charging scheme? Partners will want to realize their ROI and a positive Business Case for their investments as soon as possible, and time is a critical factor within a Business Case. Are the partners willing to wait for their revenues as a result of a "pay for use" payment scheme? Or will SAP act like an investment bank and pay the ByD partner in advance for their IP?
  • Will SAP enable ByD partners/resellers to charge their clients directly, including the SAP ByD license fee and a license fee for their own IP including kickback revenues to SAP?
  • Running additional software from ByD partners will require extra computing power. In the current ECC6 ERP world the addition of extra software will require extra “SAPS”, storage space, the lot. The ByD computing power is delivered by SAP. How will SAP charge the additional costs for extra computing power and the necessary maintenance to their clients and/or partners?
  • Will SAP introduce a ByD enabled PaaS (Platform as a Service) solution in order to enable ByD partners to develop, test, run, maintain and resell their enhancements without any architectural boundaries for their ByD partners and clients?
  • Last but not least a question related to the architecture of ByD: is SAP running/maintaining a separate system instance for every ByD client, which would enable ByD partners to load extra functionality, e.g. software, within the client's specific ByD system instance? Or is SAP running some form of a single instance for all their ByD clients, which prohibits client specific enhancements? I couldn't find any SAP ByD documentation yet on how to customize implemented business processes within ByD the way we are used to customizing business processes in ECC6 modules such as MM (Material Management), beyond the previously mentioned FIRE customizations. The lack of customization possibilities of implemented ByD processes could indicate that SAP is doing the latter: running and maintaining a single system instance for all their ByD clients. In that case: will SAP run a certain type of "Apps Store" with pre-loaded and pre-configured ByD partner enhancements that will be available to all ByD clients, on the basis of a certain Cloud based "Switch" architecture like the one currently implemented in ECC6 for on-premise SAP ERP systems?

Do you have the answers to these questions or a vision regarding this subject matter? Please let me know.

Luc Bartkowski

Posted in ERP market developments