Insight Vector: Reimagining the Database for the Machine Era, Christian Lutz - Crate.io
In this interview, we talk to Crate.io, a Momenta Ventures portfolio company. Their core product, CrateDB, is a SQL database for managing machine data and analyzing it in real time. It has the versatility and scalability to handle the variety, velocity and volume of machine data: millions of structured and unstructured data points or log entries per second. Its use cases include smart factories and manufacturing, smart city applications, geospatial tracking, and cybersecurity. We discuss the genesis of the company, the challenges of entering the database market as a young company, and the customer opportunities in the machine data era.
CEO and Co-Founder
Can you share a bit of your background?
My co-founder and I have backgrounds in enterprise software, and we experienced the challenges that come with the machine data explosion through the businesses we were working with. I studied mechanical engineering at school in Vienna and started a software company right out of school. When my co-founder Jodok Batlogg was the CTO of studiVZ (the Facebook of Germany), he had to deal with 20 million consumers who were uploading photos to the site. This is where the idea emerged to create a database that deals with these types of volumes in a simple way.
There are a lot of databases out there that do not support SQL, which is the lingua franca of databases. Roughly 87% of all apps are using SQL.
The idea was to take concepts from NoSQL systems, such as Vertica, BigTable and more purely NoSQL companies like MongoDB and Elastic. While these are good technologies, each one requires you to learn different tools and interfaces. We saw that NoSQL scale-out capabilities could be merged with a SQL front end to manage the growth of machine data.
Entering the database market: how did you think about the competition?
This is the biggest challenge for a young database company. We saw there were a number of companies that were dying, but we had super strong confidence that the market would have many databases for specific needs, and that this would become more important over time. We saw that there was no distributed SQL query engine. Companies that focus on Hadoop translate SQL to Hadoop and run on top of it.
What’s the technological foundation for Crate.io?
We are standing on the shoulders of 4-5 open source projects. We are using Lucene for indices and storage, Elasticsearch for search and cluster management, Facebook Presto for the parser, and Netty for the network communication protocol. We took all of these components, packaged them into a library and built a distributed SQL query engine on top of it. McAfee has been running a 175-node cluster in production for 2 years.
What are the most relevant use cases?
The most interesting market is machine data, which is of course very diverse, and in particular industrial IoT, which we apply to manufacturing. The challenge is that manufacturers need to collect sensor data at high rates and connect it with other data (operational data, ERP systems, quality systems), and you need to enrich this data in the database, which has to support very different types of data formats.
You have non-relational data from devices, relational data from business systems and geospatial data, and we can combine them all together. We even support BLOB (Binary Large Object) data from large images. Supporting all these different data types used to require three or more systems running side by side, whereas Crate is a single system.
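As a rough illustration of combining device data with business data in one SQL store, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL database; it is not CrateDB code, and the table names, columns and values (sensor_readings, production_orders, and so on) are hypothetical and were not mentioned in the interview.

```python
import sqlite3

# Assumption: SQLite stands in for "a single SQL system"; this is not CrateDB.
# All table names, columns and rows below are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor_readings (machine_id TEXT, ts INTEGER, temperature REAL);
    CREATE TABLE production_orders (machine_id TEXT, order_id TEXT, product TEXT);
    INSERT INTO sensor_readings VALUES
        ('m1', 1, 71.5), ('m1', 2, 73.0), ('m2', 1, 64.0);
    INSERT INTO production_orders VALUES
        ('m1', 'PO-100', 'bottle-500ml'), ('m2', 'PO-101', 'cap-28mm');
""")


def enriched_readings(conn):
    """Join device readings with business (ERP-style) data in one query."""
    return conn.execute("""
        SELECT r.machine_id, o.order_id, o.product,
               MAX(r.temperature) AS max_temperature
        FROM sensor_readings AS r
        JOIN production_orders AS o ON o.machine_id = r.machine_id
        GROUP BY r.machine_id, o.order_id, o.product
        ORDER BY r.machine_id
    """).fetchall()


for row in enriched_readings(conn):
    print(row)
```

The point of the sketch is only that sensor data and business data can be queried together through one SQL interface instead of being stitched together across separate systems.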
Do your customers typically replace existing systems or focus on new projects?
It’s both. Very often an existing SQL application can reach its limits in scalability. In the case of industrial IoT use cases, we may replace a stew of technologies: there will be an RDBMS like MySQL for firmware or topologies, then a document store like Mongo or Cassandra for sensor data, and maybe even a third one for real-time analytics. Crate can replace all three systems.
In terms of scalability, compared to something like Cassandra, scaling is just a matter of adding nodes in a scale-out fashion. It’s important that a database doesn’t get stuck to a platform. All you need is Linux and Java and you can run Crate anywhere: on AWS, on Azure, on a laptop or in any other environment.
Can you talk about some of the customers you are most excited about?
ALPLA is a $4bn company that produces plastic packaging, such as every plastic Coca-Cola bottle, or other bottles for shampoo and cosmetics. They built a manufacturing platform powered by Crate. There are already 17 factories connected, collecting real-time sensor data from all of the equipment on the shop floor. The data is collected in the Crate Platform services, which power an app called Mission Control that lets them monitor what is happening in real time, so they can react faster than the factory itself might even realize. There is a real-time connection between the system and the platform. They have 180 factories in 45 countries and we are working to roll out on a global basis. All of the required data is stored and enriched, and time series are built up so they can develop analytics over time. They can not only look at the stream with condition-based analytics, but also run queries on the real-time stream as well as the history. This allows them to compare a current condition with historical data at the same time. Crate is doing away with the separation between the operational and analytical data stores; we think these things have to be together. This also makes the app developer’s work much easier.
McAfee is another big customer. They have a cloud business unit, formerly Skyhigh. They have 40% of the Fortune 500 as customers, with 700 customers on the platform. This is a security platform where the devices look at the network level, analyzing who is sending packets to whom on the network. They are storing 10bn records per day with over 150 nodes. There are log files and security records. The app framework was built on MySQL, and we replaced MySQL and Elasticsearch.
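To make the idea of comparing a current condition with history in a single query more concrete, here is a small sketch. It again uses Python's built-in sqlite3 module as a stand-in rather than CrateDB, and the readings table, its columns and its values are hypothetical; the interview does not describe how ALPLA's queries actually look.

```python
import sqlite3

# Assumption: SQLite stands in for one store holding both live and historical
# readings; this is not CrateDB. Table, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (machine_id TEXT, ts INTEGER, pressure REAL);
    INSERT INTO readings VALUES
        ('m1', 1, 10.0), ('m1', 2, 10.0), ('m1', 3, 16.0),
        ('m2', 1, 5.0),  ('m2', 2, 5.0),  ('m2', 3, 5.0);
""")


def latest_vs_history(conn):
    """Return each machine's newest reading next to the mean of its older ones."""
    return conn.execute("""
        SELECT cur.machine_id,
               cur.pressure AS current_pressure,
               (SELECT AVG(h.pressure)
                  FROM readings AS h
                 WHERE h.machine_id = cur.machine_id
                   AND h.ts < cur.ts) AS historical_avg
        FROM readings AS cur
        WHERE cur.ts = (SELECT MAX(m.ts)
                          FROM readings AS m
                         WHERE m.machine_id = cur.machine_id)
        ORDER BY cur.machine_id
    """).fetchall()


for machine_id, current, average in latest_vs_history(conn):
    print(machine_id, current, average)
```

Because the newest reading and the historical average come from the same table in one statement, no data has to be copied between an operational store and an analytical store before the comparison can be made.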
What do you see coming up?
I think we are at the beginning of the machine data era. A lot of companies are talking about it but it’s still very early. We think that every company, regardless of size, will need to deal with data. Especially on the manufacturing side, a lot of companies will wake up and realize that otherwise they will lose big to their competition. This is a big challenge for smaller companies, but it will also enable companies in Europe with specialized knowledge to compete against competitors in countries with cheaper labor. There will be tens of thousands of companies using SQL that can take advantage of these technologies. Our partnership with Microsoft Azure provides a great way to help these companies solve their challenges.