Oct 15, 2018 | 9 min read

Portfolio Insights: Crate.io IoT Data Management Myths & Realities

At Momenta we bring a unique combination of experience to help founders and management teams meet the challenges of building and sustaining start-ups, enabling companies to realize their full potential. We're proud of our portfolio companies, and in the coming months we will be featuring insights and experiences from them. It's a great opportunity to share their firsthand knowledge with the Momenta Community.

This post is guest written by Christian Lutz, CEO of Crate.io. Their core product, CrateDB, is a SQL database for managing machine data and analyzing it in real time. It has the versatility and scalability to handle the variety, velocity, and volume of machine data: millions of structured and unstructured data points or log entries per second. Its use cases include smart factories and manufacturing, smart city applications, geospatial tracking, and cybersecurity. CrateDB has been designed from the ground up to support the huge scale of web, mobile, and IoT applications. Crate.io is a global company with offices in San Francisco, Berlin, New York, and Austria. The IoT Innovator Awards named Crate.io Best IoT Open Source Software in 2017, and the company won the Disrupt Europe 2014 Battlefield.


Christian Lutz

CEO
Crate.io

 

Smart Systems Have a Huge Appetite for Data

Recent technology innovations in artificial intelligence (AI), machine learning, and IoT are enabling the development of Smart Systems. Smart Factories, Smart Buildings, and Smart Vehicle Fleets operate more efficiently and more securely, enabling businesses to increase profitability and deliver more value to customers.

Smart Systems have a huge appetite for data. Pipelines of sensor data, often millions of readings per minute in dozens of message formats, are integrated and analyzed in real time to monitor, predict, or control the behavior of “things.” This allows businesses to make efficiency-improving course corrections many times per day instead of a few times per quarter.


Crate.io is in the business of helping companies meet the data management challenges that Smart Systems pose. Over the course of many customer engagements, we’ve observed a recurring set of IoT data management myths that often lead Smart System developers down the wrong path. Below I share those myths and the realities behind them, and I hope the lessons we’ve learned help you get your Smart System projects up and running, and creating value, sooner.

 

Myth #1. “This is not a new data problem”

Simply put, Smart Systems must analyze a fire hose of complex data in real time. At one Crate.io customer, a bottle manufacturer, the Industry 4.0 workload looks like this:

  • 1,500 production lines, each producing up to 500,000 bottles per day
  • Multiple data points per bottle across 900 different sensor reading formats: bottle weights, photographic dimensional measurements, mold temperatures, machine state, and others
  • Real-time dashboards informing factory operations experts in “mission control” via time series, text search, and machine learning database queries.
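To put the first two bullets in perspective, here is a back-of-the-envelope calculation in Python. It is a sketch under stated assumptions: every line runs at the stated maximum of 500,000 bottles per day, and each bottle yields a hypothetical three sensor readings, since the customer's actual count is described only as "multiple data points."

```python
# Back-of-the-envelope ingest estimate for the bottle manufacturer above.
# Assumptions (not from the customer): every line runs at its stated maximum
# around the clock, and each bottle yields a hypothetical 3 sensor readings.
PRODUCTION_LINES = 1_500
BOTTLES_PER_LINE_PER_DAY = 500_000
READINGS_PER_BOTTLE = 3  # hypothetical; the post only says "multiple"
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def ingest_rates(lines: int, bottles_per_line: int, readings_per_bottle: int) -> dict:
    """Return bottles per day and sensor readings per day, minute, and second."""
    bottles_per_day = lines * bottles_per_line
    readings_per_day = bottles_per_day * readings_per_bottle
    return {
        "bottles_per_day": bottles_per_day,
        "readings_per_day": readings_per_day,
        "readings_per_minute": readings_per_day / MINUTES_PER_DAY,
        "readings_per_second": readings_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    rates = ingest_rates(PRODUCTION_LINES, BOTTLES_PER_LINE_PER_DAY, READINGS_PER_BOTTLE)
    for name, value in rates.items():
        print(f"{name:>20}: {value:,.0f}")
```

Under those assumptions the line count alone implies 750 million bottles per day and roughly 1.56 million readings per minute (about 26,000 per second), which is the "millions of readings per minute" scale described earlier.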

The mistake some companies make (including this manufacturer) is assuming their existing standard SQL database (e.g., Microsoft SQL Server or Oracle) can handle the requirements.

The reality for this manufacturer was that their SQL Server was unable to analyze the data fast enough, taking about 30 to 60 minutes to refresh the dashboards in “mission control.” The goal of the system was to discover production line inefficiencies (e.g., rising defect rates) and eliminate them within 5 minutes, so the long query time was unacceptable.

On top of that, the company ended up creating 900 different tables in SQL Server, one per sensor message structure type. Building and maintaining that many tables was time-consuming, added weeks to the development schedule, and contributed to slow query performance.

Traditional SQL databases, while easy to use, are not built to query streams of machine data in real time, unless deployed on very expensive hardware.

 

Myth #2. “IoT requires a NoSQL database”

For many database technologists, smart systems’ extreme data requirements might feel like a “NoSQL” use case. NoSQL databases like Cassandra, Hadoop, or MongoDB may in fact be able to handle machine data use cases; they are renowned for their scalability and ability to deal with complex and wide-ranging data structures.

The reality is that NoSQL databases can be harder to use and integrate than SQL. Because there is no standard for NoSQL database access, any NoSQL choice results in a kind of lock-in: proprietary query languages and storage formats make it harder to work with the data than a fully transparent and open ANSI SQL interface does.

The other reality is that NoSQL databases are already being outmoded by newer SQL databases, like CrateDB, which combine the familiarity of SQL with the scalability and flexibility of NoSQL. There is no need to sacrifice SQL to meet real-time IoT data requirements.

 

Myth #3. “IoT is a time series data problem”

Recently, specialized time series databases like InfluxDB (also a NoSQL DB) have come into vogue. They excel at charting data over time, especially intense streams of data such as those seen in smart systems. The mistake companies make is choosing a time series database as their IoT data platform.

The reality is that time series databases are limited in functionality. They reveal how data is changing over time, but struggle to support the variety of analyses and data model changes that help you understand why it is changing.

For example, you might want to integrate HR or ERP data to learn whether production anomalies are linked to the factory personnel on duty, to raw materials from certain suppliers, and so on. This should be a simple matter of adding some columns to your database, or adding a table and joining it. In practice, such data model changes often require users to recreate their time series databases entirely, which reduces agility.
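The join described above can be sketched with Python's built-in `sqlite3` module. SQLite here is only a stand-in for "any SQL database"; it is not CrateDB, and the table names, column names, and sample rows (`defects`, `shifts`, `supervisor`, and so on) are hypothetical. The sketch only illustrates that an HR-style `shifts` table can be added and joined without touching the existing readings table.

```python
import sqlite3

# Stand-in SQL database (SQLite, in memory); this is not CrateDB.
# All table names, column names, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE defects (
        line_id     INTEGER,
        ts          TEXT,  -- ISO-8601 timestamp of the reading
        defect_rate REAL   -- percent of bottles rejected
    );
    -- HR data added later as its own table; defects is left unchanged.
    CREATE TABLE shifts (
        line_id    INTEGER,
        start_ts   TEXT,
        end_ts     TEXT,
        supervisor TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO defects VALUES (?, ?, ?)",
    [
        (1, "2018-10-15T07:30:00", 0.8),
        (1, "2018-10-15T15:10:00", 2.9),
        (1, "2018-10-15T16:45:00", 3.1),
        (2, "2018-10-15T09:00:00", 0.7),
    ],
)
conn.executemany(
    "INSERT INTO shifts VALUES (?, ?, ?, ?)",
    [
        (1, "2018-10-15T06:00:00", "2018-10-15T14:00:00", "Alice"),
        (1, "2018-10-15T14:00:00", "2018-10-15T22:00:00", "Bob"),
        (2, "2018-10-15T06:00:00", "2018-10-15T14:00:00", "Carol"),
    ],
)

# Match each reading to the shift on duty (same line, timestamp inside the
# shift window), then average the defect rate per supervisor.
query = """
    SELECT s.supervisor, AVG(d.defect_rate) AS avg_defect_rate, COUNT(*) AS readings
    FROM defects AS d
    JOIN shifts AS s
      ON d.line_id = s.line_id
     AND d.ts >= s.start_ts AND d.ts < s.end_ts
    GROUP BY s.supervisor
    ORDER BY avg_defect_rate DESC
"""
for supervisor, avg_rate, n in conn.execute(query):
    print(f"{supervisor}: avg defect rate {avg_rate:.2f}% over {n} readings")
```

With these sample rows the query ranks Bob's shift first, with an average defect rate of 3.00% over two readings. A time-range join like this is the kind of "why" analysis that, as noted above, time series databases are challenged to support.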

The workaround is to use two (or more) databases: a time series database plus a separate, typically relational, database for non-time series data. It’s a quick fix, but as the data grows, duplicating it across multiple databases and keeping them synchronized becomes expensive.

 

Myth #4. “We Can’t Start AI Until We Cleanse Our Data”

Some teams assume that they lack the data, or the data hygiene, required to inform AI algorithms in Smart Systems. There’s a fear that bad data will lead to poor AI-driven automation.

In reality, most companies we work with use AI and machine learning to augment human decision making, not to replace it, so the fear that bad data will lead to factories running amok is unwarranted. The best practice is to “cleanse as you go” by monitoring analytic outcomes and ensuring the data feeding them is sound. Trying to cleanse all data completely as a prerequisite project before Smart Systems development risks analysis paralysis and delays the benefits of IoT.
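One way to picture “cleanse as you go” is to triage each batch of readings as it arrives instead of cleansing the whole history up front. The Python sketch below is a hypothetical illustration, not a Crate.io feature: the sensor names, plausibility ranges, and 5% alert threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Reading:
    sensor: str
    value: float


# Hypothetical plausibility ranges; real limits would come from process engineers.
LIMITS: Dict[str, Tuple[float, float]] = {
    "bottle_weight_g": (300.0, 600.0),
    "mold_temp_c": (200.0, 600.0),
}
MAX_SUSPECT_SHARE = 0.05  # hypothetical alert threshold: 5% suspect readings


def triage(readings: Iterable[Reading]) -> Tuple[List[Reading], List[Reading]]:
    """Split readings into plausible and suspect ones; unknown sensors pass through."""
    ok: List[Reading] = []
    suspect: List[Reading] = []
    for r in readings:
        low, high = LIMITS.get(r.sensor, (float("-inf"), float("inf")))
        (ok if low <= r.value <= high else suspect).append(r)
    return ok, suspect


def needs_attention(ok: List[Reading], suspect: List[Reading]) -> bool:
    """Flag the batch for review when the share of suspect readings exceeds the threshold."""
    total = len(ok) + len(suspect)
    return total > 0 and len(suspect) / total > MAX_SUSPECT_SHARE


batch = [
    Reading("bottle_weight_g", 452.0),
    Reading("bottle_weight_g", -1.0),  # e.g., a disconnected scale
    Reading("mold_temp_c", 480.0),
    Reading("mold_temp_c", 470.0),
]
ok, suspect = triage(batch)
print(len(ok), len(suspect), needs_attention(ok, suspect))
```

Suspect readings are set aside and counted rather than silently dropped, so analytics keep running on the plausible data while the quality problems that actually occur surface for review.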

 

In Summary: Data Layer Choice is Critical to Smart System Success

Machine data analysis and data-driven automation are the key to Smart System success. There are dozens of potential data management technology choices. SQL (Oracle)? NoSQL (MongoDB or Hadoop)? Time series (InfluxDB)? Text/log search (Splunk or Elasticsearch)? Or some combination of databases?

The best choice will enable:

  1. Fast development and time to value
  2. Fast (real-time), actionable data analysis
  3. Constant uptime
  4. Low IT operations (hosting, integration, and admin) costs

Smart system data management platform alternatives should be compared based on their ability to satisfy those four requirements. I hope the learnings shared in this post help you achieve your Smart Systems objectives quickly and fully.

If you have questions about IoT data platforms, or have other myths and realities you’d like to share, please visit us at crate.io or follow us on Twitter @CrateIO. 


Momenta Partners encompasses leading Strategic Advisory, Executive Search, and Investment practices. We’re the guiding hand behind leading industrials’ IoT strategies, over 100 IoT leadership placements, and 17+ young IoT disruptors. Schedule a free consultation to learn more about our Connected Industry practice.