MSU VISION 2020

Renaissance In Cloud Data Management

Haresh Kumbhani

During a recent business trip to the Netherlands, I began to ponder the profound impact of Baruch Spinoza. Spinoza, a renaissance man and great philosopher, believed that nothing is intrinsically good or bad; things are good or bad only relative to a particular situation. That set me thinking about databases: how they have evolved from the dark pre-cloud ages, and how, despite the wide variety available in the market today, each has its own use.

Before Cloud (BC)

Consider how we used to view databases. In the early days of computing, SQL databases ruled the data management field, and in those lumbering times a database was considered huge if it grew to a few gigabytes. Then came the middle ages: MySQL arrived in 1995 with an open source licensing model and created the first ripple in the universe of data management. As data grew larger and more applications moved to the cloud, a new movement in database technologies emerged. The march toward enlightenment in data management accelerated with big data analytics and cloud adoption. Compared to the MySQL era, the current state of affairs, with its cloud-first strategies, massive data objects and distributed data management, truly feels like a renaissance.

Cloud Era (CE)

In our cloud era, data management is a complex web of databases, data stores and tiers.

Google, Facebook, Amazon and other companies have brought NoSQL databases to the forefront, and MongoDB, HBase, Cassandra, CouchDB, DynamoDB and others have become popular. Getting a handle on each of these, and figuring out where each is useful, is a huge challenge. The CAP theorem is a handy tool for comprehending the underlying trade-offs and narrowing down which NoSQL database to use: when a network partition occurs, a distributed data store must give up either consistency or availability.
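As a rough illustration of using the CAP theorem as a filter, the sketch below tags each database named above with the leaning it is commonly cited as having and selects candidates for a given requirement. The leanings are simplified assumptions for illustration, not vendor statements; real behavior depends heavily on how replication and consistency levels are configured.

```python
from enum import Enum


class Leaning(Enum):
    CP = "favors consistency during a partition"
    AP = "favors availability during a partition"


# Commonly cited default leanings, used here as illustrative assumptions.
# Real behavior depends on configuration (replication, quorums, consistency level).
ASSUMED_LEANINGS = {
    "MongoDB": Leaning.CP,
    "HBase": Leaning.CP,
    "Cassandra": Leaning.AP,
    "CouchDB": Leaning.AP,
    "DynamoDB": Leaning.AP,
}


def candidates(required: Leaning) -> list[str]:
    """Return the stores whose assumed leaning matches the requirement, sorted by name."""
    return sorted(
        name for name, leaning in ASSUMED_LEANINGS.items() if leaning is required
    )


if __name__ == "__main__":
    # A shopping cart that must stay writable during a partition would start here.
    print(candidates(Leaning.AP))
```

A table like this only produces a shortlist; the final choice still has to weigh the other priorities of the application.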

Beyond the ones listed above, newer time-series databases such as InfluxDB, the database at the core of the TICK stack, provide new ways to store time-series data and are a good fit for IoT data.

Modern Cloud Management Strategies

Today, data management needs to be broken up into many different dimensions. We should carefully consider proven SQL databases, without bias toward either camp, before jumping to shiny new NoSQL databases. It is vital to understand the short- and long-term business strategy of data management and weigh competing priorities before selecting specific technologies. Whenever I want to evaluate a data management strategy, I use the following checklist to come to a decision:

• What are the security and compliance considerations of data?

• What is the short- and long-term scalability of data?

• What are the types of data and their uses?

• What would be the frequency of schema changes?

• What should be the latency of data retrieval?

• What is the velocity of data?

• What is the variety of data?

• What are the requirements of data availability?

• What are the search requirements of the stored data?

• How is data processed into information and insights?

• How is data analyzed and reported?

• Is data stored in a multi-tenant environment?

• What is the optimal cost of data management?

• What are the tiers of data management?

• What are the data management lifecycle requirements (backup/recovery)?

Technologists are increasingly comfortable using more than one database in their cloud applications. This trend has accelerated with the use of microservices and containerization. Furthermore, most teams building cloud applications recognize the need to separate data management into tiers. Such tiers include a UI caching tier, CDN tier, graph analysis tier, business tier, business analytics tier, security tier, reporting tier, IoT device tier and more. Each tier can have its own data management strategy, as long as the data is protected and is accessed through well-defined interfaces such as REST APIs.
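One way to picture tier-wise separation is a small registry that maps each tier to its own data store and to the REST path that fronts it. The sketch below is only an illustration: the tier names, the store assignments and the `/api/...` paths are hypothetical, and `TierStore` and `store_for` are names invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierStore:
    tier: str      # logical tier of the application
    store: str     # data store chosen for that tier
    api_path: str  # hypothetical REST path through which the tier is accessed


# Illustrative assignments only; every team would choose its own.
TIERS = [
    TierStore("business", "PostgreSQL", "/api/business"),
    TierStore("analytics", "Cassandra", "/api/analytics"),
    TierStore("graph-analysis", "Neo4j", "/api/graph"),
    TierStore("iot-device", "InfluxDB", "/api/telemetry"),
]


def store_for(tier: str) -> TierStore:
    """Look up the store registered for a tier, failing loudly if none exists."""
    by_tier = {entry.tier: entry for entry in TIERS}
    try:
        return by_tier[tier]
    except KeyError:
        raise ValueError(f"no data store registered for tier {tier!r}") from None


if __name__ == "__main__":
    entry = store_for("graph-analysis")
    print(f"{entry.tier}: {entry.store} behind {entry.api_path}")
```

Keeping the mapping explicit makes it easy to see which tier owns which data, and callers only ever depend on the API path, not on the database behind it.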

Database As A Service (DBaaS)

These are exciting times in which a mature set of DBaaS options is emerging for SQL and NoSQL databases. For example, Amazon Aurora provides MySQL- and PostgreSQL-compatible databases on tap, and Instaclustr offers Cassandra as a service hosted on AWS.

Analytics As A Service (AaaS)

All of the big three cloud providers offer analytics as a service. The biggest hurdle to adopting a cloud analytics platform is the fear of data security. AWS and Azure provide a robust set of data analytics services to mitigate this fear. Azure Analysis Services builds on SQL Server Analysis Services and integrates with Power BI for visualization.

Graph DB Uses

The graph database revolution is enabling faster solutions for problems that benefit from graph queries, because it accelerates the search for relationships based on adjacency principles in areas such as cybersecurity, recommendation engines, IT operations and networking. For example, in a client’s IoT cybersecurity product, we used Apache Spark and Cassandra for the analytics tier, but the resulting data was organized in a Neo4j graph database to allow further analysis of cybersecurity threats, followed by MongoDB-based cybersecurity orchestration. This is a good example of tier-wise separation of data management, where best-of-breed databases were used to solve a very complex problem for a cybersecurity product.
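To make the adjacency idea concrete, here is a minimal sketch in plain Python (not Neo4j's API) that answers a typical cybersecurity question: which hosts lie within a given number of hops of a suspect device? The network and its host names are hypothetical.

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each node reachable within max_hops to its hop count."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical network: each host lists the hosts it talks to.
NETWORK = {
    "camera-1": ["gateway"],
    "gateway": ["camera-1", "app-server", "thermostat-7"],
    "app-server": ["gateway", "database"],
    "thermostat-7": ["gateway"],
    "database": ["app-server"],
}

if __name__ == "__main__":
    # Everything within two hops of a suspect camera.
    print(within_hops(NETWORK, "camera-1", 2))
```

A graph database stores these adjacency relationships natively, so traversals of this kind do not have to be rebuilt from join tables on every query.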

IoT Database

With the advent of IoT-enabled applications, the sheer volume of data collected and processed from the device tier needs to be handled in a specialized way. We have successfully used InfluxDB, a relatively new and exciting open source database that handles time-series data efficiently. Applications with this profile can use InfluxDB and the rest of the TICK stack (Telegraf, InfluxDB, Chronograf and Kapacitor) for data management.

[Figure: TICK Stack. Credit: Zymr, Inc.]
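InfluxDB accepts data points in a text format called line protocol: a measurement name, optional tags, one or more fields and a timestamp. The sketch below formats a device reading into that shape using only the Python standard library, not an InfluxDB client. The measurement, tag and field names are hypothetical, and `to_line` is a helper invented for this example.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that appears in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value) -> str:
    """Render a field value: booleans, integers (suffix i), floats, quoted strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable, predictable line
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}" for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line(
        "temperature",
        {"site": "lab", "device": "sensor 7"},
        {"celsius": 21.5, "ok": True, "samples": 3},
        1556813561098000000,  # nanoseconds since the Unix epoch
    ))
```

Tags are indexed and suit values you filter by, such as a device ID, while fields carry the readings themselves. That split is what makes this layout a natural fit for high-volume device data.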

Final Thoughts

Unlike his peers at the time, Spinoza was strongly opposed to traditional theological views. One of his famous quotes is, “Be not astonished at new ideas; for it is well known to you that a thing does not therefore cease to be true because it is not accepted by many.” Similarly, we are free to choose from multiple databases, select various tiers, break monoliths into microservices, and innovate by leveraging a variety of cloud data management tools and techniques in building modern cloud applications.