The Future of Databases: Elasticsearch, Data Lakes, and Time Series

Article Summary

The future of databases is shifting fast, moving beyond relational models toward specialized architectures built for scale. This article covers four key trends: evolving CAP theorem trade-offs, cloud-based data lakes, Elasticsearch, and time-series databases. You'll gain a clearer picture of where database technology is headed.

When data professionals ask me how to keep their skill sets competitive, I direct them toward the horizon. Where is the industry moving? Which technologies show up less frequently on projects? Which stalwarts remain? A foundational piece of big data is the database, which stores, organizes, and queries an organization’s data so it can be analyzed. Database management systems have long been dominated by relational models, which handle structured data (e.g., numbers arranged in rows and columns), and non-relational models, which handle unstructured data (e.g., videos and social media posts).

In Choosing the Right Database for Your Enterprise, a new ebook I created with Udemy for Business, we look at why the database piece of your data puzzle requires robust data architecture knowledge, how to make the right database choice for your application, and which skills managers should prioritize so their teams gain expertise in building and scaling a data infrastructure.

As part of the ebook, I share four database trends I’m keeping a close eye on. While the relational versus non-relational choice will likely be around for some time to come, it’s always a good idea to keep an eye on the future when making technology decisions in the present.

1. The CAP theorem is getting fuzzy

Can you have your cake and eat it, too? Recent advances mean you don’t necessarily need to make the usual trade-offs in the CAP theorem triangle. The CAP theorem states that a distributed database system can reliably guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. A team must decide which property to compromise in favor of the other two. For more on the CAP theorem and how to apply it when choosing a database for your organization, see the ebook.

However, some big data experts have proposed replacing the CAP theorem with the PIE (pattern flexibility, infinite scale, and efficiency) theorem as a better reflection of the trade-offs modern system architects must make. Amazon Redshift, for example, is a relational data warehouse that is fully distributed, horizontally scalable, and highly reliable. MySQL and PostgreSQL can be sharded to gain the horizontal-scaling benefits of non-relational databases. And already, most database systems can provide high availability, even if that comes at the cost of a nominal trade-off with consistency.
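To make the classic trade-off concrete, here is a minimal Python sketch of what happens when two replicas lose contact with each other. Everything in it (the `Replica` and `TinyCluster` classes, the `mode` flag, the `demo` function) is a hypothetical toy for illustration, not how any real database is implemented.

```python
class Replica:
    """One node holding a single copy of a key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas plus a mode flag: 'CP' or 'AP' (hypothetical toy model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot reach each other

    def write(self, key, value):
        """Write through replica a; return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than let copies diverge.
            return False
        # Availability first: accept the write locally; replica b is now stale.
        self.a.data[key] = value
        return True

    def read(self, replica, key):
        """Read a key from replica 'a' or 'b'."""
        return getattr(self, replica).data.get(key)


def demo(mode):
    """Write once normally, then once during a partition, and report the state."""
    cluster = TinyCluster(mode)
    cluster.write("stock", 10)
    cluster.partitioned = True
    accepted = cluster.write("stock", 9)
    return accepted, cluster.read("a", "stock"), cluster.read("b", "stock")
```

In this toy, `demo("CP")` rejects the second write and both replicas still agree, while `demo("AP")` accepts it and the two replicas return different values until the partition heals. Real systems sit at many points between these two extremes, which is part of why the triangle is getting fuzzy.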

2. The growth of data lakes: Making data structured

Increasingly, pools of unstructured data (such as CSV or TSV files) are being stored in large cloud repositories called “data lakes.” Systems such as AWS Glue can impart structure to that data and expose it to relational-style queries without making a copy of the data in the process. This approach combines the benefits of massively scalable, unstructured storage with the ability to query the data as you would in a relational database.
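The idea of applying structure at query time, often called schema-on-read, can be sketched in a few lines of Python. This is only an illustration of the concept and makes no claim about how AWS Glue works internally; the file contents, the `SCHEMA` mapping, and both function names are assumptions made up for the example.

```python
import csv
import io

# Stand-in for a raw file sitting in a data lake (hypothetical contents).
RAW_CSV = """order_id,region,amount
1,east,19.99
2,west,5.00
3,east,42.50
"""

# Hypothetical schema: column name -> type to apply at read time.
SCHEMA = {"order_id": int, "region": str, "amount": float}


def read_with_schema(raw_file, schema):
    """Yield typed rows from a raw CSV file without rewriting or copying it."""
    for row in csv.DictReader(raw_file):
        yield {column: cast(row[column]) for column, cast in schema.items()}


def total_by_region(rows):
    """Roughly equivalent to: SELECT region, SUM(amount) ... GROUP BY region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


totals = total_by_region(read_with_schema(io.StringIO(RAW_CSV), SCHEMA))
```

The raw text is never converted or loaded into a separate store. The schema lives beside the data and is applied row by row as the query runs, which is the property that lets a data lake stay cheap and scalable while still answering relational-style questions.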


3. Elasticsearch: Search-engine based databases

This type of non-relational database uses indexes to organize data by shared characteristics. A popular example is Elasticsearch, which is an efficient, scalable data store as well as a capable search engine. Many organizations store numerical data in Elasticsearch and use the other tools in the “Elastic Stack” to visualize and analyze it. The stack also includes machine learning capabilities that automatically identify anomalous data, along with tools for loading data into Elasticsearch at massive scale.
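Search-engine databases are generally built around an inverted index, which maps each term to the documents that contain it. The Python sketch below shows that idea in miniature. It is a toy under simplifying assumptions (whitespace tokenizing, exact matches, no relevance scoring) and does not use the Elasticsearch API; the sample documents and function names are invented for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace, stripping basic punctuation."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if word]


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical log messages keyed by document id.
DOCS = {
    1: "Disk latency spike on node A",
    2: "Login latency normal",
    3: "Disk replaced on node B",
}
INDEX = build_index(DOCS)
```

Calling `search(INDEX, "disk latency")` only intersects two small sets instead of scanning every document, which is why lookups stay fast as the data grows.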


4. Time-series databases

Analyzing data for trends over time requires indexing your data by time, in time order. Like graph databases, time-series databases serve a specialized need, but a common one. These systems are still emerging, but they point to a larger trend of using many different, specialized databases for the many different, specialized challenges your organization faces. Time-series database products and vendors include InfluxData, kdb+, and Prometheus.
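Here is a minimal Python sketch of why keeping data in time order matters: range queries become a binary search, and downsampling into fixed buckets is a single pass. The `TinySeries` class and its method names are hypothetical and are not modeled on any of the products named above.

```python
import bisect


class TinySeries:
    """Points kept sorted by timestamp so range queries can use binary search."""

    def __init__(self):
        self.timestamps = []  # seconds, always in ascending order
        self.values = []

    def append(self, timestamp, value):
        """Insert a point, keeping timestamps sorted even for late arrivals."""
        position = bisect.bisect_right(self.timestamps, timestamp)
        self.timestamps.insert(position, timestamp)
        self.values.insert(position, value)

    def query(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        low = bisect.bisect_left(self.timestamps, start)
        high = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[low:high], self.values[low:high]))

    def mean_per_bucket(self, start, end, width):
        """Average the values in each fixed-width time bucket (downsampling)."""
        buckets = {}
        for timestamp, value in self.query(start, end):
            bucket_start = start + ((timestamp - start) // width) * width
            buckets.setdefault(bucket_start, []).append(value)
        return {bucket: sum(vals) / len(vals) for bucket, vals in buckets.items()}


# Hypothetical CPU readings; the point at t=60 arrives out of order.
series = TinySeries()
for timestamp, cpu in [(0, 10.0), (30, 20.0), (90, 40.0), (60, 30.0), (120, 50.0)]:
    series.append(timestamp, cpu)
```

With this layout, `series.query(30, 120)` touches only the slice of points inside the window, and `series.mean_per_bucket(0, 120, 60)` rolls the raw readings up into one-minute averages, the kind of trend question time-series databases are designed to answer.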

These emerging technologies are worth investigating to understand how they may best help your organization manage complex data and grow its data center capacity exponentially. Get a better understanding of foundational data architecture by downloading my latest ebook: Choosing the Right Database for Your Enterprise.

Frank Kane

Udemy Instructor

Frank Kane is a Udemy Instructor and founder of Sundog Software. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning.