The Future of Databases: Elasticsearch, Data Lakes, and Time Series
When data professionals ask me how to keep their skill sets competitive, I direct them toward the horizon. Where is the industry moving? Which technologies are they seeing less often on projects? Which technology stalwarts remain? A foundational piece of big data is the database, which stores, organizes, queries, and analyzes an organization’s data. Database management systems have long been ruled by relational models, used for structured data (e.g., numbers organized in rows and columns), and non-relational models, used for unstructured data (e.g., videos and social media posts).
In Choosing the Right Database for Your Enterprise, a new ebook I created with Udemy for Business, we look at why the database piece of your data puzzle requires robust data architecture knowledge, how to make the right database choice for your application, and which skills managers should prioritize so their teams gain expertise in building and scaling a data infrastructure.
As part of the ebook, I share four database trends I’m keeping a close eye on. While the relational versus non-relational choice will likely be around for some time to come, it’s always a good idea to keep an eye on the future when making technology decisions in the present. Download Ebook: Choosing the Right Database for Your Enterprise.
1. The CAP theorem is getting fuzzy
Can you have your cake and eat it, too? Recent advances mean you don’t necessarily need to make the usual trade-offs of the CAP theorem triangle. The CAP theorem states that a distributed data store can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition tolerance. A team must decide which property to compromise; because network partitions can’t be ruled out in practice, the choice usually comes down to giving up either consistency or availability while a partition lasts. For more on the CAP theorem and how to use it when choosing a database for your organization, download the ebook.
However, some big data experts have proposed replacing the CAP theorem with the PIE (platform flexibility, infinite scale, and efficiency) theorem as a better reflection of the trade-offs modern system architects must make. In practice, the lines are already blurring. Amazon Redshift, for example, is a relational data warehouse that is fully distributed, horizontally scalable, and highly reliable. MySQL and PostgreSQL can be sharded, through built-in or third-party mechanisms, to gain the horizontal scalability usually associated with non-relational databases. And most database systems can already provide high availability, even if it comes at a modest cost in consistency.
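Sharding splits one logical table across several servers, each holding a subset of the rows. The routing idea behind it can be sketched in Python as a toy, assuming simple hash-modulo placement. `shard_for` and `ShardedTable` are hypothetical names, not part of MySQL or PostgreSQL, and each Python dict merely stands in for a separate database server.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a key using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    SHA-256 digest is used to get the same answer on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedTable:
    """Hypothetical table whose rows are spread over num_shards dicts.

    Each dict stands in for a separate database server.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def put(self, key: str, row: dict) -> None:
        # Every write is routed to exactly one shard, so with real
        # servers the write load is spread across them.
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str) -> Optional[dict]:
        # Key lookups are routed the same way, so no other shard is queried.
        return self.shards[shard_for(key, len(self.shards))].get(key)


table = ShardedTable(num_shards=4)
for i in range(100):
    table.put(f"customer-{i}", {"id": i})
print([len(shard) for shard in table.shards])  # rows held by each shard
```

One design caveat: with plain modulo placement, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or a lookup directory instead.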
2. The growth of data lakes: Making data structured
Increasingly, pools of raw, loosely structured data (such as CSV or TSV files) are being stored in large cloud repositories. These are called “data lakes.” Services such as AWS Glue can infer a schema for these files and record it in a catalog, which lets SQL engines such as Amazon Athena query the data where it sits, without copying it into a database first. This approach combines massively scalable storage for raw data with the ability to query that data as you would a relational database.
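The idea of imposing structure on raw files at query time (often called “schema on read”) can be sketched in plain Python. This is not how AWS Glue is implemented or called: `infer_schema` and `query` are hypothetical helpers that mimic, in miniature, a crawler inferring column types and a query engine that reads the file in place instead of loading it into a database.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, TextIO

# Candidate column types, from narrowest to most general.
TYPE_ORDER = [int, float, str]


def _narrowest_type(value: str) -> type:
    """Return the narrowest type in TYPE_ORDER that can parse the value."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            pass
    return str


def infer_schema(rows: Iterable[Dict[str, str]]) -> Dict[str, type]:
    """Infer a column -> type mapping from raw CSV rows (crawler-style, simplified)."""
    schema: Dict[str, type] = {}
    for row in rows:
        for column, value in row.items():
            seen = _narrowest_type(value)
            current = schema.get(column, int)
            # If rows disagree, widen to the more general type (int < float < str).
            schema[column] = TYPE_ORDER[max(TYPE_ORDER.index(current), TYPE_ORDER.index(seen))]
    return schema


def query(source: TextIO, schema: Dict[str, type],
          where: Callable[[Dict[str, object]], bool]) -> Iterator[Dict[str, object]]:
    """Stream rows straight from the raw file, applying the schema as they are read."""
    for row in csv.DictReader(source):
        typed = {column: schema[column](value) for column, value in row.items()}
        if where(typed):
            yield typed


# A tiny stand-in for a data lake object: a raw CSV file with no declared schema.
RAW = "city,year,sales\nOslo,2019,120.5\nLima,2020,98\nOslo,2020,143\n"

schema = infer_schema(csv.DictReader(io.StringIO(RAW)))
print(schema)  # {'city': <class 'str'>, 'year': <class 'int'>, 'sales': <class 'float'>}
for row in query(io.StringIO(RAW), schema, lambda r: r["city"] == "Oslo"):
    print(row)
```

Note that the `sales` column is inferred as `float` even though two of its values look like integers: when sample rows disagree, the sketch widens to the more general type, so every value in the column can still be parsed.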
Certify your team as big data experts with the AWS Certified Big Data Specialty 2019 course and practice exam.
3. Elasticsearch: Search-engine based databases
This type of non-relational database builds indexes over the content of each record (in Elasticsearch’s case, inverted indexes that map each term to the documents containing it), so data can be found by what it contains rather than only by a key. A popular example is Elasticsearch, an efficient, scalable data store as well as a capable search engine. Many organizations use Elasticsearch to store log and numerical data, and use the rest of the Elastic Stack (such as Kibana for visualization, and Logstash and Beats for ingestion) to visualize and analyze it. The stack also includes machine learning features that automatically flag anomalous data, plus tools for moving data into Elasticsearch at massive scale.
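To make “uses indexes” concrete: Elasticsearch, through the Apache Lucene library it is built on, relies on an inverted index that maps each term to the documents containing it. The Python sketch below is a deliberately simplified, hypothetical version. It lowercases text, splits it on non-alphanumeric characters, and answers AND queries, whereas Elasticsearch adds configurable analyzers, relevance scoring, and distribution across shards.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical minimal inverted index: term -> set of document IDs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Record the document under every term it contains.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Disk latency spike on node-3")
index.add(2, "Node-3 latency back to normal")
index.add(3, "New user signup from mobile app")
print(index.search("latency node-3"))  # {1, 2}
```

Because a query goes straight to each term’s list of matching documents, it never has to scan the stored documents one by one, which is what lets search-engine databases answer content queries quickly at scale.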
Stay ahead of the competition. Learn more about Elasticsearch: Elasticsearch 7 and the Elastic Stack – In Depth & Hands On!
4. Time-series databases
Analyzing data for trends over time requires indexing it by timestamp and storing it in time order. Like graph databases, time-series databases serve a specialized need, but it’s a common one. These systems are still emerging, but they point to a larger trend: using many different, specialized databases for the many different, specialized challenges your organization faces. Examples include InfluxDB (from InfluxData), kdb+ (from KX), and Prometheus.
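Why time order matters can be shown with a toy in-memory series in Python. This is a hypothetical sketch, not the storage engine of InfluxDB, kdb+, or Prometheus: it keeps points sorted by timestamp so that a time-range query becomes two binary searches, and it downsamples by averaging points into fixed-width buckets, a common operation in time-series tools.

```python
import bisect
from typing import Dict, List, Tuple


class TimeSeries:
    """Hypothetical in-memory series of (timestamp, value) points kept in time order."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._values: List[float] = []

    def append(self, ts: int, value: float) -> None:
        # In-order points land at the end of the lists (cheap); late
        # points are inserted at their sorted position instead.
        i = bisect.bisect_right(self._times, ts)
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def between(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return points with start <= ts < end, located by binary search."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def downsample(self, start: int, end: int, bucket: int) -> List[Tuple[int, float]]:
        """Average the points in [start, end) into buckets of `bucket` seconds."""
        sums: Dict[int, Tuple[float, int]] = {}
        for ts, value in self.between(start, end):
            b = start + (ts - start) // bucket * bucket
            total, count = sums.get(b, (0.0, 0))
            sums[b] = (total + value, count + 1)
        return [(b, total / count) for b, (total, count) in sorted(sums.items())]


series = TimeSeries()
for ts, value in [(0, 1.0), (10, 2.0), (30, 3.0), (60, 5.0), (45, 4.0)]:  # 45 arrives late
    series.append(ts, value)
print(series.between(10, 60))        # [(10, 2.0), (30, 3.0), (45, 4.0)]
print(series.downsample(0, 60, 30))  # [(0, 1.5), (30, 3.5)]
```

Keeping the data sorted is the key design choice: a range query can jump straight to the first matching point instead of scanning the whole series.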
These emerging technologies are worth investigating to understand how they can best help your organization manage complex data and scale its data infrastructure as that data grows. Get a better understanding of foundational data architecture by downloading my latest ebook: Choosing the Right Database for Your Enterprise.