Modern Data Stack: Use Cases & Components (2024)

What is the Modern Data Stack?

The Modern Data Stack (MDS) is an ecosystem of data tools that emerged as a result of the rise of the cloud data warehouse.

For basic use cases, the MDS allows data teams to replicate data into a data warehouse, transform the data, and visualize insights for data-driven decision-making.

Because the MDS architecture is modular, it can also power the most complicated data pipelines: machine learning models, real-time production systems, and even client-facing products for end-users.

For both startups and enterprises, the opportunities are limitless if you have the right technology and people in place.

Use Cases for the Modern Data Stack

There are three main use cases for the Modern Data Stack:

1. Data Analytics

2. Process Automation

3. Product Development

Data Analytics

Centralize data to empower business users with dashboards for better decision-making

Process Automation

Save time by automating time-consuming tasks and business processes for end-users

Product Development

Turn raw data into valuable data products that clients can purchase

How Is the Modern Data Stack Different From a Traditional Data Stack?

Unlike traditional data stacks that were built by in-house engineers, deployed on-premises, and involved custom components, Modern Data Stacks leverage a cloud data warehouse for processing combined with off-the-shelf components for connectivity, governance, transformation, and visualization.

Cloud services started to pick up steam in the 2010s, and as the tooling matured, data workloads started migrating from on-premises online analytical processing (OLAP) environments to cloud data warehouses.

With this shift, an entire ecosystem of components (i.e. tools in the Modern Data Stack) emerged to help companies move their analytics initiatives to the cloud.

Here is a detailed side-by-side comparison of a traditional data stack and a Modern Data Stack:

Capability	Traditional Data Stack	Modern Data Stack
Deployment	On-premises	Cloud-native
Scalability	Limited by hardware	Elastic (scales on demand with cloud resources)
Architecture	Storage and compute are coupled	Separation of storage and compute
Use Cases	Analytics	Analytics, automation, machine learning, data products
Users	Engineers, data engineers	Engineers, data engineers, data scientists, data analysts, analytics engineers, business analysts

Components of the Modern Data Stack

The Modern Data Stack includes the following components:

MDS Component	Use Case	Example Tools
ETL / ELT	Extract data from databases and applications	Fivetran, Portable
Data collection	Create data from websites and mobile apps	Snowplow, Segment
Real-time	Move and process data in real-time	Confluent, Striim
Data processing	Provide processing power for data pipelines	Snowflake, Databricks
Data transformation	Version control and structure complex query logic	dbt, Coalesce
Orchestration	Schedule jobs and handle dependencies	Airflow, Dagster
Reverse ETL	Sync data from warehouses to business apps	Hightouch, Census
Data visualization	Turn raw data into dashboards	Power BI, Tableau
Data governance	Measure and improve data quality	Collibra, Monte Carlo

Let's walk through each piece of the tech stack in more detail.

ETL / ELT

Job to be done: Data ingestion solutions (ETL, ELT) include connectors that extract data from data sources (e.g. PostgreSQL, LinkedIn) and load the data into a data warehouse. Compared with writing the extraction code yourself, no-code solutions can offer simpler, more reliable, and more scalable data pipelines.

ETL / ELT tools: Portable, Fivetran, Stitch, Hevo Data, CData, Matillion, Airbyte, Integrate.io, Blendo, Data Virtuality, Etleap, Precisely, Gathr, Skyvia, Dataddo, Kleene.ai, Rivery
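To make the extract-and-load step concrete, here is a minimal sketch in Python. It is not how any of the tools above are implemented: two in-memory SQLite databases stand in for the source system and the warehouse, and the table names (customers, raw_customers) and the extract and load functions are hypothetical.

```python
import sqlite3

def extract(source: sqlite3.Connection) -> list:
    """Extract: read every row from the (hypothetical) source table."""
    return source.execute("SELECT id, email FROM customers ORDER BY id").fetchall()

def load(warehouse: sqlite3.Connection, rows: list) -> int:
    """Load: land rows unchanged in a raw table; transformation happens later (ELT)."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS raw_customers (id INTEGER PRIMARY KEY, email TEXT)"
    )
    # INSERT OR REPLACE keeps repeated syncs idempotent on the primary key.
    warehouse.executemany("INSERT OR REPLACE INTO raw_customers VALUES (?, ?)", rows)
    warehouse.commit()
    return warehouse.execute("SELECT COUNT(*) FROM raw_customers").fetchone()[0]

# Stand-in source system with two example rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)

# Stand-in warehouse; a real pipeline would connect to a cloud data warehouse.
warehouse = sqlite3.connect(":memory:")
row_count = load(warehouse, extract(source))
```

Running the sync a second time leaves the row count unchanged, which is the property that makes scheduled, recurring syncs safe to retry.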

Data Collection

Job to be done: Data collection tools make it simple to collect or create data from websites and mobile apps. Collection tools typically create schematized event streams that are delivered to your warehouse or data storage location (e.g. AWS S3, GCS) in real time. When your data analysts need data from your first-party platforms, it's probably time to evaluate data collection solutions.

Data collection tools: Snowplow Analytics, mParticle, RudderStack, Segment, Freshpaint, Heap, Piwik PRO, Amplitude, Tealium, Rakam, SnowcatCloud
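As a rough illustration of what a "schematized event" means, the sketch below defines a fixed schema for a page view and serializes it to JSON. The PageViewEvent class, its fields, and the track_page_view function are assumptions made for this example, not the API of any tool listed above.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class PageViewEvent:
    """Hypothetical event schema: every page view carries the same three fields."""
    user_id: str
    url: str
    occurred_at: str  # ISO 8601 timestamp in UTC

def track_page_view(user_id: str, url: str) -> str:
    """Validate the inputs and return the event as a JSON string ready for delivery."""
    if not user_id or not url:
        raise ValueError("user_id and url are required")
    event = PageViewEvent(user_id, url, datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(event))

# A real collector would send this payload to a stream or storage bucket.
payload = track_page_view("user-123", "https://example.com/pricing")
```

Because every event has the same shape, the rows that land in the warehouse can be queried without per-event parsing logic.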

Real-Time

Job to be done: Real-time data platforms transfer information from one system to another in a matter of milliseconds instead of minutes or hours. With stream processing, aggregations, joins, and advanced processing can take place while data is in motion.

Real-time processing tools: Confluent, HVR, Materialize, Striim, Meroxa, StreamSets, Decodable, Popsink, Qlik Replicate, IBM InfoSphere, Amazon Kinesis, AWS DMS, AWS Glue, Google Cloud Dataflow, Talend, Oracle GoldenGate, Arcion, Gravity Data, Skippr, IOblend, Attunity, DeltaStream, Upsolver, Timeplus, Debezium, Kafka, Apache NiFi, Maxwell's Daemon, Streamkap, StreamNative
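One way to picture aggregation "while data is in motion" is a tumbling window: events are counted per fixed time window, and each result is emitted as soon as the window closes. The sketch below is a simplified, single-process illustration that assumes events arrive ordered by timestamp; the function name and sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

def tumbling_window_counts(events: Iterable, window_seconds: int) -> Iterator:
    """Yield (window_start, counts_per_key) each time a window closes.

    Assumes events are (timestamp_seconds, key) pairs ordered by timestamp.
    """
    current_window = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_window is not None and window_start != current_window:
            # The previous window is complete: emit it without waiting for the stream to end.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the final, still-open window

# Hypothetical stream of (timestamp, event type) pairs.
events = [(0, "click"), (3, "view"), (9, "click"), (12, "click"), (25, "view")]
windows = list(tumbling_window_counts(events, window_seconds=10))
```

Production stream processors also handle late and out-of-order events, which this sketch deliberately ignores.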

Data Processing

Job to be done: Data warehouses (as well as data lakes and lakehouse architectures) do the heavy lifting for your Modern Data Stack. While it is possible to power analytics without a data warehouse (by connecting a data visualization tool directly to a production database), most teams that are serious about becoming data-driven will put in place a data warehouse immediately.

Data warehouse tools: Snowflake, Google BigQuery, Amazon Redshift, Azure Synapse, Databricks, Firebolt, ClickHouse, Dremio, Starburst, Onehouse, Qubole

Data Transformation

Job to be done: Every data stack needs some way to turn raw data into insights. Typically a data transformation tool is introduced as your data processing requirements increase, as the number of data models becomes unwieldy, or as your SQL queries become illegible. Whether you use an open-source transformation provider or a cloud solution, these tools can help you stay organized.

Data transformation tools: dbt, Coalesce, Narrator, Matillion, Mozart Data, Google Dataform, Datameer, SqlDBM, Reconfigured, Retable
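The idea of keeping query logic organized can be sketched as layered SQL models: a staging model cleans the raw table once, and a downstream model builds business logic on top of it. The example below uses SQLite views as a stand-in for a warehouse; the table and view names (raw_orders, stg_orders, customer_revenue) are hypothetical, and this is not how any specific transformation tool works internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT);
INSERT INTO raw_orders VALUES
  (1, 'acme', 1000, 'paid'),
  (2, 'acme', 2500, 'refunded'),
  (3, 'globex', 4000, 'paid');

-- Staging model: rename and convert raw columns in one place.
CREATE VIEW stg_orders AS
SELECT id AS order_id, customer, amount_cents / 100.0 AS amount_usd, status
FROM raw_orders;

-- Downstream model: business logic reads from staging, never from raw data.
CREATE VIEW customer_revenue AS
SELECT customer, SUM(amount_usd) AS revenue_usd
FROM stg_orders
WHERE status = 'paid'
GROUP BY customer;
""")

revenue = conn.execute(
    "SELECT customer, revenue_usd FROM customer_revenue ORDER BY customer"
).fetchall()
```

If the raw column names change, only the staging model needs to be updated, which is the main organizational benefit of layering.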

Orchestration

Job to be done: Data stacks are complex. As you add more components, you need to keep everything running seamlessly. Orchestration tools tie into APIs from other pieces of the Modern Data Stack: they kick off work, manage dependencies, and track the lineage of data through your pipelines.

Orchestration tools: Airflow, Shipyard, Stonebranch, Orchestra, Dagster, Prefect, Astronomer, Argo, Luigi, Temporal, Mage
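Dependency management is the core of orchestration: a job should start only after everything it depends on has finished. The sketch below uses Python's standard-library graphlib module to derive a valid run order; the task names are hypothetical, and a real orchestrator would trigger each job, retry failures, and run independent tasks in parallel.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_revenue": {"extract_orders", "extract_customers"},
    "refresh_dashboard": {"transform_revenue"},
}

def run_pipeline(deps: dict) -> list:
    """Visit tasks in dependency order and return the order in which they ran."""
    completed = []
    for task in TopologicalSorter(deps).static_order():
        completed.append(task)  # placeholder for actually kicking off the job
    return completed

run_order = run_pipeline(dependencies)
```

The two extract tasks have no dependency on each other, so either may come first; the only guarantee is that each task appears after its prerequisites.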

Reverse ETL

Job to be done: Reverse ETL solutions convert your data warehouse from an analytics engine (only powering dashboards) into an operational system of record. With off-the-shelf data integrations activating data from your warehouse to downstream business applications (e.g. Salesforce and other SaaS applications), you can use Reverse ETL to automate business workflows.

Reverse ETL tools: Hightouch, Census, MessageGears, Omnata, Octolis, Lytics, Polytomic, RudderStack, SeekWell, Rivery, Weld, Twilio Segment
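A simplified way to think about a Reverse ETL sync is as a diff: compare what the warehouse says with what the business application currently holds, and push only the records that are new or changed. In the sketch below, plain dictionaries stand in for a warehouse query result and a CRM; the plan_sync function and the field names are hypothetical.

```python
def plan_sync(warehouse_rows: dict, app_records: dict) -> dict:
    """Return only the records that are new or changed relative to the downstream app."""
    return {
        key: fields
        for key, fields in warehouse_rows.items()
        if app_records.get(key) != fields
    }

# Hypothetical modeled data in the warehouse, keyed by customer email.
warehouse_rows = {
    "a@example.com": {"plan": "pro", "lifetime_value": 1200},
    "b@example.com": {"plan": "free", "lifetime_value": 0},
}

# Hypothetical current state of the business app (a dict stands in for a CRM API).
crm_records = {
    "a@example.com": {"plan": "free", "lifetime_value": 300},
    "b@example.com": {"plan": "free", "lifetime_value": 0},
}

updates = plan_sync(warehouse_rows, crm_records)
crm_records.update(updates)  # a real tool would call the application's API here
```

Sending only the changed records keeps API usage low, which matters because most SaaS applications rate-limit their APIs.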

Data Visualization

Job to be done: Data visualization tools are typically one of the first, and most important, components of a Modern Data Stack to be introduced. They turn raw data into metrics, metrics into dashboards, and dashboards into insights that help your company make better strategic decisions. Business intelligence teams cannot live without a great data visualization tool.

Data visualization tools: Astrato, Bloom AI, Canvas, Columns, Datawrapper, Domo, GoodData, Google Data Studio, Glean, Graphext, Hex, Holistics, Hyperquery, IBM Cognos Analytics, Infogram, Knowi, Lightdash, Logi Analytics, Looker, Metabase, Microsoft Power BI, Mode, Observable, Omni, Plotly, PopSQL, Preset, Qlik, Retool, SAP Lumira, Sigma, Sisense, SQL Server Reporting Services, Streamlit, Superset, Tableau, ThoughtSpot, TIBCO Spotfire, Toucan Toco, Veezoo, Zepl, Zing Data, Zoho Analytics, Zoomdata, Whaly, Count

Data Governance

Job to be done: The newest (and currently most talked about) aspect of the Modern Data Stack is data governance. There are quite a few subcomponents here (data catalogs, policy enforcement, data observability, lineage, etc.), but they all revolve around a focus on data quality. These tools are typically introduced later in the data lifecycle.

Data governance tools: Immuta, Metaplane, Monte Carlo, Castor, Bigeye, Atlan, data.world, Alation, Secoda, Privitar, Telmai, Kensu, Select Star, Ataccama, Collibra, Amundsen, DataHub, OpenMetadata, Labellerr, Anomalo, Great Expectations, Sifflet, re_data, BigID, Acryl Data
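To give a feel for what "measuring data quality" can look like in practice, here is a minimal sketch of two common checks: required fields must be present, and a key column must be unique. The check_quality function and the sample rows are hypothetical; dedicated governance and observability tools cover far more, such as freshness, volume, and lineage.

```python
def check_quality(rows: list, key: str, required: list) -> list:
    """Return a list of human-readable issues found in the rows."""
    issues = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Completeness check: required columns must not be missing or empty.
        for column in required:
            if row.get(column) in (None, ""):
                issues.append(f"row {index}: missing {column}")
        # Uniqueness check: the key column must not repeat.
        if row.get(key) in seen_keys:
            issues.append(f"row {index}: duplicate {key}={row[key]}")
        seen_keys.add(row.get(key))
    return issues

# Hypothetical rows with one missing email and one duplicate id.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 1, "email": "c@example.com"},
]
issues = check_quality(rows, key="id", required=["id", "email"])
```

Checks like these are usually run on a schedule or as a step in the pipeline, so that bad data is caught before it reaches a dashboard.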

Setting Up Your First Data Technology Stack?

In addition to the modular components listed above, there are also end-to-end data platforms like Mozart Data that offer a solution encompassing many of the components you need.

For teams that are beginning their data journey, bundled solutions and data consultants can be a great way to get started quickly.

End-to-end data platforms: Mozart Data, Keboola, Nexla, Y42, 5x, Untitled Firm, Actiondesk, Panoply, Canvas, Selfr, DataDrive, Datacoves, CorralData, IOMETE, Shakudo, ActionIQ

Data consultants: Slalom, The Seattle Data Guy, Brooklyn Data Co., Upright Analytics, Bytecode IO, Leit Data, Meru, Big Time Data, Data Captains, On the Mark Data, Ternary Data, 4 Mile Analytics, Revolt BI, Analytics8, phData, 3pillar Global, Kubrick Group, FluenFactors, Deepskydata, MODACO, Signific

Want To Learn More About the Modern Data Stack?

Here are some of my favorite free resources to learn about the Modern Data Stack:

Scroll through Data Creators Club to find thought leaders to follow.
Read Fundamentals of Data Engineering
Watch YouTube videos from The Seattle Data Guy
Explore the Data Beats Community
Subscribe to David Jayatillake's Substack
Browse The Modern Data Stack repository
Watch videos or attend events for Data Driven NYC
Listen to the Data Ideas podcast with Dustin Schimek
Check out The Ravit Show
Subscribe to the Scaling DataOps Newsletter
Join the dbt Labs Slack community
Read Benn's Substack
Sign up to attend Low-Key Data Happy Hours in NYC
Follow me on LinkedIn

How To Get Started With a Modern Data Stack (Start Today)

Portable is a cloud-based ETL / ELT tool that replicates data to Snowflake, BigQuery, Amazon Redshift, PostgreSQL, and MySQL.

We build the no-code ETL / ELT connectors that aren't supported by other platforms: the niche tools and industry-specific applications that every data team needs at some point.

Pricing is simple. Manually triggered syncs are free. Recurring data flows are $200 a month.

To get started with your Modern Data Stack, Portable is a no-brainer. Explore our 300+ connectors today!