With the rise of big data and new data sources, there's a pressing need to transform raw data on the fly.
The complexity of wrangling various structured and unstructured data sources has led to the rapid emergence of the Modern Data Stack. In addition, there's a need for data integration tools to centralize data from numerous cloud-based apps and bring order to the chaos across the organization.
Thankfully, there are several powerful data transformation tools like Portable to perform extract, transform, load jobs within your data warehouse.
Like many data engineers, you have a lot on your plate. You don't have time to script every single data transformation by hand.
Where do you get started when evaluating ELT tools?
Which data integration tools should you evaluate?
What is the fastest way to perform real-time data transformation?
We'll cover these questions and the best ELT tools to add to your modern data stack in 2023.
How ELT tools create value for users
Who uses ELT tools
What data can ELT tools extract
The most common data sources for ELT tools
And, the differences between ETL and ELT
ELT tools sync raw data from applications into a data warehouse or data lake to power data analytics, process automation, and product development.
ELT Definition: Extract, Load, Transform (ELT) tools offer no-code connectors that sync data from systems across the enterprise into a data warehouse or data lake. In ELT workflows, data transformation (typically via SQL) occurs once data lands in the target system.
Why Use ELT: ELT tools improve strategic decision-making (business intelligence), automate manual tasks, and help product teams to build data products. In some scenarios, ELT tools can be used during one-off data migrations.
Who Uses ELT Tools: Data engineers and data teams are in charge of data management and enterprise data infrastructure, including ELT data flows, cloud data warehouses, and the extract, transform, and load process more generally.
Common ELT Data Sources: ELT tools pull data from APIs, databases, cloud data warehouses, event logs, webhooks, and unstructured data sources like files. The most common data types for ingestion are product databases, CRM systems, ERP platforms, and HR applications.
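To make the definition above concrete, here is a minimal sketch of an ELT flow in Python. It uses the standard-library `sqlite3` module as a stand-in for a cloud data warehouse, and the table names, columns, and sample rows are hypothetical. The point is only the ordering: raw data lands first, and the transformation runs as SQL inside the target system.

```python
import sqlite3

# Hypothetical raw rows, as a connector might extract them from a billing app.
RAW_ROWS = [
    (1, "ANA@EXAMPLE.COM", "pro", 4900),
    (2, "bo@example.com", "free", 0),
    (3, "cy@example.com", "pro", 4900),
]


def load_raw(conn, rows):
    """Load step: land the extracted rows untouched in a raw table."""
    conn.execute(
        "CREATE TABLE raw_customers "
        "(id INTEGER, email TEXT, plan TEXT, mrr_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform step: plain SQL, executed after the data has landed."""
    conn.execute(
        """
        CREATE TABLE plan_revenue AS
        SELECT plan,
               COUNT(*) AS customers,
               SUM(mrr_cents) / 100.0 AS mrr_dollars
        FROM raw_customers
        GROUP BY plan
        """
    )


def run_elt(rows):
    """Run load then transform against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, rows)
    transform(conn)
    query = "SELECT plan, customers, mrr_dollars FROM plan_revenue ORDER BY plan"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in run_elt(RAW_ROWS):
        print(row)
```

Because `raw_customers` is kept as loaded, the SQL model can be changed and rebuilt later without extracting the data again.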
With that said, what makes one data pipeline tool better than another? Read on to learn which ones are best for handling complex data workloads.
Connectors - Which SaaS data integrations do they support out of the box? Do they cover the core systems you need to stand up your data stack? Can they connect you to the long-tail data stores and apps? Do the schemas include the data you need? Can they load data to your data warehousing solution (Amazon Redshift, Snowflake, Microsoft Azure Synapse, Google BigQuery, SQL Server, AWS S3, etc.)?
Roadmap - How fast can they build? What's their vision for growing with you as an organization? Are they responsive to requests? Will they handle the scalability of the big data workloads your company manages?
Pricing - How much does the solution cost? Does the pricing model align with your data profile? Are you charged for your data flows or your data volume?
Support - How do you know your data pipelines will be maintained? Who is 'on call' when things break? Do you have a direct line to someone that can solve your problem?
Security - How do companies approach security workflows, privacy, and compliance during data processing? How is authentication handled in the user interface? Does it align with your company's data processing needs?
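For the pricing question in the list above, a quick back-of-the-envelope comparison can help. The sketch below is illustrative only: the two pricing models are simplified, and every rate in it is a made-up number, not any vendor's actual price list.

```python
def cost_per_flow(num_flows, price_per_flow):
    """Monthly cost when the vendor charges a flat fee per data flow."""
    return num_flows * price_per_flow


def cost_per_volume(rows_synced, price_per_million_rows):
    """Monthly cost when the vendor charges by rows synced."""
    return rows_synced / 1_000_000 * price_per_million_rows


def cheaper_model(num_flows, rows_synced, price_per_flow, price_per_million_rows):
    """Return which (hypothetical) pricing model costs less for this profile."""
    flow_cost = cost_per_flow(num_flows, price_per_flow)
    volume_cost = cost_per_volume(rows_synced, price_per_million_rows)
    if flow_cost < volume_cost:
        return "per-flow"
    if volume_cost < flow_cost:
        return "per-volume"
    return "tie"


if __name__ == "__main__":
    # Few flows, lots of rows: flat per-flow pricing tends to win.
    print(cheaper_model(5, 200_000_000, 100.0, 10.0))
    # Many flows, few rows: volume-based pricing tends to win.
    print(cheaper_model(40, 20_000_000, 100.0, 10.0))
```

Plugging in your own flow count and monthly row volume, along with quoted rates, shows which model fits your data profile.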
Let's dig into the top 5 ELT tools on the market today.
The top 5 ELT tools are Fivetran, Stitch, Airbyte, Matillion, and Portable.
If you are ready to invest in an ELT solution, you need a starting point for evaluation. Below, we've outlined some of the pros and cons of the top ELT platforms on the market today.
Fivetran is the most established ELT tool on the market today. Founded in 2012, it was one of the early players as the market shifted from ETL to ELT, and it provides a robust and reliable solution for core ELT connectors.
Fivetran provides reliable cloud-based pipelines for the largest databases and business applications (Oracle, Salesforce, etc.), connecting these data sources to common data warehouses and data lakes.
For enterprises, its Oracle support is decent but has some limitations. Its documentation states that it doesn't support Oracle's Active Data Guard, Oracle's physical standby instances, or Oracle table names longer than 30 characters, which can be an issue when you need redundant data pipelines.
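If the table-name limit is a concern, it is easy to audit a schema before committing to a tool. The following is a small sketch that assumes the 30-character limit quoted above still applies (check the vendor's current documentation) and that you already have a list of table names; the sample names are hypothetical.

```python
# Limit taken from the limitation described above; treat it as an assumption
# and confirm it against the vendor's current documentation.
MAX_TABLE_NAME_LENGTH = 30


def find_long_table_names(table_names, max_length=MAX_TABLE_NAME_LENGTH):
    """Return the table names that exceed the allowed length."""
    return [name for name in table_names if len(name) > max_length]


if __name__ == "__main__":
    # Hypothetical table names for illustration.
    tables = ["ORDERS", "CUSTOMER_SUBSCRIPTION_RENEWAL_HISTORY"]
    for name in find_long_table_names(tables):
        print(f"{name} ({len(name)} characters) exceeds the limit")
```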
In many scenarios, data teams with access to budget (it's not cheap) will use Fivetran to build their modern data stack with core connectors to the largest applications within the enterprise. As needs expand and long-tail business applications become essential, it's common for data teams to augment Fivetran with additional ELT capabilities.
Stitch played a similar role to Fivetran in the shift from ETL to ELT. In 2018, Stitch was acquired by Talend.
This has led to changes in the team and a divergence in the support model between Stitch-supported and community-supported connectors.
From a technical perspective, Stitch pioneered the open-source model for modern ELT with an open-source ETL tool framework called Singer.
Stitch allowed community members to build and maintain their own connectors in commonly used languages like Python. This community has grown, but in recent years it has seen less investment than other open-source communities.
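For a feel of what a community-built connector looks like, here is a rough sketch of a Singer-style "tap" in Python. In the Singer specification, a tap writes JSON messages of type SCHEMA, RECORD, and STATE to standard output, one per line. The stream name, schema, records, and bookmark below are hypothetical, and a real tap would also handle configuration, catalogs, and API calls.

```python
import json
import sys


def build_messages(stream, schema, key_properties, records, bookmark):
    """Build the SCHEMA, RECORD, and STATE messages for one stream."""
    messages = [
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": schema,
            "key_properties": key_properties,
        }
    ]
    for record in records:
        messages.append({"type": "RECORD", "stream": stream, "record": record})
    # STATE lets the next run resume from where this one stopped.
    messages.append({"type": "STATE", "value": {stream: bookmark}})
    return messages


def emit(messages, out=None):
    """Write each message as one JSON line (stdout by default)."""
    out = out or sys.stdout
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    # Hypothetical stream and data for illustration.
    user_schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    users = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
    emit(build_messages("users", user_schema, ["id"], users, {"last_id": 2}))
```

A separate "target" program reads these lines from standard input and loads them into the destination, which is what lets taps and targets be mixed and matched.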
Stitch supports many widely used Amazon data integrations, including AWS Aurora, AWS RDS, AWS S3, and AWS Redshift.
Stitch is a cost-effective solution for small data teams that don't want to spend much money on an ELT solution but want a no-code vendor to provide core ELT connectors. As a tradeoff, when things go wrong, data teams work with the community to address issues.
Airbyte is a recent addition to the ELT landscape, and the company has quickly raised massive capital.
From a technical perspective, the Airbyte open-source framework is not dissimilar from the Singer framework developed by Stitch. It also supports working with several on-premises data flows.
For teams that want to deploy their own infrastructure, build their own connectors, and work with open-source code directly, Airbyte is the most well-capitalized solution on the market. The connector catalog is on par with Singer, but support levels and investment are on the upswing, while the Singer open-source ecosystem sees less investment.
Airbyte recently released a cloud solution that competes on the common cloud data warehouse connectors you'll find from Fivetran, Stitch, and other core ELT solutions.
Founded in 2011, Matillion has been solving data integration problems for large enterprises for over a decade. In addition to native ELT processes, one of the unique aspects of Matillion is that the entire solution can be deployed on-premises or in a cloud environment (even though the technology is not open source).
The enterprise flexibility, built-in drag-and-drop transformation capabilities, and deployment model can make Matillion less approachable than the other tools on this list, but they make it a strong choice for large enterprise use cases and data modeling.
Portable is focused on long-tail ETL connectors. As data teams aim to integrate source data from applications to their warehouse in near real-time, they often need to use bespoke connectors in a user-friendly manner.
Built from the realization that every ELT company was making the same 150 connectors, Portable has focused on building a cloud platform on which new custom ETL connectors can be created on-demand for clients in hours or days.
Portable now supports 450+ data sources that connect your business apps to several data warehousing providers.
So, even in scenarios where you use a data integration platform like Fivetran, Stitch, Airbyte, or Matillion, Portable is the perfect solution to provide a no-code experience to pull data from SaaS apps quickly. It's extremely simple to get started.
Even though Portable is the most recent addition to the ELT landscape on this list, with 450+ connectors it has more cloud-hosted, no-code connectors than every other company on this list.
You might have heard about ELT and ETL when researching data integration tools. So, what are the differences between them?
The ETL process has been a reliable method for handling data integration for decades. ETL tools extract data from one or more sources, transform data to fit a specific schema or structure, and then load the data set into a data warehouse or database.
The ELT process is similar, but it first transfers data into a target system and then applies transformations. ELT tools extract data from one or more sources, load data directly into a data warehouse or data lake, and then modify the data set for data analysis.
ETL has been the standard approach for data integration for many years, but it does have some limitations. For one, ETL tools require significant time and effort to transform and aggregate data before replication occurs. This can lead to slower data processing times and more complexity.
With the advent of cloud-based data warehouses and lakes, ELT tools have become more popular. ELT loads data in its raw form directly, which means that data teams can skip the time-consuming transformation and aggregation steps before loading and run them later inside the warehouse. This leads to faster data loading, less upfront complexity, and a more streamlined data integration process.
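The difference in ordering between the two approaches can be sketched in a few lines of Python. This is a toy illustration, not how any particular tool works: the "warehouse" is just a dictionary, and the sample orders and the `clean` function are hypothetical.

```python
def extract():
    """Pretend to pull raw rows from a source system."""
    return [
        {"email": " Ana@Example.com ", "amount": "19.90"},
        {"email": "bo@example.com", "amount": "5.00"},
    ]


def clean(row):
    """A simple transformation: normalize the email and parse the amount."""
    return {"email": row["email"].strip().lower(), "amount": float(row["amount"])}


def run_etl():
    """ETL: transform in the pipeline, then load only the finished table."""
    warehouse = {}
    rows = extract()
    transformed = [clean(row) for row in rows]
    warehouse["orders"] = transformed
    return warehouse


def run_elt():
    """ELT: load the raw rows first, then transform inside the target."""
    warehouse = {}
    warehouse["raw_orders"] = extract()
    warehouse["orders"] = [clean(row) for row in warehouse["raw_orders"]]
    return warehouse


if __name__ == "__main__":
    print(sorted(run_etl().keys()))
    print(sorted(run_elt().keys()))
```

Both functions end up with the same `orders` table. The ELT version also keeps `raw_orders` in the target, so transformations can be revised and re-run without going back to the source.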
Further Reading: ETL vs. ELT: Differences, Similarities, & Which to Choose
As cloud-based data warehouses become the norm for storing data, cloud ETL and ELT tools enable faster data processing, which in turn speeds up data analysis.
When selecting an ELT tool, consider the big picture of its data connector catalog, product roadmap, pricing, support, and security. Of course, we wouldn't blame you if you wanted a simpler approach to data integration: try Portable for free, and you can sync unlimited data volumes at no cost.
As data needs evolve, your ELT tools will continue to improve to meet the demands of the modern data ecosystem.