Custom ETL - Getting Started

CEO, Portable

Analytics teams commonly need data from bespoke applications delivered to their data warehouse. Custom ETL pipelines are likely the answer. Where do you start?

There are 3 business reasons to justify a custom ETL pipeline.

Data teams create value from data in 3 ways. Custom ETL pipelines are necessary in the following scenarios.

  1. Analytics - You're building a dashboard. To do so, you need data from a bespoke system.
  2. Automation - You're automating a manual task. You need data from a long-tail system.
  3. Data Products - You're trying to generate revenue. You need to offer a no-code connector to a particular tech platform.

In each case the value is clear, but building the dashboard, automating the workflow, or powering the data product depends on data from a long-tail system.

What types of systems are long tail? When do you need custom ETL?

There are 3 types of systems for which the top ELT tools typically do not build or support connectors.

Vertical-specific ETL connectors

Specific industries are powered by a fragmented ecosystem of long-tail applications. If you work in these verticals, it's likely that you will need custom ETL pipelines.

For data teams in eCommerce, real estate, cannabis, finance, etc., long-tail connectors are critical to unlocking a 360-degree view of your organization. While most ETL companies support CRM systems and databases that are used across verticals, niche industries tend to gravitate towards specialized technologies purpose-built for the vertical.

Business-unit-specific integrations

HR teams need insights. Security teams need insights. Marketing teams need insights. In each business unit, teams might have their own analytics function, or they might rely on a centralized data team for analytics to answer business questions.

When you want to do a deep dive into the talent acquisition and retention pipeline, you need data from HR-specific applications. These connectors are difficult to find (or might not exist yet) and fit squarely into the long tail, where a custom ETL pipeline is necessary.

Connectors to nascent APIs

Even the largest companies build new APIs. In many scenarios, ETL vendors don't pick these up in a timely fashion. They might be waiting to see further adoption and maturity before investing time in developing and maintaining the connector. Or they might have the integration in their backlog, but it's going to take a while to build the connector.

That's great, but if you need insights now, you need a solution.

Before getting started, make sure you can't find an off-the-shelf connector.

Before you evaluate custom ETL solutions, double-check that your current ETL vendor doesn't already offer an off-the-shelf connector.

If they don't have the connector you need, you need to find another solution.

What are your options? It's the common 'Build vs. buy' dilemma for custom ETL.

  1. You can write code yourself
  2. You can find a service provider to build the custom ETL pipeline for you

You have 6 paths forward for custom ETL.

Option #1 - Write code - and manage infrastructure - from scratch.

You can always develop custom ETL connectors from scratch. Is it worthwhile? Very rarely. In most scenarios, someone else has already written a framework, created a scaffold, or is willing to take on the development work so you don't have to.

Option #2 - Use an open source framework.

Open source frameworks are great at providing structure when you want to build your own ETL connector. They are particularly useful for companies in regulated industries like healthcare and financial services, where you have to write code in-house (vs. using a cloud-hosted solution) but would like a starting point.

Option #3 - Hire a consultant.

Data consultants are great. Not only can they help you build custom ETL connectors, but they can also help create data models, develop dashboards, and assist in architecting your data stack. The problem with using a consultant for custom ETL connectors is that they are 1) expensive, and 2) ephemeral. This means that if they move on, or you stop paying them, you need to rebuild your custom ETL connector or find someone else to maintain it.

Option #4 - Use serverless infrastructure like cloud functions.

Serverless infrastructure does not solve the entire problem. It only removes the need to manage infrastructure. Serverless technology is simple, but you still need secure authentication, monitoring, alerting, retry logic, pagination logic, and more. It can help if you absolutely need to build in-house but don't want to manage infrastructure.
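
To give a sense of the integration logic that remains your responsibility inside a cloud function, here is a minimal Python sketch of cursor-based pagination with retries and exponential backoff. Everything in it is an assumption made for illustration: the names `fetch_with_retry`, `extract_all`, and `fetch_page`, the page shape (`records` and `next_cursor` keys), and the choice of `ConnectionError` as the retryable failure. A real source API will have its own pagination scheme and error types.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical page shape: {"records": [...], "next_cursor": str or None}
Page = Dict[str, Any]
FetchPage = Callable[[Optional[str]], Page]


def fetch_with_retry(
    fetch_page: FetchPage,
    cursor: Optional[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Page:
    """Call fetch_page, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(cursor)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def extract_all(
    fetch_page: FetchPage,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield every record by following next_cursor until it is None."""
    cursor: Optional[str] = None
    while True:
        page = fetch_with_retry(fetch_page, cursor, max_attempts, base_delay, sleep)
        yield from page["records"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break
```

In practice, `fetch_page` would wrap an authenticated HTTP request to the source system, and the yielded records would be written to the warehouse. Authentication, monitoring, and alerting would still need to be added on top of this.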

Option #5 - Use Airflow.

Airflow provides structure to an otherwise unstructured problem. Scheduling, orchestration, and stringing together requests, responses, and downstream actions can all be handled with Airflow, but you still need the actual integration logic. Like serverless infrastructure, Airflow is only one piece of the puzzle.

Option #6 - Use Portable.

Portable specializes in building custom ETL connectors for clients. If you need a bespoke, long-tail system connected to your warehouse, just reach out. We build API-to-warehouse connectors on demand, and can turn around production-grade, SaaS-hosted ETL connectors in a matter of hours or days. We handle development, monitoring, maintenance, alerting, troubleshooting, and support. If something goes wrong, we're on call, so you can sleep well.

How do you get started with Custom ETL?

tl;dr. Send a quick email to [email protected] with the name of the system you need. We ship new connectors lightning fast!

The slides below outline how simple it is to get started with custom ETL.


Want to learn more? Book time for a discussion or a demo directly on my calendar