Building connectors might be an interesting technical challenge, but the goal of building a connector is fast and reliable data replication for the purpose of creating business value.
Large ELT platforms like Airbyte and Fivetran have coverage over the most popular connectors, but they're likely missing the long-tail source connector you need.
If you need a business-critical connector for your new source, should you build your own connector, or buy a managed solution?
You should consider building or buying a custom connector when Fivetran and Stitch provide incomplete coverage for your new source or when you are experiencing friction around database replication.
Airbyte offers ~300 connectors under its free plan.
Airbyte is an open-source data integration framework.
Features like OAuth, SSH, and User Management are not free and not open-source.
Airbyte charges for reading API data and reading data from database sources.
Portable builds custom API integrations on-demand for clients. You get a no-code experience without the hassle of custom development, the pain of ongoing maintenance, or the hefty price tags that come along with data integration consulting or ETL consultants.
We maintain connectors so that you don't have to worry about uptime, error handling, or ingesting data into your analytics environment.
Portable frees you to focus on learning, building, and creating value.
If you still want to build your own connector, below are more details on how to do it.
To build your own Airbyte connector, familiarize yourself with the following resources:
Airbyte's API docs
Airbyte's protocol specification
Airbyte source code on GitHub
Airbyte Connector Development Kit
The most critical component in creating a custom Airbyte connector is Airbyte's Connector Development Kit (CDK).
Using the Connector Development Kit, you can take the following high-level steps to build your source connector.
The Connector Development Kit (CDK) is a framework for organizing concepts between and within sources and databases.
In general, Airbyte views Sources as broadly organized into HTTP API-based connectors (REST, GraphQL, and SOAP) and Databases (relational, NoSQL, and graph).
Destinations are organized as data warehouses, data lakes, and APIs in the case of reverse ETL.
The Connector Development Kit provides:
A Python framework for writing source connectors
A generic implementation for rapidly developing connectors for HTTP APIs
A test suite to test compliance with the Airbyte Protocol and happy code paths
A code generator to bootstrap development and package your connector
Before building your source connector, you'll need:
An API key for the source you want to build a connector for (and proper authentication)
Python >= 3.9
Airbyte has the following core concepts:
A Stream describes the schema of a resource like a database table or a resource in a REST API.
A Field is a column in a Stream, like a database column or a key in a JSON object.
A Catalog is a list of Streams.
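To make these three concepts concrete, here is a minimal sketch that builds a catalog as plain Python dictionaries. The shape (a `streams` list whose entries carry a `name`, a `json_schema`, and `supported_sync_modes`) loosely follows the Airbyte protocol, but the helper names `make_field`, `make_stream`, and `make_catalog` and the `customers` example are hypothetical and are not part of the CDK.

```python
from typing import Any


def make_field(json_type: str) -> dict[str, Any]:
    """A Field: one column, described as a nullable JSON Schema type."""
    return {"type": ["null", json_type]}


def make_stream(name: str, fields: dict[str, str]) -> dict[str, Any]:
    """A Stream: a named resource plus the schema of its fields."""
    return {
        "name": name,
        "json_schema": {
            "type": "object",
            "properties": {col: make_field(t) for col, t in fields.items()},
        },
        # Assumption for this sketch: the stream only supports full refresh.
        "supported_sync_modes": ["full_refresh"],
    }


def make_catalog(streams: list[dict[str, Any]]) -> dict[str, Any]:
    """A Catalog: simply the list of Streams a source exposes."""
    return {"streams": list(streams)}


# Hypothetical example: a single "customers" stream with two fields.
customers = make_stream("customers", {"id": "integer", "email": "string"})
catalog = make_catalog([customers])
```

In a real connector the schema would usually live in a JSON file next to the stream definition; the dictionaries here only illustrate how the three concepts nest.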
Before starting to build your connector, you should familiarize yourself with the following commonly used definitions.
Endpoints: The location of the server or service from which you are pulling data
OAuth: A standard protocol for delegated authorization, commonly used to grant a connector access to an API on a user's behalf.
Pagination: Process in which large datasets are divided into small chunks for purposes of consumption
Rate limiting: Limits to the number of calls made to an API
Schemas: Define the structure of your API call and, later, your normalized table
Data decoding: Process to convert data into a readable format (JSON, XML, CSV)
Incremental data exports: Tracking which data has already been synced and which data needs to be refreshed
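Several of these definitions (pagination, rate limiting, and incremental exports) tend to meet in a single read loop. The sketch below shows one way that could look in plain Python. It does not use the CDK: `RateLimited`, `fetch_with_backoff`, `read_incremental`, the `updated_at` cursor field, and the in-memory `fake_fetch` API are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]  # assumed shape: {"records": [...], "next_page": int or None}


class RateLimited(Exception):
    """Raised by the (hypothetical) fetcher when the API signals a rate limit."""


def fetch_with_backoff(fetch_page: Callable[[int], Page], page: int,
                       max_retries: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> Page:
    """Rate limiting: retry a page with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(max_retries + 1):
        try:
            return fetch_page(page)
        except RateLimited:
            if attempt == max_retries:
                raise
            sleep(2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises


def read_incremental(fetch_page: Callable[[int], Page], state: dict[str, str],
                     cursor_field: str = "updated_at",
                     sleep: Callable[[float], None] = time.sleep) -> Iterator[dict[str, Any]]:
    """Pagination + incremental export: walk every page, yield only new records."""
    last_seen = state.get(cursor_field, "")
    page: Optional[int] = 1
    while page is not None:
        body = fetch_with_backoff(fetch_page, page, sleep=sleep)
        for record in body["records"]:
            # ISO-8601 date strings compare correctly as plain strings.
            if record[cursor_field] > last_seen:
                state[cursor_field] = max(state.get(cursor_field, ""), record[cursor_field])
                yield record
        page = body["next_page"]


# A fake two-page API whose first call is rate limited.
PAGES = {
    1: {"records": [{"id": 1, "updated_at": "2024-01-01"},
                    {"id": 2, "updated_at": "2024-01-03"}], "next_page": 2},
    2: {"records": [{"id": 3, "updated_at": "2024-01-05"}], "next_page": None},
}
calls = {"count": 0}


def fake_fetch(page: int) -> Page:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RateLimited()
    return PAGES[page]


state = {"updated_at": "2024-01-02"}  # cursor saved by a previous sync
new_records = list(read_incremental(fake_fetch, state, sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; a real connector would let it default to `time.sleep` and would persist `state` between syncs.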
Creating an open-source custom connector with Airbyte's Connector Development Kit (CDK) involves the following steps:
Define Your Connector (Python, Java)
Define Your Stream
Read Data from Source
Maintain Version Control and Deploy Infrastructure
Generate a connector template and set up your developer environment.
If using an API, connect to your source API using OAuth to establish proper API authentication.
Define your API schema.
Define schemas for your normalized tables.
Process and decode data from API responses (JSON, XML, CSV etc.)
Test your code; you should receive a large JSON object as the result.
Define a Catalog for your stream and Read Data from your source.
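The decode-and-read steps above can be sketched with the standard library alone. The example assumes the source returns either JSON or CSV text and wraps each decoded row in a `RECORD` message shaped like the ones in the Airbyte protocol (`stream`, `data`, `emitted_at` in milliseconds). The function names `decode` and `to_record_messages` and the `customers` stream are hypothetical; a connector built with the CDK would normally let the framework emit these messages.

```python
import csv
import io
import json
import time
from typing import Any, Iterator, Optional


def decode(body: str, content_type: str) -> list[dict[str, Any]]:
    """Turn a raw response body into a list of row dictionaries."""
    if content_type == "application/json":
        parsed = json.loads(body)
        # Accept either a JSON array of rows or a single JSON object.
        return parsed if isinstance(parsed, list) else [parsed]
    if content_type == "text/csv":
        # The first CSV line is treated as the header; all values stay strings.
        return [dict(row) for row in csv.DictReader(io.StringIO(body))]
    raise ValueError(f"unsupported content type: {content_type}")


def to_record_messages(stream: str, rows: list[dict[str, Any]],
                       now_ms: Optional[int] = None) -> Iterator[str]:
    """Serialize each row as one JSON line, one message per record."""
    emitted_at = now_ms if now_ms is not None else int(time.time() * 1000)
    for row in rows:
        yield json.dumps({
            "type": "RECORD",
            "record": {"stream": stream, "data": row, "emitted_at": emitted_at},
        })


# Decode a small CSV payload and a JSON payload from a pretend API.
csv_rows = decode("id,email\n1,a@example.com\n", "text/csv")
json_rows = decode('[{"id": 2, "email": "b@example.com"}]', "application/json")
messages = list(to_record_messages("customers", csv_rows + json_rows, now_ms=0))
```

Each string in `messages` is a single line of JSON, which is convenient to print to standard output and inspect while you test your connector.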
After building your connector, you will need to manage a GitHub Repo for your source connector. If you are contributing to Airbyte's connector repository, make sure you define tests according to their specifications.
Docker: Airbyte uses fully incremental Docker builds. Familiarize yourself with Docker and how Airbyte handles Docker images. Managing new modules may involve some complexity.
Kubernetes: On Kubernetes, ensure that you're able to set resource limits and parallelize jobs.
Containerize your connector when you are done.
When building your connector, you'll learn new pieces of information along the way.
Once you have your connector up and running, you may have to maintain it to manage new endpoints and new fields. The Community GitHub repository is a good resource to access tutorials when you run into issues.
In today's modern data stack, robust data replication is increasingly important. Technologies like dbt and Airflow now allow data engineers to transform data on regular schedules, but Fivetran and Stitch provide an incomplete EL solution for long-tail connectors.
Portable has 300+ long-tail source connectors that load data into the most common data warehouses: Snowflake, Amazon Redshift, Google BigQuery, and PostgreSQL.
Start automating workflows today. Try Portable!