Scripting An Airbyte Custom Connector? Read This First!

Ethan
CEO, Portable

Building a connector can be an interesting technical challenge, but the real goal is fast, reliable data replication that creates business value.

Large ELT platforms like Airbyte and Fivetran cover the most popular sources, but they are likely missing the long-tail source connector you need.

If you need a business-critical connector for a new source, should you build it yourself or buy a managed solution?

When to Build an Airbyte Custom Connector?

Consider building or buying a custom connector when Fivetran and Stitch offer incomplete coverage for your new source, or when you are running into friction with database replication.

What Is Airbyte's Connector Coverage?

Airbyte offers ~300 connectors under its free plan.

Is Airbyte open source?

Airbyte is an open-source data integration framework.

Features like OAuth, SSH, and user management are neither free nor open source.

Airbyte charges for reading data from both API and database sources.

Get the ETL Connector You Need with Zero Effort

Portable builds custom API integrations on demand for clients. You get a no-code experience without the hassle of custom development, the pain of ongoing maintenance, or the hefty price tags of data integration and ETL consultants.

Check out our connector catalog or request a new connector.

We maintain connectors so that you don't have to worry about uptime, error handling, or ingesting data into your analytics environment.

Portable frees you to focus on learning, building, and creating value.

If you still want to build your own connector, below are more details on how to do it.

How Do I Create an Airbyte Custom Connector?

To build your own Airbyte connector, familiarize yourself with the following resources:

  • Airbyte's API docs

  • Airbyte's protocol specification

  • Airbyte source code on GitHub

  • Airbyte CLI

  • Airbyte Connector Development Kit

The most critical component in creating an Airbyte custom connector is Airbyte's Connector Development Kit (CDK).

Using the CDK, you can build your source connector in a few high-level steps, outlined below.

What is Airbyte's Connector Development Kit (CDK)?

The Connector Development Kit (CDK) is a framework for organizing the concepts shared across and within sources and destinations.

In general, Airbyte broadly groups sources into HTTP API-based connectors (REST, GraphQL, and SOAP) and databases (relational, NoSQL, and graph).

Destinations are grouped into data warehouses, data lakes, and, in the case of reverse ETL, APIs.

The Connector Development Kit provides:

  1. A Python framework for writing source connectors 

  2. A generic implementation for rapidly developing connectors for HTTP APIs

  3. A test suite that checks compliance with the Airbyte Protocol and exercises happy code paths

  4. A code generator to bootstrap development and package your connector
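The "generic implementation" for HTTP APIs works by writing the request and pagination loop once and letting each stream override a few hooks; in the Python CDK's `HttpStream` these include `path`, `next_page_token`, and `parse_response`. The standalone sketch below imitates that pattern with the standard library only. `FakeApi`, `HttpStreamSketch`, and `Customers` are hypothetical names, not CDK classes, and the fake API stands in for real HTTP calls.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Optional


class FakeApi:
    """Hypothetical stand-in for a paginated REST API using offset pagination."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int = 2):
        self.items = items
        self.page_size = page_size

    def get(self, path: str, params: Mapping[str, Any]) -> str:
        start = int(params.get("offset", 0))
        end = start + self.page_size
        next_offset = end if end < len(self.items) else None
        return json.dumps({"data": self.items[start:end], "next_offset": next_offset})


class HttpStreamSketch:
    """Generic read loop written once; subclasses fill in source-specific hooks."""

    name = "base"

    def __init__(self, api: FakeApi):
        self.api = api

    def path(self) -> str:
        raise NotImplementedError

    def next_page_token(self, response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        raise NotImplementedError

    def parse_response(self, response: Dict[str, Any]) -> Iterable[Dict[str, Any]]:
        raise NotImplementedError

    def read_records(self) -> Iterator[Dict[str, Any]]:
        token: Optional[Dict[str, Any]] = None
        while True:
            body = json.loads(self.api.get(self.path(), token or {}))
            yield from self.parse_response(body)
            token = self.next_page_token(body)
            if token is None:  # no more pages
                break


class Customers(HttpStreamSketch):
    """A concrete stream only describes where data lives and how pages link."""

    name = "customers"

    def path(self) -> str:
        return "customers"

    def next_page_token(self, response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        next_offset = response.get("next_offset")
        return None if next_offset is None else {"offset": next_offset}

    def parse_response(self, response: Dict[str, Any]) -> Iterable[Dict[str, Any]]:
        return response["data"]


api = FakeApi([{"id": i} for i in range(5)])
print([record["id"] for record in Customers(api).read_records()])  # [0, 1, 2, 3, 4]
```

The design choice is the point: the loop in `read_records` never changes, so adding another endpoint means writing another small subclass.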

Terms and Prerequisites

Before building your source connector, you'll need:

  • An API key (or other valid credentials) for the source you are building a connector for

  • Python >= 3.9

  • Docker

  • Node.js

Core Concepts

Airbyte has the following core concepts:

  • Stream

  • Field

  • Catalog

Stream

A Stream is a group of related records, such as a database table or a resource in a REST API, together with the schema that describes them.

Field

A Field is a single attribute of a record in a Stream, like a database column or a key in a JSON object.

Catalog

A Catalog is a list of Streams.
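The three concepts nest: fields live in a stream's JSON Schema, and a catalog lists streams. The sketch below builds a catalog shaped like the `AirbyteCatalog` object in the Airbyte Protocol (a `streams` list, each entry with `name`, `json_schema`, and `supported_sync_modes`); the `customers` stream and its fields are hypothetical.

```python
import json

# A hypothetical "customers" stream. Its JSON Schema lists the fields
# (columns) that each record in the stream may contain.
customers_stream = {
    "name": "customers",
    "json_schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": "string"},
            "updated_at": {"type": "string", "format": "date-time"},
        },
    },
    "supported_sync_modes": ["full_refresh", "incremental"],
}

# A catalog is simply a list of streams.
catalog = {"streams": [customers_stream]}


def field_names(catalog: dict, stream_name: str) -> list:
    """Return the sorted field names declared for one stream in the catalog."""
    for stream in catalog["streams"]:
        if stream["name"] == stream_name:
            return sorted(stream["json_schema"]["properties"])
    raise KeyError(stream_name)


print(json.dumps(catalog, indent=2))
print(field_names(catalog, "customers"))  # ['email', 'id', 'updated_at']
```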

More Prerequisites: Familiarize Yourself with Common Design Patterns

Before you start building your connector, familiarize yourself with the following common concepts.

  • Endpoints: The URLs of the server or service you are pulling data from

  • OAuth: An open standard for delegated authorization, commonly used to grant your connector access to a user's data

  • Pagination: The process of dividing large datasets into smaller chunks (pages) for consumption

  • Rate limiting: Limits on the number of calls a client can make to an API within a given time window

  • Schemas: Definitions of the structure of your API responses and, later, your normalized tables

  • Data decoding: The process of parsing response payloads (JSON, XML, CSV) into readable records

  • Incremental data exports: Tracking which data has already been synced and which data needs to be refreshed
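Rate limiting is usually handled by retrying with a growing delay. A minimal sketch, assuming the API signals throttling with HTTP 429 and sometimes a `Retry-After` value in seconds; `RateLimited` and `call_with_backoff` are hypothetical helpers, not part of the CDK. The `sleep` parameter is injectable so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the API answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_backoff(call: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited. Prefer the server's Retry-After hint;
    otherwise wait base_delay * 2**attempt seconds (exponential backoff)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # give up and surface the error
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise AssertionError("unreachable")


# Usage with a fake endpoint that is throttled twice, then succeeds.
responses = [RateLimited(), RateLimited(retry_after=3.0), {"data": []}]
waits = []


def fake_request():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


print(call_with_backoff(fake_request, sleep=waits.append))  # {'data': []}
print(waits)  # [1.0, 3.0]
```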

4 Steps in Creating an Open-Source Custom Connector using CDK

There are 4 steps in creating an open-source custom connector with Airbyte's Connector Development Kit (CDK):

  1. Define Your Connector (Python, Java)

  2. Define Your Stream

  3. Read Data from Source

  4. Maintain Version Control and Deploy Infrastructure

Define Your Connector (Python, Java)

Generate a connector template and set up your developer environment.

If your source is an API, authenticate against it using OAuth or whichever method the API supports.

Define your API schema.
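Whatever language you choose, an Airbyte connector is a program that answers the protocol's commands (`spec`, `check`, `discover`, and `read`) by printing JSON messages to stdout. The sketch below covers only `spec` and `check`, using the standard library. The config fields (`api_key`, `start_date`) are hypothetical, and for simplicity it reads config as inline JSON rather than from the `--config` file path a real connector receives.

```python
import json
import sys
from typing import Any, Dict, List

# Hypothetical connection spec: the config fields this connector needs, as JSON Schema.
SPEC: Dict[str, Any] = {
    "connectionSpecification": {
        "type": "object",
        "required": ["api_key", "start_date"],
        "properties": {
            "api_key": {"type": "string"},
            "start_date": {"type": "string", "format": "date"},
        },
    }
}


def check(config: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the config. A real check would also make one authenticated
    API call (for example, with an OAuth access token) to prove access works."""
    required = SPEC["connectionSpecification"]["required"]
    missing = [key for key in required if not config.get(key)]
    if missing:
        return {"status": "FAILED", "message": f"missing config keys: {missing}"}
    return {"status": "SUCCEEDED"}


def run(argv: List[str]) -> List[str]:
    """Dispatch one command and return the protocol messages it would print."""
    command = argv[0]
    if command == "spec":
        return [json.dumps({"type": "SPEC", "spec": SPEC})]
    if command == "check":
        # Simplification: config arrives as inline JSON, not via a --config file path.
        config = json.loads(argv[1])
        status = check(config)
        return [json.dumps({"type": "CONNECTION_STATUS", "connectionStatus": status})]
    raise ValueError(f"unsupported command: {command}")


if __name__ == "__main__":
    for message in run(sys.argv[1:] or ["spec"]):
        print(message)
```

Saved as a hypothetical `source_sketch.py`, running `python source_sketch.py check '{"api_key": "k", "start_date": "2024-01-01"}'` prints a single `CONNECTION_STATUS` message whose status is `SUCCEEDED`.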

Define Your Stream

Define schemas for your normalized tables.

Process and decode data from API responses (JSON, XML, CSV, etc.).

Test your code; you should receive a large JSON object as the result.
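Decoding turns each raw response body into individual records that match the stream's schema. A stdlib-only sketch, assuming a hypothetical API that returns either JSON (records under a `data` key) or CSV; the simple field-to-type map stands in for the JSON Schema a real stream declares, and XML is left out for brevity.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, List

# Hypothetical schema for one stream: field name -> expected Python type.
SCHEMA = {"id": int, "email": str}


def decode(body: str, content_type: str) -> Iterator[Dict[str, Any]]:
    """Turn a raw response body into dict records, based on its content type."""
    if content_type == "application/json":
        yield from json.loads(body)["data"]
    elif content_type == "text/csv":
        for row in csv.DictReader(io.StringIO(body)):
            # CSV values arrive as strings; cast them to the schema's types.
            yield {key: SCHEMA[key](value) for key, value in row.items() if key in SCHEMA}
    else:
        raise ValueError(f"unsupported content type: {content_type}")


def conforms(record: Dict[str, Any]) -> bool:
    """Check that every schema field is present with the expected type."""
    return all(isinstance(record.get(key), typ) for key, typ in SCHEMA.items())


json_body = '{"data": [{"id": 1, "email": "a@example.com"}]}'
csv_body = "id,email\n2,b@example.com\n"
records: List[Dict[str, Any]] = (
    list(decode(json_body, "application/json")) + list(decode(csv_body, "text/csv"))
)
print(records)
print(all(conforms(record) for record in records))  # True
```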

Read Data from Source

Define a catalog for your streams and read data from your source.
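Reading ties the pieces together: the `read` command receives a configured catalog (which streams to sync, and how) plus any saved state, then emits records followed by a state checkpoint. A simplified sketch, assuming the Airbyte Protocol's message shapes (`RECORD` messages, and a `STATE` message in its older `data` form); the `customers` data and `updated_at` cursor are hypothetical, and the in-memory `SOURCE` dict stands in for API calls.

```python
import json
import time
from typing import Any, Dict, Iterator, Optional

# Hypothetical source data; a real connector would page through the API instead.
SOURCE: Dict[str, list] = {
    "customers": [
        {"id": 1, "updated_at": "2024-01-01"},
        {"id": 2, "updated_at": "2024-02-01"},
        {"id": 3, "updated_at": "2024-03-01"},
    ]
}


def read(configured_catalog: Dict[str, Any], state: Dict[str, str]) -> Iterator[str]:
    """Emit a RECORD message per row of each selected stream. For incremental
    streams, skip rows at or before the saved cursor and finish with a STATE
    message holding the newest cursor value seen."""
    for configured in configured_catalog["streams"]:
        name = configured["stream"]["name"]
        cursor_field = configured.get("cursor_field", [None])[0]
        incremental = configured["sync_mode"] == "incremental" and cursor_field is not None
        cursor: Optional[str] = state.get(name) if incremental else None
        newest = cursor
        for row in SOURCE[name]:
            if incremental:
                value = row[cursor_field]
                # ISO-8601 date strings sort correctly as plain strings.
                if cursor is not None and value <= cursor:
                    continue  # already synced in an earlier run
                if newest is None or value > newest:
                    newest = value
            yield json.dumps({"type": "RECORD", "record": {
                "stream": name, "data": row, "emitted_at": int(time.time() * 1000)}})
        if incremental and newest is not None:
            state = {**state, name: newest}
            yield json.dumps({"type": "STATE", "state": {"data": state}})


configured_catalog = {"streams": [{
    "stream": {"name": "customers", "json_schema": {}, "supported_sync_modes": ["incremental"]},
    "sync_mode": "incremental",
    "destination_sync_mode": "append",
    "cursor_field": ["updated_at"],
}]}

# With a saved cursor of 2024-01-15, only customers 2 and 3 are emitted.
for message in read(configured_catalog, state={"customers": "2024-01-15"}):
    print(message)
```

Persisting the last `STATE` value and passing it into the next run is what makes the sync incremental.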

Maintain Version Control and Deploy Infrastructure

Version Control

After building your connector, you will need to maintain a GitHub repository for it. If you are contributing to Airbyte's connector repository, make sure you write tests according to Airbyte's specifications.
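If you contribute to Airbyte's repository, its own test specifications are what count. The sketch below only illustrates the idea of protocol-compliance checks using Python's built-in `unittest`; `read_customers` is a hypothetical stand-in for your connector's output, and these two checks are not Airbyte's test suite.

```python
import json
import unittest
from typing import Iterator


def read_customers() -> Iterator[str]:
    """Hypothetical connector output under test: one protocol message per line."""
    yield json.dumps({"type": "RECORD", "record": {
        "stream": "customers", "data": {"id": 1}, "emitted_at": 0}})


class ProtocolShapeTest(unittest.TestCase):
    """Minimal checks in the spirit of protocol-compliance tests."""

    def test_every_line_is_a_json_message(self):
        for line in read_customers():
            message = json.loads(line)  # raises, failing the test, if not JSON
            self.assertIn("type", message)

    def test_records_name_a_known_stream(self):
        known_streams = {"customers"}
        for line in read_customers():
            message = json.loads(line)
            if message["type"] == "RECORD":
                self.assertIn(message["record"]["stream"], known_streams)
                self.assertIsInstance(message["record"]["data"], dict)
```

With the code saved as a hypothetical `test_protocol.py`, `python -m unittest test_protocol` runs both tests.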

Deploy Infrastructure

  • Docker: Airbyte runs each connector as a Docker image and uses fully incremental Docker builds. Familiarize yourself with Docker and how Airbyte handles Docker images; managing new modules may add complexity.

  • Kubernetes: If you deploy on Kubernetes, make sure you can set resource limits and run jobs in parallel.

Containerize your connector when you are done.

Maintain

  • As you build your connector, expect to learn new details about the source along the way.

  • Once your connector is up and running, you may need to maintain it as the source adds new endpoints and fields. Airbyte's community GitHub repository is a good place to find tutorials when you run into issues.

How to Build Thousands of Connectors

In the modern data stack, robust data replication is increasingly important. Tools like dbt and Airflow now let data engineers transform data on regular schedules, but Fivetran and Stitch provide an incomplete EL solution for long-tail connectors.

There Must Be An Easier Way

Portable has 300+ long-tail source connectors that load into the most common data warehouses: Snowflake, Amazon Redshift, Google BigQuery, and PostgreSQL.

Start automating workflows today. Try Portable!