With Portable, integrate Amplitude data with your Snowflake warehouse in minutes. Access your digital analytics and measurements platform data from Snowflake without having to manage cumbersome ETL scripts.
The Two Paths to Connect Amplitude to Snowflake
There are two ways to sync data from Amplitude into your data warehouse for analytics.
Method 1: Manually Developing a Custom Data Pipeline Yourself
Write code from scratch or use an open-source framework to build an integration between Amplitude and Snowflake.
Method 2: Automating the ETL Process with a No-Code Solution
Leverage a pre-built connector from a cloud-hosted solution like Portable.
How to Create Value with Amplitude Data
Teams connect Amplitude to their data warehouse to build dashboards and generate value for their business. Let’s dig into the capabilities Amplitude exposes via their API, outline insights you can build with the data, and summarize the most common analytics environments that teams are using to process their Amplitude data.
Extract: What Data Can You Extract from the Amplitude API?
Amplitude is a digital analytics and measurements platform used for understanding your users, driving conversions, and increasing engagement.
To help clients power downstream analytics, Amplitude offers an application programming interface (API) for clients to extract data on business entities. Here are a few example entities you can extract from the API:
- Attribution
- Batch Event Upload
- Behavioral Cohorts
- CCPA DSAR
- Chart Annotations
- Dashboard
- Export
- Group Identify
- HTTP V2
- Identify
- Lookup Tables
- Releases
- SCIM
- Taxonomy
- User Privacy
- User Profile
You can visit the Amplitude API Documentation to explore the entire catalog of available API resources and the complete schema definition for each.
As you think about the data you will need for analytics, don’t forget that Portable offers no-code integrations to other similar applications.
Regardless of the SaaS solution you use, it’s important to find a digital analytics and measurements platform with robust data available for analytics.
Load: Which Destinations Are Best for Your Amplitude ETL Pipeline?
To turn raw data from Amplitude into dashboards, most companies centralize information into a data warehouse or data lake. For Portable clients, the most common ETL pipelines are:
- Amplitude to Snowflake Integration
- Amplitude to Google BigQuery Integration
- Amplitude to Amazon Redshift Integration
- Amplitude to PostgreSQL Integration
Once you have a destination to load the data, it’s common to combine Amplitude data with information from other enterprise applications like Jira, Mailchimp, HubSpot, Zendesk, and Klaviyo.
From there, you can build cross-functional dashboards in a visualization tool like Power BI, Tableau, Looker, or Retool.
Develop: Which Dashboards Should You Build with Amplitude Data?
Now that you have identified the data you want to extract, the next step is to plan out the dashboards you can build with the data.
As a process, you want to consume raw data, overlay SQL logic, and build a dashboard to either 1) increase revenue or 2) decrease costs.
Replicating Amplitude data into your cloud data warehouse can unlock a wide array of opportunities to power analytics, automate workflows, and develop products. The use cases are endless.
Now that we have a clear sense of the insights we can create, let’s compare the process of developing a custom Amplitude integration with the benefits of using a no-code ETL solution like Portable.
Method 1: Building a Custom Amplitude ETL Pipeline
To build your own Amplitude integration, there are three steps:
- Navigate the Amplitude API documentation
- Make your first API request
- Turn an API request into a complete data pipeline
Let’s walk through the process in more detail.
How to Interpret Amplitude’s API Documentation
When reading API documentation, there are a handful of key concepts to consider.
Authentication
There are many common authentication mechanisms: OAuth 2.0 (Auth Code and Client Credentials), API keys, JWT tokens, personal access tokens, Basic Authentication, and others. For Amplitude, it’s important to identify the authentication mechanism and how best to incorporate the necessary credentials into your API requests.
Amplitude has different authentication requirements for different APIs as explained below.
- Attribution API:
- Batch Event Upload API:
- Group Identify API:
- HTTP V2 API:
- Identify API:
These APIs don't use an Authorization header; they authenticate with your API key instead.
Pass your API key in the URL of the request like https://api2.amplitude.com/endpoint?api_key={{api-key}}.
- Behavioral Cohorts API:
- Data Subject Access Request API:
- Chart Annotations API:
- Dashboard REST API:
- Export API:
- Lookup Table API:
- Releases API:
- Taxonomy API:
- User Privacy API:
These APIs use Basic Auth with the API key and secret key for your project. Pass the base64-encoded credentials in the request header in the form {{api-key}}:{{secret-key}}, where api-key takes the place of the username and secret-key takes the place of the password.
- SCIM API:
Authorize by sending a Bearer Token in the Authorization header. The token should equal the key that's generated on the Access and SSO page in the Settings tab of Amplitude.
- User Profile API:
The Profile API supports API key authentication that uses your secret key by setting an Authorization header. Note that this is different from most of the Amplitude APIs, because it uses your secret key only: 'Authorization': 'Api-Key SECRET_KEY'
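The four schemes above can be sketched in Python as small helpers. This is a minimal illustration, not Amplitude's official client: the function names are hypothetical, and the credentials in the usage note are placeholders.

```python
import base64
from urllib.parse import urlencode


def with_api_key_param(url: str, api_key: str) -> str:
    """API key passed in the URL (Attribution, Identify, and similar APIs)."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({'api_key': api_key})}"


def basic_auth_header(api_key: str, secret_key: str) -> dict:
    """Basic Auth: api-key as the username, secret-key as the password."""
    token = base64.b64encode(f"{api_key}:{secret_key}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(scim_key: str) -> dict:
    """Bearer token, as described for the SCIM API."""
    return {"Authorization": f"Bearer {scim_key}"}


def profile_api_header(secret_key: str) -> dict:
    """User Profile API: secret key only, with the Api-Key prefix."""
    return {"Authorization": f"Api-Key {secret_key}"}
```

For example, basic_auth_header("YOUR_API_KEY", "YOUR_SECRET_KEY") returns a dictionary you can pass as the headers of a request to any of the Basic Auth APIs listed above.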
Resources
It’s important to identify the Amplitude API endpoints you want to use for analytics. Most APIs offer a combination of GET, POST, PUT, and DELETE request methods; however, for analytics, GET requests are typically the most useful. At times, POST requests can be used to extract data as well.
For Amplitude, the Upload request endpoint is a great place to get started.
Request Parameters
For each API endpoint you would like to use for analytics, you need to understand the method (GET, POST, PUT, or DELETE) and the URL, but there are other considerations to take into account as well. You should look out for pagination mechanics, query parameters, and parameters that are added to the request path.
The Amplitude SCIM API uses the following parameters for pagination and filtering:
- startIndex
- filter
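A pagination loop built around startIndex might look like the sketch below. It assumes the response follows the standard SCIM 2.0 list shape (a Resources array plus a totalResults count); confirm the exact field names against Amplitude's SCIM documentation. fetch_page is a hypothetical callable that you supply to perform the HTTP request for a given startIndex.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every resource, advancing startIndex until all are read."""
    start_index = 1  # SCIM list indexes are 1-based
    while True:
        page = fetch_page(start_index)
        resources = page.get("Resources", [])  # assumed field name
        yield from resources
        start_index += len(resources)
        # Stop on an empty page or once every reported result has been read.
        if not resources or start_index > page.get("totalResults", 0):
            break
```

Passing the request logic in as a function keeps the loop easy to test with canned pages before you point it at the live API.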
How Do You Call the Amplitude API? (Tutorial)
- Follow the instructions above to read the Amplitude API documentation
- Identify and collect your credentials for authentication
- Pick the API resource you want to pull data from
- Configure the necessary parameters, method, and URL to make your first request (e.g. with curl or Postman)
- Add your credentials and make your first API call. Here is an example request using curl (without real credentials):
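curl -X POST 'https://api2.amplitude.com/2/httpapi' \
-H 'Content-Type: application/json' \
-H 'Accept: */*' \
--data '{"api_key": "YOUR_API_KEY", "events": [{"user_id": "user@example.com", "event_type": "test_event"}]}'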
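If you prefer Python, a request to the same endpoint can be sketched with the standard library. The JSON body (an api_key field plus an events array) reflects our reading of the HTTP V2 API documentation, and the key, user ID, and event type are placeholders. The function name build_event_request is hypothetical.

```python
import json
import urllib.request


def build_event_request(api_key: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the HTTP V2 endpoint."""
    body = json.dumps({"api_key": api_key, "events": events}).encode("utf-8")
    return urllib.request.Request(
        "https://api2.amplitude.com/2/httpapi",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )


request = build_event_request(
    "YOUR_API_KEY",  # placeholder, not a real credential
    [{"user_id": "user@example.com", "event_type": "test_event"}],
)

# To actually send it (requires a real key and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```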
How Do You Maintain a Custom Amplitude to Snowflake ETL Pipeline?
Making a call to the Amplitude API is just the beginning of maintaining a complete custom ETL pipeline.
Here is a getting-started guide to building a production-grade pipeline for Amplitude:
- For each API endpoint, define schemas (which fields exist and the type for each)
- Process the API response and parse the data (typically parsing JSON or XML)
- Handle and replicate nested objects and custom fields
- Identify which Amplitude fields are primary keys and which keys are required vs. optional
- Version control your changes in a git-based workflow (using GitHub, GitLab, etc.)
- Handle code dependencies in your toolchain and the upgrades that come with each
- Monitor the health of the upstream API, and, when things go wrong, troubleshoot via the status page, reach out to support, and open tickets
- Handle error codes (HTTP error codes like 400s, 500s, etc.)
- Manage and respect rate limits imposed by the server
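As one small illustration of the nested-object item above, here is a sketch that flattens a nested JSON record into warehouse-friendly column names. The underscore naming convention is an assumption, and the sample record in the usage note is made up for illustration.

```python
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dictionaries into a single level of column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"a": {"b": 1}} becomes {"a_b": 1}.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat
```

For example, flatten({"user_id": "u1", "event_properties": {"plan": "pro"}}) produces the columns user_id and event_properties_plan. Arrays are left untouched here; a production pipeline would also need a policy for them, such as storing them in a semi-structured column.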
We won’t go into detail on all of the items above, but rate limits are a great example of the complexity found in a production-grade data pipeline.
Batch Event Upload API: In addition to the per-second limit, there is a daily limit to protect against spam and abuse. This limit is hard to exceed. Events start counting toward the daily limit after Amplitude determines that a user/device is spamming the system. After a project reaches the limit, Amplitude enforces a daily limit of 500,000 events uploaded per rolling 24 hours. The 24-hour rolling period applies in one-hour intervals. The daily limit applies for each deviceID and each user_id for a project.
CCPA DSAR API: All DSAR endpoints share a budget of 14,400 “cost” per hour. POST requests cost 8, and GET requests cost 1. Requests beyond this budget get 429 response codes.
In general for each POST, there is typically one output file per month per project the user has events for. For example, if you are fetching 13 months of data for a user with data in two projects, expect about 26 files.
If you need to get data for 40 users per hour, you can spend 14400 / 40 = 360 cost per request. Conservatively allocating 52 GETs for output files (twice the computed amount) and 8 for the initial POST, you can poll for the status of the request 360 - 8 - 52 = 300 times. Given the 5-day SLA for results, this allows for checking the status every 5 × 24 × 60 / 300 = 24 minutes over 5 days. A practical usage might be to have a service that runs every 30 minutes, posting 20 new requests and checking on the status of all outstanding requests.
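The DSAR budget arithmetic above can be checked with a few lines of Python. The constants come from the limits quoted in this section; the function name and the safety factor of 2 are just a way of restating the same example.

```python
HOURLY_BUDGET = 14_400  # shared DSAR cost budget per hour
POST_COST = 8
GET_COST = 1


def status_poll_budget(users_per_hour: int, expected_files: int,
                       safety_factor: int = 2) -> int:
    """Number of status polls left per request after the POST and file GETs."""
    cost_per_request = HOURLY_BUDGET // users_per_hour
    file_gets = expected_files * safety_factor * GET_COST
    return (cost_per_request - POST_COST - file_gets) // GET_COST


# 13 months of data across two projects gives about 26 output files.
polls = status_poll_budget(users_per_hour=40, expected_files=26)
sla_minutes = 5 * 24 * 60  # the 5-day SLA expressed in minutes
minutes_between_polls = sla_minutes / polls

print(polls, minutes_between_polls)
```

Changing users_per_hour or expected_files shows how quickly the polling budget shrinks as volume grows.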
Dashboard REST API: Concurrent Limit: You can run up to 5 concurrent requests across all Amplitude REST API endpoints, including cohort download.
User activity/user search limits: You can run up to 360 queries per hour for user activity and user search endpoints. The User Activity and User Search endpoints have a different rate limit than all other request types.
SCIM API: The SCIM API supports 100 requests per minute per organization. Amplitude can lift this restriction for burst requests on a per-request basis. Contact the support team or a customer success manager for more information.
Taxonomy API: For each endpoint, there is a concurrent limit and a rate limit. The concurrent limit restricts the number of requests you can run at the same time, while the rate limit restricts the total number of queries you can run per hour. The limits are per project, and exceeding any of these limits returns a 429 error.
The endpoints use a cost per query model. Here are the max costs per API Key:
- Concurrent Cost Limit: You can run queries that collectively add up to 4 cost at the same time.
- Period Cost Limit: You can run up to 7200 cost per hour.
Cost structure of each endpoint:
- GET: 1 cost
- PUT: 2 cost
- POST: 2 cost
- DELETE: 2 cost
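To see how the Taxonomy cost model plays out, here is a small sketch that totals the cost of a set of requests and compares it with the two limits quoted above. The helper names are hypothetical, and the check is something you would run on your side before sending requests; it is not part of the Amplitude API.

```python
ENDPOINT_COST = {"GET": 1, "PUT": 2, "POST": 2, "DELETE": 2}
CONCURRENT_COST_LIMIT = 4
HOURLY_COST_LIMIT = 7200


def total_cost(methods: list) -> int:
    """Sum the cost of a list of HTTP methods such as ["GET", "POST"]."""
    return sum(ENDPOINT_COST[method.upper()] for method in methods)


def fits_concurrent_limit(in_flight: list) -> bool:
    """True if these requests may run at the same time."""
    return total_cost(in_flight) <= CONCURRENT_COST_LIMIT


def fits_hourly_limit(planned: list) -> bool:
    """True if these requests fit within one hour's budget."""
    return total_cost(planned) <= HOURLY_COST_LIMIT
```

Under these numbers, two GETs and one POST can run together (cost 4), while a POST, a PUT, and a GET cannot (cost 5).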
User Privacy API: The endpoint /api/2/deletions/users has a rate limit of 1 HTTP request per second. Each HTTP request can contain up to 100 amplitude_ids or user_ids, so you can submit up to 100 user deletions per second if you batch 100 users in each request.
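Batching for the User Privacy limit can be sketched as follows. The two constants come from the limits quoted above; send_batch is a hypothetical callable that you supply to POST one batch of IDs, and sleep is injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterator, List

MAX_IDS_PER_REQUEST = 100
MIN_SECONDS_BETWEEN_REQUESTS = 1.0


def batches(ids: List[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Split a list of IDs into chunks of at most `size`."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def submit_deletions(ids: List[str],
                     send_batch: Callable[[List[str]], None],
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Send one batch per second and return the number of requests made."""
    sent = 0
    for batch in batches(ids):
        if sent:
            sleep(MIN_SECONDS_BETWEEN_REQUESTS)  # pause between requests
        send_batch(batch)
        sent += 1
    return sent
```

With 250 IDs, this makes three requests (100, 100, and 50 IDs) with a one-second pause between them.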
If you don’t respect rate limits, and if you can’t handle server responses (like 429 errors with a Retry-After header), your pipeline can break, and analytics can become out-of-date.
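A retry wrapper for 429 and 5xx responses might look like the sketch below. It assumes send is a hypothetical callable that performs the request and returns a (status, headers, body) tuple, and it only understands the seconds form of Retry-After (the header can also carry an HTTP date). When the header is missing, it falls back to exponential backoff.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, dict, str]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Retry on 429 and 5xx, honoring Retry-After when it is given in seconds."""
    status = None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 and status < 500:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = str(headers.get("Retry-After", ""))
        # Fall back to exponential backoff: 1, 2, 4, ... seconds.
        delay = float(retry_after) if retry_after.isdigit() else float(2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Other 4xx responses are returned to the caller immediately, since retrying a malformed or unauthorized request will not help.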
What Are the Drawbacks of Building the Amplitude ETL Pipeline Yourself?
You can probably tell at this point that there is a lot of work that goes into building and maintaining an ETL pipeline from Amplitude to your data warehouse.
If you want less development work, faster insights, and no ongoing responsibilities, you should consider a cloud-hosted ETL solution.
Let’s walk through the setup process for a no-code ETL solution and its benefits.
Method 2: Using a No-Code Amplitude ETL Solution
No-code ETL solutions are simple. Vendors specialize in building and maintaining data pipelines on your behalf. Instead of starting from scratch for each integration, companies like Portable create connector templates that can be leveraged by hundreds or thousands of clients.
Step-By-Step Tutorial for Configuring Your Amplitude ETL Pipeline
Off-the-shelf ETL tools offer a no-code setup process. Here are the instructions to connect Amplitude to your cloud data warehouse with Portable.
- Create an account (no credit card required)
- Add a source: search for and select Amplitude
- Authenticate with Amplitude using the instructions in the Portable console
- Select Snowflake and authenticate
- Set up a flow connecting Amplitude to your analytics environment
- Run your flow to replicate data from Amplitude to your warehouse
- Use the dropdown to set your data flow to run on a cadence
What Are the Benefits of Using Portable for Amplitude ETL?
No-Code Simplicity
Start moving Amplitude data in minutes. Save yourself the headaches of reading API documentation, writing code, and worrying about maintenance. Leave the hassle to us.
Easy to Understand Pricing
With predictable, fixed-cost pricing per data flow, you know exactly how much your Amplitude integration will cost every month.
Fast Development Speeds
Access lightning-fast connector development. Portable can build new integrations on-demand in hours or days.
Hands-On Support
APIs change. Schemas evolve. Amplitude will have maintenance issues and errors. With Portable, we will do everything in our power to make your life easier.
Unlimited Data Volumes
You can move as much data from Amplitude to Snowflake as you want without worrying about usage credits or overages. Instead of analyzing your ETL costs, you should be analyzing your data.
Free to Get Started
Sign up and get started for free. You don’t need a credit card to manually trigger a data sync, so you can try all of our connectors before paying a dime.