With Portable, integrate Notion data with your Snowflake warehouse in minutes. Access your note-taking and team collaboration data from Snowflake without having to manage cumbersome ETL scripts.
The Two Paths to Connect Notion to Snowflake
There are two ways to sync data from Notion into your data warehouse for analytics.
Method 1: Manually Developing a Custom Data Pipeline Yourself
Write code from scratch or use an open-source framework to build an integration between Notion and Snowflake.
Method 2: Automating the ETL Process with a No-Code Solution
Leverage a pre-built connector from a cloud-hosted solution like Portable.
How to Create Value with Notion Data
Teams connect Notion to their data warehouse to build dashboards and generate value for their business. Let’s dig into the capabilities Notion exposes via their API, outline insights you can build with the data, and summarize the most common analytics environments that teams are using to process their Notion data.
Extract: What Data Can You Extract from the Notion API?
Notion is a note-taking and team collaboration tool used for planning, thinking, and writing.
To help clients power downstream analytics, Notion offers an application programming interface (API) for clients to extract data on business entities. Here are a few example entities you can extract from the API:
- Databases
- Pages
- Blocks
- Comments
- Users
- Search
You can visit the Notion API Documentation to explore the entire catalog of available API resources and the complete schema definition for each.
As you think about the data you will need for analytics, don’t forget that Portable offers no-code integrations to other similar applications.
Regardless of the SaaS solution you use, it’s important to find a note-taking and team collaboration tool with robust data available for analytics.
Load: Which Destinations Are Best for Your Notion ETL Pipeline?
To turn raw data from Notion into dashboards, most companies centralize information into a data warehouse or data lake. For Portable clients, the most common ETL pipelines are:
- Notion to Snowflake Integration
- Notion to Google BigQuery Integration
- Notion to Amazon Redshift Integration
- Notion to PostgreSQL Integration
Once you have a destination to load the data, it’s common to combine Notion data with information from other enterprise applications like Jira, Mailchimp, HubSpot, Zendesk, and Klaviyo.
From there, you can build cross-functional dashboards in a visualization tool like Power BI, Tableau, Looker, or Retool.
Develop: Which Dashboards Should You Build with Notion Data?
Now that you have identified the data you want to extract, the next step is to plan out the dashboards you can build with the data.
As a process, you want to consume raw data, overlay SQL logic, and build a dashboard to either 1) increase revenue or 2) decrease costs.
Replicating Notion data into your cloud data warehouse can unlock a wide array of opportunities to power analytics, automate workflows, and develop products. The use cases are endless.
Now that we have a clear sense of the insights we can create, let’s compare the process of developing a custom Notion integration with the benefits of using a no-code ETL solution like Portable.
Method 1: Building a Custom Notion ETL Pipeline
To build your own Notion integration, there are three steps:
- Navigate the Notion API documentation
- Make your first API request
- Turn an API request into a complete data pipeline
Let’s walk through the process in more detail.
How to Interpret Notion’s API Documentation
When reading API documentation, there are a handful of key concepts to consider.
Authentication
There are many common authentication mechanisms, including OAuth 2.0 (Auth Code and Client Credentials), API keys, JWTs, personal access tokens, and Basic Authentication. For Notion, it’s important to identify the authentication mechanism and how best to incorporate the necessary credentials into your API requests.
Requests use the HTTP Authorization header to both authenticate and authorize operations. The Notion API accepts bearer tokens in this header. Bearer tokens are provided to you when you create an integration. If you're creating a public OAuth integration, the integration also receives bearer tokens each time a user completes the OAuth flow.
Inside Notion, users will see updates made by integrations attributed to a bot. The bot's name and avatar are controlled in the integration settings.
Using a Notion SDK, a bearer token can be passed once to initialize a Client and the client can be used to send multiple authenticated requests.
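As a rough illustration of the "initialize once, reuse for every request" idea, here is a minimal sketch using only the Python standard library. It is not the official Notion SDK: the class name NotionClient and the method build_request are hypothetical, and the Notion-Version value is simply the one used in the curl example in this article. The sketch builds requests without sending them.

```python
import urllib.request
from typing import Optional

NOTION_API_BASE = "https://api.notion.com/v1"
NOTION_VERSION = "2022-06-28"  # assumption: same version string as the curl example


class NotionClient:
    """Hypothetical minimal client: stores the bearer token once and reuses it."""

    def __init__(self, token: str) -> None:
        self._headers = {
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        }

    def build_request(
        self, method: str, path: str, body: Optional[bytes] = None
    ) -> urllib.request.Request:
        # Builds (but does not send) a request carrying the stored credentials.
        return urllib.request.Request(
            f"{NOTION_API_BASE}{path}",
            data=body,
            headers=self._headers,
            method=method,
        )


client = NotionClient("secret_placeholder_token")  # not a real credential
users_request = client.build_request("GET", "/users")
print(users_request.get_method(), users_request.full_url)
```

Passing a request built this way to urllib.request.urlopen would send it; in practice you would read the token from an environment variable rather than hard-coding it.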
Resources
It’s important to identify the Notion API endpoints you want to use for analytics. Most APIs offer a combination of GET, POST, PUT, and DELETE request methods; however, for analytics, GET requests are typically the most useful. At times, POST requests can be used to extract data as well.
For Notion, the database query endpoint (POST /v1/databases/{database_id}/query) is a great place to get started.
Request Parameters
For each API endpoint you would like to use for analytics, you need to understand the method (GET, POST, PUT, or DELETE) and the URL, but there are other considerations to take into account as well. You should look out for pagination mechanics, query parameters, and parameters that are added to the request path.
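Notion uses cursor-based pagination: a paginated response includes has_more and next_cursor fields, and you pass the next_cursor value back as the start_cursor parameter to fetch the next page.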
Some API endpoints require unique identifiers from a previous API response to be included in the URL path. For instance, to query a database, you need a database_id that is returned from another endpoint.
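The cursor loop can be sketched as follows. This is a minimal example that assumes each paginated response is a JSON object with results, has_more, and next_cursor fields; the function name iter_results and the fetch_page callback are hypothetical. The callback is where you would make the real HTTP call (for example, a database query with the cursor supplied as start_cursor), which keeps the loop testable without network access.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across pages, following next_cursor while has_more is true."""
    cursor: Optional[str] = None  # None means "start from the first page"
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        if not page.get("has_more"):
            break
        cursor = page.get("next_cursor")
        if cursor is None:
            break  # defensive: has_more without a cursor would loop forever


# Usage with canned responses standing in for real API calls.
fake_pages = {
    None: {"results": ["row-1", "row-2"], "has_more": True, "next_cursor": "abc"},
    "abc": {"results": ["row-3"], "has_more": False, "next_cursor": None},
}
all_rows = list(iter_results(lambda cursor: fake_pages[cursor]))
print(all_rows)
```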
How Do You Call the Notion API? (Tutorial)
- Follow the instructions above to read the Notion API documentation
- Identify and collect your credentials for authentication
- Pick the API resource you want to pull data from
- Configure the necessary parameters, method, and URL to make your first request (e.g. with curl or Postman)
- Add your credentials and make your first API call. Here is an example request using curl (without real credentials):
curl -X POST 'https://api.notion.com/v1/databases/897e5a76ae524b489fdfe71f5945d1af/query' \
  -H "Authorization: Bearer $NOTION_API_KEY" \
  -H 'Notion-Version: 2022-06-28' \
  -H "Content-Type: application/json" \
  --data '{
    "filter": {
      "or": [
        {
          "property": "In stock",
          "checkbox": {
            "equals": true
          }
        },
        {
          "property": "Cost of next trip",
          "number": {
            "greater_than_or_equal_to": 2
          }
        }
      ]
    },
    "sorts": [
      {
        "property": "Last ordered",
        "direction": "ascending"
      }
    ]
  }'
How Do You Maintain a Custom Notion to Snowflake ETL Pipeline?
Making a call to the Notion API is just the beginning of maintaining a complete custom ETL pipeline.
Here is a getting-started guide to building a production-grade pipeline for Notion:
- For each API endpoint, define schemas (which fields exist and the type for each)
- Process the API response and parse the data (typically parsing JSON or XML)
- Handle and replicate nested objects and custom fields
- Identify which Notion fields are primary keys and which keys are required vs. optional
- Version control your changes in a git-based workflow (using GitHub, GitLab, etc.)
- Handle code dependencies in your toolchain and the upgrades that come with each
- Monitor the health of the upstream API and, when things go wrong, troubleshoot via the status page, reach out to support, and open tickets
- Handle error codes (HTTP error codes like 400s, 500s, etc.)
- Manage and respect rate limits imposed by the server
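To make the schema and nested-object items above more concrete, here is a minimal sketch of flattening one Notion page object into a warehouse-friendly row. It assumes each property value is an object with a type field plus a key named after that type (for example, a checkbox property carries a checkbox key). The function name flatten_page and the choice to store nested values as JSON strings are illustrative assumptions, not a prescribed approach.

```python
import json
from typing import Any, Dict


def flatten_page(page: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a page object into a flat row: one column per property."""
    row: Dict[str, Any] = {
        "id": page["id"],  # treated as the primary key
        "last_edited_time": page.get("last_edited_time"),
    }
    for name, prop in page.get("properties", {}).items():
        # The payload lives under the key named by the property's type.
        value = prop.get(prop.get("type"))
        if isinstance(value, (dict, list)):
            # Nested structures are serialized so they fit in a single column.
            value = json.dumps(value, sort_keys=True)
        row[name] = value
    return row


sample_page = {
    "id": "p1",
    "last_edited_time": "2022-06-28T00:00:00.000Z",
    "properties": {
        "In stock": {"type": "checkbox", "checkbox": True},
        "Cost of next trip": {"type": "number", "number": 2},
        "Tags": {"type": "multi_select", "multi_select": [{"name": "a"}]},
    },
}
sample_row = flatten_page(sample_page)
print(sample_row)
```

A real pipeline would also need to map each property type to a column type in the destination and decide how to handle properties that are renamed or removed.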
We won’t go into detail on all of the items above, but rate limits are a great example of the complexity found in a production-grade data pipeline.
The API has rate limits. Your integration can gracefully handle these limits by slowing down when the API responds with an HTTP 429 status.
Rate-limited requests will return a 'rate_limited' error code (HTTP response status 429). The rate limit for incoming requests is an average of 3 requests per second. Some bursts beyond the average rate are allowed.
Integrations should accommodate variable rate limits by handling HTTP 429 responses and respecting the Retry-After response header value, which is set as an integer number of seconds (in decimal). Requests made after waiting this minimum amount of time should no longer be rate limited.
Alternatively, rate limits can be accommodated by backing off (or slowing down) the speed of future requests. A common way to implement this is using one or several queues for pending requests, which can be consumed by sending requests as long as Notion does not respond with an HTTP 429.
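The Retry-After handling described above can be sketched like this. It is a minimal example, not a complete queueing system: send_with_retry, the send callback, and the (status, headers, body) tuple shape are hypothetical names chosen for illustration, and the one-second fallback when Retry-After is missing is an assumption.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429, wait for Retry-After seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts:
            # Retry-After is an integer number of seconds; fall back to 1 (assumption).
            sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Usage with canned responses: one rate-limited reply, then a success.
canned = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits = []
final_response = send_with_retry(lambda: next(canned), sleep=waits.append)
print(final_response[0], waits)
```

Injecting the sleep function keeps the example fast to test; in production the default time.sleep applies the real delay.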
Notion limits the size of certain parameters and the depth of children in requests. A request that exceeds any of these limits will return a 'validation_error' error code (HTTP response status 400) and contain more specific details in the 'message' property.
Integrations should avoid sending requests beyond these limits proactively. It may be helpful to use test data in your own test suite which intentionally contains large parameters to verify that the errors are handled appropriately. For example, if the integration reads a URL from an external system to put into a Notion page property, the integration should have a plan to deal with URLs that are beyond the length limit of 1000 characters. The integration might choose to log the error, or send an alert to the user who set up the integration via an email, or some other action.
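A proactive check for the URL case might look like the sketch below. The 1000-character limit is the figure quoted in this article and should be confirmed against Notion's current documentation; the function name url_limit_error is hypothetical, and printing stands in for whatever logging or alerting the integration uses.

```python
from typing import Optional

MAX_URL_LENGTH = 1000  # limit quoted in this article; confirm against current docs


def url_limit_error(url: str, max_length: int = MAX_URL_LENGTH) -> Optional[str]:
    """Return an error message if the URL exceeds the limit, otherwise None."""
    if len(url) > max_length:
        return f"URL is {len(url)} characters; limit is {max_length}"
    return None


# Test data that intentionally crosses the limit, as suggested above.
ok_url = "https://example.com/" + "a" * 10
too_long_url = "https://example.com/" + "a" * MAX_URL_LENGTH

for candidate in (ok_url, too_long_url):
    error = url_limit_error(candidate)
    if error:
        print(f"skipping property: {error}")  # stand-in for logging or alerting
```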
If you don’t respect rate limits, and if you can’t handle server responses (like 429 errors with a Retry-After header), your pipeline can break, and analytics can become out-of-date.
What Are the Drawbacks of Building the Notion ETL Pipeline Yourself?
You can probably tell at this point that there is a lot of work that goes into building and maintaining an ETL pipeline from Notion to your data warehouse.
If you want less development work, faster insights, and no ongoing responsibilities, you should consider a cloud-hosted ETL solution.
Let’s walk through the setup process for a no-code ETL solution and its benefits.
Method 2: Using a No-Code Notion ETL Solution
No-code ETL solutions are simple. Vendors specialize in building and maintaining data pipelines on your behalf. Instead of starting from scratch for each integration, companies like Portable create connector templates that can be leveraged by hundreds or thousands of clients.
Step-By-Step Tutorial for Configuring Your Notion ETL Pipeline
Off-the-shelf ETL tools offer a no-code setup process. Here are the instructions to connect Notion to your cloud data warehouse with Portable.
- Create an account (no credit card required)
- Add a source: search for and select Notion
- Authenticate with Notion using the instructions in the Portable console
- Select Snowflake and authenticate
- Set up a flow connecting Notion to your analytics environment
- Run your flow to replicate data from Notion to your warehouse
- Use the dropdown to set your data flow to run on a cadence
What Are the Benefits of Using Portable for Notion ETL?
No-Code Simplicity
Start moving Notion data in minutes. Save yourself the headaches of reading API documentation, writing code, and worrying about maintenance. Leave the hassle to us.
Easy to Understand Pricing
With predictable, fixed-cost pricing per data flow, you know exactly how much your Notion integration will cost every month.
Fast Development Speeds
Access lightning-fast connector development. Portable can build new integrations on-demand in hours or days.
Hands-On Support
APIs change. Schemas evolve. Notion will have maintenance issues and errors. With Portable, we will do everything in our power to make your life easier.
Unlimited Data Volumes
You can move as much data from Notion to Snowflake as you want without worrying about usage credits or overages. Instead of analyzing your ETL costs, you should be analyzing your data.
Free to Get Started
Sign up and get started for free. You don’t need a credit card to manually trigger a data sync, so you can try all of our connectors before paying a dime.