Using Azure Blob as a Destination with Parquet Files
Denis
Staff Software Engineer
Azure Blob Storage is a widely used cloud storage solution offering scalability, security, and cost-effectiveness. In this guide, we'll walk you through configuring Azure Blob Storage as a destination in Portable to store your data in Parquet format, so you can work efficiently with structured data and integrate it into your analytics workflows.
Why Azure Blob Storage?
Azure Blob Storage provides a reliable, scalable, and highly available service for storing large volumes of unstructured data. Whether you're looking to archive data, manage backups, or store logs, Azure Blob Storage is a versatile solution. Portable allows you to leverage this storage while converting and loading your data in Parquet format, making it optimal for analytics and querying.
Benefits of using Parquet
Parquet is a columnar storage format optimized for performance and efficiency when handling large datasets. By storing your data in Parquet format, you can:
Reduce storage costs: Parquet compresses data efficiently, reducing the size of your files.
Query faster: Parquet lets engines read only the columns a query needs instead of the entire dataset.
Integrate easily: Parquet is supported by popular data analytics engines such as Azure Data Lake Analytics, Databricks, and Synapse.
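To make the column-read benefit concrete, here is a toy sketch of row-oriented versus column-oriented layout. It only illustrates the idea; it is not the Parquet file format, and the sample records and the to_columns helper are made up for this example.

```python
# Toy illustration of row-oriented vs. column-oriented layout.
# This is NOT the Parquet file format, only the idea behind it.

# Hypothetical sample records.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "US", "amount": 35.5},
    {"order_id": 3, "country": "DE", "amount": 12.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same sum touches only the "amount" values.
total_from_columns = sum(columns["amount"])
```

Both totals are the same, but the columnar version never looks at order_id or country. Keeping values of the same type next to each other is also what tends to make columnar data compress well.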
Prerequisites
Before setting up Azure Blob Storage as a destination in Portable, ensure the following:
You have an active Azure account.
You have created a Blob Storage container.
You have the access key for your storage account.
Step 1: Create Your Azure Blob Storage Account and Container
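If you haven't done so already, set up your Azure Blob Storage account and container, then collect the credentials Portable needs:
In the Azure Portal, navigate to Storage Accounts, select your storage account, and expand Security + networking.
Click Access keys.
Note the account name, container name, and access key; you will use them in Portable when configuring your destination.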
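Before pasting these values into Portable, it can help to catch typos early. The sketch below checks the account and container names against Azure's published naming rules and, for use in your own scripts, assembles a standard Azure Storage connection string. The function names are hypothetical, and Portable itself asks for the values separately rather than as a connection string.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"^[a-z0-9]{3,24}$")
# Container names: lowercase letters, digits, and single hyphens between them.
# (Special containers such as $root are not covered by this sketch.)
CONTAINER_NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_account_name(name: str) -> bool:
    return bool(ACCOUNT_NAME_RE.match(name))


def is_valid_container_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and bool(CONTAINER_NAME_RE.match(name))


def build_connection_string(account_name: str, account_key: str,
                            endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble a connection string for your own tooling (not for Portable)."""
    if not is_valid_account_name(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    if not account_key:
        raise ValueError("account key must not be empty")
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        f"EndpointSuffix={endpoint_suffix}"
    )
```

Treat the access key like a password: keep it out of source control and load it from an environment variable or a secrets manager.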
Step 2: Configure Azure Blob Storage as a Destination in Portable
Log in to your Portable dashboard.
Navigate to the Destinations tab and click Add Destination.
Select Azure Blob Storage from the list of available destinations.
Enter your Account Name, Account Key, Container Name, and Upload Path from your Azure Blob Storage setup.
Upload Path: If specified, any folders in the path that do not already exist are created automatically so that file uploads succeed.
Your configured destination should now appear in your destinations list.
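In a standard (flat namespace) storage account, folders are virtual: a folder is simply a prefix in the blob name, separated by forward slashes. The sketch below shows how an Upload Path and a file name would combine into a blob name under that model. The blob_name_for helper and the file name are hypothetical; how Portable actually names the files it writes is not specified here.

```python
def blob_name_for(upload_path: str, file_name: str) -> str:
    """Join an upload path and a file name into a single blob name."""
    # Blob names use forward slashes; drop empty segments caused by
    # leading, trailing, or doubled slashes.
    segments = [part for part in upload_path.replace("\\", "/").split("/") if part]
    return "/".join(segments + [file_name])


# Hypothetical values for illustration only.
example = blob_name_for("/exports/daily/", "orders.parquet")
```

Here example is "exports/daily/orders.parquet", which the Azure Portal displays as an orders.parquet file inside exports/daily.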
Step 3: Create Your Workflow
With your Destination and your Source configured, you’re ready to start the data pipeline:
Go to your Portable Flows and create a new Flow, selecting your Azure Blob destination and your configured Source.
On the flow details page, you can choose how to run your Flow: manually, at a specific frequency, or on a cron schedule. Select your preferred option, then save and run the Flow.
Monitor the Flow in Portable's recent runs table to see the status of the data transfer.
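If you choose the cron option, it helps to read an expression field by field. The sketch below splits a conventional five-field cron expression into named parts. It assumes the standard minute, hour, day-of-month, month, day-of-week layout; check Portable's scheduling screen for the exact syntax it accepts. The split_cron helper is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Label the fields of a standard five-field cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" means minute 0 of hour 2, every day: a daily run at 02:00.
daily_at_2am = split_cron("0 2 * * *")
```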
Step 4: Verify Your Data in Azure Blob Storage
Once the workflow is complete, navigate to your Azure Blob Storage container:
In the Azure Portal, navigate to your storage account.
Go to Containers, select your container, and verify that your Parquet files have been uploaded.
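The same check can be scripted. The sketch below uses the azure-storage-blob Python SDK to list the blobs in a container and then filters for Parquet files. It assumes you have an Azure Storage connection string for the account and that the uploaded files end in .parquet, which may not match how Portable names them. The two function names are hypothetical.

```python
def parquet_blob_names(blob_names, prefix=""):
    """Return the sorted names under `prefix` that end in .parquet."""
    return sorted(
        name for name in blob_names
        if name.startswith(prefix) and name.lower().endswith(".parquet")
    )


def list_container_blobs(connection_string, container_name, prefix=""):
    """List blob names in a container (requires `pip install azure-storage-blob`)."""
    # Imported lazily so the filter above works without the SDK installed.
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_connection_string(
        connection_string, container_name=container_name
    )
    return [blob.name for blob in client.list_blobs(name_starts_with=prefix or None)]
```

A typical call would be parquet_blob_names(list_container_blobs(conn_str, "my-container", "exports/"), "exports/"), where the container name and prefix are placeholders for your own values. An empty result after a successful run suggests checking the Upload Path or the container name in the destination settings.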
Conclusion
By setting up Azure Blob Storage as a destination in Portable, you can efficiently export data in Parquet format for optimized storage and analysis. Azure Blob Storage is a scalable and secure solution, while Portable simplifies the process of integrating data across platforms. This configuration allows for seamless data workflows, helping you store, manage, and analyze large datasets with ease.