
Create a Dataset Using Parquet

This guide walks you through creating datasets from Parquet files in AIV. Parquet is well suited to large-scale analytics, and it is an open format supported by many data processing tools.

What You’ll Learn

  • Create a dataset from CSV, JSON, or Excel and schedule it to output Parquet
  • Create a new dataset from a generated Parquet file
  • Preview, validate, and save your Parquet dataset

Overview

Parquet is a columnar storage format optimized for analytical workloads, offering efficient compression and fast query performance. This workflow consists of two phases:

  1. Phase 1: Create a dataset from your source file and schedule it to export as Parquet.
  2. Phase 2: Use the generated Parquet file as the data source for a new dataset.
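
To make the columnar idea concrete, the Python sketch below contrasts a row-oriented list of records with the same data pivoted into columns. It is a conceptual illustration only, not Parquet's on-disk format, and the column names Store and Sales and all values are hypothetical. Summing one column touches only that column's values, and runs of repeated values within a column are what encodings such as run-length encoding (one of the encodings Parquet supports) take advantage of.

```python
# Conceptual sketch: contrasts row-oriented and column-oriented layouts
# in memory. It does NOT reproduce Parquet's on-disk file format.
# Column names and values below are hypothetical sample data.

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"Store": "North", "ProductCategory": "Home", "Sales": 120.0},
    {"Store": "North", "ProductCategory": "Home", "Sales": 45.5},
    {"Store": "South", "ProductCategory": "Toys", "Sales": 80.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name -> values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive duplicate values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_sales = sum(columns["Sales"])
print(total_sales)  # 245.5

# Repeated values sit next to each other within a column, so they compress well.
print(run_length_encode(columns["Store"]))  # [('North', 2), ('South', 1)]
```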

Prerequisites

Before you begin, complete these steps if you are using the sample workflow:

  • Download csv.zip
  • Extract the zip archive
  • Upload retail_sales.csv to AIV (see the Upload guide for help)
  • Follow the CSV guide to create the initial dataset
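
If you prefer to script the extraction step, the minimal Python sketch below uses the standard library's zipfile module. The function name extract_sample and the default output folder csv_sample are hypothetical, and the sketch assumes csv.zip has already been downloaded to the working directory.

```python
import zipfile
from pathlib import Path


def extract_sample(archive="csv.zip", dest="csv_sample"):
    """Extract a zip archive into dest and return the CSV files it contained.

    Hypothetical helper: the default paths assume csv.zip was downloaded
    into the current working directory.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall recreates any folders stored inside the archive.
        zf.extractall(dest_path)
    # rglob also finds CSV files nested in subfolders of the archive.
    return sorted(dest_path.rglob("*.csv"))
```

Calling extract_sample() returns the paths of the extracted CSV files, including retail_sales.csv if the archive contains it; you then upload that file to AIV through the web interface.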

Phase 1: Create a Dataset and Schedule Parquet Output

Step 1: Create the Initial Dataset

Create a dataset from your CSV, JSON, or Excel file by following the guide for your source format:

Source Format | Guide
CSV           | Dataset using CSV (Method 1: Simple)
Excel         | Dataset using Excel
JSON          | Dataset using JSON

Step 2: Schedule the Dataset to Produce a Parquet File

  1. Locate your dataset in the Dataset Grid view.


  2. Click the dataset's Schedule icon (hamburger icon).

  3. In the scheduling window, open the Output tab.

  4. In the Format dropdown, select Parquet as the output type.


  5. Click Run to confirm. A popup message reading "Schedule Successfully" appears.



Phase 2: Create a Dataset from the Parquet File

After the Parquet file is generated, refresh the Dataset Grid to see it listed. For connection behavior and terminology, see Connection Selection — Parquet.


Step 1: Create a New Dataset

Complete Step 1 (Navigate to the Dataset Section) and Step 2 (Create a New Dataset) from the Create Dataset guide.

Step 2: Choose Parquet as the Data Source

  1. Select Parquet Files as the data source type.


Step 3: Connection Selection

  1. Click Select Existing Files.

  2. Search for your Parquet file (for example, retail_sales.parquet).

  3. Select the file from the list. AIV automatically generates a query for it.


  4. Open the Output Columns tab to review column names and data types.


  5. Open the Preview Results tab to verify the data.

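
To cross-check the Output Columns tab against your source file, you can read the header of the original CSV locally and compare it with the column names AIV shows. This is a hypothetical helper, not part of AIV: csv_columns and compare_columns are illustrative names, and the names shown in AIV are assumed to be copied in by hand. Differences are expected if columns were renamed or removed while building the initial dataset.

```python
import csv


def csv_columns(path):
    """Return the header row of a CSV file as a list of column names."""
    # utf-8-sig also strips the byte-order mark that some spreadsheet
    # tools write at the start of exported CSV files.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def compare_columns(expected, shown):
    """List columns missing from, or unexpected in, the names shown in AIV."""
    return {
        "missing": [name for name in expected if name not in shown],
        "unexpected": [name for name in shown if name not in expected],
    }
```

For example, compare_columns(csv_columns("retail_sales.csv"), shown), where shown is the list of names from the Output Columns tab, returns empty "missing" and "unexpected" lists when the two match.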

Step 4: Save Dataset

  1. Click Save.
  2. In the Save Dataset dialog, optionally rename the dataset and select the destination folder.
  3. Click Save Dataset to confirm.

Step 5: View Your Dataset

  1. Go to the Dataset Grid view.
  2. Search for your dataset by name.
  3. Your dataset is now available for use in dashboards, reports, and visualizations.

With Parameters (Dynamic Filtering)

Use this method when you want users to filter Parquet data at runtime, for example by store location, product category, or date range. The steps below build on the standard workflow above.

Step 1: Select Parquet as the Data Source

Follow Step 2: Choose Parquet as the Data Source above. Select Parquet Files as the data source type.

Step 2: Connection Selection

Follow Step 3: Connection Selection above. Click Select Existing Files, search for your Parquet file (for example, retail_sales.parquet), select it, and enable SQL Query.

Step 3: Add Parameters to the Query

  1. Update the SQL query to include a parameter placeholder:

    SELECT * FROM retail_sales WHERE ProductCategory IN ({{ProductCategory}})


  2. Click Preview Results.

  3. When prompted, enter a test value (for example, Home) for the parameter.

  4. Click Submit to view the filtered data.

  5. Verify the output in Preview Results.

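
This guide does not describe how AIV substitutes the value you enter for {{ProductCategory}}. As a hypothetical illustration of what an IN (...) placeholder has to handle, the Python sketch below expands a placeholder into a comma-separated list of single-quoted literals. The helper names, the quoting rule (doubling embedded single quotes), and the extra value Toys are assumptions, not AIV behavior. In your own application code, prefer your database driver's bind parameters over building SQL strings by hand.

```python
import re

# Matches placeholders written as {{Name}}, as in the query above.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def quote_literal(value):
    """Render a value as a single-quoted SQL string, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def expand_parameters(query, params):
    """Replace each {{Name}} with a comma-separated list of quoted values.

    Hypothetical illustration only; AIV's real substitution rules may differ.
    Raises KeyError for a placeholder with no supplied value and ValueError
    for an empty list, which would otherwise produce invalid SQL: IN ().
    """

    def substitute(match):
        values = params[match.group(1)]
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-item list
        if not values:
            raise ValueError(f"no values supplied for {match.group(1)}")
        return ", ".join(quote_literal(v) for v in values)

    return PLACEHOLDER.sub(substitute, query)


query = "SELECT * FROM retail_sales WHERE ProductCategory IN ({{ProductCategory}})"
print(expand_parameters(query, {"ProductCategory": ["Home", "Toys"]}))
# prints: SELECT * FROM retail_sales WHERE ProductCategory IN ('Home', 'Toys')
```

Passing a single string, as when you enter Home as the test value, yields IN ('Home').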

Step 4: Save Dataset

Same as Step 4: Save Dataset above. Click Save, optionally rename and select the folder in the dialog, then click Save Dataset to confirm.

Step 5: View Your Dataset

Same as Step 5: View Your Dataset above. Go to the Dataset Grid view and search for your dataset. When used in dashboards or reports, users will be prompted to enter parameter values to filter the data dynamically.


Explore other ways to create datasets in AIV:

Data Source      | Guide
Drag & drop      | Create Dataset
Excel            | Dataset using Excel
CSV              | Dataset using CSV
JSON             | Dataset using JSON
NoSQL            | Dataset using NoSQL
Flat files       | Dataset using Flat files
Google BigQuery  | Dataset using Google BigQuery
Google Sheet     | Dataset using Google Sheet
External sources | Dataset using External sources
Stored Procedure | Dataset using Stored Procedure
View             | Dataset using View