Create Parquet Dataset

📘 Guide: Creating a Dataset with Multiple Parquet Files

Parquet is a popular format for storing large, structured data efficiently. Often, data is split across multiple Parquet files (e.g., Sales mobile Data in one file and Mobile Details in another).

By joining these files, you can create a single, meaningful dataset that brings all the information together.

This guide explains how to create a dataset using multiple Parquet files and run queries to join them.


🔹 Step 1: Open the Dataset Section

  1. Click the Hamburger Menu (≡).
    📸 Insert Screenshot - Hamburger Menu
  2. Expand Master Data.
    📸 Insert Screenshot - Master Data Menu
  3. Select Datasets.
    📸 Insert Screenshot - Datasets Option
  4. You’ll be redirected to the Dataset Page.
  5. At the bottom, click Create Dataset.
    📸 Insert Screenshot - Create Dataset Button
    👉 The dataset creation box will appear.

🔹 Step 2: Select Parquet Files

  1. On the left-hand side, go to the Datasource Section.
    📸 Insert Screenshot - Datasource Section
  2. Select Parquet Files.
  3. In the middle panel, you’ll see a list of folders and .parquet files.
    📸 Insert Screenshot - Parquet File List
  4. Select the files you want to use.

👉 Example:

  • Mobile Details.parquet
  • Sales mobile Data.parquet

🔹 Step 3: Write Your Query

  1. After selecting the files, a Query Box will appear on the right.
    📸 Insert Screenshot - Query Box
  2. Write your SQL query to join and fetch data.

👉 Example (INNER JOIN):

SELECT 
    sm.Phone, 
    md."Price ($)",
    md.Status, 
    sm.Manager, 
    sm.Month, 
    sm.Stage, 
    sm.Deal_Status, 
    sm.Deal_Size
FROM "Sales mobile Data" AS sm
INNER JOIN "Mobile Details" AS md
    ON sm.Phone = md.Phone

📌 What this output shows:

  • The Price ($) and Status from Mobile Details
  • The Phone name, Manager, Month, Stage, Deal Status, and Deal Size from Sales mobile Data

👉 In short: This query gives you a combined dataset of sales and mobile details, showing only the phones that exist in both Parquet files.
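
To see what the INNER JOIN does outside the product, here is a minimal Python sketch that runs the same query with the standard-library sqlite3 module. The two in-memory tables are stand-ins for the Parquet files, and every row in them is made-up sample data. The table names are double-quoted because SQLite requires that for names containing spaces; whether the Query Box needs the same quoting is an assumption.

```python
import sqlite3

# In-memory stand-ins for the two Parquet files; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Sales mobile Data" (
    Phone TEXT, Manager TEXT, Month TEXT, Stage TEXT,
    Deal_Status TEXT, Deal_Size INTEGER
);
CREATE TABLE "Mobile Details" (Phone TEXT, "Price ($)" REAL, Status TEXT);

INSERT INTO "Sales mobile Data" VALUES
    ('Phone A', 'Manager 1', 'Jan', 'Closed', 'Won', 10),
    ('Phone B', 'Manager 2', 'Feb', 'Open', 'Pending', 5),
    ('Phone C', 'Manager 1', 'Mar', 'Closed', 'Lost', 3);
INSERT INTO "Mobile Details" VALUES
    ('Phone A', 499.0, 'Available'),
    ('Phone B', 299.0, 'Discontinued'),
    ('Phone D', 199.0, 'Available');
""")

# Same column list and join condition as the example query in this guide.
query = """
SELECT sm.Phone, md."Price ($)", md.Status, sm.Manager,
       sm.Month, sm.Stage, sm.Deal_Status, sm.Deal_Size
FROM "Sales mobile Data" AS sm
INNER JOIN "Mobile Details" AS md
    ON sm.Phone = md.Phone
ORDER BY sm.Phone
"""
inner_rows = conn.execute(query).fetchall()

# Only phones present in both tables survive the INNER JOIN:
# 'Phone C' (sales only) and 'Phone D' (details only) are dropped.
for row in inner_rows:
    print(row)
```

In this sample, Phone A and Phone B appear in the result because they exist in both tables.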

📌 Tip:

  • Use INNER JOIN when you want only the records common to both files.
  • Use LEFT JOIN or RIGHT JOIN to keep every record from one file even when the other file has no match, or FULL JOIN to keep all records from both files.
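
The difference between the join types is easiest to see side by side. The sketch below again uses Python's sqlite3 module with two tiny made-up tables standing in for the Parquet files; `run_join` is a hypothetical helper written for this illustration, not part of the product.

```python
import sqlite3

# Tiny made-up tables: 'Phone C' has sales but no details row,
# 'Phone D' has a details row but no sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Sales mobile Data" (Phone TEXT, Manager TEXT);
CREATE TABLE "Mobile Details" (Phone TEXT, Status TEXT);
INSERT INTO "Sales mobile Data" VALUES
    ('Phone A', 'Manager 1'), ('Phone C', 'Manager 2');
INSERT INTO "Mobile Details" VALUES
    ('Phone A', 'Available'), ('Phone D', 'Available');
""")

def run_join(join_type):
    """Run the same two-table query with the given join keyword."""
    sql = f"""
        SELECT sm.Phone, sm.Manager, md.Status
        FROM "Sales mobile Data" AS sm
        {join_type} "Mobile Details" AS md
            ON sm.Phone = md.Phone
        ORDER BY sm.Phone
    """
    return conn.execute(sql).fetchall()

inner_rows = run_join("INNER JOIN")
left_rows = run_join("LEFT JOIN")

print(inner_rows)  # [('Phone A', 'Manager 1', 'Available')]
print(left_rows)   # [('Phone A', 'Manager 1', 'Available'), ('Phone C', 'Manager 2', None)]
```

With LEFT JOIN, Phone C is kept and its Status is empty (None) because Mobile Details has no matching row. With INNER JOIN it is dropped.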

🔹 Step 4: Define Dataset Name

  1. Enter a Dataset Name (DS Name) to identify your dataset.
    📸 Insert Screenshot - Dataset Name Field

🔹 Step 5: Configure Output & Preview

  1. Go to the Output Columns section to review your selected fields.
    📸 Insert Screenshot - Output Columns
  2. Click Preview to check the query results.
    📸 Insert Screenshot - Preview Output
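
How the product builds its preview is not described here. As a rough analogy only, the sketch below shows the same kind of check in Python's sqlite3: list the output columns of a join query and look at the first few rows before committing to it. The tables, rows, and the two-row limit are all made-up assumptions.

```python
import sqlite3

# Made-up stand-ins for the two Parquet files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Sales mobile Data" (Phone TEXT, Manager TEXT);
CREATE TABLE "Mobile Details" (Phone TEXT, "Price ($)" REAL);
INSERT INTO "Sales mobile Data" VALUES
    ('Phone A', 'Manager 1'), ('Phone B', 'Manager 2'), ('Phone C', 'Manager 1');
INSERT INTO "Mobile Details" VALUES
    ('Phone A', 499.0), ('Phone B', 299.0), ('Phone C', 399.0);
""")

# LIMIT keeps the preview small; the full dataset would omit it.
cursor = conn.execute("""
SELECT sm.Phone, md."Price ($)", sm.Manager
FROM "Sales mobile Data" AS sm
INNER JOIN "Mobile Details" AS md
    ON sm.Phone = md.Phone
ORDER BY sm.Phone
LIMIT 2
""")

# cursor.description holds one entry per output column; the name is the first item.
columns = [entry[0] for entry in cursor.description]
preview_rows = cursor.fetchall()

print("Output columns:", columns)
for row in preview_rows:
    print(row)
```

If a column is missing or a value looks wrong in a check like this, fix the query before creating the dataset.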

🔹 Step 6: Create the Dataset

  1. Once everything looks good, click Create.
    📸 Insert Screenshot - Create Button
    ✨ Your dataset, built by joining multiple Parquet files, is ready. You can now use it for analysis, reporting, or building dashboards.