Dataset using Parquet

This guide explains how to work with Parquet datasets in the AIV application. The steps below show how to create a dataset from a CSV, JSON, or Excel file, schedule it so that the output is written as a Parquet file, and then use that Parquet file as a data source for a new dataset.

1. Creating a Dataset from CSV/JSON/Excel Files.

  • Navigate to the Datasets section in the AIV application.
  • Click the Create Dataset button.
  • Select the desired file format (CSV, JSON, or Excel).
  • Click Submit to create the dataset.

2. Scheduling the Dataset (Parquet Output).

  • Once the dataset is created, go to the dataset’s options and select Schedule.

  • In the scheduling window, choose Parquet File as the Output Type.

  • Set up the schedule, including frequency and start time, as needed.

  • Click the Run button to confirm.

    Image
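To reason about when the Parquet file will be regenerated, it can help to see how a start time and a frequency translate into run times. The sketch below is only an illustration in plain Python: the function names `next_runs` and `next_run_after` are hypothetical, and it assumes a simple fixed-interval schedule, which may not match how the AIV scheduler behaves internally.

```python
from datetime import datetime, timedelta


def next_runs(start, every, count):
    """Return the first `count` run times of a fixed-interval schedule."""
    return [start + i * every for i in range(count)]


def next_run_after(start, every, now):
    """Return the first scheduled run time that is at or after `now`."""
    if now <= start:
        return start
    elapsed = now - start
    # Ceiling division on timedeltas: number of whole intervals needed to reach `now`.
    periods = -(-elapsed // every)
    return start + periods * every


if __name__ == "__main__":
    # Assumed example values: a daily schedule starting at 06:00 on 1 January 2024.
    start = datetime(2024, 1, 1, 6, 0)
    daily = timedelta(days=1)
    for run in next_runs(start, daily, 3):
        print("scheduled run:", run.isoformat())
    print("next run:", next_run_after(start, daily, datetime(2024, 1, 3, 7, 30)).isoformat())
```

A dataset built on the Parquet file only reflects the source data as of the most recent run, so the frequency should be chosen to match how often the underlying CSV, JSON, or Excel file changes.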

3. Creating a Dataset from the Generated Parquet File.

  • After the Parquet file has been generated, navigate to the Datasets section and select Create New Dataset.

  • Choose Parquet as the data source.

    Image

  • You may add a query to filter or modify the data, as shown in the figure below.

    Image
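As an illustration of the kind of query that filters and modifies data, the sketch below runs a SQL statement against a small in-memory table. Everything in it is an assumption made for the example: the table name `sales`, its columns, and the sample rows are hypothetical, and it uses SQLite from the Python standard library rather than AIV itself, so the query syntax accepted by AIV may differ.

```python
import sqlite3

# Hypothetical rows standing in for the contents of the generated Parquet file.
ROWS = [
    ("North", "Widget", 12, 2.50),
    ("South", "Widget", 3, 2.50),
    ("North", "Gadget", 7, 10.00),
    ("East", "Gadget", 0, 10.00),
]

# Filter: drop rows with no sales. Modify: add a derived revenue column.
QUERY = """
    SELECT region, product, quantity, quantity * unit_price AS revenue
    FROM sales
    WHERE quantity > 0
    ORDER BY revenue DESC
"""


def run_query(rows, query):
    """Load rows into an in-memory SQLite table named `sales` and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales ("
            "region TEXT, product TEXT, quantity INTEGER, unit_price REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(ROWS, QUERY):
        print(row)
```

The `WHERE` clause plays the role of the filter and the computed `revenue` column plays the role of the modification. The Preview step is where the equivalent result would be checked in AIV.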

4. Previewing and Completing the Dataset.

  • After adding the query, click the Preview button to check the data output.
  • Once you are satisfied with the preview, enter a name for the dataset and click the Create button to complete the process.

    Image