Dataset using Parquet
This guide explains how to work with Parquet-based datasets in the AIV application. The steps below show how to create a dataset from CSV, JSON, or Excel files, schedule it to produce a Parquet file as output, and then use that Parquet file as a data source for a new dataset.
1. Creating a Dataset from CSV/JSON/Excel Files.
- Navigate to the Datasets section in the AIV application.
- Click the Create Dataset button.
- Select the desired file format (CSV, JSON, or Excel).
- Click Submit to create the dataset.
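Whichever file format is chosen, the result is the same kind of tabular data: rows of named columns. The following is a minimal sketch of that idea outside AIV, using only the Python standard library. The function name `load_rows` and the sample data are hypothetical and do not reflect how AIV reads files internally. Excel is left out because reading it requires a third-party library.

```python
import csv
import io
import json


def load_rows(text: str, file_format: str) -> list[dict]:
    """Parse CSV or JSON text into a list of row dictionaries."""
    fmt = file_format.lower()
    if fmt == "csv":
        # DictReader takes column names from the first line;
        # every value is returned as a string.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or an array of objects.
        return [data] if isinstance(data, dict) else list(data)
    raise ValueError(f"unsupported format: {file_format}")


# Hypothetical sample inputs.
csv_rows = load_rows("id,city\n1,Pune\n2,Oslo\n", "csv")
json_rows = load_rows('[{"id": 1, "city": "Pune"}]', "json")
```

Note that CSV values arrive as strings while JSON keeps its native types, which is one reason a typed format such as Parquet is convenient as a later data source.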
2. Scheduling the Dataset (Parquet Output).
- Once the dataset is created, go to the dataset’s options and select Schedule.
- In the scheduling window, choose Parquet File as the Output Type.
- Set up the schedule, including frequency and start time, as needed.
- Click the Run button to confirm.
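To make the relationship between start time and frequency concrete, the sketch below computes the first few run times a schedule would produce. It is an illustration only: the frequency names (`hourly`, `daily`, `weekly`) and the function `next_runs` are assumptions, and the options offered in the AIV scheduling window may differ.

```python
from datetime import datetime, timedelta

# Hypothetical frequency names; the choices available in AIV may differ.
FREQUENCIES = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_runs(start: datetime, frequency: str, count: int) -> list[datetime]:
    """Return the first `count` run times, beginning at `start`."""
    step = FREQUENCIES[frequency]
    return [start + i * step for i in range(count)]


# A daily schedule starting at 06:00 on 1 January 2024.
runs = next_runs(datetime(2024, 1, 1, 6, 0), "daily", 3)
for run in runs:
    print(run.isoformat())
```

Each run would regenerate the Parquet output, so the frequency determines how fresh the data is when the file is later used as a source.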
3. Creating a Dataset from the Generated Parquet File.
- After the Parquet file has been generated, navigate to the Datasets section and select Create New Dataset.
- Choose Parquet as the data source.
- You may add a query to filter or modify the data, as shown in the figure below.
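As an illustration of what a filtering and modifying query might look like, the sketch below runs a SQL statement against an in-memory SQLite table that stands in for the Parquet data. This is an assumption made for the sake of a runnable example: the guide does not specify AIV's query dialect, and the table name `sales`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for the contents of the generated Parquet file.
rows = [(1, "Pune", 120), (2, "Oslo", 80), (3, "Pune", 45)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# WHERE filters the rows; UPPER(...) modifies a column in the output.
query = """
    SELECT id, UPPER(city) AS city, amount
    FROM sales
    WHERE amount >= ?
    ORDER BY id
"""
result = conn.execute(query, (50,)).fetchall()
conn.close()

for row in result:
    print(row)
```

Only rows with an amount of at least 50 are returned, and the city names come back in upper case. The source data is left unchanged.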
4. Previewing and Completing the Dataset.
- After adding the query, click the Preview button to check the data output.
- Once you are satisfied with the preview, enter a name for the dataset and click the Create button to complete the process.