Initial commit

Zhongwei Li
2025-11-29 18:51:07 +08:00
commit d14bc8c36b
9 changed files with 209 additions and 0 deletions

@@ -0,0 +1,7 @@
# Assets
Bundled resources for the data-preprocessing-pipeline skill:
- [ ] example_data.csv: Example dataset to demonstrate the pipeline's functionality.
- [ ] config.yaml: Configuration file for the data preprocessing pipeline.
- [ ] data_dictionary.md: A data dictionary describing the fields in the dataset.

@@ -0,0 +1,35 @@
# example_data.csv
# This CSV file provides sample data to demonstrate the functionality of the data_preprocessing_pipeline plugin.
#
# Column Descriptions:
# - ID: Unique identifier for each record.
# - Feature1: Numerical feature with some missing values.
# - Feature2: Categorical feature with multiple categories and potential typos.
# - Feature3: Date feature in string format.
# - Target: Binary target variable (0 or 1).
#
# Placeholders:
# - [MISSING_VALUE]: Represents a missing value to be handled by the pipeline.
# - [TYPO_CATEGORY]: Represents a typo in a categorical value.
#
# Instructions:
# - Feel free to modify this data to test different preprocessing scenarios.
# - Ensure the data adheres to the expected format for each column.
# - Use the `/preprocess` command to trigger the preprocessing pipeline on this data.
ID,Feature1,Feature2,Feature3,Target
1,10.5,CategoryA,2023-01-15,1
2,12.0,CategoryB,2023-02-20,0
3,[MISSING_VALUE],CategoryC,2023-03-25,1
4,15.2,CategoryA,2023-04-01,0
5,9.8,CateogryB,[MISSING_VALUE],1
6,11.3,CategoryC,2023-05-10,0
7,13.7,CategoryA,2023-06-15,1
8,[MISSING_VALUE],CategoryB,2023-07-20,0
9,16.1,CategoryC,2023-08-25,1
10,10.0,CategoryA,2023-09-01,0
11,12.5,[TYPO_CATEGORY],2023-10-10,1
12,14.9,CategoryB,2023-11-15,0
13,11.8,CategoryC,2023-12-20,1
14,13.2,CategoryA,2024-01-25,0
15,9.5,CategoryB,2024-02-01,1
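
The header comments above describe three cleanup cases: `[MISSING_VALUE]` placeholders, misspelled categories such as `CateogryB`, and dates stored as strings. This commit does not show how the pipeline itself handles them, so the following is only a minimal standard-library sketch of one possible approach. The names `KNOWN_CATEGORIES`, `load_rows`, `clean_category`, and `clean_row` are hypothetical, as are the 0.8 similarity cutoff and the choice to map `[TYPO_CATEGORY]` to `None`.

```python
import csv
import difflib
from datetime import datetime

MISSING = "[MISSING_VALUE]"
TYPO = "[TYPO_CATEGORY]"
# Assumed vocabulary, inferred from the sample rows (not defined by the source).
KNOWN_CATEGORIES = ["CategoryA", "CategoryB", "CategoryC"]

# A short excerpt of example_data.csv, including its '#' comment header style.
SAMPLE = """\
# example_data.csv (excerpt)
ID,Feature1,Feature2,Feature3,Target
1,10.5,CategoryA,2023-01-15,1
3,[MISSING_VALUE],CategoryC,2023-03-25,1
5,9.8,CateogryB,[MISSING_VALUE],1
11,12.5,[TYPO_CATEGORY],2023-10-10,1
"""


def load_rows(text):
    """Drop blank and '#' comment lines, then parse the rest as CSV."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(lines))


def clean_category(value):
    """Map a raw category to the closest known one, or None if unrecoverable."""
    if value in (MISSING, TYPO):
        return None
    # Assumption: a similarity cutoff of 0.8 is strict enough for this vocabulary.
    match = difflib.get_close_matches(value, KNOWN_CATEGORIES, n=1, cutoff=0.8)
    return match[0] if match else None


def clean_row(row):
    """Convert one raw row to typed values, using None for missing entries."""
    feature1 = None if row["Feature1"] == MISSING else float(row["Feature1"])
    feature3 = (
        None
        if row["Feature3"] == MISSING
        else datetime.strptime(row["Feature3"], "%Y-%m-%d").date()
    )
    return {
        "ID": int(row["ID"]),
        "Feature1": feature1,
        "Feature2": clean_category(row["Feature2"]),
        "Feature3": feature3,
        "Target": int(row["Target"]),
    }


rows = [clean_row(r) for r in load_rows(SAMPLE)]
for r in rows:
    print(r)
```

Keeping missing entries as `None` rather than imputing them here leaves the imputation strategy (mean, median, row removal) to a later step, which the source does not specify.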