Initial commit

Zhongwei Li
2025-11-29 18:51:19 +08:00
commit d595b82716
11 changed files with 557 additions and 0 deletions


@@ -0,0 +1,50 @@
# example_dataset.csv
# This CSV file provides a sample dataset for demonstrating feature engineering techniques within the feature-engineering-toolkit plugin.
#
# Column Descriptions:
# - user_id: Unique identifier for each user (integer).
# - age: Age of the user (integer).
# - gender: Gender of the user (categorical: Male, Female, Other).
# - signup_date: Date the user signed up (YYYY-MM-DD).
# - last_login: Date of the user's last login (YYYY-MM-DD).
# - total_purchases: Total number of purchases made by the user (integer).
# - avg_purchase_value: Average value of each purchase (float).
# - country: Country of the user (categorical).
# - marketing_channel: The marketing channel through which the user signed up (categorical).
# - is_active: Indicates whether the user is currently active (boolean: True, False).
# - churned: Target variable indicating whether the user churned (boolean: True, False). This is what we want to predict.
#
# Instructions:
# - Use this dataset to experiment with feature engineering techniques.
# - Consider creating new features such as:
#   - Time since signup (days between signup_date and a fixed reference date).
#   - Time since last login (days between last_login and the same reference date).
#   - Purchase frequency (total_purchases / time since signup; guard against a zero denominator).
#   - Age groups (binning the age variable).
#   - Interactions between features (e.g., age * avg_purchase_value).
# - Use feature selection techniques to identify the most important features for predicting churn.
# - Apply feature transformations (e.g., scaling, normalization, encoding categorical variables).
# - Handle missing values appropriately if you extend the data (this sample contains none).
# - The 'churned' column is the target variable. The goal is to build a model that accurately predicts churn.
user_id,age,gender,signup_date,last_login,total_purchases,avg_purchase_value,country,marketing_channel,is_active,churned
1,25,Male,2023-01-15,2024-01-10,10,25.50,USA,Facebook,True,False
2,30,Female,2023-02-20,2024-01-15,5,50.00,Canada,Google Ads,True,False
3,40,Other,2023-03-10,2023-12-20,2,100.00,UK,Email,False,True
4,22,Male,2023-04-05,2024-01-05,15,15.75,Germany,Facebook,True,False
5,35,Female,2023-05-01,2023-11-30,1,200.00,France,Referral,False,True
6,28,Male,2023-06-12,2024-01-20,8,30.20,USA,Google Ads,True,False
7,45,Female,2023-07-08,2023-10-25,3,75.00,Canada,Email,False,True
8,31,Other,2023-08-03,2024-01-01,12,20.00,UK,Facebook,True,False
9,24,Male,2023-09-18,2023-12-10,7,40.00,Germany,Referral,False,True
10,38,Female,2023-10-22,2024-01-25,6,60.50,France,Google Ads,True,False
11,29,Male,2023-11-05,2023-12-15,4,80.00,USA,Email,False,True
12,33,Female,2023-12-01,2024-01-08,9,28.00,Canada,Facebook,True,False
13,42,Other,2024-01-02,2024-01-28,11,22.50,UK,Google Ads,True,False
14,27,Male,2023-01-28,2024-01-12,13,18.00,Germany,Referral,True,False
15,36,Female,2023-02-15,2023-11-01,0,0.00,France,Email,False,True
16,23,Male,2023-03-22,2024-01-18,14,17.25,USA,Facebook,True,False
17,39,Female,2023-04-10,2023-10-10,2,90.00,Canada,Google Ads,False,True
18,41,Other,2023-05-05,2024-01-03,16,14.50,UK,Referral,True,False
19,26,Male,2023-06-01,2023-12-25,5,55.00,Germany,Email,False,True
20,34,Female,2023-07-15,2024-01-22,17,13.00,France,Facebook,True,False
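
The derived features suggested in the file's instructions can be sketched with the Python standard library alone. This is a minimal illustration, not part of the plugin: the helper names (`load_rows`, `engineer`, `age_group`), the age bin edges, and the reference date of 2024-02-01 are all assumptions chosen for the example. Three rows from the dataset are embedded inline so the block runs without a file path.

```python
import csv
from datetime import date

# Three rows copied from the dataset above, with one comment line kept
# to show how the '#' header lines are skipped.
SAMPLE = """\
# example_dataset.csv
user_id,age,gender,signup_date,last_login,total_purchases,avg_purchase_value,country,marketing_channel,is_active,churned
1,25,Male,2023-01-15,2024-01-10,10,25.50,USA,Facebook,True,False
3,40,Other,2023-03-10,2023-12-20,2,100.00,UK,Email,False,True
15,36,Female,2023-02-15,2023-11-01,0,0.00,France,Email,False,True
"""

# Assumption: a fixed snapshot date that falls after every last_login value.
REFERENCE_DATE = date(2024, 2, 1)


def age_group(age):
    """Bin age into illustrative groups (the bin edges are an assumption)."""
    if age < 30:
        return "under_30"
    if age < 40:
        return "30_to_39"
    return "40_plus"


def load_rows(text):
    """Parse CSV text into dicts, skipping lines that start with '#'."""
    data_lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    return list(csv.DictReader(data_lines))


def engineer(row, today=REFERENCE_DATE):
    """Build the derived features listed in the file's instructions."""
    signup = date.fromisoformat(row["signup_date"])
    last_login = date.fromisoformat(row["last_login"])
    age = int(row["age"])
    purchases = int(row["total_purchases"])
    avg_value = float(row["avg_purchase_value"])
    days_since_signup = (today - signup).days
    return {
        "user_id": int(row["user_id"]),
        "days_since_signup": days_since_signup,
        "days_since_last_login": (today - last_login).days,
        # Guard against division by zero for a signup on the reference date.
        "purchases_per_day": (
            purchases / days_since_signup if days_since_signup > 0 else 0.0
        ),
        "age_group": age_group(age),
        "age_x_avg_value": age * avg_value,
        # The CSV stores booleans as the strings "True" / "False".
        "churned": row["churned"] == "True",
    }


features = [engineer(r) for r in load_rows(SAMPLE)]
for f in features:
    print(f)
```

Using one fixed reference date for both elapsed-time features keeps the results reproducible; computing them against the current date would make the features drift each time the script runs.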