# Supervised Learning Reference

## Overview

Supervised learning algorithms learn from labeled training data to make predictions on new data. Scikit-learn provides comprehensive implementations for both classification and regression tasks.
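The examples throughout this reference assume that `X_train`, `X_test`, `y_train`, and `y_test` already exist. A minimal sketch of the surrounding workflow (the synthetic dataset and the choice of estimator here are illustrative placeholders, not part of this reference):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic data stands in for a real labeled dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set to evaluate generalization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # learn from labeled training data
print(accuracy_score(y_test, model.predict(X_test)))  # evaluate on unseen data
```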
## Linear Models

### Regression

**Linear Regression (`sklearn.linear_model.LinearRegression`)**
- Ordinary least squares regression
- Fast, interpretable, no hyperparameters
- Use when: Linear relationships, interpretability matters
- Example:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

**Ridge Regression (`sklearn.linear_model.Ridge`)**
- L2 regularization to prevent overfitting
- Key parameter: `alpha` (regularization strength, default=1.0)
- Use when: Multicollinearity present, need regularization
- Example:

```python
from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
```

**Lasso (`sklearn.linear_model.Lasso`)**
- L1 regularization with feature selection
- Key parameter: `alpha` (regularization strength)
- Use when: Want sparse models, feature selection
- Can reduce some coefficients to exactly zero
- Example:

```python
from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Check which features were selected
print(f"Non-zero coefficients: {sum(model.coef_ != 0)}")
```

**ElasticNet (`sklearn.linear_model.ElasticNet`)**
- Combines L1 and L2 regularization
- Key parameters: `alpha`, `l1_ratio` (0=Ridge, 1=Lasso)
- Use when: Need both feature selection and regularization
- Example:

```python
from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_train, y_train)
```

### Classification

**Logistic Regression (`sklearn.linear_model.LogisticRegression`)**
- Binary and multiclass classification
- Key parameters: `C` (inverse of regularization strength), `penalty` ('l1', 'l2', 'elasticnet')
- Returns probability estimates
- Use when: Need probabilistic predictions, interpretability
- Example:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)
```

**Stochastic Gradient Descent (SGD)**
- `SGDClassifier`, `SGDRegressor`
- Efficient for large-scale learning
- Key parameters: `loss`, `penalty`, `alpha`, `learning_rate`
- Use when: Very large datasets (>10^4 samples)
- Example:

```python
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3)
model.fit(X_train, y_train)
```

## Support Vector Machines

**SVC (`sklearn.svm.SVC`)**
- Classification with kernel methods
- Key parameters: `C`, `kernel` ('linear', 'rbf', 'poly'), `gamma`
- Use when: Small to medium datasets, complex decision boundaries
- Note: Does not scale well to large datasets (see the `LinearSVC` sketch below)
- Example:

```python
from sklearn.svm import SVC

# Linear kernel for linearly separable data
model_linear = SVC(kernel='linear', C=1.0)

# RBF kernel for non-linear data
model_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
model_rbf.fit(X_train, y_train)
```
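Because `SVC` training scales poorly with the number of samples, one common workaround (a sketch beyond the original example, limited to linear decision boundaries) is `sklearn.svm.LinearSVC`:

```python
from sklearn.svm import LinearSVC

# LinearSVC uses a specialized linear solver that handles larger datasets
# than SVC's kernelized solver, at the cost of supporting only linear
# decision boundaries.
model = LinearSVC(C=1.0)
model.fit(X_train, y_train)
```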
**SVR (`sklearn.svm.SVR`)**
- Regression with kernel methods
- Similar parameters to SVC
- Additional parameter: `epsilon` (tube width)
- Example:

```python
from sklearn.svm import SVR

model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
```

## Decision Trees

**DecisionTreeClassifier / DecisionTreeRegressor**
- Non-parametric model learning decision rules
- Key parameters:
  - `max_depth`: Maximum tree depth (prevents overfitting)
  - `min_samples_split`: Minimum samples to split a node
  - `min_samples_leaf`: Minimum samples in a leaf
  - `criterion`: 'gini', 'entropy' for classification; 'squared_error', 'absolute_error' for regression
- Use when: Need interpretable model, non-linear relationships, mixed feature types
- Prone to overfitting; use ensembles or pruning
- Example:

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=20,
    min_samples_leaf=10,
    criterion='gini'
)
model.fit(X_train, y_train)

# Visualize the tree (feature_names and class_names are lists of strings
# describing your dataset)
plot_tree(model, feature_names=feature_names, class_names=class_names)
plt.show()
```

## Ensemble Methods

### Random Forests

**RandomForestClassifier / RandomForestRegressor**
- Ensemble of decision trees with bagging
- Key parameters:
  - `n_estimators`: Number of trees (default=100)
  - `max_depth`: Maximum tree depth
  - `max_features`: Features to consider for splits ('sqrt', 'log2', or int)
  - `min_samples_split`, `min_samples_leaf`: Control tree growth
- Use when: High accuracy needed, can afford computation
- Provides feature importance
- Example:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    max_features='sqrt',
    n_jobs=-1  # Use all CPU cores
)
model.fit(X_train, y_train)

# Feature importance
importances = model.feature_importances_
```

### Gradient Boosting

**GradientBoostingClassifier / GradientBoostingRegressor**
- Sequential ensemble building trees on residuals
- Key parameters:
  - `n_estimators`: Number of boosting stages
  - `learning_rate`: Shrinks the contribution of each tree
  - `max_depth`: Depth of individual trees (typically 3-5)
  - `subsample`: Fraction of samples used to train each tree
- Use when: Need high accuracy, can afford training time
- Often achieves best performance
- Example:

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8
)
model.fit(X_train, y_train)
```

**HistGradientBoostingClassifier / HistGradientBoostingRegressor**
- Faster gradient boosting with a histogram-based algorithm
- Native support for missing values and categorical features
- Key parameters: Similar to GradientBoosting
- Use when: Large datasets, need faster training
- Example:

```python
from sklearn.ensemble import HistGradientBoostingClassifier

model = HistGradientBoostingClassifier(
    max_iter=100,
    learning_rate=0.1,
    max_depth=None,  # No limit by default
    categorical_features='from_dtype'  # Auto-detect categorical (scikit-learn >= 1.4)
)
model.fit(X_train, y_train)
```

### Other Ensemble Methods

**AdaBoost**
- Adaptive boosting focusing on misclassified samples
- Key parameters: `n_estimators`, `learning_rate`, `estimator` (base estimator)
- Use when: Simple boosting approach needed
- Example:

```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)
```

**Voting Classifier / Regressor**
- Combines predictions from multiple models
- Types: 'hard' (majority vote) or 'soft' (average probabilities)
- Use when: Want to ensemble different model types
- Example:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC(probability=True))  # probability=True is required for soft voting
    ],
    voting='soft'
)
model.fit(X_train, y_train)
```

**Stacking Classifier / Regressor**
- Trains a meta-model on predictions from base models
- More sophisticated than voting
- Key parameter: `final_estimator` (meta-learner)
- Example:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

model = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC())
    ],
    final_estimator=LogisticRegression()
)
model.fit(X_train, y_train)
```

## K-Nearest Neighbors

**KNeighborsClassifier / KNeighborsRegressor**
- Non-parametric method based on distance
- Key parameters:
  - `n_neighbors`: Number of neighbors (default=5)
  - `weights`: 'uniform' or 'distance'
  - `metric`: Distance metric ('euclidean', 'manhattan', etc.)
- Use when: Small dataset, simple baseline needed
- Slow prediction on large datasets
- Example:

```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5, weights='distance')
model.fit(X_train, y_train)
```

## Naive Bayes

**GaussianNB, MultinomialNB, BernoulliNB**
- Probabilistic classifiers based on Bayes' theorem
- Fast training and prediction
- GaussianNB: Continuous features (assumes Gaussian distribution)
- MultinomialNB: Count features (text classification)
- BernoulliNB: Binary features
- Use when: Text classification, fast baseline, probabilistic predictions
- Example:

```python
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# For continuous features
model_gaussian = GaussianNB()

# For text/count data
model_multinomial = MultinomialNB(alpha=1.0)  # alpha is the smoothing parameter
model_multinomial.fit(X_train, y_train)
```

## Neural Networks

**MLPClassifier / MLPRegressor**
- Multi-layer perceptron (feedforward neural network)
- Key parameters:
  - `hidden_layer_sizes`: Tuple of hidden layer sizes, e.g., (100, 50)
  - `activation`: 'relu', 'tanh', 'logistic'
  - `solver`: 'adam', 'sgd', 'lbfgs'
  - `alpha`: L2 regularization parameter
  - `learning_rate`: 'constant', 'adaptive'
- Use when: Complex non-linear patterns, large datasets
- Requires feature scaling (see the pipeline sketch after the example)
- Example:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Scale features first; the same scaler must be applied to test data
# before calling predict
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    activation='relu',
    solver='adam',
    alpha=0.0001,
    max_iter=1000
)
model.fit(X_train_scaled, y_train)
```
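Since scaling must be applied identically to training and test data, a common pattern (a sketch going beyond the original example) is to bundle the scaler and the network in a pipeline so `fit` and `predict` handle scaling automatically:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# The pipeline fits the scaler on the training data only, then applies
# the same transformation at predict time, avoiding train/test leakage.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000)
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)  # X_test is scaled automatically
```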
## Algorithm Selection Guide

### Choose based on:

**Dataset size:**
- Small (<1k samples): KNN, SVM, Decision Trees
- Medium (1k-100k): Random Forest, Gradient Boosting, Linear Models
- Large (>100k): SGD, Linear Models, HistGradientBoosting

**Interpretability:**
- High: Linear Models, Decision Trees
- Medium: Random Forest (feature importance)
- Low: SVM with RBF kernel, Neural Networks

**Accuracy vs Speed:**
- Fast training: Naive Bayes, Linear Models, KNN
- High accuracy: Gradient Boosting, Random Forest, Stacking
- Fast prediction: Linear Models, Naive Bayes
- Slow prediction: KNN (on large datasets), SVM

**Feature types:**
- Continuous: Most algorithms work well
- Categorical: Trees, HistGradientBoosting (native support)
- Mixed: Trees, Gradient Boosting
- Text: Naive Bayes, Linear Models with TF-IDF

**Common starting points** (compared in the sketch below):
1. Logistic Regression (classification) / Linear Regression (regression) - fast baseline
2. Random Forest - good default choice
3. Gradient Boosting - optimize for best accuracy
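A minimal sketch of this progression, comparing the three starting points with cross-validation (the synthetic dataset is a placeholder for your own data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Evaluate each starting point with 5-fold cross-validation
for name, model in [
    ('Logistic Regression', LogisticRegression(max_iter=1000)),
    ('Random Forest', RandomForestClassifier(n_estimators=100)),
    ('Gradient Boosting', GradientBoostingClassifier()),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```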