zhongwei/gh-bradleyboehmke-brads-marketplace-course-builder

Fork 0

Files

Zhongwei Li 9daed3091f Initial commit

2025-11-29 18:01:52 +08:00

8.2 KiB

Raw Blame History

Intro to Data Mining - Course Profile

Course: UC BANA 4080: Introduction to Data Mining with Python Instructor: Brad Boehmke

Audience

Student Level: Undergraduate (juniors/seniors)

Background:

Business students with foundation in calculus, statistics, and possibly regression
May have Excel experience and basic VBA exposure
Variable programming backgrounds - many are complete coding beginners
Understand business operations, customer behavior, and quantitative thinking
Know how to think critically but lack experience turning theory into practice with code

Prerequisites:

Quantitative methods and statistical inference courses
No prior programming experience required or expected
Course explicitly designed for beginners

Key Challenge: Bridging the gap between classroom theory and real-world data analysis

Learning Philosophy

Core Approach: Hands-on, immersive learning through doing

Key Principles:

Practice over theory - Students learn by working with real, messy datasets
Build confidence through action - Focus on getting students comfortable with tools before perfection
Close the theory-practice gap - Move from "knowing concepts" to "applying skills"
AI as assistant, not autopilot - Use GenAI tools (ChatGPT, Claude, Copilot) to help learn, but emphasize understanding
Collaborative learning - Build community; encourage students to help each other
Growth mindset - Normalize struggle; coding is a new language that takes time

Teaching Style:

Conversational, relatable tone (see intro chapter example)
Use storytelling and scenarios (e.g., "Taylor the intern")
Address student concerns directly (e.g., "Why learn this when AI exists?")
Set realistic expectations about difficulty
Encourage persistence and resilience

Unique Aspects:

Explicitly addresses role of GenAI in learning process
Balances AI assistance with foundational skill building
Uses real-world business contexts students can relate to

Technical Stack

Core Environment:

Python (chosen for beginner-friendliness + professional power)
Jupyter Lab/Notebooks (primary development environment)
Google Colab (cloud-based option for students)
Quarto (for textbook and slides)
Virtual environment (venv for package management)

Primary Libraries (Weeks 1-6):

pandas - data manipulation and DataFrames
numpy - numerical computation
matplotlib - basic visualization
seaborn - statistical visualization

Machine Learning Libraries (Weeks 8-13):

scikit-learn - all ML models and evaluation

Additional Tools:

CSV/Excel file handling
Basic SQL concepts (joins in pandas)
Git/GitHub for assignment submission

File Formats:

Quarto markdown (.qmd) for book chapters
Jupyter notebooks (.ipynb) for examples, labs, homework
Real datasets (CSV, Excel) in /data/ directory

Content Style

Writing Style:

Conversational and approachable - Not dry or overly academic
Student-focused - Addresses "you" directly
Motivational - Builds confidence, normalizes struggle
Practical - Always tied to real-world application
Honest - Acknowledges difficulties, doesn't sugar-coat challenges

Explanation Approach:

Start with WHY - Motivate the topic before diving in
Use analogies and stories - Make abstract concepts concrete
Show, don't just tell - Working code examples over theory
Progressive complexity - Start simple, build gradually
Address common questions - Anticipate student concerns

Examples:

Use relatable business scenarios (customer data, marketing analytics, retail transactions)
Work with messy, real-world datasets (not clean, perfect examples)
Include visual aids heavily (plots, diagrams, screenshots)
Provide executable code that students can run and modify

Pedagogical Elements:

Callout boxes for tips, warnings, reflections, and examples
Student reflection prompts to encourage metacognition
Exercises that build on chapter concepts
Code comments that explain what's happening
Error messages and debugging guidance

Depth:

Prioritize intuition over mathematical rigor
Show code implementation before heavy theory
Balance "just enough math" with practical application
Focus on interpretation and application over derivations

Key Topics

Module 1: Python Fundamentals (Week 1)

Course intro + motivation
Variables, data types, basic operators
Why Python? Why not just use AI?
Setting up environment

Module 2: Jupyter & Data Structures (Week 2)

Jupyter notebooks and reproducible workflows
Lists, dictionaries, tuples
Pandas introduction
Importing CSV data

Module 3: Data Wrangling (Week 3)

DataFrame manipulation
Filtering and subsetting
Aggregating data
GroupBy operations

Module 4: Advanced Data Manipulation (Week 4)

Working with dates and times
String operations
Relational data and joins (SQL-style in pandas)
Merging DataFrames

Module 5: Data Visualization (Week 5)

Matplotlib basics
Seaborn for statistical plots
Exploratory data analysis with visuals
Best practices for effective visualization

Module 6: Writing Efficient Code (Week 6)

Control flow (if/else, loops)
Functions and modularity
List comprehensions
Code efficiency and readability

Week 7: Midterm Project

Application of Modules 1-6
Work with messy, real datasets
Open-ended analysis problem

Module 7: Machine Learning Intro (Week 8)

What is ML and when to use it?
Train/test split
Features and labels
Model building process

Module 8: Regression (Week 9)

Correlation analysis
Linear regression with scikit-learn
Model evaluation (R², RMSE)
Interpretation

Module 9: Classification (Week 10)

Logistic regression
Classification metrics (accuracy, precision, recall, F1)
Confusion matrices
When to use classification vs regression

Module 10: Tree-Based Models (Week 11)

Decision trees
Random forests
Feature importance
Model interpretation

Module 11: Model Optimization (Week 12)

Feature engineering
Cross-validation
Hyperparameter tuning (GridSearchCV)
Model selection

Module 12: Advanced Topics (Week 13)

Unsupervised learning (clustering, PCA)
Deep learning overview
Introduction to LLMs and GenAI concepts

Week 14: Final Project

Comprehensive data science project
Full pipeline from data cleaning to modeling

Assessment Approach

Grading Components:

Labs - Weekly hands-on activities (Thursdays)
Homework - Applied assignments (with answer keys for instructor)
Midterm Project - Comprehensive application of Modules 1-6
Final Project - End-to-end data science project
Quizzes - Knowledge checks (materials in /planning/quizzes/)

Student Support:

Canvas discussion boards for peer collaboration
Office hours
Answer keys provided for labs and homework (instructor use)
Multiple formats (notebook, HTML, PDF) for accessibility

GenAI Policy:

Encouraged to use ChatGPT, Claude, Copilot as learning aids
Required to understand code, not just copy it
Emphasis on using AI to learn, not to avoid learning
Students asked to reflect on AI tool use and limitations

Project Structure:

Templates provided for major assignments
Rubrics included in /planning/projects/
Real-world datasets required
Open-ended problems that require creative problem-solving

Content Format

Textbook: Quarto book with modules 1-6 + appendices Slides: Weekly presentations using Quarto + Reveal.js Examples: Numbered sequence of Jupyter notebooks (01-17) Labs: Weekly hands-on activities with answer keys Homework: Individual assignments with solutions in multiple formats Datasets: Real-world data in /data/ directory (retail, airlines, housing, etc.)

Course Materials Repository

All materials maintained in Git repository with structure:

/book/ - Textbook chapters
/slides/ - Weekly presentations
/example-notebooks/ - Companion code examples
/labs/ - Hands-on activities
/homework/ - Assignments
/data/ - Datasets
/planning/ - Instructor resources (Canvas docs, rubrics, quizzes)

8.2 KiB Raw Blame History