Harvard
Computer Science
4 credits

Harvard CS 109A: Data Science 1: Introduction to Data Science

CS 109A — cross-listed as Stat 109A — is the first half of Harvard's data science sequence: data wrangling, exploratory analysis, regression, classification, and model evaluation in Python. Past course materials are published openly on the teaching team's site, giving it a large self-study audience beyond enrolled students.

Fennie is independent and not affiliated with Harvard University. This is an unofficial study guide.

What makes it hard

The course sits at the junction of programming, statistics, and judgment: homeworks demand clean pandas code, correct inference, and sensible modeling decisions all at once. Students who are strong in only one area (good coders who are weak on statistics, or the reverse) feel the gap on every assignment.

What you'll cover

  • Data wrangling with Python and pandas
  • Exploratory data analysis and visualization
  • Linear and logistic regression
  • Model selection and regularization
  • Classification and k-NN
  • Cross-validation and model evaluation

The CS 109A study guide

How to study for Harvard CS 109A, step by step.

  1. Audit both legs: Python and statistics

    CS 109A assumes CS50-level programming and Stat 100-level statistics, and homeworks punish whichever one is weaker. Identify your weak leg in week one and put deliberate practice there.

  2. Rebuild the lab notebooks from scratch

    Running provided notebooks feels like learning and isn't. After each lab, recreate the analysis in a blank notebook from the raw data — that's the skill homeworks actually grade.

  3. Narrate every modeling decision

    Why this model, why these features, why this validation split — write one sentence per choice. The graders reward reasoning, and the habit catches errors before they propagate.

  4. Keep a personal log of pandas and sklearn gotchas

    Index alignment, data leakage, fit-versus-transform — the same handful of traps cost points all semester. Recording each one once prevents paying for it twice.

  5. Pace the pipeline with Fennie

    Upload the CS 109A syllabus — or the public materials if you're self-pacing — and Fennie's Daily Plan schedules labs and homework stages to your deadlines, with quizzes on the statistical concepts generated from the actual course content. Free to start.

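Two of the traps named in step 4, index alignment and fit-versus-transform (which is one common route to data leakage), can be shown in a few lines. This is a minimal sketch on made-up toy numbers, not taken from any course assignment; the variable names are illustrative assumptions, and it assumes pandas, NumPy, and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Trap 1: index alignment.
# pandas arithmetic matches rows by index label, not by position.
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])

# Labels 0 and 3 have no partner in the other Series, so those entries are NaN.
aligned_sum = a + b

# Dropping to NumPy arrays adds element by element, ignoring labels.
positional_sum = a.to_numpy() + b.to_numpy()

# Trap 2: fit versus transform.
# Toy data (an assumption for illustration): three training rows, one test row.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[10.0]])

# Learn the mean and standard deviation from the training rows only,
# then apply those same statistics to the test row.
scaler = StandardScaler().fit(train)
test_scaled = scaler.transform(test)

# The leaky version: fitting on train and test together lets the test
# row influence the statistics it is later scaled with.
leaky_scaler = StandardScaler().fit(np.vstack([train, test]))
leaky_test_scaled = leaky_scaler.transform(test)

print(aligned_sum)
print(positional_sum)
print(scaler.mean_, leaky_scaler.mean_)
print(test_scaled, leaky_test_scaled)
```

In the leaky version the test value pulls the fitted mean toward itself, so it looks less extreme after scaling than it really is relative to the training data. An entry in a gotcha log might simply record the symptom (unexpected NaN values, or suspiciously good test scores) next to the fix.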

How Fennie helps with CS 109A

Fennie's Daily Plans pace CS 109A's lab-and-homework pipeline so the statistics review happens before the code needs it. Chat through why a model overfits or what a regularization penalty is doing, and quiz yourself on the inference concepts that separate working code from correct analysis.
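For a concrete picture of what a regularization penalty does, the sketch below fits ridge regression at three penalty strengths and tracks the size of the coefficient vector. The data is synthetic and the names (`true_coef`, `alphas`, `coef_norms`) are assumptions made for illustration; ridge is used here as one example of a penalty, not as the course's prescribed method.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption): 50 rows, 10 features,
# only the first three features actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.0]
y = X @ true_coef + rng.normal(scale=1.0, size=50)

# Fit the same model with a weak, moderate, and strong L2 penalty.
alphas = [0.01, 1.0, 100.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficients measures how "large" the fit is.
    coef_norms.append(float(np.linalg.norm(model.coef_)))

for alpha, norm in zip(alphas, coef_norms):
    print(f"alpha={alpha:>6}: coefficient norm = {norm:.3f}")
```

A larger `alpha` pulls the coefficients toward zero, so the printed norm shrinks as the penalty grows. That shrinkage is the trade the penalty makes: a little bias in exchange for a model that chases noise less.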

FAQ

Is CS 109A hard?

Its difficulty comes from breadth: every homework combines programming, statistics, and modeling judgment. Students solid in Python and intro statistics find the workload heavy but fair.

What's the difference between CS 109A and Stat 110?

Stat 110 is probability theory; CS 109A is applied data science — wrangling, regression, and machine learning practice in Python. Stat 110 is recommended background, not a substitute.

Can I self-study CS 109A online?

Yes — past offerings publish lectures, labs, and homework notebooks openly. Work the homeworks honestly rather than reading the solutions; the judgment is built by doing.

Pass CS 109A with a plan, not a cram

Upload your CS 109A materials and Fennie generates a Daily Plan paced to your deadline — plus chat, flashcards, and quizzes built from the actual course content.
