2009
STAT 430: EMPIRICAL METHODS FOR COMPUTER SCIENCE RESEARCH
Notes: one file | WebCT
Syllabus: syllabus
Office hours: [Dorman] Friday 9-10am in Science II 534; [Koh] Tuesday 2-4pm in Snedecor 3205
News
- [2009-12-24] Grades posted on WebCT and with the Registrar. Graded projects, HW5, and exams will be available after Jan. 12. There is project feedback for everyone, especially those who asked for it.
- [2009-12-21] Detailed exam solutions posted, and final exam grades posted on WebCT. Sorry for the delay; I was sick for a couple of days. When I finish grading the projects and decide final grades, I will email you. You do not need to check back.
- [2009-12-15] HW5 solutions posted.
- [2009-12-10] HW5 will be accepted if turned in by Monday at 5pm. HW4 grades have been posted. I forgot to return the graded HW4 today, so if you want to see yours, please stop by office hours tomorrow or otherwise let me know.
- [2009-12-07] Tomorrow's lottery order: Kodavali, Pandit, Sridharan; Vanderplas; Abhijeet, Cho, Vammi, Walia; Sandeep, Strasburg; Mistry, Pett. Everyone should come prepared to listen and participate. This will also be the only chance to ask questions in class about the final, so prepare your questions.
- [2009-12-02] Typos corrected in 3(b) and 4(b) of HW5.
- [2009-12-01] Based on today's discussion, you need not turn in HW5; instead, your written report will also count as your HW5 grade. If you do turn in HW5, it will be business as usual. See the project information for more details about the trade-off.
- [2009-12-01] Project information, including sample presentations, writeups, and expectations. After consulting a colleague, I have decided to put a 6-page limit on the report. Don't dump all your results into the written report; pick the results that help you tell an interesting story. Scientific publication works that way.
- [2009-11-28] HW5 updated. This homework looks long, but it has very few challenges; it reads more like a tutorial. It will take some time, though, so don't wait until the last minute. If you need experience working with multiple regression (linear or otherwise), it is a good tutorial to work through as soon as possible.
- [2009-11-27] HW5 posted.
- [2009-11-21] First draft of lecture notes posted for all multiple regression lectures, plus nonlinear regression. Let me know of any typos.
- [2009-11-13] HW3 graded; WebCT updated with HW2/3 grades.
- [2009-11-10] Resource list
- [2009-11-08] HW4 posted.
- Old News
Homework
- HW5 | soln
- HW4 | soln
- dye.txt
- lh.txt
- What nonparametric test should I use for #3? Try generalizing the Kruskal-Wallis test to multiple fixed effects. In our description of Kruskal-Wallis, SSB is the sum of squares for the single effect, but in a two-way layout you have both SSA and SSB. Each of the corresponding Kruskal-Wallis statistics is asymptotically chi-squared (and they are independent), but of course you need to do two tests, one for each effect.
- HW3:
- HW2: read.txt | titanic.csv | titanice.csv
- What is a probability plot? I mean the normal probability plot we have now discussed in class. Also, I flip the x and y axes relative to lecture; it shouldn't matter which way you plot it. The question is whether the points fall on a line.
- solutions
- R code for Question 3
- R code for Question 4
- HW1 | solutions | free-standing R code for 6(d)
- Questions and Corrections
- data set 'Cars93' not found: The Cars93 dataset is in the MASS library. Load the library first: library(MASS).
- 1(e) and 1(f) should be answered using the full dataset.
- 3. Y bar is the mean of SIX (not size) five-minute counts
- 3. The limits of the probability are 1175 and 1225 for both intervals. PDF file updated.
- Free-standing R code for 6(d) demonstrates the EM algorithm. It is not required material for this course, but it may be worth your while to know it exists. The method is useful for maximizing the likelihood when there is some kind of hidden data (e.g. the sampled units come from different populations, but you don't know which unit belongs to which population; the genotype, when you observe the phenotype; whether a next-generation sequencing read contains an error; where the binding site is upstream of a gene, when you know there is a binding site; which feature(s) made a face recognizable; the photographed object, when the image is blurry; etc.)
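The 6(d) R code itself is not reproduced here. As a rough illustration of the first kind of hidden data listed above (units from two populations with unknown membership), here is a minimal Python sketch of EM for a two-component normal mixture. The function names `normal_pdf` and `em_two_normals`, the starting values, and the simulated data are hypothetical choices for illustration, not the HW1 code.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (hypothetical helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, mu1, mu2, sigma1=1.0, sigma2=1.0, p=0.5, iters=200):
    """EM for the mixture p*N(mu1, sigma1^2) + (1-p)*N(mu2, sigma2^2).

    The hidden data are the population labels of the observations.
    Hypothetical helper; starting values are supplied by the caller.
    """
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each point came from population 1
        w = []
        for x in data:
            a = p * normal_pdf(x, mu1, sigma1)
            b = (1.0 - p) * normal_pdf(x, mu2, sigma2)
            w.append(a / (a + b))
        # M-step: weighted maximum likelihood estimates given the expected labels
        s1 = sum(w)
        s2 = n - s1
        p = s1 / n
        mu1 = sum(wi * x for wi, x in zip(w, data)) / s1
        mu2 = sum((1.0 - wi) * x for wi, x in zip(w, data)) / s2
        sigma1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, data)) / s1)
        sigma2 = math.sqrt(sum((1.0 - wi) * (x - mu2) ** 2 for wi, x in zip(w, data)) / s2)
    return p, mu1, mu2, sigma1, sigma2


# Simulated data: 300 units from N(0, 1) and 200 from N(5, 1), labels discarded
random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(300)] + \
         [random.gauss(5.0, 1.0) for _ in range(200)]
p_hat, mu1_hat, mu2_hat, s1_hat, s2_hat = em_two_normals(sample, mu1=-1.0, mu2=6.0)
print(round(p_hat, 2), round(mu1_hat, 2), round(mu2_hat, 2))
```

Each iteration alternates an E-step (expected labels given the current parameters) with an M-step (weighted maximum likelihood estimates given those expected labels); the likelihood never decreases from one iteration to the next.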
Lectures
- Markov chains: outline [updated 2009-12-03 after errors were discovered in lecture]; it covers more than we'll manage this week
- Linear Regression outline [draft updated 2009-11-21, covers all lectures]
- simple linear regression: least squares estimators, maximum likelihood estimators, sampling distributions, hypothesis testing, confidence intervals
- Manual Fitting Tool
- matrix notation, random vectors, expectation/variance of linear transformations of random vectors, quadratic form and its expectation
- multiple linear regression: least squares estimation, expectation and variance of estimators, inference | another's take
- logistic regression, poisson regression, nonlinear regression
- ANOVA outline
- one-way anova: model formulation, assumptions, parameter estimation, hypothesis testing, confidence intervals, Tukey method, Kruskal-Wallis nonparametric test
- two-way anova: model formulation, assumptions, parameter estimation, hypothesis testing, confidence intervals, randomized block design, Friedman nonparametric test
- [2009-10-20] outline
- chi-square test of homogeneity and independence
- McNemar's test
- definition and usefulness of odds ratio
- [2009-10-15] guest lecture slides
- [2009-10-08] outline
- Wilcoxon Signed Rank Test
- discussion of experimental design: placebo effect & control; selection bias and randomization; blinding; confounding variables; multiple tests
- introduction to categorical data
- Fisher's exact test
- [2009-10-06] outline
- power calculations for two sample tests of difference in population mean
- Wilcoxon Rank Sum Test
- paired two sample t test
- [2009-10-01] outline
- two-sample t-tests
- heat of fusion example
- normal probability plot, including the uniform distribution result (if X ~ F(x) with F continuous, then F(X) ~ Unif(0,1)), which is useful for random number generation; see HW2
- "of mice and iron" example: R code for the Fe analysis, including the idea of a log transformation
- [2009-09-29] outline
- t-distribution
- all-encompassing theorem on distributions useful for sample means and sample variances
- one-sample analyses: confidence intervals, hypothesis testing (one-sample Z or t test), power calculations (Z test only)
- [2009-09-15 - 2009-09-24] outline (mostly covers the same material as the 2007 notes)
- numerical and graphical summaries of data
- sampling distribution
- central limit theorem
- parameter estimation via maximum likelihood
- confidence intervals
- hypothesis testing: means, proportions, differences, and goodness-of-fit
- [2009-08-25] slides | outline | Notes from Stat341 (slides and handout are the same document, different formats)
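The 2009-10-01 lecture items pair the normal probability plot with the result that F(X) ~ Unif(0,1) when X ~ F(x) and F is continuous. Here is a minimal Python sketch of both ideas under stated assumptions: the exponential distribution as the inverse-transform target, plotting positions (i - 0.5)/n (one common convention; R's `ppoints` uses this for n > 10), and the correlation of the plotted points as a numeric stand-in for "do the points fall on a line". The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_exponential(rate, n, rng):
    """Inverse-transform sampling (hypothetical helper): if U ~ Unif(0,1),
    then F^{-1}(U) ~ F. For the exponential, F(x) = 1 - exp(-rate*x),
    so F^{-1}(u) = -log(1 - u)/rate."""
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def normal_plot_points(data):
    """(standard normal quantile, sorted observation) pairs for a normal
    probability plot, using plotting positions (i - 0.5)/n."""
    n = len(data)
    z = NormalDist()  # standard normal
    qs = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(qs, sorted(data)))


def pearson_r(pairs):
    """Pearson correlation of the plotted points; near 1 means nearly a line."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2009)

# Random number generation: exponential(rate=2) draws should have mean near 1/2
expo = sample_exponential(rate=2.0, n=1000, rng=rng)
mean_expo = sum(expo) / len(expo)

# Probability integral transform on normal data: F(X) should look Unif(0,1)
normal_data = [rng.gauss(10.0, 3.0) for _ in range(1000)]
F = NormalDist(mu=10.0, sigma=3.0)
pit = [F.cdf(x) for x in normal_data]
pit_mean = sum(pit) / len(pit)

# Normal probability plots: straight for normal data, bent for skewed data
r_norm = pearson_r(normal_plot_points(normal_data))
r_expo = pearson_r(normal_plot_points(expo))
print(mean_expo, pit_mean, r_norm, r_expo)
```

Swapping the axes, as in the homework plots, leaves the correlation unchanged, which matches the point that it shouldn't matter which way you plot it.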
Exams
Links