De Veaux Map

From Sean_Carver

Latest revision as of 01:30, 21 November 2018

Contents

Part I: Exploring and Understanding Data

Chapter 1: Stats Starts Here

  • 1.1: What is Statistics?
  • 1.2: Data
  • 1.3: Variables
Types of variables: Quantitative, identifier, ordinal, categorical (categorical & nominal considered synonyms)

Chapter 2: Displaying and Describing Categorical Data

  • 2.1: Summarizing and Displaying a Single Categorical Variable
The area principle
Frequency tables
Bar charts
Pie charts
  • 2.2: Exploring the Relationship Between Two Categorical Variables
Contingency tables
Conditional distributions
Independence
Plotting conditional distributions (with pie charts, bar charts and segmented bar charts)

Chapter 3: Displaying and Summarizing Quantitative Data

  • 3.1: Displaying Quantitative Variables
Histograms
Stem and leaf displays
Dotplots
  • 3.2: Shape
Unimodal, bimodal or multimodal
Symmetric or skewed
Outliers
  • 3.3: Center
Median
  • 3.4: Spread
Range, min, max
Interquartile range, Q1, Q3
  • 3.5: Boxplots and 5-Number Summaries
  • 3.6: The Center of a Symmetric Distribution: The Mean
Mean or Median?
  • 3.7: The Spread of a Symmetric Distribution: The Standard Deviation
Formulas for variance and standard deviation
Thinking about variation
  • 3.8: Summary---What to Tell About a Quantitative Variable

Chapter 4: Understanding and Comparing Distributions

  • 4.1: Comparing Groups with Histograms
  • 4.2: Comparing Groups with Boxplots
  • 4.3: Outliers
  • 4.4: Timeplots
  • 4.5: Re-Expressing Data: A First Look
...To improve symmetry
...To equalize spread across groups

Chapter 5: The Standard Deviation as a Ruler and the Normal Model

  • 5.1: Standardizing with z-Scores
  • 5.2: Shifting and Scaling
Shifting to adjust the center
Rescaling to adjust the scale
Shifting, scaling and z-Scores
  • 5.3: Normal Models
The "nearly normal condition"
The 68-95-99.7 Rule
Working with pictures of the Normal curve
Inflection points at mean +/- one standard deviation
Interpretation of area under Normal curve as proportion of observations in interval (implied by pictures and exposition)
  • 5.4: Finding Normal Percentiles
Normal percentiles
Other models
From percentiles to scores: z in reverse
  • 5.5: Normal Probability Plots
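
The ideas in 5.1 through 5.4 (z-scores, the 68-95-99.7 Rule, and "z in reverse") can be checked numerically. The following is a minimal sketch using Python's standard library; the function names are hypothetical and chosen only for illustration, and nothing here is taken from the textbook itself.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """Standardize a value: its distance from the mean in standard deviations."""
    return (value - mean) / sd


def proportion_within(k):
    """Area under the standard Normal curve within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


def score_at_percentile(p, mean, sd):
    """z in reverse: the value below which a proportion p of the Normal model lies."""
    return mean + NormalDist().inv_cdf(p) * sd


if __name__ == "__main__":
    # Prints the areas behind the 68-95-99.7 Rule.
    for k in (1, 2, 3):
        print(k, round(proportion_within(k), 4))
```

The areas within 1, 2, and 3 standard deviations come out close to 68%, 95%, and 99.7%, which is where the rule gets its name.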

Part II: Exploring Relationships Between Variables

Chapter 6: Scatterplots, Association, and Correlation

  • 6.1: Scatterplots
Direction (negative or positive)
Form
Strength
Outliers
Explanatory and response variables
  • 6.2: Correlation
Formula
Assumptions and conditions for correlation, including...
"Quantitative variables condition,"
"Straight enough condition,"
"No outliers condition"
Correlation Properties
How strong is strong?
Measuring trend: Kendall's tau
Nonparametric association: Spearman's Rho
  • 6.3: Warning: Correlation Does Not Equal Causation
  • 6.4: Straightening Scatterplots

Chapter 7: Linear Regression

  • 7.1: Least Squares: The Line of "Best Fit"
The linear model
Predicted values and residuals
The least squares line and the sense in which it is the best fit
  • 7.2: The Linear Model
Using the linear model to make predictions
  • 7.3: Finding the Least Squares Line
Formulas for slope and intercept
  • 7.4: Regression to the Mean
Etymology of the word "Regression"
Math Box: Derivation of regression formula
  • 7.5: Examining the Residuals
Formula for residuals
Appropriate (lack of) form of Residuals versus x-Values plot
The residual standard deviation
  • 7.6: R^2---The Variation Accounted For by the Model
How big should R^2 be?
Predicting in the other direction---A tale of two regressions
  • 7.7: Regression Assumptions and Conditions
"Quantitative variable" condition
"Straight enough" condition
"Outlier" condition
"Does the plot thicken?" condition
Judging the conditions with the residuals-versus-predicted-values plot

Chapter 8: Regression Wisdom

  • 8.1: Examining Residuals
Getting the "bends": When the residuals aren't straight
Sifting residuals for groups
Subsetting with a categorical variable
  • 8.2: Extrapolation: Reaching Beyond the Data
Warning with extrapolation
Warning with predicting what will happen to cases in the regression if they were changed
  • 8.3: Outliers, Leverage, and Influence
  • 8.4: Lurking Variables and Causation
  • 8.5: Working with Summary Values

Chapter 9: Re-expressing Data: Get It Straight!

  • 9.1: Straightening Scatterplots -- The Four Goals
Goal 1: Make the distribution of a variable more symmetric.
Goal 2: Make the spread of several groups more alike, even if their centers differ
Goal 3: Make the form of a scatterplot more nearly linear
Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end
Recognizing when a re-expression can help
  • 9.2: Finding a Good Re-Expression
Plan A: The ladder of powers
Re-expressing to straighten a scatterplot
Comparing re-expressions
Plan B: Attack of the logarithms
Multiple benefits to re-expressions
Why not just fit a curve?

Part III: Gathering Data

Chapter 10: Understanding Randomness

  • 10.1: What Is Randomness?
Meaning of the word "random"
Discussion of the process of generating random numbers
  • 10.2: Simulating by Hand
Basic terminology: Simulations, trials, components, response variable

Chapter 11: Sample Surveys

  • 11.1: The Three Big Ideas of Sampling
Idea 1: Examine a part of the whole
Population versus sample
Bias
Idea 2: Randomize
Idea 3: It's the sample size
Sample size
Does a census make sense?
  • 11.2: Populations and Parameters
  • 11.3: Simple Random Samples
Sampling frame
Sampling variability
  • 11.4: Other Sampling Designs
Stratified sampling
Cluster sampling
Multistage sampling
Systematic sampling
  • 11.5: From the Population to the Sample: You Can't Always Get What You Want
  • 11.6: The Valid Survey
Know what you want to know
Tune your instrument
Ask specific rather than general questions
Ask for quantitative results when possible
Be careful in phrasing questions
Pilot studies
  • 11.7: Common Sampling Mistakes or How to Sample Badly
Mistake 1: Sample volunteers
Mistake 2: Sample conveniently
Mistake 3: Use a bad sampling frame
Mistake 4: Undercoverage
Nonresponse bias
Response bias
How to think about biases
Look for biases in any survey you encounter
Spend your time and resources reducing biases
Think about the members of the population who could have been excluded from your study
Always report your sampling methods in detail

Chapter 12: Experiments and Observational Studies

  • 12.1: Observational Studies
Observational studies
Retrospective studies
Prospective studies
  • 12.2: Randomized, Comparative Experiments
Random assignment of subjects to treatments
Explanatory variables, factors and levels
Response variables
  • 12.3: The Four Principles of Experimental Design
Principle 1: Control
Principle 2: Randomize
Principle 3: Replicate
Principle 4: Block
Diagramming experiments
Statistically significant differences between groups
Contrasting experiments and samples
  • 12.4: Control Treatments
Blinding (single and double)
Placebos
  • 12.5: Blocking
Matched participants
  • 12.6: Confounding
Lurking or confounding

Part IV: Randomness and Probability

Chapter 13: From Randomness to Probability

  • 13.1: Random Phenomena
"A random phenomenon is a situation in which we know what outcomes can possibly occur, but we don't know which particular outcome will happen"
Trials
Outcomes
Sample space
Events
The law of large numbers
Empirical probability
The nonexistent law of averages
  • 13.2: Modeling Probability
Theoretical probability
Personal probability
  • 13.3: Formal Probability
The five rules of probability
Rule 1: A probability must be a number between 0 and 1
Rule 2: Probability assignment rule: The probability of the sample space must be 1
Rule 3: The complement rule
Rule 4: The addition rule
Rule 5: The multiplication rule

Chapter 14: Probability Rules!

  • 14.1: The General Addition Rule
  • 14.2: Conditional Probability and the General Multiplication Rule
  • 14.3: Independence
  • 14.4: Picturing Probability: Tables, Venn Diagrams, and Trees
  • 14.5: Reversing the Conditioning and Bayes' Rule

Chapter 15: Random Variables

  • 15.1: Center: The Expected Value
Definition of a random variable
Discrete random variables (can "list" all the outcomes)
Continuous random variables (not discrete)
Probability models for discrete random variables
Computation of expected value for discrete random variables
  • 15.2: Spread: The Standard Deviation
Computation of variance and standard deviation for discrete random variables
  • 15.3: Shifting and Combining Random Variables
E(X +/- c)
Var(X +/- c)
E(aX)
Var(aX)
E(X +/- Y)
Var(X +/- Y), when X and Y are independent
  • [Unnumbered section, labeled optional]: Correlation and Covariance
Covariance of two random variables
Var(X +/- Y), when X and Y covary
Correlation of two random variables
  • 15.4: Continuous Random Variables
The Normal random variable as an example of a continuous random variable
Caption to Figure 15.1: Interpretation of area under Normal curve as probability of finding an observation in the interval.
How can every value have a probability 0?
Sums of independent Normal random variables are Normal.

Chapter 16: Probability Models

  • 16.1: Bernoulli Trials
  • 16.2: The Geometric Model
Independence
The 10% condition
  • 16.3: The Binomial Model
Binomial probabilities and the binomial model
Binomial coefficients
  • 16.4: Approximating the Binomial Model with a Normal Model
The success/failure condition
  • 16.5: The Continuity Correction
  • 16.6: The Poisson Model
  • 16.7: Other Continuous Random Variables: The Uniform and the Exponential
The uniform distribution
The exponential model
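
The binomial model (16.3) and the success/failure condition (16.4) lend themselves to a short numerical sketch. The function names below are hypothetical, and the threshold of 10 expected successes and failures is written as a parameter because it is an assumption of this example rather than something stated in the outline.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p): binomial coefficient times p^k q^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(npq) of the binomial model."""
    return n * p, sqrt(n * p * (1 - p))


def success_failure_ok(n, p, threshold=10):
    """Check that expected successes and failures both reach the threshold.

    The default threshold of 10 is an assumption made for this sketch.
    """
    return n * p >= threshold and n * (1 - p) >= threshold


if __name__ == "__main__":
    print(binomial_pmf(2, 4, 0.5))
    print(binomial_mean_sd(100, 0.5))
    print(success_failure_ok(20, 0.1))
```

When the condition holds, a Normal model with the mean and standard deviation returned by `binomial_mean_sd` is the usual approximation to the binomial.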

Part V: From the Data at Hand to the World at Large

Chapter 17: Sampling Distribution Models

  • 17.1: Sampling Distribution of a Proportion
Often, the Normal model fits the sampling distribution of a proportion well
Which Normal? Mean/standard deviation for Normal approximation to the sampling distribution for proportions
Sampling variability
  • 17.2: When Does the Normal Model Work Well? Assumptions and Conditions (for proportions)
The independence assumption
The randomization condition
The 10% condition
The success/failure condition
  • 17.3: The Sampling Distributions of Other Statistics
Simulating the sampling distributions of other statistics
Medians
Variances
Minimums
Simulating the sampling distribution of a mean
  • 17.4: The Central Limit Theorem: The Fundamental Theorem of Statistics
Statement of theorem
Assumptions and conditions
But which Normal: Mean and standard deviation for sampling distributions for means
  • 17.5: Sampling Distributions: A Summary
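
The claim in 17.1 that sample proportions vary around the true proportion p with standard deviation sqrt(pq/n) can be explored by simulation. This is a minimal sketch under assumed values (p = 0.3, n = 100, 2000 simulated samples); the function name and the numbers are illustrative only.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    p, n = 0.3, 100                    # assumed values for illustration
    props = simulate_sample_proportions(p, n, num_samples=2000, seed=1)
    print("simulated mean:", mean(props), "model mean:", p)
    print("simulated SD:  ", stdev(props), "model SD:  ", sqrt(p * (1 - p) / n))
```

A histogram of the simulated proportions should look roughly Normal when the success/failure condition is met, which is the point of 17.2.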

Chapter 18: Confidence Intervals for Proportions

  • 18.1: A Confidence Interval
The standard error
What a confidence interval says about a parameter
  • 18.2: Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
  • 18.3: Margin of Error: Certainty vs. Precision
Margin of error
How the margin of error depends upon the confidence level
Critical values
  • 18.4: Assumptions and Conditions
Independence assumption
Independence condition
Randomization condition
10% condition
Sample size assumption
Success/failure condition
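
The standard error, critical value, and margin of error from 18.1 and 18.3 combine into a one-proportion z-interval. The sketch below assumes the usual form, p-hat plus or minus z* times sqrt(p-hat * q-hat / n); the function name is hypothetical and the counts in the usage lines are made up.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) bounds of a z-interval for a proportion."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


if __name__ == "__main__":
    # Made-up data: 50 successes in 100 trials.
    print(one_proportion_z_interval(50, 100, confidence=0.95))
    print(one_proportion_z_interval(50, 100, confidence=0.99))
```

Raising the confidence level widens the interval: more certainty costs precision, which is the trade-off named in 18.3.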

Chapter 19: Testing Hypotheses About Proportions

  • 19.1: Hypotheses
The null hypothesis
The alternative hypothesis
A trial (criminal justice) as a hypothesis test
  • 19.2: P-Values
Definition of P-value
What to do with an "innocent" defendant (verdict: not guilty)
  • 19.3: The Reasoning of Hypothesis Testing
1. Hypotheses (pose hypotheses)
2. Model (verify problem satisfies conditions)
3. Mechanics (perform calculations)
4. Conclusion (interpret results)
  • 19.4: Alternative Alternatives
Two-sided alternative
One-sided alternative
  • 19.5: P-Values and Decisions: What to Tell About a Hypothesis Test
Discussion of when a p-value is small enough (no threshold yet)
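
The "Mechanics" step in 19.3, together with the one-sided and two-sided alternatives in 19.4, can be sketched for a one-proportion z-test. This assumes the standard approach of using the null value p0 in the standard deviation; the function name and the `alternative` labels are hypothetical choices for this example.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for testing H0: p = p0."""
    p_hat = successes / n
    sd_under_null = sqrt(p0 * (1 - p0) / n)   # uses p0, not p_hat
    z = (p_hat - p0) / sd_under_null
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


if __name__ == "__main__":
    # Made-up data: 60 successes in 100 trials, testing H0: p = 0.5.
    print(one_proportion_z_test(60, 100, 0.5, alternative="two-sided"))
    print(one_proportion_z_test(60, 100, 0.5, alternative="greater"))
```

For a symmetric model like this one, the two-sided P-value is twice the one-sided P-value in the direction of the observed difference.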

Chapter 20: Inference About Means

  • 20.1: Getting Started: The Central Limit Theorem (Again)
For means, the population standard deviation is required, but the sample standard deviation is all we have
  • 20.2: Gosset's t
t-Distribution versus Normal distribution
Degrees of freedom
What did Gosset see?
A confidence interval for means
A practical sampling distribution model for means
One-sample t-interval for the mean
Assumptions and Conditions
Independence assumption (randomization condition)
Normal population assumption (nearly normal condition)
Relationship to sample size
Using Table T to find t-Values
  • 20.3: Interpreting Confidence Intervals
  • 20.4: A Hypothesis Test for the Mean
One-sample t-test for the mean
Intervals and tests (relationship)
The special case of proportions (relationship above differs)
  • 20.5: Choosing the Sample Size

Chapter 21: More About Tests and Intervals

  • 21.1: Choosing Hypotheses
  • 21.2: How to Think About P-Values
The P-value is not the probability that the null hypothesis is true
What to do with a small P-value
A small p-value does not imply a large effect
What to do with a high P-value
A big p-value does not prove the null hypothesis
  • 21.3: Alpha Levels
Alpha levels and statistical significance
Where did the value 0.05 come from?
Practical vs. statistical significance
  • 21.4: Critical Values for Hypothesis Tests
Table T
A confidence interval for small samples
Confidence intervals and hypothesis tests
  • 21.5: Errors
Type I errors
Type II errors
Probabilities defined as alpha and beta
Power
Effect size
Pictures of errors
Reducing both type I and type II errors

Part VI: Accessing Associations Between Variables

Chapter 22: Comparing Groups

  • 22.1: The Standard Deviation of a Difference
The standard deviation of the difference between two proportions
  • 22.2: Assumptions and Conditions for Comparing Proportions
Independence
Independence assumption
Randomization condition
The 10% condition
Independent groups assumption
Sample Size
Success/failure condition for both groups
  • 22.3: A Confidence Interval for the Difference Between Two Proportions
The sampling distribution model for a difference between two independent proportions
A two-proportion z-interval
Two-proportion z-test
  • 22.4: The Two Sample z-Test: Testing for the Difference Between Proportions
Pooling for tests of equal proportions
  • 22.5: A Confidence Interval for the Difference Between Two Means
The standard error for the difference between two means
Two-sample t-interval
Degrees of freedom and the two sample t-distribution
Assumptions and conditions
Independence
Normal population (nearly normal condition, sample size)
A note about independent groups
  • 22.6: The Two-Sample t-Test: Testing for the Difference Between Two Means
  • [Unnumbered section, labeled optional]: Tukey's Quick Test
  • [Unnumbered section, labeled optional]: A Rank Sum Test
  • 22.7: The Pooled t-Test: Everyone into the Pool?
Details of the pooled t-test
Equal variance assumption (similar spreads condition)
Pooled t-test and confidence interval for means
Is the pool all wet? (when to use a pooled t-test)
Pooling (discussion and in more general contexts)

Chapter 23: Paired Samples and Blocks

  • 23.1: Paired Data
  • 23.2: Assumptions and Conditions
Paired data condition
Independence assumption (differences independent)
Normal population assumption
Nearly normal condition
Sample size
  • 23.3: Confidence Intervals for Matched Pairs
Paired t-interval
Effect size
  • 23.4: Blocking

Chapter 24: Comparing Counts

  • 24.1: Goodness-of-Fit Tests
  • 24.2: Chi-Square Test of Homogeneity
  • 24.3: Examining the Residuals
  • 24.4: Chi-Square Tests of Independence

Chapter 25: Inferences for Regression

Part VII: Inference When Variables are Related

Chapter 26: Analysis of Variance

Chapter 27: Multifactor Analysis of Variance

Chapter 28: Multiple Regression

Chapter 29: Multiple Regression Wisdom