**Key Points from Each Chapter**

**Chapter 1**

Chapter 1 includes an extremely basic definition of data, probability, and data gathering techniques. This chapter should be skimmed in less than 5 minutes, as it is assumed that these concepts are already understood.

**Chapter 2**

Bar charts are used to display qualitative variables; the separation between rectangles emphasizes that each bar represents a distinct category. Histograms display quantitative variables; the rectangles are drawn directly adjacent to one another (for intervals of non-zero frequency), reflecting that the variable's values are measured along a continuous scale.

A scatter plot is the best tool for visualizing the relationship between two quantitative variables.

It is important to understand the difference between Mode, Median, Trimmed mean, and Mean, as shown in Figure 2.19.

Variance = $(\text{Standard Deviation})^2$

or

Standard Deviation = $\sqrt{\text{Variance}}$
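As a quick sanity check on this relationship, a minimal sketch using Python's standard library (the data set is made up for illustration):

```python
import statistics

# Minimal sketch: variance is the square of the standard deviation.
# The data set below is made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var = statistics.pvariance(data)  # population variance
sd = statistics.pstdev(data)      # population standard deviation

print(var)  # 4.0
print(sd)   # 2.0
```

Note the use of the population versions (`pvariance`, `pstdev`); the sample versions (`variance`, `stdev`) divide by *n* - 1 instead of *n*.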

It is important to know roughly what fraction of all measurements lies within 1, 2, and 3 standard deviations [1s, 2s, 3s, respectively] of the mean.

+/- 1 s = approximately 68% of all measurements

+/- 2 s = approximately 95% of all measurements

+/- 3 s = approximately 99.7% of all measurements
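The 68/95/99.7 rule above can be checked empirically with a quick simulation. A sketch assuming normally distributed data; the mean, standard deviation, sample size, and seed are arbitrary choices:

```python
import random
import statistics

# Sketch: empirically checking the 68/95/99.7 rule on simulated normal data.
# Mean 50, standard deviation 10, sample size, and seed are arbitrary.
random.seed(0)
data = [random.gauss(50, 10) for _ in range(100_000)]

m = statistics.fmean(data)
s = statistics.pstdev(data)
for k in (1, 2, 3):
    frac = sum(abs(x - m) <= k * s for x in data) / len(data)
    print(f"within {k} standard deviation(s): {frac:.3f}")
```

The printed fractions should come out very close to .68, .95, and .997.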

Note that a **general rule of thumb** for outlier fences based on the interquartile range (IQR) is:

lower inner fence = 25th percentile - 1.5 IQR

upper inner fence = 75th percentile + 1.5 IQR

lower outer fence = 25th percentile - 3.0 IQR

upper outer fence = 75th percentile + 3.0 IQR
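A minimal sketch of these fence calculations (the data set, with 40 as a planted outlier, is made up; `statistics.quantiles` supplies the 25th and 75th percentiles):

```python
import statistics

# Sketch: computing IQR fences with the rule of thumb above.
data = [2, 4, 5, 7, 8, 9, 11, 12, 13, 15, 40]  # made-up sample; 40 is a suspect outlier

q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1

lower_inner = q1 - 1.5 * iqr
upper_inner = q3 + 1.5 * iqr
lower_outer = q1 - 3.0 * iqr
upper_outer = q3 + 3.0 * iqr

# Points outside the inner fences are flagged as potential outliers.
outliers = [x for x in data if x < lower_inner or x > upper_inner]
print(outliers)  # [40]
```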

**Chapter 3**

If the outcomes of an event are equally likely and known, then the classical interpretation of probability can be used to calculate probabilities directly (ex: the likelihood of drawing an ace from a deck of cards).

If the probability of the outcome of an event is not known, approximations of the probability improve with larger sample sizes (example: the prevalence of a certain type of genetic disease in a population, and hence the probability of someone being born with that disease, is estimated more accurately the larger the sample considered).

When mutually exclusive events are considered: P(A or B) = P(A) + P(B)

Example: The probability of a coin landing heads or tails (mutually exclusive events - a coin cannot land on its side) => P(heads or tails) = P(heads) + P(tails) = .5 + .5 = 1, or in % terms, 100%

When non-mutually exclusive events are considered: P(A or B) = P(A) + P(B) - P(A and B)

Example: What is the probability of a blond person or a boy being born to two randomly selected German parents? If we assume that the blond gene in Germany is prevalent at 40% (I made this number up), then:

P(blond or boy) = P(blond) + P(boy) - P(blond and boy) = .4 + .5 - (.4 × .5) = .4 + .5 - .2 = .7, or in % terms, 70%. (Here P(blond and boy) = P(blond)P(boy) because hair colour and sex are independent.)

One can think of the subtraction of P(blond and boy) as removing the effects of "double counting" blond boys from the sample set, since a blond boy only counts once, as they satisfy the criteria of being blond or a boy. We don't count them twice for being both.
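The addition law for non-mutually-exclusive events can also be verified by simulation. A sketch, reusing the made-up 40% blond rate from the example and assuming hair colour and sex are independent:

```python
import random

# Sketch: checking P(A or B) = P(A) + P(B) - P(A and B) by simulation,
# with the made-up 40% blond rate and 50% boy rate from the example.
# Hair colour and sex are assumed independent.
random.seed(1)
trials = 100_000
hits = 0
for _ in range(trials):
    blond = random.random() < 0.4
    boy = random.random() < 0.5
    if blond or boy:  # counted once, never twice
        hits += 1

print(hits / trials)  # should be close to .4 + .5 - .2 = .7
```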

Two of the **most important concepts** in this chapter are conditional probability and the probabilistic multiplication law:

Conditional probability involves some restriction or condition on randomness, such as the probability of some event happening *given some other conditional event*.

**Conditional probability** is defined as:

P(A|B) = $P(A \text{ and } B) / P(B)$

P(A|B) is read "the probability of A given B"

The multiplication law is simply a rewrite of the definition of conditional probability.

The **multiplication law** is defined as:

P(A and B) = P(A)P(B|A) = P(B)P(A|B)

Note that if A and B are independent events, the multiplication law reduces to:

P(A and B) = P(A)P(B)
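A quick sketch of the multiplication law using a standard card example (my own choice, not from the book): the probability of drawing two aces in a row from a shuffled deck, without replacement:

```python
from fractions import Fraction

# Sketch: the multiplication law on a deck of cards.
# P(both cards are aces) = P(first is an ace) * P(second is an ace | first was)
p_first = Fraction(4, 52)
p_second_given_first = Fraction(3, 51)  # one ace (and one card) already gone
p_both = p_first * p_second_given_first
print(p_both)  # 1/221
```

Using `Fraction` keeps the arithmetic exact; note the events are *not* independent, which is why the conditional probability 3/51 (rather than 4/52) appears in the second factor.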

**Chapter 4**

Remember that a random variable is defined as *a quantitative (numerical) result from a random experiment.*

In general, capital-letter notation (generally X, Y, or Z) denotes the random variable itself, while lower-case notation (generally x, y, or z) denotes the particular values it can take. Thus for a six-sided die, x = 1, 2, 3, 4, 5, or 6, e.g. a roll can return 1, 2, 3, etc. If I roll the die 1000 times, we might find that P(X=1) ≈ 1/6, P(X=2) ≈ 1/6, etc. Think of P(X=1) as "the probability that a roll of the die returns 1", estimated here from the relative frequency observed over the (in this case, 1000) rolls.
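The die example can be sketched as follows (the seed and the choice of 1000 rolls are arbitrary):

```python
import random
from collections import Counter

# Sketch of the die example: estimate P(X = x) from 1000 simulated rolls.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(1000)]
counts = Counter(rolls)

for x in range(1, 7):
    # Each relative frequency should land near 1/6 ≈ 0.167.
    print(f"P(X = {x}) is approximately {counts[x] / 1000:.3f}")
```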

It is important to understand the implications of Definition 4.2 (p.138). I will paraphrase the implications of its 3 properties in simple English:

1. The probability of a random variable taking any particular value ranges from 0% to 100%.

2. The probabilities of all possible values of a random variable sum to 100%.

3. The possible values of a discrete random variable are mutually exclusive: a single roll of a die yields exactly one value. A related (but distinct) fact is that successive rolls are *independent*: if I roll a 5 now, I am just as likely to roll a 5 on the next try as any other number. In other words, rolling a 5 now has no effect on future events. It's funny to watch people gamble, because many erroneously believe that current events can influence future events, e.g. that rolling a 5 now makes a 5 more likely on the next attempt. This is simply not true. This is also why many gamblers lose a lot of money.

One of the most useful concepts in statistics is that of *expected value*.

Expected value is simply a probability-weighted average of the possible values. This is best explained by an example:

What's a better deal: a 30% chance of winning $100, or a 10% chance of winning $200?

One only needs to calculate the expected value of each event:

E(option 1) = $100(.30) = $30

E(option 2) = $200(.10) = $20

The expected value of the first option is $30, while the expected value of the second option is $20. Clearly the first option is a better deal.
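The calculation above can be sketched as a small helper (the `expected_value` function and the explicit $0 "lose" outcomes are my own illustrative additions):

```python
# Sketch: expected value as a probability-weighted average of outcomes.
def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# The two deals from the example (the non-winning outcome pays $0):
deal_1 = [(100, 0.30), (0, 0.70)]
deal_2 = [(200, 0.10), (0, 0.90)]

print(expected_value(deal_1))  # 30.0
print(expected_value(deal_2))  # 20.0
```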

It is **highly recommended** to do example 4.3 from the book on your own to ensure proper understanding of expected values.

**Chapter 5**

The key points to understand here are the basic shape of a Normal distribution, and where the mean and various standard deviations are located on a particular normal curve. Figure 5.5 sums that up well.

An understanding of how to solve problems using Appendix table 3 is essential.

**Chapter 6**

Some of the key points in Ch. 6 are poorly explained in the book. The main point to understand in section 6.2 is that of *sampling distributions*, which should not be confused with the underlying properties of the population being sampled. A sampling distribution is the probability distribution of a sample statistic.

Key points on sampling distributions:

Our estimates of a population's characteristics become more accurate as the sample size increases.

This explains why the standard deviation of the sampling distribution decreases as the sample size increases. One can think of it this way: if only 2 students in a classroom take a test, the class average computed from those 2 scores is a noisy estimate of how the whole class would do (for example, one student scores 70% and the other 90%). With 1000 students, the computed average is far more stable. The variability of the *sample mean* is referred to as the standard error, and it decreases in proportion to the square root of the number of data samples. It is very important to understand the difference between the standard deviation of the population (for example, the spread of the individual class scores, which does not shrink as more students are tested) and the *standard error* of an estimate of the underlying population mean (which does).
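A simulation sketch of this point (the population mean of 80 and standard deviation of 10 are made-up values): the spread of the sample means - the standard error - shrinks roughly as 1/sqrt(n), while the population standard deviation stays fixed at 10.

```python
import random
import statistics

# Sketch: the standard error of the sample mean shrinks like 1/sqrt(n).
# Repeatedly draw samples of size n from the same population and measure
# how much the resulting sample means vary.
random.seed(0)

def se_of_mean(n, reps=2000):
    means = [statistics.fmean(random.gauss(80, 10) for _ in range(n))
             for _ in range(reps)]
    return statistics.pstdev(means)  # spread of the sample means

for n in (4, 16, 64):
    # Theory predicts 10 / sqrt(n): about 5.0, 2.5, 1.25.
    print(f"n = {n:3d}: standard error is roughly {se_of_mean(n):.2f}")
```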

Another important point to understand is that of the Central Limit Theorem for Sums and Means:

For any population (with finite mean and standard deviation), the sampling distribution of the sample mean is approximately normal if the sample size *n* is sufficiently large.

Note that, as a general rule of thumb, a skewed population requires a relatively larger sample size to reconstruct its characteristics via sampling than a symmetric population does.
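A sketch of the Central Limit Theorem in action, using an exponential population (heavily skewed; my own choice of example): as n grows, the distribution of the sample means becomes symmetric, so its mean and median converge.

```python
import random
import statistics

# Sketch: the Central Limit Theorem on a heavily skewed population.
# Exponential data is skewed, yet averages of n draws look increasingly
# symmetric (roughly normal) as n grows.
random.seed(0)

def sample_means(n, reps=5000):
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (1, 5, 30):
    means = sorted(sample_means(n))
    median = means[len(means) // 2]
    mean = statistics.fmean(means)
    # For a symmetric (normal-ish) distribution, mean and median agree.
    print(f"n = {n:2d}: mean of sample means = {mean:.2f}, median = {median:.2f}")
```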

**Chapter 7**

It is important to understand what a confidence interval means. A 90% confidence interval means that in the long run, over many samples, 90% of the confidence intervals calculated will contain the true population mean. In other words, 10% of the time the calculated intervals will NOT include the actual population mean. Note that larger sample sizes produce narrower intervals at a given confidence level, i.e. more precise estimates.

Understanding Definition 7.6 is key. For example, if you understand the equation, you can see the effect of (say) a larger standard deviation or a larger sample size on the width of a given confidence interval.

When the population standard deviation is not precisely known (as is often the case in real-world applications), the t statistic must be employed. It is important to note that as the number of samples increases, the t distribution approaches the z (standard normal) distribution.
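The long-run interpretation of a 90% confidence interval can be checked by simulation. A sketch using a known population standard deviation (so z = 1.645 applies); the population values are made up:

```python
import random
import statistics

# Sketch: what "90% confidence" means in the long run.
# Build many intervals from samples of a known population and count how
# often the interval actually contains the true mean. z = 1.645 for 90%.
random.seed(0)
TRUE_MEAN, SD, N = 100.0, 15.0, 50  # made-up population and sample size
z = 1.645

covered = 0
reps = 2000
for _ in range(reps):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.fmean(sample)
    half_width = z * SD / N ** 0.5  # known-sigma interval
    if m - half_width <= TRUE_MEAN <= m + half_width:
        covered += 1

print(f"coverage: {covered / reps:.3f}")  # should come out near 0.90
```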

**Chapter 8**

[Coming soon]

*Chapter Key Concepts to Understand*

- Bar chart
- Histogram
- Stem and leaf diagram
- Scatterplot
- Average
- Mode
- Median
- Figure 2.19
- Variance
- Standard Deviation
- Definition 2.4
- Interquartile Range
- Box and Whiskers Plot
- Definition 3.1
- Definition 3.2
- Union and intersection notation from Venn diagrams
- Definition 3.3
- Definition 3.4
- Definition 3.5
- Definition 3.6
- Definition 3.7
- Probability tree
- Random Variable
- Probability Distribution
- Definition 4.2
- Expected Value
- Calculations of variance and standard deviation using expected values
- Figure 5.5
- Use of Appendix Table 3 for Normal Distribution calculations
- The difference between an underlying population probability distribution and a sampling distribution
- The reason why the standard deviation on a sampling distribution decreases as the number of samples increase
- The reason why the expected value of a sample distribution is the same as the mean of the underlying population distribution
- Central Limit Theorem
- Determining z for a given confidence interval
- Definition 7.6

**Important Definitions**

**Confidence Interval**

Designates confidence that the true population mean lies between a calculated upper and lower value

**Data**: For our purposes, we will only consider numerical data. Other types of data (for example, anecdotal) are not systematic.

**Expected Value**: The expected value is a probability weighted average of possible values.

**IQR**: Interquartile range, or the data in a distribution that sits between the 25th and 75th percentiles.

**Mean**: The mean of a variable is the sum of the measurements taken on that variable divided by the number of measurements. Only meaningful for quantitative data. Mean is what we most commonly think of when we hear the word "average value".

**Median**: The median of a set of data is the middle value when the data is arranged from highest to lowest, e.g. it is the data point where we can say that half the values in the data set are above it, and half are below it. Mathematically, if the sample size *n* is odd, the median is the ((*n*+1)/2)th ordered value; if *n* is even, the median is the average of the (*n*/2)th and (*n*/2 + 1)th ordered values.
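A quick sketch of the odd/even rule (Python's `statistics.median` implements exactly this):

```python
import statistics

# Sketch of the odd/even median rule.
odd_data = [3, 1, 7, 5, 9]           # sorted: 1 3 5 7 9 -> middle value
even_data = [3, 1, 7, 5]             # sorted: 1 3 5 7 -> average of 3 and 5

print(statistics.median(odd_data))   # 5
print(statistics.median(even_data))  # 4.0
```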

**Mode**: The mode of a variable is the value or category with the highest frequency in the data.

**Mutual Exclusivity**: Events are mutually exclusive if they have no outcomes in common.

**Qualitative Data**: Data which merely defines categories, with no true numerical meaning; in other words, any numbers are arbitrary codes. Ex: assigning employees an office location number. A distribution of employees by office number could then be created.

**Quantitative Data**: Data that contain actual numerical measurements.

**Random Sample**: A sample that is taken in such a way that any possible sample (of a specific size) has the same probability of being selected. Therefore the outcomes (possible samples) are equally likely, and probabilities can be found by counting favorable outcomes.

**Random Variable**: A quantitative (numerical) result from a random experiment

**Sampling**: Taking a subset of data in order to gain insights into the complete data set. Sampling is required when doing a survey of the complete data set is unreasonable. Example: Survey randomly selected 1000 people about their preferred candidate for an election, and then draw conclusions about each candidates popularity on a national level.

**Standard Deviation**: Square root of the average squared deviation from the mean of a data set.

**Statistical Independence**: States that the occurrence of event A does not change the probability that event B occurs. Mathematically, A and B are statistically independent if and only if P(B|A) = P(B), or similarly, P(A|B) = P(A).

**Variable**: Represents a possible numerical value

**Variance**: Average squared deviation from the mean of a data set.