Data Analysis Practice Exam
The following questions are designed to indicate the scope and
style of questions that may be on your midterm. Warning: Do not stop your
preparations prematurely. The real exam may seem more difficult!
1. Name the two primary divisions of statistics: _____________________
2. Measurement scales using numbers can be ordinal or cardinal. How are these two different? _______________________
3. What is the difference between a census and a sample? __________________
4. What makes a “random sample” random? _______________________
5. The essence of statistical logic is ___________________________
The next two questions are based on the following situation. A news report says that a national opinion poll of 1500 randomly selected adults in the United States found that 43 percent thought they would be worse off during the next year. The news report went on to say that the margin of error in the poll result is plus or minus 3 percentage points with 95 percent confidence.
6. If the poll had interviewed 1000 persons rather than 1500 (and still
found 43% believing they would be worse off), the margin of error for 95%
confidence would be:
A. less than plus or minus 3 percentage points
B. equal to plus or minus 3 percentage points
C. greater than plus or minus 3 percentage points
D. any of the above -- the margin of error is random
7. If the poll had obtained the outcome 43 percent by a similar random
sampling method from all adults in New York State (population 18 million)
instead of from all adults in the U.S. (population 249 million), the margin of
error for 95 percent confidence would be:
A. less than plus or minus 3 percentage points
B. equal to plus or minus 3 percentage points
C. greater than plus or minus 3 percentage points
D. any of the above -- the margin of error is random
8. According to the Central Limit Theorem, what distribution must the population have before we can assume that the means of large samples selected from that population would follow a normal distribution? ________________
9. A large percentage of people in the general population choose not to respond to a survey about abortion because they simply abhor it. From a statistical point of view, it would not be valid to draw conclusions from the survey about the general public because the results would contain what sort of bias?
10. In hypothesis testing, the worst error is usually referred to as the:
A. Type I error
B. Type II error
C. Type III error
D. null hypothesis
E. none of these
11. Whenever possible, the null hypothesis should be set up so as to:
A. minimize sample error
B. avoid making a Type I error
C. avoid making a Type II error
D. minimize sample size
E. none of these
12. The mean of the sample is used to estimate the mean of the population and the standard deviation of the sample is used to estimate the standard deviation of the population. The standard error of the sample is used to:
A. estimate the range of the population
B. estimate the variance of the population
C. estimate the "non-response bias" in the sample
D. estimate the skewness of the population
E. none of these
13. Review the tables below, then answer the questions.
A recent newspaper headline reported that death rates of patients undergoing operations were higher in public hospitals than in private hospitals, namely 3 percent versus 2 percent, respectively. A statistician was hired to search for lurking variables. The first thing she did was to separate out the death rates for patients who were in good condition healthwise when they were admitted from those who were in poor condition. Do these figures suggest a "Simpson's Paradox" situation? Why or why not? Explain fully.
| | Good Condition, Public | Good Condition, Private | Poor Condition, Public | Poor Condition, Private |
|---|---|---|---|---|
| Died | 6 | 8 | 57 | 8 |
| Survived | 594 | 492 | 1443 | 192 |
14. Bill is a politician who wants very much to be reelected. His ratings are going down, however, and he is worried. His support is eroding quickly, and his campaign director believes that unless he has over 60 percent of the vote right now, he will lose by the time the election takes place in three weeks. A TV campaign could help but is very expensive, and his campaign funds are low. He plans to have a pollster estimate his current support to see how close he is to 60 percent.
a. What would be the two mistakes Bill could make regarding the TV campaign?
b. Which would be the worse mistake? Why?
c. Which of the following should be Bill's null hypothesis regarding the 60 percent?
A. Ho: P <= 60%
B. Ho: P >=60%
C. Ho: P = 60%
D. Ho: P not = 60%
E. None of these
15. Which of the following is not a legitimate Excel command:
A. =max()
B. =min()
C. =mean()
D. =stdev()
E. none of these
16. To tabulate a frequency distribution, the commands on Excel would be _____________________
17. A teacher wants to tabulate scores from a test into a frequency distribution showing the number of students who scored 90-100 percent, 80-90 percent, etc. In Excel, the intervals to be used for the frequency distribution are known as the _____________
18. Ruling out chance as the explanation for a result means it has a “low” probability of occurring due to random variation. Traditionally, “low” means ____ or less, which corresponds to _____ or more standard errors on the normal curve.
19. The value X comes from a population that has a normal distribution with a mean of 50 and a standard deviation 10. Calculate (from a normal table) the approximate probability that X picked at random would have a value :
A. 60 or more: _________
B. 75 or more: _________
C. 45.5 or more: _________
D. between 35 and 60: _______
E. 30 or less: _______
20. A vending machine company wants to know how much change students carry on average. A sample of 49 observations of the pocket change held by students had a mean of $.35 with a standard deviation of $.14.
A. Construct a 95 percent confidence interval for the mean amount of change carried.
B. The vending machine company will provide a change machine for dollar bills if the average amount of change carried is less than $.50. It feels the worse mistake would be to not provide the machine when it should. Based on the above results, should it provide the machine?
C. What if the sample mean had been $.55 instead of $.35?
Answers:
1. Descriptive and Inferential
2. Ordinal scales measure rank or order only, not magnitudes that can be added or subtracted, multiplied or divided. You cannot add first place and second place to get third place. Means and standard deviations cannot be meaningfully calculated for numbers that represent rank, although medians and modes can. Cardinal scales measure magnitude, and the usual arithmetic operations can be done on them. (The two primary types of cardinal scales are interval scales and ratio scales. Interval scales – like temperature – have no true zero point, but ratio scales – like money, distance, weight – do have a true zero point, so ratios can be meaningfully calculated.)
3. A census means 100 percent of the population is measured. A sample means only part of the population is measured. Inferential statistics is the science of inferring facts about the population from the sample.
4. The selection process. If there is any bias in the selection process, the sample may not be representative of the population.
5. Comparison between what we expect the data to show and what the data actually show.
6. C - The basic equation is Sxbar = S/sqrt(n). If n drops from 1500 to 1000, Sxbar increases, so the margin of error widens.
7. B - The size of the population being sampled has no bearing on the formula above. This is the non-intuitive, mysterious phenomenon of sampling – the distance of the sample mean from the population mean depends only on the standard deviation of the population (as estimated by S, the sample standard deviation) and n, the number of observations in the sample. Provided the population is at least ten times larger than the sample, it can be any size or shape.
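As a rough check on answers 6 and 7, the sketch below computes the 95 percent margin of error for a sample proportion, using sqrt(p(1-p)/n) as the proportion form of S/sqrt(n). The function name `margin_of_error` and the multiplier z = 1.96 are illustrative assumptions, not part of the exam, and the result is approximate, so it need not match the rounded figure quoted in the news report.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with n observations."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error

# Same 43 percent result, two different sample sizes
moe_1500 = margin_of_error(0.43, 1500)
moe_1000 = margin_of_error(0.43, 1000)

print(f"n = 1500: +/- {moe_1500:.3f}")
print(f"n = 1000: +/- {moe_1000:.3f}")
```

Population size never enters the calculation, which is why the answer to question 7 is the same whether the poll covers New York State or the whole country.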
8. Any distribution, regardless of its shape. A "large" sample is generally considered to be any randomly selected sample that has 30 or more observations (although some scientists prefer at least 125 - the bottom line is the larger the better).
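A small simulation can make answer 8 more concrete. The sketch below assumes an exponential population with mean 1 (a strongly skewed, clearly non-normal shape chosen only for illustration) and draws many samples of 30 observations. The names `sample_mean`, `means`, and `within_2se` are hypothetical helpers, not part of the exam.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 (right-skewed)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

center = statistics.fmean(means)   # should sit near the population mean of 1
spread = statistics.stdev(means)   # should sit near sigma / sqrt(n)
expected_spread = 1.0 / n ** 0.5   # population sigma is 1 for this distribution

# Share of sample means within two standard errors of the population mean
within_2se = sum(abs(m - 1.0) <= 2 * expected_spread for m in means) / len(means)

print(f"center = {center:.3f}, spread = {spread:.3f}, expected = {expected_spread:.3f}")
print(f"share within 2 standard errors = {within_2se:.3f}")
```

If the sample means behave roughly normally, the share within two standard errors should come out close to 95 percent even though the population itself is far from normal.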
9. “non-response” bias
10. A
11. B - It is a matter of logic to set up the null hypothesis so as to avoid the worst error, because you will take action based on believing that the null is true unless the data indicate otherwise. The classic example is law, where we think it is worse to convict an innocent person than to free a guilty one, so we presume everyone is innocent and will free them unless the evidence shows 'beyond a reasonable doubt' that the person is guilty.
12. E (The standard error is used to measure how far the sample mean is likely to be from the true population mean.)
13. The news report says 3% died in public hospitals vs. 2% in private. Compare these to the percentages that died in each when the lurking variable of initial patient condition is separated out:
| | Good Condition, Public | Good Condition, Private | Poor Condition, Public | Poor Condition, Private |
|---|---|---|---|---|
| Died | 6 | 8 | 57 | 8 |
| Survived | 594 | 492 | 1443 | 192 |
| Total | 600 | 500 | 1500 | 200 |
| Died as percent of total | 1% | 1.6% | 3.8% | 4% |
Note that when the patients are separated by condition, a smaller percentage of public-hospital patients died in each group (1% vs. 1.6% and 3.8% vs. 4%). Since this is a reversal of the overall comparison, Simpson's paradox is present.
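The percentages in answer 13 can be reproduced directly from the counts in the question. The sketch below is one way to do it. The dictionary layout and the helper name `death_rate` are illustrative choices, not part of the exam.

```python
# (condition, hospital) -> (died, survived), taken from the table in question 13
counts = {
    ("Good", "Public"): (6, 594),
    ("Good", "Private"): (8, 492),
    ("Poor", "Public"): (57, 1443),
    ("Poor", "Private"): (8, 192),
}

def death_rate(died, survived):
    """Deaths as a fraction of all patients in the group."""
    return died / (died + survived)

# Death rate within each condition group
by_condition = {key: death_rate(*value) for key, value in counts.items()}

# Death rate for each hospital type with the condition groups pooled
overall = {}
for hospital in ("Public", "Private"):
    died = sum(d for (_, h), (d, _s) in counts.items() if h == hospital)
    survived = sum(s for (_, h), (_d, s) in counts.items() if h == hospital)
    overall[hospital] = death_rate(died, survived)

for key, rate in by_condition.items():
    print(key, f"{rate:.1%}")
for hospital, rate in overall.items():
    print(hospital, "overall", f"{rate:.1%}")
```

The pooled rates come out higher for public hospitals even though the within-group rates are lower, because 1500 of the 2100 public patients arrived in poor condition compared with 200 of the 700 private patients.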
14a. 1. To spend the money when he should not, because he will win anyway.
2. Not to spend the money when he should.
14b. The worse mistake is not spending the money when he should, because he wants to be reelected more than he wants to save money.
14c. A. Null hypothesis: P <= 60% (In other words, he will assume he does not have the necessary support and will therefore spend the money to avoid the worst mistake. Only if the sample result convinces him that the null is not true – that is, at least two or three standard errors above 60 percent - will he reject the null and thus not spend the money.)
15. C (To get the mean, we use =average(…))
16. Tools/Data Analysis/Histogram
17. “bin range”
18. Five percent, two standard errors.
19. A. 16% (.1587) B. 0.6% (.0062) C. 67% (.6736) D. 77% (.7745) E. 2% (.0228)
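For readers without a normal table at hand, the probabilities in question 19 can be checked with Python's standard library. This is a sketch using `statistics.NormalDist` (available in Python 3.8 and later). The variable names are illustrative.

```python
from statistics import NormalDist

# Population from question 19: normal with mean 50 and standard deviation 10
X = NormalDist(mu=50, sigma=10)

p_a = 1 - X.cdf(60)          # 60 or more (z = 1.0)
p_b = 1 - X.cdf(75)          # 75 or more (z = 2.5)
p_c = 1 - X.cdf(45.5)        # 45.5 or more (z = -0.45)
p_d = X.cdf(60) - X.cdf(35)  # between 35 and 60 (z from -1.5 to 1.0)
p_e = X.cdf(30)              # 30 or less (z = -2.0)

for label, p in zip("ABCDE", (p_a, p_b, p_c, p_d, p_e)):
    print(f"{label}. {p:.4f}")
```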
20A. Standard error = Sxbar = S/sqrt(n) = .14/sqrt(49) = .02. 95 percent confidence corresponds to the sample mean plus or minus 1.96 standard errors. Round this off to 2 standard errors and you get $.35 plus or minus $.04, or $.31 - $.39. We can be 95 percent confident that the true mean falls within this interval.
20B. To avoid the worse mistake, the company wants to believe that the average is less than $.50 and will install the machine unless the sample result is at least two standard errors ABOVE $.50. Its null is Ho: mean is less than $.50. Since this sample mean is already below $.50, they will install the machine.
20C. If we assume the company uses the traditional five percent rule, the sample result must be at least two standard errors above $.50 to prevent them from installing the change machine. ($.55 - $.50)/.02 = 2.5 standard errors, so the company would not install the machine. (But if they use a tighter standard, say 3 chances out of 1,000, then the sample mean would have to be at least three standard errors above $.50, which it is not. Therefore, the tighter standard would tell them to accept the null and install the machine.)
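The arithmetic in answers 20A through 20C can be sketched as follows. The function names and the `cutoff` parameter are hypothetical, and the sketch follows the answer key's convention of rounding 1.96 to 2 standard errors.

```python
import math

n = 49            # sample size from question 20
s = 0.14          # sample standard deviation in dollars
threshold = 0.50  # company installs the machine if the mean is below this

se = s / math.sqrt(n)  # standard error of the mean, about 0.02

def confidence_interval(sample_mean, z=2.0):
    """Sample mean plus or minus z standard errors (z = 2 rounds 1.96)."""
    return sample_mean - z * se, sample_mean + z * se

def standard_errors_above(sample_mean):
    """How many standard errors the sample mean lies above the threshold."""
    return (sample_mean - threshold) / se

def install_machine(sample_mean, cutoff=2.0):
    """Keep the null (install) unless the mean is at least `cutoff` SEs above $.50."""
    return standard_errors_above(sample_mean) < cutoff

low, high = confidence_interval(0.35)
print(f"20A: ${low:.2f} to ${high:.2f}")
print("20B install:", install_machine(0.35))
print("20C standard errors above $.50:", round(standard_errors_above(0.55), 2))
print("20C install with 2 SE rule:", install_machine(0.55, cutoff=2.0))
print("20C install with 3 SE rule:", install_machine(0.55, cutoff=3.0))
```

Treating the decision as a comparison against a cutoff makes it easy to see how the same sample mean of $.55 leads to opposite actions under the two-standard-error and three-standard-error rules.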