| Answers to written exercises |
|||||||||||||||||||||||||||||||||||||||||||||||||
| #1 Weigh the contents of 1 bag each hour, across all shifts.
#2 Three ideas among many possible: 1. Chart the results by shift over time. 2. Chart the overall average weight and the average weight per shift. 3. Make a chart of how many bags are underweight and how many are overweight.
#3 To calculate the average, add the values and divide by the number of values. The mean is 14.5 oz.
#4 Sort the values by size; pick the middle one or find the average of the middle two values. The median is 14 oz.
#5 The most common value is 14 oz. The mode is 14 oz.
#6 Mean: $46,356.67. Median: (37,320 + 31,400) / 2 = $34,360. If there is no single middle value, the median is the average of the two middle values. Thus the median may not be a number on the list. Mode: $31,400. There is a second spike at $44,500. This dataset could be considered bimodal.
#7 The very high top salary raises the average salary above all but the 4 highest. The median is unaffected by extremely high or low values. Since the mode may not be consistent with the average or median, there would have to be a strong reason to use the mode to represent all the data. The median is the best guide when the data are skewed by extreme values.
#8 Even though the mean is the same, the data from the second line are more variable. See #11.
#9 Yes, 31. This is probably a digit reversal made when the data were copied. Maybe it should be 13. If the raw data do not confirm this mistake, the 31 should be thrown out. The high salary in the salary data (143,800) is an outlier, but it should not be thrown out. See answer 6.
#11.1 The range of the salaries is R = 143,800 - 22,000 = $121,800. The range hides the fact that most of the salaries run from 22,000 to 59,200 (in color), a spread of only $37,200.
#12 There is a clearly higher failure rate during the first 20 hours. The second jump after 100 hours is probably due to chance variability in the data. Variability due to random chance in sampling is called sampling error ("error" does not mean anything is incorrect; it refers to the variation that always occurs in samples). |
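The arithmetic in answers #3 to #6 and #11.1 can be checked with a short Python sketch. The bag weights below are hypothetical values, not the exercise's raw data; they were chosen only so that they reproduce the stated mean, median, and mode. The salary figures are the ones quoted in the answers.

```python
from statistics import mean, median, multimode

# Hypothetical bag weights in ounces (NOT the exercise's raw data),
# chosen to reproduce the stated results: mean 14.5, median 14, mode 14.
weights = [13, 14, 14, 14, 15, 17]

weights_mean = mean(weights)        # sum of the values / number of values
weights_median = median(weights)    # even count: average of the two middle values
weights_modes = multimode(weights)  # list of the most common value(s)

# Answer #6: with an even number of salaries, the median is the
# average of the two middle values quoted in the answer.
salary_median = (37_320 + 31_400) / 2

# Answer #11.1: range = highest value - lowest value.
salary_range = 143_800 - 22_000
typical_spread = 59_200 - 22_000

print(weights_mean, weights_median, weights_modes)
print(salary_median, salary_range, typical_spread)
```

`multimode` returns a list, so a bimodal dataset like the salaries in #6 would show both peaks if they were tied for most common.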
|||||||||||||||||||||||||||||||||||||||||||||||||
| #14 The first line produces more consistent weights than the second. The second line is more variable. Both lines produce some extremely high weights. One can say the distributions are both skewed right.
#15 Range = 279.5 - 82.5 = 197. 197 is close to 200, which divides easily into 10 classes, giving a class width of 20. A histogram's appearance will change according to the number of classes used.
#16 The gap at $140,000 seems to be real, given that there are bar heights of 8 and 5 on either side. It looks like prices are piling up at $140,000. Perhaps there is a psychological price barrier. A home that is priced at $142,000 might sell as well at $162,000. The gap at $240,000 is probably due to natural variation; since fewer houses sold at the higher prices, it's reasonable that none sold in any one class. |
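The class-width reasoning in #15 can be sketched in Python. Rounding the raw width up to a whole number is one simple rule that happens to give the same width of 20 here; the starting point of 80 for the first class is an assumption, since the answer does not say where the classes begin.

```python
import math

# Values from answer #15.
highest, lowest = 279.5, 82.5
num_classes = 10

data_range = highest - lowest                      # 197.0
class_width = math.ceil(data_range / num_classes)  # 19.7 rounded up to 20

# Assumption: the first class starts at 80, a round number just below
# the lowest value. The answer does not state the starting point.
start = 80
edges = [start + i * class_width for i in range(num_classes + 1)]

print(data_range, class_width)
print(edges)
```

Changing `num_classes` changes the width and the edges, which is why the same data can produce histograms that look different.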
|||||||||||||||||||||||||||||||||||||||||||||||||
| #17 Line 1: s = 1.27. Line 2: s = 1.84.
#18 The larger standard deviation indicates more spread, more variation. The smaller the s, the less variation, the less spread, and the smaller the range. Histogram no. 2 is lower and wider than no. 1.
#19 The piled up one. Both average about the same, but the amount of contents would be more predictable in the process indicated in histogram 2.
#20
#22 72" - 69" = 3", or 1 standard deviation above the mean. By the empirical rule, about 34% of men fall between the mean and 1 standard deviation above it.
#24.2 The claim is sooo not true. Rearranging the order of the beers would change the shape of the curve. The x-axis does not contain variable data. The graph shows two-variable (bivariate) data, whereas a histogram uses one-variable (univariate) data. This is a good bar graph. To show a histogram, however, the student needs to form classes of beer prices and make a frequency chart of the number of prices in each class.
#24.5 When x is below the mean, z will be negative.
#25 The area under the probability curve is always 1, so 1 - 0.96 = 0.04. About 4% of the class scored higher in English.
#26 In math only 1 - 0.98 = 0.02 = 2% scored higher. Scores from tests with different averages and different variability can be compared by standardizing scores in this way. Z-scores are standardized scores because they always have mean = 0 and standard deviation = 1. |
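The z-score answers (#22 and #24.5 to #26) can be illustrated with a minimal Python sketch. It assumes the heights are roughly normal, which is what the empirical rule presumes, and it takes the standard deviation of 3 in. from the statement that 3 in. is 1 standard deviation. The 66 in. height is a hypothetical value used only to show a negative z.

```python
from statistics import NormalDist

# Answer #22: mean height 69 in., standard deviation 3 in.
# (the 3 in. is implied by "3 in. = 1 standard deviation").
mean_height, sd_height = 69, 3
z = (72 - mean_height) / sd_height          # 1.0

# Area under the standard normal curve between the mean (z = 0) and z.
# Assumption: heights follow a normal curve.
area_mean_to_z = NormalDist().cdf(z) - 0.5  # about 0.34

# Answer #24.5: a value below the mean gives a negative z.
z_below = (66 - mean_height) / sd_height    # 66 in. is a hypothetical height

# Answers #25 and #26: the total area is 1, so the share scoring
# higher is 1 minus the area below the score.
english_above = 1 - 0.96
math_above = 1 - 0.98

print(z, round(area_mean_to_z, 4), z_below)
print(round(english_above, 2), round(math_above, 2))
```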
|||||||||||||||||||||||||||||||||||||||||||||||||
| #27 This question was removed.
#28 a) Which method do you think usually comes closest to the actual average virus size? The random sample. A sample of 10 is a lot closer to an average of all 100 than a sample of 1. Of the two samples with n = 10, the random sample is less likely to be biased by human judgment. For example, the human eye is attracted to the larger viruses, overlooking the many small ones. Over the long run the random samples will be closest to the true average.
b) How would you rank the methods and why? Best: random sample, n = 10. Medium: judgment sample, n = 10. Worst: judgment sample, n = 1. The reasons are in part a).
c) What is the more accurate method, judgment or random? Random, although the larger the sample, the more the sample would represent the population. A judgment sample of 99 would nearly always be more accurate than a random sample of 1.
d) What is the effect of sample size on accuracy? The average virus size is 4.55. The mean of the sample means should be close to this value, because the mean of the individual observations and the mean of all sample means are the same number. This is one of the two main ideas of the central limit theorem discussed below.
#29 a) The number of values observed from 42.5 to 47.5 (centered on 45) is more than expected.
#30 a) The larger samples are less variable. The range of the means with n = 5 is about 25. The range of the means with n = 25 is about 10. Remember the data consist of averages, not individual values. The variability of the averages of samples of size 5 is greater than that of the averages of samples of size 25. For large samples, it is less likely that all the individual values in the sample could be above or below the average; high values will be balanced by low ones. With a small sample it is more likely that all or most of the observed values could be extreme. There may not be high values to balance low ones. Means of large samples are more stable because the sample more closely resembles the whole population.
b) The 50 averages must fit in fewer classes, so the bars are higher. Instead of making the whole graph taller, the grid lines were packed closer together. "Squished" is the technical term.
#32 Number of observations (individuals): N = 72.
#33 Question removed.
#34 DABC, True.
|
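The two ideas in #28 d) and #30 a), that the mean of all sample means equals the population mean and that means of larger samples vary less, can be checked exactly on a tiny example. The sketch below uses a hypothetical population of six made-up sizes (not the exercise's 100 viruses) and lists every possible sample, so nothing depends on random draws.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of six sizes (illustrative values only).
population = [2, 3, 4, 5, 6, 10]
pop_mean = Fraction(sum(population), len(population))  # 30/6 = 5

def all_sample_means(values, n):
    """Mean of every possible sample of size n, drawn without replacement."""
    return [Fraction(sum(s), n) for s in combinations(values, n)]

def summarize(means):
    """Return (mean of the sample means, range of the sample means)."""
    return sum(means) / len(means), max(means) - min(means)

mean_n2, range_n2 = summarize(all_sample_means(population, 2))
mean_n4, range_n4 = summarize(all_sample_means(population, 4))

print(pop_mean, mean_n2, mean_n4)        # all three are equal
print(float(range_n2), float(range_n4))  # the larger samples vary less
```

Exact fractions are used so the equality of the means is not blurred by floating-point rounding. With real data one would usually draw random samples instead of listing them all.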
|||||||||||||||||||||||||||||||||||||||||||||||||
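| #38 N = 500, k = 5, n = 100. The p's are 0.14, 0.12, 0.05, 0.07, 0.10, so the average proportion is p-bar = 0.096. The standard deviation is sqrt(p-bar (1 - p-bar) / n) = sqrt(0.096 x 0.904 / 100), or about 0.029. UCLp = 0.18, LCLp = 0.01. |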
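The control limits in #38 can be reproduced with a short Python sketch. It assumes the conventional 3-sigma limits for a p-chart; the answer does not state the multiplier, but that choice matches the quoted UCLp and LCLp.

```python
import math

# Values from answer #38.
n = 100                             # items per sample
p = [0.14, 0.12, 0.05, 0.07, 0.10]  # proportion in each of the k = 5 samples

p_bar = sum(p) / len(p)             # average proportion, about 0.096
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)

# Assumption: conventional 3-sigma limits, with the lower limit floored at 0.
ucl = p_bar + 3 * sigma_p
lcl = max(0.0, p_bar - 3 * sigma_p)

print(round(p_bar, 3), round(sigma_p, 3), round(ucl, 2), round(lcl, 2))
```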
|||||||||||||||||||||||||||||||||||||||||||||||||