Answers to written exercises
#1 Weigh the contents of 1 bag each hour, across all shifts.

#2 Three ideas among many possible: 1. Chart the results by shift over time. 2. Chart the overall average weight and average weight per shift. 3. Make a chart of how many are underweight and how many overweight.

#3 To calculate the average, add the values and divide by the number of values. The mean is 14.5 oz.

#4 Sort the values by size; pick the middle one or find the average of the middle two values. The median is 14 oz.

#5 The most common value is 14 oz. The mode is 14.

#6 Mean: $46,356.67. Median: (37,320 + 31,400) / 2 = 34,360. If there is no middle value, the median is the average of the two middle values, so the median may not be a number on the list. Mode: 31,400. There is a second spike at 44,500, so this dataset could be considered bimodal.
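As a check on the arithmetic, the three measures can be recomputed with Python's standard statistics module. This is only a sketch: it assumes the data are the twelve salaries listed under answer 11.1, and the variable names are illustrative.

```python
import statistics

# The twelve salaries, as listed under answer 11.1.
salaries = [22000, 25760, 28900, 31400, 31400, 31400,
            37320, 44500, 44500, 56100, 59200, 143800]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # n is even: average of the two middle values
modes = statistics.multimode(salaries)       # list of the most frequent value(s)

print(f"mean   = {mean_salary:,.2f}")
print(f"median = {median_salary:,.2f}")
print(f"mode   = {modes}")
```

multimode reports only the most frequent value, so the second spike at 44,500 has to be spotted from a frequency count or a chart.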

#7 The very high top salary raises the average salary above all but the 3 highest. The median is unaffected by extremely high or low values. Since the mode may not be consistent with the average or median, there would have to be a strong reason to use the mode to represent all the data. The median is the best guide when the data are skewed by extreme values.

#8 Even though the mean is the same, the data from the second line are more variable. See #11.

#9 Yes, 31. This is probably a digit reversal when the data were copied. Maybe it should be 13. If the raw data do not confirm this mistake, the 31 should be thrown out. The high salary in the salary data (143,800) is an outlier, but it should not be thrown out. See answer 6.

#10 Answers vary. My family size was 7 people.

#11 The Line 1 weights go from 13 to 17, but the Line 2 weights go from 12 to 18. The weights from Line 2 are more variable.

#11.1 The range of the salaries is R = 143,800 – 22,000 = $121,800. The range hides the fact that most of the salaries run from 22,000 to 59,200 (every value below except the highest), a spread of only $37,200.

$22,000 25,760 28,900 31,400 31,400 31,400
37,320 44,500 44,500 56,100 59,200 143,800
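The effect of the single outlier on the range can be sketched in a few lines of Python, using the twelve salaries listed above. Setting the top value aside here is only for comparison; it is not a recommendation to discard it.

```python
salaries = [22000, 25760, 28900, 31400, 31400, 31400,
            37320, 44500, 44500, 56100, 59200, 143800]

# Range of all twelve salaries: largest minus smallest.
full_range = max(salaries) - min(salaries)

# Same calculation with the single highest salary set aside.
without_top = sorted(salaries)[:-1]
spread_without_top = max(without_top) - min(without_top)

print(full_range, spread_without_top)
```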

#12 There is a clearly higher failure rate during the first 20 hours. The second jump after 100 hours is probably due to chance variability in the data. Variability due to random chance in sampling is called sampling error ("error" does not mean anything is incorrect; it refers to the variation that always occurs in samples).

#13

#14 The first line produces more consistent weights than the second; the variability of the second line is greater. Both lines produce some extremely high weights, so one can say both distributions are skewed right.

#15 Range = 279.5 – 82.5 = 197. 197 is close to 200, which divides easily by 10 classes, giving a class width of 20. A histogram's appearance will change according to the number of classes used.
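The class-width arithmetic can be sketched as follows. The rounding rule (up to the next multiple of 5) and the starting edge of 80 are assumptions made for illustration; the answer itself only gives the range, the number of classes, and the width.

```python
import math

low, high = 82.5, 279.5      # smallest and largest values from the answer
num_classes = 10

data_range = high - low                      # 197.0
raw_width = data_range / num_classes         # 19.7
class_width = math.ceil(raw_width / 5) * 5   # round up to a multiple of 5 (assumed rule)

start = 80  # hypothetical first class edge, chosen just below the minimum
edges = [start + i * class_width for i in range(num_classes + 1)]

print(class_width, edges)
```

Changing num_classes and rerunning shows how the class edges, and therefore the shape of the histogram, depend on the number of classes.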

#16 The gap at $140,000 seems to be real, given that there are bar heights of 8 and 5 on either side. It looks like prices are piling up at $140,000. Perhaps there is a psychological price barrier. A home that is priced at $142,000 might sell as well at $162,000. The gap at $240,000 is probably due to natural variation; since fewer homes sold at the higher prices, it's reasonable that none sold in any one class.

#17 Line 1: s = 1.27 Line 2: s = 1.84

#18 The larger standard deviation indicates more spread, more variation. The smaller the s, the less variation, the less spread, and the smaller range. Histogram no. 2 is lower and wider than no. 1.

#19 The piled up one. Both average about the same, but the amount of contents would be more predictable in the process indicated in histogram 2.

#20
Line 2 net wts.   Distance from mean   Squared distance
12                -2.5                 6.25
13                -1.5                 2.25
13                -1.5                 2.25
14                -0.5                 0.25
14                -0.5                 0.25
14                -0.5                 0.25
15                 0.5                 0.25
15                 0.5                 0.25
17                 2.5                 6.25
18                 3.5                 12.25
mean = 14.5; sum of squares = 30.5
n - 1 = 9; mean of squares = 30.5 / 9 = 3.38888
sq. root of 3.38888 = 1.84 = s
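The same steps can be followed in code. This sketch assumes ten Line 2 weights, which is what the mean of 14.5, the sum of squares of 30.5, and the divisor n - 1 = 9 imply; the variable names are illustrative.

```python
import math

# Ten Line 2 net weights (oz); ten values are assumed from n - 1 = 9.
weights = [12, 13, 13, 14, 14, 14, 15, 15, 17, 18]

n = len(weights)
mean = sum(weights) / n
deviations = [w - mean for w in weights]          # distance from mean
sum_of_squares = sum(d ** 2 for d in deviations)  # total squared distance
variance = sum_of_squares / (n - 1)               # divide by n - 1 for a sample
s = math.sqrt(variance)                           # sample standard deviation

print(f"mean = {mean}, sum of squares = {sum_of_squares}, s = {s:.2f}")
```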

#21

#22 72"-69" = 3" or 1 standard deviation above the mean. By the empirical rule this is about 34% of men.

#23 By the empirical rule, over 99% of the population is contained between + or - 3 s.d. 69" + 3(3) = 69"+9" = 78" or 6'-6" on the tall side and 69" - 3(3) = 69"-9" = 60" or 5'-0" on the short side.

#24 Less than 1/2 of 1% or fewer than one in 200 adult men would be over 6'-6" tall.
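The empirical-rule figures in answers 22 to 24 can be compared with exact normal-curve areas. This is a sketch that assumes men's heights follow a normal distribution with mean 69" and standard deviation 3", the values used in the answers.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=3)  # inches; mean and s.d. taken from the answers

# Share of men between the mean (69") and one s.d. above it (72").
mean_to_plus_1sd = heights.cdf(72) - heights.cdf(69)

# Share within 3 s.d. of the mean (60" to 78").
within_3sd = heights.cdf(78) - heights.cdf(60)

# Share taller than 78" (6'-6").
over_78 = 1 - heights.cdf(78)

print(f"{mean_to_plus_1sd:.4f}  {within_3sd:.4f}  {over_78:.5f}")
```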

#24.2 The claim is sooo not true. Rearranging the order of the beers would change the shape of the curve. The x-axis does not contain variable data. The graph shows two-variable (bivariate) data, whereas a histogram uses one-variable (univariate) data. This is a good bar graph. To show a histogram, however, the student needs to form classes of beer prices and make a frequency chart of the number of prices in each class.

#24.5 When x is below the mean, z will be negative.

#25 The area under the probability curve is always 1, so 1 – 0.96 = 0.04. About 4% of the class scored higher in English.

#26 In math only 1 – 0.98 = 0.02 = 2% scored higher. Scores from tests with different averages and different variability can be compared by standardizing scores in this way. Z-scores are standardized scores because they always have mean = 0 and standard deviation =1.
Do you think it looks like 2% of the area under the curve is to the right of 1.96 on the x-axis? These curves were generated in Excel.
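The statement that z-scores always have mean 0 and standard deviation 1 can be checked on any list of numbers. The scores below are hypothetical and chosen only for illustration.

```python
import statistics

scores = [55, 62, 70, 71, 78, 84, 90]  # hypothetical test scores

mean = statistics.mean(scores)
s = statistics.stdev(scores)           # sample standard deviation

# Standardize: distance from the mean, measured in standard deviations.
z_scores = [(x - mean) / s for x in scores]

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

print([round(z, 2) for z in z_scores])
print(f"mean of z = {z_mean:.6f}, s.d. of z = {z_sd:.6f}")
```

Scores below the mean come out with negative z values, as noted in answer 24.5.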

#27 This question was removed.

#28 a) Which method do you think usually comes closest to the actual average virus size?

The random sample. A sample of 10 is a lot closer to an average of all 100 than a sample of 1. Of the two samples with n = 10, the random sample is less likely to be biased by human judgment. For example, the human eye is attracted to the larger viruses, overlooking the many small ones. Over the long run the random samples will be closest to the true average.

b) How would you rank the methods and why? Best: random sample n=10. Medium: judgment sample n = 10. Worst: judgment sample, n =1. The reasons are in part a).

c) What is the more accurate method, judgment or random? Random, although the larger the sample, the more the sample would represent the population. A judgment sample of 99 would always be more accurate than a random sample of 1.

d) What is the effect of sample size on accuracy?
Larger sample sizes give a closer estimate of the population characteristics (called parameters). Suppose you are studying the average number of hours worked by students attending a community college. The larger your sample, the closer it gets to being the whole population. If you randomly chose and interviewed half the students you would have a very close estimate of the true average number of hours worked by the whole population. In practice samples of size 30 or more have desirable characteristics, explained in the following pages. If the population data happens to be normally distributed, smaller samples will work.

The average virus size is 4.55. The mean of the sample means should be close to this value, because the mean of the individual observations and the mean of all sample means are the same number. This is one of the two main ideas of the central limit theorem discussed below.
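A small simulation illustrates why the mean of the sample means lands close to the population mean. The population here is hypothetical (100 random whole numbers from 1 to 9), not the virus data from the exercise, and the seed and sample counts are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 100 "sizes"; not the data from the exercise.
population = [random.randint(1, 9) for _ in range(100)]
population_mean = statistics.mean(population)

# Draw many random samples of n = 10 (without replacement) and average each one.
sample_means = [statistics.mean(random.sample(population, 10))
                for _ in range(2000)]
mean_of_sample_means = statistics.mean(sample_means)

print(f"population mean      = {population_mean:.3f}")
print(f"mean of sample means = {mean_of_sample_means:.3f}")
```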

#29 a) The number of values observed from 42.5 to 47.5 (centered on 45) is more than expected.
b) Symmetry suggests there should be more around 75 in the sample.
c) Most of the normal curve (99.7%) is contained in 6 standard deviations centered on the mean. The values span about 75 – 20 = 55, so an estimate of the standard deviation is 55 / 6 = 9.2.
The actual value for this population is 9.

#30 a) The larger samples are less variable. The range of the means with n=5 is about 25. The range of means with n=25 is about 10. Remember the data consists of averages, not individual values. The variability of the averages of samples of size 5 is greater than that of the averages of samples of size 25. For large samples, it is less likely that all the individual values in the sample could be above or below the average; high values will be balanced by low ones. With a small sample it is more likely that all or most of the observed values could be extreme. There may not be high values to balance low ones. Means of large samples are more stable because the sample more resembles the whole population.

b) The 50 averages must fit in fewer classes, so the bars are higher. Instead of making the whole graph taller, the grid lines were packed closer together. "Squished" is the technical term.
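The difference in spread between averages of n = 5 and n = 25 can be reproduced with a short simulation. The population here is an assumption: a normal distribution with a hypothetical mean of 50 and a standard deviation of 9. Fifty averages are drawn for each sample size, matching the 50 averages mentioned in part b).

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def sample_means(n, how_many=50, mu=50, sigma=9):
    """Return `how_many` averages, each from a sample of n normal values.

    mu and sigma describe a hypothetical population, not the exercise data.
    """
    return [statistics.mean([random.gauss(mu, sigma) for _ in range(n)])
            for _ in range(how_many)]

means_n5 = sample_means(5)
means_n25 = sample_means(25)

spread_n5 = statistics.stdev(means_n5)
spread_n25 = statistics.stdev(means_n25)

print(f"s.d. of means, n=5:  {spread_n5:.2f}")
print(f"s.d. of means, n=25: {spread_n25:.2f}")
```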

#31
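1. \( UCL_{\bar{x}} = \bar{\bar{x}} + A_2\bar{R} = 90.60 + 1.02(0.95) = 91.57 \); \( UCL_R = D_4\bar{R} = 2.574(0.95) = 2.45 \)
\( LCL_{\bar{x}} = \bar{\bar{x}} - A_2\bar{R} = 90.60 - 1.02(0.95) = 89.63 \); \( LCL_R = D_3\bar{R} = 0(0.95) = 0 \)
2. \( UCL_{\bar{x}} = \bar{\bar{x}} + A_2\bar{R} = 10.3 + 0.577(1.6) = 11.2 \); \( UCL_R = D_4\bar{R} = 2.114(1.6) = 3.4 \)
\( LCL_{\bar{x}} = \bar{\bar{x}} - A_2\bar{R} = 10.3 - 0.577(1.6) = 9.4 \); \( LCL_R = D_3\bar{R} = 0(1.6) = 0 \)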
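The control-limit arithmetic can be sketched as a small function. The parameter names (grand_mean, r_bar, a2, d3, d4) are illustrative labels for the grand average, the average range, and the usual control chart constants; the numbers passed in are the ones that appear in the answer.

```python
def xbar_r_limits(grand_mean, r_bar, a2, d3, d4):
    """Return control limits for an X-bar chart and an R chart."""
    return {
        "UCL_xbar": grand_mean + a2 * r_bar,
        "LCL_xbar": grand_mean - a2 * r_bar,
        "UCL_R": d4 * r_bar,
        "LCL_R": d3 * r_bar,
    }

# Part 1: constants as they appear in the answer (1.02, 2.574, 0).
part1 = xbar_r_limits(grand_mean=90.60, r_bar=0.95, a2=1.02, d3=0, d4=2.574)

# Part 2: constants as they appear in the answer (0.577, 2.114, 0).
part2 = xbar_r_limits(grand_mean=10.3, r_bar=1.6, a2=0.577, d3=0, d4=2.114)

for name, limits in (("part 1", part1), ("part 2", part2)):
    print(name, {key: round(value, 2) for key, value in limits.items()})
```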

#32 Number of observations (individuals) N = 72
1. Subgroup size, n = 3
2. Number of samples k = 24
3. \( \bar{\bar{x}} = 27.8 \), \( \bar{R} = 2.5 \)
4. What is one problem with putting day shift data together consecutively?
Adjustments were also made by the other 2 shifts. This list treats the day shift as if there were no intervening adjustments or other changes. It would be best to include all shifts, but color-code each shift to highlight differences.
5. The center weight averages more than the left or right. Perhaps the left and right spray patterns are overlapping into the center.

#33 Question removed.

#34 DABC, True.
D. The sample averages are constant, indicating a stable process center, but the range has increased to more than 4 times its initial value.
A. Both process center and dispersion vary, but in a way that suggests no particular cause is threatening the overall stability of the process.
B. The process center shows an upward trend, and the width of the distribution of individuals is consistent.
C. Similar to B, but with control limits shown dashed.
True. The x bar chart plots averages, not individuals.

#35 = =  = 1.30

#36 Question removed.
#37 = = = 0.84
Z (d.n.s.) = = = -2.86
Cpk = |Z (d.n.s.)| / 3 = 2.86 / 3 = 0.95.

This is the worst case since it is the z-score of the distance to the nearest specification. The "Cpk" of the other specification limit is 1.03.

The process is capable with respect to the upper specification limit but is not capable with respect to the lower specification limit. The capability ratio of this process is 0.95. The process is not capable. The variation in the process (the process spread) needs to be reduced.
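The last step of answer 37 can be sketched in code: Cpk is the distance to the nearest specification, measured in standard deviations, divided by 3. The function name and the threshold of 1.0 for "capable" are assumptions for illustration; the threshold is consistent with the answer treating 1.03 as capable and 0.95 as not capable.

```python
def cpk_from_z(z_nearest_spec):
    """Cpk from the z-score of the distance to the nearest specification."""
    return abs(z_nearest_spec) / 3

cpk = cpk_from_z(-2.86)      # Z (d.n.s.) from the answer
is_capable = cpk >= 1.0      # assumed rule: capable when Cpk is at least 1

print(f"Cpk = {cpk:.2f}, capable: {is_capable}")
```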

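#38 N = 500, k = 5, n = 100, \( \bar{n} = 100 \)

The p's are 0.14, 0.12, 0.05, 0.07, 0.10. \( \bar{p} = 0.096 \), \( 1 - \bar{p} = 0.904 \)

The standard deviation is \( \sqrt{\bar{p}(1 - \bar{p}) / \bar{n}} = \sqrt{0.096(0.904) / 100} = 0.03 \)
UCLp = \( \bar{p} \) + 3 standard deviations = 0.18

LCLp = \( \bar{p} \) - 3 standard deviations = 0.01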
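The p chart limits can be recomputed from the five proportions. This is a sketch with illustrative variable names; flooring the lower limit at 0 is a common convention added here and does not affect this data.

```python
import math

p_values = [0.14, 0.12, 0.05, 0.07, 0.10]  # the p's from the answer
n = 100                                    # subgroup size

p_bar = sum(p_values) / len(p_values)          # average proportion
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)   # standard deviation of p

ucl_p = p_bar + 3 * sigma_p
lcl_p = max(0.0, p_bar - 3 * sigma_p)   # a proportion cannot fall below 0

print(f"p-bar = {p_bar:.3f}, s.d. = {sigma_p:.2f}, "
      f"UCLp = {ucl_p:.2f}, LCLp = {lcl_p:.2f}")
```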