University of Maryland University College
STAT200 – Assignment #2: Descriptive Statistics Analysis and Writeup
Identifying Information
Class: STAT 200 6373 Introduction to Statistics (2202)
Date: 5th February 2020
Introduction:
Scenario: I am 45 years old, married, and the head of a household. I have a degree in Business Management. I have two children. I earn $101,829 and spend $10,821 on food and $810 on education.
Table 1. Variables Selected for the Analysis
Variable Name in Data Set | Description | Type of Variable (Qualitative or Quantitative) |
Variable 1: “Income” | Annual household income in USD. | Quantitative |
Variable 2: “Age Head Household” | Age of the head of household. | Quantitative |
Variable 3: “Family Size” | Total number of adults and children in the family. | Quantitative |
Variable 4: “Food Expenditures” | Total annual food expenditure. | Quantitative |
Variable 5: “Education Expenditures” | Total annual education expenditure. | Quantitative |
Data Set Description and Method Used for Analysis:
Variable Name | Measures of Central Tendency and Dispersion | Graph |
Variable 1: “Income” | Number of observations, median, mean, and sample standard deviation | Histogram |
Variable 2: “Age Head Household” | Number of observations, mean, and sample standard deviation | Histogram |
Variable 3: “Family Size” | Number of observations, mean, mode, and sample standard deviation | Pie chart |
Variable 4: “Food Expenditures” | Number of observations, mean, and sample standard deviation | Histogram |
Results:
Variable 1: Income
Income is a crucial variable because it pays for the necessities the family needs to survive.
Numerical Summary.
Income was analyzed quantitatively using the median and the mean as measures of central tendency. The mean is most informative when the data are roughly symmetric. The median is the preferred measure of central tendency when the data are skewed or contain outliers.
The sample standard deviation was the measure of dispersion because the data are a sample rather than the full population.
A histogram was used to show the shape of the income distribution.
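To illustrate the reasoning above, the short Python sketch below uses five hypothetical incomes (invented for illustration, not taken from the data set). It shows how a single large value pulls the mean upward while the median stays put, and how the sample standard deviation (divisor n - 1) differs from the population version (divisor n).

```python
import statistics

# Hypothetical incomes for illustration only; not part of the survey data.
incomes = [95_000, 96_000, 97_000, 98_000, 114_000]

mean_income = statistics.mean(incomes)      # pulled upward by the 114,000 value
median_income = statistics.median(incomes)  # middle value, resistant to the outlier

# Sample SD divides by n - 1; population SD divides by n.
sample_sd = statistics.stdev(incomes)
population_sd = statistics.pstdev(incomes)

print("mean:", mean_income, "median:", median_income)
print("sample SD:", round(sample_sd, 2), "population SD:", round(population_sd, 2))
```

Because the divisor is smaller, the sample standard deviation is always slightly larger than the population one for the same values.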
Table 2. Descriptive Analysis for Variable 1
Descriptive Statistics | |
Mean | 99661.03 |
Standard Error | 938.2426 |
Median | 97304.5 |
Mode | #N/A |
Standard Deviation | 5138.966 |
Sample Variance | 26408973 |
Kurtosis | 1.960306 |
Skewness | 1.607153 |
Range | 19576 |
Minimum | 94929 |
Maximum | 114505 |
Sum | 2989831 |
Count | 30 |
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Income | 30 | Median = 97304.5 | SD = 5138.966 |
Observations (USD): 98717, 96572, 96690, 96664, 96886, 96522, 97912, 96727, 96928, 95744, 97681, 95432, 94929, 96621, 95366, 101829, 98309, 112559, 106894, 98686, 103422, 100964, 95835, 102326, 106627, 95922, 95975, 99610, 114505, 106977
Graph and/or Table: Histogram of Income
Description of Findings.
The histogram is skewed to the right: the number of households falls as income rises, and most households earn less than the mean income of $99,661.03. Dispersion is low, since the standard deviation of $5,138.97 is small relative to the mean. Most households therefore earn close to the average, typically differing from it by about $5,139.
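The figures reported for income can be recomputed from the 30 observations listed with Table 2. The sketch below uses Python's standard `statistics` module. The skewness line uses the adjusted sample-skewness formula documented for Excel's SKEW function, which is assumed (not confirmed by the report) to be how the table value was produced.

```python
import math
import statistics

# The 30 income observations listed with Table 2 (USD).
income = [
    98717, 96572, 96690, 96664, 96886, 96522, 97912, 96727, 96928, 95744,
    97681, 95432, 94929, 96621, 95366, 101829, 98309, 112559, 106894, 98686,
    103422, 100964, 95835, 102326, 106627, 95922, 95975, 99610, 114505, 106977,
]

n = len(income)
mean = statistics.mean(income)
median = statistics.median(income)
sd = statistics.stdev(income)        # sample SD (divisor n - 1)
std_error = sd / math.sqrt(n)        # standard error of the mean

# Adjusted sample skewness (assumed to match the spreadsheet's method).
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in income)

# Share of households earning less than the mean.
below_mean = sum(x < mean for x in income) / n

print(f"n={n} mean={mean:.2f} median={median} sd={sd:.3f} se={std_error:.4f}")
print(f"skewness={skewness:.3f} share below mean={below_mean:.0%}")
```

A positive skewness together with a mean above the median is the numerical counterpart of the right-skewed histogram described in the findings.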
Variable 2: Age Head Household
Numerical Summary.
The median, mode, and mean were used as measures of central tendency because the variable is quantitative.
The sample standard deviation was the measure of dispersion.
A histogram was used to show the distribution of the age of the head of household.
Table 3. Descriptive Analysis for Variable 2
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Age Head Household | 30 | Mean = 47.03333, Median = 50, Mode = 51 | SD = 8.479116 |
Graph and/or Table.
Mean | 47.03333 |
Standard Error | 1.548068 |
Median | 50 |
Mode | 51 |
Standard Deviation | 8.479116 |
Sample Variance | 71.8954 |
Kurtosis | -0.74237 |
Skewness | -0.49985 |
Range | 31 |
Minimum | 28 |
Maximum | 59 |
Sum | 1411 |
Count | 30 |
Description of Findings.
The histogram is skewed to the left, which implies that older heads of household predominate; most are over 40 years old. The mean age is 47.03 years, whereas the median is 50 years. The mode shows that 51 was the most common age. The standard deviation of 8.48 years means that ages typically differed from the sample mean by about 8 years.
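The raw ages are not listed in this section, but the reported summary figures can still be checked against one another. The sketch below tests four standard identities (mean = sum / n, variance = SD squared, standard error = SD / sqrt(n), range = max - min). The helper name `check_summary` is hypothetical and introduced only for this illustration.

```python
import math

# Values reported for "Age Head Household" in the summary table above.
age = {
    "mean": 47.03333, "sd": 8.479116, "variance": 71.8954,
    "std_error": 1.548068, "min": 28, "max": 59, "range": 31,
    "sum": 1411, "count": 30,
}

def check_summary(s, rel_tol=1e-4):
    """Return a dict mapping each identity to True if the reported figures satisfy it."""
    n = s["count"]
    return {
        "mean = sum / n": math.isclose(s["mean"], s["sum"] / n, rel_tol=rel_tol),
        "variance = sd^2": math.isclose(s["variance"], s["sd"] ** 2, rel_tol=rel_tol),
        "std_error = sd / sqrt(n)": math.isclose(
            s["std_error"], s["sd"] / math.sqrt(n), rel_tol=rel_tol
        ),
        "range = max - min": s["range"] == s["max"] - s["min"],
    }

for identity, ok in check_summary(age).items():
    print(f"{identity}: {'consistent' if ok else 'INCONSISTENT'}")
```

The same helper could be applied to the other summary tables by swapping in their reported values.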
Variable 3: Family Size
Family size is another significant variable because it affects household expenses.
Numerical Summary.
The median, mode, and mean were used as measures of central tendency because the variable is quantitative.
The sample standard deviation was the measure of dispersion.
A pie chart was preferred in this case to compare the share of families in each family-size category.
Table 4. Descriptive Analysis for Variable 3
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Family Size | 30 | Mean = 2.833333, Median = 3, Mode = 3 | SD = 1.019917 |
Observations: 3, 2, 2, 3, 2, 4, 1, 2, 3, 4, 4, 1, 2, 2, 2, 4, 2, 3, 5, 3, 4, 2, 3, 3, 3, 3, 3, 2, 5, 3
Graph and/or Table.
Mean | 2.833333 |
Standard Error | 0.18621 |
Median | 3 |
Mode | 3 |
Standard Deviation | 1.019917 |
Sample Variance | 1.04023 |
Kurtosis | -0.17122 |
Skewness | 0.355973 |
Range | 4 |
Minimum | 1 |
Maximum | 5 |
Sum | 85 |
Count | 30 |
Family Size | Frequency |
1 | 2 |
2 | 10 |
3 | 11 |
4 | 5 |
5 | 2 |
Total | 30 |
Description of Findings.
Family sizes were consistent: the mean was 2.83, and the median and mode were both 3, so the typical family has 3 members. The most common sizes were 3 and 2 members, at about 37% and 33% of families, respectively, and the pie chart shows that 70% of families have either 2 or 3 members. The standard deviation of 1.02 indicates little deviation from the mean: most families have about one member more or fewer than 3.
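The frequency table and the percentages behind the pie chart can be rebuilt from the 30 family-size observations listed with Table 4. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from collections import Counter
import statistics

# The 30 family-size observations listed with Table 4.
family_size = [3, 2, 2, 3, 2, 4, 1, 2, 3, 4, 4, 1, 2, 2, 2,
               4, 2, 3, 5, 3, 4, 2, 3, 3, 3, 3, 3, 2, 5, 3]

n = len(family_size)
counts = Counter(family_size)                               # frequency of each size
shares = {size: counts[size] / n for size in sorted(counts)}  # pie-chart slices

for size, share in shares.items():
    print(f"size {size}: {counts[size]:2d} families ({share:.1%})")

mode = statistics.mode(family_size)
share_2_or_3 = shares[2] + shares[3]
print(f"mode = {mode}, share with 2 or 3 members = {share_2_or_3:.0%}")
```

Each share is simply the frequency divided by the 30 observations, so the slices of the pie chart sum to 100%.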
Variable 4: Food Expenditures
Food expenditure is another important variable because the amount of money spent on food depends largely on the income and the size of the household. After all, food is a basic necessity for survival.
Numerical Summary.
The median and mean were used as measures of central tendency because the variable is quantitative.
The sample standard deviation was the measure of dispersion.
A histogram was used to show the distribution of the food expenditure data.
Table 5. Descriptive Analysis for Variable 4
Variable | n | Mean/Median | St. Dev. |
Variable 4: Food Expenditures | 30 | Mean = 8504.867, Median = 8011.5 | SD = 1646.068 |
Graph and/or Table.
Mean | 8504.867 |
Standard Error | 300.5294 |
Median | 8011.5 |
Mode | 7051 |
Standard Deviation | 1646.068 |
Sample Variance | 2709538 |
Kurtosis | -1.35233 |
Skewness | 0.477097 |
Range | 4553 |
Minimum | 6822 |
Maximum | 11375 |
Sum | 255146 |
Count | 30 |
Description of Findings.
Based on the histogram, the distribution of food expenditures follows a trend similar to that of income. Most households spend less than the mean of $8,504.87, and only a few spend more than the average. The standard deviation of $1,646.07 indicates that deviation from the mean food expenditure is not high.
Variable 5: Education Expenditures
Numerical Summary.
The median and mean were used as measures of central tendency because the variable is quantitative.
The sample standard deviation was the measure of dispersion.
A histogram was used to show the distribution of the education expenditure data.
Table 6. Descriptive Analysis for Variable 5
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: Education Expenditures | 30 | Mean = 348.3, Median = 311.5 | SD = 187.4512 |
Graph and/or Table.
Mean | 348.3 |
Standard Error | 34.22374 |
Median | 311.5 |
Mode | #N/A |
Standard Deviation | 187.4512 |
Sample Variance | 35137.94 |
Kurtosis | 1.151167 |
Skewness | 0.721469 |
Range | 799 |
Minimum | 11 |
Maximum | 810 |
Sum | 10449 |
Count | 30 |
Description of Findings.
The distribution is roughly bell-shaped, which means that most households spend close to the mean of $348.30 on education, with only a few spending much more or much less. The dispersion is nevertheless noticeable: the standard deviation of $187.45 is more than half the mean, so education spending differs considerably from household to household.
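The findings judge dispersion informally, as "not high" for some variables and sizeable for others. One common way to put such statements on a single scale is the coefficient of variation (standard deviation divided by mean). It is not part of the original analysis and is offered here only as a sketch; the inputs are the means and standard deviations reported in the tables of this report.

```python
# (mean, sample SD) as reported in the descriptive tables of this report.
reported = {
    "Income": (99661.03, 5138.966),
    "Age Head Household": (47.03333, 8.479116),
    "Family Size": (2.833333, 1.019917),
    "Food Expenditures": (8504.867, 1646.068),
    "Education Expenditures": (348.3, 187.4512),
}

# Coefficient of variation: SD expressed as a fraction of the mean.
cv = {name: sd / mean for name, (mean, sd) in reported.items()}

# Print from least to most dispersed relative to the mean.
for name, value in sorted(cv.items(), key=lambda item: item[1]):
    print(f"{name}: CV = {value:.1%}")
```

Because the coefficient of variation is unit-free, it allows dollar amounts, ages, and head counts to be compared directly.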
Discussion and Conclusion.
Statistic | Education | Food | Entertainment |
Mean | 348.3 | 8504.867 | 110.6667 |
Median | 311.5 | 8011.5 | 105 |
Mode | #N/A | 7051 | 106 |
Standard Deviation | 187.4512 | 1646.068 | 38.20664 |
Range | 799 | 4553 | 163 |
Minimum | 11 | 6822 | 38 |
Maximum | 810 | 11375 | 201 |
Sum | 10449 | 255146 | 3320 |
Count | 30 | 30 | 30 |
Based on the line graph above, most spending goes to food and the least to entertainment. However, some respondents spent more on entertainment than on education. The highest expenditures on education, food, and entertainment were $810, $11,375, and $201, respectively, and the lowest were $11, $6,822, and $38. Respondents should therefore cut back on entertainment so that enough money remains for food and education. Some respondents put more emphasis on entertainment while spending little on education; spending more on education would improve their chances of earning a higher income.