A histogram is useful for showing the distribution of an interval or continuous variable. It illustrates whether the variable follows a normal or some other distribution. Note: this is not a test of normality!
You will use the dataset juul, which becomes available after installing the package ISwR. Create a histogram of igf1 (insulin-like growth factor), separately for boys and girls. Make sure the graph looks like the one in the slides:
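As a starting point, here is a minimal sketch of two side-by-side histograms in base R. It runs on a simulated stand-in data frame `d` (hypothetical values, not the real data), so it works without ISwR. With the package installed, `data(juul, package = "ISwR")` and `juul` in place of `d` should do the same job. The titles, colors and layout are assumptions and will need adjusting to match the slides.

```r
# Simulated stand-in for juul (hypothetical values); sex is coded 1 = boy, 2 = girl
set.seed(42)
d <- data.frame(
  sex  = rep(c(1, 2), each = 200),
  igf1 = c(rnorm(200, mean = 310, sd = 150), rnorm(200, mean = 370, sd = 170))
)

# Two panels next to each other, one histogram per sex
op <- par(mfrow = c(1, 2))
h_boys  <- hist(d$igf1[d$sex == 1], main = "Boys",  xlab = "igf1", col = "lightblue")
h_girls <- hist(d$igf1[d$sex == 2], main = "Girls", xlab = "igf1", col = "pink")
par(op)  # restore the previous graphics settings
```

`hist` drops missing values on its own, so no extra NA handling is needed for the plot.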
Test the normality. Use the functions skewness and kurtosis from the package moments. A normal distribution has a kurtosis of 3 and a skewness of 0.
This should be the kurtosis:
## [1] 2.928243
And the skewness:
## [1] 0.6047032
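To see what these two numbers measure, the sketch below computes moment-based skewness and kurtosis by hand in base R. The helper names `skew_manual` and `kurt_manual` are made up for this illustration. To my understanding, the functions in moments use the same definitions (central moments divided by n, kurtosis not reduced by 3), but treat that as an assumption and compare the results yourself.

```r
# Skewness: third central moment divided by the second central moment to the power 1.5
skew_manual <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Kurtosis (not excess kurtosis): fourth central moment divided by the squared second
kurt_manual <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)  # small made-up vector with a long right tail
skew_manual(x)  # positive, because the right tail is longer
kurt_manual(x)
```

With moments installed, `moments::skewness(x, na.rm = TRUE)` and `moments::kurtosis(x, na.rm = TRUE)` can be used to check the hand-written versions.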
Is the variable normally distributed?
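To answer this, calculate the confidence interval of each statistic (i.e. the estimate +/- 1.96 times the standard error) and check whether it contains the value expected under a normal distribution. For the standard error, see the slides. This is the solution you should get for the kurtosis: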
## [1] 0.2624046
and for the skewness:
## [1] 0.1312023
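The formula from the slides is not reproduced here, so the following is only a sketch under an assumption: the common large-sample approximations sqrt(6/n) for the standard error of the skewness and sqrt(24/n) for the kurtosis. With n = 1339 (the number of rows in juul), 1.96 times these standard errors reproduces the two values printed above, which suggests those values are the half-widths of the intervals.

```r
n <- 1339  # number of rows in juul; assumed to be the n used in the slides

# Assumed large-sample standard errors (the slides may use a different formula)
se_skew <- sqrt(6 / n)
se_kurt <- sqrt(24 / n)

# Half-widths of the 95% confidence intervals
half_skew <- 1.96 * se_skew
half_kurt <- 1.96 * se_kurt

# Estimates as printed in the exercise
skew <- 0.6047032
kurt <- 2.928243

ci_skew <- skew + c(-1, 1) * half_skew
ci_kurt <- kurt + c(-1, 1) * half_kurt

# Does each interval contain the value expected under normality?
covers_0 <- ci_skew[1] <= 0 && 0 <= ci_skew[2]
covers_3 <- ci_kurt[1] <= 3 && 3 <= ci_kurt[2]
```

On these assumptions the kurtosis interval contains 3, while the skewness interval does not contain 0.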
Generate a factor. Transform age into a factor with three categories:
1: pre puberty (up to 11)
2: puberty (years 11-17)
3: post puberty (18 and older)
Attach this recoded variable to juul. Use the functions factor and cut. Check with summary. How many people are in puberty in the sample?
##     pre puberty    post    NA's
##     517     581     236       5
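A possible way to do the recoding is sketched below on a short made-up age vector. The break points are an assumption: the task leaves open where exactly age 11 and the ages between 17 and 18 belong, so this sketch uses half-open intervals [.., 11), [11, 18) and [18, ..). Other choices may be needed to reproduce the counts above.

```r
age <- c(5.2, 10.9, 11, 14.5, 17.99, 18, 40, NA)  # made-up ages, not from juul

# right = FALSE makes the intervals closed on the left: [-Inf, 11), [11, 18), [18, Inf)
age_cat <- cut(age,
               breaks = c(-Inf, 11, 18, Inf),
               labels = c("pre", "puberty", "post"),
               right  = FALSE)

is.factor(age_cat)  # cut already returns a factor
summary(age_cat)    # counts per category, missing values listed as NA's
```

For the real data the result could be stored as a new column, for example `juul$age_cat <- cut(juul$age, ...)`, where `age_cat` is just an illustrative name.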
Create a barplot of gender (variable sex in juul) and add color using the argument col:
Create a stacked barplot of the recoded age distribution for boys and girls separately. Use the same colors.
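Both barplots can be built from frequency tables. The sketch below uses a tiny made-up data frame `d` with a hypothetical column `age_cat` for the recoded age. The colors are arbitrary choices, and reusing one palette for both plots is only one reading of "the same colors".

```r
# Made-up stand-in data; one sex value is missing on purpose
d <- data.frame(
  sex     = factor(c(1, 1, 2, 2, 2, 1, 2, NA), labels = c("boy", "girl")),
  age_cat = factor(c("pre", "puberty", "pre", "post", "puberty", "post", "puberty", "pre"),
                   levels = c("pre", "puberty", "post"))
)

cols <- c("steelblue", "tomato", "gold")

# Simple barplot: one bar per sex
sex_counts <- table(d$sex)
barplot(sex_counts, col = cols[1:2], main = "Sex")

# Stacked barplot: a two-way table gives one bar per column, stacked by row
age_by_sex <- table(d$age_cat, d$sex)
barplot(age_by_sex, col = cols, legend.text = TRUE, main = "Age group by sex")
```

`barplot` stacks by default when it receives a matrix or two-way table; `beside = TRUE` would put the bars next to each other instead.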
What is the mean level of igf1 for boys and girls for age 15 until 16? Use the function tapply.
##        1        2
## 486.2333 559.0357
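The pattern for a grouped mean on a subset is sketched below with made-up numbers. Whether "15 until 16" includes age 16 exactly is not stated, so the inclusive bounds here are an assumption.

```r
# Made-up stand-in data, not the real juul values
d <- data.frame(
  age  = c(14.9, 15.0, 15.5, 16.0, 15.2, 15.8, 16.1, NA),
  sex  = c(1, 1, 1, 1, 2, 2, 2, 2),
  igf1 = c(400, 450, NA, 550, 500, 600, 700, 650)
)

# Rows with a known age between 15 and 16 (both bounds included by assumption)
keep <- !is.na(d$age) & d$age >= 15 & d$age <= 16

# Mean igf1 per sex; na.rm = TRUE is passed on to mean()
res <- tapply(d$igf1[keep], d$sex[keep], mean, na.rm = TRUE)
res
```

The names of the result are the values of the grouping variable, which is why the output above is labelled 1 and 2.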
Create a function that calculates the z-score of a variable. Check with scale. Here is the summary of the output of scale:
## V1
## Min. :-1.8427
## 1st Qu.:-0.8064
## Median :-0.1559
## Mean : 0.0000
## 3rd Qu.: 0.7167
## Max. : 3.3609
## NA's :321
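A z-score subtracts the mean and divides by the standard deviation. A minimal sketch follows, with `z_score` as an illustrative name and a made-up input vector. Missing values are ignored when computing the mean and standard deviation but stay missing in the result.

```r
# z-score: (value - mean) / standard deviation, ignoring NAs in both summaries
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(10, 12, NA, 15, 20)  # made-up values
z <- z_score(x)

# scale() returns a one-column matrix, so convert it before comparing
all.equal(z, as.numeric(scale(x)))
```

Applied to the real data, `summary(scale(juul$igf1))` should give the summary shown above.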
Calculate the standard deviation of both age and igf1 in juul using a for loop. Give your objects names that differ from functions already known to R (e.g. mean, sd).
## [1] 11.25288 171.03560
## age igf1
## 11.25288 171.03560
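One way to structure the loop is sketched below on a made-up data frame. The object names `vars` and `sd_out` are illustrative and chosen so that they do not mask built-in functions such as `sd`. The sketch calls `sd` inside the loop. If the exercise intends the standard deviation itself to be computed by hand, the loop body would change accordingly.

```r
# Made-up stand-in data, not the real juul values
d <- data.frame(
  age  = c(10, 12, 14, NA),
  igf1 = c(200, 300, 400, 500)
)

vars   <- c("age", "igf1")
sd_out <- numeric(length(vars))  # pre-allocate the result vector
names(sd_out) <- vars

# One iteration per variable; missing values are dropped
for (v in vars) {
  sd_out[v] <- sd(d[[v]], na.rm = TRUE)
}
sd_out
```

Naming the result vector up front produces labelled output like the second result above. Without names, the output looks like the first.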