EXE 2.1

Histogram

A histogram is useful for showing the distribution of an interval or continuous variable. It illustrates whether the variable follows a normal or some other distribution. Note: this is not a test of normality!
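As a minimal sketch of the idea, the base R function hist can draw one histogram per group. The vectors `group_a` and `group_b` below are simulated and purely hypothetical; they are not the data used in this exercise.

```r
# Simulated values for two hypothetical groups (not the juul data)
set.seed(1)
group_a <- rnorm(200, mean = 300, sd = 50)
group_b <- rnorm(200, mean = 350, sd = 60)

# Two panels side by side, one histogram per group
old_par <- par(mfrow = c(1, 2))
hist_a <- hist(group_a, main = "Group A", xlab = "value", col = "lightblue")
hist_b <- hist(group_b, main = "Group B", xlab = "value", col = "pink")
par(old_par)  # restore the previous plot layout
```

hist also returns the bin counts invisibly, so `hist_a$counts` can be inspected after plotting.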

You will use the dataset juul, which is available after installing the package ISwR. Create a histogram of igf1 (insulin-like growth factor) separately for boys and girls. Make sure the graph looks as in the slides:

Test the normality. Use the functions skewness and kurtosis from the package moments. A normal distribution has a kurtosis of 3 and a skewness of 0.

This should be the kurtosis:

## [1] 2.928243

And the skewness:

## [1] 0.6047032

Is the variable normally distributed?

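To answer this, you need to calculate the 95% confidence interval of each statistic (i.e. the estimate +/- 1.96 times the standard error) and check whether it contains the value expected under normality. For the standard error, see the slides. This is the solution you should get for the kurtosis: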

## [1] 0.2624046

and for the skewness:

## [1] 0.1312023
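The reasoning above can be sketched on simulated data. The standard error formulas from the slides are not reproduced here, so the block below assumes the common large-sample approximations sqrt(6/n) for skewness and sqrt(24/n) for kurtosis; treat these as an assumption, not as the formulas the slides use. Skewness and kurtosis are computed by hand from the central moments so that the sketch runs without extra packages.

```r
# Simulated right-skewed sample with some missing values (not the juul data)
set.seed(42)
x <- c(rexp(500), NA, NA)
x <- x[!is.na(x)]          # drop missing values first
n <- length(x)

# Moment-based skewness and kurtosis (a normal distribution gives 0 and 3)
m2 <- mean((x - mean(x))^2)
skew <- mean((x - mean(x))^3) / m2^1.5
kurt <- mean((x - mean(x))^4) / m2^2

# ASSUMPTION: large-sample standard errors sqrt(6/n) and sqrt(24/n);
# the slides may define them differently
se_skew <- sqrt(6 / n)
se_kurt <- sqrt(24 / n)

# 95% confidence intervals: estimate +/- 1.96 * standard error
ci_skew <- skew + c(-1.96, 1.96) * se_skew
ci_kurt <- kurt + c(-1.96, 1.96) * se_kurt

# Normality is plausible only if the intervals contain 0 and 3 respectively
skew_ok <- ci_skew[1] <= 0 && 0 <= ci_skew[2]
kurt_ok <- ci_kurt[1] <= 3 && 3 <= ci_kurt[2]
```

For an exponential sample such as this one, the skewness interval lies well above 0, so `skew_ok` is FALSE.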

EXE 2.2

Recode variables

Generate a factor. Transform age into a factor with three categories:

  • 1: pre puberty (up to 11)

  • 2: puberty (years 11-17)

  • 3: post puberty (18 and older)

Add this recoded variable to juul. Use the functions factor and cut, and check the result with summary. How many people in the sample are in puberty?

##     pre puberty    post    NA's 
##     517     581     236       5
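A minimal sketch of cut combined with factor, on made-up ages. The category definitions above overlap at age 11, so the boundaries chosen here (left-closed intervals starting at 11 and 18) are an assumption; the vector `age` and the name `age_group` are hypothetical.

```r
# Hypothetical ages, including one missing value (not the juul data)
age <- c(5.2, 10.9, 11.0, 14.5, 17.9, 18.0, 25.3, NA)

# ASSUMPTION: boundaries at 11 and 18, left-closed intervals
# [0, 11), [11, 18), [18, Inf)
age_group <- cut(age, breaks = c(0, 11, 18, Inf), right = FALSE,
                 labels = c("pre", "puberty", "post"))

# cut() already returns a factor; factor() fixes the level order explicitly
age_group <- factor(age_group, levels = c("pre", "puberty", "post"))

summary(age_group)  # counts per category, with NA's listed last
```

With `right = TRUE` (the default of cut) the intervals would be right-closed instead, which moves values lying exactly on a boundary into the lower category.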

Barplot

Create a barplot of gender and add color using argument col:

Create a stacked barplot of the recoded age distribution for boys and girls separately. Use the same colors.
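Both kinds of barplot can be sketched from frequency tables. The factors `sex` and `grp` and the color vector `my_cols` below are hypothetical toy objects, not the juul data, and the colors are arbitrary.

```r
# Hypothetical sex and age-group factors (not the juul data)
sex <- factor(c(1, 2, 2, 1, 2, 1, 2, 2), labels = c("boy", "girl"))
grp <- factor(c("pre", "pre", "puberty", "puberty",
                "post", "pre", "post", "puberty"),
              levels = c("pre", "puberty", "post"))

my_cols <- c("steelblue", "orange", "darkgreen")  # arbitrary illustrative colors

# Simple barplot: one bar per level of sex
sex_counts <- table(sex)
barplot(sex_counts, col = my_cols[1:2])

# Stacked barplot: rows (age groups) are stacked within each column (sex)
grp_by_sex <- table(grp, sex)
barplot(grp_by_sex, col = my_cols, legend.text = TRUE)
```

barplot stacks the rows of a matrix or two-way table by default; passing `beside = TRUE` would place the bars next to each other instead.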

What is the mean level of igf1 for boys and for girls aged 15 to 16? Use the function tapply.

##        1        2 
## 486.2333 559.0357
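As a sketch of the pattern, tapply applies a function to one vector within the groups defined by another. The data frame `d` and its columns are made up for illustration, and whether the age bounds should be inclusive is an assumption.

```r
# Hypothetical measurements (not the juul data)
d <- data.frame(
  sex   = c(1, 1, 2, 2, 1, 2),
  age   = c(15.2, 15.8, 15.5, 16.0, 12.0, 17.5),
  value = c(400, 500, 450, 550, 300, NA)
)

# Keep only rows in the age window (ASSUMPTION: both bounds inclusive);
# which() drops rows where the comparison is NA
in_window <- d[which(d$age >= 15 & d$age <= 16), ]

# Mean of value per level of sex; na.rm = TRUE is passed on to mean()
means <- tapply(in_window$value, in_window$sex, mean, na.rm = TRUE)
means
```

The result is a named vector with one element per group, here named "1" and "2".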

EXE 2.3

Z-score

Create a function that calculates the z-score of a variable. Check with scale. Here is the summary of the scale function:

##        V1         
##  Min.   :-1.8427  
##  1st Qu.:-0.8064  
##  Median :-0.1559  
##  Mean   : 0.0000  
##  3rd Qu.: 0.7167  
##  Max.   : 3.3609  
##  NA's   :321
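A z-score subtracts the mean and divides by the standard deviation. One possible sketch on a toy vector follows; the function name `z_score` is made up for illustration, and missing values are ignored when computing the mean and standard deviation.

```r
# Hypothetical z-score helper; the name z_score is made up for illustration
z_score <- function(x) {
  # Subtract the mean and divide by the standard deviation, ignoring NAs
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

v <- c(2, 4, 4, 6, NA, 9)  # toy vector with one missing value
z <- z_score(v)

# scale() returns a one-column matrix; as.vector() makes it comparable
all.equal(z, as.vector(scale(v)))
```

The missing value stays missing in the result, which is why summary of the scaled variable still reports NA's.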

Standard deviation

Calculate the standard deviation of both age and igf1 in juul using a for loop. Give your objects names that differ from names already known to R (e.g. mean, sd).

## [1]  11.25288 171.03560
##       age      igf1 
##  11.25288 171.03560
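One way to structure such a loop, sketched on a toy data frame: pre-allocate a named result vector and fill it column by column. The data frame `d` and the name `col_sds` are hypothetical.

```r
# Toy data frame with hypothetical columns (not the juul data)
d <- data.frame(a = c(1, 2, 3, 4, NA), b = c(10, 20, 30, NA, 50))

# Pre-allocate a named result vector; avoid names such as sd or mean,
# which would mask the built-in functions
col_sds <- numeric(ncol(d))
names(col_sds) <- names(d)

# Loop over the column names and store each standard deviation
for (col_name in names(d)) {
  col_sds[col_name] <- sd(d[[col_name]], na.rm = TRUE)
}

col_sds
```

The same result can be cross-checked without a loop via `sapply(d, sd, na.rm = TRUE)`.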