EXE 3.1

Plot line

First, calculate the coefficients (slope beta and intercept) of the model in which the dependent variable horsepower (hp) is explained by the independent variable miles per gallon (mpg), using the dataset mtcars.

## [1] -8.829731
## [1] 324.0823
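One possible way to obtain these two numbers by hand is sketched below. It assumes the usual least-squares formulas for simple regression (slope = covariance of x and y divided by the variance of x; the intercept follows from the means). The variable names `x`, `y`, `beta` and `intercept` are illustrative choices, not prescribed by the exercise.

```r
# Simple linear regression of hp on mpg, computed by hand (a sketch).
x <- mtcars$mpg   # independent variable
y <- mtcars$hp    # dependent variable

# Slope: covariance of x and y divided by the variance of x.
beta <- cov(x, y) / var(x)

# Intercept: the fitted line passes through the point of means.
intercept <- mean(y) - beta * mean(x)

beta
intercept
```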

Plot the two variables. Add the line based on the estimates that you calculated above.
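A minimal sketch of the plotting step, using base graphics. The coefficients are recomputed here so that the block runs on its own; the axis labels are illustrative.

```r
# Recompute the hand-calculated coefficients so this block is self-contained.
x <- mtcars$mpg
y <- mtcars$hp
beta <- cov(x, y) / var(x)
intercept <- mean(y) - beta * mean(x)

# Scatter plot of the two variables.
plot(x, y, xlab = "Miles per gallon (mpg)", ylab = "Horsepower (hp)")

# Add the line with the estimated intercept (a) and slope (b).
abline(a = intercept, b = beta)
```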

Assess fit

Calculate the fit (R-squared) of the model above. Check your result with the lm function; see ?lm for details.

## [1] 0.6024373
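The fit could be computed as follows. This is a sketch that assumes the fit is measured by R-squared, that is, one minus the ratio of the residual sum of squares to the total sum of squares. The names `ss_res`, `ss_tot` and `r_squared` are illustrative.

```r
# Hand-calculated coefficients, recomputed so this block is self-contained.
x <- mtcars$mpg
y <- mtcars$hp
beta <- cov(x, y) / var(x)
intercept <- mean(y) - beta * mean(x)

# Fitted values and the two sums of squares.
fitted_hp <- intercept + beta * x
ss_res <- sum((y - fitted_hp)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)     # total sum of squares

r_squared <- 1 - ss_res / ss_tot
r_squared

# Check against lm.
fit <- lm(hp ~ mpg, data = mtcars)
summary(fit)$r.squared
```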

EXE 3.2

Dummy variables

Dummy variables have two levels, like gender. To test differences between men and women in an interval variable, you can use either a linear model or a t-test. In this exercise you will do both to see how the two solutions compare. Try to interpret the coefficients as discussed during the lecture. Use the dataset juul (see last lecture). First conduct a linear regression to explain igf1 (insulin-like growth factor) by sex.

## 
## Call:
## lm(formula = igf1 ~ factor(sex), data = juul)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -343.10 -135.10  -23.89  117.90  604.11 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   310.887      7.722  40.262  < 2e-16 ***
## factor(sex)2   57.214     10.605   5.395 8.54e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 168.5 on 1011 degrees of freedom
##   (326 observations deleted due to missingness)
## Multiple R-squared:  0.02798,    Adjusted R-squared:  0.02702 
## F-statistic:  29.1 on 1 and 1011 DF,  p-value: 8.54e-08

Then compare the result with t.test. Use var.equal=TRUE so that the test assumes equal variances in both groups, as the linear model does.

## 
##  Two Sample t-test
## 
## data:  juul$igf1 by juul$sex
## t = -5.3949, df = 1011, p-value = 8.54e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -78.02476 -36.40325
## sample estimates:
## mean in group 1 mean in group 2 
##        310.8866        368.1006
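The comparison between the two approaches can be sketched as below. It assumes that juul is the dataset from the ISwR package, which needs to be installed. The object names `fit`, `tt` and `group_means` are illustrative. The intercept should match the mean of group 1, the coefficient for sex should match the difference between the two group means, and the two t values should agree apart from their sign.

```r
# Assumption: juul comes from the ISwR package (must be installed).
data(juul, package = "ISwR")

# Linear model with sex as a dummy variable, and the equal-variance t-test.
fit <- lm(igf1 ~ factor(sex), data = juul)
tt  <- t.test(igf1 ~ sex, data = juul, var.equal = TRUE)

# Mean igf1 per group, ignoring missing values.
group_means <- tapply(juul$igf1, juul$sex, mean, na.rm = TRUE)

# Intercept versus mean of group 1.
c(intercept = unname(coef(fit)[1]), mean_group_1 = unname(group_means["1"]))

# Dummy coefficient versus difference in group means.
c(coefficient = unname(coef(fit)[2]),
  difference  = unname(group_means["2"] - group_means["1"]))

# t value from lm versus t statistic from t.test (opposite sign).
c(lm = summary(fit)$coefficients[2, "t value"], t_test = unname(tt$statistic))
```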

Interval variable

Regress igf1 on age (both interval variables), plot the variables, and add the regression line. What do you see?

## 
## Call:
## lm(formula = igf1 ~ age, data = juul)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -334.78 -134.38  -25.29  121.52  570.16 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 360.8602     9.0219  39.998  < 2e-16 ***
## age          -1.1918     0.4408  -2.704  0.00697 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 170.3 on 1011 degrees of freedom
##   (326 observations deleted due to missingness)
## Multiple R-squared:  0.00718,    Adjusted R-squared:  0.006198 
## F-statistic: 7.311 on 1 and 1011 DF,  p-value: 0.006967
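A minimal sketch of the model and the plot, again assuming that juul is taken from the ISwR package. The object name `fit_age` and the axis labels are illustrative. `abline` accepts a fitted lm object with a single predictor and draws its regression line.

```r
# Assumption: juul comes from the ISwR package (must be installed).
data(juul, package = "ISwR")

# Linear regression of igf1 on age.
fit_age <- lm(igf1 ~ age, data = juul)
coef(fit_age)

# Scatter plot of the two variables with the regression line added.
plot(juul$age, juul$igf1, xlab = "age", ylab = "igf1")
abline(fit_age)
```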