list.of.packages <- c("foreign", "kableExtra", "ggplot2", "knitr","formattable","dplyr", "rockchalk")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only=TRUE)
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] TRUE
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] TRUE
##
## [[7]]
## [1] TRUE
Here we consider a subset of data on birth outcomes, provided by Abrevaya (2006), which is analyzed in chapter 3. The data were derived from birth certificates by the U.S. National Center for Health Statistics. The variables in smoking.dta that we will consider here are:
momid: mother identifier
idx: chronological numbering of multiple children to the same mother in the database (1: first child; 2: second child; 3: third child)
birwt: birthweight (in grams)
male: dummy variable for baby being male (1: male; 0: female)
hsgrad: dummy variable for mother having graduated from high school
somecoll: dummy variable for mother having some college education (but no degree)
collgrad: dummy variable for mother having graduated from college
black: dummy variable for mother being black (1: black; 0: white)
smoking.data <- read.dta("http://www.stata-press.com/data/mlmus3/smoking.dta")
head(smoking.data)
## momid idx stateres mage meduc mplbir nlbnl gestat birwt cigs smoke
## 1 14 1 AL 16 10 VA 0 24 2790 0 Nonsmoker
## 2 14 2 AL 17 10 VA 1 42 2693 0 Nonsmoker
## 3 14 3 AL 20 10 VA 2 39 3600 0 Nonsmoker
## 4 25 1 AL 23 11 NJ 2 41 2778 0 Nonsmoker
## 5 25 2 AL 24 11 NJ 3 37 2835 0 Nonsmoker
## 6 25 3 AL 26 11 NJ 4 40 3402 0 Nonsmoker
## male year married hsgrad somecoll collgrad magesq black kessner2
## 1 Female 0 0 0 0 0 256 Black 0
## 2 Female 1 0 0 0 0 289 Black 0
## 3 Female 4 0 0 0 0 400 Black 0
## 4 Male 2 0 0 0 0 529 Black 1
## 5 Male 3 0 0 0 0 576 Black 1
## 6 Male 5 0 0 0 0 676 Black 1
## kessner3 novisit pretri2 pretri3
## 1 1 0 0 1
## 2 1 0 0 1
## 3 0 0 0 0
## 4 0 0 1 0
## 5 0 0 1 0
## 6 0 0 1 0
1. Keep only the data on each mother’s first birth, that is, where idx is 1.
smoking.data2 <- smoking.data[which(smoking.data$idx==1),]
dim(smoking.data2)
## [1] 3978 24
2. Create the variable education, taking the value 1 if hsgrad is 1, the value 2 if somecoll is 1, the value 3 if collgrad is 1, and the value 0 otherwise.
smoking.data2$education<- factor(ifelse(smoking.data2$hsgrad==1,1, ifelse(smoking.data2$somecoll==1, 2, ifelse(smoking.data2$collgrad==1,3,0))), labels=c("Other","HSGrad","SomeColl","ColGrad"))
##If you are like me and don't like the tedious ifelse statements above, use this code instead:
##smoking.data2$education <- smoking.data2$hsgrad*1+smoking.data2$somecoll*2+smoking.data2$collgrad*3
with(smoking.data2, table(male,smoke,black,education))
## , , black = White, education = Other
##
## smoke
## male Nonsmoker Smoker
## Female 99 95
## Male 115 78
##
## , , black = Black, education = Other
##
## smoke
## male Nonsmoker Smoker
## Female 26 12
## Male 23 15
##
## , , black = White, education = HSGrad
##
## smoke
## male Nonsmoker Smoker
## Female 394 100
## Male 436 98
##
## , , black = Black, education = HSGrad
##
## smoke
## male Nonsmoker Smoker
## Female 45 9
## Male 57 10
##
## , , black = White, education = SomeColl
##
## smoke
## male Nonsmoker Smoker
## Female 414 45
## Male 369 51
##
## , , black = Black, education = SomeColl
##
## smoke
## male Nonsmoker Smoker
## Female 33 4
## Male 17 3
##
## , , black = White, education = ColGrad
##
## smoke
## male Nonsmoker Smoker
## Female 662 18
## Male 699 21
##
## , , black = Black, education = ColGrad
##
## smoke
## male Nonsmoker Smoker
## Female 16 1
## Male 12 1
3. Produce a table of the means and standard deviations of birwt for all the subgroups defined by smoke, education, male, and black. Hint: Use the table command with smoke as rowvar, education as colvar, and male and black as superrowvars; see help table.
##############################################################
##Either do the mean and SD in two separate steps like this:##
##############################################################
# means <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),mean)
# SDs <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),sd)
# descriptives <- cbind(means, SDs)
############################################################
#Or you may just do both steps at once like the following:##
############################################################
descriptives <- do.call(data.frame, aggregate(birwt ~ smoke+education+male+black, data = smoking.data2, FUN = function(x) c(mn = mean(x), sd = sd(x))))
#Now we populate the 'xx' vector from the 'descriptives' data frame
q=0
xx<-c()
for(i in 1:32){
xx<- c(xx,descriptives[i,5])
xx <- c(xx,descriptives[i,6])
q=q+2
}
#create a matrix with 4 rows and populate it by row.
m.sd <- matrix(data=round(xx,digits = 2), ncol = 16,nrow = 4, byrow = TRUE)
ddd<-data.frame(m.sd)
rownames(ddd) <- c("Female", "Male", "Female2", "Male2")
ddd %>%
kable("html", escape=FALSE)%>%
kable_styling(c("striped", "bordered", "hover"))%>%
add_header_above(c(" ", "Non-Smoker" = 2, "Smoker"=2, "Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2))%>%
add_header_above(c(" ", "Other" = 4, "High School Graduate" = 4, "Some Collge" = 4, "College Graduate"=4)) %>%
add_header_above(c(" ", "Education" = 16))%>%
group_rows("White", 1,2)%>%
group_rows("Black",3,4)
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 | X16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
White | ||||||||||||||||
Female | 3307.70 | 455.65 | 3074.78 | 551.68 | 3443.77 | 473.14 | 3157.85 | 506.02 | 3443.65 | 467.92 | 3266.11 | 451.56 | 3462.05 | 483.49 | 3291.39 | 491.03 |
Male | 3441.71 | 551.25 | 3162.87 | 427.40 | 3540.63 | 496.32 | 3307.01 | 492.60 | 3536.32 | 482.28 | 3329.67 | 511.32 | 3580.05 | 497.23 | 3272.19 | 460.49 |
Black | ||||||||||||||||
Female2 | 3146.62 | 410.12 | 3088.33 | 621.93 | 3134.96 | 530.18 | 3081.22 | 325.96 | 3274.58 | 483.86 | 2626.25 | 405.43 | 3232.06 | 462.91 | 3289.00 | NA |
Male2 | 3178.39 | 466.66 | 2922.80 | 570.12 | 3184.26 | 523.40 | 2829.20 | 505.37 | 3369.71 | 268.79 | 3873.33 | 626.44 | 3393.50 | 364.43 | 2778.00 | NA |
4. Produce box plots for the same groups. Hint: Use the asyvars option and the over() option for each grouping variable except the last (starting with over(education)), and use the by() option for the last grouping variable. Use the nooutsides option to suppress the display of outliers, making the graph easier to interpret. What do you observe?
###################################
#https://www.r-bloggers.com/box-plot-with-r-tutorial/
boxplot(birwt~smoke+male+black+education, outline=FALSE, las=2, par(mar = c(10, 4, 2, 2)+ 0.1,cex.axis=.7), at =c(1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,19,20,21,22,23,24,25,26,28,29,30,31,32,33,34,35), col=c(rep("chartreuse",8),rep("gold",8),rep("skyblue",8),rep("thistle1",8)),data = smoking.data2)
legend('toprigh',legend= c("Other","HS","CLG","ALUM"), lty=1, col=c("chartreuse", "gold","skyblue","thistle1"), bty='n', cex=.75)
ggplot(aes(y = birwt, x = smoke, fill = education), data = smoking.data2) + geom_boxplot(outlier.color = NA)+ facet_wrap(~black+male)
5. Regress birwt on smoke and interpret the estimated regression coefficients.
reg.model1 <- lm(birwt ~ smoke, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model1), type = "html", browse = FALSE)
You asked for html output, but set browser = FALSE.As a result, we suspect you know what you are doing.Inspect the output object, write it to a file with cat, as in:myreg <- outreg(modelList = list(Linear Model
= reg.model1), type = “html”, browser = FALSE)
cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.
m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)
Linear Model | |||||||||||||||||||||
Estimate | |||||||||||||||||||||
(S.E.) | |||||||||||||||||||||
(Intercept) | 3477.902*** | ||||||||||||||||||||
( 8.495) | |||||||||||||||||||||
smokeSmoker | -289.763*** | ||||||||||||||||||||
(22.620) | |||||||||||||||||||||
N | 3978 | ||||||||||||||||||||
RMSE | 496.561 | ||||||||||||||||||||
R2 | 0.040 | ||||||||||||||||||||
|
6. Add mage, male, black, hsgrad, somecoll, and collgrad to the model in step 5.
reg.model2 <- lm(birwt ~ smoke+mage+male+black+education, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model2), type = "html", browse = FALSE)
You asked for html output, but set browser = FALSE.As a result, we suspect you know what you are doing.Inspect the output object, write it to a file with cat, as in:myreg <- outreg(modelList = list(Linear Model
= reg.model2), type = “html”, browser = FALSE)
cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.
m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)
Linear Model | |||||||||||||||||||||
Estimate | |||||||||||||||||||||
(S.E.) | |||||||||||||||||||||
(Intercept) | 3214.389*** | ||||||||||||||||||||
(45.941) | |||||||||||||||||||||
smokeSmoker | -239.292*** | ||||||||||||||||||||
(23.758) | |||||||||||||||||||||
mage | 5.133** | ||||||||||||||||||||
( 1.713) | |||||||||||||||||||||
maleMale | 102.218*** | ||||||||||||||||||||
(15.463) | |||||||||||||||||||||
blackBlack | -237.016*** | ||||||||||||||||||||
(30.597) | |||||||||||||||||||||
educationHSGrad | 81.563** | ||||||||||||||||||||
(28.275) | |||||||||||||||||||||
educationSomeColl | 92.746** | ||||||||||||||||||||
(30.559) | |||||||||||||||||||||
educationColGrad | 100.044** | ||||||||||||||||||||
(31.742) | |||||||||||||||||||||
N | 3978 | ||||||||||||||||||||
RMSE | 487.172 | ||||||||||||||||||||
R2 | 0.077 | ||||||||||||||||||||
adj R2 | 0.075 | ||||||||||||||||||||
|
7. Interpret each of the estimated regression coefficients from step 6.
8. Discuss the difference in the estimated coefficient of smoke from steps 5 and6.
9. Use the margins command to produce a table of estimated population means for girls born to white mothers of average age by smoking status and education. (This requires you to run the regress command again with the factor variables i.smoke and i.education.)
10. Extend the model from step 6 to investigate whether the adjusted difference in mean birthweight between boys and girls differs between black and white mothers. Is there any evidence at the 5% level that it does?