Abdulrazzag M. Falah

Data Scientist & Education Specialist.




POLS 909 - Assignment 1





Checking Missing Packages and Install and/or Load them

list.of.packages <- c("foreign", "kableExtra", "ggplot2", "knitr","formattable","dplyr", "rockchalk")

new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only=TRUE)
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] TRUE
## 
## [[6]]
## [1] TRUE
## 
## [[7]]
## [1] TRUE

Study variables

Smoking-and-birthweight data

Here we consider a subset of data on birth outcomes, provided by Abrevaya (2006), which is analyzed in chapter 3. The data were derived from birth certificates by the U.S. National Center for Health Statistics. The variables in smoking.dta that we will consider here are:

smoking.data <- read.dta("http://www.stata-press.com/data/mlmus3/smoking.dta")
head(smoking.data)
##   momid idx stateres mage meduc mplbir nlbnl gestat birwt cigs     smoke
## 1    14   1       AL   16    10     VA     0     24  2790    0 Nonsmoker
## 2    14   2       AL   17    10     VA     1     42  2693    0 Nonsmoker
## 3    14   3       AL   20    10     VA     2     39  3600    0 Nonsmoker
## 4    25   1       AL   23    11     NJ     2     41  2778    0 Nonsmoker
## 5    25   2       AL   24    11     NJ     3     37  2835    0 Nonsmoker
## 6    25   3       AL   26    11     NJ     4     40  3402    0 Nonsmoker
##     male year married hsgrad somecoll collgrad magesq black kessner2
## 1 Female    0       0      0        0        0    256 Black        0
## 2 Female    1       0      0        0        0    289 Black        0
## 3 Female    4       0      0        0        0    400 Black        0
## 4   Male    2       0      0        0        0    529 Black        1
## 5   Male    3       0      0        0        0    576 Black        1
## 6   Male    5       0      0        0        0    676 Black        1
##   kessner3 novisit pretri2 pretri3
## 1        1       0       0       1
## 2        1       0       0       1
## 3        0       0       0       0
## 4        0       0       1       0
## 5        0       0       1       0
## 6        0       0       1       0

Question One

1. Keep only the data on each mother’s first birth, that is, where idx is 1.

smoking.data2 <- smoking.data[which(smoking.data$idx==1),]
dim(smoking.data2)
## [1] 3978   24

Question Two

2. Create the variable education, taking the value 1 if hsgrad is 1, the value 2 if somecoll is 1, the value 3 if collgrad is 1, and the value 0 otherwise.

smoking.data2$education<- factor(ifelse(smoking.data2$hsgrad==1,1, ifelse(smoking.data2$somecoll==1, 2, ifelse(smoking.data2$collgrad==1,3,0))), labels=c("Other","HSGrad","SomeColl","ColGrad")) 

##If you are like me and don't like the tedious ifelse statements above, use this code instead:
##smoking.data2$education <- smoking.data2$hsgrad*1+smoking.data2$somecoll*2+smoking.data2$collgrad*3

with(smoking.data2, table(male,smoke,black,education))
## , , black = White, education = Other
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        99     95
##   Male         115     78
## 
## , , black = Black, education = Other
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        26     12
##   Male          23     15
## 
## , , black = White, education = HSGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       394    100
##   Male         436     98
## 
## , , black = Black, education = HSGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        45      9
##   Male          57     10
## 
## , , black = White, education = SomeColl
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       414     45
##   Male         369     51
## 
## , , black = Black, education = SomeColl
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        33      4
##   Male          17      3
## 
## , , black = White, education = ColGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       662     18
##   Male         699     21
## 
## , , black = Black, education = ColGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        16      1
##   Male          12      1

Question Three

3. Produce a table of the means and standard deviations of birwt for all the subgroups defined by smoke, education, male, and black. Hint: Use the table command with smoke as rowvar, education as colvar, and male and black as superrowvars; see help table.

##############################################################
##Either do the mean and SD in two separate steps like this:##
##############################################################

# means <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),mean)
# SDs <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),sd)
# descriptives <- cbind(means, SDs)

############################################################
#Or you may just do both steps at once like the following:##
############################################################

descriptives <- do.call(data.frame, aggregate(birwt ~ smoke+education+male+black, data = smoking.data2, FUN = function(x) c(mn = mean(x), sd = sd(x))))

#Now we populate the 'xx' vector from the 'descriptives' data frame
q=0
xx<-c()
for(i in 1:32){
  xx<- c(xx,descriptives[i,5])
  xx <- c(xx,descriptives[i,6])
  q=q+2
}

#create a matrix with 4 rows and populate it by row.
m.sd <- matrix(data=round(xx,digits = 2), ncol = 16,nrow = 4, byrow = TRUE)

ddd<-data.frame(m.sd)

rownames(ddd) <- c("Female", "Male", "Female2", "Male2")


ddd %>%
kable("html", escape=FALSE)%>%
  kable_styling(c("striped", "bordered", "hover"))%>%
  add_header_above(c(" ", "Non-Smoker" = 2, "Smoker"=2, "Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2))%>%
  add_header_above(c(" ", "Other" = 4, "High School Graduate" = 4, "Some Collge" = 4, "College Graduate"=4)) %>%
  add_header_above(c(" ", "Education" = 16))%>%
  group_rows("White", 1,2)%>%
  group_rows("Black",3,4)
Education
Other
High School Graduate
Some Collge
College Graduate
Non-Smoker
Smoker
Non-Smoker
Smoker
Non-Smoker
Smoker
Non-Smoker
Smoker
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16
White
Female 3307.70 455.65 3074.78 551.68 3443.77 473.14 3157.85 506.02 3443.65 467.92 3266.11 451.56 3462.05 483.49 3291.39 491.03
Male 3441.71 551.25 3162.87 427.40 3540.63 496.32 3307.01 492.60 3536.32 482.28 3329.67 511.32 3580.05 497.23 3272.19 460.49
Black
Female2 3146.62 410.12 3088.33 621.93 3134.96 530.18 3081.22 325.96 3274.58 483.86 2626.25 405.43 3232.06 462.91 3289.00 NA
Male2 3178.39 466.66 2922.80 570.12 3184.26 523.40 2829.20 505.37 3369.71 268.79 3873.33 626.44 3393.50 364.43 2778.00 NA

Question Four

4. Produce box plots for the same groups. Hint: Use the asyvars option and the over() option for each grouping variable except the last (starting with over(education)), and use the by() option for the last grouping variable. Use the nooutsides option to suppress the display of outliers, making the graph easier to interpret. What do you observe?

###################################
#https://www.r-bloggers.com/box-plot-with-r-tutorial/
boxplot(birwt~smoke+male+black+education, outline=FALSE, las=2, par(mar = c(10, 4, 2, 2)+ 0.1,cex.axis=.7), at =c(1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,19,20,21,22,23,24,25,26,28,29,30,31,32,33,34,35), col=c(rep("chartreuse",8),rep("gold",8),rep("skyblue",8),rep("thistle1",8)),data = smoking.data2)
legend('toprigh',legend= c("Other","HS","CLG","ALUM"), lty=1, col=c("chartreuse", "gold","skyblue","thistle1"), bty='n', cex=.75)

ggplot(aes(y = birwt, x = smoke, fill = education), data = smoking.data2) + geom_boxplot(outlier.color = NA)+ facet_wrap(~black+male)

Question Five

5. Regress birwt on smoke and interpret the estimated regression coefficients.

reg.model1 <- lm(birwt ~ smoke, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model1), type = "html", browse  = FALSE)

You asked for html output, but set browser = FALSE.As a result, we suspect you know what you are doing.Inspect the output object, write it to a file with cat, as in:myreg <- outreg(modelList = list(Linear Model = reg.model1), type = “html”, browser = FALSE)

cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.

m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)
 
Linear Model
Estimate
   (S.E.)
(Intercept) 3477.902***
( 8.495)
smokeSmoker -289.763***
(22.620)
 
N 3978
RMSE 496.561
R2 0.040
 
  • p ≤0.05** p ≤0.01*** p ≤0.001

Question Six

6. Add mage, male, black, hsgrad, somecoll, and collgrad to the model in step 5.

reg.model2 <- lm(birwt ~ smoke+mage+male+black+education, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model2), type = "html", browse  = FALSE)

You asked for html output, but set browser = FALSE.As a result, we suspect you know what you are doing.Inspect the output object, write it to a file with cat, as in:myreg <- outreg(modelList = list(Linear Model = reg.model2), type = “html”, browser = FALSE)

cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.

m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)
 
Linear Model
Estimate
   (S.E.)
(Intercept) 3214.389***
(45.941)
smokeSmoker -239.292***
(23.758)
mage 5.133**
( 1.713)
maleMale 102.218***
(15.463)
blackBlack -237.016***
(30.597)
educationHSGrad 81.563**
(28.275)
educationSomeColl 92.746**
(30.559)
educationColGrad 100.044**
(31.742)
 
N 3978
RMSE 487.172
R2 0.077
adj R2 0.075
 
  • p ≤0.05** p ≤0.01*** p ≤0.001

Question Seven

7. Interpret each of the estimated regression coefficients from step 6.

Question Eight

8. Discuss the difference in the estimated coefficient of smoke from steps 5 and6.

Question Nine

9. Use the margins command to produce a table of estimated population means for girls born to white mothers of average age by smoking status and education. (This requires you to run the regress command again with the factor variables i.smoke and i.education.)

Question Ten

10. Extend the model from step 6 to investigate whether the adjusted difference in mean birthweight between boys and girls differs between black and white mothers. Is there any evidence at the 5% level that it does?