Smoking-and-birthweight data

Here we consider a subset of data on birth outcomes, provided by Abrevaya (2006), which is analyzed in chapter 3. The data were derived from birth certificates by the U.S. National Center for Health Statistics. The variables in smoking.dta that we will consider here are:

momid: mother identifier
idx: chronological numbering of multiple children to the same mother in the database (1: first child; 2: second child; 3: third child)
birwt: birthweight (in grams)
mage: mother’s age at the birth of the child (in years)
smoke: dummy variable for mother smoking during pregnancy (1: smoking; 0: not smoking)
male: dummy variable for baby being male (1: male; 0: female)
hsgrad: dummy variable for mother having graduated from high school
somecoll: dummy variable for mother having some college education (but no degree)
collgrad: dummy variable for mother having graduated from college
black: dummy variable for mother being black (1: black; 0: white)

smoking.data <- read.dta("http://www.stata-press.com/data/mlmus3/smoking.dta")
head(smoking.data)

##   momid idx stateres mage meduc mplbir nlbnl gestat birwt cigs     smoke
## 1    14   1       AL   16    10     VA     0     24  2790    0 Nonsmoker
## 2    14   2       AL   17    10     VA     1     42  2693    0 Nonsmoker
## 3    14   3       AL   20    10     VA     2     39  3600    0 Nonsmoker
## 4    25   1       AL   23    11     NJ     2     41  2778    0 Nonsmoker
## 5    25   2       AL   24    11     NJ     3     37  2835    0 Nonsmoker
## 6    25   3       AL   26    11     NJ     4     40  3402    0 Nonsmoker
##     male year married hsgrad somecoll collgrad magesq black kessner2
## 1 Female    0       0      0        0        0    256 Black        0
## 2 Female    1       0      0        0        0    289 Black        0
## 3 Female    4       0      0        0        0    400 Black        0
## 4   Male    2       0      0        0        0    529 Black        1
## 5   Male    3       0      0        0        0    576 Black        1
## 6   Male    5       0      0        0        0    676 Black        1
##   kessner3 novisit pretri2 pretri3
## 1        1       0       0       1
## 2        1       0       0       1
## 3        0       0       0       0
## 4        0       0       1       0
## 5        0       0       1       0
## 6        0       0       1       0

Question One

1. Keep only the data on each mother’s first birth, that is, where idx is 1.

smoking.data2 <- smoking.data[which(smoking.data$idx==1),]
dim(smoking.data2)

## [1] 3978   24

Question Two

2. Create the variable education, taking the value 1 if hsgrad is 1, the value 2 if somecoll is 1, the value 3 if collgrad is 1, and the value 0 otherwise.

smoking.data2$education<- factor(ifelse(smoking.data2$hsgrad==1,1, ifelse(smoking.data2$somecoll==1, 2, ifelse(smoking.data2$collgrad==1,3,0))), labels=c("Other","HSGrad","SomeColl","ColGrad")) 

##If you are like me and don't like the tedious ifelse statements above, use this code instead:
##smoking.data2$education <- smoking.data2$hsgrad*1+smoking.data2$somecoll*2+smoking.data2$collgrad*3

with(smoking.data2, table(male,smoke,black,education))

## , , black = White, education = Other
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        99     95
##   Male         115     78
## 
## , , black = Black, education = Other
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        26     12
##   Male          23     15
## 
## , , black = White, education = HSGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       394    100
##   Male         436     98
## 
## , , black = Black, education = HSGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        45      9
##   Male          57     10
## 
## , , black = White, education = SomeColl
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       414     45
##   Male         369     51
## 
## , , black = Black, education = SomeColl
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        33      4
##   Male          17      3
## 
## , , black = White, education = ColGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female       662     18
##   Male         699     21
## 
## , , black = Black, education = ColGrad
## 
##         smoke
## male     Nonsmoker Smoker
##   Female        16      1
##   Male          12      1

Question Three

3. Produce a table of the means and standard deviations of birwt for all the subgroups defined by smoke, education, male, and black. Hint: Use the table command with smoke as rowvar, education as colvar, and male and black as superrowvars; see help table.

##############################################################
##Either do the mean and SD in two separate steps like this:##
##############################################################

# means <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),mean)
# SDs <- aggregate(smoking.data2$birwt, list(smoking.data2$smoke, smoking.data2$education, smoking.data2$male, smoking.data2$black),sd)
# descriptives <- cbind(means, SDs)

############################################################
#Or you may just do both steps at once like the following:##
############################################################

descriptives <- do.call(data.frame, aggregate(birwt ~ smoke+education+male+black, data = smoking.data2, FUN = function(x) c(mn = mean(x), sd = sd(x))))

#Now we populate the 'xx' vector from the 'descriptives' data frame
q=0
xx<-c()
for(i in 1:32){
  xx<- c(xx,descriptives[i,5])
  xx <- c(xx,descriptives[i,6])
  q=q+2
}

#create a matrix with 4 rows and populate it by row.
m.sd <- matrix(data=round(xx,digits = 2), ncol = 16,nrow = 4, byrow = TRUE)

ddd<-data.frame(m.sd)

rownames(ddd) <- c("Female", "Male", "Female2", "Male2")


ddd %>%
kable("html", escape=FALSE)%>%
  kable_styling(c("striped", "bordered", "hover"))%>%
  add_header_above(c(" ", "Non-Smoker" = 2, "Smoker"=2, "Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2,"Non-Smoker" = 2, "Smoker"=2))%>%
  add_header_above(c(" ", "Other" = 4, "High School Graduate" = 4, "Some Collge" = 4, "College Graduate"=4)) %>%
  add_header_above(c(" ", "Education" = 16))%>%
  group_rows("White", 1,2)%>%
  group_rows("Black",3,4)

	Education
	Other				High School Graduate				Some Collge				College Graduate
	Non-Smoker		Smoker		Non-Smoker		Smoker		Non-Smoker		Smoker		Non-Smoker		Smoker
	X1	X2	X3	X4	X5	X6	X7	X8	X9	X10	X11	X12	X13	X14	X15	X16
White
Female	3307.70	455.65	3074.78	551.68	3443.77	473.14	3157.85	506.02	3443.65	467.92	3266.11	451.56	3462.05	483.49	3291.39	491.03
Male	3441.71	551.25	3162.87	427.40	3540.63	496.32	3307.01	492.60	3536.32	482.28	3329.67	511.32	3580.05	497.23	3272.19	460.49
Black
Female2	3146.62	410.12	3088.33	621.93	3134.96	530.18	3081.22	325.96	3274.58	483.86	2626.25	405.43	3232.06	462.91	3289.00	NA
Male2	3178.39	466.66	2922.80	570.12	3184.26	523.40	2829.20	505.37	3369.71	268.79	3873.33	626.44	3393.50	364.43	2778.00	NA

Question Four

4. Produce box plots for the same groups. Hint: Use the asyvars option and the over() option for each grouping variable except the last (starting with over(education)), and use the by() option for the last grouping variable. Use the nooutsides option to suppress the display of outliers, making the graph easier to interpret. What do you observe?

###################################
#https://www.r-bloggers.com/box-plot-with-r-tutorial/
boxplot(birwt~smoke+male+black+education, outline=FALSE, las=2, par(mar = c(10, 4, 2, 2)+ 0.1,cex.axis=.7), at =c(1,2,3,4,5,6,7,8,10,11,12,13,14,15,16,17,19,20,21,22,23,24,25,26,28,29,30,31,32,33,34,35), col=c(rep("chartreuse",8),rep("gold",8),rep("skyblue",8),rep("thistle1",8)),data = smoking.data2)
legend('toprigh',legend= c("Other","HS","CLG","ALUM"), lty=1, col=c("chartreuse", "gold","skyblue","thistle1"), bty='n', cex=.75)

ggplot(aes(y = birwt, x = smoke, fill = education), data = smoking.data2) + geom_boxplot(outlier.color = NA)+ facet_wrap(~black+male)

Question Five

5. Regress birwt on smoke and interpret the estimated regression coefficients.

reg.model1 <- lm(birwt ~ smoke, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model1), type = "html", browse  = FALSE)

You asked for html output, but set browser = FALSE.As a result, we suspect you know what you are doing.Inspect the output object, write it to a file with cat, as in:myreg <- outreg(modelList = list(Linear Model = reg.model1), type = “html”, browser = FALSE)

cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.

m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)

Linear Model

Estimate

(S.E.)

(Intercept)

3477.902***

( 8.495)

smokeSmoker

-289.763***

(22.620)

3978

RMSE

496.561

R²

0.040

p ≤0.05** p ≤0.01*** p ≤0.001

Question Six

6. Add mage, male, black, hsgrad, somecoll, and collgrad to the model in step 5.

reg.model2 <- lm(birwt ~ smoke+mage+male+black+education, data = smoking.data2)
m0.or <- outreg(list("Linear Model" = reg.model2), type = "html", browse  = FALSE)

cat(myreg, file = “reg.html”) Then open reg.html in a word processor or web browser.

m0.or <- gsub("[ ]{2,}", " ", m0.or)
cat(m0.or)

Linear Model

Estimate

(S.E.)

(Intercept)

3214.389***

(45.941)

smokeSmoker

-239.292***

(23.758)

mage

5.133**

( 1.713)

maleMale

102.218***

(15.463)

blackBlack

-237.016***

(30.597)

educationHSGrad

81.563**

(28.275)

educationSomeColl

92.746**

(30.559)

educationColGrad

100.044**

(31.742)

3978

RMSE

487.172

R²

0.077

adj R²

0.075

p ≤0.05** p ≤0.01*** p ≤0.001

Question Seven

7. Interpret each of the estimated regression coefficients from step 6.

Question Eight

8. Discuss the difference in the estimated coefficient of smoke from steps 5 and6.

Question Nine

9. Use the margins command to produce a table of estimated population means for girls born to white mothers of average age by smoking status and education. (This requires you to run the regress command again with the factor variables i.smoke and i.education.)

Question Ten

10. Extend the model from step 6 to investigate whether the adjusted difference in mean birthweight between boys and girls differs between black and white mothers. Is there any evidence at the 5% level that it does?

Abdulrazzag M. Falah

Data Scientist & Education Specialist.

POLS 909 - Assignment 1