Counting number of observations per household

Lilly Tee

Join Date: May 2022

Posts: 37
#16

05 Aug 2023, 14:44

I have also just realised that the code in #8 will also keep grandparents in the dataset, since the parents' parents could also be in the dataset. Is there any way of solving this?
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#17

05 Aug 2023, 15:58

I believe the following code distinguishes grandparents as well as parents. (Every grandparent is, of course, also a parent, but not vice versa.) However, as best I can tell, in the example data you gave, there are no instances of grandparents in the household. So I have not been able to properly test this code. If it produces incorrect results, please post back with new example data that does include grandparents but fails to correctly identify who is and is not a grandparent.

Code:

frame put g_hidp g_pno parent*_pno, into(parents)
frame parents {
    reshape long parent@_pno, i(g_hidp g_pno)
    drop if parent_pno == 0
    drop g_pno _j
    duplicates drop
    
    frlink 1:1 g_hidp parent_pno, frame(default g_hidp g_pno)
    frget parent*_pno, from(default)
    frame put g_hidp parent*pno, into(grandparents)
    frame grandparents {
        drop parent_pno
        gen `c(obs_t)' obs_no = _n
        reshape long parent@_pno, i(obs_no)
        drop _j obs_no
        drop if missing(parent_pno) | parent_pno == 0
        capture duplicates drop
        if c(rc) != 2000    { // UNEXPECTED ERROR
            display as error "Unexpected Error"
            exit c(rc)
        }
    }
    des
    frlink 1:1 g_hidp parent_pno, frame(grandparents)
    gen byte is_grandparent = !missing(grandparents)
    drop grandparents
    frame drop grandparents
}
frlink m:1 g_hidp g_pno, frame(parents g_hidp parent_pno)
frget is_grandparent, from(parents)
replace is_grandparent = 0 if missing(is_grandparent)
gen byte is_parent = !missing(parents)
drop parents
frame drop parents

Comment

Lilly Tee

Join Date: May 2022

Posts: 37
#18

06 Aug 2023, 04:44

I think I was able to solve it in an alternative way. In the original dataset of just individuals, using the individual identifier, I was able to see whether the parents and grandparents were in the same sample. If the parent_pno was identified then surely it would mean that the parents and their parents were in the same sample. So I deleted the observations where the parent_pno was identified in this dataset.

Then I merged with the child dataset as usual and re-ran the code from #8, and the results seemed to make sense when I browsed the dataset. I hope this makes sense.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#19

06 Aug 2023, 09:43

I don't quite follow what you are doing in #18, so I'm not going to offer an opinion on whether it is correct.

If you find that your method is not working, you can revert to what I show in #17--except that I noticed there is an error in it. It should be:

Code:

frame put g_hidp g_pno parent*_pno, into(parents)
frame parents {
    reshape long parent@_pno, i(g_hidp g_pno)
    drop if parent_pno == 0
    drop g_pno _j
    duplicates drop
    
    frlink 1:1 g_hidp parent_pno, frame(default g_hidp g_pno)
    frget parent*_pno, from(default)
    frame put g_hidp parent*pno, into(grandparents)
    frame grandparents {
        drop parent_pno
        gen `c(obs_t)' obs_no = _n
        reshape long parent@_pno, i(obs_no)
        drop _j obs_no
        drop if missing(parent_pno) | parent_pno == 0
        capture duplicates drop
        if !inlist(c(rc), 0, 2000)    { // UNEXPECTED ERROR
            display as error "Unexpected Error"
            exit c(rc)
        }
    }
    des
    frlink 1:1 g_hidp parent_pno, frame(grandparents)
    gen byte is_grandparent = !missing(grandparents)
    drop grandparents
    frame drop grandparents
}
frlink m:1 g_hidp g_pno, frame(parents g_hidp parent_pno)
frget is_grandparent, from(parents)
replace is_grandparent = 0 if missing(is_grandparent)
gen byte is_parent = !missing(parents)
drop parents
frame drop parents
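
Since the example data contained no grandparents, one way to check the code is a small made-up dataset that does. The sketch below is only an illustration: the variable names parent1_pno and parent2_pno are assumed to match the parent*_pno pattern, a value of 0 is assumed to mean "no such parent in the household", and the five observations are invented. In this toy data the grandparents frame ends up non-empty, so `duplicates drop` returns 0, which is the case the revised `inlist(c(rc), 0, 2000)` check allows.

```stata
* Toy data (hypothetical): household 1 has three generations, household 2 has two.
frames reset                      // start from a single empty frame named default
input g_hidp g_pno parent1_pno parent2_pno
1 1 0 0
1 2 1 0
1 3 2 0
2 1 0 0
2 2 1 0
end

* Code from #19, unchanged
frame put g_hidp g_pno parent*_pno, into(parents)
frame parents {
    reshape long parent@_pno, i(g_hidp g_pno)
    drop if parent_pno == 0
    drop g_pno _j
    duplicates drop

    frlink 1:1 g_hidp parent_pno, frame(default g_hidp g_pno)
    frget parent*_pno, from(default)
    frame put g_hidp parent*pno, into(grandparents)
    frame grandparents {
        drop parent_pno
        gen `c(obs_t)' obs_no = _n
        reshape long parent@_pno, i(obs_no)
        drop _j obs_no
        drop if missing(parent_pno) | parent_pno == 0
        capture duplicates drop
        if !inlist(c(rc), 0, 2000)    { // UNEXPECTED ERROR
            display as error "Unexpected Error"
            exit c(rc)
        }
    }
    des
    frlink 1:1 g_hidp parent_pno, frame(grandparents)
    gen byte is_grandparent = !missing(grandparents)
    drop grandparents
    frame drop grandparents
}
frlink m:1 g_hidp g_pno, frame(parents g_hidp parent_pno)
frget is_grandparent, from(parents)
replace is_grandparent = 0 if missing(is_grandparent)
gen byte is_parent = !missing(parents)
drop parents
frame drop parents

* Checks: only person 1 in household 1 is a grandparent;
* persons 1 and 2 in household 1 and person 1 in household 2 are parents.
list, sepby(g_hidp)
assert is_grandparent == (g_hidp == 1 & g_pno == 1)
assert is_parent == (g_pno == 1 | (g_hidp == 1 & g_pno == 2))
```

If either assert fails on a variant of this toy data, that variant would be a useful example to post back.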

Last edited by Clyde Schechter; 06 Aug 2023, 09:47.

Comment

Lilly Tee

Join Date: May 2022

Posts: 37
#20

07 Aug 2023, 06:39

Hi, it worked; thank you! I appreciate the guidance.

I am now considering running the regression; it would be interesting to see whether a policy that increases parental income has an impact on child health, for mothers and for fathers. So I would be running a diff-in-diff model, and I was wondering whether the following code would be correct for running the regression. Below is my regression model:

Where Post is a dummy for the time period after the policy, Treated is a dummy for those who receive the treatment, and X is a vector of parental controls such as age, gender, qualification, etc. Would the above regression be correct if I ran it separately for mothers and fathers, or would I also have to include the is_mother dummy and its interaction term in the regression? I also wondered whether I should include another vector of variables controlling for child characteristics such as age and gender.
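
Going by that description (Post, Treated, and the control vector X), the model is presumably the standard two-group, two-period difference-in-differences specification. The form below is written out as an assumption, since the equation itself is not shown in the post; the subscripts and coefficient names are illustrative.

```latex
\[
\text{ChildHealth}_{it} = \beta_0 + \beta_1\,\text{Post}_t + \beta_2\,\text{Treated}_i
  + \beta_3\,(\text{Post}_t \times \text{Treated}_i) + X_{it}'\gamma + \varepsilon_{it}
\]
```

Under this form, \(\beta_3\) is the diff-in-diff estimate of the policy effect.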

The code for the separate mother and father diff-in-diff regressions is:
regress child_health i.post##i.treat mother_age qualification marital_status, robust

regress child_health i.post##i.treat father_age qualification marital_status, robust

In addition, there may be more than one child in the household, and I would run this regression for each child. Do I have to cluster the standard errors, and if so, how would I do this?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#21

07 Aug 2023, 09:06

Is the intervention applied at the household level? Or is it applied at the level of the parent? Or is it applied at the level of the child?

You have multi-level data here: children within parents in a multiple-membership model, all of which is nested within households. It is unlikely that the model you show, which has no such nesting structure, will be adequate. To use the model you show you would have to collapse the data to provide only one observation per household at each time point. That would involve somehow calculating an aggregate child health outcome measure out of the individual child measures. And it would also only be workable if the intervention is applied at the household level. It would also mean giving up the parent-level covariates ("control variables.")
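
As a rough sketch of what respecting that nesting could look like, the code below simulates children within households over two periods and fits a random-intercept model with `mixed`. Everything in it is an assumption for illustration: the data are simulated, child_id is a hypothetical identifier, the coefficient values are invented, and treatment is assigned at the household level for simplicity. It also ignores the multiple-membership aspect (a child belonging to two parents), so it is not a full answer to the design question. The clustered `regress` line is included only for comparison.

```stata
clear
set seed 12345
set obs 200                                   // 200 simulated households
gen long g_hidp = _n
gen byte treat = runiform() < 0.5             // household-level treatment (simplification)
gen double u_hh = rnormal()                   // household random intercept
expand 2                                      // two children per household
bysort g_hidp: gen byte child = _n
gen long child_id = g_hidp*10 + child         // hypothetical child identifier
gen double u_child = rnormal(0, 0.5)          // child random intercept
expand 2                                      // two time periods per child
bysort child_id: gen byte post = _n - 1       // 0 = before policy, 1 = after
gen double child_health = 1 + 0.5*post + 0.3*treat + 0.4*post*treat ///
    + u_hh + u_child + rnormal()

* Single-level model with standard errors clustered on household
regress child_health i.post##i.treat, vce(cluster g_hidp)

* Children nested within households, random intercepts at both levels
mixed child_health i.post##i.treat || g_hidp: || child_id:
```

In both models the coefficient on 1.post#1.treat is the diff-in-diff term; parent-level or child-level covariates would be added to the fixed part of the model.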
Comment
Lilly Tee

Join Date: May 2022

Posts: 37
#22

07 Aug 2023, 09:14

The intervention is applied at the level of the parent. The parents either see an increase in their income or not due to the policy.

From #21: "To use the model you show, you would have to collapse the data to provide only one observation per household at each time point. That would involve somehow calculating an aggregate child health outcome measure out of the individual child measures. And it would also only be workable if the intervention is applied at the household level. It would also mean giving up the parent-level covariates ('control variables')."

I understand what you're saying to an extent. But I thought I could run the model I indicated above for each child in the household. Would this not be possible? That is why I thought that, if I were to do this for each child in the household, I would need to include child control variables as well.

Last edited by Lilly Tee; 07 Aug 2023, 09:16.
Comment
