Hi,
I have an unbalanced panel dataset (N=2976, T=13), using survey responses.
My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise), and I intend to use -xtprobit, re- to run my model.
hhid is the household's unique identifier, and the data are yearly.
The variable -position- gives the interviewee's position in the household (see the -codebook- output below).
I would like to drop respondents who are not household heads. The literature I am basing my work on uses data solely from household heads, and I think the financial data (e.g. amount saved) would be more accurate coming from them, because they may be better informed about the household's financial affairs than, for example, their children.
Q1: Have I biased my sample by dropping the observations where the respondent was not the household head?
Q2: If there is sample selection bias, please could you recommend how I may test for it? Is there a t-test that I could conduct, for example, to compare the difference in means before and after dropping observations?
Q3: Would you recommend that I look into Heckman models, or is Heckman not relevant here?
Many thanks
Code:
. xtset hhid year
       panel variable:  hhid (unbalanced)
        time variable:  year, 2004 to 2016, but with gaps
                delta:  1 unit

. xtdes

    hhid:  6, 21, ..., 89972                                 n =       3316
    year:  2004, 2005, ..., 2016                             T =         13
           Delta(year) = 1 unit
           Span(year)  = 13 periods
           (hhid*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         3         6      13      13

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------
      280      8.44    8.44 |  ...........11
      247      7.45   15.89 |  ............1
      211      6.36   22.26 |  1111111111111
      164      4.95   27.20 |  1............
       95      2.86   30.07 |  ..........111
       81      2.44   32.51 |  ...........1.
       80      2.41   34.92 |  ..........1..
       77      2.32   37.24 |  .1...........
       74      2.23   39.48 |  11...........
     2007     60.52  100.00 | (other patterns)
 ---------------------------+---------------
     3316    100.00         |  XXXXXXXXXXXXX
Code:
. codebook position

----------------------------------------------------------------------------------
position                                                 position in the household
----------------------------------------------------------------------------------

                  type:  numeric (double)
                 label:  positie

                 range:  [1,7]                        units:  1
         unique values:  7                        missing .:  1/14,145

            tabulation:  Freq.   Numeric  Label
                        13,217         1  head of the household
                           684         2  spouse
                           225         3  permanent partner (not married)
                            10         4  parent (in law)
                             3         5  child living at home
                             2         6  housemate
                             3         7  family member or border
                             1         .
Code:
. drop if (position==2 | position==3 | position==4 | position==5 | position==6 | position==7 | position==.)
(928 observations deleted)
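To make Q2 concrete, here is a minimal sketch of the kind of comparison I have in mind. It has not been run, and it would need to come before the -drop- above, since it flags household heads rather than deleting the other respondents. The variable name -head- and the label name -headlbl- are my own placeholders, and my regressors are left out because they are not listed in this post. It compares heads with non-heads, which is only one possible reading of "before and after dropping".

```stata
* Flag household heads instead of dropping the other respondents
* (head is a placeholder name; it is missing where position is missing)
generate byte head = (position == 1) if !missing(position)
label define headlbl 0 "not head" 1 "head"
label values head headlbl

* Naive comparison of the share able to save, heads vs non-heads.
* This ignores the panel structure: repeated observations on the
* same household are treated as independent.
ttest saving, by(head)

* Same comparison with standard errors clustered on household
probit saving i.head, vce(cluster hhid)

* Random-effects probit on heads only, without deleting any data
* (constant-only here; the actual regressors are omitted)
xtprobit saving if head == 1, re
```

Keeping the flag instead of dropping would leave the 928 non-head observations in the dataset, so the two groups could still be compared later.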