Country and Survey Wave fixed effects logistic regression model

Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#16

11 Feb 2019, 10:53

Dear Clyde,

I also have similar datasets. The data were collected in 2 waves, in 2 different years. The same randomly selected villages or neighborhoods, nested within three geographic zones, were surveyed in both waves, but not the same randomly selected households within those villages or neighborhoods. Child-related data were collected in the selected households, so children are nested within households. I would like to compare the health outcomes (the dependent variable is continuous) of children from these households or villages.

Is it considered as cross-sectional since it did not follow the same individuals/children? Is it an unbalanced panel? Should I be comparing the outcomes at the group-level i.e. at village level? Would you kindly guide me to the most appropriate analysis I can pursue?

Please let me know if I did not articulate this properly.

Thank you; I look forward to reading your guidance.
BR,
Aye Aye
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#17

11 Feb 2019, 11:37

Is it considered as cross-sectional since it did not follow the same individuals/children?

I would describe it as serial cross-sectional data. It is not truly longitudinal because you do not have recurring observations of the same people over time. But it is not simply cross-sectional either because the data are varying across time.

Is it an unbalanced panel?

It is not a panel at all, in the strict sense of the term panel. It is serial cross-sectional data, not panel data. Moreover, it appears that you have three levels here: individuals within villages within zones. So even though the -xt- commands are often used with serial cross-sectional data when there are two levels, their use here may be inappropriate.

Should I be comparing the outcomes at the group-level i.e. at village level?

That depends on your specific research questions. Formulate those questions clearly, if you have not already done so. Re-read them carefully if you have. Then you should be able to answer this question directly from that on your own.

Would you kindly guide me to the most appropriate analysis I can pursue?

Ditto.

If after you have carefully thought about your research questions you still need assistance with the last two questions, post back, including the research questions themselves.
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#19

11 Feb 2019, 11:51

Dear Clyde,
Sorry for double-posting; I only saw your response on page 2 afterwards.

Since I cannot compare at the individual-child level, my thinking is to compare the outcome at the group level, i.e. at the village level. Is the health of children from the same villages improving over time (say, comparing 2015 with 2017) after receiving sets of interventions?

I am thinking that I can also compare the outcomes across geographic zones, e.g. the middle, east, and west regions. Does that make sense?

I would love to get your further guidance. I would also need your credentials for formal citation/reference in my research work.

I look forward to hearing from you again.

BR,
Aye Aye
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#20

11 Feb 2019, 11:56

Dear Clyde,

Sorry for double-posting; I only saw your response on page 2 afterwards. Please DISREGARD my last post, as it did not give enough information about the data.

Since I cannot compare at the individual-child level, my thinking is to compare the outcome at the group level, i.e. at the village level.

Is the health of children from the same villages improving over time (say, comparing 2015 with 2017) after receiving sets of interventions?

There are also control (non-intervention) villages in the data. How are the intervention villages doing compared to the control villages? Is there any sample Stata syntax I can follow, given the limited information I provide here?

I am thinking that I can also compare the outcomes across geographic zones, e.g. the middle, east, and west regions. Does that make sense?

I would love to get your further guidance.

I would also need your credentials for formal citation/reference in my research work.

I look forward to hearing from you again.

BR,
Aye Aye
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#21

11 Feb 2019, 12:36

Hi Clyde,
I meant that for transparency and integrity purposes, so that I can reference it. It is okay if I cite Statalist as a source I consulted. Is an off-line consultation also possible? Where should I send my contact email?
Thank you
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#22

11 Feb 2019, 13:02

Yes, it seems like it would make sense to compare some measure(s) of health in the villages over time. It is unclear to me whether the 2015 wave of data in the villages that received the interventions is from a time before the interventions began, or if all of the data on these villages are post-intervention. If you have pre-intervention data, then you can do a difference-in-differences analysis, which is much stronger for causal inference than a simple comparison of intervention and control villages. The particular code for a difference-in-differences design would depend on whether the interventions began in all of the intervention villages at the same time, or at different times.

Your description leaves other questions unanswered. You refer to intervention(s). Did every village that received interventions receive all of them? Did they all begin at the same time as a single package? If different interventions were used at different times in different places, would the design permit a separate analysis of each intervention?

I cannot tell whether comparisons across regions are sensible or not. Did each region have both intervention and control villages? Is there some reason to think that the intervention effects might differ appreciably between the two regions (as opposed to baseline health levels differing between the regions)?
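To make the difference-in-differences idea above concrete, here is a minimal sketch on simulated data; it is not code from the original posts. It assumes that 2015 is a pre-intervention wave and that the intervention started in all treated villages at the same time. The variable names (village, treatment, post, outcome) and all numbers are hypothetical.

```stata
* Simulated serial cross-sections: 40 hypothetical villages, 2 waves,
* 25 newly sampled children per village per wave.
clear
set seed 2019
set obs 40
generate village   = _n
generate treatment = village <= 20              // villages 1-20 assumed treated
expand 2                                        // one copy per wave
bysort village: generate year = cond(_n == 1, 2015, 2017)
expand 25                                       // children within village-wave
generate post = year == 2017                    // 2015 assumed pre-intervention
* Assumed data-generating values: common trend 0.1, intervention effect 0.3
generate double outcome = -1.5 + 0.2*treatment + 0.1*post ///
    + 0.3*treatment*post + rnormal(0, 1)

* Difference-in-differences by hand from the four cell means
forvalues t = 0/1 {
    forvalues p = 0/1 {
        quietly summarize outcome if treatment == `t' & post == `p'
        scalar m`t'`p' = r(mean)
    }
}
scalar did_by_hand = (m11 - m10) - (m01 - m00)

* The same quantity is the interaction coefficient in a saturated regression;
* standard errors are clustered on village because children share villages.
regress outcome i.treatment##i.post, vce(cluster village)
display "by hand: " did_by_hand "   regression: " _b[1.treatment#1.post]
```

The by-hand figure and the interaction coefficient agree because the model is saturated in treatment and wave. If the interventions began at different times in different villages, this simple two-by-two setup would not apply as written.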
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#23

12 Feb 2019, 12:03

Dear Clyde,

On your first point:

Collection of nutrition data on household children (outcome of interest for my research) was done in 2015. Even if the interventions began before 2015, I would not have outcome data.

Therefore, can we do two types of analysis based on two situations:
(i) that the 2015 data from villages are post-intervention data (and the data from the same villages were collected in 2017 again); then how shall I proceed with the analysis assuming post-intervention data?

(ii) that the 2015 data from villages are from a time before the interventions began (and the data from the same villages were collected in 2017); then, do a difference-in-differences analysis (am I reading you right?)

On your second point [and above (ii)]:
“The particular code for a difference-in-differences design would depend on whether the interventions began in all of the intervention villages at the same time, or at different times. Your description leaves other questions unanswered. You refer to intervention(s). Did every village that received interventions receive all of them? Did they all begin at the same time as a single package? ”

It is understood that the intervention began at the same time.
The intervention was given to the entire village. The questionnaire asked “did you or any of your household members receive any type of (health, hygiene, nutrition) intervention?”

On your following question:
If different interventions were used at different times in different places, would the design permit a separate analysis of each intervention?
What if different interventions were used or introduced after the 2015 data collection (and then another data collection was done in 2017) in the same villages? How can I deal with this?

On your last point:
I cannot tell whether comparisons across regions are sensible or not. Did each region have both intervention and control villages? Is there some reason to think that the intervention effects might differ appreciably between the two regions (as opposed to baseline health levels differing between the regions)?

It is really not sensible, since the geographic zones differ in topography and in the plants each region can grow. I should be comparing intervention and control villages within each geographic zone. Am I right in thinking this?

Thanks so much again.

With best regards,
Aye Aye
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#24

12 Feb 2019, 12:50

Collection of nutrition data on household children (outcome of interest for my research) was done in 2015. Even if the interventions began before 2015, I would not have outcome data.

I'm confused. In #16 and #19 you said you want to compare health outcomes. How can you do that if you don't have any outcome data?

[quote]Therefore, can we do two types of analysis based on two situations:
(i) that the 2015 data from villages are post-intervention data (and the data from the same villages were collected in 2017 again); then how shall I proceed with the analysis assuming post-intervention data?

(ii) that the 2015 data from villages are from a time before the interventions began (and the data from the same villages were collected in 2017); then, do a difference-in-differences analysis (am I reading you right?)[/quote]

Yes, in the sense that there are two alternative situations on the ground regarding 2015, one of which mandates doing analysis 2, and the other of which requires some other analysis. Just what that analysis would be is a bit hard to say, but it would have to be some uncontrolled comparison that assesses the change in health outcomes over time subsequent to the intervention, and we could contrast those changes in the intervention and control villages. But this is a weak design for causal inferences, because, for all we know, the health outcomes were changing differently over time even before the interventions.

Actually, as I re-read the last paragraph, I realize that with only two years' worth of data, even if 2015 is a pre-intervention year, we would not really be able to verify that the treatment and control villages were experiencing similar ("parallel") trends before the intervention because we have only that one pre-intervention time point.

So the actual analysis seems to boil down to more or less the same thing, though with slightly different interpretations. Probably we're looking at a multi-level model something along the lines of:

Code:

regression_command outcome_variable i.treatment##i.year i.region || village:

The specific regression command would depend on the type of outcome variable in question. treatment would be a variable that is coded 1 for every observation in any village that received the intervention, and 0 for every observation in the control villages. year would be a variable that is either 2015 or 2017. region would be a variable whose values are the two regions the villages come from. (This code does not look for differences in intervention effects by region; it just adjusts for baseline regional differences in the health outcome variable.) There may be other individual or village-level variables that you would want to include in the model as well.
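For a continuous outcome, which is how the outcome is described earlier in this thread, one plausible choice for regression_command is -mixed-. The following is a minimal sketch on simulated data, not code from the original posts; the variable names follow the template above, and every number (60 villages, two regions, the effect sizes) is an assumption made only so that the example runs.

```stata
* Simulated data: 60 hypothetical villages in 2 regions, 2 waves,
* 20 newly sampled children per village per wave.
clear
set seed 12345
set obs 60
generate village   = _n
generate region    = 1 + mod(village, 2)        // two hypothetical regions
generate treatment = mod(village, 4) < 2        // treated and control villages in each region
generate double u  = rnormal(0, 0.5)            // village-level random intercept
expand 2                                        // one copy per wave
bysort village: generate year = cond(_n == 1, 2015, 2017)
expand 20                                       // children within village-wave
* Assumed values: region gap 0.2, common trend 0.1, intervention effect 0.3
generate double outcome = 0.2*(region == 2) + 0.1*(year == 2017) ///
    + 0.3*treatment*(year == 2017) + u + rnormal(0, 1)

* Two-level model: children within villages, random intercept for village
mixed outcome i.treatment##i.year i.region || village:

* The interaction coefficient is the difference-in-differences estimate
lincom 1.treatment#2017.year
```

The -lincom- line reports the treatment-by-year interaction with its confidence interval. With real data, a household level could be added as a further random-effects equation if several children per household are sampled.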

If different interventions were used at different times in different places, would the design permit a separate analysis of each intervention?
What if different interventions were used or introduced after 2015 data collection (and then another data collection done in 2017) in the same villages, how can I deal with this?

This really depends on whether the same interventions applied to everyone in a given village, but differed across villages, or whether different interventions were used for different people within the same villages. It also depends on how many different interventions there were and how many different people and villages received them.

It is really not sensible, since the geographic zones differ in topography and in the plants each region can grow. I should be comparing intervention and control villages within each geographic zone. Am I right in thinking this?

Possibly. Or maybe not. First, let's distinguish two things. One is whether the regions themselves differ in baseline levels of the health outcome variables in question. They almost certainly do: topography and local flora (and therefore also local fauna) can influence health in a variety of ways. That's why I included i.region in the code suggested a few paragraphs above. A separate question is whether the interventions would have different effectiveness in the two regions. This question has to be answered by thinking about the interventions themselves and how they play out in each area. If the topography of one area, for instance, makes transportation difficult and limits the distances people can travel in short time periods, but the other's topography facilitates getting around, then an intervention that involves traveling someplace to get something (a test, education, a vaccine, whatever) would probably be more effective in the second region than in the first. That's just one example. So you have to really dissect the interventions down to their components and the demands they place on the people who receive them and think about how being in the different regions might help or hinder.
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#25

15 Feb 2019, 10:37

Hi Clyde,

On the command below, is i.treatment##i.year an interaction term? How shall I explain that in layman's language?

regression_command outcome_variable i.treatment##i.year i.region || village:

On the rest of the points, I will synthesize my thoughts first and consult with you again. Thanks so much! I hope to hear back from you on the command.

Best Regards,
Aye Aye
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#26

15 Feb 2019, 19:03

Please read -help fvvarlist- for a full explanation of factor variable notation in Stata.

Briefly, i.treatment##i.year is an expression that Stata will expand to i.treatment i.year i.treatment#i.year, that is, the "main" effects of treatment and year as well as their interaction.
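The expansion described above can be checked directly. This is a small sketch, not part of the original reply, using the nlsw88 example dataset that ships with Stata; the variables union and south simply stand in for treatment and year.

```stata
* Fit the same model with the ## shorthand and with the terms written out
sysuse nlsw88, clear

regress wage i.union##i.south
scalar b_short = _b[1.union#1.south]

regress wage i.union i.south i.union#i.south
scalar b_long = _b[1.union#1.south]

* The interaction coefficient is identical under both spellings
display "shorthand: " b_short "   written out: " b_long
```

In the model suggested in #24, the coefficient on 1.treatment#2017.year would be read as how much more (or less) the outcome changed between 2015 and 2017 in the intervention villages than in the control villages.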
Comment
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#27

10 Mar 2019, 16:26

Dear Clyde,

Hi again.

I was finally able to clean up the two data sets and merge them. Though the data are not panel data, as you rightly said, I felt I needed to make the data more useful, as they have a nested structure: children nested within households/families, households nested within villages, and villages nested within ecological zones. With that in mind, I ran the following model and got an error message:

// at village level & family level//
// DV: zlen (Height for Age Z score) IV:hlth_hyg_nut2 (intervention)//
// Other IVs or covariate: educational level, child sex, dietary diversity, income, number of drinking sources a household has in each season//
// zlen_flag==0 is excluding extreme values as suggested by WHO//
// with dietary diversity//

mixed zlen i.hlth_hyg_nut2 i.caregiver_edu3 i.child_sex2 i.dietarydiversity_category i.fiveincomecat2 i.total_drinkingsource3 ///
i.ecologicalzone i.year##i.treat_control if zlen_flag==0 ||vlgname2:||family_id:

estimates store zlen_hlygnut_vlgfamilywithD

estat icc

// creating a variable that indicates the sample used in the above model //
// (created after getting the not-testable error message; I then reran the model and the contrast command, //
// which still gave the not-testable message for: year, treat_control, i.year##i.treat_control) //
gen in_model = e(sample)

contrast hlth_hyg_nut2 caregiver_edu3 child_sex2 dietarydiversity_category fiveincomecat2 total_drinkingsource3 ///
ecologicalzone year treat_control i.year##i.treat_control
// ??????? year treat_control i.year##i.treat_control: not testable??? after running contrast command//

Another clarification I would like to get: the ecological zone is placed in the fixed part of the model (to the left of the ||). Does that mean that I am comparing zones 2, 3, and 4 to zone 1? Is that what I should be doing? Does it mean that I am comparing whether the interventions have different effectiveness in each zone compared to zone 1?

Also, how do I add mediation (and moderation) syntax, for example to test whether dietary diversity might be mediating between the intervention (X) and the children's height-for-age outcome (Y)? What else can I do pre- and post-estimation?

Would you also point me to an example article or reading that has used this model, including interpretation of the coefficients, ICC, and contrast outputs? That would really help me with how I should write up my results.
Comment
Jean Jacques

Join Date: Sep 2020

Posts: 97
#28

26 Jan 2021, 13:36

Can someone clarify for me what role is played by (wave) in the model? I mean, what does putting "wave" between parentheses do?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#29

26 Jan 2021, 15:20

The parentheses do nothing at all in this context. They are unnecessary. The variable wave is included in the model just as it would be if shown without parentheses. I have no idea why O.P. chose to use the parentheses. They are syntactically legal and semantically empty, so I didn't change them. I wouldn't have used them myself in writing my own code.
Comment
