
  • Pseudo panel data analysis with categorical variable

    Hello everyone,
    I am currently conducting a pseudo-panel data analysis based on six waves of DHS survey data for Ghana. I would like to look into the effect of socioeconomic inequality on child health. Following the existing discussions of pseudo-panel analysis on the Stata platform, I plan to create cohorts based on region and year of birth. But since in practice the observed variables of interest are replaced by their means within each cohort, I am unsure how to deal with the categorical and ordinal nature of the variables of interest.
    What can I do in this instance?
    Fever

    Tabulation:
      Freq.   Numeric   Label
     15,488        10   no
      3,535        21   yes, fever in last 2 weeks
      1,291        22   yes, fever in last 4 weeks
        318        97   don't know
         59        98   missing
      1,667        99   niu (not in universe)


    Mom's education

    Tabulation:
      Freq.   Numeric   Label
      9,267         0   no education
      6,638         1   primary
      6,053         2   secondary
        400         3   higher


    Wealth

    Tabulation:
      Freq.   Numeric   Label
      5,710         1   poorest
      4,016         2   poorer
      3,326         3   middle
      2,871         4   richer
      2,320         5   richest
      4,115         .

    PS: I would like to run a fixed-effects regression of fever (measure of child health) on mom's education and wealth (measure of socioeconomic status).
    Last edited by Vanessa Owusu Ansah; 20 Dec 2023, 05:06.

  • #2
    To use the within-estimator, you might consider redefining your outcome, for example, as the proportion of mothers with secondary school education or higher. However, if there is minimal or no variation in this outcome over time, it becomes challenging to identify the parameters in your model. The handling of categorical explanatory variables is less ambiguous and will be explained in what follows:

    In pseudo panels, units are classified into cohorts and then their observations are averaged. The grouping model is as described in Devereux (JBES 2007):
    [Image: Capture.PNG, the grouping model from Devereux (2007)]

    With categorical explanatory variables, the standard approach is to use these variables as the criteria for grouping. For instance, if you begin by calculating the average for all individuals in county \(c\) at time \(t\), \((c=1, \cdots, J; \; t= 1, \cdots, T)\), incorporating gender would involve computing averages separately for males and females in county \(c\) at time \(t\). Introducing marital status would further segment the data: you'd calculate averages for married and unmarried males, as well as married and unmarried females, in county \(c\) at time \(t\). Note that each additional subdivision increases the number of cells, and hence the number of observations in the pseudo-panel, while leaving fewer individuals in each cell. An empirical example can be found in Ha et al. (2017), where their results table is presented below:

    [Image: Capture2.PNG, results table from Ha et al. (2017)]
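
As a rough illustration of the grouping idea described above, the following Python sketch averages a binary fever indicator within region x survey year x mother's education cells. It is only a sketch under stated assumptions: the records are made-up, the function names `fever_indicator` and `cell_means` are hypothetical, and treating codes 97, 98 and 99 as missing is an assumption rather than something settled in this thread.

```python
from collections import defaultdict

# Hypothetical individual-level records:
# (region, survey_year, mother_education, fever_code).
# Fever codes follow the tabulation in the question:
# 10 = no, 21/22 = yes, 97/98/99 = don't know / missing / not in universe.
records = [
    ("Ashanti", 1988, 0, 10),
    ("Ashanti", 1988, 0, 21),
    ("Ashanti", 1988, 2, 10),
    ("Ashanti", 1988, 2, 99),
    ("Volta", 1993, 1, 22),
    ("Volta", 1993, 1, 21),
]


def fever_indicator(code):
    """Map the raw fever code to 1/0, or None when it is uninformative."""
    if code in (21, 22):
        return 1
    if code == 10:
        return 0
    return None  # assumption: 97, 98 and 99 are treated as missing


def cell_means(rows):
    """Return {(region, year, education): (fever share, cell size)}."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for region, year, educ, code in rows:
        y = fever_indicator(code)
        if y is None:
            continue  # skip uninformative responses
        key = (region, year, educ)
        sums[key] += y
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in counts}


cells = cell_means(records)
for key in sorted(cells):
    share, n = cells[key]
    print(key, f"fever share={share:.2f}", f"n={n}")
```

Each key of `cells` is one pseudo-panel observation, and the cell size is kept next to the share so that thin cells can be spotted before any estimation.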







    References:

    Devereux, P. J. (2007). Improved Errors-in-Variables Estimators for Grouped Data. Journal of Business & Economic Statistics, 25(3), 278–287.
    Ha, H., Han, C., & Kim, B. (2017). Can obesity cause depression? A pseudo-panel analysis. Journal of Preventive Medicine and Public Health, 50(4), 262.
    Last edited by Andrew Musau; 20 Dec 2023, 21:45.



    • #3
      Thank you, Andrew. I am even reconsidering the pseudo-panel analysis, since the data I have might not yield precise estimates. I have data for six years (1988, 1993, 1998, 2003, 2008 and 2014) across ten regions. Creating cohorts might not be ideal, since the number of individuals in each cohort is likely to be less than 100, and that can yield imprecise estimators according to Guillerm (2015) and Verbeek and Nijman (1992, 1993).
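
The cell-size concern can be checked directly before committing to a cohort definition. The Python sketch below counts individuals per region x birth-year cohort and flags cells below a threshold. The data are made-up, `small_cells` is a hypothetical helper, and the cutoff of 100 is simply the figure mentioned in the post, used here as a rule of thumb.

```python
from collections import Counter

MIN_CELL_SIZE = 100  # figure mentioned in the post; a rule of thumb only


def small_cells(rows, min_size=MIN_CELL_SIZE):
    """Count individuals per (region, birth_year) cohort; return undersized cells."""
    sizes = Counter((region, birth_year) for region, birth_year in rows)
    return {cell: n for cell, n in sizes.items() if n < min_size}


# Hypothetical data: 120 children in one cohort, 40 in another.
rows = [("Ashanti", 1985)] * 120 + [("Volta", 1985)] * 40
flagged = small_cells(rows)
print(flagged)
```

If most cells are flagged, widening the birth-year bands is one option, at the cost of fewer cohort observations.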



      • #4
        Also, since each region-year will serve as an observation in the pseudo-panel, 10 regions are insufficient. While you could have attempted a lower level of aggregation, as you've mentioned, the sample size already poses challenges when aggregating at the regional level. It's probably best to proceed with a pooled analysis with region and wave dummies.
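
One way to write down such a pooled specification, as a sketch only (the notation is assumed here and does not come from the thread), is a linear probability model for child \(i\) in region \(r\) and wave \(t\):

\[
\text{fever}_{irt} = \alpha + \mathbf{x}_{irt}'\boldsymbol{\beta} + \gamma_r + \delta_t + \varepsilon_{irt},
\]

where \(\mathbf{x}_{irt}\) holds indicator variables for the mother's education and wealth categories, \(\gamma_r\) are region dummies and \(\delta_t\) are wave dummies. Since fever is binary, a logit or probit with the same right-hand side is a natural alternative.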
        Last edited by Andrew Musau; 21 Dec 2023, 10:59.
