  • Poisson regression

    Hello all,

    I'm seeking advice on analyzing my data using (Poisson) regression methods, but I am not sure of the correct approach. The data is an unbalanced and sparse panel (2941 observations) spanning 15 years. I also understand it as multilevel (with multiple membership). While the data is yearly and about private sector companies, I will simplify it as consisting of students and groups.

    Each observation is about a student participating (or not) in group projects. Participation is voluntary, and a student can be a member of zero or more groups in any year. There are about 200 groups active over the years, with some groups forming at different points in time during the 15-year span.

    Dependent variable: Number of groups a student chooses to participate in during year t.
    Independent variable: We construct a Student Performance Assessment (SPA) from the Group Performance Assessment (GPA), i.e. the textual comments received by an entire group in the previous year (t-1). The SPA for a student is computed by aggregating the GPAs of all groups he/she was a member of in year t-1 and then categorizing them as negative or positive (we also created other sentiment categories by hand-coding the textual content). This gave us 4 SPA categories as predictors.
    Model: We want to examine the impact of the SPA categories (in year t-1) on the number of groups a student chooses to be a member of in year t.

    A few more peculiarities of our dataset:
    • The panel is unbalanced due to varying student entry times, with most observations concentrated in the first 8-9 years of the data.
    • There are many zeros in the independent variables because many students did not receive any feedback for their group project(s).
    • The correlations are low and within acceptable ranges.
    Analysis: Since the DV is a count variable, we are using Poisson regression, plus OLS as a consistency check, as follows:
    1. Poisson Pseudo-Likelihood High Dimensional Fixed Effects (PPMLHDFE command). We are using:
    • the ‘absorb’ option for student ID and year and
    • clustered errors using the vce option for student ID.
    This model showed all four SPA predictors as significant and positive.

    2. As a consistency check, we also ran an OLS model (xtreg with the FE option for student ID, i.year added to control for year effects, and errors clustered by student ID).

    However, this OLS model resulted in only one SPA predictor being significant and positive.

    We would really appreciate advice on whether we have run the models correctly. Here is the code:

    PPMLHDFE code: ppmlhdfe DV l.Control_1 l.Control_2 Control_3 l.Apercent l.Bpercent l.Cpercent l.Dpercent, absorb(student_id year) vce(cluster student_id)
    OLS code: xtreg DV l.Control_1 l.Control_2 Control_3 i.year l.Apercent l.Bpercent l.Cpercent l.Dpercent, fe vce(cluster student_id)


    Why are we seeing this inconsistency in the results? We are particularly concerned about the multilevel nature of the data, where GPA is provided to groups and then aggregated for individual students depending on their group memberships. Could this be affecting the estimation?



  • #2
    Dear nilesh saraf,

    First of all, note that for linear regression with multiple fixed effects, you can use the user-written command reghdfe.
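
    As a rough sketch of what that could look like with the variable names from the original post (this assumes reghdfe has been installed from SSC and that the panel has been declared with xtset so the l. operators work):

    ```stata
    * Assumption: reghdfe is installed, e.g. via: ssc install reghdfe
    * Assumption: the panel is declared so that l. lags are defined
    xtset student_id year

    * Linear model with student and year fixed effects absorbed,
    * standard errors clustered by student
    reghdfe DV l.Control_1 l.Control_2 Control_3 l.Apercent l.Bpercent l.Cpercent l.Dpercent, absorb(student_id year) vce(cluster student_id)
    ```

    Absorbing year here plays the role of i.year in the xtreg specification, so the slope estimates should match those from xtreg; reported sample sizes can differ because reghdfe drops singleton groups by default.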

    The two models you are estimating are very different, and I am not surprised that they lead to different results (so, in my view, running both models does not provide a "consistency check" because I do not expect the two models to provide similar answers).

    First, Poisson regression estimates a multiplicative (exponential) model in which the effect of each regressor on the expected outcome depends on the levels of the other regressors. In contrast, OLS estimates a linear model in which each regressor has a constant effect on the outcome.
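
    To make the contrast concrete, here is a sketch in generic notation (the symbols x_i, \beta and \gamma are illustrative and not taken from the posts; fixed effects are omitted for brevity):

    ```latex
    % Poisson regression: exponential conditional mean, so the marginal
    % effect of x_{ij} scales with the level of the conditional mean
    \[
      E[y_i \mid x_i] = \exp(x_i'\beta),
      \qquad
      \frac{\partial E[y_i \mid x_i]}{\partial x_{ij}} = \beta_j \exp(x_i'\beta)
    \]
    % OLS: linear conditional mean, so the marginal effect is constant
    \[
      E[y_i \mid x_i] = x_i'\gamma,
      \qquad
      \frac{\partial E[y_i \mid x_i]}{\partial x_{ij}} = \gamma_j
    \]
    ```

    In the exponential model \beta_j is a semi-elasticity (a proportional change in the expected count), whereas \gamma_j is a change in levels, so the two sets of coefficients are not directly comparable.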

    Second, the Poisson regression disregards units for which the outcome is always zero because these units are not informative about the slope parameters; OLS considers all observations. If you want to see the difference this makes, you can run the OLS regression on the sub-sample used by the Poisson regression: after ppmlhdfe, run the OLS regression with "if e(sample)==1".
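
    One possible way to do this, using the commands from the original post, is sketched below. The variable name ppml_sample is made up for illustration; storing e(sample) in a variable keeps the Poisson sample available after later estimation commands overwrite e(sample).

    ```stata
    * Step 1: Poisson regression as in the original post
    ppmlhdfe DV l.Control_1 l.Control_2 Control_3 l.Apercent l.Bpercent l.Cpercent l.Dpercent, absorb(student_id year) vce(cluster student_id)

    * Step 2: mark the observations actually used by ppmlhdfe
    * (ppml_sample is a hypothetical variable name)
    gen byte ppml_sample = e(sample)
    count if ppml_sample

    * Step 3: rerun the linear model on that same sub-sample
    xtreg DV l.Control_1 l.Control_2 Control_3 i.year l.Apercent l.Bpercent l.Cpercent l.Dpercent if ppml_sample, fe vce(cluster student_id)
    ```

    If the OLS results on this sub-sample move closer to the Poisson results, part of the discrepancy comes from the dropped observations rather than from the functional form.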

    Both models are likely to be misspecified, but the exponential model estimated by Poisson regression is likely to provide a better approximation to the correct functional form when you have non-negative data, and therefore that would be my preferred approach. For more on this, see this paper.

    Best wishes,

    Joao
