Hello all,
I'm seeking advice on analyzing my data with (Poisson) regression methods, but I am not sure of the correct approach. The data are an unbalanced panel (2,941 observations) spanning 15 years, and they are sparse. I also understand the structure as multilevel (with multiple membership). While the data are yearly and concern private sector companies, I will simplify the description as consisting of students and groups.
Each observation records a student participating (or not) in zero or more groups (projects) in a given year. Participation is voluntary. There are about 200 groups active over the years, with some groups forming at different points during the 15-year span.
Dependent variable: Number of groups a student chooses to participate in during year t.
Independent variables: We construct a Student Performance Assessment (SPA) from the Group Performance Assessment (GPA), i.e. the textual comments received by an entire group in the previous year (t-1). The SPA for a student is computed by aggregating the GPAs of all groups he/she was a member of in year t-1 and then categorizing them as negative or positive (we also created other sentiment categories by hand-coding the textual content). This gave us 4 SPA categories as predictors.
Model: We want to examine the impact of SPA categories (in year t-1) on the number of groups a student chooses to be a member of in year t.
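Written out, the relationship we have in mind looks roughly like the following. The notation is introduced here for illustration only: $y_{it}$ is the number of groups student $i$ joins in year $t$, $\mathrm{SPA}_{k,i,t-1}$ is the $k$-th SPA category measured in year $t-1$, $x_{it}$ collects the controls (some of them lagged), and $\alpha_i$ and $\gamma_t$ are student and year effects. The exponential mean is an assumption that goes with the Poisson approach; a linear model would use the same index without the $\exp$.

```latex
\[
\mathbb{E}\left[\, y_{it} \mid \alpha_i, \gamma_t, \mathrm{SPA}_{i,t-1}, x_{it} \right]
  = \exp\!\Big( \alpha_i + \gamma_t
      + \sum_{k=1}^{4} \beta_k \, \mathrm{SPA}_{k,i,t-1}
      + x_{it}'\delta \Big)
\]
```

Under this form each $\beta_k$ is a proportional change in the expected number of groups, whereas in a linear model it would be a change in the number of groups itself, so coefficients from the two models are not on the same scale.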
A few more peculiarities of our dataset:
- The panel is unbalanced due to varying student entry times, with most observations concentrated in the first 8-9 years of the data.
- The independent variables contain many zeros, because the students concerned did not receive any feedback for their group project(s).
- The correlations are low and within acceptable ranges.
1. Poisson Pseudo-Likelihood with High-Dimensional Fixed Effects (the ppmlhdfe command). We are using:
- the ‘absorb’ option for student ID and year, and
- clustered errors using the vce option for student ID.
2. As a consistency check, we also ran OLS (xtreg with the fe option for student ID, with i.year added to control for year effects and errors clustered by student ID).
However, this OLS model resulted in only one SPA predictor being significant and positive.
We would really appreciate advice on whether we have run the models correctly. Here is the code:
PPMLHDFE code: ppmlhdfe DV l.Control_1 l.Control_2 Control_3 l.Apercent l.Bpercent l.Cpercent l.Dpercent, absorb(student_id year) vce(cluster student_id)
OLS code: xtreg DV l.Control_1 l.Control_2 Control_3 i.year l.Apercent l.Bpercent l.Cpercent l.Dpercent, fe vce(cluster student_id)
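One check that may help when comparing the two sets of results is to make sure both models use the same estimation sample, since ppmlhdfe can drop singleton and separated observations (for example, students whose DV is zero in every year) that xtreg keeps. A minimal sketch, assuming the panel is declared with student_id and year as in the commands above and has one row per student-year; the variable name ppml_sample is made up for illustration:

```stata
* Declare the panel so the l. lag operator works
* (assumes one row per student-year)
xtset student_id year

* Poisson model, as in the post
ppmlhdfe DV l.Control_1 l.Control_2 Control_3 l.Apercent l.Bpercent l.Cpercent l.Dpercent, absorb(student_id year) vce(cluster student_id)

* Flag the observations ppmlhdfe actually used
* (ppml_sample is a hypothetical name)
generate byte ppml_sample = e(sample)

* Linear FE model restricted to the same observations
xtreg DV l.Control_1 l.Control_2 Control_3 i.year l.Apercent l.Bpercent l.Cpercent l.Dpercent if ppml_sample, fe vce(cluster student_id)
```

If the observation counts reported by the two commands differ without the restriction, part of the discrepancy in significance could come from the sample rather than from the functional form.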
Why are we seeing this inconsistency in the results? We are particularly concerned about the multilevel nature of the data, where GPA is provided to groups and then aggregated for individual students depending on their group memberships. Could this be affecting the estimation?