Panel data - Regression Analysis across individuals

Mary Burckhette

Join Date: May 2023

Posts: 35
#1

05 Jun 2023, 08:18

Dear Stata Forum,

I have data on 200 people and their participation in a game over 4 rounds. In each round, participants allocate a specific budget across three categories (wheat, lavender flowers, oranges). Next, changes to the allocation are presented (e.g., a drought negatively affects wheat and oranges but positively impacts lavender flowers). The participant can then adjust the allocation (for example, return to the original one). The round is then complete, and a new round begins. The rounds are independent of each other. In essence, each participant (identified by a unique ID) completes four rounds, so I have approximately 800 observations.

My dependent variable is categorical (say 1 if they return to the original choice, 0 if not). Later, I plan to extend my analysis to more scenarios: e.g., 2 if it's in between, 3 if it's larger, etc.
The independent variables are: round (categorical), risk (low to very high risk of severe weather impact; categorical), successful past harvests (in %; continuous), percentage-point difference between the original and changed allocation (continuous), gender (categorical), and assignment to TG/CG (categorical).

Previously, I only analyzed changes per category through Chi2 tests of independence. Now, my adviser has recommended that I look at horizontal changes, i.e., how individuals perform across all 4 rounds.

My question is how to analyze the behavior of each ID across all four rounds (horizontally) instead of only within categories (vertically).

The Statistics Department at UCLA https://stats.oarc.ucla.edu/stata/wh...s-using-stata/ recommends logistic regression and Chi2 tests.

When performing logistic regression or Chi2 tests, does Stata analyze the observations across one individual or category? Can I force it to analyze across individuals?

I am most grateful for your comments, insights, and help.

Thank you in advance, and kind regards!
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

05 Jun 2023, 09:32

xtreg, be uses only variation across individuals, not variation over time within individuals. In the same spirit, you can create time-averages of your variables and then run ologit; the only remaining identifying variation will then be across individuals.
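
A minimal sketch of the two approaches mentioned above, run on simulated data rather than the original poster's file. The variable names CASE, T_C, round and reb_sa follow the thread; past_harvest and all numbers in the data-generating step are hypothetical and chosen only so that the commands run.

```stata
* Simulated stand-in for the experiment: 200 participants, 4 rounds each.
* All data-generating numbers below are made up for illustration.
clear
set seed 12345
set obs 200
generate CASE = _n
generate T_C = runiform() < 0.5              // assigned once per participant
expand 4
bysort CASE: generate round = _n
generate past_harvest = 100 * runiform()     // hypothetical continuous regressor
generate reb_sa = runiform() < (0.25 + 0.05 * T_C)
xtset CASE round

* Between estimator: a regression on participant-level means only
xtreg reb_sa i.T_C past_harvest, be

* Same idea by hand: collapse to one row per participant.
* After collapse, reb_sa is the share of rounds (0, .25, ..., 1) in which
* the participant returned to the original allocation.
collapse (mean) reb_sa past_harvest (first) T_C, by(CASE)
ologit reb_sa i.T_C past_harvest
```

After the collapse there is one observation per participant, so any remaining variation is between participants, not over rounds.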
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2160
#3

05 Jun 2023, 09:58

I assume you're interested in how the choice in periods 2, 3, and 4 reacts to the shock in that period. Assuming the shock has been randomized across participants in each time period, you can use probit or logit (binary case) or the ordered versions (ologit, as Maxence suggested). You just need to pool the data and cluster the standard errors at the participant level. I am wondering how the first period will fit into this, but it seems like it won't appear in the pooled analysis across time because it is setting the stage as the initial allocation. If I understand correctly, this is before a shock has occurred. Essentially, the initial allocation reflects prices, income, and taste.
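
A minimal sketch of a pooled logit with participant-clustered standard errors, again on simulated data. The variable names follow the thread; the coding of risk as 1 to 4 and the probabilities used to generate reb_sa are assumptions made only for illustration.

```stata
* Simulated panel: 200 participants, 4 rounds each (made-up numbers)
clear
set seed 2023
set obs 200
generate CASE = _n
generate T_C = runiform() < 0.5              // assigned once per participant
expand 4
bysort CASE: generate round = _n
generate risk = runiformint(1, 4)            // assumed coding: 1 = low ... 4 = very high
generate reb_sa = runiform() < (0.45 - 0.07 * risk + 0.05 * T_C)

* Pooled logit: all participant-round rows are stacked, and the standard
* errors allow for correlation within a participant
logit reb_sa i.T_C i.risk i.round, vce(cluster CASE)

* Average marginal effects, reported on the probability scale
margins, dydx(T_C risk)
```

The margins step is optional; it turns log-odds coefficients into changes in the probability of returning to the original allocation, which are easier to compare with linear-probability estimates.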

Mary Burckhette

Join Date: May 2023
Posts: 35

06 Jun 2023, 04:46

Dear Maxence and dear Professor Wooldridge,

Thank you very much for your posts and helpful suggestions. I have tried using xtreg on my data. CASE is the ID (unique for every participant), reb_sa is the categorical dependent variable (0 if the participant does not return to the original allocation, 1 if so), T_C is the categorical variable for TG/CG (1 means TG, 0 means CG), round is the categorical variable for the round (4 rounds in total), and gender is the participant's gender.

I'm not sure I have performed the analysis correctly.

My questions so far are:
1) Why is T_C omitted if I want to understand whether being in the treatment group has any effect on the behavior (i.e., whether more people return to their original choice)?
2) How do I interpret the coefficients on "round" if I have categorical variables? Does it mean that in the second round participants are .1271704 more likely to return to their initial choice, or does it rather mean they stay away from their original choice (as .12 is closer to zero than to 1)?
3) Can I really trust the significance of the model if my coefficients (low p-values) and overall Prob > F seem promising despite the low R^2?
4) Can I include all three categorical variables (wheat, lavender flowers, oranges) in the analysis and understand how the participants behave in all three categories simultaneously per round?

Code:

. xtreg reb_sa i.T_C i.round gender, fe
note: 1.T_C omitted because of collinearity.
note: gender omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =        368
Group variable: CASE                            Number of groups  =        167

R-squared:                                      Obs per group:
     Within  = 0.0438                                         min =          1
     Between = 0.0134                                         avg =        2.2
     Overall = 0.0104                                         max =          4

                                                F(3,198)          =       3.03
corr(u_i, Xb) = -0.0693                         Prob > F          =     0.0307

------------------------------------------------------------------------------
      reb_sa | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         T_C |
         TG  |          0  (omitted)
             |
       round |
          2  |   .1271704   .0728836     1.74   0.083    -.0165574    .2708982
          3  |   .1645303   .0717068     2.29   0.023     .0231232    .3059374
          4  |   .2075197   .0724106     2.87   0.005     .0647247    .3503148
             |
      gender |          0  (omitted)
       _cons |   .4708467   .0497941     9.46   0.000     .3726519    .5690415
-------------+----------------------------------------------------------------
     sigma_u |  .41574854
     sigma_e |  .41453171
         rho |  .50146556   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(166, 198) = 1.88                    Prob > F = 0.0000

.
end of do-file

After this, I decided to swap round with risk, as the first round is not necessarily linked to the lowest risk category:

Code:

. xtreg reb_sa i.T_C i.risk, fe
note: 1.T_C omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =        368
Group variable: CASE                            Number of groups  =        167

R-squared:                                      Obs per group:
     Within  = 0.0057                                         min =          1
     Between = 0.0006                                         avg =        2.2
     Overall = 0.0005                                         max =          4

                                                F(3,198)          =       0.38
corr(u_i, Xb) = -0.0431                         Prob > F          =     0.7697

------------------------------------------------------------------------------
      reb_sa | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         T_C |
         TG  |          0  (omitted)
             |
        risk |
          2  |   .0277373   .0583276     0.48   0.635    -.0872858    .1427604
          3  |   .0803536   .0767873     1.05   0.297    -.0710724    .2317795
          4  |    .045642   .0833952     0.55   0.585    -.1188147    .2100987
             |
       _cons |   .5667601    .040439    14.02   0.000     .4870136    .6465066
-------------+----------------------------------------------------------------
     sigma_u |  .40725065
     sigma_e |  .42272092
         rho |  .48136696   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(166, 198) = 1.79                    Prob > F = 0.0000

.
end of do-file

However, nothing is significant anymore.

May I ask whether fixed effects are the only way to systematically analyze whether the participants return to their original allocation in each of the three categories in one round?

Thank you very much in advance for all your help!

Kind regards!

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2160
#5

06 Jun 2023, 09:15

Mary: Is your treatment variable assigned once, in the first period, and then that's it? That would explain why it's dropping out using FE. With random assignment of the treatment, you should use RE or just regular OLS. In any case, cluster your standard errors by individual.
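
A minimal sketch of the point above on simulated data: a treatment indicator that is constant within each participant has no within-participant variation for FE to use, while RE and pooled OLS can still estimate it. Variable names follow the thread; sd_TC is a helper variable introduced here, and the data-generating numbers are made up.

```stata
* Simulated panel: 200 participants, 4 rounds each (made-up numbers)
clear
set seed 42
set obs 200
generate CASE = _n
generate T_C = runiform() < 0.5              // assigned once, before round 1
expand 4
bysort CASE: generate round = _n
generate reb_sa = runiform() < (0.25 + 0.05 * T_C)
xtset CASE round

* Within-participant standard deviation of T_C (helper variable sd_TC):
* it is zero for everyone, which is why FE drops T_C as collinear
bysort CASE: egen sd_TC = sd(T_C)
summarize sd_TC

* Random effects with participant-clustered standard errors
xtreg reb_sa i.T_C i.round, re vce(cluster CASE)

* Pooled OLS (linear probability model), clustered the same way
regress reb_sa i.T_C i.round, vce(cluster CASE)
```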

Mary Burckhette

Join Date: May 2023
Posts: 35

07 Jun 2023, 04:28

Dear Professor Wooldridge,

Thank you very much for your comments. Yes, either treatment or control (balanced throughout the sample) is assigned once at the beginning of the experiment (before round 1) and does not change afterwards, meaning that a participant allocated to the treatment group stays in the treatment group throughout the experiment.

To cluster the standard errors at the individual level (variable called "CASE"), I tried

Code:

 vce(clustvar cluster CASE)

and other variants, such as vce(clustvar CASE) or vce(cluster CASE), but all gave me the error message "varlist not allowed."
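
A minimal sketch of the option syntax, on simulated data. In Stata's help files, clustvar is a placeholder for the name of the cluster variable and is not typed literally, so the form to use is vce(cluster CASE), placed among the options after the comma. The thread does not show the full commands that produced the error, so its exact cause is not established here. The matrix names V_vce and V_old are introduced only for this comparison.

```stata
* Simulated panel: 230 participants, 4 rounds each (made-up numbers)
clear
set seed 7
set obs 230
generate CASE = _n
generate T_C = runiform() < 0.5
expand 4
bysort CASE: generate round = _n
generate reb_sa = runiform() < (0.25 + 0.05 * T_C)
xtset CASE round

* vce() is an option: it follows the single comma, next to re
xtreg reb_sa T_C i.round, re vce(cluster CASE)
matrix V_vce = e(V)

* Older shorthand for the same request
xtreg reb_sa T_C i.round, re cluster(CASE)
matrix V_old = e(V)

* Relative difference between the two covariance matrices
display mreldif(V_vce, V_old)
```

If the two forms are equivalent, as expected, the displayed relative difference is zero up to rounding, and cluster(CASE) gives the same results as vce(cluster CASE).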

Hence, I resorted to (hoping for the same results)

Code:

. xtreg reb_sa T_C i.round, re cluster(CASE)

Random-effects GLS regression                   Number of obs     =        920
Group variable: CASE                            Number of groups  =        230

R-squared:                                      Obs per group:
     Within  = 0.0101                                         min =          4
     Between = 0.0051                                         avg =        4.0
     Overall = 0.0084                                         max =          4

                                                Wald chi2(4)      =       9.00
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0612

                                 (Std. err. adjusted for 230 clusters in CASE)
------------------------------------------------------------------------------
             |               Robust
      reb_sa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         T_C |    .038783   .0358797     1.08   0.280    -.0315399    .1091059
             |
       round |
          2  |   .0956522   .0399977     2.39   0.017     .0172581    .1740463
          3  |   .0826087   .0408316     2.02   0.043     .0025801    .1626372
          4  |   .0347826   .0394761     0.88   0.378    -.0425891    .1121544
             |
       _cons |   .2498365   .0303459     8.23   0.000     .1903595    .3093134
-------------+----------------------------------------------------------------
     sigma_u |  .16361309
     sigma_e |  .43735278
         rho |   .1227684   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Does this mean that the treatment group is 3.9 percentage points more likely to return to their original choice than the control group? And does this matter at all, given that T_C is statistically insignificant?

I also tried:

Code:

 xtreg reb_sa return_numb i.risk i.round, re cluster(CASE)

Random-effects GLS regression                   Number of obs     =        920
Group variable: CASE                            Number of groups  =        230

R-squared:                                      Obs per group:
     Within  = 0.1765                                         min =          4
     Between = 0.0009                                         avg =        4.0
     Overall = 0.1167                                         max =          4

                                                Wald chi2(7)      =     122.69
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                 (Std. err. adjusted for 230 clusters in CASE)
------------------------------------------------------------------------------
             |               Robust
      reb_sa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 return_numb |  -.0112241   .0050201    -2.24   0.025    -.0210633    -.001385
             |
        risk |
          2  |  -.0210847   .0399023    -0.53   0.597    -.0992918    .0571224
          3  |  -.2812996   .0383366    -7.34   0.000    -.3564379   -.2061613
          4  |  -.3432079   .0403586    -8.50   0.000    -.4223094   -.2641065
             |
       round |
          2  |   .0822424   .0362052     2.27   0.023     .0112815    .1532033
          3  |   .0875063   .0380291     2.30   0.021     .0129706    .1620419
          4  |   .0433181   .0362522     1.19   0.232    -.0277349    .1143711
             |
       _cons |   .4921656   .0454648    10.83   0.000     .4030561     .581275
-------------+----------------------------------------------------------------
     sigma_u |   .1867828
     sigma_e |  .40006416
         rho |  .17896772   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.
end of do-file

Thank you very much in advance for all your help and kind suggestions!

Kind regards,
Mary
