  • Choosing between Panel Data Models

    Hello
    I have three variables:
    Pointyear = sum of publication points in a specific department of my university (DEPENDENT VARIABLE)
    Pointpers = sum of publication points per year of a specific individual (INDEPENDENT VARIABLE)
    lIDyear = ln(number of researchers per year)
    I want to find out what share of the (within) variation in total points per year is due to (within) variation in points per person and what share is due to (within) variation in the number of researchers per year.

    Does this selection of variables look OK for the purpose mentioned above? I ask because the Hausman test supports the random-effects model, but the Breusch-Pagan test supports the pooled OLS model. I don't know what is wrong here.
    Can anybody help me?


    Thanks

  • #2
    Here is an example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID int YEAR float(lPointyear Pointpers lIDyear)
     1 2008 4.0325375 .23333333 3.3322046
     1 2010 4.3550897       .35 3.6635616
     1 2014  4.436357  .3333333    3.7612
     2 2005  2.924952        .5  2.484907
     2 2011 4.3062696       .35  3.610918
     3 2010 4.3550897         1 3.6635616
     3 2011 4.3062696         8  3.610918
     4 2005  2.924952       2.5  2.484907
     4 2006   3.60934       3.4  2.995732
     4 2007  3.831897         1  3.178054
     4 2008 4.0325375         1 3.3322046
     4 2009 4.2785854         6  3.583519
     4 2010 4.3550897         7 3.6635616
     4 2011 4.3062696         1  3.610918
     4 2012 4.4252367       2.1   3.73767
     4 2013 4.5299973         1 3.8066626
     4 2015   4.20762         1  3.713572
     5 2005  2.924952         2  2.484907
     5 2006   3.60934       .35  2.995732
     5 2007  3.831897 1.2666667  3.178054
     5 2008 4.0325375       .75 3.3322046
     5 2009 4.2785854       1.5  3.583519
     5 2010 4.3550897  3.791667 3.6635616
     5 2011 4.3062696  2.416667  3.610918
     5 2012 4.4252367      3.25   3.73767
     5 2013 4.5299973 3.9916666 3.8066626
     5 2015   4.20762       1.6  3.713572
     6 2014  4.436357      .625    3.7612
     6 2015   4.20762  .8214286  3.713572
     7 2011 4.3062696        .5  3.610918
     7 2012 4.4252367       .25   3.73767
     8 2010 4.3550897      .175 3.6635616
     9 2011 4.3062696 .16666667  3.610918
    10 2005  2.924952        .5  2.484907
    10 2006   3.60934         2  2.995732
    10 2007  3.831897       2.5  3.178054
    10 2008 4.0325375       7.5 3.3322046
    10 2009 4.2785854         2  3.583519
    10 2010 4.3550897      1.05 3.6635616
    10 2011 4.3062696         9  3.610918
    10 2012 4.4252367         4   3.73767
    10 2013 4.5299973         2 3.8066626
    10 2014  4.436357        .5    3.7612
    10 2015   4.20762         2  3.713572
    11 2005  2.924952         3  2.484907
    11 2009 4.2785854       5.7  3.583519
    11 2012 4.4252367      2.45   3.73767
    11 2013 4.5299973         4 3.8066626
    11 2014  4.436357       .35    3.7612
    12 2006   3.60934       1.4  2.995732
    end
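
    A minimal sketch of one way to see how much each variable moves within individuals versus between them, assuming the -dataex- example above is in memory. -xtset- and -xtsum- are standard Stata commands; nothing here is taken from the original poster's do-file.

    ```stata
    * Assumes the -dataex- example above has been run, so the data are in memory.
    * Declare the panel structure: ID identifies individuals, YEAR is the time variable.
    xtset ID YEAR

    * Split each variable's standard deviation into overall, between-ID and within-ID parts.
    xtsum lPointyear Pointpers lIDyear
    ```

    Comparing the between and within rows gives a first impression of where the variation sits before any model is fitted.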



      • #4
        Ludmilla:
        even though the -hausman- outcome favours the -re- specification, -theta- in the -re- specification is 0 at every reported quantile (and -sigma_u- is estimated at 0), so -re- reduces to pooled OLS here.
        Besides, the F test at the foot of the -xtreg, fe- output tells you that you cannot reject the null hypothesis that the individual effects are jointly zero at the 5% level (Prob > F = 0.8821).
        Both pieces of information point you towards pooled OLS (with standard errors clustered on -panelid-).
        I suppose that the -lIDyear- predictor explains most of the -depvar- variation (hence, there's little left to -regress- about).
        For more details, please take a look at:
        Code:
        . xtreg lPointyear Pointpers lIDyear, fe
        
        Fixed-effects (within) regression               Number of obs     =         50
        Group variable: ID                              Number of groups  =         12
        
        R-sq:                                           Obs per group:
             within  = 0.9812                                         min =          1
             between = 0.9824                                         avg =        4.2
             overall = 0.9809                                         max =         11
        
                                                        F(2,36)           =     937.20
        corr(u_i, Xb)  = -0.0753                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
          lPointyear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           Pointpers |   .0059839   .0053075     1.13   0.267    -.0047803    .0167481
             lIDyear |   1.155069   .0269564    42.85   0.000     1.100399    1.209739
               _cons |   .0957277   .0929864     1.03   0.310    -.0928573    .2843128
        -------------+----------------------------------------------------------------
             sigma_u |  .03677125
             sigma_e |  .07005047
                 rho |  .21602224   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(11, 36) = 0.51                      Prob > F = 0.8821
        
        . estimates store fe
        
        . xtreg lPointyear Pointpers lIDyear, re theta
        
        Random-effects GLS regression                   Number of obs     =         50
        Group variable: ID                              Number of groups  =         12
        
        R-sq:                                           Obs per group:
             within  = 0.9811                                         min =          1
             between = 0.9826                                         avg =        4.2
             overall = 0.9809                                         max =         11
        
                                                        Wald chi2(2)      =    2410.59
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
        ------------------- theta --------------------
          min      5%       median        95%      max
        0.0000   0.0000     0.0000     0.0000   0.0000
        
        ------------------------------------------------------------------------------
          lPointyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           Pointpers |    .005108   .0042961     1.19   0.234    -.0033123    .0135282
             lIDyear |   1.151098   .0235733    48.83   0.000     1.104896    1.197301
               _cons |   .1113703   .0817369     1.36   0.173     -.048831    .2715715
        -------------+----------------------------------------------------------------
             sigma_u |          0
             sigma_e |  .07005047
                 rho |          0   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . estimates store re
        
        . hausman fe re
        
                         ---- Coefficients ----
                     |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                     |       fe           re         Difference          S.E.
        -------------+----------------------------------------------------------------
           Pointpers |    .0059839      .005108        .0008759        .0031166
             lIDyear |    1.155069     1.151098        .0039706        .0130748
        ------------------------------------------------------------------------------
                                   b = consistent under Ho and Ha; obtained from xtreg
                    B = inconsistent under Ha, efficient under Ho; obtained from xtreg
        
            Test:  Ho:  difference in coefficients not systematic
        
                          chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                  =        0.22
                        Prob>chi2 =      0.8971
        
        
        
        
        . reg lPointyear Pointpers lIDyear, vce(cluster ID)
        
        Linear regression                               Number of obs     =         50
                                                        F(2, 11)          =    3605.56
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.9809
                                                        Root MSE          =     .06593
        
                                            (Std. Err. adjusted for 12 clusters in ID)
        ------------------------------------------------------------------------------
                     |               Robust
          lPointyear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           Pointpers |    .005108   .0022206     2.30   0.042     .0002204    .0099956
             lIDyear |   1.151098   .0150869    76.30   0.000     1.117892    1.184305
               _cons |   .1113703   .0494209     2.25   0.046     .0025957    .2201448
        ------------------------------------------------------------------------------
        Kind regards,
        Carlo
        (Stata 19.0)
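
        The first post mentions the Breusch-Pagan test without showing it. A minimal sketch of how that test could be reproduced after the random-effects fit, assuming the -dataex- example from #2 is in memory; -xttest0- is Stata's Breusch and Pagan Lagrangian multiplier test for random effects.

        ```stata
        * Assumes the -dataex- example from #2 is in memory.
        xtset ID YEAR

        * Random-effects fit, same specification as in the log above.
        quietly xtreg lPointyear Pointpers lIDyear, re

        * Breusch-Pagan LM test; H0: the variance of the individual effect u_i is zero.
        xttest0
        ```

        Failing to reject H0 would point towards pooled OLS, which is what the original poster reports for this test.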



        • #5
          Thanks Carlo

          Two things I still do not understand:

          1: If pooled OLS is the correct model, does this mean that I cannot use the (between) and/or (within) estimators for the dependent and independent variables?
          I am especially interested in the within estimator, to check whether the variation in the dependent variable is due to within variation in the independent variables.

          2: My panel ID is the individual ID, but I have two types of regressors: one of them, "Pointpers", varies across (and within) individuals; the other varies across and within years. My dependent variable is also a "year"-based variable. Is it OK to use variables in the same regression when not all of them vary across individuals?



            • #7
              Ludmilla:
              you can probably add a categorical variable for, say, year to the pooled OLS predictors and see what happens.
              But, setting aside the "right" regression model for a while, I would think that the main issue with your model is that the -lIDyear- predictor explains most of the -depvar- variation.
              Kind regards,
              Carlo
              (Stata 19.0)
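
              A minimal sketch of the year-indicator suggestion, assuming the -dataex- example from #2 is in memory. In that extract, lPointyear and lIDyear each take a single value per year, so the two -assert- checks pass and the year indicators are collinear with lIDyear; the full dataset may behave differently.

              ```stata
              * Assumes the -dataex- example from #2 is in memory.
              * Confirm that the outcome and lIDyear are constant within each year.
              bysort YEAR (ID): assert lPointyear == lPointyear[1]
              bysort YEAR (ID): assert lIDyear == lIDyear[1]

              * Pooled OLS with year indicators; standard errors clustered on ID.
              * Expect Stata to omit a collinear term, since lIDyear is a function of YEAR here.
              regress lPointyear Pointpers lIDyear i.YEAR, vce(cluster ID)
              ```

              When the checks pass, the year indicators fit lPointyear exactly and leave nothing for lIDyear or Pointpers to explain, so this regression works more as a diagnostic than as a model.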



              • #8
                How is that an issue?
                I kind of want it to explain most of -depvar-.



                • #9
                  Ludmilla:
                  not a negative issue, really.
                  I meant: in the real world, it's unusual for a single predictor to explain most of the variation in the dependent variable (the R-sq values are really high).
                  Kind regards,
                  Carlo
                  (Stata 19.0)
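
                  One possible reading of the high R-sq, sketched under an assumption the thread does not confirm: that Pointyear is the sum of Pointpers over the same researchers who are counted in lIDyear. The symbols below are notation introduced here for illustration only.

                  ```latex
                  % P_t: department points in year t; p_{it}: points of researcher i in year t;
                  % N_t: number of researchers in year t; \bar{p}_t: mean points per researcher.
                  \[
                  P_t = \sum_{i=1}^{N_t} p_{it} = N_t \,\bar{p}_t
                  \quad\Longrightarrow\quad
                  \ln P_t = \ln N_t + \ln \bar{p}_t
                  \]
                  ```

                  Under that assumption, lPointyear differs from lIDyear only by the log of mean points per researcher, so a coefficient on lIDyear in the neighbourhood of 1 and a very high R-sq would be expected rather than surprising.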
