  • What model is best suited to data whose dependent variable is a proportion with values from 0 to 1, both inclusive?

    Dear All,

    I have two questions on what models and Stata commands to use regarding the following two cases:


    1. Cross-sectional data in which my response variable is a proportion with values from 0 to 1, both inclusive.

    2. Panel data in which my response variable is a proportion with values from 0 to 1, both inclusive.


    Please note that the proportions are the ratio of cultivated acres of land to the total acres of land available to a farmer.


    My initial thought was to use beta regression, but then I realized that both 0 and 1 are included. In my reading on Statalist, I came across the fracreg and fracglm commands.


    With regard to (1), the cross-sectional data, I would like to confirm that betareg is not suitable, and to ask which of fracreg and fracglm is more suitable.


    With regard to (2), the panel data, I have not seen any command for it. Is there one that I can be referred to?



    Thank you.



    Cobby.

  • #2
    Originally posted by Cobby Stoneson
    With regard to (2), the panel data, I have not seen any command for it. Is there one that I can be referred to?
    Have you considered something like
    Code:
    xtgee cultivated_acres <other predictors>, offset(total_acres_available) ///
        i(farmer_id) t(<maybe>) ///
        family(<whatever>) link(<whatever>) ///
        corr(<whatever>)
    with the distribution family, link function and working correlation structure chosen on the basis of either subject-matter knowledge (i.e., precedent) or examination of the response variable's distribution?

    Maybe start with Gaussian distribution, identity link and exchangeable correlation structure, and go from there. (The left side is zero-bounded, and so other candidate distribution families could be entertained, as well.)



    • #3
      Thank you, Joseph Coveney, for your response. I do not understand why a population-averaged model should be used, though. I will read a bit more on the xtgee command to see if I understand why you suggested it.



      • #4
        Cobby: I'm providing a link to Leslie Papke's website, who was my coauthor on fractional response methods for both cross section and panel data. There's a relatively new Stata command, fracreg, that can be used for cross section or panel data; in the latter case, you need to cluster your standard errors. But I usually use glm and xtgee. If you want to use fractional logit:

        Code:
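        * Cross section: fractional logit with robust standard errors
        glm y x1 ... xK, fam(bin) link(logit) vce(robust)
        * Panel: declare the panel structure first
        xtset id year
        * Pooled fractional logit with year dummies, standard errors clustered by unit
        glm y x1 ... xK i.year, fam(bin) link(logit) vce(cluster id)
        * GEE with an unstructured working correlation matrix
        xtgee y x1 ... xK, fam(bin) link(logit) corr(uns) vce(robust)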
        In the panel data case, the correlated random effects approach requires you to compute the time averages of the time-varying covariates and add them. This is supposed to mimic fixed effects.
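
        As a minimal sketch of that time-averages step (the names y, x1, x2, id and year are placeholders, not variables from this thread), the correlated random effects version of the pooled fractional logit could be set up along these lines:

        Code:
        * Placeholder names: y in [0, 1], time-varying covariates x1 and x2
        xtset id year
        * Unit-level time averages of each time-varying covariate
        foreach v of varlist x1 x2 {
            egen double `v'bar = mean(`v'), by(id)
        }
        * Add the averages alongside the original covariates
        glm y x1 x2 x1bar x2bar i.year, fam(bin) link(logit) vce(cluster id)
        With an unbalanced panel, the averages should presumably be computed over the same observations that enter the estimation, so treat this as a starting point rather than a finished specification.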

        If you want to use a fully parametric approach, you can try two-limit Tobit models, but in my experience they offer little over fractional logit or probit.

        JW

        Papke



        • #5
          Originally posted by Cobby Stoneson
          I do not understand why a population-averaged model should be used, though.
          It doesn't have to be used. For example, take a look at the subject-specific estimation command meglm as an alternative.
          Code:
          help meglm
          There are some advantages to the population-averaged GEE if you're looking at longitudinal data, however, and I think that Stata's implementation of GEE offers a richer set of distribution-family / link-function combinations than meglm does.

          My point was that you can look at the [0, 1] fractional proportion problem in other ways, expanding the modeling choices.



          • #6
            Thank you very much Jeff Wooldridge and Joseph Coveney for the clarifications. I will give it a try.



            • #7
              Hello
              Jeff Wooldridge, Joseph Coveney and everyone, I have a few more questions related to my question in #1 and the suggestion in #4 above.

              1. First, I have run my panel model using glm and the CRE approach. Strangely, it took about 8 hours to run. I use Stata 15 on a laptop with 32GB of RAM.

              Though I got some reasonable estimates, I got the warning that
              Code:
              Warning: convergence not achieved
              What does this mean for my estimates? I read in the help file that binomial family models sometimes have convergence difficulties, and that the option mu(varname), which specifies varname as the initial estimate for the mean of depvar, can be useful. How do I specify this in my situation?

              2. I also want to add an important predictor variable to the model: irrigwater, the amount of irrigation water used by a farmer in the growing season. I think this variable is endogenous, since the dependent variable in my model is the fraction cultivated_acres/total_acres_available. The reasoning is that the more acres are cultivated, the more irrigation water will be used. First, is my reasoning correct? If it is, I would like to find out whether either xtgee or glm can handle a model with an endogenous variable. The help files for both say nothing about endogenous predictors.

              I have seen some people mention fracivp and cmp as being capable of handling endogenous predictors. Please what would you suggest in my particular circumstance?
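
              On question 1, a minimal sketch of how mu() could be specified. The starting-value variable mu0 and the names y, x1, x2 and id are hypothetical, and whether starting values help in this particular model is not established in the thread:

              Code:
              * Hypothetical starting values for the mean, kept strictly inside (0, 1)
              generate double mu0 = (y + 0.5) / 2
              * Pass them through mu() and cap the number of iterations
              glm y x1 x2, fam(bin) link(logit) vce(cluster id) mu(mu0) iterate(100)
              Capping the iterations with iterate() at least avoids a run that backs up for thousands of steps before reporting non-convergence.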

              Thanks.

              Cobby.



              • #8
                Cobby: As stated in the FAQ, it is much more likely you'll get a useful answer if you show your Stata commands and the Stata output. GLM should not take very long at all to run, even with a lot of data. And fractional logit and probit are very smooth and concave estimation problems, so convergence should have been achieved. I'm afraid I can't say more without seeing output.

                JW



                • #9
                  Jeff Wooldridge, below are my Stata commands and the output.

                  Code:
                  glm propacre height irrigwater rain dist_rain L_cropA L_cropB L_cropC L_cropD L_cropE L_cropF ///
                   L_acre year1 - year5 well2 - well120 flow depth non_irrig dist heightbar irrigwaterbar rainbar ///
                   dist_rainbar L_cropAbar L_cropBbar L_cropCbar L_cropDbar L_cropEbar L_cropFbar L_acrebar, ///
                   fam(bin) link(logit) vce(cluster ind)
                  The output is as below:
                  Code:
                  Iteration 15997: log pseudolikelihood = -2815.3657  (backed up)
                  Iteration 15998: log pseudolikelihood = -2815.3657  (backed up)
                  Iteration 15999: log pseudolikelihood = -2815.3657  (backed up)
                  Iteration 16000: log pseudolikelihood = -2815.3657  (backed up)
                  convergence not achieved
                  
                  Generalized linear models                         No. of obs      =     10,520
                  Optimization     : ML                             Residual df     =     10,376
                                                                    Scale parameter =          1
                  Deviance         =  2490.802689                   (1/df) Deviance =   .2400542
                  Pearson          =  10553157.05                   (1/df) Pearson  =   1017.074
                  
                  Variance function: V(u) = u*(1-u/1)               [Binomial]
                  Link function    : g(u) = ln(u/(1-u))             [Logit]
                  
                                                                    AIC             =   .5626171
                  Log pseudolikelihood = -2815.365711               BIC             =  -93601.68
                  
                                                     (Std. Err. adjusted for 2,104 clusters in unit)
                  ----------------------------------------------------------------------------------
                                   |               Robust
                          propacre |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -----------------+----------------------------------------------------------------
                      height |   .0014783   .0011986     1.23   0.217     -.000871    .0038276
                   irrigwater |  -2.83e-07   9.09e-06    -0.03   0.975    -.0000181    .0000175
                            rain|    .000014   3.20e-06     4.38   0.000     7.75e-06    .0000203
                     dist_rain|  -.0000111   3.08e-06    -3.60   0.000    -.0000171   -5.04e-06
                       L_cropA |  -.0185442   .0025841    -7.18   0.000    -.0236089   -.0134795
                       L_cropB |  -.0235062   .0020068   -11.71   0.000    -.0274394    -.019573
                       L_cropC |  -.0238975   .0023497   -10.17   0.000    -.0285027   -.0192922
                       L_cropD |  -.0217629    .002253    -9.66   0.000    -.0261788   -.0173471
                       L_cropE |  -.0210248   .0021394    -9.83   0.000     -.025218   -.0168317
                       L_cropF |  -.0210018   .0021784    -9.64   0.000    -.0252714   -.0167323
                       L_acre |    .004219    .000839     5.03   0.000     .0025747    .0058633
                            y1 |   1.518006   .0895908    16.94   0.000     1.342411      1.6936
                            y2 |   .9526341   .0800289    11.90   0.000     .7957803    1.109488
                            y3 |   .4961963   .0550863     9.01   0.000     .3882292    .6041633
                            y4 |    .167186   .0411008     4.07   0.000     .0866299     .247742
                           y5 |          0  (omitted)
                          well2 |          0  (omitted)
                          well3 |   .1587566   .4048352     0.39   0.695    -.6347058     .952219
                          well4 |   -.840581   .4929497    -1.71   0.088    -1.806745    .1255827
                          well5 |  -.1256395   .6213401    -0.20   0.840    -1.343444    1.092165
                          well6 |  -.9413922    .365579    -2.58   0.010    -1.657914   -.2248705
                          well7 |  -.1524204   .4067942    -0.37   0.708    -.9497225    .6448816
                          well8 |   2.366458   1.146192     2.06   0.039     .1199631    4.612952
                          well9 |   .7833357   .3232377     2.42   0.015     .1498015     1.41687
                         well10 |  -1.439659   .2908365    -4.95   0.000    -2.009688   -.8696298
                         well11 |  -.5160404   .4481278    -1.15   0.250    -1.394355     .362274
                         well12 |   .1428277   .4674681     0.31   0.760     -.773393    1.059048
                         well13 |  -.5633792   .3490018    -1.61   0.106     -1.24741    .1206517
                         well14 |   .9383278   .5184029     1.81   0.070    -.0777231    1.954379
                         well15 |  -1.943116   .7555717    -2.57   0.010     -3.42401   -.4622232
                         well16 |   2.120409   1.650985     1.28   0.199    -1.115462     5.35628
                         well17 |  -.2581691   .3857028    -0.67   0.503    -1.014133    .4977944
                         well18 |   9.681786   .8383586    11.55   0.000     8.038633    11.32494
                         well19 |  -.6226852   .6666516    -0.93   0.350    -1.929298    .6839281
                         well20 |  -.7725589   1.372557    -0.56   0.574    -3.462721    1.917603
                         well21 |  -4.448445   1.322417    -3.36   0.001    -7.040334   -1.856556
                         well22 |   .2251493   .4817887     0.47   0.640    -.7191391    1.169438
                         well23 |  -1.214884   .5905349    -2.06   0.040    -2.372312   -.0574571
                         well24 |    10.7267   .6603218    16.24   0.000     9.432491     12.0209
                         well25 |   10.12026   .8433614    12.00   0.000     8.467305    11.77322
                         well26 |  -1.357864   .2659795    -5.11   0.000    -1.879174   -.8365535
                         well27 |   10.74573   .9417412    11.41   0.000     8.899953    12.59151
                         well28 |   -.847239   .3192901    -2.65   0.008    -1.473036    -.221442
                         well29 |  -.5304181   .3223245    -1.65   0.100    -1.162163    .1013264
                         well30 |   -1.06397   .4283934    -2.48   0.013    -1.903605    -.224334
                         well31 |   9.982597   .9279339    10.76   0.000      8.16388    11.80131
                         well32 |  -.2336443   .5237135    -0.45   0.656    -1.260104    .7928154
                         well33 |  -1.497088   .6202264    -2.41   0.016    -2.712709   -.2814661
                         well34 |   2.453743   .3686779     6.66   0.000     1.731147    3.176338
                         well35 |   .6601056   .6490485     1.02   0.309    -.6120061    1.932217
                         well36 |  -.1846427   .6608929    -0.28   0.780    -1.479969    1.110684
                         well37 |  -2.252678   .5273467    -4.27   0.000    -3.286259   -1.219098
                         well38 |   10.60292    .836918    12.67   0.000      8.96259    12.24325
                         well39 |   .3808678   .7596696     0.50   0.616    -1.108057    1.869793
                         well40 |  -.7596918   .9594287    -0.79   0.428    -2.640138    1.120754
                         well41 |  -.9970682   .3104718    -3.21   0.001    -1.605582   -.3885547
                         well42 |  -.5174205   .4052281    -1.28   0.202    -1.311653    .2768119
                         well43 |  -.1311243   .2744623    -0.48   0.633    -.6690605    .4068119
                         well44 |  -.0554436   .4684164    -0.12   0.906    -.9735229    .8626357
                         well45 |  -1.394826   .6476645    -2.15   0.031    -2.664225   -.1254267
                         well46 |  -1.196629   .3850075    -3.11   0.002     -1.95123   -.4420286
                         well47 |  -.4959528    .274609    -1.81   0.071    -1.034177    .0422709
                         well48 |   -.415412    .403892    -1.03   0.304    -1.207026    .3762017
                         well49 |   10.86287   .8484043    12.80   0.000     9.200023    12.52571
                         well50 |   9.982092   .9320126    10.71   0.000     8.155381     11.8088
                         well51 |   -.294841   .3681439    -0.80   0.423     -1.01639    .4267078
                         well52 |  -1.616135   .3214889    -5.03   0.000    -2.246242   -.9860286
                         well53 |  -1.634085   .3001578    -5.44   0.000    -2.222384   -1.045787
                         well54 |  -5.761073   1.348861    -4.27   0.000    -8.404793   -3.117353
                         well55 |   .4509854   .3018928     1.49   0.135    -.1407136    1.042684
                         well56 |   10.63153   .8859203    12.00   0.000     8.895158     12.3679
                         well57 |    .529351   .5499958     0.96   0.336     -.548621    1.607323
                         well58 |  -.2315854   .4480099    -0.52   0.605    -1.109669    .6464978
                         well59 |  -1.058468   .3200695    -3.31   0.001    -1.685792   -.4311429
                         well60 |  -.6915931   .3577143    -1.93   0.053      -1.3927     .009514
                         well61 |  -.2697233   .6536955    -0.41   0.680    -1.550943    1.011496
                         well62 |   4.722433   2.915365     1.62   0.105    -.9915766    10.43644
                         well63 |   1.475113   .5220074     2.83   0.005     .4519974    2.498229
                        well64 |   .3993802   1.190248     0.34   0.737    -1.933464    2.732224
                         well65 |  -1.031068    .356752    -2.89   0.004    -1.730289   -.3318467
                         well66 |   1.179816   .3525384     3.35   0.001     .4888535    1.870779
                         well67 |   .7100473   .3233407     2.20   0.028     .0763111    1.343783
                         well68 |   .3147742    .581148     0.54   0.588     -.824255    1.453803
                         well69 |   9.668157    .956256    10.11   0.000      7.79393    11.54238
                         well70 |   .2362967   .5068969     0.47   0.641     -.757203    1.229796
                         well71 |  -.1049205   .3812576    -0.28   0.783    -.8521718    .6423307
                         well72 |   .9210159   .5053931     1.82   0.068    -.0695364    1.911568
                         well73 |  -.7437257   .3373147    -2.20   0.027     -1.40485   -.0826011
                         well74 |  -1.263464   .3032526    -4.17   0.000    -1.857828   -.6690995
                         well75 |  -.6923265    .499085    -1.39   0.165    -1.670515    .2858621
                         well76 |     8.1471   .8672196     9.39   0.000     6.447381    9.846819
                         well77 |  -2.708121   1.237143    -2.19   0.029    -5.132876   -.2833654
                         well78 |   9.899902   .8534103    11.60   0.000     8.227249    11.57256
                         well79 |  -.4032227   .4341065    -0.93   0.353    -1.254056    .4476105
                         well80 |   .8340149   .7073147     1.18   0.238    -.5522963    2.220326
                         well81 |   -1.18597   .5763343    -2.06   0.040    -2.315565    -.056376
                         well82 |  -.4256317   .3935234    -1.08   0.279    -1.196923    .3456601
                         well83 |  -2.567376   .3636436    -7.06   0.000    -3.280104   -1.854647
                         well84 |          0  (omitted)
                         well85 |  -1.835656    .343175    -5.35   0.000    -2.508267   -1.163045
                         well86 |  -.1730572   .5783679    -0.30   0.765    -1.306637    .9605231
                         well87 |  -.5666832   .5914742    -0.96   0.338    -1.725951    .5925849
                         well88 |    10.6395   .9332012    11.40   0.000     8.810461    12.46854
                         well89 |   .0114854   .4837329     0.02   0.981    -.9366138    .9595845
                         well90 |  -1.122747   .5066094    -2.22   0.027    -2.115683   -.1298108
                         well91 |  -.4495044   .4952356    -0.91   0.364    -1.420148    .5211395
                         well92 |  -6.693612   1.412398    -4.74   0.000    -9.461861   -3.925363
                         well93 |  -.3716779   .5068747    -0.73   0.463    -1.365134    .6217783
                         well94 |  -.6793818   .4598074    -1.48   0.140    -1.580588     .221824
                         well95 |   .3915141   .4860451     0.81   0.421    -.5611167    1.344145
                         well96 |  -.4413959   .3285901    -1.34   0.179    -1.085421    .2026288
                         well97 |   .0512539   .6222265     0.08   0.934    -1.168288    1.270795
                         well98 |   1.219779   .3519419     3.47   0.001     .5299854    1.909572
                         well99 |  -1.583495   .4839093    -3.27   0.001     -2.53194   -.6350506
                        well100 |  -.7445649   .3983597    -1.87   0.062    -1.525336    .0362059
                        well101 |  -.4763745   .4233858    -1.13   0.261    -1.306195    .3534464
                        well102 |  -1.120368   .4376532    -2.56   0.010    -1.978153   -.2625837
                        well103 |  -.9478207   .3417004    -2.77   0.006    -1.617541   -.2781003
                        well104 |  -1.716621   .6332147    -2.71   0.007    -2.957699   -.4755431
                        well105 |  -.0395526   .3649816    -0.11   0.914    -.7549034    .6757982
                        well106 |    .531655   .3649923     1.46   0.145    -.1837169    1.247027
                        well107 |   10.04995   .8537285    11.77   0.000     8.376675    11.72323
                        well108 |          0  (omitted)
                        well109 |   .4372897   .2968051     1.47   0.141    -.1444377    1.019017
                        well110 |  -.2854296   .4963601    -0.58   0.565    -1.258278    .6874183
                        well111 |   10.26064   .8466135    12.12   0.000     8.601306    11.91997
                        well112 |  -1.725716   .3477101    -4.96   0.000    -2.407215   -1.044216
                        well113 |  -.1874231   .3583547    -0.52   0.601    -.8897854    .5149393
                       well114 |   .1067635    .459912     0.23   0.816    -.7946476    1.008174
                        well115 |   .1828337   .6124381     0.30   0.765    -1.017523     1.38319
                        well116 |  -.7089789   .2764445    -2.56   0.010      -1.2508   -.1671575
                        well117 |  -2.434948   .4066878    -5.99   0.000    -3.232041   -1.637854
                        well118 |   10.51865   .8536635    12.32   0.000     8.845498     12.1918
                        well119 |   1.354932   .3853227     3.52   0.000     .5997132    2.110151
                        well120 |  -3.751669   .8577699    -4.37   0.000    -5.432867   -2.070471
                         flow |   .0076173   .0047233     1.61   0.107    -.0016402    .0168749
                        depth |  -.0001434   .0001146    -1.25   0.211     -.000368    .0000813
                     non_irrig |   2.296887   1.130283     2.03   0.042      .081573    4.512201
                         dist |   .0063921   .1322039     0.05   0.961    -.2527228     .265507
                     heightbar |  -.0009361   .0011407    -0.82   0.412    -.0031717    .0012996
                  irrigwaterbar |  -.0003223   .0000601    -5.36   0.000    -.0004402   -.0002045
                       rainbar |  -.0000213   3.97e-06    -5.35   0.000    -.0000291   -.0000135
                  dist_rainbar |   .0000205   4.73e-06     4.33   0.000     .0000112    .0000298
                    L_cropAbar |   .0329176   .0042349     7.77   0.000     .0246174    .0412178
                    L_cropBbar |   .0379258   .0038225     9.92   0.000     .0304338    .0454178
                    L_cropCbar |   .0315687   .0050965     6.19   0.000     .0215798    .0415575
                    L_cropDbar |   .0304284   .0045177     6.74   0.000     .0215739    .0392829
                    L_cropEbar |   .0300944   .0042948     7.01   0.000     .0216767    .0385121
                    L_cropFbar |         0  (omitted)
                    L_acrebar |     .00026   .0026289     0.10   0.921    -.0048925    .0054125
                             _cons |   .960434   .3856369     2.49   0.013     .2045994    1.716268
                  ----------------------------------------------------------------------------------
                  Warning: convergence not achieved
                  
                  .
                  end of do-file
                  
                  .



                  • #10
                    Dear Statalisters, can someone please look into the issues I raised in #7?

                    I'd be grateful, please.

                    Cobby.



                    • #11
                      I see now you’re including lots of well dummies, which can certainly be the source of computational problems, especially if the dummies perfectly predict the outcome in some cases. The unit of observation is not the well, right? What is it?

                      You can try fracreg to see if that works better.

                      Also, you might want to take logs of large, positive explanatory variables to make coefficients have a more natural interpretation.
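
                      A rough sketch of these two suggestions together, assuming a strictly positive regressor such as rain and using x1 x2 as placeholders for the remaining covariates (the thread does not show the exact specification that was eventually run):

                      Code:
                      * Log of a large, strictly positive explanatory variable
                      generate double ln_rain = ln(rain) if rain > 0
                      * Fractional logit via fracreg, clustering on the panel identifier
                      fracreg logit propacre ln_rain x1 x2, vce(cluster ind)
                      Observations where the regressor is zero or negative end up with a missing log and drop out of the estimation, so it is worth checking the sample size after the transformation.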



                      • #12
                        The unit of observation is not the well. It is acres cultivated by farmers. I took logs of the explanatory variables and the glm ran within seconds. Thank you very much
                        Jeff Wooldridge



                        • #13


                          In the panel data case, the correlated random effects approach requires you to compute the time averages of the time-varying covariates and add them. This is supposed to mimic fixed effects.

                          Dear Jeff Wooldridge, could you please elaborate on this in a bit more detail?



                          • #14
                            Here is the link to the Stata files. Hopefully it answers your question.

                            http://econ.msu.edu/faculty/papke/

                            JW



                            • #15
                              Dear Professor Jeff Wooldridge,

                              I have cross-sectional data, and my dependent variable does not cover the whole interval from 0 to 1: its values run 0.029 (smallest), 0.031, 0.033, ..., 0.221 (largest). Would it be appropriate to use the -glm- command from #4 in my case?
                              Code:
                              glm y x1 ... xK, fam(bin) link(logit) vce(robust)
                              Thank you.

