  • Understanding the command "fracreg" (fractional logistic regression)

    Hello everyone,

    I would like to fit a fractional logistic regression with the command "fracreg" in Stata 14.2. However, I do not understand how fracreg handles dependent variables stored as fractions.

    My dependent variable is called "dvfrac". I created it with the following command, where "cnt_infavor" is the number of Y values equal to 1 and "cnt_total" is the count of all Y values (zeros and ones) for an actor.

    Code:
    gen dvfrac = cnt_infavor / cnt_total
    The result looks like this (sorry for the screenshot; I could not run dataex):
    [Image: output of tab dvfrac]



    Does "fracreg" use only the final value of a fraction (e.g. .030303 in the 2nd row)? Or does it apply some kind of weighting by the frequencies or percentages?

    I am asking because, from a theoretical point of view, it makes a difference for my data whether a fraction of 1 is based on 1 observation or on 100 observations (e.g. 100 observations with y==1).

    Thanks a lot!

    All the best,
    Pavel
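A minimal sketch of the weighting question, using made-up actor-level data (x is a hypothetical covariate, and the counts are invented). If the fractions are fitted unweighted, a fraction of 1 from a single Y value counts the same as one from 100 Y values. Using cnt_total as a frequency weight instead counts each fraction once per underlying Y value. Under these assumptions, the weighted fractional logit should give the same point estimates as a binomial GLM on the counts. The standard errors differ, because fracreg reports robust ones by default.

```stata
* Made-up actor-level data, one row per actor (values are illustrative only)
* cnt_infavor = number of Y==1, cnt_total = number of Y values, x = covariate
clear
input actor_id cnt_infavor cnt_total x
1   1   4  0.1
2   3  10  0.3
3  40 100  0.5
4   1   1  0.7
5  18  20  0.9
end

* Store the fraction in double precision to avoid float rounding
gen double dvfrac = cnt_infavor / cnt_total

* Unweighted: every actor counts equally, however many Y values it has
fracreg logit dvfrac x

* Weighted: each fraction counts once per underlying Y value
fracreg logit dvfrac x [fw=cnt_total]
scalar b_frac = _b[x]

* Binomial GLM on the counts: same quasi-likelihood, so same point estimates
glm cnt_infavor x, family(binomial cnt_total) link(logit)
scalar b_glm = _b[x]

display b_frac, b_glm
```

The two displayed coefficients should agree up to convergence tolerance.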

  • #2
    fracreg looks at individual observations. A value of, say, 0.125 occurring in 8 observations enters the analysis 8 times. After all, the values of the other variables might be quite different in those 8 observations.
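A small sketch of this point, using made-up data (y, x and k are hypothetical names). Expanding each row k times and fitting without weights should give the same coefficients as keeping one row per value and using k as a frequency weight.

```stata
* Made-up data: a fraction y, a covariate x, and a repeat count k
clear
input y x k
0.125 0.2 8
0.500 0.5 2
0.750 0.9 4
0.300 0.4 3
end

* Version 1: duplicate each row k times and fit without weights
preserve
expand k
fracreg logit y x
scalar b_expanded = _b[x]
restore

* Version 2: one row per value, with k as a frequency weight
fracreg logit y x [fw=k]
scalar b_weighted = _b[x]

display b_expanded, b_weighted
```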



    • #3
      Replying to Nick Cox (#2):
      Thank you for your quick reply. Could you also give me a hint on how to structure the data properly? I have not fully grasped the issue yet.

      After creating the dependent variable "dvfrac", my data look like this. "v2x_libdem" is an example of an independent variable. The values are repeated according to the original number of Y values (originally coded as zeros and ones), as given by "cnt_total" (see above).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float dvfrac str52 country_name double(year v2x_libdem)
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2006  .16163026785839066
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2010  .14625647743363865
      .071428575 "Algeria"    2006  .16163026785839066
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
               1 "Argentina"  2010   .5934596723878987
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Armenia"    2006  .19382418467668425
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .75 "Australia"  2010   .8571495966761781
             .95 "Austria"    2006    .776027648608974
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2006    .776027648608974
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2006    .776027648608974
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2006    .776027648608974
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2006    .776027648608974
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .95 "Austria"    2010   .7907100406165659
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2006   .0603815614510732
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2006   .0603815614510732
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2006   .0603815614510732
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2006   .0603815614510732
             .25 "Azerbaijan" 2006   .0603815614510732
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Azerbaijan" 2007  .05849706883735425
             .25 "Azerbaijan" 2010 .059621722575294904
             .25 "Bahrain"    2010  .09711520049330387
             .25 "Bahrain"    2010  .09711520049330387
             .25 "Bahrain"    2010  .09711520049330387
             .25 "Bahrain"    2010  .09711520049330387
        .2173913 "Bangladesh" 2010  .23437784016012264
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2007  .11146987843881365
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2007  .11146987843881365
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2010  .23437784016012264
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2006  .11574706318205816
        .2173913 "Bangladesh" 2007  .11146987843881365
        .2173913 "Bangladesh" 2010  .23437784016012264
        .2173913 "Bangladesh" 2010  .23437784016012264
        .2173913 "Bangladesh" 2010  .23437784016012264
        .2173913 "Bangladesh" 2006  .11574706318205816
      end
      The corresponding fractional logistic regression would look like this, where "country_id" identifies the clusters (here, countries):
      Code:
      fracreg logit dvfrac v2x_libdem, vce(cluster country_id)
      This results in 1039 observations:

      [Image: fracreg output]


      OR

      Do I need to collapse the data? This would average the values of my independent variables over all years (2006, 2007, and 2010).

      Code:
      collapse v2x_libdem dvfrac, by(country_id)
      Resulting in:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(country_id v2x_libdem) float dvfrac
        3   .5054888428752317   .6666667
        5   .8763593186742849         .8
        6   .8426826050046065   .7692308
        7   .6487724909870771          1
        8   .6747077415263735   .4285714
        9   .7809480355843593   .8333333
       11   .1512666259562761  .15384616
       14  .14060458579400864          0
       15   .4406554051727211   .3333333
       19   .7531078911470123   .4347826
       20   .8205290668096626   .7435898
       24   .1511082240461222   .2173913
       25   .4728555089600151   .3333333
       30   .5625495087972406       .875
       33  .05739406727155364          0
       37   .5934596723878985          1
       39   .5678291485219498   .1724138
       41 .001916913047696001          0
       42    .757833504740866   .9047619
       45  .27951503248126813        .25
       46  .36368311262988934   .3214286
       49  .24670479658479133   .2857143
       51   .1749250176138388          0
       56   .5047176908637956  .14814815
       58  .24774389235040517        .25
       62  .11771656103593667          0
       66   .8077772581246824   .9181818
       67   .8571495966761782        .75
       72   .8031991822638336          1
       73   .8534274057273265          1
       74   .4964621876899172         .5
       75   .3180911799143146         .2
       76   .8307702418582841          1
       77   .8630210035952781          1
       78  .39537427807396114        .25
       79   .1329600086449393 .074074075
       81   .7967591288195515   .9285714
       88  .37858363834604364          1
       90   .2362923453117127   .3636364
       91   .8137986932814519          1
       94  .09645386387495866       .125
       96    .775977414995091          1
       99   .4837215448353287   .6666667
      100  .34025179325971483          1
      101   .8489623940933386      .9375
      102    .859884841961045   .8333333
      103   .1484527332086032 .071428575
      105  .19382418467668422        .75
      106 .059361820799063214        .25
      107  .08015445327165008          0
      110  .04826320281806435   .0882353
      121  .12264783754643162   .3333333
      124 .043481913328874806          0
      126   .4825093068058886          1
      129  .16207737875074293   .2857143
      131   .2611658418224249          0
      144   .7870394426146683        .95
      146  .09711520049330387        .25
      148   .8227599246931526          1
      155  .03511180562944344  .03030303
      157    .820547375626767   .9047619
      158   .8812663398245516          1
      169   .5855258147029441   .8571429
      177  .19774334628402368          0
      185    .814124014856306         .8
      186   .8731326557414982   .9651163
      189   .5031270487378436   .8333333
      197 .033805387298271296          0
      200   .3085674458468786  .06896552
      210   .6505113046546774          1
      end
      And the corresponding fractional regression command would be:
      Code:
      fracreg logit dvfrac v2x_libdem
      Here, the number of observations would be 70.

      [Image: fracreg output]


      Thank you very much!



      • #4
        If your outcome and covariates of interest are the same for a number of records, you may use a collapsed dataset and set the frequency weight ([fw= ...]) to a variable that holds the number of observations with that exact outcome-covariate pattern. Alternatively, you may use a dataset in which each row represents a single observation, with no frequency weight. This works much the same way as with the other glm regression commands.
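A minimal sketch of the collapsed-with-weights route, using made-up data in the layout of #3 (x stands in for v2x_libdem). Here `contract` keeps one row per distinct country-year pattern and stores its count. Unlike averaging over years with collapse, this leaves the covariate values unchanged. Fitting with that count as a frequency weight should then reproduce the coefficients from the row-per-observation data.

```stata
* Made-up expanded data: one row per original 0/1 observation, with the
* country fraction and the covariate repeated on each row
clear
input country_id year x dvfrac
1 2006 0.16 0.25
1 2010 0.15 0.25
1 2010 0.15 0.25
1 2010 0.15 0.25
2 2010 0.59 1
2 2010 0.59 1
3 2006 0.19 0.75
3 2006 0.19 0.75
3 2006 0.19 0.75
3 2006 0.19 0.75
4 2007 0.80 0.5
4 2010 0.82 0.5
end

* Fit on the row-per-observation data
fracreg logit dvfrac x, vce(cluster country_id)
scalar b_rows = _b[x]

* One row per distinct outcome-covariate pattern, with its frequency
contract country_id year x dvfrac, freq(n_pattern)

* Refit with the pattern count as a frequency weight
fracreg logit dvfrac x [fw=n_pattern], vce(cluster country_id)
scalar b_fw = _b[x]

display b_rows, b_fw
```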



        • #5
          Replying to Leonardo Guizzetti (#4):
          This helps a lot, Leonardo! I will probably use the collapsed dataset with frequency weights.



          • #6
            You're welcome, Pavel.



            • #7
              Watch out if you are using an older version of Stata: fracreg was reporting the wrong number of observations when used with frequency weights. See the update of 20feb2019:

              "5. fracreg with frequency weights reported the wrong number of
              observations. This has been fixed. The reported coefficients,
              standard errors, test statistics, and confidence intervals were
              correct."

              It runs correctly in an updated Stata 15.1.
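A quick check for whether an installation shows the problem, sketched with made-up data. After fitting with frequency weights, e(N) should equal the sum of the weights (17 here). This assumes that a correct report means N equals the sum of the weights, which is how the update note above reads. Running `update query` shows how current the installation is.

```stata
* Made-up data with a frequency-weight variable k
clear
input y x k
0.125 0.2 8
0.500 0.5 2
0.750 0.9 4
0.300 0.4 3
end

* Number of observations the weights represent
quietly summarize k
scalar n_represented = r(sum)

fracreg logit y x [fw=k]

* On an up-to-date installation these two numbers should match
display e(N), n_represented
```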

