Understanding command "fracreg" (fractional logistic regression)

Pavel Satra

Join Date: Feb 2019

Posts: 28
#1

05 Apr 2019, 07:21

Hello everyone,

I would like to fit a fractional logistic regression with the command "fracreg" in Stata 14.2. However, I do not understand how fracreg handles dependent variables stored as fractions.

My dependent variable is called "dvfrac". I created it with the following command, where "cnt_infavor" is the number of Y values equal to 1 and "cnt_total" is the count of all Y values (zeros and ones) for an actor.

Code:

gen dvfrac = cnt_infavor / cnt_total

The result looks like this (sorry for the print screen, could not run dataex):
[Image: tab dvfrac.png]

Does "fracreg" work only with the final value of a fraction (e.g. .030303 in the 2nd row)? Or is any kind of weighting by the frequencies or percentages applied?

I am asking because, from a theoretical point of view, it makes a difference for my data whether a fraction value of 1 is based on 1 observation or on 100 observations (e.g. 100 observations with y-values==1).

Thanks a lot!

All the best,
Pavel
Tags: fracreg, fractional logit, fractions
Nick Cox

Join Date: Mar 2014

Posts: 35715
#2

05 Apr 2019, 07:31

fracreg looks at individual observations. A value of say 0.125 occurring in 8 observations will get entered 8 times into the analysis. After all, the values on any other variables might be quite different in those 8 observations.

Pavel Satra

Join Date: Feb 2019
Posts: 28

05 Apr 2019, 08:46

Originally posted by Nick Cox:

fracreg looks at individual observations. A value of say 0.125 occurring in 8 observations will get entered 8 times into the analysis. After all, the values on any other variables might be quite different in those 8 observations.

Thank you for your instant reply. Could you also give me a hint on how to structure the data properly? I have not grasped the issue completely (yet).

After creating the dependent variable "dvfrac", my data structure looks like this. "v2x_libdem" is an example of an independent variable. Each value is repeated once per original y-value (originally coded as zeros and ones), so the number of repeats equals "cnt_total" (see above).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float dvfrac str52 country_name double(year v2x_libdem)
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2006  .16163026785839066
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2010  .14625647743363865
.071428575 "Algeria"    2006  .16163026785839066
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
         1 "Argentina"  2010   .5934596723878987
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Armenia"    2006  .19382418467668425
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .75 "Australia"  2010   .8571495966761781
       .95 "Austria"    2006    .776027648608974
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2006    .776027648608974
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2006    .776027648608974
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2006    .776027648608974
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2006    .776027648608974
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .95 "Austria"    2010   .7907100406165659
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2006   .0603815614510732
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2006   .0603815614510732
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2006   .0603815614510732
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2006   .0603815614510732
       .25 "Azerbaijan" 2006   .0603815614510732
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Azerbaijan" 2007  .05849706883735425
       .25 "Azerbaijan" 2010 .059621722575294904
       .25 "Bahrain"    2010  .09711520049330387
       .25 "Bahrain"    2010  .09711520049330387
       .25 "Bahrain"    2010  .09711520049330387
       .25 "Bahrain"    2010  .09711520049330387
  .2173913 "Bangladesh" 2010  .23437784016012264
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2007  .11146987843881365
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2007  .11146987843881365
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2010  .23437784016012264
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2006  .11574706318205816
  .2173913 "Bangladesh" 2007  .11146987843881365
  .2173913 "Bangladesh" 2010  .23437784016012264
  .2173913 "Bangladesh" 2010  .23437784016012264
  .2173913 "Bangladesh" 2010  .23437784016012264
  .2173913 "Bangladesh" 2006  .11574706318205816
end

The corresponding code for a fractional logistic regression would look like this, where "country_id" identifies the clusters (here, countries).

Code:

fracreg logit dvfrac v2x_libdem, vce(cluster country_id)

This would result in 1039 observations:

[Image: 2.png]

OR

Do I need to collapse the data? This would average my IV observations over all years (2006, 2007, and 2010).

Code:

collapse v2x_libdem dvfrac, by(country_id)

Resulting in:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(country_id v2x_libdem) float dvfrac
  3   .5054888428752317   .6666667
  5   .8763593186742849         .8
  6   .8426826050046065   .7692308
  7   .6487724909870771          1
  8   .6747077415263735   .4285714
  9   .7809480355843593   .8333333
 11   .1512666259562761  .15384616
 14  .14060458579400864          0
 15   .4406554051727211   .3333333
 19   .7531078911470123   .4347826
 20   .8205290668096626   .7435898
 24   .1511082240461222   .2173913
 25   .4728555089600151   .3333333
 30   .5625495087972406       .875
 33  .05739406727155364          0
 37   .5934596723878985          1
 39   .5678291485219498   .1724138
 41 .001916913047696001          0
 42    .757833504740866   .9047619
 45  .27951503248126813        .25
 46  .36368311262988934   .3214286
 49  .24670479658479133   .2857143
 51   .1749250176138388          0
 56   .5047176908637956  .14814815
 58  .24774389235040517        .25
 62  .11771656103593667          0
 66   .8077772581246824   .9181818
 67   .8571495966761782        .75
 72   .8031991822638336          1
 73   .8534274057273265          1
 74   .4964621876899172         .5
 75   .3180911799143146         .2
 76   .8307702418582841          1
 77   .8630210035952781          1
 78  .39537427807396114        .25
 79   .1329600086449393 .074074075
 81   .7967591288195515   .9285714
 88  .37858363834604364          1
 90   .2362923453117127   .3636364
 91   .8137986932814519          1
 94  .09645386387495866       .125
 96    .775977414995091          1
 99   .4837215448353287   .6666667
100  .34025179325971483          1
101   .8489623940933386      .9375
102    .859884841961045   .8333333
103   .1484527332086032 .071428575
105  .19382418467668422        .75
106 .059361820799063214        .25
107  .08015445327165008          0
110  .04826320281806435   .0882353
121  .12264783754643162   .3333333
124 .043481913328874806          0
126   .4825093068058886          1
129  .16207737875074293   .2857143
131   .2611658418224249          0
144   .7870394426146683        .95
146  .09711520049330387        .25
148   .8227599246931526          1
155  .03511180562944344  .03030303
157    .820547375626767   .9047619
158   .8812663398245516          1
169   .5855258147029441   .8571429
177  .19774334628402368          0
185    .814124014856306         .8
186   .8731326557414982   .9651163
189   .5031270487378436   .8333333
197 .033805387298271296          0
200   .3085674458468786  .06896552
210   .6505113046546774          1
end

And the corresponding fractional regression command would be:

Code:

fracreg logit dvfrac v2x_libdem

Here, the number of observations would be 70.

[Image: 1.png]

Thank you very much!


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#4

05 Apr 2019, 09:51

If your outcome and covariates of interest are the same for a number of records, then you may use a collapsed dataset and set the frequency weight ([fw= ...]) to a variable that holds the number of observations with that exact outcome-covariate pattern. Or you may use a dataset in which each row represents a single observation, in which case you do not set a frequency weight. This is much the same as with the other glm-type regression commands.
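A minimal sketch of the collapsed, frequency-weighted approach, assuming the one-row-per-observation data contain country_id, year, dvfrac, and v2x_libdem. The variable name n_obs is hypothetical and introduced only for this illustration.

```stata
* Count how many original rows share each outcome-covariate pattern.
* n_obs is a hypothetical name; it becomes the frequency weight.
gen long n_obs = 1
collapse (sum) n_obs, by(country_id year dvfrac v2x_libdem)

* Fit on the collapsed data, weighting each row by its frequency.
* The point estimates should match the fit on the uncollapsed data.
fracreg logit dvfrac v2x_libdem [fw = n_obs], vce(cluster country_id)
```

Keeping year in by() leaves each country-year value of v2x_libdem intact, rather than averaging it over years as a plain collapse by country_id would.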
Pavel Satra

Join Date: Feb 2019

Posts: 28
#5

05 Apr 2019, 10:16

Originally posted by Leonardo Guizzetti:

If your outcome and covariates of interest are the same for a number of records, then you may use a collapsed dataset and set the frequency weight ([fw= ...]) to a variable that holds the number of observations with that exact outcome-covariate pattern. Or you may use a dataset in which each row represents a single observation, in which case you do not set a frequency weight. This is much the same as with the other glm-type regression commands.

This helps a lot, Leonardo! I will probably use the collapsed dataset with frequency weights.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#6

05 Apr 2019, 11:23

You're welcome, Pavel.
Pavel Satra

Join Date: Feb 2019

Posts: 28
#7

16 Apr 2019, 00:37

Watch out if you are using an older version of Stata. The command fracreg was reporting the wrong number of observations when used with frequency weights; see the update of 20feb2019:

"5. fracreg with frequency weights reported the wrong number of
observations. This has been fixed. The reported coefficients,
standard errors, test statistics, and confidence intervals were
correct."

It runs correctly on an updated Stata 15.1.
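One way to check whether an installation already includes that update, sketched on the assumption that the machine has internet access for the last command:

```stata
* Show the running Stata version and the date of the installed executable.
display c(stata_version)
display c(born_date)

* Compare the installation with the latest official updates.
update query
```

An executable date of 20feb2019 or later suggests the fix quoted above is installed.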
