
  • Fractional Response Dependent Variable

    I have a dependent variable that is a rate, and I'm currently using a fractional logit estimation.
    Code:
    xtgee DV IV, family(binomial 100) link(logit) corr(exchangeable) vce(robust)
    My unit of analysis is the directed dyad-year, with the dyads being countries. My panel is unbalanced, and it seems there is not much I can do about that for now. The data are also time-series cross-sectional (covering 14 years).

    I was reading Papke and Wooldridge (1996), among others, and it seems that the fractional logit setup assumes balanced data.

    I read another article that treated the data as pooled cross-sectional with appropriate time controls (though they used survey data). Is this the proper way to proceed, and if so, how does one go about it in Stata with the xtgee command? I found some slides on stata.com, but nothing regarding GEE: http://www.stata.com/meeting/wcsug07/cameronwcsug.pdf

    When I originally set up the panel with -xtset-, I created a variable for the dyads and used "xtset dyad year". The most I have ever come across on pooling cross-sectional data involves surveys. It seems a bit of overkill given the unit of analysis, but I'm not sure whether to just make do with the unbalanced panel or to go about it another way.

    Thanks in advance for any help!

  • #2
    Monica:

    There's no problem with unbalanced panels; the command you use above is fine. It does assume that the data are essentially "missing at random," but this is the same as all panel data methods.

    It gets trickier in microeconometric settings with large N and small T when you want to explicitly account for heterogeneity. Putting in dummies to estimate the fixed effects leads to bias unless T is pretty big. In this attached paper I show how to extend the correlated random effects approach to allow for unbalanced panels. You can still use the xtgee command above, but you would include functions of time averages and the number of time periods available for each cross-sectional unit.
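
    To see the mechanics, here is a minimal sketch of that idea in Stata, built on the command above with placeholder names (IV stands in for your covariates; the i.Ti dummies index the number of observed periods per dyad):
    Code:
    * sketch: time average of IV, plus dummies for the number of observed periods
    egen Ti = count(IV), by(dyad)      // periods in which the dyad is observed
    egen IVbar = mean(IV), by(dyad)    // time average over observed years
    xtgee DV IV IVbar i.Ti, family(binomial 100) link(logit) corr(exchangeable) vce(robust)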

    Attached Files



    • #3
      Originally posted by Jeff Wooldridge:
      Monica:

      There's no problem with unbalanced panels; the command you use above is fine. It does assume that the data are essentially "missing at random," but this is the same as all panel data methods.

      It gets trickier in microeconometric settings with large N and small T when you want to explicitly account for heterogeneity. Putting in dummies to estimate the fixed effects leads to bias unless T is pretty big. In this attached paper I show how to extend the correlated random effects approach to allow for unbalanced panels. You can still use the xtgee command above, but you would include functions of time averages and the number of time periods available for each cross-sectional unit.
      Hello Professor Wooldridge,

      Thank you for replying! I'm a little embarrassed to say, though, that some aspects of your paper went over my head. I'm also not sure what you mean by including functions of the time averages and the number of time periods for each cross-sectional unit (I'm assuming here that the cross-sectional units are the dyads, specifically the directed dyads). I saw the equation in the paper, but I'm not sure what it means or how I would go about implementing it.

      Unfortunately, these are the only years for which the data are available. My missing values on the DV are recorded as missing, but they aren't missing per se: the underlying values just don't exist, so the rates can't be calculated, if that makes sense. I even reached out to the statistics unit that provides the data, and they simply said that missing in this case just means NA.

      I've been going back and forth on this for some time now, but I was happy to come across the fractional logit model thanks to Nick Cox, and the readings were very interesting and easy to follow. I would really like to stay with GEE if I can, given the directed dyads, but it has been a bit of a headache. Still, I believe it is theoretically the best approach.



      • #4
        It will be much easier to provide suggestions if you can show several rows of your data. For example, it depends whether you have a line for each year for every cross sectional unit, even if the data are missing. You can create the time averages using the -egen- command in Stata. I can help more if you show a bit of your data.
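
        For concreteness, a single time average can be created along these lines (a sketch with placeholder names x1 and dyad):
        Code:
        bysort dyad: egen x1bar = mean(x1)   // mean of x1 over the dyad's observed years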



        • #5
          Originally posted by Jeff Wooldridge:
          It will be much easier to provide suggestions if you can show several rows of your data. For example, it depends whether you have a line for each year for every cross sectional unit, even if the data are missing. You can create the time averages using the -egen- command in Stata. I can help more if you show a bit of your data.
          Sure, here's a snippet that I was able to fit into -dataex-. I actually have a lot of variables, but to answer your question, I do have rows for all of the years for each dyad. Off the top of my head, I believe the rivalry variable is available only until 2010, alliances until 2012, and the trade data (from the Correlates of War project) has missing values. The asylum_prop variable is the DV I mentioned earlier, where values are "missing" because the inputs needed to calculate the rate are not there. Unfortunately, a good chunk of them are missing, but there is not much I can do; it is the only dataset that provides the data I need.
          Code:
          input double ccode1 float(ccode2 year dyad asylum_prop exchange logtrade) byte rivalry_code float alliance byte polity2_host int exelec float(democracy autocracy) byte(conflict colonial_dummy) float average byte numberofborders float border
          2 20 2000 1          . 1 12.894857 0 1 10 1 1 0 0 0   1 5 1
          2 20 2001 1         .5 1 12.835595 0 1 10 0 1 0 0 0   1 5 1
          2 20 2002 1          0 1 12.812908 0 1 10 0 1 0 0 0 1.5 5 1
          2 20 2003 1   .3636364 1  12.86691 0 1 10 0 1 0 0 0   1 5 1
          2 20 2004 1   .3333333 1 12.985546 0 1 10 1 1 0 0 0   1 5 1
          2 20 2005 1   .5555556 1 13.096643 0 1 10 0 1 0 0 0   1 5 1
          2 20 2006 1   .3076923 1 13.159528 0 1 10 0 1 0 0 0   1 5 1
          2 20 2007 1         .4 1 13.206733 0 1 10 0 1 0 0 0 1.5 5 1
          2 20 2008 1  .14285713 1 13.262457 0 1 10 1 1 0 0 0 1.5 5 1
          2 20 2009 1          0 1 12.920918 0 1 10 0 1 0 0 0   1 5 1
          2 20 2010 1      .6875 1  13.11756 0 1 10 0 1 0 0 0   1 5 1
          2 20 2011 1   .5555556 1 13.247954 . 1 10 0 1 0 0 0   1 5 1
          2 20 2012 1   .4782609 1 13.281425 . 1 10 1 1 0 0 0   1 5 1
          2 20 2013 1   .4117647 1 13.309654 . . 10 . 1 0 0 0   1 5 1
          2 31 2000 2          . 1  7.269875 0 1 10 1 1 0 0 0   1 5 1
          2 31 2001 2          0 1  7.280966 0 1 10 0 1 0 0 0   1 5 1
          2 31 2002 2          0 1  7.346275 0 1 10 0 1 0 0 0   2 5 1
          2 31 2003 2          0 1  7.434098 0 1 10 0 1 0 0 0   2 5 1
          2 31 2004 2          0 1  7.584626 0 1 10 1 1 0 0 0 1.5 5 1
          2 31 2005 2          0 1  7.890616 0 1 10 0 1 0 0 0 1.5 5 1
          2 31 2006 2         .3 1  8.003801 0 1 10 0 1 0 0 0 1.5 5 1
          2 31 2007 2          0 1  8.084442 0 1 10 0 1 0 0 0 2.5 5 1
          2 31 2008 2   .3333333 1  8.205992 0 1 10 1 1 0 0 0   2 5 1
          2 31 2009 2          1 1  8.173499 0 1 10 0 1 0 0 0   2 5 1
          2 31 2010 2       .125 1  8.380542 0 1 10 0 1 0 0 0 1.5 5 1
          2 31 2011 2          0 1  8.426463 . 1 10 0 1 0 0 0   2 5 1
          2 31 2012 2  .06666666 1    8.4216 . 1 10 1 1 0 0 0   2 5 1
          2 31 2013 2   .4615385 1  8.459081 . . 10 . 1 0 0 0   2 5 1
          2 40 2000 3          . 1 1.3376292 1 0 10 1 1 0 0 0   3 5 1
          2 40 2001 3    .408805 0 2.0122328 1 0 10 0 1 0 0 0   3 5 1
          2 40 2002 3   .4197531 1  5.069847 1 0 10 0 1 0 0 0 2.5 5 1
          2 40 2003 3  .29535866 1  5.660527 1 0 10 0 1 0 0 0 2.5 5 1
          2 40 2004 3  .26217228 0  6.087774 1 0 10 1 1 0 0 0 2.5 5 1
          2 40 2005 3   .3208556 0  5.986125 1 0 10 0 1 0 0 0 2.5 5 1
          2 40 2006 3   .4265403 1  5.947173 1 0 10 0 1 0 0 0   3 5 1
          2 40 2007 3   .3837838 1  6.198479 1 0 10 0 1 0 0 0   3 5 1
          2 40 2008 3   .5384615 0   6.67178 1 0 10 1 1 0 0 0   3 5 1
          2 40 2009 3  .58715594 0  6.374582 1 0 10 0 1 0 0 0   3 5 1
          2 40 2010 3   .3034483 1    6.0109 1 0 10 0 1 0 0 0   3 5 1
          2 40 2011 3   .1970803 0  5.958941 . 0 10 0 1 0 0 0   3 5 1
          2 40 2012 3        .25 1  6.238188 . 0 10 1 1 0 0 0   3 5 1
          2 40 2013 3   .3298969 0  5.976402 . . 10 . 1 0 0 0   3 5 1
          2 41 2000 4          . 1  6.369106 0 1 10 1 1 0 0 0 3.5 5 0
          2 41 2001 4       .152 1  6.252753 0 1 10 0 1 0 0 0   3 5 0
          2 41 2002 4   .2223009 1   6.19663 0 1 10 0 1 0 0 0   3 5 0
          2 41 2003 4   .2167614 1  6.319992 0 1 10 0 1 0 0 0   3 5 0
          2 41 2004 4  .25355285 1  6.541104 0 1 10 1 1 0 1 0 4.5 5 0
          2 41 2005 4  .27866742 1  7.102837 0 1 10 0 1 0 0 0   4 5 0
          2 41 2006 4  .31651255 1   7.24332 0 1 10 0 1 0 0 0 3.5 5 0
          2 41 2007 4   .3076149 1  7.156153 0 1 10 0 1 0 0 0 3.5 5 0
          2 41 2008 4  .28411338 1  7.315385 0 1 10 1 1 0 0 0 2.5 5 0
          2 41 2009 4  .28038472 1  7.270146 0 1 10 0 1 0 0 0 2.5 5 0
          2 41 2010 4   .3454618 1  7.551838 0 1 10 0 1 0 0 0 2.5 5 0
          2 41 2011 4  .54769474 1  7.564166 . 1 10 0 1 0 0 0   2 5 0
          2 41 2012 4   .7350428 1  7.579633 . 1 10 1 1 0 0 0   3 5 0
          2 41 2013 4   .7240964 1   7.70659 . . 10 . 1 0 0 0   3 5 0
          2 42 2000 5          . 1  9.286952 0 1 10 1 1 0 0 0 2.5 5 0
          2 42 2001 5  .07142857 1  9.202277 0 1 10 0 1 0 0 0   3 5 0
          2 42 2002 5      .1875 1 9.1837635 0 1 10 0 1 0 0 0 2.5 5 0
          2 42 2003 5  .12121212 1 9.1166115 0 1 10 0 1 0 0 0   3 5 0
          2 42 2004 5  .04081633 1  9.112053 0 1 10 1 1 0 0 0 2.5 5 0
          2 42 2005 5  .12121212 1  9.182662 0 1 10 0 1 0 0 0   3 5 0
          2 42 2006 5 .023809524 1  9.274787 0 1 10 0 1 0 0 0   3 5 0
          2 42 2007 5  .05555555 1  9.289117 0 1 10 0 1 0 0 0   3 5 0
          2 42 2008 5         .5 1  9.303569 0 1 10 1 1 0 0 0   4 5 0
          2 42 2009 5   .3333333 1  9.100718 0 1 10 0 1 0 0 0   4 5 0
          2 42 2010 5       .125 1  9.259859 0 1 10 0 1 0 0 0 3.5 5 0
          2 42 2011 5  .27906978 1  9.393679 . 1 10 0 1 0 0 0 3.5 5 0
          2 42 2012 5  .15873015 1  9.404538 . 1 10 1 1 0 0 0   3 5 0
          2 42 2013 5      .1875 1  9.363814 . . 10 . 1 0 0 0   3 5 0
          Thank you again for your help. Please let me know if you need anything else.



          • #6
            Originally posted by Jeff Wooldridge:
            It will be much easier to provide suggestions if you can show several rows of your data. For example, it depends whether you have a line for each year for every cross sectional unit, even if the data are missing. You can create the time averages using the -egen- command in Stata. I can help more if you show a bit of your data.
            Dear Prof. Jeff Wooldridge,
            I have some questions about the xtgee command. I ran xtgee with the time averages as you suggest; the command is:
            Code:
            xtgee Yit Xit averageXit, family(binomial 1) link(logit) corr(exchangeable)
            but I don't see values for the "working correlation" and "scale factor" (which appear in Table 4, p. 129, of Papke and Wooldridge (2008)). Do these results appear with the xtgee command and a logit link?
            I have read formulas 3.9-3.11 in your paper, but I don't know how to estimate the APE in Stata. I tried the -mfx- command and it ran. Can I use the resulting marginal effects after xtgee?
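
            As an aside on the mechanics, -mfx- is the older marginal-effects command; its replacement, -margins-, also runs after -xtgee-. A sketch with the same placeholder names (whether this matches the APE formulas 3.9-3.11 exactly is worth verifying against the paper):
            Code:
            xtgee Yit Xit averageXit, family(binomial 1) link(logit) corr(exchangeable) vce(robust)
            margins, dydx(Xit)   // average marginal (partial) effect of Xit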

            Thank you,



            • #7
              Dear Prof. Jeff Wooldridge,
              My dependent variable is a score ranging between zero and one (e.g., 0.32, 0.98). I have an unbalanced panel dataset (n = 119, T = 18). According to your paper, one should add time dummies to the X covariates, so should I add 18 time dummies to my model? Also, I am not sure how to do this technically in Stata: for example, how to create the averages of all the explanatory variables with different T, and how to specify the selection indicator. You used meap94_98 in one of your presentations to explain how to do it in Stata. Where can I get access to this data file?
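
              To make the question concrete, here is a sketch of the pieces I mean, with hypothetical names (y, x1, id, year); factor notation i.year would create the time dummies automatically:
              Code:
              xtset id year
              egen Ti = count(y), by(id)           // number of periods observed per unit
              bysort id: egen x1bar = mean(x1)     // time average over observed years
              xtgee y x1 x1bar i.year i.Ti, family(binomial 1) link(logit) corr(exchangeable) vce(robust)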



              • #8
                Dear Stata users,

                Following Wooldridge (2018), I want to compute the FE estimator as a pooled OLS estimator using the Mundlak device (i.e. using the original data and adding the time averages of the covariates as additional explanatory variables).

                I have 31,000 pairs of countries (panel variable) over 17 years (time variable), and my dependent variable is a kind of market share, so it is a fraction within [0, 1].
                My model has variables that vary by exporter (e.g. production), by importer (e.g. purchasing power) and by pairs (e.g. exchange rate), so to compute time averages of the covariates I used the following code:
                Code:
                gen sample=0
                qui xtreg weight_usd lgdpcap_d fta_wto ler eu_d lprod1000hl_o lprod1000hl_d yr*, re
                replace sample=1 if e(sample)
                foreach v in lgdpcap_d eu_d lprod1000hl_d {
                    bysort iso_d : egen double mean_`v' = mean(`v') if sample==1
                }
                
                foreach v in lprod1000hl_o {
                    bysort iso_o : egen double mean_`v' = mean(`v') if sample==1
                }
                
                foreach v in fta_wto ler {
                    bysort pairid_a : egen double mean_`v' = mean(`v') if sample==1
                }
                However, I do not know how to deal with my time dummies "yr". The paper says "[...] in obtaining the FE estimator, any aggregate time variables – in particular, time dummies – should be part of x_it , and their time averages must be included in xbar_i".
                How should I include time averages of the time dummies?
                I don't think I should use the following code, because it would create duplicate variables:
                Code:
                foreach v in yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 yr14 yr15 yr16 yr17 {
                    bysort pairid_a : egen double mean_`v' = mean(`v') if sample==1
                }
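
                The pooled regression I am aiming for would look roughly like this (leaving aside, for now, the time averages of the yr dummies, and using the variables created above):
                Code:
                regress weight_usd lgdpcap_d fta_wto ler eu_d lprod1000hl_o lprod1000hl_d ///
                    mean_lgdpcap_d mean_eu_d mean_lprod1000hl_d mean_lprod1000hl_o ///
                    mean_fta_wto mean_ler yr* if sample==1, vce(cluster pairid_a)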
                Thank you for your attention.

                Reference: J.M. Wooldridge (2018) "Correlated random effects models with unbalanced panels", Journal of Econometrics, https://doi.org/10.1016/j.jeconom.2018.12.010.

