Logit regression discrete dependent variable panel data

Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#1

04 Apr 2020, 09:36

Hi

I am having difficulties with my dataset because my dependent variable (pctagreechina) is discrete. The variable gives the share of UNGA votes in a given year in which a country voted in accordance with China. Thus, for example, 1 = 100% and .83 = 83%. I previously tried to use xtreg for my statistical analysis, but the results did not make much sense. After searching online, I found that this is because, with xtreg, Stata treats my dependent variable as continuous.

How can I run my regression (I wish to use fe)? And is there a command that I should use to tell Stata that the variable 'pctagreechina' is a percentage?

For clarification, I wish to explore the effect of Chinese foreign aid on the recipient country's voting behaviour in UNGA from 2000-2014. I expect to find that countries that received more aid in this period have, to a greater degree, shifted their voting behaviour to align with China's.

Thank you in advance!

Code:

     Variable |        Obs       Mean  Std. Dev.        Min        Max
--------------+-------------------------------------------------------
      country |        645   22.16279    12.6588          1         44
         year |        645       2007   4.323847       2000       2014
pctagreechina |        633   .8329394   .0751723   .3333333          1
       amount |        645   9.51e+09   7.46e+10          0   1.34e+12
   Population |        645   1.81e+07   2.69e+07      81131   1.76e+08
--------------+-------------------------------------------------------
          GDP |        645   2.36e+10   6.42e+10   3.50e+08   5.68e+11
 NaturalRes~s |        645   3.20e+09   8.87e+09          0   7.87e+10
     PolityIV |        612   2.542484   4.990645         -7         10
Tags: discrete variable, fixed effects, logit, panel data, xtlogit
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

04 Apr 2020, 12:16

You cannot use logistic regression, as implemented by -logit- or -logistic- with an outcome that is a proportion between 0 and 1. Those commands require a dichotomous outcome variable.

Your data were, at some point, observations of individual opportunities to vote with or against China, and then they were totaled up. If you have, or can get access to, the data in that state, you can run -logistic- or -logit- with that data. Or, if you can get totaled data that contains the actual number (not proportion) of votes with China and the total number of opportunities to vote, then there is -glogit-. -help glogit-
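As a rough sketch of the -glogit- route: the variable names votes_with_china (count of votes with China) and vote_opps (number of opportunities to vote) are hypothetical placeholders, and the choice of amount and PolityIV as predictors is only for illustration.

```stata
* Sketch only: votes_with_china and vote_opps are hypothetical names
* for the count of votes with China and the number of votes held.
* glogit expects: outcome count, then population count, then predictors.
glogit votes_with_china vote_opps amount PolityIV
```

Note that this treats every country-year as an independent group, so it ignores the panel structure.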

If the data are only available as proportions, then you should look at -fracreg- for fitting a logistic model. -help fracreg-.
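A minimal sketch of what that might look like with the proportion already in the data. The predictors are chosen only for illustration, and clustering the standard errors on country is an assumption about how one might allow for repeated observations, not a substitute for a panel model.

```stata
* Fractional logit on the proportion itself (requires Stata 14 or later).
* vce(cluster country) is an illustrative choice, not a panel estimator.
fracreg logit pctagreechina amount PolityIV, vce(cluster country)

* Average marginal effects on the proportion scale
margins, dydx(*)
```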

But even these solutions will not deal with the panel structure in your data.

If you can get the data with total number of votes with China and total number of opportunities to vote, there is -meglm- which will let you fit a random effects logistic model with the number of votes as the outcome variable, if you specify -link(logit)- and -family(binomial x)-, where you replace "x" by the name of the variable giving the number of opportunities to vote. Or, you could get a population-averaged logistic model with -xtgee- and those same specifications of -link()- and -family()- options.
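For concreteness, the two commands might be set up along these lines. Again, votes_with_china and vote_opps are hypothetical variable names, the predictors are illustrative, and the exchangeable working correlation in -xtgee- is just one possible assumption.

```stata
* Sketch only: votes_with_china and vote_opps are hypothetical names.

* Random-effects logistic model with a random intercept for country
meglm votes_with_china amount PolityIV || country:, ///
    family(binomial vote_opps) link(logit)

* Population-averaged logistic model; xtgee needs the panel declared first
xtset country year
xtgee votes_with_china amount PolityIV, ///
    family(binomial vote_opps) link(logit) corr(exchangeable) vce(robust)
```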

Last edited by Clyde Schechter; 04 Apr 2020, 12:24.
Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#3

05 Apr 2020, 16:04

Hi Clyde

Thank you very much for your comments.

I have the totaled data that contains the actual number of votes with China and number of opportunities. How do I create one variable with this information that I can use as my dependent variable?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

05 Apr 2020, 16:59

No, you don't create one variable with that. Except for -fracreg- (which just uses the proportion you had in the first place), the alternatives I suggested in #2 require keeping the number of votes with China and the number of opportunities as two separate variables. For -meglm- or -xtgee-, you use the count of votes with China as the outcome variable. Then, the number of opportunities variable is used as "x" in the -family(binomial "x")-. If you are going to use -glogit-, then the count of votes with China is the outcome variable and the number of opportunities should be listed between the outcome and the predictor variables.
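Since the two variables stay separate, it may be worth checking that they agree with the proportion already in the data before fitting anything. A minimal sketch, assuming the hypothetical names votes_with_china and vote_opps for the count and the number of opportunities, and an arbitrary tolerance of 1e-6:

```stata
* Sketch only: votes_with_china and vote_opps are hypothetical names.

* The count can never exceed the number of opportunities
assert votes_with_china <= vote_opps if !missing(votes_with_china, vote_opps)

* The ratio should reproduce the existing proportion, up to rounding
generate double pct_check = votes_with_china / vote_opps
assert abs(pct_check - pctagreechina) < 1e-6 if !missing(pct_check, pctagreechina)
```

If either -assert- fails, Stata stops with an error, which signals that the counts and the proportion come from different sources or were merged incorrectly.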

Before you use any of these approaches, be sure to read the -help- files for these commands so you will get the syntax right and understand what they do.