Xtlogit vs Logit, Cluster, observations

Mansour Mohammad

Join Date: Nov 2020

Posts: 11
#1

19 Nov 2020, 14:16

Dear respected Members,
I am not an experienced Stata user. Could anyone please help with the questions below?
With a panel data set, can I use logit instead of xtlogit? I want to cluster the standard errors at the (id) level, and I have read online that to cluster at (id) I must use logit, not xtlogit. Can anyone help with this?

As for the logit model, I have seen on this website that e(r2_p) is reported for the conditional fixed-effects specification and e(chi2) for the random-effects specification. However, one paper related to my topic (they used a logit model) reports (Pseudo R2 (%), LR chi2, Prob > chi2) in its table. In that case, what did they use: random or fixed effects, logit or xtlogit?

My concern arises because when I used fixed effects, almost all observations were dropped: I do not have many (Yes) values in the dependent dummy, and many years have no Yes at all. Regarding question one, if I must use xtlogit, can I use random effects?

Or would you recommend reducing the sample to get a balance between Yes and No? Bear in mind that I have 750 institutions, of which only 300 have a Yes (and not in all years); all the rest have No only. Is this acceptable from a statistical standpoint? Thanks a lot. I really appreciate it.

Kind Regards

Last edited by Mansour Mohammad; 19 Nov 2020, 14:18.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#2

20 Nov 2020, 04:41

Mansour:
1. You're correct that you cannot use the -vce(cluster)- option with -xtlogit-. Hence, if you want to cluster, you should fit a pooled -logit-.
2. It is difficult to reply without knowing the paper you refer to.
3. & 4. As per the FAQ, please post an example/excerpt of your data via -dataex-. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Mansour Mohammad

Join Date: Nov 2020

Posts: 11
#3

20 Nov 2020, 05:25

Dear Carlo Lazzaro,
Thank you so much for your reply.
I have seen that you work in the field of economics, so I hope you can give me some suggestions.
I am working with a model where the dependent variable is binary (0 means the institution does not issue a financial note, 1 means it does), covering 2010 to 2017. After filtering the data, I have 420 financial institutions (there were more, but I reduced the sample). I want to estimate the propensity of these financial institutions to issue this financial note. In total I have 3360 observations, but only 150 issuance events (1); the rest are non-issuance (0).
That means only 100 institutions out of 420 issue financial notes, and not in all years. This topic is very rare in the previous literature: there are approximately 3 papers, one of which used a logit model but gave no details beyond the information I mentioned in my previous post.
When I used fixed effects, almost all observations were dropped because I do not have many 1s in the dependent dummy and many years have no 1 at all. I then went to xtlogit with random effects and standard errors clustered at the institution level, because the prior literature clustered standard errors at the institution level. I do not know which is correct: fixed or random effects, logit or xtlogit.
Or would you recommend a command for rare events, since I have few 1s compared to 0s? Many thanks; I really appreciate any suggestions.
This is an example of my data:
[CODE]
clear
input byte depvar double(var1 var2 var3 var4)
0 .52986 .010780000000000001 .0023799999999999997 .006490000000000001
0 .5806 .00809 .00241 .00588
0 .6817699999999999 .01106 .00227 .007200000000000001
0 .6913499999999999 .014990000000000002 .0020499999999999997 .006060000000000001
0 .71831 .01513 .00148 .0077
0 .7335299999999999 .02505 .00135 .007300000000000001
0 .73512 .02841 .00096 .00519
0 .7532799999999998 .12678 .00079 .0043500000000000014
0 .45280000000000004 .16973 .01257 .
0 .51658 .10385 .00963 .
0 .59943 .09439 .01008 .023090000000000003
0 .59881 .09066 .00875 .024620000000000003
0 .61117 .12711 .007560000000000001 .03177
0 .6762299999999999 .09924 .006750000000000002 .02682
0 .7016499999999999 .12316 .006280000000000001 .025019999999999997
0 .71724 .15306 .006940000000000001 .02308
0 . . . .
0 . . . .
0 .69845 .036610000000000004 .07896 .34456000000000003
0 .69146 .05389 .07477 .27182
0 .68539 .09246999999999998 .07561999999999999 .25839
0 .6898000000000001 .08289999999999999 .07835 .23716
0 .7306 .05513 .059890000000000006 .22605
0 .7420300000000001 .050890000000000005 .042069999999999996 .08852000000000002
0 . . . .
0 . . . .
0 .64533 .26319 .10658 .25386
0 .65886 .2305 .11283 .26871
0 .6573100000000001 .19786 .13079000000000002 .28268
0 .61674 .21511 .12763 .27221
0 .597 .20812 .11964999999999999 .2731
0 .61663 .19396000000000002 .11247 .26985
0 . . . .
0 . . . .
0 .45039 .45997 .23001000000000002 .38913
0 .3229600000000001 .58235 .23904 .49747
0 .4127 .45433999999999997 .23904 .48310000000000003
0 .38939 .48310999999999993 .23904 .44323999999999997
0 .40969 .44878 .23904 .36442
0 .37445 .46486 .15136 .07473
0 .62727 .3245 .1103 .19254000000000002
0 .54382 .42116 .13559 .22253
0 .59305 .35720000000000013 .17907 .28217
0 .5282399999999999 .41552 .18797 .31128
0 .5153099999999999 .43099 .18902000000000002 .27856000000000003
0 .54589 .4035899999999999 .14334 .20529
0 .56802 .38749 .12093999999999999 .17662
0 .56287 .38942 .09420999999999999 .18826
0 .59945 .21818 .0421 .08167999999999999
0 .54217 .27502 .04853999999999999 .08990000000000001
0 .55969 .25085 .055330000000000004 .10117000000000001
0 .54021 .25986 .06359 .09907
0 .51507 .30535 .069 .10792
0 .48582000000000003 .31431000000000003 .06959000000000001 .0995
0 .50363 .24178 .050390000000000004 .06727
0 .45748 .51715 .05367999999999999 .05266999999999999
0 .6340699999999999 .28226 .03869 .09675
0 .58579 .33936999999999995 .04381 .19041
0 .58797 .33385 .06687 .19937000000000002
0 .5895699999999998 .33433 .08 .2003
0 .57898 .34719 .07143999999999999 .19499
0 .59463 .33799999999999997 .07623999999999999 .18576
0 .6149600000000001 .32348 .059980000000000006 .12824
0 .62164 .30177 .060939999999999994 .14351
0 .6887899999999999 .22038 .028229999999999998 .06793
0 .6418699999999999 .27127 .030539999999999998 .09906000000000001
0 .61227 .30039 .04226 .12997999999999998
0 .63448 .29332 .05282 .
0 .6570199999999999 .27143999999999996 .05582 .
0 .58774 .3827200000000001 .06589 .
0 .60421 .34076 .048659999999999995 .
0 .59619 .35710000000000003 .05504 .
0 . . . .
0 . . . .
0 .45905 .43398000000000003 .08569 .15464
0 .42289 .46774 .09879000000000002 .16225
0 .4035099999999999 .46192999999999995 .12979 .23317
0 .37015 .48563 .1157 .21742
0 .34760000000000013 .47935 .12122999999999999 .20898
0 .36678 .46012000000000003 .10151 .16798
0 .8193100000000001 .13425 .0059 .0128
0 .82807 .12294000000000001 .005869999999999999 .01272
0 .8356 .12987 .003340000000000001 .01524
0 .81557 .14195 .00299 .0173
1 .8429700000000001 .1235 .00244 .01858
0 .8428 .12782 .00215 .01786
0 .85536 .12284 .00187 .014419999999999999
0 .86662 .11503000000000001 .0022 .0023899999999999998
0 .74003 .23018 .01422 .0227
0 .71654 .25764 .01577 .02328
0 .65729 .31444 .01536 .026709999999999998
0 .63961 .33163 .0141 .02016
0 .69512 .27686 .01015 .01638
0 .68181 .2923 .01016 .012709999999999999
0 .6593100000000001 .3175 .00767 .007590000000000001
0 .6159300000000001 .36382 .008270000000000001 .0036
0 .72634 .24806 .048150000000000005 .062410000000000014
0 .7026600000000001 .27539 .056739999999999985 .06909000000000001
0 .70722 .27363 .05777 .09135
0 .7027100000000001 .27767 .05851 .08235999999999999
end
[/CODE]
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#4

20 Nov 2020, 05:45

Mansour:
you may want to consider -exlogistic-.

Kind regards,
Carlo
(Stata 19.0)
Mansour Mohammad

Join Date: Nov 2020

Posts: 11
#5

20 Nov 2020, 05:56

Dear Carlo Lazzaro,
Thanks for your reply.
Does a logit model with standard errors clustered at the id level not work in my case? Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#6

20 Nov 2020, 06:09

Mansour:
Yes, it may work, depending on how many 0s and 1s your regressand contains.

Kind regards,
Carlo
(Stata 19.0)
Mansour Mohammad

Join Date: Nov 2020

Posts: 11
#7

20 Nov 2020, 06:24

Dear Carlo Lazzaro,
I am sorry to bother you with one last question.
I understand. However, if I reduce the sample to 250 institutions, which means removing 170 institutions that are always 0, would the logit model work? If yes, would it be with standard errors clustered at the institution level, as you said for pooled logit? By pooled logit, do you mean the ordinary logit? And is it fixed or random effects? Thank you so much.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#8

20 Nov 2020, 06:55

Mansour:
Any way you manipulate your original sample means ending up with a made-up subsample that, in all likelihood, differs from the starting one.
I mean pooled -logit- (which is not purely -fixed- or -random-).
Please also note that the -fe- option of -xtlogit- means conditional -fe-.

Kind regards,
Carlo
(Stata 19.0)
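
A minimal sketch of the contrast discussed so far, using simulated data rather than Mansour's: pooled -logit- with institution-clustered standard errors versus conditional fixed-effects -xtlogit, fe-. All variable names (id, year, x, y, a, ever, first) and the values in the data-generating step are assumptions for illustration only. The conditional estimator can use only panels in which the outcome varies, which is why most observations drop out when events are rare.

```stata
* Simulated rare-event panel: 420 institutions observed 2010-2017 (hypothetical data)
clear
set seed 2020
set obs 420
generate id = _n
* Unobserved institution-level effect (assumed)
generate a = rnormal()
expand 8
bysort id: generate year = 2009 + _n
* One illustrative regressor
generate x = rnormal()
generate byte y = runiform() < invlogit(-4 + 0.5*x + a)
xtset id year

* How many institutions ever record an event?
egen byte ever = max(y), by(id)
egen byte first = tag(id)
count if first & ever

* Pooled logit with standard errors clustered by institution; uses every row
logit y x, vce(cluster id)

* Conditional fixed-effects logit; panels where y is always 0 (or always 1) are dropped
xtlogit y x, fe
```

Comparing e(N) after each command shows how much of the sample the conditional estimator discards.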
Mary Burckhette

Join Date: May 2023

Posts: 35
#9

29 Jun 2023, 08:27

Originally posted by Carlo Lazzaro:

Mansour:
Any way you manipulate your original sample means ending up with a made-up subsample that, in all likelihood, differs from the starting one.
I mean pooled -logit- (which is not purely -fixed- or -random-).
Please also note that the -fe- option of -xtlogit- means conditional -fe-.

Dear Carlo,

Thank you very much for your answer. Just like the original poster I have panel data of an experiment, where 250 individuals complete 4 rounds. Time-invariant variables are i.round, i.TG/CG (whether the individual was assigned to the treatment or control group), and i.gender. Time-varying variables are the respective decisions in that round and other variables. In addition, my dependent variable is binary (whether individuals agree with certain statements or not) - meaning simpler regress models do not work.

After much reading on the internet and posts on this forum (https://www.statalist.org/forums/for...-and-r-squared) (https://www.statalist.org/forums/for...for-panel-data), I am confused whether to select xtlogit, logit or pooled logit for my analysis. Previously, I used xtlogit but now I fear the standard re option is unsuitable for my data, because I feel TG or CG will influence the dependent binary variable - something I believe means re (and therefore xtlogit) can no longer be used. Therefore, I thought a normal logit model (without xtset ID round), which does not have either random effects or fixed effects attached to it, works best. However, now I believe I may have time-invariant attributes of the panels as confounding variables in my experiment (https://www.statalist.org/forums/for...-and-r-squared - particularly Clyde's (#2) comment).

Any help or advice would greatly be appreciated! Thank you very much in advance!

Kind regards,
Mary
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#10

30 Jun 2023, 06:00

Mary:
1) how can -i.round- be a (within-panel) time-invariant variable?
2) it is a good thing that predictors influence your dependent variable, as this is what regressions are for: trying to tease out the contribution of each predictor to the variation of the regressand when adjusted for the remaining predictors;
3) a pooled logit model would work better than -xtlogit- provided that you do not have a panel-wise effect.

Kind regards,
Carlo
(Stata 19.0)
Mary Burckhette

Join Date: May 2023

Posts: 35
#11

30 Jun 2023, 06:52

Dear Carlo,

Thank you very much for your comment.
1) Yes, of course, you are right - i.round is time-varying
2) I read on the internet (not peer-reviewed sources) that random effects models (standard option of xtlogit) have harsher requirements than standard models (without fe or re), and therefore, caution should be used, but generally, of course, I agree, else what's the point of a regression analysis
3) With panel-wise effect, do you mean serial correlation between the rounds or between the clusters (of participants)?

May I kindly ask why I should disregard the panel features of my data by using logit, vce(cluster ID) instead of xtlogit if my observations are not truly independent (as 1 person completes 4 rounds)?

Thank you so very much for your support!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#12

30 Jun 2023, 07:25

Mary: Is the treatment assigned once at the beginning, and then you observe outcomes over four rounds? Or does the treatment itself change over time? If the latter, is it randomized in each round?

There's no reason to use xtlogit unless you want to do a "fixed effects" version, but you don't need that if you have an experiment. Use logit pooled across all observations and cluster at the subject level. Use margins, dydx() to get the effect of the treatment on the probability.

And I would start with a linear model even though your response variable is binary. The linear model can provide a good approximation. It might not. So try both OLS (clustering standard errors) and logit with margins and see if they are similar.
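
The suggestion above can be sketched as follows, on simulated data shaped like Mary's experiment (250 subjects, 4 rounds, treatment fixed at the start). The variable names (id, round, treat, agree, u) and every coefficient in the data-generating step are assumptions, not anything taken from the thread.

```stata
* Hypothetical experiment: 250 subjects, 4 rounds, treatment assigned once
clear
set seed 12345
set obs 250
generate id = _n
* Balanced assignment, constant within subject (assumed)
generate byte treat = mod(id, 2)
* Subject-level heterogeneity (assumed)
generate u = rnormal()
expand 4
bysort id: generate byte round = _n
generate byte agree = runiform() < invlogit(-0.5 + 0.8*treat + 0.1*round + u)

* Pooled logit with standard errors clustered by subject
logit agree i.treat i.round, vce(cluster id)
* Average effect of treatment on the probability of agreeing
margins, dydx(treat)

* Linear probability model with the same clustering, for comparison
regress agree i.treat i.round, vce(cluster id)
```

If the -margins- estimate and the OLS coefficient on 1.treat are close, the linear model is a reasonable approximation for these data. Note that the time-invariant regressor (treat) and the time-varying one (round) enter the same pooled model without any special option.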
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#13

30 Jun 2023, 07:39

Mary:
1) indeed!!;
2) the usual requirements hold: group-wise effect (please see 3)) and correct specification; I'm not aware of "tighter" methodological constraints;
3) I meant that the -LR test of rho=0- reaches statistical significance (caution: this statistic is available with default standard errors only);
4) going pooled logit may be a suboptimal choice if you have a group-wise effect. Usually, pooled regression gives back results that are halfway between the -fe- and the -re- specifications.

Kind regards,
Carlo
(Stata 19.0)
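
For reference, a sketch of where the -LR test of rho=0- mentioned in point 3) appears, again on simulated data with assumed names (id, round, treat, agree, u) and assumed coefficients. With default standard errors, -xtlogit, re- prints the test at the foot of its output; rejecting rho=0 points to a panel-level (group-wise) effect.

```stata
* Hypothetical panel with a subject-level effect u
clear
set seed 6789
set obs 250
generate id = _n
generate byte treat = mod(id, 2)
generate u = rnormal()
expand 4
bysort id: generate byte round = _n
generate byte agree = runiform() < invlogit(-0.5 + 0.8*treat + 0.1*round + u)
xtset id round

* Random-effects logit with default (oim) standard errors;
* the last line of the output reports the LR test of rho=0
xtlogit agree i.treat i.round, re

* Estimated rho and the LR test statistic, as stored by -xtlogit, re-
display "rho = " e(rho) "   LR chibar2(01) = " e(chi2_c)
```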
Mary Burckhette

Join Date: May 2023

Posts: 35
#14

30 Jun 2023, 10:28

Originally posted by Jeff Wooldridge:

Mary: Is the treatment assigned once at the beginning, and then you observe outcomes over four rounds? Or does the treatment itself change over time? If the latter, is it randomized in each round?

There's no reason to use xtlogit unless you want to do a "fixed effects" version, but you don't need that if you have an experiment. Use logit pooled across all observations and cluster at the subject level. Use margins, dydx() to get the effect of the treatment on the probability.

And I would start with a linear model even though your response variable is binary. The linear model can provide a good approximation. It might not. So try both OLS (clustering standard errors) and logit with margins and see if they are similar.

Dear Jeff,

Thank you very much for your comment. Yes, the assignment to either the treatment or control group is random (but balanced) and occurs at the beginning of the experiment. The assignment does not change throughout the four rounds, such that treatment stays treatment and vice versa with the control group.

Could you please comment on why I don't need a fixed effects version in an experiment, i.e., why I don't need to use the -xt commands? I thought they were specifically designed for repeated observations of the same individuals.

Now I'm unsure whether I can combine time-varying and time-invariant independent variables in one regression, say pooled logit, vce(cluster ID). Could you kindly provide some insights?

Thank you so much for your attention and support.
Mary Burckhette

Join Date: May 2023

Posts: 35
#15

30 Jun 2023, 10:57

Originally posted by Carlo Lazzaro:

Mary:
1) indeed!!;
2) the usual requirements hold: group-wise effect (please see 3)) and correct specification; I'm not aware of "tighter" methodological constraints;
3) I meant that the -LR test of rho=0- reaches statistical significance (caution: this statistic is available with default standard errors only);
4) going pooled logit may be a suboptimal choice if you have a group-wise effect. Usually, pooled regression gives back results that are halfway between the -fe- and the -re- specifications.

Dear Carlo,

Thank you so much for your comment!

You mean if my rho statistic is close to 0, then I need to use panel commands and if rho is not close to zero, I am fine with pooled logit? My previous xtlogit, vce(cluster ID) reports a rho of around 0.41, meaning I can safely use pooled logit?

May I please also ask you about combining time-varying and time-invariant independent variables in one pooled logit regression - other than vce(cluster ID), would I need to specify anything in particular and obtain sensible results?

Thank you so much for your time and efforts!
