Reghdfe vs Xtlogit

Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#1

19 Feb 2023, 04:25

Good morning.

I need to run a regression with both year and state fixed effects. I also need to cluster standard errors at the household level (variable ID). If possible, I would like to include family weights.
I have been using the command "reghdfe", but I just read online that this command cannot be used when the dependent variable is binary. Is that true?
If so, I should be using the "xtlogit" command, which does not allow weights, and vcetype cluster is not allowed.

this is my initial command: reghdfe depvar indvars [weight=weight], absorb(year state) cluster(ID)

What would be the most correct approach?

Thank you!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

19 Feb 2023, 04:29

Margarida:
welcome to this forum:
1) you're right: the community-contributed command -reghdfe- implies a continuous regressand;
2) -xtlogit- was developed for panel data regression when the regressand is a two-level categorical variable. The cluster-robust option for standard errors is available for the -re- specification only. If you go -fe-, you may want to consider the -bootstrap- option for standard errors.
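
As a minimal sketch of the -fe- route with bootstrapped standard errors: the block below uses the -union- example dataset from the Stata manual as a stand-in (an assumption; it is downloaded by -webuse- and is not the poster's data), and both the seed and the number of replications are arbitrary illustrative choices.

```stata
* Illustrative only: the manual's example panel, not the poster's data
webuse union, clear
xtset idcode year

* Conditional fixed-effects logit with bootstrap standard errors.
* With xt commands, the bootstrap resamples whole panels.
xtlogit union age grade not_smsa south, fe vce(bootstrap, reps(50) seed(12345))
```

Setting seed() only makes the bootstrap draws reproducible; it does not affect the point estimates.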

Kind regards,
Carlo
(Stata 19.0)
Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#3

19 Feb 2023, 04:47

Thank you for your quick reply.

Using "xtlogit" I would still have the problem with the family weights, since the command states that weights must be the same for all observations in a group. I have panel data in which the family weights can change between years.

Also, when using the bootstrap option for standard errors, how do I choose an adequate number of replications?

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

19 Feb 2023, 05:17

Margarida:
1) if you mean that you are dealing with an unbalanced panel dataset (that is, the number of observations differs across panels), this is not an issue for Stata. In addition, panel datasets are quite frequently unbalanced;
2) 200 replications are enough for standard errors (see https://www.taylorfrancis.com/books/...ron-tibshirani, page 47).
On bootstrap in general, see also the valuable (and laudably concise) textbook written by the authoritative Statalister Felix Bittmann: https://www.degruyter.com/document/d...110693348/html.

Kind regards,
Carlo
(Stata 19.0)
Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#5

19 Feb 2023, 05:52

Thank you for your answer.

Yes, I am dealing with an unbalanced panel dataset in which the family responses to a survey are collected in different years, but not all families answer in all years. I have tried to include the family weights in the regression, but I get an error message saying that weights must be constant within ID. How can I address this issue?

I used the following code in order to bootstrap standard errors: xtlogit y x1 x2 x3 x4 x5 x6 x7 i.year i.state_num, fe vce(bootstrap ID) with 50 replications, but it was taking a very long time. My dataset is composed of 85000 observations. Instead, I tried to use: xtlogit i.year i.state_num, fe vce(jackknife) but it is taking a very long time as well. Is there a way to solve this?

Thanks in advance
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

19 Feb 2023, 06:06

Margarida:
1) I would deal with your panel dataset as it is. There may well be sound reasons for families' attrition during the time horizon the survey stretched over. If you try to fix the original structure of your dataset, you're inadvertently making up your data;
2) with such a sky-rocketing number of observations, no wonder that Stata takes forever. There is no way to increase the computational speed other than reducing the number of bootstrap replications (as you already did). You can try with 25 replications, as it is the lower limit of the range suggested by https://www.taylorfrancis.com/books/...ron-tibshirani, page 47, but I'd stick with 50. The usual approach is to leave your desktop/laptop running during the night, hoping that the computational process is over when you are about to sip your first mug of coffee/tea/whatever the morning after.

Kind regards,
Carlo
(Stata 19.0)
Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#7

19 Feb 2023, 06:20

Thank you for your answers.

When I refer to the family weights, these are attributed taking into consideration that some families' answers are more representative than others. That's why I want to use the weights to compute the regressions.

Given that the only problem with the command "reghdfe" would be that my dependent variable is a binary variable, would there be a possible way to still use this command? I am just wondering because reghdfe depvar indvars [weight=weight], absorb(year state) cluster(ID) accounts for the family weights, can absorb two or more fixed effects, and standard errors are clustered at a household level. Additionally, running this regression takes only 1 second.

Thanks in advance
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#8

19 Feb 2023, 06:51

Margarida:
I see the issue.
Yes, you can estimate a so-called linear probability model via -reghdfe- if you have a two-level categorical regressand.
The issue is that, while probability ranges between 0 and 1 (bounds included), -reghdfe- can give you back fitted values that do not respect this constraint (and legally so, as no logistic function is considered in -reghdfe-).
In addition, be sure to avoid mixing up panel regression with regression based on survey data, as they are two different beasts.
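
The point about the 0-1 range can be checked after estimation. The following is only a sketch under stated assumptions: it uses the official -areg- command and the manual's -union- example dataset as stand-ins for -reghdfe- and the poster's data, and the variable name p_hat is hypothetical.

```stata
* Illustrative only: the manual's example panel, not the poster's data
webuse union, clear

* Linear probability model: binary outcome, absorbed panel effects,
* year dummies, standard errors clustered on the panel identifier
areg union age grade not_smsa south i.year, absorb(idcode) vce(cluster idcode)

* Fitted values including the absorbed effects
predict double p_hat if e(sample), xbd

* Count fitted values that fall outside the unit interval
* (missing values are excluded explicitly, as Stata treats them as very large)
count if !missing(p_hat) & (p_hat < 0 | p_hat > 1)
```

A nonzero count does not invalidate the coefficients as average effects on the probability scale; it only shows that the linear model does not constrain the fitted values.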

Kind regards,
Carlo
(Stata 19.0)
Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#9

19 Feb 2023, 07:14

I am sorry for being so demanding, but then reghdfe is not adequate and the right command should only be "xtlogit y x1 x2 x3 x4 x5 x6 x7 i.year i.state_num, fe vce(bootstrap)", correct? But another issue I encounter is that I cannot cluster standard errors at, for example, state level.

Also, how can I be sure to avoid mixing up panel regression with regression based on survey data? Should I only consider families who answer the survey every year? It doesn't sound correct. Or should I not include the different weights in the regression?

I am sorry but I am getting very confused, as everything looked to be correct with the "reghdfe" command until I realized it cannot be used with binary dependent variables. Would there be a way to use any kind of transformation in the regression in order to make reghdfe usable with binary dependent variables?

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#10

19 Feb 2023, 07:48

Margarida:
1) your -xtlogit,fe- code is correct. I remind (myself first) that, due to the incidental parameter bias (see, if interested, http://www.econ.brown.edu/Faculty/To...meters1948.pdf), -xtlogit,fe- implies conditional fixed effects (that differ from the -fe- estimated via -xtreg- or -reghdfe-);
2) in a panel data regression, the same sample units are measured on the same set of variables at (theoretically) equally spaced time intervals (waves); in a survey, different units are measured in each wave (the inclusion of the same sample unit in >1 wave of data is a matter of chance). In addition, surveys are very demanding as far as the sample size calculation is concerned;
3) again -reghdfe- can be used for linear probability model. No transformation is available for what you may have in mind, as far as I know.
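
Regarding point 1, the conditional fixed-effects logit fitted by -xtlogit, fe- is the same estimator as -clogit- with the panel identifier in group(), and -clogit- accepts vce(cluster), provided that each group lies entirely within one cluster. A minimal sketch, again assuming the manual's -union- example dataset rather than the poster's data:

```stata
* Illustrative only: the manual's example panel, not the poster's data
webuse union, clear
xtset idcode year

* Conditional fixed-effects logit through -xtlogit-
xtlogit union age grade not_smsa south, fe

* Same conditional likelihood through -clogit-, here with
* standard errors clustered on the panel identifier
clogit union age grade not_smsa south, group(idcode) vce(cluster idcode)
```

The two commands should report the same coefficients; only the standard errors differ, because of the clustering.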

Kind regards,
Carlo
(Stata 19.0)
Margarida Rodrigues

Join Date: Feb 2023

Posts: 6
#11

19 Feb 2023, 08:27

Thank you!

Additionally, what does it mean to have so many "x"?

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxx.xxx..x.xxxxxx.