  • Logit with fixed effects taking forever

    Hi all,

    I am trying to estimate the simple regression below:
    Code:
    logit female age i.office_id#i.year#i.d25, cluster(employee_id)
    The outcome is a binary variable indicating whether the company's client is female. The independent variable is the age of the employee of the company who is assigned to work with the client. For fixed effects, I include an interaction of the office the employee works at, the year, and d25, a binary variable (=1 if the client's age is above 25).
    I have also clustered at the employee level.

    Now I have two questions:
    1. Is it the correct way to include my fixed effects in a logit regression?
    2. Why is it taking so long to run?

    I have about 1,400,000 clients, 3,362 unique employees, 190 offices (unique office ids), and 12 years.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(female employee_id office_id year age age_client d25)
    1 111 40 2002 24 45 1
    0 111 40 2002 24 29 1
    1 112 41 2002 36 32 1
    1 112 41 2003 37 23 0
    1 112 41 2004 38 22 0
    0 112 41 2004 38 23 0
    0 113 41 2002 40 40 1
    1 114 42 2006 20 37 1
    0 114 42 2007 21 36 1
    1 114 42 2007 21 19 0
    0 115 43 2006 42 26 1
    0 115 43 2006 42 29 1
    1 116 41 2006 23 34 1
    1 116 41 2007 24 42 1
    end

  • #2
    Hi Neg
    In general, maximum likelihood/nonlinear models take longer to converge because there is no closed-form solution to the problem; instead, Stata has to run an iterative process.
    And the more parameters you have, the more iterations Stata needs to find the solution. In your case, you have 190*12*2 = 4,560 fixed-effect parameters, which is a lot. On top of that, you have over a million observations. It is no surprise that it takes a long time to give you results.
    Something else you may want to check, which may also explain why your estimation takes longer than expected: make sure you have variation within each possible subgroup.
    In other words, if you create a new variable that combines office x year x age dummy, answer the following:
    - Do you still see both female and non-female clients across all subgroups?
    - Is the proportion roughly balanced? In other words, you should not have groups with, say, 1 woman and 99 men; those will also be problematic.
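
    A quick way to run that check in Stata (a sketch; the variable names cell and pfemale are mine, the rest are from your dataset):
    Code:
    * one group per office x year x d25 cell
    egen cell = group(office_id year d25)
    * share of female clients within each cell
    bysort cell: egen pfemale = mean(female)
    * cells where the outcome never varies (all 0 or all 1)
    count if inlist(pfemale, 0, 1)
    tab d25 if inlist(pfemale, 0, 1)
    Cells where pfemale is exactly 0 or 1 have no outcome variation, and their fixed effects are not identified in the logit.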

    A simple way of dealing with this: estimate the model using OLS, and pay attention to groups with very high t-statistics (low standard errors).

    HTH
    F

    Comment


    • #3
      Originally posted by FernandoRios View Post
      Thank you Fernando!
      You are right; for some of the subgroups, I do not have enough variation. Do you recommend dropping those and running OLS?
      By the way, what constitutes "enough variation"? Is there a threshold or rule of thumb?

      Kind regards,
      Negar

      Comment


      • #4
        As in most cases, it's an empirical question.
        For example, what is the purpose of the regression?
        For most cases OLS may work just fine.

        Alternatively, you could drop groups for which the fixed effects are too significant in the OLS model, say a t-statistic above 10.
        If you look at the output you will notice them.

        Then try again using logit.

        HTH

        Comment


        • #5
          Originally posted by FernandoRios View Post
          Thanks! I am using the reghdfe command to run the FE model. I know how to save the coefficients of the absorbed fixed effects, but I do not know how to save their p-values or t-statistics so I can drop those with high values.
          Do you know how?
          Code:
          eststo: reghdfe female age, absorb(i.office_id#i.year#i.d25,savefe) cluster(employee_id)

          Comment


          • #6
            Right, you can't.
            Just like with logit, you need to explicitly estimate the dummies (the dummy-inclusion approach).
            It will take a long time (a huge matrix inversion), but it will still be less time than doing the same with a logit model.
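
            A sketch of the dummy-inclusion approach with the thread's own variables (regress in place of reghdfe, so the interaction dummies are estimated explicitly):
            Code:
            * estimate the fixed effects as explicit dummies (slow, but feasible)
            regress female age i.office_id#i.year#i.d25, vce(cluster employee_id)
            * coefficients and the variance matrix are then available
            matrix b = e(b)
            matrix V = e(V)
            * standard errors are the square roots of the diagonal of e(V)
            With the dummies in e(b) and e(V), you can identify and drop the problematic cells before rerunning the logit.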
            F

            Comment


            • #7
              Originally posted by FernandoRios View Post
              Thanks Fernando. I have been struggling with this a bit. Do you know how I can save the standard errors of the dummies after I estimate them? Maybe there is a way to save them in a matrix or in a variable so I can drop the large values afterwards?

              Comment


              • #8
                Look into r(table); all results from the regression estimates are stored there.
                Otherwise, you can just use estout or outreg to export them to Excel/Word and work from there.
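
                For example, after the explicit-dummy regression (a sketch; row "t" is where Stata's r(table) stores the t-statistics):
                Code:
                regress female age i.office_id#i.year#i.d25, vce(cluster employee_id)
                matrix T = r(table)
                * pull out the row of t-statistics ("se" would give standard errors)
                matrix t = T["t", 1...]
                matrix list t
                From there you can scan the t row for the extreme values Fernando mentioned and drop the corresponding cells.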

                Comment
