Dear Forum Users,
I am working on a binomial logistic regression with around 16 million observations (panel data). My model analyzes investors' decisions to invest. I use the logit command with fixed effects. My main fixed effects are the investor ID and the ID of the firm they invest in; there are more than 7k investor IDs and more than 4k firm IDs. Running the model on the whole dataset is very time-consuming, so I randomly draw a 100k-observation subsample from my data to test the model. Even so, each run takes around 12 to 24 hours, depending on the number of independent variables and additional fixed effects.

When I run a simple model without fixed effects, the number of observations is around 94k. However, when I run the same model with firm ID fixed effects, the number of observations drops to about 10k. There are no missing values in the ID variables. Do you have any idea why so many observations are dropped?
Kind regards,
Firangiz
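One common reason for this kind of drop, offered as an assumption rather than a diagnosis of this particular data: when a logit includes one dummy per firm, any firm whose outcome never varies in the sample (no investment at all, or investment in every row) is perfectly predicted by its dummy, and Stata's logit excludes those observations from estimation. A 100k random draw spread over more than 4k firms leaves roughly 25 observations per firm on average, so if investing is rare, many firms can end up with all zeros. The minimal Python sketch below counts how many observations survive a single set of group dummies; the function name and the toy data are hypothetical, and adding investor dummies as well may drop further observations.

```python
from collections import defaultdict


def obs_in_varying_groups(group_ids, outcomes):
    """Return (kept_obs, dropped_groups) for a binary outcome with group dummies.

    A group whose outcome is constant (all 0 or all 1) is perfectly predicted
    by its own dummy, so its observations add nothing to the other
    coefficients and are excluded. Groups are reported in first-seen order.
    """
    if len(group_ids) != len(outcomes):
        raise ValueError("group_ids and outcomes must have the same length")

    values = defaultdict(set)  # group id -> distinct outcome values seen
    counts = defaultdict(int)  # group id -> number of observations

    for g, y in zip(group_ids, outcomes):
        values[g].add(y)
        counts[g] += 1

    # Only groups with both 0 and 1 contribute to estimation.
    kept = sum(n for g, n in counts.items() if len(values[g]) > 1)
    dropped = [g for g in counts if len(values[g]) == 1]
    return kept, dropped


# Toy example: firm "B" never receives investment, so its 2 rows are lost.
firms = ["A", "A", "A", "B", "B", "C", "C", "C"]
invest = [0, 1, 0, 0, 0, 1, 1, 0]
print(obs_in_varying_groups(firms, invest))  # (6, ['B'])
```

To see how quickly this adds up: with 25 draws per firm and a hypothetical investment rate of 1%, the chance that a given firm shows no investment at all is 0.99^25, about 0.78.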