Hello!
I am working with a cross-sectional dataset. My dependent variable is binary, and my primary explanatory regressor is an index. I wanted to start with the most basic specification, a probit, because I want to interpret my results in terms of probabilities. However, I realised that even after adding relevant controls, unobservable village-level characteristics could affect both my dependent variable and my primary regressor. My main aim is to compare the results from the probit with those from a specification with village fixed effects. But I came across a lot of literature warning that a fixed-effects probit can produce biased estimates. Could someone please suggest a way out? Should I change my empirical strategy and use an LPM or a logit instead?
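To make the comparison concrete, here is a minimal sketch of a linear probability model with village fixed effects, estimated by demeaning within village. It uses NumPy on simulated data only: the function names (`within_demean`, `lpm_village_fe`), the variable names, the sample sizes, and all coefficients are hypothetical and not taken from my survey. It produces point estimates only; standard errors (which should probably be clustered by village) are not computed.

```python
import numpy as np

def within_demean(a, groups):
    """Subtract the group (village) mean from every row of a."""
    a = np.asarray(a, dtype=float)
    _, inv = np.unique(groups, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)
    if a.ndim == 1:
        means = np.bincount(inv, weights=a) / counts
        return a - means[inv]
    sums = np.zeros((counts.size, a.shape[1]))
    np.add.at(sums, inv, a)  # accumulate row sums per village
    return a - (sums / counts[:, None])[inv]

def lpm_village_fe(y, X, village):
    """OLS of y on X after removing village means (within estimator)."""
    y_w = within_demean(y, village)
    X_w = within_demean(X, village)
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta

# Simulated data (all numbers are assumptions for illustration).
rng = np.random.default_rng(0)
n_villages, hh_per_village = 200, 50
village = np.repeat(np.arange(n_villages), hh_per_village)
n = village.size

alpha = rng.uniform(-1, 1, size=n_villages)        # unobserved village effect
index = alpha[village] + rng.uniform(-1, 1, n)     # regressor correlated with it
control = rng.uniform(-1, 1, n)

# True response probability is linear and stays inside [0.05, 0.95].
p = 0.5 + 0.1 * index + 0.05 * control + 0.2 * alpha[village]
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([index, control])

# Pooled LPM (with intercept) ignores the village effect.
X_pooled = np.column_stack([np.ones(n), X])
beta_pooled, *_ = np.linalg.lstsq(X_pooled, y, rcond=None)

# Village fixed-effects LPM.
beta_fe = lpm_village_fe(y, X, village)

print("pooled LPM coefficient on index:", beta_pooled[1])
print("village-FE LPM coefficient on index:", beta_fe[0])
```

In this simulation the pooled coefficient on the index absorbs part of the village effect, while the within estimator should land close to the true value of 0.1. Whether the same logic carries over to a nonlinear model such as probit is exactly what I am unsure about.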
I found two papers with potential solutions, https://discovery.ucl.ac.uk/id/eprin...672/1/main.pdf and http://www.kevinstaub.com/ewExternal...Winkelmann.pdf, but both deal with panel data. It is also unclear how large N has to be for these approaches to be applicable.
N = 43,000 households (HHs), and I am using data from only one of the survey's two waves, which were six years apart.
In addition, my primary regressor is endogenous, so ideally I would have to instrument it. Does your suggestion change in that case?
Thank you very much!