I have a large panel dataset with individual and parental characteristics. I will be running the regressions on four samples: two with over 40,000 observations and two with under 200 observations. Some variables have missing values because the data come from a household survey, but given that attrition is at typical levels I don't expect this to be the main cause of the problems below. Is that right?
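One thing worth checking is how many observations remain once every variable in a model must be non-missing. Stata's estimation commands use only complete cases, so in the samples under 200 observations a few gaps spread across different variables can shrink the estimation sample a lot and leave some categories nearly empty. Below is a minimal stdlib-only Python sketch of that check. The variable names (hiqual, occ, parent_occ) and the toy rows are hypothetical, and None stands in for Stata's missing value.

```python
"""Sketch: count missing values and the rows that survive listwise deletion.

Variable names and TOY_ROWS are hypothetical; None marks a missing value.
"""


def missing_counts(rows, variables):
    """Count missing (None) values for each variable."""
    return {v: sum(1 for r in rows if r.get(v) is None) for v in variables}


def complete_cases(rows, variables):
    """Keep only rows with no missing (None) value in any of `variables`."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


TOY_ROWS = [  # hypothetical survey extract, one dict per person-wave
    {"pid": 1, "hiqual": 3, "occ": 2, "parent_occ": None},
    {"pid": 1, "hiqual": 3, "occ": None, "parent_occ": None},
    {"pid": 2, "hiqual": None, "occ": 5, "parent_occ": 4},
    {"pid": 2, "hiqual": 1, "occ": 5, "parent_occ": 4},
]

if __name__ == "__main__":
    cols = ["hiqual", "occ", "parent_occ"]
    print(missing_counts(TOY_ROWS, cols))  # {'hiqual': 1, 'occ': 1, 'parent_occ': 2}
    print(len(complete_cases(TOY_ROWS, cols)), "of", len(TOY_ROWS))  # 1 of 4
```

Running the check separately on the individual and the parental variable lists shows which of the two regressions loses more observations to missing data.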
I am running xtreg for continuous dependent variables and xtlogit for dummy outcomes (probability of inactivity). My code for the regressions is below. To avoid overfitting, I run one regression to estimate the effect of the individual's own characteristics on the outcome variable, and a separate one to estimate the relationship between the parents' characteristics (measured with equivalent variables) and the individual's outcome.
I keep getting 'omitted because of collinearity' for many of the most important independent variables, such as 'i.hiqual_dv' (highest qualification) and 'occupation_group' (occupation worked in). I don't see how there could be so much collinearity among these groups when, for example, I aggregated about 30 occupations into 7 groups. For some of the regressions, Stata runs 300 iterations with an identical log-likelihood, each followed by (not concave), and ends with 'convergence not achieved'.
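Two possible causes can be checked directly, sketched here in plain Python rather than Stata and assuming the models use the 'fe' option (the post does not say which estimator is used). First, a fixed-effects ("within") estimator subtracts each individual's mean from every regressor, so a variable such as highest qualification that never changes for a given person becomes all zeros and is dropped as collinear with the individual effects, however few categories it has. Second, in a logit, a category in which the outcome never varies predicts it perfectly, which can keep the likelihood from converging; this is more likely in the samples under 200 observations. The long-format layout, the column names pid, degree, occ and inactive, and the toy data are all hypothetical.

```python
"""Sketch of two panel diagnostics, in plain Python rather than Stata.

Assumes long-format rows (one dict per person-wave). The column names
pid, degree, occ and inactive, and TOY_ROWS itself, are hypothetical.
"""
from collections import defaultdict


def within_variation(rows, var):
    """Share of individuals whose value of `var` changes across waves."""
    values = defaultdict(set)
    for row in rows:
        values[row["pid"]].add(row[var])
    varying = sum(1 for vals in values.values() if len(vals) > 1)
    return varying / len(values)


def demeaned(rows, var):
    """Within (fixed-effects) transform: x_it minus individual i's mean.

    A variable that never changes within a person becomes all zeros here,
    so it cannot be separated from the individual fixed effect.
    """
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["pid"]].append(row[var])
    means = {pid: sum(xs) / len(xs) for pid, xs in by_person.items()}
    return [row[var] - means[row["pid"]] for row in rows]


def separating_categories(rows, var, outcome):
    """Categories of `var` in which the binary `outcome` never varies.

    In a logit such a category predicts the outcome perfectly, so its
    coefficient has no finite estimate and the optimizer may not converge.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[var]].add(row[outcome])
    return sorted(cat for cat, ys in seen.items() if len(ys) == 1)


TOY_ROWS = [  # hypothetical: 3 people observed in 2 waves each
    {"pid": 1, "wave": 1, "degree": 1, "occ": 2, "inactive": 0},
    {"pid": 1, "wave": 2, "degree": 1, "occ": 3, "inactive": 0},
    {"pid": 2, "wave": 1, "degree": 0, "occ": 5, "inactive": 1},
    {"pid": 2, "wave": 2, "degree": 0, "occ": 5, "inactive": 0},
    {"pid": 3, "wave": 1, "degree": 0, "occ": 7, "inactive": 1},
    {"pid": 3, "wave": 2, "degree": 0, "occ": 7, "inactive": 1},
]

if __name__ == "__main__":
    print(within_variation(TOY_ROWS, "degree"))  # 0.0: never varies
    print(within_variation(TOY_ROWS, "occ"))  # 0.3333333333333333
    print(demeaned(TOY_ROWS, "degree"))  # all zeros after demeaning
    print(separating_categories(TOY_ROWS, "degree", "inactive"))  # [1]
    print(within_variation(TOY_ROWS, "inactive"))  # 0.3333333333333333
```

Calling within_variation on the outcome gives the share of individuals whose inactivity status changes; with 'xtlogit, fe' individuals who are always or never inactive are dropped, so only that share contributes to the estimates. The per-category check catches only the simplest form of separation, since combinations of regressors can also separate the outcome.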
How could I fix the problems outlined here?
