Hi. I have a model that, both in theory and according to prior research, has a potential endogeneity issue (corporate governance). I therefore need to use instrumental variables (IV) in my regression.
I understand that Stata can help assess whether the variables/instruments are:
- endogenous (estat endog)
- strong/good (estat firststage)
- valid (estat overid), which applies only when more than one IV is available for one endogenous variable
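For concreteness, a minimal sketch of how these three checks are typically run after ivregress 2sls. All variable names here (y, gov, c1, c2, x1, x2) are placeholders for illustration, not my actual variables:

```stata
* Placeholder names (assumptions, not the real dataset):
*   y      = outcome
*   gov    = endogenous corporate governance variable
*   c1 c2  = exogenous controls
*   x1 x2  = excluded instruments
ivregress 2sls y c1 c2 (gov = x1 x2)

* H0: gov is exogenous (Durbin and Wu-Hausman tests)
estat endogenous

* First-stage F statistic and partial R-squared (instrument strength)
estat firststage

* H0: the overidentifying restrictions are valid (Sargan and Basmann tests);
* only available with more instruments than endogenous regressors
estat overid
```

With a single instrument for a single endogenous variable the model is exactly identified, so estat overid has nothing to test.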
After testing several candidate IVs for my regression (I chose 5 potential IVs based on theory and my sample), I noticed that several combinations of these IVs give a good result, in the sense that they fulfil all 3 criteria above.
For example, my chosen IVs are x1, x2, x3, x4 and x5:
The 1st combination (x1, x3, x4 and x5) meets all criteria.
The 2nd combination (x1, x3 and x5) also meets all criteria.
The 3rd combination (x2 and x5) meets all criteria.
The 4th combination (x1 and x2) meets all criteria.
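The comparison across these four combinations could be sketched as a loop like the one below. Again, y, gov, c1 and c2 are placeholder names, and the stored-estimate names iv1 to iv4 are only for illustration:

```stata
* The four instrument sets listed above; all other names are placeholders
local set1 "x1 x3 x4 x5"
local set2 "x1 x3 x5"
local set3 "x2 x5"
local set4 "x1 x2"

forvalues i = 1/4 {
    display _newline "Instrument set `i': `set`i''"
    ivregress 2sls y c1 c2 (gov = `set`i'')
    * Store the results before running the postestimation tests
    estimates store iv`i'
    estat endogenous
    estat firststage
    estat overid
}

* Put the coefficient on gov and its standard error side by side
estimates table iv1 iv2 iv3 iv4, keep(gov) b se
```

The final table makes it easy to see whether the coefficient on the endogenous variable really is stable across the instrument sets.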
I have also tested each IV separately (without any combination), and in that case the test indicates that the variable is exogenous. However, I cannot rely on this, because the variables in my model are very prone to endogeneity (according to the theory).
My questions are:
(1) How can I choose the best combination to use in my regression?
(2) If I can find one IV that meets all criteria, can I stop combining and use only that IV, given that the result is stable? (By stable I mean that any combination of IVs gives the same result, which is why I would take a single IV instead of a combination of several.)
(3) Is there any general understanding that using more IVs gives a better model? (I don't think so, because our aim is a stable model with a stable result, but I just want to clarify this point.)
Many thanks.