I am using a time-invariant binary DV with continuous IVs. The DV is whether a financial services company (a bank or any other) offers venture capital (VC) (dummy = 1) or not (0). Once a company offers VC, it does so during the entire period covered by the data, and vice versa (i.e. the DV is time-invariant). All IVs are continuous and most of them are time-variant. I have more than 100,000 observations on more than 4,500 companies operating in 60+ countries (the usable sample is reduced to around 35,000 observations because not all variables have data for all groups and periods). The IVs include company characteristics (such as return, size, debt) and country-level characteristics (GDP, R&D, financial development and so forth). I tried logistic regression but the pseudo R2 is too low.
When I run simple pooled OLS, the R-squared is well below 0.1 even when I include a variety of variables. I am not sure whether I should use xtlogit, re or logit. I feel that the data require logit because the DV is time-invariant. When I run a "between" regression, the variables appear significant as expected, because the between estimator works with company means over time. I read in a book that if you are sure there are no individual effects in your data (or the usual OLS assumptions are not violated), you should use logit rather than xtlogit. When I run xtlogit, the results look strange (unexpectedly, they are not significant either).
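To make the setup concrete, here is a minimal sketch of how the time-invariance of the DV could be checked before choosing an estimator. The names vc (the VC dummy) and pid (the company code) are the ones used in the command in this post; vc_varies and firm_tag are hypothetical helper variables introduced only for this illustration.

```stata
* Sketch only: vc and pid are the variable names used in this post;
* vc_varies and firm_tag are hypothetical helpers.

* Sort by vc within company, then compare the first and last value.
* Observations with missing vc sort last, so a company with any
* missing vc would also be flagged as varying.
bysort pid (vc): generate byte vc_varies = vc[1] != vc[_N]
tabulate vc_varies

* Tag one observation per company to count firms rather than firm-periods
egen byte firm_tag = tag(pid)
tabulate vc if firm_tag
```

If vc_varies is 0 for every company, the DV carries no within-company variation, which is the situation described above.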
So, could anybody comment on the following?
1. Which model would be appropriate: logit or xtlogit, re?
2. Should I use vce(cluster pid) (pid is the company code) or vce(cluster cid) (cid is the country code) to deal with heteroskedasticity, or would vce(robust) alone be enough, given that I already control for the size of companies and the size of countries? (Importantly, clustering on cid works for logit but does not work for xtreg, re.)
The Stata file is attached.
I apologize if I have not clarified things enough.
My code and output are:
logit vc rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna lnemp irr lnxr lntax ef, vce(cluster pid)
Iteration 0:   log pseudolikelihood = -9214.4682
Iteration 1:   log pseudolikelihood = -8856.4005
Iteration 2:   log pseudolikelihood = -8790.3987
Iteration 3:   log pseudolikelihood = -8790.3133
Iteration 4:   log pseudolikelihood = -8790.3133

Logistic regression                             Number of obs     =     32,976
                                                Wald chi2(12)     =     107.77
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -8790.3133               Pseudo R2         =     0.0460

                               (Std. Err. adjusted for 4,497 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
          vc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rdexpend |   .1669236   .0852872     1.96   0.050    -.0002362    .3340833
     lnintan |   .2751894   .0486187     5.66   0.000     .1798985    .3704803
       lnass |  -.0208578    .026323    -0.79   0.428    -.0724499    .0307343
   lnextdebt |  -.3319592   .0498951    -6.65   0.000    -.4297518   -.2341666
       lnroa |   .0638105   .5441491     0.12   0.907    -1.002702    1.130323
      lndeps |   .2949721   .2057622     1.43   0.152    -.1083144    .6982586
    lnrgdpna |  -.1490657    .063379    -2.35   0.019    -.2732862   -.0248451
       lnemp |  -.7388525    .474698    -1.56   0.120    -1.669244    .1915385
         irr |  -1.309219   2.035959    -0.64   0.520    -5.299625    2.681188
        lnxr |   -.102139   .0455526    -2.24   0.025    -.1914204   -.0128576
       lntax |   .2445949   .3868812     0.63   0.527    -.5136783    1.002868
          ef |  -.0248256   .0088889    -2.79   0.005    -.0422476   -.0074036
       _cons |   1.184486   2.692227     0.44   0.660    -4.092182    6.461154
------------------------------------------------------------------------------
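For reference, the alternatives raised in questions 1 and 2 can be written out as the sketch below. It only lists the candidate specifications side by side and is not a recommendation of any of them. The regressor list is copied from the logit command in this post; xvars is a hypothetical local macro name, and the sketch assumes pid and cid are numeric identifiers and that all lines are run together in one do-file so the macro stays defined.

```stata
* Sketch only: regressors copied from the logit command in this post;
* xvars is a hypothetical local macro introduced for brevity.
local xvars rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna lnemp irr lnxr lntax ef

* Declare the panel identifier (company code); assumes pid is numeric
xtset pid

* Question 1, option A: pooled logit, standard errors clustered on company
logit vc `xvars', vce(cluster pid)

* Question 1, option B: random-effects logit
xtlogit vc `xvars', re

* Question 2: pooled logit, standard errors clustered on country instead
logit vc `xvars', vce(cluster cid)

* Question 2: heteroskedasticity-robust standard errors without clustering
logit vc `xvars', vce(robust)
```

Comparing the standard errors across the three logit variants would show how much the choice of clustering level matters in this sample.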