
  • Model selection for binary time-invariant dependent variable in panel data/tackling heteroskedasticity at multiple levels

    I have a time-invariant binary DV and continuous IVs. The DV indicates whether a financial services company (a bank or any other type) offers venture capital (VC) (dummy = 1) or not (0). Once a company offers VC, it does so for the entire period covered by the data, and vice versa (i.e., the DV is time invariant). All IVs are continuous and most of them are time-varying. I have more than 100,000 observations on more than 4,500 companies operating in 60+ countries (the usable sample shrinks to around 35,000 observations because not all variables have data for all groups and periods). The IVs include company characteristics (such as return, size, and debt) and country-level characteristics (GDP, R&D, financial development, and so forth). I tried logistic regression, but the pseudo R2 is too low:
    logit vc rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna lnemp irr lnxr lntax ef ,vce(cluster pid)

    Iteration 0: log pseudolikelihood = -9214.4682
    Iteration 1: log pseudolikelihood = -8856.4005
    Iteration 2: log pseudolikelihood = -8790.3987
    Iteration 3: log pseudolikelihood = -8790.3133
    Iteration 4: log pseudolikelihood = -8790.3133

    Logistic regression                     Number of obs = 32,976
                                            Wald chi2(12) = 107.77
                                            Prob > chi2   = 0.0000
    Log pseudolikelihood = -8790.3133       Pseudo R2     = 0.0460

    (Std. Err. adjusted for 4,497 clusters in pid)

                              Robust
    vc              Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]

    rdexpend     .1669236   .0852872    1.96   0.050   -.0002362   .3340833
    lnintan      .2751894   .0486187    5.66   0.000    .1798985   .3704803
    lnass       -.0208578    .026323   -0.79   0.428   -.0724499   .0307343
    lnextdebt   -.3319592   .0498951   -6.65   0.000   -.4297518  -.2341666
    lnroa        .0638105   .5441491    0.12   0.907   -1.002702   1.130323
    lndeps       .2949721   .2057622    1.43   0.152   -.1083144   .6982586
    lnrgdpna    -.1490657    .063379   -2.35   0.019   -.2732862  -.0248451
    lnemp       -.7388525    .474698   -1.56   0.120   -1.669244   .1915385
    irr         -1.309219   2.035959   -0.64   0.520   -5.299625   2.681188
    lnxr         -.102139   .0455526   -2.24   0.025   -.1914204  -.0128576
    lntax        .2445949   .3868812    0.63   0.527   -.5136783   1.002868
    ef          -.0248256   .0088889   -2.79   0.005   -.0422476  -.0074036
    _cons        1.184486   2.692227    0.44   0.660   -4.092182   6.461154

    When I run simple pooled OLS, the R-squared is well below 0.1 even with a variety of variables included. I am not sure whether I should use xtlogit, re or logit. I feel that the data require logit because the DV is time invariant. When I run a "between" regression, the variables appear significant, as expected, because the between estimator works with company means over time. I read in a book that if you are sure there are no individual effects in your data (or the usual OLS assumptions are not violated), you should use logit rather than xtlogit. When I run xtlogit, the results look strange (and, unexpectedly, they are not significant either).
    So could anybody comment on the following?
    1. Which model would be appropriate: logit or xtlogit, re?
    2. Should I use cluster pid (the company code) or cluster cid (the country code) to tackle heteroskedasticity, or would vce(robust) alone be enough, given that I already control for the size of companies and the size of countries? (Importantly, cid works for logit but does not work for xtreg ,re.)
    My code is
    logit vc rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna lnemp irr lnxr lntax ef ,vce(cluster pid)
    The Stata file is attached.
    I apologize if I have not explained things clearly enough.
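
    The two questions above can be put side by side in a short do-file. This is only a sketch: it uses the variable names from the command above, declares the panel with pid alone (no time variable is named in the post), and makes no claim about which specification is correct. The pooled logit coefficients are the same under either clustering choice; only the standard errors change.

    ```stata
    * Declare the panel structure (panel identifier only)
    xtset pid

    * Check that vc is constant within company; stops with an error
    * if it is not (missing values of vc in some rows also trigger it)
    bysort pid (vc): assert vc[1] == vc[_N]

    * Keep the regressor list in one place
    local xvars rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna lnemp irr lnxr lntax ef

    * Pooled logit, clustering on company and then on country
    logit vc `xvars', vce(cluster pid)
    estimates store logit_pid
    logit vc `xvars', vce(cluster cid)
    estimates store logit_cid

    * Random-effects logit
    xtlogit vc `xvars', re
    estimates store xtlogit_re

    * Coefficients and standard errors side by side
    estimates table logit_pid logit_cid xtlogit_re, b(%9.4f) se(%9.4f)
    ```

    Run the block as a whole, since the local macro is lost if the lines are submitted one at a time from the do-file editor.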

  • #2
    You did not get a quick answer. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Please do not post files; many of us will not open files from people we don't know.

    Pseudo R-squared is not a criterion for deciding whether you need a logit estimator. That pooled OLS also gives a low R-squared strongly suggests that your variables don't explain much of the variance in the dependent variable.
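
    For reference, the pseudo R2 that logit reports is McFadden's measure, which can be rebuilt from the log pseudolikelihoods in the iteration log. The sketch below assumes that iteration 0 in the posted log is the constant-only model, which is the usual case for logit; the first line should display roughly 0.046, in line with the reported Pseudo R2 = 0.0460.

    ```stata
    * McFadden pseudo R2 = 1 - ll(full model) / ll(constant-only model)
    display 1 - (-8790.3133)/(-9214.4682)

    * Right after running logit, the same number can be rebuilt
    * from the stored results
    display 1 - e(ll)/e(ll_0)
    display e(r2_p)
    ```

    The measure compares log likelihoods, not explained variance, so it is not on the same scale as the OLS R-squared.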

    It's not clear to me what you mean by the between regression, since normal panel estimators don't.

    Whether logit or xtlogit is appropriate depends a lot on your problem. It sounds like you have panel data, so I would lean toward a panel estimator. However, if the dependent variable does not vary within panels, then you probably should use a between estimator. You don't get the nice econometric properties of the within estimator, but the within estimator cannot use the between variation in the dependent variable.
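
    One way to work with only the between variation is to collapse the panel to one row per company and fit a cross-sectional logit on the company means. This is a sketch, not a recommendation of a final specification: the variable names come from the original command, and taking the first value of vc and cid assumes both are constant within pid.

    ```stata
    preserve

    * One observation per company: time means of the regressors,
    * first value of the time-invariant outcome and country code
    collapse (mean) rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna ///
        lnemp irr lnxr lntax ef (first) vc cid, by(pid)

    * Cross-sectional logit on the company means; clustering on country
    * allows for correlated errors among companies in the same country
    logit vc rdexpend lnintan lnass lnextdebt lnroa lndeps lnrgdpna ///
        lnemp irr lnxr lntax ef, vce(cluster cid)

    restore
    ```

    Note that collapse averages each variable over whatever observations are non-missing for that variable, so the company means need not come from the same rows that a pooled logit would use. The /// line continuations require running this from a do-file.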



    • #3
      Thank you for your advice. Every time I post, I go to the FAQ, but somehow I still have not gotten a grasp of the things expected on the forum, like code delimiters, readable Stata output, and sample data using dataex. Your comments are useful. I am also reading the relevant chapters of the econometrics books; once I have a better understanding, I will respond with questions and possibly my own answers. I will consider your opinion. The point is that the between estimation also gives a low R-squared, and its estimates are close to those from logit.
