  • Two-stage regression model with categorical dependent variable and categorical endogenous variable

    Dear Statalist,

    I am running a model where my dependent variable (y1) has 4 categories. My problem is that one of my regressors (y2, a categorical variable with 4 categories) is itself influenced by the other explanatory variables (household characteristics, $x1). I have already identified an instrumental variable (included in $x2). The base outcome for each of the two variables of interest (y1 and y2) is its largest category, and I have already checked for IIA by running two separate multinomial logistic models.

    I have tried different approaches to account for endogeneity, without success:

    I tried cmp:

    cmp (y1 =$x1) (y2= $x2), ind($cmp_mprobit $cmp_mprobit)

    but I could not get it to converge:

    cannot compute an improvement -- discontinuous region encountered
    convergence not achieved
    convergence not achieved
    r(430);


    I also tried gsem, using a latent variable to account for the correlation between the error terms:

    gsem (i.y1<- $x1 L, mlogit) (i.y2<- $x2 L, mlogit), vce(cluster community) var(L@1)

    And I got the following error message:

    initial values not feasible
    r(1400);

    The gsem command runs properly if I exclude the latent variable:

    gsem (i.y1<- $x1, mlogit) (i.y2<- $x2, mlogit), vce(cluster community)

    But then I would be ignoring the endogeneity problem.


    My dataset has 320 observations and I am using Stata 14.

    I would appreciate any advice you could provide to move forward.

  • #2
    To complement the previous post and present the code properly:

    I have tried supplying starting values in two ways, first from the model without the latent variable and then from a run with the noestimate option, but without success. I keep obtaining the following error message:

    Code:
    Refining starting values:

    Grid node 0: log likelihood = .
    Grid node 1: log likelihood = .
    Grid node 2: log likelihood = .
    Grid node 3: log likelihood = .
    (note: Grid search failed to find values that will yield a log likelihood value.)

    Fitting full model:

    initial values not feasible
    r(1400);

    Code:
    gsem (i.y1<- $x1, mlogit) (i.y2<- $x2, mlogit), vce(cluster community)
    matrix b = e(b)
    gsem (i.y1<- $x1 L, mlogit) (i.y2<- $x2 L, mlogit), vce(cluster community) var(L@1) from(b)

    ...
    Fitting fixed-effects model:

    Iteration 0: log likelihood = -990.45505
    Iteration 1: log likelihood = -957.79754 (backed up)
    Iteration 2: log likelihood = -792.03858
    Iteration 3: log likelihood = -554.69154
    Iteration 4: log likelihood = -490.22375
    Iteration 5: log likelihood = -489.69856 (backed up)
    Iteration 6: log likelihood = -474.27916 (backed up)
    Iteration 7: log likelihood = -461.6616
    Iteration 8: log likelihood = -456.60642
    Iteration 9: log likelihood = -451.35826
    Iteration 10: log likelihood = -450.59582
    Iteration 11: log likelihood = -450.57966
    Iteration 12: log likelihood = -450.57965

    Refining starting values:

    Grid node 0: log likelihood = .
    Grid node 1: log likelihood = .
    Grid node 2: log likelihood = .
    Grid node 3: log likelihood = .
    (note: Grid search failed to find values that will yield a log likelihood value.)

    Fitting full model:

    initial values not feasible
    r(1400);



    Code:
    gsem (i.p1_foodsecurity<- $x1 L, mlogit) (i.gardentypes_recoded<- $x2 L, mlogit), vce(cluster community) var(L@1) noestimate
    matrix b = e(b)
    gsem (i.p1_foodsecurity<- $x1 L, mlogit) (i.gardentypes_recoded<- $x2 L, mlogit), vce(cluster community) var(L@1) from(b)

    ...
    Refining starting values:

    Grid node 0: log likelihood = .
    Grid node 1: log likelihood = .
    Grid node 2: log likelihood = .
    Grid node 3: log likelihood = .
    (note: Grid search failed to find values that will yield a log likelihood value.)

    Fitting full model:

    initial values not feasible
    r(1400);


    • #3
      That is a very demanding model to fit, especially with so few observations, because each alternative other than the base alternatives gets its own equation in the model. So that's really 2*3=6 equations, each with its own error term, and at least in the cmp model, these are allowed to be correlated, so there are 15 cross-correlations to estimate among the 6 (21 distinct covariance-matrix elements if the variances are counted too).

      Oh, actually there's a more basic problem. Unless you specify IIA (which gets rid of the problem above by locking down the correlations to zero), different alternatives need to have distinct sets of regressors. That's why Stata's mprobit command imposes IIA while asmprobit (alternative-specific mprobit) does not.

      So try:
      Code:
      cmp (y1 =$x1, iia) (y2= $x2, iia), ind($cmp_mprobit $cmp_mprobit) nolr


      • #4
        Thank you for your advice. I will work on simplifying the model.


        • #5
          To simplify the model, I am trying to gauge how big the 'endogeneity problem' is. I have looked at various posts on endogeneity, but I have not found a clear answer on how to test for endogeneity in a logistic or multinomial probit model with a categorical (4 categories), presumably endogenous, regressor. I am not sure whether the 'traditional OLS technique' of computing the residuals from the reduced-form equation and including them in the original equation works for models with categorical dependent variables.
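
          To make the residual-inclusion idea concrete, here is a minimal sketch of what it would look like in this setup (the approach is sometimes called two-stage residual inclusion). It is only an illustration under assumptions, not an established test for this model: it assumes y2 is coded 1 to 4, that $x1 already contains i.y2, and that $x2 holds the instrument plus the exogenous covariates. The variable names p1-p4 and r1-r3 are hypothetical. Whether the joint test at the end is valid here is exactly the open question, and the second-stage standard errors ignore the first-stage estimation error.

          Code:
          * First stage: reduced-form multinomial logit for the suspected endogenous y2
          mlogit y2 $x2, vce(cluster community)
          * Predicted probability of each of the 4 categories (assumes y2 is coded 1-4)
          predict double p1 p2 p3 p4, pr
          * Raw residuals: category indicator minus predicted probability.
          * The four residuals sum to zero, so one (here category 4) is left out.
          forvalues k = 1/3 {
              generate double r`k' = (y2 == `k') - p`k' if e(sample)
          }
          * Second stage: add the first-stage residuals to the outcome equation
          mlogit y1 $x1 r1 r2 r3, vce(cluster community)
          * Joint Wald test that the residual coefficients are zero in all outcome equations
          test r1 r2 r3

          If something like this is defensible, a rejection in the last step would point to endogeneity, and the whole two-step procedure would probably need to be bootstrapped to get usable standard errors.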
