  • Bootstrap doesn't work when handling endogeneity in mlogit

    Dear Statalist,
    I want to handle endogeneity in mlogit. I have two endogenous variables (each with two levels), and 4807 observations. According to the forum thread https://www.statalist.org/forums/for...ction-approach, Professor Wooldridge confirms that bootstrapping works well for obtaining reasonable standard errors for the multinomial logit model.
    I implemented this code for my work, but Stata returns the error "insufficient observations to compute bootstrap standard errors no results will be saved". I have read many threads about this problem, but I can't understand why it happens or how to handle it. Please help me solve it; I'm a beginner in Stata.

    global ylist method
    global xlist Age hh_size i.work i.marital i.gender i.educ i.region i.hh_income i.spend_reduced /*
    */ i.importance_attribute_payment i.cc_hasbal i.cc_reward i.cc_ratio i.cc_revolver i.dc_free end_cash_bal log_Amount /*
    */ EaseCC EaseDC CostCC CostDC RecordCC RecordDC


    // set up the program including 1st and 2nd stage
    gen merch_accep_cash_norm = merch_accep_cash-1
    gen merch_accep_card_norm = merch_accep_card-1

    program define my2sls
    probit merch_accep_cash_norm $xlist
    predict merch_accep_cash_Hat , pr
    gen merch_accep_cash_residual = merch_accep_cash_norm - merch_accep_cash_Hat

    probit merch_accep_card_norm $xlist
    predict merch_accep_card_Hat , pr
    gen merch_accep_card_residual = merch_accep_card_norm - merch_accep_card_Hat

    mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
    end

    // obtain bootstrapped standard errors
    *bootstrap "sim" _b _se, reps(400) dots
    bootstrap , reps(50): my2sls
    Last edited by arma ayat; 30 Jan 2020, 11:03.

  • #2
    Hi Arma,
    I'm not sure where the source of the problem is, and without a replicable example it is difficult to track down.
    Let me share with you, however, one possible way to code what you have in mind.
    Code:
    webuse union3, clear
    sum wage,d
    recode wage (1/3=1) (3/5=2) (5/8=3) (8/100=4), gen(wage_c)
    sum wage_c age grade smsa black tenure union south
    keep if union!=.
    mlogit wage_c age grade smsa black tenure union  , base(1)
    
    capture program drop myboot
    program myboot, eclass
    probit union age grade smsa south black tenure 
    **generalized residuals
    capture drop res
    predict res, score
    mlogit wage_c age grade smsa black tenure union res, base(1)
    end
    bootstrap: myboot
    The main differences from your case are that there is only one endogenous variable (easy to amend), and that I'm using the generalized residual (inverse Mills ratio) rather than the one you are using.
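
    A minimal sketch of how the amendment to two endogenous variables might look, using the variable names and globals from the first post. The names myboot2, res_cash and res_card are hypothetical, and the weights are left out to keep the sketch short. It assumes $ylist and $xlist are already defined.

    ```stata
    capture program drop myboot2
    program myboot2, eclass
        // drop first-stage variables left over from the previous call,
        // otherwise predict stops with "variable already defined"
        capture drop res_cash
        capture drop res_card

        // first stage for each endogenous variable; score gives the
        // generalized residual after probit
        probit merch_accep_cash_norm $xlist
        predict double res_cash, score

        probit merch_accep_card_norm $xlist
        predict double res_card, score

        // second stage with both generalized residuals as controls
        mlogit $ylist $xlist res_cash res_card, baseoutcome(1)
    end
    bootstrap, reps(50): myboot2
    ```

    Each generated variable gets its own capture drop so that a missing one does not prevent the other from being dropped.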
    HTH
    Fernando


    • #3
      In another post, you said that mlogit without bootstrapping is having some convergence trouble. As we discussed there, you have two binary variables where there are very few 0s. The problem of complete separation in regular logistic regression applies here.

      I see that here, you are treating these two problematic variables as endogenous. You're fitting a probit model to each of them. I suspect the problem could be that in many replications, either or both of the probit regressions don't converge. What happens when you fit the probit regressions individually, outside of the bootstrap? Do they converge, or do you have complete separation?

      (PS, I'm not sure how probit handles complete separation. The logistic commands would drop combinations of the categorical variables where complete separation occurs. Either way, you don't want this to be happening in your bootstrap model.)
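
      A rough sketch of that check, assuming the variables and the $xlist global from the first post are in memory. It only looks at how rare the 0s are and whether each first-stage probit reports convergence on the full sample.

      ```stata
      // how many 0s and 1s does each endogenous variable have?
      tabulate merch_accep_cash_norm
      tabulate merch_accep_card_norm

      // fit each first stage on its own and inspect the convergence flag
      probit merch_accep_cash_norm $xlist
      display "cash probit converged: " e(converged) "   N used: " e(N)

      probit merch_accep_card_norm $xlist
      display "card probit converged: " e(converged) "   N used: " e(N)
      ```

      If N used is smaller than the number of observations in the data, the notes printed above the probit table should say which variables or observations were dropped.
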
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


      • #4
        1
        Last edited by arma ayat; 06 Feb 2020, 00:18.


        • #5
          Dear Ng, thank you for taking the time to help with my problem.
          I fitted the logistic regressions individually, outside the bootstrap. With that, the bootstrapped mlogit converged, but it reported 3 or 4 completely determined observations.

          Code:
          logistic merch_accep_cash_norm $xlist
          predict merch_accep_cash_Hat , pr
          predict merch_accep_cash_residual , residuals
          
          logistic merch_accep_card_norm $xlist
          predict merch_accep_card_Hat , pr
          predict merch_accep_card_residual , residuals
          
          // set up the program including 1st and 2nd stage
          capture program drop my2sls
          program define my2sls
           
          mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
          
          end
          bootstrap  ,reps(100) dots : my2sls
          I have run three different combinations of the _Hat and _residual variables in the bootstrapped mlogit (shown below) and got different results. I am confused about which results are correct and trustworthy.



          Code:
          program define my2sls
              mlogit $ylist $xlist merch_accep_card_Hat merch_accep_card_residual merch_accep_cash_Hat merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
          end
          
          Multinomial logistic regression                 Number of obs     =      4,476
                                                          Replications      =        100
                                                          Wald chi2(99)     =          .
                                                          Prob > chi2       =          .
          Log pseudolikelihood = -3144.8055               Pseudo R2         =     0.3064
          
          Note: 4 observations completely determined.  Standard errors questionable.
          Code:
          program define my2sls
              mlogit $ylist $xlist   merch_accep_card_Hat merch_accep_cash_Hat  , baseoutcome(1)
          end
          
          Multinomial logistic regression                 Number of obs     =      4,693
                                                          Replications      =        100
                                                          Wald chi2(99)     =          .
                                                          Prob > chi2       =          .
          Log pseudolikelihood = -3493.8464               Pseudo R2         =     0.2685
          
          This run doesn't report any "observations completely determined".
          Code:
          program define my2sls
              mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
          end
          
          Multinomial logistic regression                 Number of obs     =      4,476
                                                          Replications      =        100
                                                          Wald chi2(99)     =          .
                                                          Prob > chi2       =          .
          Log pseudolikelihood = -3152.3648               Pseudo R2         =     0.3047
          
          Note: 3 observations completely determined.  Standard errors questionable.


          In my opinion, the third case is correct: by including merch_accep_[cash/card]_residual, the unobserved component of merch_accep_cash/card enters the mlogit, which accounts for the net effect of merch_accep_cash/card.
          I think including merch_accep_[cash/card]_Hat in the mlogit causes collinearity with the other variables, because merch_accep_[cash/card]_Hat is a fitted value based on those variables.
          Last edited by arma ayat; 06 Feb 2020, 03:41.


          • #6
            Dear Rios,
            I ran your code on my data, but I got the same error.
            Code:
            capture program drop myboot
            program myboot, eclass
                probit merch_accep_cash_norm $xlist
                **generalized residuals
                capture drop res
                predict merch_accep_cash_Hat, pr
                predict merch_accep_cash_residual, score
            
                probit merch_accep_card_norm $xlist
                **generalized residuals
                capture drop res
                predict merch_accep_card_Hat, pr
                predict merch_accep_card_residual, score
            
                mlogit $ylist $xlist merch_accep_cash_residual merch_accep_card_residual [pw=ind_weight] , baseoutcome(1)
            end
            bootstrap: myboot
            When I fitted the probit models outside the bootstrap, the mlogit converged and did not report any "observations completely determined". So I have no idea why this happens.
            Code:
            probit merch_accep_cash_norm $xlist
            **generalized residuals
            capture drop res
            predict merch_accep_cash_Hat, pr
            predict merch_accep_cash_residual, score
            
            probit merch_accep_card_norm $xlist
            **generalized residuals
            capture drop res
            predict merch_accep_card_Hat, pr
            predict merch_accep_card_residual, score
                
            capture program drop myboot
            program myboot, eclass
                mlogit $ylist $xlist merch_accep_cash_residual merch_accep_card_residual [pw=ind_weight] , baseoutcome(1)
            end
            bootstrap: myboot
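
            One thing that may be worth checking in the first version above (a guess, since the data are not available here): the capture drop res lines drop a variable named res, while the program creates merch_accep_cash_Hat, merch_accep_cash_residual, merch_accep_card_Hat and merch_accep_card_residual. bootstrap calls the program once on the full sample and then once per resample, so if those variables are still in memory, predict would stop with a "variable already defined" error on every later call. If every replication fails, that could produce the insufficient-observations message. A short sketch for seeing what happens inside the replications:

            ```stata
            // run only two replications and show the output of each one,
            // including any error raised inside myboot; capture keeps the
            // do-file running even if bootstrap itself ends with an error
            capture noisily bootstrap, reps(2) noisily: myboot

            // if the error is about variables already being defined, these
            // lines at the top of myboot (in place of "capture drop res")
            // remove the variables the program actually creates
            capture drop merch_accep_cash_Hat
            capture drop merch_accep_cash_residual
            capture drop merch_accep_card_Hat
            capture drop merch_accep_card_residual
            ```

            This would also fit the fact that the second version, where the first stage runs once outside the program, does not fail.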


            • #7
              I don't think that is a "problem" but rather a feature of your data.
              Keep in mind that the bootstrap simply draws a sample with replacement. In the very worst case, I can imagine a bootstrap sample built from the same observation N times. Any estimator I know of would have problems obtaining estimates from such data.
              Back to your specific problem: it may simply be that you do not have enough variation, so that the bootstrap often leaves you with a sample where "observations completely determined" occurs.
              Unfortunately, I cannot say more without having access to the data.
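
              A rough way to see how thin the variation gets in resamples, assuming the two binary variables from the first post are in memory. The seed and the number of draws are arbitrary choices for illustration.

              ```stata
              set seed 12345
              forvalues r = 1/20 {
                  preserve
                  // draw one bootstrap sample with replacement, same size as the data
                  bsample
                  quietly count if merch_accep_cash_norm == 0
                  local n_cash = r(N)
                  quietly count if merch_accep_card_norm == 0
                  display "draw `r': cash zeros = `n_cash', card zeros = " r(N)
                  restore
              }
              ```

              Draws where either count is very small are the ones where the first-stage probit or the mlogit is most likely to run into completely determined observations.
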
              HTH
              Fernando


              • #8
                It's very kind of you to offer to check my data, and I appreciate it. I have attached my data and .do file. I don't know of any other way to handle this problem.
                Attached Files
