bootstrap doesn't work for handling endogeneity for mlogit

arma ayat

#1

30 Jan 2020, 11:00

Hi Dear Statalist,
I want to handle endogeneity in a multinomial logit (mlogit) model. I have two endogenous variables (each of them binary), and my sample has 4,807 observations. In the https://www.statalist.org/forums/for...ction-approach thread, Professor Wooldridge confirms that bootstrapping works well for obtaining reasonable standard errors for the multinomial logit model.
I implemented that code for my own data, but Stata returns the error "insufficient observations to compute bootstrap standard errors" followed by "no results will be saved". I have read many forum threads about this problem, but I can't understand why it happens or how to handle it. Please help me solve it; I'm a beginner in Stata.

global ylist method
global xlist Age hh_size i.work i.marital i.gender i.educ i.region i.hh_income i.spend_reduced /*
*/ i.importance_attribute_payment i.cc_hasbal i.cc_reward i.cc_ratio i.cc_revolver i.dc_free end_cash_bal log_Amount /*
*/ EaseCC EaseDC CostCC CostDC RecordCC RecordDC

// set up the program including 1st and 2nd stage
gen merch_accep_cash_norm = merch_accep_cash-1
gen merch_accep_card_norm = merch_accep_card-1

program define my2sls
    probit merch_accep_cash_norm $xlist
    predict merch_accep_cash_Hat , pr
    gen merch_accep_cash_residual = merch_accep_cash_norm - merch_accep_cash_Hat

    probit merch_accep_card_norm $xlist
    predict merch_accep_card_Hat , pr
    gen merch_accep_card_residual = merch_accep_card_norm - merch_accep_card_Hat

    mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
end

// obtain bootstrapped standard errors
*bootstrap "sim" _b _se, reps(400) dots
bootstrap , reps(50): my2sls
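A likely cause worth checking (an editor's sketch, not verified against the poster's data): `bootstrap` first runs `my2sls` once on the full sample, and the variables that run creates (`merch_accep_cash_Hat`, `merch_accep_cash_residual`, and so on) appear to stay in memory. In every replication, `predict` and `gen` would then stop with a "variable already defined" error, so no replication finishes and `bootstrap` reports "insufficient observations to compute bootstrap standard errors". Adding the `noisily` option, for example `bootstrap, reps(5) noisily: my2sls`, prints each replication's output, including any error. The sketch below uses Stata's bundled nlsw88 data; `union` and `married` are hypothetical stand-ins for the two binary endogenous variables, `south` and `smsa` are assumed to be excluded from the second stage, and the generated variable names are made up for illustration. It drops the generated variables at the top of the program and, in the control function style, also enters the endogenous variables themselves in the second stage.

```stata
* Illustration only: nlsw88 stands in for the poster's data.
* union and married play the two binary endogenous regressors (hypothetical);
* south and smsa enter only the first stage (assumed exclusions).
sysuse nlsw88, clear
keep if !missing(union, married, wage, age, grade, tenure, south, smsa)
egen wagecat = cut(wage), group(3)    // three wage groups coded 0, 1, 2

capture program drop my2sls
program define my2sls, eclass
    * Remove variables left over from the previous call; otherwise predict
    * and generate fail with "already defined" in every replication.
    foreach v in union_hat union_res married_hat married_res {
        capture drop `v'
    }

    * First stage for each endogenous binary variable
    probit union age grade tenure south smsa
    predict double union_hat, pr
    generate double union_res = union - union_hat

    probit married age grade tenure south smsa
    predict double married_hat, pr
    generate double married_res = married - married_hat

    * Second stage: endogenous variables plus their first-stage residuals
    mlogit wagecat age grade tenure union married union_res married_res, baseoutcome(0)
end

bootstrap, reps(50) seed(2020): my2sls
```

Because both first stages are re-estimated inside the program, each replication reflects the sampling variation of the probits as well as of the mlogit.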

Last edited by arma ayat; 30 Jan 2020, 11:03.
FernandoRios

#2

30 Jan 2020, 13:01

Hi Arma
I'm not sure where the problem comes from, and without a reproducible example it is difficult to track it down.
Let me share, however, one possible way to code what you have in mind.

Code:

webuse union3, clear
sum wage,d
recode wage (1/3=1) (3/5=2) (5/8=3) (8/100=4), gen(wage_c)
sum wage_c age grade smsa black tenure union south black tenure
keep if union!=.
mlogit wage_c age grade smsa black tenure union , base(1)

capture program drop myboot
program myboot, eclass
    probit union age grade smsa south black tenure
    **generalized residuals
    capture drop res
    predict res, score
    mlogit wage_c age grade smsa black tenure union res, base(1)
end
bootstrap: myboot

The main differences from your case are that there is only one endogenous variable (easy to amend) and that I'm using the generalized residual (inverse Mills ratio) rather than the residual you are using.
HTH
Fernando
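For reference, the two residuals compared in this thread can be written as follows for a probit first stage with estimated index $x_i'\hat\gamma$, where $\phi$ and $\Phi$ are the standard normal density and cdf (notation introduced here for illustration). The first is the residual computed in #1; the second is the generalized residual that `predict, score` returns after `probit`, as in the example above. For $y_i = 1$ the generalized residual equals the inverse Mills ratio $\phi/\Phi$.

```latex
\begin{aligned}
\hat r_i &= y_i - \Phi(x_i'\hat\gamma) && \text{(response residual)}\\
\tilde r_i &= y_i\,\frac{\phi(x_i'\hat\gamma)}{\Phi(x_i'\hat\gamma)}
  - (1-y_i)\,\frac{\phi(x_i'\hat\gamma)}{1-\Phi(x_i'\hat\gamma)} && \text{(generalized residual)}
\end{aligned}
```

The generalized residual is the derivative of the probit log likelihood $y_i\ln\Phi(x_i'\gamma) + (1-y_i)\ln[1-\Phi(x_i'\gamma)]$ with respect to the index, evaluated at $\hat\gamma$.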
Weiwen Ng

#3

31 Jan 2020, 10:01

In another post, you said that mlogit without bootstrapping is having some convergence trouble. As we discussed there, you have two binary variables where there are very few 0s. The problem of complete separation in regular logistic regression applies here.

I see that here you are treating these two problematic variables as endogenous, and you're fitting a probit model to each of them. I suspect the problem could be that in many bootstrap replications, one or both of the probit regressions don't converge. What happens when you fit the probit regressions individually, outside of the bootstrap? Do they converge, or do you have complete separation?

(PS, I'm not sure how probit handles complete separation. The logistic commands would drop combinations of the categorical variables where complete separation occurs. Either way, you don't want this to be happening in your bootstrap model.)
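One way to run these checks, sketched on Stata's bundled nlsw88 data because the poster's data are not available; `union` and `industry` are stand-ins for a binary variable with a rare outcome and a categorical covariate. When a covariate pattern predicts the outcome perfectly, `probit`, like `logit`, prints a note that the variable "predicts success/failure perfectly" and drops those observations, so comparing `e(N)` with the number of complete cases shows how many were lost.

```stata
* Illustration on nlsw88; replace union and industry with your own variables.
sysuse nlsw88, clear

* How rare is the less common outcome of the binary variable?
tabulate union

* Empty cells in this cross-tab flag categories where separation can occur
tabulate industry union

* Fit the first stage on its own, then compare the estimation sample
* with the number of complete cases
probit union age grade south i.industry
display "converged = " e(converged) ",  observations used = " e(N)
count if !missing(union, age, grade, south, industry)
```

If the observation count from `count` exceeds `e(N)`, some observations were dropped for perfect prediction on the full sample, and a bootstrap sample is even more likely to hit the same problem.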

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
arma ayat

#4

06 Feb 2020, 00:15

Last edited by arma ayat; 06 Feb 2020, 00:18.

arma ayat

06 Feb 2020, 03:33

Dear Ng, thank you for your attention to my problem.
I fitted the logistic regressions individually, outside the bootstrap. With those, the bootstrapped mlogit converged, but it reported 3 or 4 completely determined observations.

Code:

logistic merch_accep_cash_norm $xlist
predict merch_accep_cash_Hat , pr
predict merch_accep_cash_residual , residuals

logistic merch_accep_card_norm $xlist
predict merch_accep_card_Hat , pr
predict merch_accep_card_residual , residuals

// set up the program including 1st and 2nd stage
capture program drop my2sls
program define my2sls
    mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
end
bootstrap , reps(100) dots : my2sls

I ran three different combinations of the merch_accep_[cash/card]_Hat and merch_accep_[cash/card]_residual variables in the mlogit inside the bootstrap (shown below) and got different results. I am confused about which results are correct and trustworthy.

Code:

program define my2sls
    mlogit $ylist $xlist merch_accep_card_Hat merch_accep_card_residual merch_accep_cash_Hat merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
end

Multinomial logistic regression                 Number of obs     =      4,476
                                                Replications      =        100
                                                Wald chi2(99)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -3144.8055               Pseudo R2         =     0.3064

Note: 4 observations completely determined.  Standard errors questionable.

Code:

program define my2sls
    mlogit $ylist $xlist   merch_accep_card_Hat merch_accep_cash_Hat  , baseoutcome(1)
end

Multinomial logistic regression                 Number of obs     =      4,693
                                                Replications      =        100
                                                Wald chi2(99)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -3493.8464               Pseudo R2         =     0.2685

This one doesn't report any "observations completely determined".

Code:

program define my2sls
    mlogit $ylist $xlist merch_accep_card_residual merch_accep_cash_residual [pw=ind_weight] , baseoutcome(1)
end

Multinomial logistic regression                 Number of obs     =      4,476
                                                Replications      =        100
                                                Wald chi2(99)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -3152.3648               Pseudo R2         =     0.3047

Note: 3 observations completely determined.  Standard errors questionable.

In my opinion, the third case is correct: by including merch_accep_[cash/card]_residual, the unobserved part of merch_accep_cash/card enters the mlogit, so the model captures the net effect of merch_accep_cash/card.
I think including merch_accep_[cash/card]_Hat in the mlogit causes collinearity with the other variables, because merch_accep_[cash/card]_Hat is a fitted value from those same variables.
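Two details are worth checking before comparing the three specifications (an editor's note, based only on the code shown). First, `predict ..., residuals` after `logistic` returns Pearson residuals, not $y - \hat p$, so the `_residual` variables created with `logistic` differ from the $y - \Phi$ residuals built in #1. Second, none of the three `mlogit` calls includes `merch_accep_cash_norm` or `merch_accep_card_norm` themselves (they are not in `$xlist` as defined in #1), whereas the second stage in the control function example in #2 includes the endogenous variable (`union`) together with its residual. The sketch below, on Stata's bundled nlsw88 data with `union` as a hypothetical stand-in, shows the residual difference and a second stage in that form.

```stata
* Illustration on nlsw88 (union stands in for an endogenous binary variable)
sysuse nlsw88, clear
keep if !missing(union, wage, age, grade, south)
egen wagecat = cut(wage), group(3)               // outcome groups 0, 1, 2

logistic union age grade south
predict double p_hat, pr
predict double r_pearson, residuals              // Pearson residual
generate double r_response = union - p_hat       // y minus predicted probability
summarize r_pearson r_response

* Control-function-style second stage: endogenous variable plus its residual.
* Standard errors from this single fit ignore the first stage; bootstrap
* both steps together to account for it.
mlogit wagecat age grade union r_response, baseoutcome(0)
```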

Last edited by arma ayat; 06 Feb 2020, 03:41.

arma ayat

06 Feb 2020, 03:50

Dear Rios,
I adapted your code to my data, but I got the same error.

Code:

capture program drop myboot
program myboot, eclass
    probit merch_accep_cash_norm $xlist
    **generalized residuals
    capture drop res
    predict merch_accep_cash_Hat, pr
    predict merch_accep_cash_residual, score

    probit merch_accep_card_norm $xlist
    **generalized residuals
    capture drop res
    predict merch_accep_card_Hat, pr
    predict merch_accep_card_residual, score

    mlogit $ylist $xlist merch_accep_cash_residual merch_accep_card_residual [pw=ind_weight] , baseoutcome(1)
end
bootstrap: myboot

When I fit the probits outside the bootstrap (below), the bootstrapped mlogit converged and did not report any "observations completely determined". So I have no idea why this happens.

Code:

probit merch_accep_cash_norm $xlist
**generalized residuals
capture drop res
predict merch_accep_cash_Hat, pr
predict merch_accep_cash_residual, score

probit merch_accep_card_norm $xlist
**generalized residuals
capture drop res
predict merch_accep_card_Hat, pr
predict merch_accep_card_residual, score
    
capture program drop myboot
program myboot, eclass
    mlogit $ylist $xlist merch_accep_cash_residual merch_accep_card_residual [pw=ind_weight] , baseoutcome(1)
end
bootstrap: myboot
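Two observations on the code above, hedged because the data are not available. In the first program, `capture drop res` removes a variable named `res`, but the predictions are stored in `merch_accep_cash_Hat`, `merch_accep_cash_residual`, and so on; if those are created in bootstrap's initial full-sample run, `predict` likely fails in every replication, which would produce the same "insufficient observations" message. In the second version, the residuals are computed once and only `mlogit` is resampled, so nothing is re-created, but the bootstrap then treats the residuals as fixed data and ignores the sampling variation of the first stage, which is what bootstrapping both stages is meant to capture. The sketch below runs both versions on Stata's bundled nlsw88 data (`union` as a hypothetical endogenous variable, `south` as an assumed exclusion) so the standard errors can be compared.

```stata
* Illustration on nlsw88: union stands in for an endogenous binary variable,
* south is the assumed excluded first-stage variable.
sysuse nlsw88, clear
keep if !missing(union, wage, age, grade, tenure, south)
egen wagecat = cut(wage), group(3)

* (a) First stage once, outside the bootstrap; only mlogit is resampled
probit union age grade tenure south
predict double gres, score                       // generalized residual
bootstrap, reps(50) seed(2020): mlogit wagecat age grade tenure union gres, baseoutcome(0)
estimates store second_only

* (b) Both stages inside the program, re-estimated in every replication
capture program drop cf_both
program define cf_both, eclass
    capture drop gres                            // same name as predicted below
    probit union age grade tenure south
    predict double gres, score
    mlogit wagecat age grade tenure union gres, baseoutcome(0)
end
bootstrap, reps(50) seed(2020): cf_both
estimates store both_stages

* Same point estimates; compare the bootstrap standard errors
estimates table second_only both_stages, se
```

The key change relative to the first program above is that the variable dropped at the top of the program has the same name as the one `predict` creates.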

FernandoRios

#7

06 Feb 2020, 06:50

I don't think that is a "problem" but rather a feature of your data.
Keep in mind that bootstrap simply draws a sample with replacement. In the very worst case, a bootstrap sample could consist of the same observation drawn N times, and any estimator I know of would have problems producing estimates from such data.
Back to your specific problem: it may simply be that your data do not have enough variation, so that many bootstrap samples end up with "observations completely determined".
Unfortunately, I cannot say more without access to the data.
HTH
Fernando
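The point about insufficient variation can be made concrete with a short calculation (a sketch; $k$ is a hypothetical count of rare values, not taken from the poster's data). If a binary variable takes its rare value in only $k$ of $N$ observations, a bootstrap sample of size $N$ drawn with replacement contains none of them with probability

```latex
\Pr(\text{no rare value drawn}) = \left(1 - \frac{k}{N}\right)^{N} \approx e^{-k},
\qquad \text{e.g. } N = 4807,\ k = 3:\ \left(1 - \tfrac{3}{4807}\right)^{4807} \approx e^{-3} \approx 0.05 .
```

With 400 replications, about 20 would then have no variation at all in that variable, and many more would contain only one or two of the rare observations, which is where perfect prediction and "completely determined" observations tend to arise.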
arma ayat

#8

06 Feb 2020, 11:46

It would be very kind of you to check my data; I appreciate it. I have attached my data and .do file. I have run out of ideas for handling this problem.
Attached Files

test.do (4.3 KB)

stata_Jan2020_outAdopt.xlsx (1.64 MB)