Dear Stata-ers,
I am currently trying to run an endogenous switching regression (ESR) using the movestay and mspredict commands in Stata 16 on Windows 10. My approach is based on Lokshin and Sajaia (2004).
Paper link: https://www.stata-journal.com/sjpdf....iclenum=st0071
My research examines the effect of adopting improved storage technologies on per capita household consumption in Tanzania, using about 700 observations and about 20 explanatory variables describing household characteristics. Here is a dataex extract of my data for anyone who wants to try it out. It contains fewer variables than the original, but the problem still replicates. Sorry for the length; it was needed to reproduce the problem.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float treat double land_own float(hhsize distance_inputs assets_PC rain) byte sex float age byte education float(member_savings govn_trust dst_groups pc_totcost)
1 3.5 1 24 40000 2 0 61 9 0 1 8 146000
1 7 5 45 60000 1 1 73 7 0 0 5 148800
1 8 4 18 1250000 1 1 34 7 0 1 0 204750
1 6.5 2 36 100000 3 1 75 0 0 1 24 126000
1 4 3 39 43333.33 5 1 73 0 1 0 0 170666.67
0 2.5 4 30 1250000 4 1 26 7 0 1 13 72000
1 3 10 24 1000 2 1 60 0 0 0 16 44750
1 8 7 24 12142.857 2 1 51 7 0 0 16 78857.14
1 5 2 39 450000 4 1 60 0 0 0 14 378000
1 7 3 10.2 2666.667 4 1 80 0 0 1 12 34000
1 28 9 10.5 222222.2 4 1 57 7 0 0 8.5 19733.334
1 26 8 36 5000000 3 1 46 7 1 0 24 185000
1 13.5 10 39 130000 5 1 46 0 0 0 0 40000
1 2 3 39 1333.3334 4 0 90 0 0 0 0 86666.66
0 13 5 45 96000 4 1 62 4 0 0 1 74400
0 11 8 30 437500 4 1 47 7 0 1 14 41750
0 8.5 5 48 240000 1 1 45 0 0 1 0 33480
1 12.5 8 6 187500 2 1 68 4 1 0 0 20437.5
0 6.5 8 156 31250 4 1 60 0 0 1 52 45000
0 5 4 156 43750 4 1 44 7 0 1 52 52000
0 13.25 9 90 27777.78 4 1 80 0 0 0 2.5 65777.78
1 4 4 4.5 62500 1 1 32 7 0 1 0 65500
1 18.5 8 156 4750000 5 0 56 12 0 0 104 108000
1 13 8 156 26250 1 1 65 0 1 0 0 535600
1 5 9 4 13333.333 5 1 33 7 0 0 0 14222.223
0 2.5 4 4.5 50000 2 1 29 7 0 0 0 34500
1 14 9 120 333333.3 1 1 54 4 1 0 0 108311.1
1 13 13 4.5 461.53845 4 1 47 7 1 0 0 10153.846
1 4 6 36 233333.33 3 1 36 7 1 0 8 44000
1 9 9 111 333333.3 2 1 50 7 0 0 0 36444.445
1 8 11 93 2727.273 2 1 43 7 0 0 62 20272.727
1 6 7 93 10000 2 1 31 4 0 0 62 36285.715
1 10.5 14 93 10714.286 1 1 51 7 0 1 0 80857.14
0 11 2 54 10000 2 1 76 0 1 0 36 36000
1 1.5 8 93 62500 1 1 38 0 0 1 0 23250
1 13.5 13 69 1923.077 2 1 63 4 1 0 46 16830.77
1 11.5 6 99 5833.333 3 1 67 4 1 0 0 797333.3
0 4 4 93 112500 4 1 34 7 0 1 36 35800
1 8 8 93 112500 1 1 65 0 0 0 0 11375
0 8 7 93 10714.286 1 1 60 0 0 1 0 66857.14
1 5.5 6 81 16666.666 2 1 36 2 0 0 54 60666.67
1 54 11 81 5454546 2 1 68 3 0 0 54 44727.27
0 16 9 63 33333.332 1 1 44 7 1 0 0 108444.45
0 5 6 63 100000 1 1 51 7 1 0 0 51200
1 8 10 84 700000 3 1 63 0 1 0 6 70200
0 3.5 8 93 37500 4 1 40 7 0 1 32 55000
0 13 11 54 18181.818 4 1 43 7 0 1 19.5 46909.09
0 8.5 4 96 125000 1 1 60 4 0 0 0 84000
0 2 5 120 60000 1 1 29 0 0 0 0 68000
0 3 5 126 16000 1 1 26 7 0 0 0 36960
0 0 5 105 300 1 0 31 0 0 0 0 57200
0 16 6 111 50000 4 1 40 7 0 0 0 24666.666
1 2.5 3 105 50000 3 0 55 7 0 0 70 154666.67
1 3.5 8 96 25000 2 1 45 7 0 1 0 45000
1 6 9 111 22222.22 1 1 58 4 0 0 74 23333.334
1 9 3 96 266666.66 2 1 80 4 0 0 64 109333.34
1 5 8 105 25000 3 1 38 7 0 0 37 108000
1 6 4 18 125000 1 1 87 3 1 1 0 91125
0 12 10 111 15000 2 1 44 7 0 0 0 20220
0 5 12 111 8333.333 1 1 78 0 0 1 0 17766.666
1 5 11 63 27272.727 1 1 42 7 0 0 42 51490.91
1 10 6 15 16666667 2 1 58 7 1 0 10 197333.33
0 7 2 18 250000 2 1 69 9 0 1 12 513600
1 20 5 15 160000 1 1 57 10 1 1 10 156000
0 7 7 21 2714286 2 1 53 4 0 0 0 550571.44
1 6 9 81 16666.666 1 1 47 7 1 1 29 169777.8
0 5 7 18 314285.7 1 1 43 7 0 0 0 48000
1 4.5 6 9 13333.333 5 1 46 7 0 1 4 58666.67
1 21 5 9 40000 3 1 46 7 1 1 6 57600
0 6 7 30 80000 2 0 51 7 0 1 10 20571.43
1 2.5 4 21 2250000 1 0 70 0 0 0 . 144000
0 0 2 53 3500000 1 1 53 7 1 0 . 338500
0 1.5 5 15 1200000 5 1 37 7 1 0 . 81600
1 14 4 21 125000 1 1 39 7 0 1 . 104000
0 0 4 60 37500 2 1 29 7 0 0 . 230500
end
Overall, the model seems to work fine.
I run:
Code:
movestay pc_totcost land_own hhsize distance_inputs assets_PC ///
    rain sex age education member_savings govn_trust, select(treat=dst_groups) iterate(50)
Code:
mspredict yc1_2, yc1_2
sum yc1_2
However, when I attempt to estimate the average treatment effect of adoption on the untreated (ATU), the estimated counterfactual (i.e., predicted consumption for non-adopter households had they chosen to adopt, stored as yc1_2) is consistently negative.
As such, my model seems to indicate that if non-adopter households were to adopt, their expected consumption would be negative.
I have been trying to understand how my model could predict negative consumption, which intuitively seems impossible.
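For reference, these are the conditional expectations from Lokshin and Sajaia (2004) behind the counterfactual. The notation assumes the selection equation is I_i = 1 if γZ_i + u_i > 0, and that regime 1 is adoption (treat == 1), which matches how yc1_2 is read in this post. φ and Φ are the standard normal density and CDF, and ρ_1 and ρ_2 are the correlations of the regime errors with u. yc1_2 corresponds to the first line:

```latex
\begin{aligned}
E[y_{1i} \mid I_i = 0, x_{1i}] &= x_{1i}\beta_1 - \sigma_1\rho_1 \frac{\phi(\gamma Z_i)}{1 - \Phi(\gamma Z_i)} \\
E[y_{2i} \mid I_i = 0, x_{2i}] &= x_{2i}\beta_2 - \sigma_2\rho_2 \frac{\phi(\gamma Z_i)}{1 - \Phi(\gamma Z_i)} \\
\mathrm{ATU}_i &= E[y_{1i} \mid I_i = 0, x_{1i}] - E[y_{2i} \mid I_i = 0, x_{2i}]
\end{aligned}
```

Nothing in these expressions bounds the prediction at zero. The linear index x_{1i}β_1 is unrestricted, and when ρ_1 > 0 the selection term is subtracted from it.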
I have tried to address this by selectively dropping variables with negative coefficients, to no avail. In any case, I dislike cherry-picking which variables to discard, so I would like to find a sound solution.
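The correction term is present regardless of the signs of the individual slope coefficients. A minimal Stata sketch that evaluates the yc1_2 formula by hand for a single hypothetical household makes this concrete. Every value in it (xb1, sigma1, rho1, gz) is made up for illustration and does not come from the model in this post. It uses only the built-in normal() and normalden() functions and is meant to be run in a fresh session.

```stata
* Hypothetical illustration of the Lokshin and Sajaia (2004) counterfactual
*   E[y1 | I = 0] = x1*b1 - sigma1*rho1*normalden(gZ)/(1 - normal(gZ))
* All values below are assumptions, not estimates from the model above.
scalar xb1    = 20000   // linear index x1*b1, positive here
scalar sigma1 = 80000   // standard deviation of the regime-1 error
scalar rho1   = 0.9     // correlation of the regime-1 error with the selection error
scalar gz     = 0       // selection index gamma*Z for one household

* Inverse Mills ratio for a household observed in regime 2 (I = 0)
scalar mills0 = normalden(gz) / (1 - normal(gz))

* Counterfactual expected outcome under regime 1
scalar yc1_2_demo = xb1 - sigma1 * rho1 * mills0

display "selection correction:     " %12.2f sigma1 * rho1 * mills0
display "counterfactual E[y1|I=0]: " %12.2f yc1_2_demo
```

With these made-up values, the correction (about 57,448) exceeds the positive index (20,000), so the displayed counterfactual is about -37,448.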
I have also attached the ESR model output and the summary of the counterfactual; hopefully they help.
I am unsure how to proceed, and I do not think such results can be reported, since they seem impossible.
Any advice on what steps to take, explanations of how this result arises, or even help interpreting it (perhaps negative consumption makes sense?!) would be immensely appreciated.
Kind regards,
Joachim