  • Bootstrap Std. Errors for ATT, ATU, and other scalars after movestay; Endogenous Switching Regression in Stata 14

    Hi all,
    I am new to this forum, and have searched through previous posts to see if my question had been addressed before, but so far I have found nothing.

    Can anyone please help with the following? It is a bit long:

    1). I need to estimate bootstrapped std. errors for the estimated ATT (average treatment effect on the treated), ATU (average treatment effect on the untreated), etc.
    I ran these post-estimation commands after movestay

    . mspredict yc11, yc1_1 /*(predicted outcome of adopters should they have adopted...)*/

    . mspredict yc00, yc2_1 /*(predicted outcome of non-adopters if they had not adopted or participated....)*/

    . mspredict yc10, yc1_2 /*(predicted outcome of non-adopters had they adopted or participated......)*/

    . mspredict yc01, yc2_2 /*(predicted outcome of adopters had they not adopted or participated.... */

    Here are example outputs below:
    mean yc11 = x1
    mean yc01 = x2
    mean yc10 = x3
    mean yc00 = x4

    . scalar myATT = y11-y01 = ##, say 4

    . scalar myATU = y10-y00 = ##, say 2.5

    . scalar myBH1 = y11-y10 = ##, say 1.5

    . scalar myBH0 = y01-y00 = ##, say 1.9

    . scalar myTH = myATT - myATU = ##, say 0.8

    replace yc11=exp(yc11)
    (172 real changes made)

    . replace yc01=exp(yc01)
    (636 real changes made)

    . replace yc10=exp(yc10)
    (636 real changes made)

    . replace yc00=exp(yc00)
    (172 real changes made)

    I obtained the values above manually. Hence I need the standard errors for these differences.
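
    On question 1), one way to get bootstrapped standard errors without leaving Stata is to wrap the whole estimation in an rclass program and pass it to -bootstrap-. This is only a sketch, not tested code: depvar, xvars, treat, and zvars are placeholders for the actual movestay outcome, regressors, treatment indicator, and selection instruments, and the mspredict options follow the yc1_1/yc2_2 logic from the post.

    Code:
    capture program drop att_boot
    program define att_boot, rclass
        // Placeholders: replace depvar, xvars, treat, zvars with the
        // actual variables from your movestay specification.
        movestay (depvar = xvars), select(treat = xvars zvars)
        mspredict b11, yc1_1    // adopters' predicted outcome if adopting
        mspredict b01, yc2_2    // adopters' predicted outcome had they not adopted
        summarize b11 if treat == 1, meanonly
        local m11 = r(mean)
        summarize b01 if treat == 1, meanonly
        local m01 = r(mean)
        return scalar att = `m11' - `m01'
        drop b11 b01
    end

    // Note: movestay may fail to converge in some replications.
    bootstrap att = r(att), reps(500) seed(2718): att_boot

    The same pattern extends to ATU and the other scalars by returning additional r() results from the program.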

    I then used -putexcel- to transfer them to Excel so I could format the calculated scalars with their standard errors:

    2). Hence, my second question is:
    How do I effectively transfer these scalars from movestay's post-estimation results to Excel using putexcel? I tried the putexcel commands below in Stata 14, but the scalars I computed after movestay did not make it into the Excel file. I wanted them all in Excel for efficient formatting.

    Here is the code I ran for putexcel:

    . putexcel set EstimateLogYield, modify

    . putexcel A1=("myATT") B1=("myATU") C1=("myBH1") D1=("myBH0") E1=("myTH") F1=("y11") G1=("y01") H1=("y10") I1=("y00")

    file EstimateLogYield.xlsx saved
    . putexcel A2=(scalar(myATT))

    file EstimateLogYield.xlsx saved

    . putexcel B2=(scalar(myATU))
    file EstimateLogYield.xlsx saved

    ...and so on for the remaining scalars.
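
    On question 2), the separate putexcel calls above can be consolidated into one call per row, which is tidier than rewriting the file repeatedly. A sketch, assuming the scalars already exist; note also that scalars are displayed with -scalar list-, not -matrix list-:

    Code:
    putexcel set EstimateLogYield.xlsx, modify
    putexcel A1=("myATT") B1=("myATU") C1=("myBH1") D1=("myBH0") E1=("myTH")
    putexcel A2=(scalar(myATT)) B2=(scalar(myATU)) C2=(scalar(myBH1)) ///
        D2=(scalar(myBH0)) E2=(scalar(myTH))
    // Scalars are not matrices, so list them with:
    scalar list myATT myATU myBH1 myBH0 myTH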

    . ereturn list

    scalars:
    e(df_r) = 171
    e(N_over) = 1
    e(N) = 172
    e(k_eq) = 1
    e(rank) = 1

    macros:
    e(cmdline) : "mean yc00"
    e(cmd) : "mean"
    e(vce) : "analytic"
    e(title) : "Mean estimation"
    e(estat_cmd) : "estat_vce_only"
    e(varlist) : "yc00"
    e(marginsnotok) : "_ALL"
    e(properties) : "b V"
    e(depvar) : "Mean"

    matrices:
    e(b) : 1 x 1
    e(V) : 1 x 1
    e(_N) : 1 x 1
    e(error) : 1 x 1

    functions:
    e(sample)


    Finally, I ran

    . matrix list myATT myATU myBH1 myBH0 myTH

    Stata says:

    matrix myATT not found
    r(111);

    end of do-file

    r(111);

    Then I ran matrix list y11 y01 y10 y00
    And Stata says:
    matrix y11 not found
    r(111);


    Thanks much.
    Festus Amadu

  • #2
    Well, before even approaching the question of what is going on with -putexcel-, the premise that you need to move numbers from Stata to Excel in order to calculate standard errors is just wrongheaded. In fact, you should never use Excel for analysis. Excel is fine for displaying things intended for human eyes to read in various layouts that Stata can't readily create. It is also fine for sharing data sets among people who use different statistical packages. But calculations carried out in Excel leave no audit trail; there is no way to verify what has been done or whether it's correct.

    So if the goal is to get standard errors for those yc variables it's just:

    Code:
    tabstat yc*, statistics(semean)



    • #3
      Thanks a lot. Indeed it worked out. In fact, it worked just from running -sum tt-.

      Now how do I obtain the p-values?
      I tried the following commands:

      tabstat tt*, statistics(p-value)
      tabstat tt*, statistics(pvalue)
      tabstat tt*, statistics(p)

      They all didn't work.

      Stata reports:
      unknown statistic: p-value
      r(198);

      unknown statistic: pvalue
      r(198);

      unknown statistic: p




      • #4
        I also tried:
        tabstat tt*, statistics (2*ttail(df, abs(t)))
        But Stata reports:
        unknown statistic: 2*ttail(df,
        r(198);



        • #5
          No, -tabstat- doesn't do that. It just provides simple summary statistics on variables. But what do you mean by getting the p-values here? What p-values? Testing what hypothesis?



          • #6
            The hypothesis is that the yc variables are significantly different from zero. These ycs are program impact statistics. The null hypothesis is that they are zero. So we need p-values to help decide whether we can reject the null or not.



            • #7
              OK. In that case, you really didn't need to calculate the standard errors separately.

              Code:
              foreach v of varlist tt* {
                  display _newline(3) "`v'"
                  ttest `v' = 0
              }
              If you want the two-tailed t-test, it is the one you will find in the middle of the last couple of lines of the output from -ttest-.

              I will not go on my long rant about testing program effects for statistical significance, but I will give you the short version. In the real world, there is almost no such thing as a program that has zero effect. Unless you are truly in the rare situation where a zero effect of a program is plausible, rejecting the null hypothesis that the effect is zero tells you nothing you didn't know before you even bothered to gather the data. And if you get a non-significant result then you just know that your data was not precise enough to actually distinguish the effect from zero, but you have no support for the hypothesis that the effect actually is zero.

              More important, if you are a decision maker deciding whether to actually implement one of these programs in a real world setting, knowing that its effect was "statistically significant" does not help you. Even knowing with high confidence that the effect is non-zero doesn't help you. The program has certain costs, and will be worth those costs only if its effectiveness is sufficiently large. So what you really need to know is how effective it was, quantitatively, and how confident are we that it was that effective. That information can never be extracted from a p-value. That information lies in the regression coefficient itself and its 95% confidence interval.
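
              For completeness, the p-value #3 asked about can also be computed by hand from -summarize-'s returned results. A sketch; it is only meaningful when r(sd) > 0:

              Code:
              quietly summarize tt
              local t = r(mean) / (r(sd) / sqrt(r(N)))     // t = mean / std. error of the mean
              display "t = " `t'
              display "two-sided p = " 2*ttail(r(N) - 1, abs(`t'))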



              • #8
                OK, thanks Clyde.
                I didn't know I needed to run a test for that. In that case, here is what I ran:

                ttest tt, by(csa_adopt5)

                Two-sample t test with equal variances

                   Group |   Obs       Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
                ---------+---------------------------------------------------------------
                       0 |   636   2.604751          0           0    2.604751   2.604751
                       1 |   172   2.604751          0           0    2.604751   2.604751
                ---------+---------------------------------------------------------------
                combined |   808   2.604751          0           0    2.604751   2.604751


                The loop didn't work for me. I tried:

                foreach mean of tt ATU BH1 BH0 TH tt* {
                    display _newline(3) "`v'"
                    ttest `v' = 0
                }

                Stata says:
                invalid syntax
                r(198);



                I also tried:

                tabstat tt ATU BH1 BH0 TH, statistics(mean sd)

                and got the following:

                   stats |        tt       ATU       BH1       BH0        TH
                ---------+--------------------------------------------------
                    mean |  2.604751 -11.60068 -.6384136 -14.84384  14.20543
                      sd |         0         0         0         0         0
                ------------------------------------------------------------


                I think these values reflect the fact that my coefficients are already differences between two sets of variables (e.g., gen ATU = y10-y00, from the analyses above). Thus the difference based on the overall dependent variable doesn't matter here. Rather, we probably need a way to assess the validity of each coefficient as a "stand-alone variable".

                Hence, for now, I think what I really need is how to interpret the Std. Dev. of the coefficients in the absence of an estimated p-value.
                E.g., what does 2.604751 with an SD of 0 mean?

                I want to know if I can actually get precise p-values such as 0.000, 0.003, 0.029, 0.032, etc. That way, I could manually say that the coefficient is significant at 1%, 5%, or 10%, respectively.

                Can you kindly show me the relationship between the standard deviation and the p-value in terms of coefficient estimates?

                Thanks!



                • #9
                  foreach mean of tt ATU BH1 BH0 TH tt* {
                      display _newline(3) "`v'"
                      ttest `v' = 0
                  }
                  That is not the code I suggested. Look back at the code block in #7 and follow it carefully. Adding ATU BH1 BH0 TH and tt (if there actually are variables by those names) is fine, but the word mean does not belong there: it should be v, and you left out the keyword varlist. Details in coding are important. It seems you are rushing around trying haphazard pieces of code without understanding what they are or how they work. Making careless, inaccurate modifications to working code breaks it.

                  So let's stop that as you will only be wasting your own time and that of others. Sit down and familiarize yourself with the basics of Stata data management and analysis by reading the Getting Started [GS] and User's Guide [U] volumes of the PDF manuals that come with your Stata installation. While you won't be able to remember all of the detail, you will gain a sense of what the commonly used commands are and how they look generally. You will also get a sense of how Stata approaches data management and analysis. Then, when confronting problems, you will be able to think of appropriate commands and then refresh yourself on the details by using the help files or the PDF manuals.

                  Hence, for now, I think what I really need is how to interpret the Std. Dev. of the coefficients in the absence of an estimated p-value.
                  E.g., what does 2.604751 with an SD of 0 mean?
                  When the standard deviation of a variable is zero, that means the "variable" is, in fact, a constant: it takes on the same value, 2.604751, in every observation (except any where it may be missing).
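
                  A quick toy example illustrates the point (the values here are just the ones from this thread):

                  Code:
                  clear
                  set obs 808
                  generate tt = 2.604751
                  summarize tt      // Std. Dev. = 0: every observation equals the mean
                  ttest tt = 0      // t and its p-values come out as missing (.)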



                  • #10
                    Thanks Clyde. Yes, it works now. My bad.

                    foreach v of varlist tt* {
                        display _newline(3) "`v'"
                        ttest `v' = 0
                    }


                    tt

                    One-sample t test

                    Variable |   Obs       Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
                    ---------+-------------------------------------------------------------
                          tt |   808   2.604751          0           0    2.604751   2.604751

                        mean = mean(tt)                                       t =       .
                    Ho: mean = 0                           degrees of freedom =     807

                       Ha: mean < 0           Ha: mean != 0           Ha: mean > 0
                     Pr(T < t) = .         Pr(|T| > |t|) = .        Pr(T > t) = .


                    I will definitely review the PDFs again.

                    But in this case, what could possibly be the reason the p-value comes out as Pr(T > t) = . ?



                    • #11
                      Well, looking at the summary statistics for tt, all 808 observations have the same value, 2.604751. The standard deviation is zero. So the standard error is zero, and the t-statistic is undefined. Something is wrong with your "variable" tt: it is just a constant. So you need to go back and see how you created that variable in the first place and, if it isn't actually supposed to be a constant, fix that problem. If it is supposed to be a constant, then it is meaningless to talk about its standard error or a test of whether or not it is zero.



                      • #12
                        Thank you. Got it.
