Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bootstrap placebo test

    Hi everyone,

    I am working with city-quarter level panel data. I have results with two-way fixed effects. Now I am trying to do a bootstrap placebo test. The main idea is that I randomly replace city i's observation delta sentiment_it with non-i cities delta sentiment_jt and treat it as the delta sentiment for city i. After replacing the sentiment for all cities, I use the placebo delta sentiment with the original controls and run the regression. Repeat this 1000 times. Then I get the summary statistics of 1000 estimated coefficients of placebo delta s stored. Consider the 5% level of significance, I should obtain the 5th and 95th percentiles of these 1000 values. If the value of 0 is between the 5th and 95th percentiles, then there is no effect of the placebo delta sentiment. That’s what I want to see.

    I have the code as following, which works without bootstrapping:

    Code:
    clear all
    use "H:\Documents\new.dta"
    cap prog drop sentiment
    prog define sentiment, eclass
    set tracedepth 1
    set trace on
    tempfile file1 file2 file3 file4
    tempvar random random1 similarity minsimilarity rsent
    //1.    Construct an original dataset with 35 cities, then generate a random number for each city in the original dataset . then rename id as oid, random as random1, then save this as a new original dataset;
    gen `random' = runiform()
    preserve
    rename id oid
    rename `random' `random1'
    save `file1'
    restore
    //2.    Go back to the original dataset.
    //3.    then "cross" the above 2 datasets (the new original dataset in step 1 and the new dataset in step 2), then delete those observations where oid from the first dataset = id from the second dataset. I will have oid!=id for all other cities, except the original city.
    cross using `file1'
    drop if oid==id
    //4.    For the remaining observations, for each oid, keep the othe city with the closest random number to that of original city that I would like to get a placebo.
    by oid, sort: gen `similarity'=abs(`random'-`random1')
    bysort oid (`similarity'): egen `minsimilarity'=min(`similarity')
    bysort oid (`similarity'): keep if `similarity'==`minsimilarity'
    save `file2'
    //5.    Now in my full dataset with all variables(including controls) , only keep sentiment variable and rename it to another name, say "rename dsentiment rsent".
    use "H:\Documents\allc_quarterly20220801"
    preserve
    keep id date dsentiment
    rename dsentiment`rsent'
    //6.    Merge the dataset from step 4 with the dataset from step 5 using id, save it as another dataset.
    merge m:m id using `file2'
    drop _merge
    save `file3'
    //7.    Go back to my full dataset, generate oid = id, merge with the dataset from step 6 using oid. Now I can run regression, and replace  sentiment by rsent, also  use oid as the city identifier
    restore
    gen oid=id
    merge m:m oid using `file3'
    keep if _merge==3
    xtset oid date
    quietly reghdfe dprice L.`rsent' L2.`rsent'  L_r_mr Ld_lrinc_urb Ld_lpopulation Ld_unemployment, absorb(i.`oid'#i.date) cluster(`oid' date)
    matrix `bols'=e(b)
    ereturn post `bols'
    save `file4'
    append using `file4'
    set trace off
    end
    
    //This step shows an error: insufficient observations to compute bootstrap standard errors. no results will be saved r(2000);
    bootstrap _b, reps(2) seed(101) nodrop: sentiment

    The code works perfect if I just run it one time without bootstrap. such as:
    Code:
    clear all
    use "H:\Documents\new.dta"
    cap prog drop sentiment
    prog define sentiment, eclass
    tempfile file1 file2 file3 file4
    tempvar random random1 similarity minsimilarity rsent
    gen `random' = runiform()
    preserve
    rename id oid
    rename `random' `random1'
    save `file1'
    restore
    cross using `file1'
    drop if oid==id
    by oid, sort: gen `similarity'=abs(`random'-`random1')
    bysort oid (`similarity'): egen `minsimilarity'=min(`similarity')
    bysort oid (`similarity'): keep if `similarity'==`minsimilarity'
    save `file2'
    use "H:\Documents\allc_quarterly20220801"
    preserve
    keep id date dsentiment
    rename dsentiment`rsent'
    merge m:m id using `file2'
    drop _merge
    save `file3'
    restore
    gen oid=id
    merge m:m oid using `file3'
    keep if _merge==3
    xtset oid date
    reghdfe dprice L.`rsent' L2.`rsent'  L_r_mr Ld_lrinc_urb Ld_lpopulation Ld_unemployment, absorb(i.`oid'#i.date) cluster(`oid' date)
    Can anyone please please kindly help me with the problem? I have been stuck here for more than one month. I will be more than grateful for your kindness and help!!!

    Thanks so so much!

    Here is the example data from "H:\Documents\new.dta":
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 city byte id
    "AKL"       1
    "WL"     2
    "CST"     2
    end


    Here is the example data from "H:\Documents\allc_quarterly20220801":
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int date byte id float(dprice dsentiment L_r_mr Ld_lrinc_urb Ld_lpopulation Ld_unemployment)
    200 1            .            .          .            .            .          .
    201 1            .   -.01131916          .            .            .          .
    202 1            .    .01322651          .  .0013923645            0          0
    203 1            .  -.005027771  -.7068357  -.005660057            0          0
    204 1            .   .008232117 -1.5204308   -.02193737            0          0
    205 1 -.0010962486   -.01004839   .2272103     .0995407   .028490543          0
    206 1  -.009660244  -.002628803   2.191338  -.005281448            0          0
    207 1 -.0088682175    .00585556   -.320951  -.010944366            0          0
    208 1   -.02238989    .01983404   .5980763  -.005010605            0          0
    209 1  -.001183033  -.012412548 -1.1360645    .08664703     .0248065  .20000005
    210 1   .006227016   .005579948   1.959168   .004014969            0          0
    211 1   .006478786   -.00948429  -.9121699   -.00091362            0          0
    212 1    .03963375  .0010023117 -.54806757  -.006374359            0          0
    213 1    .06563187   -.00825119  -1.366665    .08054638   .021749496   .0999999
    214 1    .03874493   .019511223  2.4422364  .0043697357            0          0
    215 1   .017433167 -.0031757355   -.936605  -.004992485            0          0
    216 1   .001876831  .0005965233  -.3265672  -.008236885            0          0
    217 1   .011615276 -.0011930466  -.5136318    .07195091   .017251492        -.8
    218 1  -.026113987  -.008791447  1.7759385  .0043621063            0          0
    219 1  -.029592514  .0078959465   -.802362 -.0036649704            0          0
    220 1  -.014689922  .0013928413 -.12866354  -.003651619            0          0
    221 1    .02667427   .002085209    -1.0516    .17486763   .008746147  .10000002
    222 1   .035609245   .008200645  1.0021987   .002752304            0          0
    223 1     .0271945  -.009391785 -1.2874904  -.006010056            0          0
    224 1    .02646208   -.00527811  .16857195 -.0009403229            0          0
    225 1    .08521032 -.0011987686  -1.764393    .06260681  .0011048317          0
    226 1     .0784669    .00935316  2.0893815  .0031137466            0          0
    227 1    .04408312  -.012757778  -.5805197  -.002693176            0          0
    228 1  -.009322166    .01631689 -.25982094  -.005281448            0          0
    229 1    .00605917  -.006038189  -.4068804    .07647705 -.0010128021 -.19999993
    230 1  -.008072853   -.00557518  1.2428503  .0030698776            0          0
    231 1  -.009937286  .0022935867   -.680347  -.003732681            0          0
    232 1   -.01192999   .002884388 -.28832388  -.006599426            0          0
    233 1   .010419846  -.002286911 -.57125187     .0734396  -.007630348        -.4
    234 1  -.005385399  -.002491951  1.9401388   .007092476            0          0
    235 1   .002374172 -.0043005943 -1.5820146   -.00871563            0          0
    236 1   .005366325   .008682251  .17789745  -.006949425            0          0
    237 1   .009283066   .016164303 -.22872114    .07344246 -.0002784729          0
    238 1 -.0003037453   -.02004719   .7904997 -.0013551712            0          0
    239 1  -.007632732  .0025901794  -.9840422   -.01093769            0          0
    240 1  -.007646084   .001987934  -.9602346   -.01989937            0          0
    241 1   .022103786    .01419735  .29094052   .006907463            .          .
    200 2            .            .          .            .            .          .
    201 2            .   -.02059841          .            .            .          .
    202 2            .   .018497467          .  .0013933182            0          0
    203 2            .  -.004818916  -.7068357  -.005661011            0          0
    204 2            .   .010112762 -1.5204308   -.02193737            0          0
    205 2   .009144783  -.009408474   .2272103    .09204006   .003889561  .14999986
    206 2   -.00808382   .009508133   2.191338  -.005280495            0          0
    207 2  -.009464264  -.005593777   -.320951  -.010944366            0          0
    208 2   -.02553892   .021602154   .5980763  -.005011559            0          0
    209 2  .0004696846  -.013720036 -1.1360645    .08357906   .006838799  .06000018
    210 2    .00584364  -.010088444   1.959168   .004014969            0          0
    211 2   .009218693  .0009031296  -.9121699   -.00091362            0          0
    212 2    .03869724   .012756824 -.54806757  -.006374359            0          0
    213 2     .0650301   -.02354717  -1.366665     .0799818   .006823063       -.26
    214 2   .033224583   .014594555  2.4422364  .0043706894            0          0
    215 2    .02205229  .0016970634   -.936605  -.004993439            0          0
    216 2    .00586319  -.007308483  -.3265672  -.008236885            0          0
    217 2   .007437706  .0012049675  -.5136318   .007986069    .01181984   .1099999
    218 2  -.034332752  -.012522697  1.7759385  .0043621063            0          0
    219 2   -.03365612   .004866123   -.802362 -.0036649704            0          0
    220 2  -.014978886   .022203445 -.12866354  -.003651619            0          0
    221 2   .020618916  -.005654335    -1.0516    .07372284   .031648636 -.05999994
    222 2    .03241682   .006938934  1.0021987  .0027513504            0          0
    223 2    .02555704  -.008232594 -1.2874904  -.006008148            0          0
    224 2   .016886711   .003777504  .16857195 -.0009412766            0          0
    225 2    .07488441   .013502598  -1.764393     .0685482    .03938818  .20000005
    226 2    .05706406  -.009343624  2.0893815  .0031137466            0          0
    227 2    .04396677  -.005449772  -.5805197  -.002693176            0          0
    228 2    .01858282   .007423878 -.25982094  -.005281448            0          0
    229 2    .04397154   .001183033  -.4068804    .07460594    .03187895  -.5000001
    230 2 .00021362305  -.005828857  1.2428503   .003068924            0          0
    231 2  -.015964985  .0040540695   -.680347 -.0037317276            0          0
    232 2   -.02060938 -.0005922318 -.28832388  -.006599426            0          0
    233 2    .01902008    .02573252 -.57125187    .06719875    .02761793  .25000012
    234 2   .012486935  -.016787529  1.9401388   .007091522            0          0
    235 2   .010631084  -.004216671 -1.5820146  -.008714676            0          0
    236 2   .029328823  -.012857437  .17789745  -.006950378            0          0
    237 2   .024834633   .008227825 -.22872114    .07192135   .026581764          0
    238 2 -.0019669533  -.015721798   .7904997 -.0013551712            0          0
    239 2   -.02457237  -.005127907  -.9840422  -.010936737            0          0
    240 2   -.01957035    .01867533  -.9602346   -.01990032            0          0
    241 2   .020431995   .001680851  .29094052   .032229424            .          .
    200 3            .            .          .            .            .          .
    201 3            .   .002714157          .            .            .          .
    202 3            . -.0008034706          .  .0013933182            0          0
    203 3            .   .007307053  -.7068357  -.005660057            0          0
    204 3            .   .006064892 -1.5204308  -.021938324            0          0
    205 3   .005142212  -.005865574   .2272103     .1050148    .01926899          0
    206 3  -.009325027  -.002795696   2.191338  -.005281448            0          0
    207 3  -.011177063   -.01075697   -.320951  -.010944366            0          0
    208 3   -.02442312   .016042233   .5980763  -.005010605            0          0
    209 3 -.0015792847  -.006485939 -1.1360645    .08742905    .01394701   .2000003
    210 3 .00040483475   .009266853   1.959168   .004014969            0          0
    211 3  -.003083706   .002377987  -.9121699   -.00091362            0          0
    212 3   .024474144  -.010944366 -.54806757  -.006374359            0          0
    213 3   .071223736  -.002002716  -1.366665     .0905161   .014480114          0
    214 3    .04963112   .006394863  2.4422364  .0043697357            0          0
    215 3   .022458553 -.0009965897   -.936605  -.004992485            0          0
    end
    format %tq date
    Last edited by Emma Mao; 03 Sep 2022, 22:39.

  • #2
    Although it probably is not the reason your -bootstrap- is failing, I can't help noticing
    Code:
    //6. Merge the dataset from step 4 with the dataset from step 5 using id, save it as another dataset.
    merge m:m id using `file2'
    -merge m:m- does not merge the two data sets. It creates a data salad pairing up observations that, typically are unrelated to each other. There are almost no correct uses of -merge m:m- in the universe. (I have been using Stata since 1994 and in all that time I have only encountered 1 situation where it could be applied and produce correct results--even then, there was a better way to do it.) When you find yourself tempted to use it, you should always think more carefully about what you are trying to do. Usually there is some other variable (or more than one) that you can add to the merge key variable list to use a 1:m, m:1 or 1:1 -merge- command. (Those are valid.) My guess is that in your case, what you really want here is -merge 1:1 id date- or something quite close to that.

    Alternatively, if what you need to do is to combine every observation with a given id in the master data with every observation having the same id in the user data, that is not done by -merge-: the correct command would be -joinby-. So you should fix this before you even think about getting the -bootstrap- to run: no point doing a thousand reps of code that just mangles your data into garbage.

    That said, once you fix that part of the code, I imagine you will still get an error message from -bootstrap-. The error message that -bootstrap- is giving you is telling you that your bootstrap replications are failing and no results are being collected. I cannot run your code, because it relies on a program allc_quarterly20220801, and you have not given any example data from that set. But in general when trying to troubleshoot -bootstrap-, the first thing you should do is add the -noisily- option to your -bootstrap- prefix so that you will get a running report of what is going on in each replication. That will include error messages that will point you to, if nothing else, where the problem is, and typically also what the problem is.

    If I had to guess, I would speculate that the -reghdfe- command is where you are getting errors because the estimation samples have too few observations to run the regression. This can happen when all the observations you think should be there get eliminated due to missing values. Those can be missing values in the original data, or, when using lagged values because the first lag is always missing for the first observation, and the first and second lags are always missing for the first two observations, etc.

    But try it with the -noisily- option and you will see for yourself what is going on. Then you will have to figure out why that is happening.


    Comment


    • #3
      Thank you so so much Clyde! I do appreciate your reply a lot!

      Instead of bootstrap, I shifted to a simple "foreach" loop (1000 times), and exported the coefficients to an excel. It is working now.

      For the merge m:m, I tried to correct it. The problem is that, in file2, each oid is specific, like 1 2 3, however, id might appear more than once, such as 3 3 1. So I cannot use m:1 merge using id. I wondered are there any command to replace the duplicate id 3 with 2, so that it is specific and not the same as oid?

      Again, I appreciate your help a lot!

      Comment


      • #4
        The problem is that, in file2, each oid is specific, like 1 2 3, however, id might appear more than once, such as 3 3 1. So I cannot use m:1 merge using id.
        I can't follow this description. If an id can appear more than once in both data sets then they cannot be -merge-d on id. Think about it. Suppose id 3 appears twice in both data sets. How would you (or Stata) know which of the id 3 observations in file2 to match with which of the id 3 observations in file1? The notion of -merge-ing in that way is undefinable. There are several possibilities, which lead to different resolutions.
        1. In one or both of the two data sets the observations with the same id are fully duplicate observations: they contain the exact same values for every variable. In that case, you can do -duplicates drop-, and the surplus observations will be eliminated, with no loss of information. And then you can use -merge m:1- or -merge 1:m- (or maybe even -merge 1:1-). This is the simplest case. (It's actually not quite that simple: you really should investigate the data management that led to the creation of this data set because it clearly contains at least the error the created meaningless duplicate observations--and where one error is found, others often lurk unnoticed.)
        2. It may be that the answer to the question I posed about which of the observations in file2 should match with which in file1 is that each such observation in file2 should be matched with every such observation in file1. In that case, -merge- is the wrong command, and you should combine the data sets using -joinby- instead.
        3. Another benign situation is that you are just not looking at the data properly. From the example data you show, in at least one of the data sets, the multiple variables with the same id are distinguished by date. Each combination of id and date identifies a unique observation. If the same is true in the other data set, then -merge 1:1 id date- will get you a properly merged data set. (What you show of the data set new.dta does not contain a date variable, but perhaps it is there and you just didn't think to show it?)
        4. A much more problematic situation is what is found in the example you show from new.dta. The id 2 appears twice, with different values of city. Why is that? Is one of the city values "WL" or "CST" wrong? If so you have to remove the incorrect one(s). (And again you should investigate how the wrong data got into your data set in the first place.) If both are correct, does each perhaps apply only during certain date ranges? If that's not the case, then the question of which one to pair with which observations with id 2 from allc_quarterly... weighs heavily on you. If there is a way to answer that question, then you have to remove the observations that should not be paired in the combined data set. If there is no way to answer that question and situation 2. does not apply, then your data simply cannot be combined in any meaningful, logical way. This means going back to the drawing boards and rethinking your approach.


        Comment

        Working...
        X