  • Cluster bootstrap and singleton cluster error

    Hello, I am trying to run a cluster bootstrap of a user-written program:
    bootstrap, reps(99) cluster(idn) strata(idn) seed(12345): command, ...

    I have a panel of individuals over time, and idn is a geographical local-area indicator.

    I get the "singleton cluster detected" error.

    I am now trying to write a program using bsample and simulate, to see whether I can make it drop the singleton that way.

    As I am not an expert programmer, however, I wonder whether there is a more straightforward way (maybe using bootstrap?). Can someone help me and/or point me to the right resources?

    Thanks in advance!!
    Chiara

  • #2
    Can you show the entire command or do-file you are using? These errors usually happen in panel settings when the model is not correctly specified in the command options. If you are using a built-in model, say, xtreg, you should use the vce(bootstrap) option instead of the prefix. If the command is user-written, you may need to adapt it to handle newly created IDs; see the cluster() and idcluster() options of bootstrap.
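
A minimal sketch of the vce(bootstrap) route for a built-in panel estimator. The simulated data and the variable names id, year, x and y are hypothetical and do not come from the original post:

```stata
* Toy panel (hypothetical data): 50 units observed over 6 years
clear
set seed 12345
set obs 50
generate id = _n
expand 6
bysort id: generate year = 2000 + _n
generate x = rnormal()
generate y = 0.5*x + rnormal()
xtset id year

* Let the estimator run the bootstrap itself; for xtreg this resamples whole panels
xtreg y x, fe vce(bootstrap, reps(50) seed(12345))
```

For a user-written program this option is normally not available, so the bootstrap prefix with cluster() and idcluster() remains the route to try.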
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Hi Felix,
      Thanks for your reply.
      I attach part of the code. It is a two-stage user-written command. I only attach the first stage, as even this does not work (so there is no point in adding the second part).

      cap program drop firststage
      program define firststage, eclass
      syntax, [x1(varlist)] [y1(varlist)] [condition1(string)] [clustid1(varlist)]
      cap drop yhat*
      reg `y1' idn#c.tax_year if `condition1', cluster(`clustid1')
      predict yhat, xb
      end


      xtset, clear
      gen newid=idn
      gen strata=idn*(year<2008)
      bootstrap, reps(2) cluster(idn) idcluster(newid) strata(strata) seed(141120): firststage, y1(y) condition1(year<2008) clustid1(idn)


      I have tried several variations:
      1) for example by trying to see whether something changes if I drop the time condition, but nothing changes:
      bootstrap, reps(2) cluster(idn) idcluster(newid) strata(idn) seed(141120): firststage, y1(y) clustid1(idn)


      2) or another attempt:
      cap drop strata
      egen strata=group(idn tax_year)
      bootstrap, reps(2) cluster(idn) idcluster(newid) strata(strata) seed(141120): firststage, y1(y) clustid1(idn)


      I always get the "singleton cluster detected" error.

      I should specify that I have a panel of individuals over tax_year, but the cluster variable is at the local area level.

      Thanks!
      Best,
      Chiara



      • #4
        The code is quite dense, and without the data and background it is difficult to understand what is going on in detail. Since you do not return any specific values, I assume that you work with the returns of the reg command. I think the problem might be with newid. You specify it in idcluster() as the new ID, but the variable is never used in your actual code. That cannot work, because the new variable must be used inside the program in place of the original ID. What happens if you run the following?


        Code:
        cap program drop firststage
        program define firststage, eclass
        syntax, [x1(varlist)] [y1(varlist)] [condition1(string)] [clustid1(varlist)]
        cap drop yhat*
        reg `y1' idn#c.tax_year if `condition1', cluster(`clustid1')
        predict yhat, xb
        end
        
        xtset, clear  //Why is this even here? there are no xt commands used
        gen strata=idn*(year<2008)
        bootstrap, reps(2) cluster(idn) idcluster(newid) strata(strata) seed(141120): firststage, y1(y) condition1(year<2008) clustid1(newid)
        I am not sure what you want to achieve in the end, so my idea here might be wrong. You might also want to change
        reg `y1' idn#c.tax_year
        to
        reg `y1' newid#c.tax_year
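
A small sketch of why the idcluster() variable matters, using bsample directly on hypothetical toy data (6 local areas with 5 observations each; only the names idn and newid are taken from the thread). When an area is drawn more than once, its copies share one idn value but receive different newid values, so a command that still refers to idn would treat the copies as a single cluster:

```stata
* Toy data (hypothetical): 6 local areas with 5 observations each
clear
set seed 141120
set obs 6
generate idn = _n
expand 5

* Draw one cluster bootstrap sample; newid is created by bsample
bsample, cluster(idn) idcluster(newid)

* Count distinct values of the original and the resampled identifier
egen byte tag_idn   = tag(idn)
egen byte tag_newid = tag(newid)
count if tag_idn
local n_idn = r(N)      // distinct original areas present in this draw
count if tag_newid
local n_newid = r(N)    // resampled clusters, one newid value each
display "distinct idn: `n_idn'   distinct newid: `n_newid'"
```

The bootstrap prefix is assumed here to resample in the same way on each replication, which is why the program being bootstrapped should refer to newid rather than idn.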
        Last edited by Felix Bittmann; 11 Nov 2020, 10:18.
        Best wishes



        • #5
          Thanks Felix! I have tried what you suggest, but unfortunately it does not work. I still get the same error.
          Would there be a way to tell bootstrap to ignore/drop singletons? Or do you think it is just a problem of misspecification of some sort in my code?
          For a given idn (local area) and tax_year I have a different number of individual observations, if that helps.

          With firststage I am trying to estimate the linear trend in y in each local area (idn) over the pre-treatment period (before 2008).
          I use this to predict the trend in each local area over the whole period (that is, also after 2008).
          I will then use yhat as a covariate in the second-stage regression, where I will do an event study on the effect of the "treatment" on y.

          Sorry about the code. I cannot post the data, but if helpful I could create a dataset where I find the same issue and then post that?

          Best,
          Chiara
          Last edited by Chiara Cavaglia; 11 Nov 2020, 10:40.



          • #6
            Of course you can drop clusters with a single observation before the analyses like the following:
            Code:
            bysort clustervar: gen counter = _n
            bysort clustervar: egen cluster_n = max(counter)
            drop if cluster_n == 1
            However, I am not sure whether this solves all your problems. At this point it is not really clear to me what the program does and whether it is specified correctly.
            Best wishes



            • #7
              Thanks Felix. No, unfortunately that does not solve it (I had tried it like this, too - with _N). Thanks for your help! If I manage to understand what is wrong I will post it here.
              Best wishes,
              Chiara



              • #8
                In the end, the variable in strata() was wrongly specified (in fact, it did not make much sense the way I had specified it)! On top of that, I had not used the newid variable as the cluster variable, as you mentioned. Thanks for stressing that most of the time the issue is due to misspecification of the cluster variables!
                Best,
                Chiara
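
For readers who hit the same error, one way to check a strata() specification is to count how many distinct clusters fall into each stratum. This is only a sketch on hypothetical toy data, and it assumes that the "singleton cluster detected" error is raised when a stratum contains a single cluster, which is consistent with the resolution above but is not confirmed in the thread:

```stata
* Toy data (hypothetical): 4 local areas with 3 observations each
clear
set obs 4
generate idn = _n
expand 3
generate strata = idn          // mirrors strata(idn) from post #3

* Tag one observation per (stratum, cluster) pair, then count clusters per stratum
egen byte tag = tag(strata idn)
bysort strata: egen n_clusters = total(tag)

* Strata holding a single cluster leave nothing to resample within them
tabulate n_clusters
list strata idn if tag & n_clusters == 1, noobs
```

With strata equal to the cluster variable, every stratum holds exactly one cluster, so the check flags all of them. A strata variable defined at a coarser level than the cluster variable (or no strata() at all) would avoid this.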
