
  • IV with fixed effects for data that do not have a traditional panel structure

    How can I run an IV model with fixed effects on a non-panel dataset? I know that with panel data we would use -xtivreg2-.

    I have a dataset where each row represents one childbirth delivery. I would like to estimate the effect of a municipality-level policy that is believed to have affected a given variable X in different ways. Thus my key independent variable will be the interaction between X and a dummy for the post-policy period. I would like to include municipality fixed effects and to cluster the standard errors at the municipality level.

    If I were to force a panel structure on my data, the panel variable would be each child born (who shows up only once in the dataset). Stata would not run the command:

    Code:
    . xtset id d_pos
           panel variable:  id (weakly balanced)
            time variable:  d_pos, 0 to 1
                    delta:  1 unit
    
    . 
    . gen d_pos2 = d_pos
    
    . 
    . xtivreg2 peso d_pos2 (d_pc = trat), fe cluster(mun)
    Warning - singleton groups detected.  6097656 observation(s) not used.
    no observations
    r(2000);
    
    end of do-file
    
    r(2000);
    
    .
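A note on the failure above: with `xtset id`, every birth is its own panel, so every group is a singleton and the within transformation leaves nothing to estimate. Because the fixed effects of interest are at the municipality level, one option is to declare the municipality as the panel variable and omit the time variable; -xtset- allows repeated observations within a panel when no time variable is given. The sketch below uses simulated data with the thread's variable names (peso, d_pos, d_pc, trat, mun); all data-generating values are made up for illustration. It swaps in official -xtivreg- for -xtivreg2- and assumes a Stata version in which -xtivreg, fe- accepts vce(cluster).

```stata
* Simulated stand-in for the births data (all values hypothetical)
clear
set seed 12345
set obs 2000
gen mun = ceil(_n/20)                        // 100 municipalities, 20 births each
bysort mun: gen alpha = rnormal() if _n == 1
bysort mun: replace alpha = alpha[1]         // municipality-level unobserved effect
gen d_pos = runiform() < 0.5                 // post-policy dummy
gen trat  = rnormal() + alpha                // instrument, correlated with alpha
gen u     = rnormal()
gen d_pc  = 0.8*trat + alpha + 0.5*u + rnormal()   // endogenous regressor
gen peso  = 1.0*d_pc + 0.3*d_pos + alpha + u       // outcome

* Declare the municipality, not the birth, as the panel; no time variable needed
xtset mun

* Fixed-effects IV with municipality effects absorbed and clustered SEs
xtivreg peso d_pos (d_pc = trat), fe vce(cluster mun)
```

No municipality dummies are estimated here, so the matsize and maxvar limits do not come into play.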

  • #2
    Paula:
    can't you simply go:
    Code:
    ivreg peso i.d_pos2 (d_pc=trat), vce(cluster mun)
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Hello Carlo Lazzaro. I imagine the factor operator i. was included by mistake (d_pos2 is a dummy variable for the post-policy period).

      Would this code account for municipality fixed effects? I imagine it is just clustering the standard errors at the municipality level.

      In any case, this is not working. I don't know what I am doing wrong.
      Code:
      . ivreg peso d_pos2 (d_pc=trat), vce(cluster mun)
      option vce() not allowed
      r(198);



      • #4
        Paula:
        if -d_pos2- is actually a categorical variable, what's wrong with the -i.- factor-variable (-fvvarlist-) prefix? (At worst, with a two-level categorical variable, it is redundant.)
        That said, you may want to try:
        Code:
        ivregress gmm peso i.municipality d_pos2 (d_pc=trat), vce(cluster municipality)
        Caveat emptor: untested code (specifically, I'm not sure that -i.municipality- and related cluster robust standard error can live together).
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Thanks Carlo. You are right, I should use the command -ivregress- instead.

          Code:
          . codebook mun
          
          -------------------------------------------------------------------------------------------------------------------------------------------
          mun                                                                                                                 hospital's municipality
          -------------------------------------------------------------------------------------------------------------------------------------------
          
                            type:  numeric (double)
          
                           range:  [110001,530010]              units:  1
                   unique values:  3,321                    missing .:  0/6,194,096
          
                            mean:    325678
                        std. dev:   93675.2
          
                     percentiles:        10%       25%       50%       75%       90%
                                      211070    261420    330360    355030    431490
          
          .
          . set matsize 3400
          
          . ivregress 2sls peso d_pos i.mun (d_pc=trat), vce(cluster mun)
          note: 210015.mun identifies no observations in the sample
          note: 311890.mun identifies no observations in the sample
          note: 355320.mun identifies no observations in the sample
          maxvar too small
              You have attempted to use an interaction with too many levels or attempted to fit a model with too many variables.  You need to
              increase maxvar; it is currently 5000.  Use set maxvar; see help maxvar.
          
              If you are using factor variables and included an interaction that has lots of missing cells, either increase maxvar or set
              emptycells drop to reduce the required matrix size; see help set emptycells.
          
              If you are using factor variables, you might have accidentally treated a continuous variable as a categorical, resulting in lots of
              categories.  Use the c. operator on such variables.
          r(907);
          
          end of do-file
          
          r(907);
          
          .
          I then tried to -set maxvar- to its maximum and got the error message below:
          Code:
          . use SINASC_SIHpartos, clear
          
          . 
          . set maxvar 32767
          no; data in memory would be lost
          r(4);
          
          end of do-file
          
          r(4);



          • #6
            Paula:
            what if you set -maxvar- before loading the dataset you're working on?
            Kind regards,
            Carlo
            (Stata 18.0 SE)
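The r(4) error in #5 arises because -set maxvar- cannot be changed while a dataset is in memory. A minimal sketch of the ordering, using the shipped auto dataset as a stand-in for the real file, and assuming Stata/SE or Stata/MP (Stata/BE does not allow maxvar to be changed):

```stata
* Raise maxvar while memory is empty, then load the data
clear all
set maxvar 10000        // any value up to the edition's limit; 10000 is arbitrary
sysuse auto, clear      // stand-in for: use SINASC_SIHpartos, clear
display c(maxvar)       // confirm the current setting
```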



            • #7
              Thanks Carlo!
              You are right - I do not get an error message when setting -maxvar- before loading the data. The issue is that Stata is taking forever to run the regression (it has already been 3 hours and Stata is still running). As I don't need the coefficient estimates for the municipality FE, is there a way to absorb them? I was trying to do so when using -xtivreg2-. However, as mentioned above, that command does not work because I do not have panel data.



              • #8
                Paula:
                if you do not need the -i.municipality- coefficients, you can try:
                Code:
                ivregress gmm peso d_pos2 (d_pc=trat), vce(cluster municipality)
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
                  To be safe, I would compare it with using within deviations from means. First, make sure you only use the complete cases, which is best done by creating a complete cases indicator and only computing the within-municipality average for any variable if it is part of a complete case. You won't see "estimates" of the dummies, which is best, and it won't run up against space constraints. In my MIT Press book I show that this is how fixed effects IV estimation works.
                  Below is generic code. The endogenous variable is w and z1 ... zm are excluded instruments.

                  Code:
                  gen s = (y != .) & (w != .) & (x1 != .) & ... & (xk != .) & (z1 != .) & ... & (zm != .)
                  egen ybar = mean(y) if s, by(municipality)
                  gen ydd = y - ybar
                  egen wbar = mean(w) if s, by(municipality)
                  gen wdd = w - wbar
                  egen x1bar = mean(x1) if s, by(municipality)
                  gen x1dd = x1 - x1bar
                  ...
                  egen xkbar = mean(xk) if s, by(municipality)
                  gen xkdd = xk - xkbar
                  egen z1bar = mean(z1) if s, by(municipality)
                  gen z1dd = z1 - z1bar
                  ...
                  egen zmbar = mean(zm) if s, by(municipality)
                  gen zmdd = zm - zmbar
                  ivregress 2sls ydd x1dd ... xkdd (wdd = z1dd ... zmdd), vce(cluster municipality)
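The generic code above can be checked on a small simulated dataset where including the dummies is still cheap. The sketch below is only an illustration: the data-generating values are invented, there is one exogenous regressor (x1) and one instrument (z1), and there are no missing values, so the complete-case indicator s is not needed. It compares the 2SLS coefficient on the endogenous variable from the demeaned data with the one from explicit municipality dummies.

```stata
* Simulated data: 100 hypothetical municipalities, 30 observations each
clear
set seed 2024
set obs 3000
gen municipality = ceil(_n/30)
bysort municipality: gen alpha = rnormal() if _n == 1
bysort municipality: replace alpha = alpha[1]   // municipality effect
gen z1 = rnormal() + alpha                      // excluded instrument
gen x1 = runiform() < 0.5                       // exogenous regressor
gen u  = rnormal()
gen w  = 0.8*z1 + alpha + 0.5*u + rnormal()     // endogenous regressor
gen y  = 1.0*w + 0.3*x1 + alpha + u             // outcome

* Within-municipality deviations from means, as in the generic code
foreach v of varlist y w x1 z1 {
    egen double `v'bar = mean(`v'), by(municipality)
    gen double `v'dd = `v' - `v'bar
}

* 2SLS on the demeaned data
ivregress 2sls ydd x1dd (wdd = z1dd), vce(cluster municipality)
scalar b_within = _b[wdd]

* 2SLS with explicit municipality dummies
ivregress 2sls y x1 i.municipality (w = z1), vce(cluster municipality)
scalar b_dummy = _b[w]

* The two point estimates should agree up to rounding
display b_within, b_dummy
assert reldif(b_within, b_dummy) < 1e-6
```

Only the point estimates are compared here; the reported standard errors may differ slightly between the two approaches because of degrees-of-freedom adjustments.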



                  • #10
                    Many thanks for your response Jeff Wooldridge!!! And sorry for just picking up on this now.
                    Would your code be equivalent to transforming the data to mean differences by using the command -xtdata- and then running the ivregress command on the transformed variables? Or would it differ in case there are incomplete observations? The -xtdata- alternative is recommended here: https://www.stata.com/support/faqs/s...ts-regression/

                    PS: Carlo Lazzaro, from what I understand, your suggestion in #8 would not control for municipality FE, right? I would like to control for the municipalities' time-invariant factors even though I don't need to see the "estimates" of the municipality dummies.



                    • #11
                      You may consider using ivreghdfe from SSC and do away with dummies or demeaning.

                      Code:
                      ssc install ivreghdfe
                      help ivreghdfe
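As a rough sketch of how this might look for the model in the thread: the example below simulates a small dataset with the thread's variable names (peso, d_pos, d_pc, trat, mun) and absorbs the municipality effects with absorb(). The data-generating values are made up, and the code assumes that -ivreghdfe- and the SSC packages it depends on (-ivreg2-, -reghdfe-, -ftools-) are already installed.

```stata
* Simulated stand-in for the births data (all values hypothetical)
clear
set seed 2024
set obs 2000
gen mun = ceil(_n/20)                        // 100 municipalities
bysort mun: gen alpha = rnormal() if _n == 1
bysort mun: replace alpha = alpha[1]         // municipality-level unobserved effect
gen d_pos = runiform() < 0.5                 // post-policy dummy
gen trat  = rnormal() + alpha                // instrument
gen u     = rnormal()
gen d_pc  = 0.8*trat + alpha + 0.5*u + rnormal()   // endogenous regressor
gen peso  = 1.0*d_pc + 0.3*d_pos + alpha + u       // outcome

* Municipality fixed effects are absorbed rather than estimated as dummies;
* standard errors are clustered by municipality
ivreghdfe peso d_pos (d_pc = trat), absorb(mun) cluster(mun)
```

Because the fixed effects are absorbed, no xtset declaration, demeaning step, or maxvar change is needed.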
