Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Standard error intercept, Frisch-Waugh theorem

    Hi all,

    I am applying the Frisch-Waugh-Theorem to partial out a set of fixed effects D and get the OLS estimates and standard errors of the remaining regressors X.
    The idea is mainly to use the code to pre-process data, but I would like to understand exactly how to get correct OLS results before doing anything.

    Below a MWE in Mata to clarify what I do.
    For reference I report also a benchmark when using -areg-.

    Code:
    cls
        clear all
        sysuse auto, clear
      
    // Benchmark
        
    areg price gear length i.trunk, absorb(turn)
        
        
    // Absorb "manually" in MATA
        
    xi i.turn i.trunk
    gen uno = 1
            
    mata
        
         // import
            y = st_data(., "price")
            X = st_data(., ("gear", "length", "_Itrunk_*", "uno"))  
            D = st_data(., "_Iturn*")
        
         // demeaned X and y
            M_D = I(rows(y)) - D * qrinv(cross(D,D)) * D'  // "residual maker" M_D = I - D(D'D)^(-1)D'
            y_dem = M_D * y
            X_dem = M_D * X
        
         // OLS using de-meaned variables and corresponding standard errors
            b1 = qrsolve(X_dem, y_dem)
            res2 = cross(y_dem - X_dem*b1, y_dem - X_dem*b1)  
            MSE = res2/(rows(X_dem) - (cols(X_dem)+cols(D)-1))
            XX_dem = qrinv(cross(X_dem, X_dem))
            SE = sqrt(diagonal(XX_dem) * MSE)
            SE
        
         // Compute constant as c = y_bar - X_bar*b1
            e = rows(b1) - 1
            c = mean(y) - (mean(X[., 1..e]) * b1[1..e])'
            c
        
    end
    I hope I haven't made trivial mistakes, in fact it seems that all estimates (including the separately computed constant) correspond to the ones reported by -areg-. However, I am not sure how to get the standard error of the constant without inverting the full (X, D)'(X, D) variance-covariance matrix (which would make very little sense, given the setting).

    Thanks for any comments.
    The question is cross-posted at https://stats.stackexchange.com/ques...-waugh-theorem

  • #2
    Hi Stefano
    This is indeed a nice exercise. Before you go any further, if you intention is to extend areg capabilities, be aware that there are other programs already available in stata that deal with multiple high dimensional fixed effects. The gold standard right now being reghdfe by Sergio Correira
    That being said.
    You are already on the right track to recover the standard errors of the constant. What I have done on my own version of fixed effects models is to use not the demeaned variables, but the demeaned recentered variables for the regression, see my own modification to your code
    Code:
    mata
        
         // import
            y = st_data(., "price")
            X = st_data(., ("gear", "length", "trunk", "uno"))  
            D = st_data(., "ibn.turn*")
        
         // demeaned X and y
         // "residual maker" M_D = I - D(D'D)^(-1)D'
            M_D = I(rows(y)) - D * invsym(D'*D) * D'  
            y_dem = M_D * y
            X_dem = M_D * X
            y_dem=y_dem:+(colsum(y):/rows(y))
            X_dem=X_dem:+(colsum(X):/rows(X))
        
         // OLS using de-meaned variables and corresponding standard errors
            b1 = invsym(X_dem'*X_dem)*X_dem'*y_dem
            res2 = cross(y_dem - X_dem*b1, y_dem - X_dem*b1)  
            MSE = res2/(rows(X_dem) - (cols(X_dem)+cols(D)-1))
            XX_dem = qrinv(cross(X_dem, X_dem))
            
            SE = sqrt(diagonal(XX_dem) * MSE)
            SE
        
    end
    The problem lies when 2 or more fixed effects sets are involved. For two, you need to count the "mobility groups". For more than 2, there is nothing concrete. I describe a small algorithm that can count the non estimable fixed effects, which works in practice on simulations, but have not shown it works in a mathematically rigorous matter.

    HTH
    Fernando


    Comment


    • #3
      Hi Fernando,

      thanks a lot for your reply. I am not planning to extend -areg-, but I would like to understand the underlying mechanics within a simple example, before moving to real data.

      About the strategy you proposed, unfortunately it does not address the case when in X there is another set of fixed effects (as in my MWE).

      Best wishes,
      Stefano

      Comment


      • #4
        Hi Stefano,
        It still works, but you need to count also the number of coefficients not estimated in your set of fixed effects. Look at the following code:
        Code:
        mata
        
        //import
        y=st_data(.,"price")
        X=st_data(.,("gear length i.trunk uno"))
        D=st_data(.,"ibn.turn")
        
        //demeanedXandy
        //"residualmaker"M_D=I-D(D'D)^(-1)D'
        M_D=I(rows(y))-D*invsym(D'*D)*D'
        y_dem=M_D*y
        X_dem=M_D*X
        y_dem=y_dem:+(colsum(y):/rows(y))
        X_dem=X_dem:+(colsum(X):/rows(X))
        
        //OLSusingde-meanedvariablesandcorrespondingstandarderrors
        b1=invsym(X_dem'*X_dem)*X_dem'*y_dem
        res2=cross(y_dem-X_dem*b1,y_dem-X_dem*b1)
        MSE=res2/(rows(X_dem)-(cols(X_dem)-diag0cnt(invsym(X_dem'*X_dem))+cols(D)-1))
        XX_dem=qrinv(cross(X_dem,X_dem))
        
        SE=sqrt(diagonal(XX_dem)*MSE)
        SE
        
        end
        the function "diag0cnt" counts the zero elements in the main diagonal, which means those cannot be estimated.
        HTH
        Fernando

        Comment


        • #5
          Hi Fernando,

          thanks for your answer, indeed your code returns almost the same result as areg (including the RMSE). The only part that is different is the standard error for the intercept (all other standard errors are identical).
          With areg it's 11259.39, while with the MATA code it's 11258.3645.

          I believe this is due to the fact that one needs to adjust the degrees of freedom, but I am getting a bit confused as I thought was already taken care of precisely when computing the MSE. Any ideas about this?

          Many thanks,
          Stefano

          Comment


          • #6
            Hi Stefano
            YOu are absolutely right. I forgot the "double" adjustment that is required:
            Code:
            mata:
            
            //import
            y=st_data(.,"price")
            X=st_data(.,("gear length i.trunk uno"))
            D=st_data(.,"ibn.turn")
            
            //demeanedXandy
            //"residualmaker"M_D=I-D(D'D)^(-1)D'
            M_D=I(rows(y))-D*invsym(D'*D)*D'
            y_dem=M_D*y
            X_dem=M_D*X
            y_dem=y_dem:+(colsum(y):/rows(y))
            X_dem=X_dem:+(colsum(X):/rows(X))
            
            //OLSusingde-meanedvariablesandcorrespondingstandarderrors
            b1=invsym(X_dem'*X_dem)*X_dem'*y_dem
            err=y_dem-X_dem*b1
            n=rows(y_dem)
            k=rows(invsym(X_dem'*X_dem))-diag0cnt(invsym(X_dem'*X_dem))
            //incorrect standard errors
            VCV=sum(err:^2)/(n-k)*invsym(X_dem'*X_dem)
            diagonal(VCV):^.5
            // corrected 
            VCV=VCV*(n-k)/(n-k-cols(D)+1)
            diagonal(VCV):^.5
             
            
            end
            I modified the code to show what I have done in my own version of linear models with High order fixed effects.
            HTH

            Comment


            • #7
              Hi, of course, it makes total sense. Thanks!

              Best wishes,
              Stefano

              Comment

              Working...
              X