  • Memory issue with estimating instrumental variable regression on big data with a large number of dummies

    Dear Statalist,

    I am trying to run an IV regression with about 1200 fixed effects using a large data set of about 100GB. I am using Stata/MP 17.0 on a computing server with 400+GB of RAM, but I consistently run into memory problems. I tried ivreg2 and ivregress2, both of which crashed Stata after running out of memory. The ivregress command gave the error:
    Code:
    First-stage regressions
    -----------------------
               _iv_vce_wrk():  3900  unable to allocate real <tmp>[376459587,1244]
                     <istmt>:     -  function returned error
    r(3900);
    Following the calculations in this post (https://www.statalist.org/forums/for...or-in-stata-16), it seems that ivregress needs (376459587 * 1244 * 8 / 1024 / 1024 / 1024) ~ 3489GB of memory for this temporary matrix alone. Is there any way to estimate the regression? I am also hoping to retrieve the estimates of the fixed effects. Thanks for helping!

    Kind regards,
    William
    Last edited by William Zheng; 10 Jul 2023, 11:20.
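
    The arithmetic behind that estimate can be checked in Stata itself. This is a minimal sketch, assuming (as the factor of 8 in the post does) that each element of the 376459587 x 1244 matrix in the error message is stored as an 8-byte double:
    Code:
    * bytes = rows x columns x 8 bytes per element; divide by 1024^3 to get GB
    display %9.1f 376459587 * 1244 * 8 / 1024^3
    The result is roughly 3489GB, far more than the 400+GB of RAM on the server, which is consistent with the r(3900) allocation error.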

  • #2
    Install ivreghdfe and absorb the fixed effects. After installation, see the help file:

    Code:
    help ivreghdfe
    To install ivreghdfe together with its dependencies:
    Code:
    * Install ftools (remove program if it existed previously)
    cap ado uninstall ftools
    net install ftools, from("https://raw.githubusercontent.com/sergiocorreia/ftools/master/src/")
    
    * Install reghdfe
    cap ado uninstall reghdfe
    net install reghdfe, from("https://raw.githubusercontent.com/sergiocorreia/reghdfe/master/src/")
    
    * Install ivreg2, the core package
    cap ado uninstall ivreg2
    ssc install ivreg2
    
    * Finally, install ivreghdfe
    cap ado uninstall ivreghdfe
    net install ivreghdfe, from("https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/")


    • #3
      Hi Andrew,

      Thanks for the help, and sorry for the slow response; it took me a while to test the ivreghdfe command. I tried:
      Code:
      ivreghdfe  `depvar' (`endovar' = `iv') `controls', absorb(`absorbvar') vce(cluster `cluster') pool(1) compact verbose(2)
      Unfortunately this did not run: Stata crashed when it ran out of memory. Below is the log for the ivreghdfe command (with verbose) up to the point where Stata crashed. Just to provide some more information, I've previously tried the fixed effects regression without the IV using reghdfe and it didn't work but areg turned out to work. Please let me know if you have any ideas on how to estimate the IV regression. Thanks!
      Code:
      [CMD] reghdfe __000003 [aweight=__000002], absorb(FE2=citycode) pool(1) compact verbose(2) nopartialout varlist_is_touse vce(cluster citycode)
      Parsing and validating options:
      -------------------------------
      
      # Parsing varlist: __000003
      macros:
                 r(basevars) : "__000003"
                r(fe_format) : "%8.0g"
                   r(depvar) : "__000003"
      
      # Parsing vce(cluster citycode)
      macros:
         s(base_clustervars) : "citycode"
              s(clustervars) : "citycode"
             s(num_clusters) : "1"
                  s(vcetype) : "cluster"
      
      # Parsing dof()
      macros:
           s(dofadjustments) : "pairwise clusters continuous"
         s(base_clustervars) : "citycode"
              s(clustervars) : "citycode"
             s(num_clusters) : "1"
                  s(vcetype) : "cluster"
      
      # Passing main options to Mata
      
          - HDFE.absvars = `"FE2=citycode"'
          - HDFE.tousevar = `"__000003"'
          - HDFE.weight_type = `"aweight"'
          - HDFE.weight_var = `"__000002"'
          - HDFE.technique = `"map"'
          - HDFE.transform = `"symmetric_kaczmarz"'
          - HDFE.acceleration = `"conjugate_gradient"'
          - HDFE.preconditioner = `"block_diagonal"'
          - HDFE.parallel_dir = `""'
          - HDFE.parallel_opts = `""'
          - HDFE.drop_singletons = 1
          - HDFE.tolerance = 1.00000000000e-08
          - HDFE.maxiter = 16000
          - HDFE.compact = 1
          - HDFE.poolsize = 1
          - HDFE.verbose = 2
          - HDFE.parallel_maxproc = 0
          - HDFE.parallel_force = 0
          - HDFE.timeit = 0
      
      # Parsing absorb(FE2=citycode) and initializing FixedEffects() object
      
      macros:
                        s(G) : "1"
            s(has_intercept) : "1"
              s(save_any_fe) : "1"
              s(save_all_fe) : "0"
                  s(absvars) : ""citycode""
                 s(basevars) : "citycode"
                    s(ivars) : ""citycode""
                    s(cvars) : """"
                  s(targets) : ""FE2 ""
               s(intercepts) : "1"
               s(num_slopes) : "0"
         s(extended_absvars) : "1.citycode"
      
      Loading fixed effects information:
      ----------------------------------
      
      # Initializing Mata object for 1 fixed effect
      
         +------------------------------------------------------------------------------------------------+
         |  i | g |      Name | Int? | #Slopes |    Obs.   |   Levels   | Sorted? | Indiv? | #Drop Singl. |
         |----+---+-----------+------+---------+-----------+------------+---------+--------+--------------|
         |  1 | 1 |  citycode | Yes  |    0    | 3.765e+08 |       1119 |      No |     No |          0   |
         +------------------------------------------------------------------------------------------------+
      
      # Initializing panelsetup() and loading slope variables for each FE
      
         - Fixed effects: citycode
           panelsetup()
      
      # Loading weights [aweight=__000002]
      
         - sorting weights for factor citycode
      ## Preserving dataset
      
      # Estimating degrees-of-freedom absorbed by the fixed effects
      
         - categorical variable citycode is also a cluster variable, so it doesn't reduce DoF
      
      Stopping reghdfe without partialling out
      ----------------------------------------
      
      (sum of wgt is     3.7323e+01)
      Warning - collinearities detected
      Vars dropped:       2.thunderstorm


      • #4
        Originally posted by William Zheng:
        "I am trying to run an IV regression with about 1200 fixed effects"
        When you say 1200 fixed effects, do you mean 1200 variables within -absorb()- in ivreghdfe? How many observations do you have?

        "Just to provide some more information, I've previously tried the fixed effects regression without the IV using reghdfe and it didn't work but areg turned out to work."
        areg allows only one FE variable to be absorbed. If you have just one FE variable, then just use the official xtivreg command.
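
        A minimal sketch of the xtivreg route, assuming the locals from post #3 (`depvar', `endovar', `iv', `controls') are still defined and that citycode is the only FE variable to absorb. The variable name city_fe is hypothetical, vce(cluster) is assumed to be supported by xtivreg in your Stata version (drop it otherwise), and whether this fits in memory on data of this size is untested:
        Code:
        * Declare the panel variable; citycode defines the fixed effects
        xtset citycode

        * Within (fixed-effects) 2SLS; any remaining dummies are assumed to be in `controls'
        xtivreg `depvar' `controls' (`endovar' = `iv'), fe vce(cluster citycode)

        * Recover the estimated city fixed effects (u_i) as a new variable
        predict double city_fe, u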


        • #5
          By about 1200 fixed effects I meant that I have about 1100 city dummies alongside some additional dummies for weather etc. I have about 380 million observations, and previously with areg I absorbed the city FE (which accounts for the vast majority of FEs) and added the remaining FEs as indicator variables in the regression. My main goal is to retrieve the city FEs and I'll try xtivreg next. Thanks.
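
          The areg setup described above (no IV) can be sketched as follows. This is an illustration only: it assumes the locals `depvar' and `controls' from post #3 are defined, that the weather and other dummies enter through `controls' as factor variables, and that clustering is on citycode as in the log. The variable name city_fe_areg is hypothetical:
          Code:
          * Absorb the city FE; the remaining dummies are ordinary regressors in `controls'
          areg `depvar' `controls', absorb(citycode) vce(cluster citycode)

          * Recover the estimated city fixed effects
          predict double city_fe_areg, d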
