  • Laptop Configuration for Big Data

    Dear Stata List

    I am working with a large trade dataset (17 GB). The dataset has three dimensions: 5000 products, 190 countries and 3 time periods. I use the reghdfe command to run a linear model with multidimensional fixed effects (country-time, country-product and product-time). On my laptop it takes about 2 hours to estimate the model.

    Can you recommend how I should upgrade the configuration to estimate such a model in a reasonable time frame?

    My laptop configuration is below:

    8 GB RAM, Intel dual-core i5-7200U processor and 1 TB hard drive.

    Thank you

    Rohit

  • #2
    -reghdfe- by Sergio Correia: http://scorreia.com/software/reghdfe/

    Sergio writes:
    "Within Stata, it can be viewed as a generalization of areg/xtreg"
    With this in mind, refer to the Stata/MP performance report to see how much more performance you could get from your laptop by licensing more Stata/MP cores.
    It looks like both areg and xtreg, fe are well parallelized.

    Best, Sergiy

    PS: one could argue that you have a big dataset, not Big Data.

    • #3
      If you post the entire command, we could potentially give more advice. The order of the items in the cluster command matters: you should put the one with the largest number of distinct categories first.

      • #4
        Hi Sergiy and Arthur

        I should have noted in the original post that I use Stata MP 16.1 (single-user, 2-core). I recently upgraded the Stata licence from SE to MP, hoping that it would speed up estimation, but that has not helped much.

        The panel dataset is organized at the importer country-product (190*5000) by period (3) level. I estimate the effect of the tariff rate on product k (averaged across all trading partners) that country i faces in period t on the trade value gap (exports reported by the rest of the world minus imports reported by country i). I estimate the model using the following specification:

        reghdfe trade_value_gap tariff, absorb(ik kt it) vce(r)

        Thanks

        Rohit
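To put the specification above in perspective, here is a rough back-of-envelope count of observations and fixed-effect levels. It is sketched in Python purely as arithmetic, not as Stata code. It uses only the dimensions stated in the thread (190 countries, 5000 products, 3 periods) and assumes a fully balanced panel, which the posts do not confirm, so the figures are upper bounds.

```python
# Dimensions stated in the thread: countries (i), products (k), periods (t).
# ASSUMPTION: the panel is fully balanced; the real data may have fewer rows.
n_countries, n_products, n_periods = 190, 5000, 3

# Maximum number of rows at the importer-product-period level.
max_obs = n_countries * n_products * n_periods

# Upper bound on the number of levels in each absorbed fixed effect
# from absorb(ik kt it).
fe_levels = {
    "ik": n_countries * n_products,
    "kt": n_products * n_periods,
    "it": n_countries * n_periods,
}

# With only three periods, each country-product cell has at most 3 rows.
obs_per_ik_cell = max_obs / fe_levels["ik"]

print(f"max observations: {max_obs:,}")
for name, levels in fe_levels.items():
    print(f"{name}: up to {levels:,} levels")
print(f"observations per ik cell: at most {obs_per_ik_cell:.0f}")
```

Under these assumptions, the country-product effect alone has up to one level for every three observations, so most of the absorbed levels come from that single term.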


        • #5
          That is a very large dataset indeed! Maybe it would be better to contact the folks at Stata ([email protected]) and ask for recommended system requirements. You need to figure out whether the problem is a RAM problem or a processor one. My guess is that it is the former.

          Data in Stata are usually loaded into RAM, and given that your data file is already 17 GB, the 8 GB on your laptop is very small. In this case the system has to use swap memory, which means using part of the hard disk as a RAM supplement to make things work, and disk access is far slower than RAM. I guess you would need 16 or preferably 32 GB of RAM to analyze such a huge dataset. Please let us know what you end up doing so we can all learn from your experience.

          Cheers!
          W.
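The RAM argument above can be checked with some rough arithmetic, again sketched in Python rather than Stata. Every input beyond the thread's own numbers is an assumption: that "17 GB" means 17 * 10^9 bytes, that the panel is balanced at 190 * 5000 * 3 rows, that the two model variables are stored as 8-byte doubles, and that the three group identifiers (ik, kt, it) are stored as 4-byte longs. The actual file may be laid out differently.

```python
# Figures from the thread.
file_bytes = 17 * 10**9          # ASSUMPTION: "17 GB" read as decimal gigabytes
ram_bytes = 8 * 10**9            # laptop RAM, same convention
max_obs = 190 * 5000 * 3         # ASSUMPTION: fully balanced panel

# Average bytes stored per row if the whole file holds max_obs rows.
bytes_per_row = file_bytes / max_obs

# Hypothetical minimal layout for the posted regression:
# trade_value_gap and tariff as doubles (8 bytes each),
# ik, kt, it as longs (4 bytes each).
needed_per_row = 2 * 8 + 3 * 4
needed_bytes = needed_per_row * max_obs

print(f"file is {file_bytes / ram_bytes:.1f}x the size of RAM")
print(f"average bytes per row in the file: {bytes_per_row:,.0f}")
print(f"bytes per row for the model variables: {needed_per_row}")
print(f"model variables alone: {needed_bytes / 10**6:.1f} MB")
```

If these assumptions roughly hold, the full file is about twice the size of RAM, while the variables used in the posted command would occupy well under 1 GB. That would make it worth checking how much of the file the regression actually needs before buying hardware.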
