  • Panel Data tips

    Hi People,

    I am working with a really huge administrative data set and I want to combine two files. One is about firms, with maybe 500,000 observations and over 1.4 GB in size; the other contains individual employee data and is much bigger, with nearly 10 million observations and about 4 GB in size. When I try to merge them, my memory is too low and Stata crashes completely. So do you have any tips for working with panel data like this? I want to calculate things such as the churning rate or the gross flow turnover rate. Would it perhaps make sense to keep only the real panel data, since that would reduce my number of observations immensely?

  • #2
    Presumably you will want to summarize employee data at the firm level, based on the sorts of things you are analyzing. One approach would be to use the collapse command to reduce the employee data to one observation per firm and then merge that much smaller file with your firm data.
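A minimal sketch of the collapse-then-merge idea follows. It is written in plain Python only to show the logic; the actual Stata commands are collapse and merge. The record layouts and the names firm_id, year, wage, n_employees and mean_wage are hypothetical. In Stata the same two steps might look roughly like `collapse (count) n_employees=wage (mean) mean_wage=wage, by(firm_id year)` on the employee file. That would be followed by `merge 1:1 firm_id year using firms`, again with hypothetical variable and file names, and assuming the firm file is unique on firm_id and year.

```python
from collections import defaultdict

# Hypothetical employee records: (firm_id, year, wage).
# In the real data there would be roughly 10 million of these.
employees = [
    (1, 2010, 30000.0),
    (1, 2010, 42000.0),
    (2, 2010, 35000.0),
]

# Hypothetical firm records keyed by (firm_id, year).
firms = {
    (1, 2010): {"industry": "A"},
    (2, 2010): {"industry": "B"},
}


def collapse_by_firm(rows):
    """Reduce worker-level rows to one summary per (firm_id, year),
    roughly what Stata's collapse with by(firm_id year) does."""
    counts = defaultdict(int)
    wage_sums = defaultdict(float)
    for firm_id, year, wage in rows:
        key = (firm_id, year)
        counts[key] += 1
        wage_sums[key] += wage
    return {
        key: {"n_employees": counts[key], "mean_wage": wage_sums[key] / counts[key]}
        for key in counts
    }


def merge_one_to_one(firm_rows, summaries):
    """Attach each firm-level summary to the matching firm record (a 1:1 merge).
    Firms with no employee rows simply get no summary fields."""
    return {key: {**firm, **summaries.get(key, {})} for key, firm in firm_rows.items()}


summary = collapse_by_firm(employees)
merged = merge_one_to_one(firms, summary)
print(merged[(1, 2010)])  # {'industry': 'A', 'n_employees': 2, 'mean_wage': 36000.0}
```

The large file is reduced to one row per firm-year before anything is joined. The merge therefore only ever handles firm-sized data.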


    • #3
      How much memory does your computer have?


      • #4
        Do you need all the variables too?
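Loading only the variables that are actually needed is usually the first and cheapest saving. In Stata, `use varlist using filename` reads just the listed variables from a .dta file. The sketch below shows the same idea on a streamed CSV in Python. The column names, the data and the helper load_columns are hypothetical.

```python
import csv
import io


def load_columns(handle, keep):
    """Stream a CSV and keep only the named columns (hypothetical helper),
    so memory grows with the kept variables rather than with every variable."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in keep} for row in reader]


# Hypothetical extract with more columns than the analysis needs.
raw = io.StringIO(
    "person_id,firm_id,year,wage,address,notes\n"
    "101,1,2010,30000,Main St,n/a\n"
    "102,1,2010,42000,High St,n/a\n"
)
rows = load_columns(raw, ["person_id", "firm_id", "year"])
print(rows[0])  # {'person_id': '101', 'firm_id': '1', 'year': '2010'}
```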


        • #5
          Oh, I think I have about 5 GB. As for the variables: no, not really. But I think it would be better to work on one data set and make it smaller before merging it with the other, since the merge doesn't work as it is. Or?
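A rough back-of-the-envelope estimate, using only the sizes quoted in the first post, suggests why the merge fails on a machine with about 5 GB. It rests on three loose assumptions. First, file size approximates in-memory size. Second, Stata must hold the whole data set in RAM. Third, a worker-level merge copies every firm variable onto every employee row. The five 8-byte summary variables in the collapsed case are hypothetical.

```python
GB = 1e9  # decimal gigabytes; the thread's figures are approximate anyway

firm_obs, firm_bytes = 500_000, 1.4 * GB
emp_obs, emp_bytes = 10_000_000, 4.0 * GB
ram_bytes = 5 * GB

firm_width = firm_bytes / firm_obs  # about 2,800 bytes per firm observation
emp_width = emp_bytes / emp_obs     # about 400 bytes per employee observation

# Worker-level (m:1) merge: every employee row also carries all firm variables.
merged_worker_level = emp_obs * (emp_width + firm_width)

# Collapse first: hypothetical 5 summary variables stored as 8-byte doubles,
# one row per firm observation, then a 1:1 merge.
summary_width = 5 * 8
merged_firm_level = firm_obs * (firm_width + summary_width)

print(f"worker-level merge: {merged_worker_level / GB:.1f} GB")
print(f"firm-level merge:   {merged_firm_level / GB:.2f} GB")
print(f"RAM:                {ram_bytes / GB:.0f} GB")
```

Under these assumptions the worker-level merge needs roughly 32 GB, while collapsing first gives about 1.4 GB. Shrinking the data before the merge therefore matters far more than the exact amount of RAM.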


          • #6
            Your final post suggests a summary might help. If your calculations could reduce the number of variables you need to keep, do them first. Then, as Nick indicated, delete unneeded variables etc. before merging. As William suggests, if you're going to collapse the data to the firm level eventually, you may be able to do that before the merge saving piles of space.
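One example of doing the calculations first is computing the flow measures at the firm level before any merge. The sketch below assumes one common convention from the worker-flow literature, which may differ from the definitions you need:

- Hires H are workers at the firm in year t but not in t-1; separations S are the reverse.
- The worker turnover rate is (H + S) divided by average employment.
- Churning is H + S - |E_t - E_{t-1}|, i.e. worker flows in excess of the net job change, also divided by average employment.

The (person_id, firm_id, year) layout and the function name firm_flows are hypothetical.

```python
from collections import defaultdict


def firm_flows(panel, year):
    """panel: iterable of (person_id, firm_id, year) rows (hypothetical layout).
    Returns {firm_id: flows and rates} for the change from year-1 to year,
    under the assumed definitions described above."""
    workers = defaultdict(set)  # (firm_id, year) -> set of person_ids
    for person_id, firm_id, yr in panel:
        workers[(firm_id, yr)].add(person_id)

    firms = {f for (f, yr) in workers if yr in (year - 1, year)}
    result = {}
    for f in firms:
        before = workers.get((f, year - 1), set())
        after = workers.get((f, year), set())
        hires = len(after - before)
        seps = len(before - after)
        job_flow = abs(len(after) - len(before))  # net employment change
        avg_emp = (len(before) + len(after)) / 2  # > 0: firm has workers in some year
        churn = hires + seps - job_flow
        result[f] = {
            "hires": hires,
            "separations": seps,
            "churning": churn,
            "turnover_rate": (hires + seps) / avg_emp,
            "churning_rate": churn / avg_emp,
        }
    return result


# Hypothetical panel: person 1 moves from firm 1 to firm 2 between 2010 and 2011.
panel = [
    (1, 1, 2010), (2, 1, 2010), (3, 1, 2010),
    (2, 1, 2011), (3, 1, 2011), (4, 1, 2011), (5, 1, 2011),
    (6, 2, 2010), (7, 2, 2010),
    (6, 2, 2011), (1, 2, 2011),
]
flows = firm_flows(panel, 2011)
print(flows[2])
```

The result has one row per firm and year, so it can be merged with the firm file cheaply. The worker file never has to be combined with all of the firm variables.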

            Also look at the compress command. It may be that your variables are not stored in the most efficient manner. Also check whether you have numeric variables (or ones you will make numeric) stored as strings; converting them to numbers (see the real() function) can save a lot of space.
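The sketch below is a simplified illustration of what compress does for integer-valued variables. It picks the smallest Stata integer type that holds a column's range: byte, int and long use 1, 2 and 4 bytes, and a strN variable uses N bytes per observation. It then computes what converting a hypothetical 9-digit identifier from str9 to long would save across 10 million rows. It ignores floating-point variables, which compress also handles.

One caution: generate creates float variables by default, and a float stores integers exactly only up to 16,777,216. Converting long identifiers with real() therefore needs something like `generate long firm_num = real(firm_str)` or `generate double` (variable names hypothetical).

```python
# Storage size and integer range of Stata's integer types (see `help data_types`).
INT_TYPES = [
    ("byte", 1, -127, 100),
    ("int", 2, -32_767, 32_740),
    ("long", 4, -2_147_483_647, 2_147_483_620),
]


def smallest_int_type(values):
    """Return (type, bytes) of the smallest Stata integer type holding all values:
    a simplified version of the choice compress makes for integer-valued variables."""
    lo, hi = min(values), max(values)
    for name, size, tmin, tmax in INT_TYPES:
        if tmin <= lo and hi <= tmax:
            return name, size
    return "double", 8  # out of long range: keep as double


def bytes_saved(n_obs, old_bytes, new_bytes):
    """Bytes saved by changing a variable's storage width across n_obs rows."""
    return n_obs * (old_bytes - new_bytes)


# Hypothetical firm identifier stored as a 9-character string (str9).
ids_as_strings = ["100000001", "100000002", "999999999"]
ids_as_numbers = [int(s) for s in ids_as_strings]  # what real() would return
type_name, size = smallest_int_type(ids_as_numbers)
print(type_name, size)  # long 4
print(bytes_saved(10_000_000, 9, size) / 1e6, "MB saved")  # 50.0 MB saved
```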
