I am working with a very large dataset of 2 million rows. To Stata's credit, I can generate and replace variables quickly.
Yet after I multiply impute the data (3 times), I find that simple commands slow to a crawl. For example, "mi xeq: gen newvar = oldvar + 1" takes upwards of 10 minutes.
I wonder if this is due to my having the mi data in flong style. In flong style, the dataset has 8 million rows, while in wide style the dataset has just 2 million rows (but more columns). Is there any reason to expect faster processing when mi data are wide than when they are flong? More generally, is there any reason to expect faster processing when there are fewer rows but more columns?
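For reference, this is roughly the timing comparison I have in mind (a minimal sketch; newvar1/newvar2 are placeholder names, and it assumes the data can be converted between styles with mi convert):

```stata
* time the command while the data are in flong style
timer clear
timer on 1
mi xeq: gen newvar1 = oldvar + 1
timer off 1

* convert to wide style and time the same operation
mi convert wide, clear
timer on 2
mi xeq: gen newvar2 = oldvar + 1
timer off 2

timer list
```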