Clyde, I have been running the program, but it takes so long ... more than 20 minutes, and at times it crashes my computer. Is it supposed to take this long?
It depends on the size of the files. 20 minutes to combine a large number of large files sounds quite reasonable, even quick, to me. It could be many hours if they're really big. But if they're small files, no.
Also, is Stata really "crashing" or is it just taking a long time and not showing you any output and you're losing patience?
It is crashing/freezing. I can see it working up to a point (about 10 files in), and then it freezes, forcing a restart. I checked the code, and it works by itself up through decoding the variables, so the looping through the files and/or the appending is probably the culprit.
There is no reason that the looping or appending would be the culprit if it is working up through the 10 files. If after letting it run overnight it's still stuck at the 10th file, then more likely there is something wrong with the 10th data file itself. Take a look at that file and see if you can find it. One possibility is that that file doesn't actually have any variables with value labels--in that case `r(varlist)' will be empty and the code will break at the -foreach- statement. But then you would also get an error message--not a freeze-up or crash.
That said, I ask you: how do you know that Stata is freezing? If it is doing a very long read or write operation (and it has to do some extremely long write operations as the appended file grows during each loop, writing out all of the appended contents up to that point when it reaches the -save- command), it can look like it's freezing, but it isn't. I think it is more likely that you are just expecting things to go faster than your computer can actually do. Reading and writing is very slow.
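For reference, here is a minimal sketch of the kind of loop under discussion, with a guard added for files that have no value-labeled variables. The directory-listing line, the tempfile handling, and the combined.dta name are assumptions for illustration, not the original program.
Code:
set more off
local filenames : dir "." files "*.dta"   // assumed: build the list of files to combine
tempfile building
local first = 1
foreach f of local filenames {
    use `"`f'"', clear
    ds, has(vallabel)
    local vars `r(varlist)'
    if `"`vars'"' != "" {                 // guard: skip decoding if no labeled variables
        foreach v of varlist `vars' {
            decode `v', gen(s_`v')
            drop `v'
        }
    }
    if `first' {
        save `"`building'"'               // first file starts the combined data set
        local first = 0
    }
    else {
        append using `"`building'"'       // later files are appended, then re-saved
        save `"`building'"', replace
    }
}
use `"`building'"', clear
save combined.dta, replace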
Great ... maybe it's my expectations. I will load it on my iMac, which I am not using at the moment, and leave it overnight ... maybe I was using the browser on my laptop, hence the limited resources. I recall that the first time it froze it gave a memory error, saying it had no more memory to work with. I gave it more resources, setting the memory to 1000m (that's about 1 gig). Is that enough? My machines have 16 gigs, with about 5 gigs available before starting Stata. Should I allocate more memory for the task?
I don't quite know in what way you're doing this memory allocation. Recent versions of Stata do memory allocation automatically: they ask the OS for more memory when they need it. If you are nevertheless allocating memory yourself, how much you need will evidently depend on the size of your data sets. Bear in mind that at the last iteration of the -foreach f of local filenames- loop, the fully appended data set will need to be accommodated in memory, along with whatever space Stata needs for itself. Moreover, while the -decode-ing is going on, and before the original variables are dropped, the data set will be even a bit larger, perhaps a lot larger if the strings are long. An int variable is 2 bytes, a long or float is 4, and a double is 8. If on decoding the string is longer than that, then even after the -decode-ing is done and the original numeric variables are dropped, the data set will have expanded somewhat.
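As a rough way to see how much each variable costs, and how much -decode- adds, one can look at any one of the files (the file and variable names below are hypothetical):
Code:
use "one_of_the_files.dta", clear
describe                          // storage type of each variable: byte, int, long, float, double, str#
memory                            // how much memory the data currently occupy
decode somevar, gen(s_somevar)    // somevar stands in for one of the value-labeled variables
describe s_somevar                // the str# width is the new per-observation cost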
Thanks, Clyde, for your assistance. I realized late last night that memory can be allocated automatically, so I left it alone, basically reversing my previous -set max_memory- command by setting it back to missing, which according to the manual means Stata takes whatever the operating system will give it.
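For anyone following along, the command in question is -set max_memory-; setting it to missing (.) removes the fixed cap, so Stata asks the operating system for memory as it needs it:
Code:
set max_memory .    // no fixed cap; limited only by what the OS will supply
query memory        // confirm the current memory settings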
I let it run all night until about 12pm today, when it halted with an error indicating that the computer had run out of memory. So that's the result after 12 hours.
Certainly, the dataset has expanded quite a bit. Even on a newly formatted iMac (16 GB memory; 1 TB HD; 2.7 GHz Intel Core i7) running only Stata, the system still ran out of resources.
I will try to learn a bit about how to make this run more efficiently, and I will try it on my office computer tomorrow to see if it runs out of resources there as well.
In the meantime, I think the only other option is probably to run this manually:
Open each dataset and run this code, 30 times in all:
Code:
set more off
ds, has(vallabel)                // find all variables that carry value labels
local vars `r(varlist)'
foreach v of varlist `vars' {
    decode `v', gen(s_`v')       // create a string version of the labeled variable
    drop `v'                     // drop the original numeric version
}
Thanks for your assistance. I appreciate all the time you've spent with me, and welcome any further ideas.
One other thing that might help a bit is to add a -compress- command right after the -use `"`f'"', clear- command. If your datasets have a lot of variables that take up more space than they truly need, you may save a lot of memory.
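In the loop, that placement would look roughly like this (the -use- line is quoted from the thread; the rest of the loop body is elided):
Code:
foreach f of local filenames {
    use `"`f'"', clear
    compress            // shrink each variable to its smallest adequate storage type
    * ... decode, append, and save as in the original loop ...
}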
Working with large datasets can certainly be painful!