Accidentally deleted ethnic group from ethnicity variable

Zach Goldberg

Join Date: Jul 2017

Posts: 184
#1


01 Sep 2017, 15:58

Greetings,

To shorten and simplify a cross-tabulation of ethnic group by political ideology, I used -drop- to remove Native Americans (n = 27; total n = 4,200), but a few commands later I accidentally ran -save-. What's the quickest way to recover the complete variable? I have a do-file that I could re-execute, but it is incomplete: out of laziness, I occasionally type commands directly into the Command window (lesson learned). Any suggestions? Thanks in advance for the help!
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

01 Sep 2017, 16:46

It seems you have bad news. If I understand correctly, you accidentally dropped observations along with their entire content.

If so, unless the do-file contains the commands to create the data set from scratch, I can't see a way to recover the lost observations.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#3

01 Sep 2017, 18:10

Well, you may be able to recover it if a log file was running both when you originally created the data set with your do-file and while you entered commands in the Command window.

You can open the log file in the Do-file Editor or, for that matter, any other text editor. Every command run while the log was open is echoed there, and if you edit out the ". " prompt that precedes each command, the line numbers, and the Stata output, you will be left with an accurate and complete record of what was done. You can then save that as a do-file and run it to regenerate your data set.
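For future sessions, a minimal sketch, assuming you keep working partly in the Command window: Stata's -cmdlog- writes only the commands you type, without the output or the ". " prompt, so it spares most of the cleanup described above, while -log- keeps the full record with output. Both can be open at the same time. The file names below are hypothetical, and -sysuse auto- merely stands in for real work.

```stata
* Type these in the Command window at the start of each session.
* File names are hypothetical; -append- keeps adding to the same files.
log using session_full.log, text append
cmdlog using session_cmds.txt, append

* ... interactive work, for example:
sysuse auto, clear
drop if foreign == 1

* At the end of the session, close both logs
cmdlog close
log close

* Display the command log; its lines can be pasted into a do-file
type session_cmds.txt
```

Pasting the command log into the project's do-file at the end of each session, and deleting any commands that were mistakes, keeps the do-file complete.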

If you didn't have a log file running at the time, then I don't think there is any way you can get the lost data back.

By the way, I fear you may learn the wrong lesson here, or rather, an incomplete one. Even if you had never made this -save- error and your data set remained completely intact, you would still be operating well below professional standards by not having an audit trail of how you created that data set. If, several months from now, somebody questions your data and wants to know how you created it, do you really think you will remember everything you did without having documented it? Why should anyone have any confidence in this data set or any analyses generated from it? Your serious mistake was not the accidental -save-. The real problem was failing to document the creation of the data set completely and reproducibly in the first place.

I know you're probably already feeling bad about what happened, and I'm sorry for piling on while you're down, but it's important that you draw the right lesson from this. Trust me, you will accidentally overwrite an important data file again in the future, probably several times. To err is human. But if you make it a compulsive habit to always document your data set creation in do-files, there will be no harm done.
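A minimal sketch of such a do-file, assuming a single raw file that is read but never modified: every step from the raw data to the analysis file is written down, so the analysis file can be deleted or overwritten and simply rebuilt. The file names are hypothetical, and -sysuse auto- stands in for the raw data.

```stata
* build_data.do: hypothetical do-file that rebuilds the analysis data
* set from the raw data every time it is run
clear all
capture log close
log using build_data.log, text replace

* sysuse auto stands in for reading the raw data file
sysuse auto, clear

* Every cleaning step lives here, not only in the Command window
keep make price mpg rep78 foreign
generate price_k = price / 1000
label variable price_k "Price (thousands of dollars)"

* Save to a derived file; the raw data are never overwritten
save analysis_data.dta, replace

log close
```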

Also, if these data were important, why didn't you have a backup copy? That is another best practice to follow in the future.
Red Owl

Join Date: Nov 2016

Posts: 127
#4

01 Sep 2017, 18:32

Zach,

Both Dropbox and Google Drive have version histories that allow one to recover previous versions of files created or modified in the past 30 days.

If your data set happened to be stored on one of those services, or on another service with a file-reversion feature, within the past 30 days, you should be able to recover an earlier version of it. I always keep important data in a storage system that has a reversion feature.

Good luck.

Red Owl
Zach Goldberg

Join Date: Jul 2017

Posts: 184
#5

01 Sep 2017, 19:02

Thanks for the feedback, everyone. Clyde: you are right; I have to become more disciplined about this. Points well taken.

Is there really no way for me to merge the observations from a fresh data-set?
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#6

01 Sep 2017, 20:09

Quoting Zach Goldberg: "Is there really no way for me to merge the observations from a fresh data-set?"

Are you saying there is some other data set that has the variables you lost? If so, and if there is also a variable (or set of variables) that the damaged data set and this one have in common and that uniquely identifies observations in at least one of the data sets, then you may be able to reconstruct it using the -merge- command. If you are not familiar with how to do that, post some example data from both data sets (the damaged one and the one that has the lost variables) for more detailed advice.
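A minimal sketch of that reconstruction, assuming a complete copy of the data (here the hypothetical fresh.dta) and an identifier id that is unique in both files; -sysuse auto- and the -drop- merely simulate the situation. With the damaged file as master, observations found only in the fresh file get _merge == 2, and those are exactly the ones that were dropped.

```stata
* Simulated set-up: fresh.dta is complete, damaged.dta lost some
* observations, and id uniquely identifies observations in both
sysuse auto, clear
generate id = _n
save fresh.dta, replace
drop if foreign == 1          // simulate the accidental -drop-
save damaged.dta, replace

* Merge the damaged file (master) with the fresh one (using)
use damaged.dta, clear
merge 1:1 id using fresh.dta

* _merge == 2 flags observations present only in fresh.dta,
* i.e. the ones that had been dropped
tabulate _merge
list id make if _merge == 2
```

For variables in both files, -merge- keeps the master's values by default, so any variable created in damaged.dta after the drop will be missing for the recovered observations and has to be regenerated.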
Richard Williams

Join Date: Apr 2014

Posts: 4983
#7

01 Sep 2017, 21:11

I'd be worried that the process is getting more complicated than just restarting. Whatever you do, make copies of what you have now in case your next steps make things even worse.
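A minimal sketch of that precaution, with hypothetical file names (-sysuse auto- only creates a file to copy): Stata's -copy- duplicates the file on disk before any repair is attempted.

```stata
* damaged.dta stands in for the data file as it is now
sysuse auto, clear
save damaged.dta, replace

* Keep an untouched copy before trying any repair
copy damaged.dta damaged_before_repair.dta, replace
```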

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 4983
#8

01 Sep 2017, 21:15

I second Red Owl's recommendation of Dropbox or something similar. All my active work is usually done in a Dropbox folder. More than once I've gone back to an earlier version because I thought I had screwed something up.

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#9

02 Sep 2017, 04:47

A simple preventive strategy is to keep the original data untouched and make a working copy of it. After each important step in data management, save the 'good' version, documented with -notes-, under a file name with a sequence number that conveys the order of the changes, for example: original.dta, original_step1.dta, original_step2.dta.

To document the data-cleaning process itself, a do-file does the trick.

At each step, we copy the new data set as well as the edited do-file to a safe, remote location; in my case, Google Drive. Provided the original data set is preserved, every little thing is gonna be all right, as the song goes.
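A minimal sketch of this workflow, using the file names from the example above; -sysuse auto- stands in for the original data, the cleaning steps are arbitrary, and safe_folder is a hypothetical directory synced to a cloud service such as Google Drive.

```stata
* Create the pristine original only once; -save- without -replace-
* can never overwrite it afterwards
capture confirm file original.dta
if _rc {
    sysuse auto, clear
    save original.dta
}

* Step 1: work on a copy and document the change with -notes-
use original.dta, clear
drop if missing(rep78)
notes: step 1 - dropped observations with missing rep78
save original_step1.dta, replace

* Step 2: notes from step 1 travel with the data
use original_step1.dta, clear
generate price_k = price / 1000
notes: step 2 - created price_k (price in thousands)
save original_step2.dta, replace

* Copy the latest data set to the safe location
capture mkdir safe_folder
copy original_step2.dta safe_folder/original_step2.dta, replace
* The do-file would be copied the same way (hypothetical name):
* copy clean_data.do safe_folder/clean_data.do, replace

* Show the documented history stored in the data set
notes list
```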

Last edited by Marcos Almeida; 02 Sep 2017, 04:58.

Best regards,

Marcos
