  • My dataset has been corrupted and was automatically edited or changed. How can I fix this?

    Dear Statalist users,
    Since yesterday, my dataset has been behaving strangely. I have been running a lot of regression models during the past week, and have saved the dataset every day under a different name in my secured Dropbox folder. Yesterday, I noticed that the latest dataset has 2 fewer samples than my original dataset, and that extra samples have been added (either empty or containing weird data). In addition, the mean scores (variables that I created) for some samples have been converted to something like 149e-99. They should be between 1 and 5, as this is a continuous variable.

    I also opened the original dataset, which I have not edited since May 2021, and noticed that the data for one of the samples has been completely wiped or converted to 0.

    I have not done this myself, and I don't know what has happened. If I cannot use these datasets now, I would need to request the original dataset and re-create all the variables that I have created - which took me 5 months.

    I wonder if this is because I was using Stata 14.6 version IC and I may have exceeded the number of variables that the software allows me to create? I just ordered a student upgrade to Stata BE, but I don't know if this would fix the damaged dataset.

    Please could anyone help me if you have experienced a similar problem before?
    Thanks. Have a nice day.
    Rinko
    Last edited by Rinko Kinoshita; 21 Aug 2021, 08:55.

  • #2
    Welcome to Statalist.

    Sorry about the possible corruption. I don't have a solution, as very little information was provided (for example, we don't know what code was run or what was saved, and any of that could have affected the data in ways we can't determine from the question). However, I have a comment regarding:

    I would need to request the original dataset and re-create all the variables that I have created - which took me 5 months.
    If you have not been documenting your analysis in a "do-file", please start doing so from now on. If you run the command "help gs", you should see a chapter called something like "Using the Do-file Editor", which will walk you through the basics. Do-files can serve as a record of all your data management and analysis. If anything goes wrong, simply rerunning the do-file will recreate the whole process. That way, you don't need to spend another 5 months repeating the work.

    And another tip: always keep a backup of the original data. Do not keep overwriting or editing the same dataset (and, worse, doing so without a do-file), as this practice can create untraceable changes that cause serious problems down the line.
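
    As a rough illustration of this workflow, here is a minimal do-file sketch. The file names (original_data.dta, analysis_data.dta, build_analysis.log) and the variables q1-q5 and meanscore are hypothetical placeholders, not taken from the poster's data. The idea is only that the original file is read but never saved over, and that every derived variable is created by code that can be rerun.

    ```stata
    * build_analysis.do: rebuilds the analysis file from the untouched original.
    * All file and variable names below are hypothetical placeholders.
    version 14
    clear all
    capture log close
    log using "build_analysis.log", replace text

    use "original_data.dta", clear       // read only; never save over this file
    describe, short                      // log the number of observations and variables

    * Example derived variable: row mean of items q1-q5 (assumed to be scored 1 to 5)
    egen meanscore = rowmean(q1 q2 q3 q4 q5)

    * Stop with an error if any non-missing mean score falls outside 1 to 5
    assert inrange(meanscore, 1, 5) if !missing(meanscore)

    save "analysis_data.dta", replace    // derived file, safe to overwrite
    log close
    ```

    With a file like this, the derived dataset can be thrown away and rebuilt at any time, and the assert line would flag impossible values such as 149e-99 as soon as the do-file is run.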


    • #3
      I doubt that Stata was the cause of corruption, and it's more likely that your data was not synced completely between saves, or the sync process was interrupted. Try to roll back to a previous version if you can.
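
      One possible way to check whether a restored copy still holds the data you expect is Stata's datasignature command. This is only a sketch: the file name analysis_data.dta is a hypothetical placeholder, and it assumes a signature was stored while the file was still known to be good.

      ```stata
      * Run once while the file is known to be good (file name is hypothetical)
      use "analysis_data.dta", clear
      datasignature set                   // store a signature of the data in the dataset
      save "analysis_data.dta", replace   // save so the signature is kept with the file

      * Later, before trusting a copy restored from the sync service
      use "analysis_data.dta", clear
      datasignature confirm               // stops with an error if the data have changed
      ```

      A file that was cut short during syncing may fail to open at all, so this check mainly helps with copies that load but whose contents have silently changed.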


      • #4
        Ken Chui Thanks. I have do-files, but they are not really well organized. I will fix that from now on. The possible corruption appeared while I was saving the dataset every day under a different name, because I didn't want to overwrite it.
        Anyway, I am redoing the coding and have ordered a newer version of Stata. Thanks, Rinko


        • #5
          Leonardo Guizzetti Thanks for your help. I think you may be right that the files were not fully synchronised. I followed your advice and went back to the previous version to save as many of the variables I had already created as possible. Thanks, Rinko
