
  • create dataset copy with no original observations but with new observations - including the full range of unique values and their labels

    I am trying to create a copy of a dataset (123,005 observations, 469 variables - including long, double, and string variables).

    I want the new copy to include all the variables of the original dataset as well as the full range of unique values and their labels.

    However, I don't want the new copy of the dataset to contain any of the original observations (participants) - so I want all observations to be fabricated/random.

    The reason for this is that I cannot work on the 'real' data remotely due to data regulations. I could work remotely on a version of the dataset that contained the full range of possible values and the metadata but none of the original observations.

    I have tried dropping all observations with "drop in 1/123005", which keeps the column names, but because there are no observations left there are no values either.

    From what I can tell, the only way to do this would be to create enough dummy observations so that the full range of possible values existed. But doing this manually for 469 columns with various ranges and labels is not feasible.

    Any help or advice in the right direction from anyone who has done anything of this sort before would be very much appreciated!

  • #2
    I am using Stata/MP 17.0.



    • #3
      One solution that occurs to me is to permute (shuffle) each variable's values separately. That ensures all the original values are still represented for each variable, but the values no longer line up across variables, so no individual's original record is kept intact. There is a community-contributed command, -shufflevar- (see -ssc describe shufflevar-; install it with -ssc install shufflevar-), that will do this for a single variable or for each one of a whole varlist, the latter being what I believe you want.

      Code:
      set seed 4843889
      shufflevar *
      save newfile.dta
      This might take a few minutes to do on a file of the size you have.



      • #4
        Thank you very much. I think your code worked but for the fact that value labels have been removed. I don't suppose you have any idea how to keep value labels?



        • #5
          Whoops, the dropping of the value labels wasn't mentioned in the documentation. You could try this, handling the variables one at a time.
          Code:
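          foreach v of varlist * {
             local vlab: value label `v'  // save the name of the attached value label, if any
             shufflevar `v', dropold      // creates `v'_shuffled and drops the original
             // unlabeled variables (e.g. strings) have nothing to reattach
             if "`vlab'" != "" label values `v'_shuffled `vlab'  // reattach the label
          }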
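
          The loop above leaves every variable renamed with a _shuffled suffix. If you want the copy to keep the original variable names and variable labels as well, something along these lines might work. It is an untested sketch that assumes shufflevar always names its output `v'_shuffled, that no existing variable already has such a name, and that every original name is short enough for the suffix to fit within Stata's 32-character limit. Whether shufflevar carries variable labels over on its own is not something I have checked, so the sketch simply reapplies them.
          Code:
          foreach v of varlist * {
             local vlab: value label `v'        // name of the attached value label, if any
             local varlab: variable label `v'   // text of the variable label, if any
             shufflevar `v', dropold            // assumed to create `v'_shuffled and drop `v'
             rename `v'_shuffled `v'            // restore the original name
             if "`vlab'" != "" label values `v' `vlab'   // reattach the value label
             label variable `v' `"`varlab'"'    // reapply the variable label (empty clears it)
          }
          Running -describe- and -label list- afterwards is a quick way to confirm that the names and labels match the original file.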
