  • reshape to wide with blanks for all but the first row for each unique id

    I have a dataset with 140 variables and 916 observations for 86 unique ids. I want to reshape to wide but get an error because other variables are not constant within id. This is because the data for most variables are recorded only in the first row for each id. I was using the command replace var1 = var1[_n-1] if var1 == "", but this is taking too long. Is there an easier way to do this?

  • #2
    Code:
    sort id, stable
    by id: replace var1 = var1[1]
    will be a bit faster.
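
    With 140 variables, typing that -replace- once per variable gets tedious, so here is a rough sketch of the same idea in a loop. The names visit and score are hypothetical stand-ins for whatever variables really do vary within id in your data; only id and the first-row pattern come from your description.

    ```stata
    * Keep the original row order within id, so [1] is the row holding the data
    sort id, stable

    * Hypothetical: visit and score genuinely vary within id, so leave them alone.
    * -ds ..., not- lists every variable except the ones named.
    ds id visit score, not

    * Copy each id's first-row value down to the rest of that id's rows
    foreach v of varlist `r(varlist)' {
        by id: replace `v' = `v'[1]
    }
    ```

    This overwrites every row with the first row's value, so it is only safe for variables that are truly supposed to be constant within id.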

    But if you think that is slow, wait till you actually get to the -reshape-! While I always counsel patience, you might consider installing the -gtools- suite from SSC. That will give you the -greshape- command, which is faster and supports the same syntax.
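
    For concreteness, a minimal sketch of what that might look like. The stub score and the within-id index visit are hypothetical names, not taken from your post, and I am assuming the -reshape-style i() and j() options, which to my understanding -greshape- accepts alongside its own by() and keys().

    ```stata
    * One-time installation of the gtools suite from SSC
    ssc install gtools

    * Hypothetical: score is the stub that varies within id, visit indexes the rows.
    * All other variables must already be constant within id.
    greshape wide score, i(id) j(visit)
    ```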

    That said, why are you -reshape-ing this data set to wide? There are only a few things that can be efficiently done in Stata (or done at all) in wide layout. The long layout is by far more effective for nearly all data management and analysis in Stata. Unless you know for a fact that you will be doing some specific thing that requires the wide layout, you should leave your data long.

    I should add that the -replace- commands to spread the constant-within-id variables over the missing values should be done regardless: a proper long layout should have them there. It sounds like you are in the position of starting with a dysfunctional, incomplete long data set and you are trying to convert it to a wide data set which will also be dysfunctional. Better to make it a proper long data set (with the -replace-ment of missing values) that will prove useful for further management and analysis.
    Last edited by Clyde Schechter; 01 Jun 2023, 22:31.
