Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Preserving data when stacking

    Hi all

    I have many variables which need to be stacked. Each one of these variables has a few associated variables in the same observation which I need to preserve, though multiple of the variables to be stacked are associated with a single set of variables. There's lots of crossover and I'm not sure what to do. In the example below, fooC1 and fooC2 need to be stacked on each other, barC1 and barC2 have to be stacked on each other and fooV and barV, as well as year and city, must be the same for each of the new observations created, to create the result given. I think I should be using reshape, but I'm not sure how to do this with so many variables at once.

    Year City fooV barV fooC1 fooC2 barC1 barC2
    1997 NY 528 629 Annie Brian Carly Daniel
    Goes to
    Year City C V
    1997 NY Annie 528
    1997 NY Brian 528
    1997 NY Carly 629
    1997 NY Daniel 629
    Last edited by Laurence Selwyn; 12 Jul 2021, 08:48.

  • #2
    Hello Laurence. I had some success replicating your ending condition. Just a disclaimer, I'm not an expert by any means, and would consider myself a beginner to intermediate user, and there are certainly more elegant ways to do this, I just don't know what they are.
    Year City fooV barV fooC1 fooC2 barC1 barC2
    1997 NY 528 629 Annie Brian Carly Daniel
    Code:
    rename fooc1 first1
    
    rename fooc2 second1
    
    rename barc1 first2
    
    rename barc2 second2
    
    gen ID = 1
    
    reshape long first second, i(ID) j(new_var)
    Up to this point, we now have the grouped names on one observation. Now we need to go out of our way to make sure the number we want grouped with them stays with them. I chose to generate a variable, divide by it, and check if it's an integer. There are other, better, ways, I'm sure. Your data should now resemble the following

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float ID byte new_var int year str2 city int(foov barv) str5 first str6 second
    1 1 1997 "NY" 528 629 "Annie" "Brian"
    1 2 1997 "NY" 528 629 "Carly" "Daniel"
    end
    And we can perform some logic to leave what we need.

    Code:
    gen n = _n
    
    gen div = n/2
    
    replace barv = . if div != int(div)
    
    replace foov = . if div ==  int(div)
    
    drop ID n div
    And the final reshape

    Code:
    rename first C1
    
    rename second C2
    
    reshape long C, i(new_var) j(j_var)
    
    drop new_var j_var
    Effectively, we've solved your problem. If you want your foov and barv to be one variable, you can use

    Code:
    egen V = rowtotal(foov barv)
    drop foov barv
    And your data should resemble the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str2 city str6 C float V
    1997 "NY" "Annie"  528
    1997 "NY" "Brian"  528
    1997 "NY" "Carly"  629
    1997 "NY" "Daniel" 629
    end
    I hope this helps. If I broke any community rules, please feel free to let me know, I'm a starter here.

    Comment


    • #3
      Hi Eric, thanks for your help

      The dataset I am using was much much larger than the example, but what you described really helped me out. There's a chance that more or less than two of each of the C variables could exist, so I needed to dynamically program it, but was able to figure out a method in the end, based on what you suggested. Thanks ever so much

      Comment

      Working...
      X