Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Shift missing values to end of columns, indent expressions from end to front, sort expressions

    Hi all,

    I have a large time series data set in this form:

    id name year price1 ratio1 price2 ratio2 priceN ratioN
    1 A 1888 0.5 0.5 . . 0.7 0.7
    1 A 1889 0.1 . 0.2 0.3 . .
    2 B 1888 . 0.8 . . 0.9 0.8
    2 B 1889 . . 0.4 . 0.5 0.5



    with N=2200.

    Since there are a lot of missing values and I am almost at the column limit, I want to shift those expressions that are missing to the end and indent those expressions that are at the end. It would be best if the values were sorted in descending order along the variables price or ratio, so that

    id name year price1 ratio1 price2 ratio2 priceN ratioN
    1 A 1888 0.7 0.7 0.5 0.5 . .
    1 A 1889 0.2 0.3 0.1 . . .
    2 B 1888 0.9 0.8 . 0.8 . .
    2 B 1889 0.5 0.5 0.4 . . .


    Any idea how I should proceed?
    I would be very grateful for any help, tips and tricks.

    Kind regards,
    Angelica



  • #2
    I'd be pretty sure that what you ask for here will not be useful to you. For example, when you suggest that you want to shift values to the end (right hand end??, I presume), your variable names then become meaningless, as, for example, the value price1 might become price10. Perhaps there is some context here two values could be interchanged in this way without destroying the sense of the data, but I'd think that to be pretty unusual. My best guess is that you want to do something that would require a so-called "long layout" in Stata, in which the 1, 2, ..., N markers would be variables having some substantive meaning. In that context, the "column limit" (limit on number of variables) is not an issue.

    However, I think the problem is somewhat deeper, and I would suggest you step back for a minute and explain to us what you kind of an analysis you want to do with the data. Some substantive context will like be useful here. My suspicion is that everything you think you want to do by way of rearranging data would at best be irrelevant to your purposes, and more likely would be actively harmful.

    Also, as the system shows this is your first post here, I'd encourage you to take a look at the FAQ for new users (tab at the top left of the screen) in which, among other things, you can learn about using the -dataex- command to provide convenient example data.

    Comment


    • #3
      I agree with Mike's advice on the soundness of this. To address the problem as posed (assuming that the ordering 1,..., N here is arbitrary), you just need to reshape long the data, sort and then reshape wide,

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float id str3 name float(year price1 ratio1 price2 ratio2 price3 ratio3)
      1 "A" 1888 .5 .5  .  . .7 .7
      1 "A" 1889 .1  . .2 .3  .  .
      2 "B" 1888  . .8  .  . .9 .8
      2 "B" 1889  .  . .4  . .5 .5
      end
      
      reshape long price ratio, i(id name year) j(which)
      sort id name year price
      by id name year: replace which=_n
      reshape wide price ratio, i(id name year) j(which)
      Res.:

      Code:
      . l
      
           +------------------------------------------------------------------------+
           | id   name   year   price1   ratio1   price2   ratio2   price3   ratio3 |
           |------------------------------------------------------------------------|
        1. |  1      A   1888       .5       .5       .7       .7        .        . |
        2. |  1      A   1889       .1        .       .2       .3        .        . |
        3. |  2      B   1888       .9       .8        .       .8        .        . |
        4. |  2      B   1889       .4        .       .5       .5        .        . |
           +------------------------------------------------------------------------+
      
      .

      Comment


      • #4
        Thanks to you both. I made a mistake when providing the data (it's confidential so I have to come up with a comparable example). So there is a "customer" to each price and ratio, such that when values are interchanged, say price1 becomes price10, this does not matter as long as customer1 also becomes customer10. Here goes the adjusted version and Andrew's code:


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte id str1 name int year str6 customer1 float(price1 ratio1) str7 customer2 float(price2 ratio2) str7 customer3 float(price3 ratio3)
        1 "A" 1888 "Peter"  .5 .5 "Jessica"  .  . "Lindy"   .7 .7
        1 "A" 1889 "Anna"   .1  . "Josh"    .2 .3 "Jessica"  .  .
        2 "B" 1888 "Lindy"   . .8 "Hailey"   .  . "Joan"    .9 .8
        2 "B" 1889 "Harvey"  .  . "Hailey"  .4  . "Nancy"   .5 .5
        end
        
        reshape long customer price ratio, i(id name year) j(which)
        sort id name year price
        by id name year: replace which=_n
        reshape wide customer price ratio, i(id name year) j(which)
        This is already helpful as it puts to the end those customers with missing values:

        Code:
        list id name year customer1 price1 ratio1 customer2 price2 ratio2 customer3 price3 ratio3
        
             +---------------------------------------------------------------------------------------------------------------------+
             | id   name   year   custom~1      price1      ratio1   custom~2      price2      ratio2   custom~3   price3   ratio3 |
             |---------------------------------------------------------------------------------------------------------------------|
          1. |  1      A   1888      Peter          .5          .5      Lindy   .69999999   .69999999    Jessica        .        . |
          2. |  1      A   1889       Anna          .1           .       Josh          .2   .30000001    Jessica        .        . |
          3. |  2      B   1888       Joan   .89999998   .80000001      Lindy           .   .80000001     Hailey        .        . |
          4. |  2      B   1889     Hailey   .40000001           .      Nancy          .5          .5     Harvey        .        . |
             +---------------------------------------------------------------------------------------------------------------------+
        However, it would be great to have the order such that the customer with the highest price becomes customer1 in each year, followed by the second highest price by customer 2 and so on, yielding a table like:

        Code:
             +---------------------------------------------------------------------------------------------------------------------+
             | id   name   year   custom~1      price1      ratio1   custom~2      price2      ratio2   custom~3   price3   ratio3 |
             |---------------------------------------------------------------------------------------------------------------------|
          1. |  1      A   1888      Lindy   .69999999   .69999999      Peter          .5          .5    Jessica        .        . |
          2. |  1      A   1889       Josh          .2   .30000001       Anna          .1           .    Jessica        .        . |
          3. |  2      B   1888       Joan   .89999998   .80000001      Lindy           .   .80000001     Hailey        .        . |
          4. |  2      B   1889      Nancy          .5          .5     Hailey   .40000001           .     Harvey        .        . |
             +---------------------------------------------------------------------------------------------------------------------+
        Does that make more sense? Sorry about the decimal numbers, don't know what's wrong there.

        Comment


        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte id str1 name int year str6 customer1 float(price1 ratio1) str7 customer2 float(price2 ratio2) str7 customer3 float(price3 ratio3)
          1 "A" 1888 "Peter"  .5 .5 "Jessica"  .  . "Lindy"   .7 .7
          1 "A" 1889 "Anna"   .1  . "Josh"    .2 .3 "Jessica"  .  .
          2 "B" 1888 "Lindy"   . .8 "Hailey"   .  . "Joan"    .9 .8
          2 "B" 1889 "Harvey"  .  . "Hailey"  .4  . "Nancy"   .5 .5
          end
          
          reshape long customer price ratio, i(id name year) j(which)
          gen missing=missing(ratio)&missing(price)
          gsort id name year missing -price
          by id name year: replace which=_n
          drop missing
          reshape wide customer price ratio, i(id name year) j(which)
          Res.:

          Code:
          . l
          
               +---------------------------------------------------------------------------------------------------------+
               | id   name   year   custom~1   price1   ratio1   custom~2   price2   ratio2   custom~3   price3   ratio3 |
               |---------------------------------------------------------------------------------------------------------|
            1. |  1      A   1888      Lindy       .7       .7      Peter       .5       .5    Jessica        .        . |
            2. |  1      A   1889       Josh       .2       .3       Anna       .1        .    Jessica        .        . |
            3. |  2      B   1888       Joan       .9       .8      Lindy        .       .8     Hailey        .        . |
            4. |  2      B   1889      Nancy       .5       .5     Hailey       .4        .     Harvey        .        . |
               +---------------------------------------------------------------------------------------------------------+

          Comment


          • #6
            Excellent, many thanks.

            Comment

            Working...
            X