  • Collapse (mean) and data ordering

    Hi Statalist,

    I'm working with a data set that resembles the following:

    ID Income
    1 50
    1 40
    1 20
    2 10
    2 40
    2 50
    3 60
    3 20
    3 10

    I used collapse (mean) Income, by (ID)

    Now, after collapsing, the data appear in the following form:
    ID Income
    2 Mean(2)
    3 Mean(3)
    1 Mean(1)

    I need the output in the same order as before, i.e., in the form of
    ID Income
    1 Mean(1)
    2 Mean(2)
    3 Mean(3)
    What should I do to obtain the means in the same order as the original data?

    Thanks.

  • #2
    Titir:
    welcome to this forum.
    Code:
    sort ID
    should do the trick.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3

      It doesn't seem to be working. The IDs in the original data are not in ascending or descending order. They look like the following:
      ID Income
      97xx45 50
      97xx45 40
      97xx45 20
      65fg56 10
      65fg56 40
      65fg56 50
      99ty66 60
      99ty66 20
      99ty66 10
      Maybe that is why sort ID is not working?
      Last edited by Titir Bhattacharya; 01 Mar 2019, 12:11.


      • #4
        Titir:
        that's what I got from your data:
        Code:
        . input str20 ID Income
        
                               ID     Income
          1.  97xx45 50
          2.  97xx45 40
          3.  97xx45 20
          4.  65fg56 10
          5.  65fg56 40
          6.  65fg56 50
          7.  99ty66 60
          8.  99ty66 20
          9.  99ty66 10
         10. end
        
        . sort ID
        
        . list
        
             +-----------------+
             |     ID   Income |
             |-----------------|
          1. | 65fg56       50 |
          2. | 65fg56       10 |
          3. | 65fg56       40 |
          4. | 97xx45       20 |
          5. | 97xx45       50 |
             |-----------------|
          6. | 97xx45       40 |
          7. | 99ty66       20 |
          8. | 99ty66       60 |
          9. | 99ty66       10 |
             +-----------------+
        
        . collapse (mean) Income, by( ID)
        
        . list
        
             +-------------------+
             |     ID     Income |
             |-------------------|
          1. | 65fg56   33.33333 |
          2. | 97xx45   36.66667 |
          3. | 99ty66         30 |
             +-------------------+
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          If your data are not sorted by any variable in the dataset, you need to create a variable that they can then be sorted by.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str6 id byte income
          "97xx45" 50
          "97xx45" 40
          "97xx45" 20
          "65fg56" 10
          "65fg56" 40
          "65fg56" 50
          "99ty66" 60
          "99ty66" 20
          "99ty66" 10
          end
          
          generate sorted = _n
          collapse (firstnm) sorted (mean) income, by(id)
          sort sorted
          list, clean noobs
          Code:
          . list, clean noobs
          
                  id   sorted    income  
              97xx45        1   36.6667  
              65fg56        4   33.3333  
              99ty66        7        30
          Next, a piece of advice to improve your future posts. You've seen here that showing data that resembles your data gets an answer that resembles the correct answer. Take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
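
A variant of the approach above, offered only as a sketch: it uses (min) rather than (firstnm), so the result does not depend on the order of rows within each id, and it drops the helper variable once the original order is restored. The variable name order_first is a hypothetical choice, not something from the original post.

```stata
* Sketch only: same toy data as in the post above
clear
input str6 id byte income
"97xx45" 50
"97xx45" 40
"97xx45" 20
"65fg56" 10
"65fg56" 40
"65fg56" 50
"99ty66" 60
"99ty66" 20
"99ty66" 10
end

* order_first (hypothetical name) records each row's original position
generate long order_first = _n

* (min) keeps the earliest original position seen for each id
collapse (min) order_first (mean) income, by(id)

* restore first-appearance order, then discard the helper variable
sort order_first
drop order_first
list, clean noobs
```

With real data, only the generate, collapse, sort, and drop lines are needed; the input block here just recreates the example.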


          • #6
            So the data are getting sorted in ascending order of the first numerical part of the ID. Is there any way to preserve the original ordering of the data (i.e., 97xx45 65fg56 99ty66) after the collapse is done? That is, is there any way to obtain the following:

            ID Income
            97xx45 36.66667
            65fg56 33.33333
            99ty66 30


            • #7
              In reply to William Lisowski (#5):
              Thanks a lot. I'll most definitely follow your advice.


              • #8
                Titir,

                Have you tried the code that William Lisowski showed in #5? It will do that.

                If it somehow does not, then please post back with fresh example data and the code you used showing how it fails.

                Added: This was written in response to #6, and it apparently crossed with #7.


                • #9
                  In reply to Clyde Schechter (#8):
                  Hi, I'm going to post fresh example data. But a quick question before that: is there any code to create the variable by which the data can be sorted? I could manually enter the 9 observations mentioned here, but my data set contains too many observations to enter manually.

                  Thanks.


                  • #10
                    William Lisowski's code does that. That's what the -generate sorted = _n- command does. Run it with your real data and you will see.
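
To make that concrete, here is a minimal sketch of what _n holds. The three rows are made up for illustration; _n is Stata's built-in observation number, so nothing has to be typed in by hand regardless of how many observations there are.

```stata
* Sketch only: three made-up rows to show what _n holds
clear
input str6 id
"97xx45"
"65fg56"
"99ty66"
end

* _n is the current observation number, so sorted takes the values 1, 2, 3
generate sorted = _n
list, clean noobs
```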


                    • #11
                      In reply to Clyde Schechter (#10):
                      It worked. Thank you so much.


                      • #12
                        In reply to William Lisowski (#5):
                        Thank you so much for the code. I got the result.


                        • #13
                          Really useful. Thank you for both the question and the solution.
