Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding number of people of a given type for each group

    I have a data set with 1000 observations. As an example, let's consider the following table:
    GroupID Preferences (1 = movie; 2 = sports)
    1 1
    1 2
    1 1
    2 2
    2 2
    2 1
    2 2
    3 1
    3 1
    The first column specifies the group name; each number just refers to an individual in the group.

    For each group, I want to get a data on the number of people that prefer movie.

    So my ideal table will something like this:
    GroupID Preferences No of people preferring movie ( = 1) in the group
    1 1 2
    1 2 2
    1 1 2
    2 2 1
    2 2 1
    2 1 1
    2 2 1
    3 1 2
    3 1 2
    For the later analysis that I want to do, I really need the third column to look as above.

    How can I get such a third column in STATA?

    I think something along the following lines would work but cannot figure out by myself.
    Code:
    gen num_movie = .
    foreach j in var Preferences {
        replace num_movie = GroupID if `j' == 1
    }

  • #2
    Code:
    clear
    input GroupID     Preferences
    1     1
    1     2
    1     1
    2     2
    2     2
    2     1
    2     2
    3     1
    3     1
    end
    
    bysort GroupID: egen wanted = total((Preferences==1))
    
    list, sep(0)
    Results:
    Code:
         +-----------------------------+
         | GroupID   Prefer~s   wanted |
         |-----------------------------|
      1. |       1          1        2 |
      2. |       1          2        2 |
      3. |       1          1        2 |
      4. |       2          2        1 |
      5. |       2          2        1 |
      6. |       2          1        1 |
      7. |       2          2        1 |
      8. |       3          1        2 |
      9. |       3          1        2 |
         +-----------------------------+
    And welcome to Statalist. In future, instead of posting data as a table, follow the FAQ's advice (http://www.statalist.org/forums/help) and use -dataex-. That would save other users' time to recreate the data.

    Comment


    • #3
      Thank you very much. Will incorporate your advice from next time onwards.

      Comment


      • #4
        As a followup, I want to ask another question. If I want to get total number of movie preferences only once for each ID, how can I proceed? In this case, my desired table would be:
        GroupID No. of people preferring movies (=1)
        1 2
        2 1
        3 2

        Comment


        • #5
          Hi Eric,

          Using collapse is one way to get what you wanted, something like:
          Code:
          collapse (sum) Preferences if Preferences==1, by(GroupID)
          Code:
          list GroupID Preferences
          
               +--------------------+
               | GroupID   Prefer~s |
               |--------------------|
            1. |       1          2 |
            2. |       2          1 |
            3. |       3          2 |
               +--------------------+

          Comment


          • #6
            This almost works, but removes the original data set. What I wanted to say earlier is to have the original data intact while having additional columns -- ID (with values 1, 2, 3) and Preferences (with values 2, 1, 2) on the right side of the original data. How can I do this?

            Many thanks for your help.

            Comment


            • #7
              Perhaps this example will start you in a useful direction.
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(GroupID Preferences)
              1 1
              1 2
              1 1
              2 2
              2 2
              2 1
              2 2
              3 1
              3 1
              end
              
              generate seq = _n
              sort GroupID
              by GroupID: egen PreferMovie = total(Preferences==1)
              by GroupID: egen PreferSports = total(Preferences==2)
              sort seq
              drop seq
              list, abbreviate(16) sepby(GroupID)
              Code:
              . list, abbreviate(16) sepby(GroupID)
              
                   +----------------------------------------------------+
                   | GroupID   Preferences   PreferMovie   PreferSports |
                   |----------------------------------------------------|
                1. |       1             1             2              1 |
                2. |       1             2             2              1 |
                3. |       1             1             2              1 |
                   |----------------------------------------------------|
                4. |       2             2             1              3 |
                5. |       2             2             1              3 |
                6. |       2             1             1              3 |
                7. |       2             2             1              3 |
                   |----------------------------------------------------|
                8. |       3             1             2              0 |
                9. |       3             1             2              0 |
                   +----------------------------------------------------+

              Comment


              • #8
                Originally posted by Eric Bell View Post
                This almost works, but removes the original data set. What I wanted to say earlier is to have the original data intact while having additional columns -- ID (with values 1, 2, 3) and Preferences (with values 2, 1, 2) on the right side of the original data. How can I do this?

                Many thanks for your help.
                Code:
                clear
                input GroupID     Preferences
                1     1
                1     2
                1     1
                2     2
                2     2
                2     1
                2     2
                3     1
                3     1
                end
                
                bysort GroupID: egen wanted = total((Preferences==1))
                
                egen uniqueID = tag(GroupID)
                gen wanted2   = wanted if uniqueID==1
                gsort -uniqueID GroupID
                
                list, sep(0)
                Results:

                Code:
                     +--------------------------------------------------+
                     | GroupID   Prefer~s   wanted   uniqueID   wanted2 |
                     |--------------------------------------------------|
                  1. |       1          1        2          1         2 |
                  2. |       2          2        1          1         1 |
                  3. |       3          1        2          1         2 |
                  4. |       1          1        2          0         . |
                  5. |       1          2        2          0         . |
                  6. |       2          1        1          0         . |
                  7. |       2          2        1          0         . |
                  8. |       2          2        1          0         . |
                  9. |       3          1        2          0         . |
                     +--------------------------------------------------+
                Having said all the above, the advice in #5 is actually very good. If you have the syntax file, you can always go back to the repeated format using -preserve- and -restore-. Mixing cross-sectional data and longitudinal data like above will be very confusing later.
                Last edited by Ken Chui; 15 Nov 2021, 06:26.

                Comment

                Working...
                X