Finding number of people of a given type for each group

Eric Bell

Join Date: Nov 2021

Posts: 6
#1

Finding number of people of a given type for each group

12 Nov 2021, 18:11

I have a data set with 1000 observations. As an example, let's consider the following table:
GroupID Preferences (1 = movie; 2 = sports)

1 1

1 2

1 1

2 2

2 2

2 1

2 2

3 1

3 1

The first column specifies the group name; each number just refers to an individual in the group.

For each group, I want to get a data on the number of people that prefer movie.

So my ideal table will something like this:
GroupID Preferences No of people preferring movie ( = 1) in the group

1 1 2

1 2 2

1 1 2

2 2 1

2 2 1

2 1 1

2 2 1

3 1 2

3 1 2

For the later analysis that I want to do, I really need the third column to look as above.

How can I get such a third column in STATA?

I think something along the following lines would work but cannot figure out by myself.

Code:

gen num_movie = . foreach j in var Preferences { replace num_movie = GroupID if `j' == 1 }
Tags: data, foreach, interaction, loop, Suggestion

Ken Chui

Join Date: Aug 2014
Posts: 1058

12 Nov 2021, 18:17

Code:

clear
input GroupID     Preferences
1     1
1     2
1     1
2     2
2     2
2     1
2     2
3     1
3     1
end

bysort GroupID: egen wanted = total((Preferences==1))

list, sep(0)

Results:

Code:

     +-----------------------------+
     | GroupID   Prefer~s   wanted |
     |-----------------------------|
  1. |       1          1        2 |
  2. |       1          2        2 |
  3. |       1          1        2 |
  4. |       2          2        1 |
  5. |       2          2        1 |
  6. |       2          1        1 |
  7. |       2          2        1 |
  8. |       3          1        2 |
  9. |       3          1        2 |
     +-----------------------------+

And welcome to Statalist. In future, instead of posting data as a table, follow the FAQ's advice (http://www.statalist.org/forums/help) and use -dataex-. That would save other users' time to recreate the data.

Comment

Eric Bell

Join Date: Nov 2021

Posts: 6
#3

12 Nov 2021, 21:52

Thank you very much. Will incorporate your advice from next time onwards.
Comment
Eric Bell

Join Date: Nov 2021

Posts: 6
#4

13 Nov 2021, 16:33

As a followup, I want to ask another question. If I want to get total number of movie preferences only once for each ID, how can I proceed? In this case, my desired table would be:
GroupID No. of people preferring movies (=1)

1 2

2 1

3 2
Comment

Dung Le

Join Date: May 2018
Posts: 120

13 Nov 2021, 18:44

Hi Eric,

Using collapse is one way to get what you wanted, something like:

Code:

collapse (sum) Preferences if Preferences==1, by(GroupID)

Code:

list GroupID Preferences

     +--------------------+
     | GroupID   Prefer~s |
     |--------------------|
  1. |       1          2 |
  2. |       2          1 |
  3. |       3          2 |
     +--------------------+

Comment

Eric Bell

Join Date: Nov 2021

Posts: 6
#6

14 Nov 2021, 12:23

This almost works, but removes the original data set. What I wanted to say earlier is to have the original data intact while having additional columns -- ID (with values 1, 2, 3) and Preferences (with values 2, 1, 2) on the right side of the original data. How can I do this?

Many thanks for your help.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

14 Nov 2021, 15:54

Perhaps this example will start you in a useful direction.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(GroupID Preferences)
1 1
1 2
1 1
2 2
2 2
2 1
2 2
3 1
3 1
end

generate seq = _n
sort GroupID
by GroupID: egen PreferMovie = total(Preferences==1)
by GroupID: egen PreferSports = total(Preferences==2)
sort seq
drop seq
list, abbreviate(16) sepby(GroupID)

Code:

. list, abbreviate(16) sepby(GroupID)

     +----------------------------------------------------+
     | GroupID   Preferences   PreferMovie   PreferSports |
     |----------------------------------------------------|
  1. |       1             1             2              1 |
  2. |       1             2             2              1 |
  3. |       1             1             2              1 |
     |----------------------------------------------------|
  4. |       2             2             1              3 |
  5. |       2             2             1              3 |
  6. |       2             1             1              3 |
  7. |       2             2             1              3 |
     |----------------------------------------------------|
  8. |       3             1             2              0 |
  9. |       3             1             2              0 |
     +----------------------------------------------------+

Comment

Ken Chui

Join Date: Aug 2014
Posts: 1058

15 Nov 2021, 06:22

Originally posted by Eric Bell View Post

This almost works, but removes the original data set. What I wanted to say earlier is to have the original data intact while having additional columns -- ID (with values 1, 2, 3) and Preferences (with values 2, 1, 2) on the right side of the original data. How can I do this?

Many thanks for your help.

Code:

clear
input GroupID     Preferences
1     1
1     2
1     1
2     2
2     2
2     1
2     2
3     1
3     1
end

bysort GroupID: egen wanted = total((Preferences==1))

egen uniqueID = tag(GroupID)
gen wanted2   = wanted if uniqueID==1
gsort -uniqueID GroupID

list, sep(0)

Results:

Code:

     +--------------------------------------------------+
     | GroupID   Prefer~s   wanted   uniqueID   wanted2 |
     |--------------------------------------------------|
  1. |       1          1        2          1         2 |
  2. |       2          2        1          1         1 |
  3. |       3          1        2          1         2 |
  4. |       1          1        2          0         . |
  5. |       1          2        2          0         . |
  6. |       2          1        1          0         . |
  7. |       2          2        1          0         . |
  8. |       2          2        1          0         . |
  9. |       3          1        2          0         . |
     +--------------------------------------------------+

Having said all the above, the advice in #5 is actually very good. If you have the syntax file, you can always go back to the repeated format using -preserve- and -restore-. Mixing cross-sectional data and longitudinal data like above will be very confusing later.

Last edited by Ken Chui; 15 Nov 2021, 06:26.

GroupID	Preferences (1 = movie; 2 = sports)
1	1
1	2
1	1
2	2
2	2
2	1
2	2
3	1
3	1

GroupID	No. of people preferring movies (=1)
1	2
2	1
3	2

Announcement