counting frequency that a variable equals a certain number

Collin James

Join Date: Sep 2021

Posts: 107
#1

counting frequency that a variable equals a certain number

03 Aug 2022, 13:11

Hello,

My data is currently in a wide format, at the id/month/yr level. Each individual, in a month-yr, can use as many programs as they need and these are shaped wide like program1 program2 program3 (imagine programs like Medicaid, health services, etc). However, instead of saying medicaid, they are coded with numbers (i.e. 27=medicaid.) There is a data example below. So, for instance, for the person with ID 721 below, I want to create a variable that counts how many times a certain number shows up across his observations. So, you can see for instance, that the number 78 (a program key) shows up in his first observation, and in his third observation (but it is the second program he uses and thus under program_key2 in that month-yr). So, for 721 I would have some variable that says count_78=2. This means 2 times in his history, he has used program 78. Thanks in advance for help on this and let me know if anything is confusing.

* Example generated by -dataex-. For more info, type help dataex
clear
input long id int(program_key1 program_key2)

721 78 84
721 66 70
721 299 78
809 84 216
809 78 211
809 66 84
end
[/CODE]
Tags: None

Collin James

Join Date: Sep 2021
Posts: 107

03 Aug 2022, 13:34

I think this may do it correctly, although I feel like there is probably a more elegant way:

Code:

foreach v of numlist 216 34 46 50 287 293 294 217 187 21 270 170 191 192 193 194 27 48 59 261 70 71 255 258 73 259 257 66 299 300 303 211 248 78 79 104 82 83 84 85 {
        gen count`v'=0
        foreach x of numlist 1/21 {
        replace count`v'=count`v'+1 if program_key`x'==`v'
    }

}

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#3

03 Aug 2022, 13:52

Well, that will not do what you ask, because it does not combine the re-occurrences of the same value of program_key in different observations of the same id. Moreover, it will leave you with 40 different count variables which will be, at best, unpleasant to work with.

The trick is to go to long layout. Then it becomes easy:

Code:

gen long obs_no = _n reshape long program_key, i(obs_no) j(_j) contract id program_key

Note: the example data does not really distinguish among different observations for the same id. In your description, it sounds as if these may differ in terms of a monthly date variable that you did not show. If that is the case, instead of creating the obs_no variable, change the -i()- option in the -reshape- command to -i(id name_of_your_monthly_date_variable)-.
1 like
Comment
Collin James

Join Date: Sep 2021

Posts: 107
#4

03 Aug 2022, 14:03

Ah, thank you! -CJ
Comment
Collin James

Join Date: Sep 2021

Posts: 107
#5

04 Aug 2022, 07:54

One more question on this. If I wanted to count how many program_keys were not missing for an id in a month-yr, how would I do? For instance, id 1 in 2017 month 1 might have values for program_key1 and program_key2 but program_key3 and program_key4 might be missing. So, the varaible I want would equal 2 to show that he had two non-missing program keys (used two programs that month). Thanks -CJ
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#6

04 Aug 2022, 09:08

Well, the number of non-missing program_keys would be the total of the number of selections of specific keys. So you just have to add up the results from the previous code.

Code:

by id (program_key), sort: egen non_missing_keys = total(_freq)
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35595

04 Aug 2022, 10:33

See

Code:

help egen

for rowwise functions to count missing and non-missing values.

It's also something you can do directly by say

Code:

gen nonmissing = 0 

forval j = 1/4 { 
      replace nonmissing = nonmissing + !missing(program_key`j') 
}

or even

Code:

gen nonmissing = !missing(program_key1) + !missing(program_key2) + !missing(program_key3) + !missing(program_key4)

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#8

04 Aug 2022, 10:58

The solutions proposed in #4 are incomplete because in this data, an individual may have multiple observations and the desired totals go across observations as well as across variables. So those solutions would need to be followed by a -by id, sort: egen wanted = total(nonmissing)-.
Comment

Announcement

counting frequency that a variable equals a certain number

Comment

Comment

Comment

Comment

Comment

Comment

Comment