Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting frequency that a variable equals a certain number

    Hello,

    My data is currently in a wide format, at the id/month/yr level. Each individual, in a month-yr, can use as many programs as they need and these are shaped wide like program1 program2 program3 (imagine programs like Medicaid, health services, etc). However, instead of saying medicaid, they are coded with numbers (i.e. 27=medicaid.) There is a data example below. So, for instance, for the person with ID 721 below, I want to create a variable that counts how many times a certain number shows up across his observations. So, you can see for instance, that the number 78 (a program key) shows up in his first observation, and in his third observation (but it is the second program he uses and thus under program_key2 in that month-yr). So, for 721 I would have some variable that says count_78=2. This means 2 times in his history, he has used program 78. Thanks in advance for help on this and let me know if anything is confusing.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id int(program_key1 program_key2)

    721 78 84
    721 66 70
    721 299 78
    809 84 216
    809 78 211
    809 66 84
    end
    [/CODE]

  • #2
    I think this may do it correctly, although I feel like there is probably a more elegant way:

    Code:
    foreach v of numlist 216 34 46 50 287 293 294 217 187 21 270 170 191 192 193 194 27 48 59 261 70 71 255 258 73 259 257 66 299 300 303 211 248 78 79 104 82 83 84 85 {
            gen count`v'=0
            foreach x of numlist 1/21 {
            replace count`v'=count`v'+1 if program_key`x'==`v'
        }
    
    }

    Comment


    • #3
      Well, that will not do what you ask, because it does not combine the re-occurrences of the same value of program_key in different observations of the same id. Moreover, it will leave you with 40 different count variables which will be, at best, unpleasant to work with.

      The trick is to go to long layout. Then it becomes easy:
      Code:
      gen long obs_no = _n
      reshape long program_key, i(obs_no) j(_j)
      contract id program_key
      Note: the example data does not really distinguish among different observations for the same id. In your description, it sounds as if these may differ in terms of a monthly date variable that you did not show. If that is the case, instead of creating the obs_no variable, change the -i()- option in the -reshape- command to -i(id name_of_your_monthly_date_variable)-.

      Comment


      • #4
        Ah, thank you! -CJ

        Comment


        • #5
          One more question on this. If I wanted to count how many program_keys were not missing for an id in a month-yr, how would I do? For instance, id 1 in 2017 month 1 might have values for program_key1 and program_key2 but program_key3 and program_key4 might be missing. So, the varaible I want would equal 2 to show that he had two non-missing program keys (used two programs that month). Thanks -CJ

          Comment


          • #6
            Well, the number of non-missing program_keys would be the total of the number of selections of specific keys. So you just have to add up the results from the previous code.
            Code:
            by id (program_key), sort: egen non_missing_keys = total(_freq)

            Comment


            • #7
              See

              Code:
              help egen
              for rowwise functions to count missing and non-missing values.

              It's also something you can do directly by say

              Code:
              gen nonmissing = 0 
              
              forval j = 1/4 { 
                    replace nonmissing = nonmissing + !missing(program_key`j') 
              }
              or even

              Code:
              gen nonmissing = !missing(program_key1) + !missing(program_key2) + !missing(program_key3) + !missing(program_key4)

              Comment


              • #8
                The solutions proposed in #4 are incomplete because in this data, an individual may have multiple observations and the desired totals go across observations as well as across variables. So those solutions would need to be followed by a -by id, sort: egen wanted = total(nonmissing)-.

                Comment

                Working...
                X