Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying under complex conditions in a panel

    Hello,

    I have a rather complex problem. I have a unbalanced panel dataset with Insurancenumbers (meaning people) and ATC-Codes (as strings) for prescriptions and want to identify those who have a certain amount of same ATC-Codes to generate new variables. The structure is as follows:

    Insurancenumber - ATC

    1 - R03
    1 - N07
    1 - R03
    1 - A07
    1 - X01
    1 - R03
    2 - B00
    2 - B00
    2 - F09
    3 - G03
    4 - B00
    4 - R03
    .

    I have certain definitions in the form of: If someone had 3 or more times "R03" he gets X1=1 and otherwise X1=0 (one new variable I want to create). If someone has for example 2 or more times "B00" he gets X2=1 and otherwise X2=0. Resulting in:

    Insurancenumber - ATC - X1 - X2

    1 - R03 - 1 - 0
    1 - N07 - 1 - 0
    1 - R03 - 1 - 0
    1 - A07 - 1 - 0
    1 - X01 - 1 - 0
    1 - R03 - 1 - 0
    2 - B00 - 0 - 1
    2 - B00 - 0 - 1
    2 - F09 - 0 - 1
    3 - G03 - 0 - 0
    4 - B00 - 0 - 0
    4 - R03 - 0 - 0
    .
    .
    .

    If another person had for example both "R03" > 3 and "B00">2, both X1 and X2 would be 1.

    I am desperately looking for the command-structure to do this and hope someone has a solution !

    Best regards

    Karl Emmert-Fees
    Last edited by Karl Emmert-Fees; 28 Aug 2015, 02:02. Reason: Added tags

  • #2
    Stata is very, very good at problems like these. Note that if you had used dataex (SSC) you could have created example input code similar to that below.

    Code:
    input Insurancenumber str3 ATC
    1  R03
    1  N07
    1  R03
    1  A07
    1  X01
    1  R03
    2  B00
    2  B00
    2  F09
    3  G03
    4  B00
    4  R03
    end
    Now the trick is that every time we see R03 then ATC == "R03" is evaluated as 1 and otherwise that expression is evaluated as 0. This is just counting. When we are done with each person we see what the (cumulative) sum of our counts is. And similarly with the other ATC values.

    Code:
    bysort Ins : gen X1 = sum(ATC == "R03")
    by Ins: replace X1 = X1[_N] >= 3
    by Ins : gen X2 = sum(ATC == "B00")
    by Ins: replace X2 = X2[_N] >= 2
    list, sepby(Ins)
    Here is the result:

    Code:
         +--------------------------+
         | Insura~r   ATC   X1   X2 |
         |--------------------------|
      1. |        1   R03    1    0 |
      2. |        1   N07    1    0 |
      3. |        1   R03    1    0 |
      4. |        1   A07    1    0 |
      5. |        1   X01    1    0 |
      6. |        1   R03    1    0 |
         |--------------------------|
      7. |        2   B00    0    1 |
      8. |        2   B00    0    1 |
      9. |        2   F09    0    1 |
         |--------------------------|
     10. |        3   G03    0    0 |
         |--------------------------|
     11. |        4   B00    0    0 |
     12. |        4   R03    0    0 |
         +--------------------------+
    Some people would use egen here, and that's fine too. For example,

    Code:
    bysort Ins : egen X1 = total(ATC == "R03")
    replace X1 = X1 >= 3
    by Ins : egen X2 = total(ATC == "B00")
    replace X2 = X2 >= 2
    I have used >= here to correspond to 3 or more and 2 or more in your first formulation. Your last sentence with code doesn't match your initial formulation. You choose.

    See also

    FAQ . . . . . . . . . . . . . . . . . . . . . . . True and false in Stata
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    8/05 What is true and false in Stata?
    http://www.stata.com/support/faqs/data-management/
    true-and-false/

    SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q1/02 SJ 2(1):86--102 (no commands)
    explains the use of the by varlist : construct to tackle
    a variety of problems with group structure, ranging from
    simple calculations for each of several groups to more
    advanced manipulations that use the built-in _n and _N

    http://www.stata-journal.com/sjpdf.h...iclenum=pr0004



    Comment


    • #3
      Welcome to the Forum. Please take the time to read the FAQ, in particular the sections about how to post material legibly and effectively. I.e. not as you have done! Your listing of the data extract is a great idea (and as the FAQ recommends), but it's in a format that is definitely not easily usable by people who wish to "play" with it to answer your question.

      You need to take advantage of Stata's great "bygroup" capabilities.

      Here is the recommended format for showing us your data (see the FAQ).
      Code:
          inumber   atc   x1   x2 
                1   R03    1    0 
                1   N07    1    0 
                1   R03    1    0 
                1   A07    1    0 
                1   X01    1    0 
                1   R03    1    0 
                2   B00    0    1 
                2   B00    0    1 
                2   F09    0    1 
                3   G03    0    0 
                4   B00    0    0 
                4   R03    0    0
      Here is sample code that re-creates your "x1" and "x2" variables in new variables "x1_new" and "x2_new". There may be more succinct ways of doing this, but note the principles involved.

      Code:
      ge RO3present = (atc == "R03")
      bysort inumber: egen totalR03 = total(RO3present)
      ge x1_new = (totalR03 >= 3)
      ge B00present = (atc == "B00")
      bysort inumber: egen totalB00 = total(B00present)
      ge x2_new = (totalB00 >= 2)
      list, sepby(inumber) noobs
      and the results are
      Code:
      . list, sepby(inumber) noobs
      
        +---------------------------------------------------------------------------------------+
        | inumber   atc   x2   x1   RO3pre~t   totalR03   x1_new   B00pre~t   totalB00   x2_new |
        |---------------------------------------------------------------------------------------|
        |       1   R03    0    1          1          3        1          0          0        0 |
        |       1   N07    0    1          0          3        1          0          0        0 |
        |       1   R03    0    1          1          3        1          0          0        0 |
        |       1   A07    0    1          0          3        1          0          0        0 |
        |       1   X01    0    1          0          3        1          0          0        0 |
        |       1   R03    0    1          1          3        1          0          0        0 |
        |---------------------------------------------------------------------------------------|
        |       2   B00    1    0          0          0        0          1          2        1 |
        |       2   B00    1    0          0          0        0          1          2        1 |
        |       2   F09    1    0          0          0        0          0          2        1 |
        |---------------------------------------------------------------------------------------|
        |       3   G03    0    0          0          0        0          0          0        0 |
        |---------------------------------------------------------------------------------------|
        |       4   B00    0    0          0          1        0          1          1        0 |
        |       4   R03    0    0          1          1        0          0          1        0 |
        +---------------------------------------------------------------------------------------+
      You can of course now delete the variables you don't need

      Comment


      • #4
        Hello Nick,

        Thank you very much, it works perfectly. Since I dont just want to copy everything one more question for my understanding:

        by Ins: replace X1 = X1[_N] >= 3 Is that a short form of writing:

        by Ins: replace X1 = 1 if X1[_N] >= 3 ?

        All the best

        Karl Emmert-Fees

        Comment


        • #5
          Stephen's code clearly conveys the same message as mine. A nuance is that the argument of egen, total() can be an expression and does not have to be a variable name.

          Comment

          Working...
          X