
  • Identifying specific numbers within a variable to generate a new var.

    Hi,

    Could someone kindly point out where I am going wrong in trying to identify and recode specific numbers within a variable as a new variable?

    I have a large CPRD Aurum dataset extract with medcodeids, but no attached "label" (dataset 1). I have a separate .dta file with medcodeids grouped by their underlying meaning, e.g. "chronic kidney disease (CKD)" terms or "GFR" terms (dataset 2). I have taken all of the medcodeids from dataset 2 relating to GFR and am trying to identify them within dataset 1, so that I can start to group them. However, when I run the code below, every single observation is set to 1 for GFR, even for medcodeids that do NOT match any of those I have put in the code. Terrible explanation, sorry; hoping the code below explains it better!

    An extract from dataset 1:

    obs_id double(patid medcodeid) float GFR
    1 1000000020274 380389013 1
    2 1000000320274 976481000006110 1
    3 1000000320274 976481000006110 1
    4 1000000320274 976481000006110 1
    5 1000000320274 380389013 1
    6 1000000320274 380389013 1
    7 1000000320274 380389013 1
    8 1000000420274 380389013 1
    9 1000000420274 380389013 1
    10 1000000420274 976481000006110 1
    11 1000000420274 380389013 1
    12 1000000420274 976481000006110 1
    13 1000000420274 976481000006110 1
    14 1000000620274 380389013 1
    15 1000000620274 380389013 1
    16 1000000620274 380389013 1
    17 1000000620274 380389013 1
    18 1000000620274 282610015 1
    19 1000000620274 133205018 1
    20 1000000620274 304071000000115 1



    To which I have applied the following code:

    gen GFR=0
    replace GFR=1 if medcodeid==976481000006110|1942831000006114|5451521000006118|371441000000114|12621921000006110|1854991000006119|8352981000006116|1540241000006111|133205018|8250311000006118|1942821000006111|12680441000006112|1744631000006112|8069731000006118|12621931000006112|1866321000006117| 8294821000006118


    As you can see, the medcodeid for obs 1 (380389013) is not found in the above list, and yet GFR is tagged as "1".


    a) Could someone possibly point out my error?

    b) If you had a magical way of applying the medcodeid label (i.e. "GFR"/"CKD") to dataset 1, without having to manually copy the medcodeids from dataset 2 and rewrite them as a list, that would be even better!

    Thanks very much.

    Jemima




  • #2
    Let's use a simpler example to make the point.

    Code:
    ... if a == 1 | 2 | 3
    is not an alternative to

    Code:
    ... if a == 1 | a == 2 | a == 3 
    But it is legal. It is parsed as

    Code:
    ... if (a == 1) | 2 | 3


    and the rule is that if any argument of | is non-zero, the entire expression evaluates as true. In the example given, 2 and 3 are always non-zero, so the entire expression is true regardless of the value of a.
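    The parsing rule can be checked directly with a tiny toy dataset. A minimal sketch, where the variable `a` and its five values are made up for illustration:

```stata
* Toy data: a takes the values 1 to 5 (illustrative only)
clear
set obs 5
generate a = _n

* Parsed as (a == 1) | 2 | 3, so true in every observation
count if a == 1 | 2 | 3
display r(N)    // 5

* The intended test: true only where a is 1, 2 or 3
count if a == 1 | a == 2 | a == 3
display r(N)    // 3
```

    Likewise, `display (5 == 1) | 2 | 3` shows 1, because the non-zero constants alone make the whole expression true.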

    This is evidently what is biting you. As soon as 1942831000006114 is encountered as an argument, the entire expression evaluates as true for every observation.

    You need to use the inlist() function if you (reasonably) don't want to type out a very long-winded expression.
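    Applied to the question, an inlist() version might look like the sketch below. It runs on five medcodeids copied from the dataset 1 extract, and the code list is the one from the question with the spaces that forum line-wrapping inserted into some codes removed (an assumption about what was intended). Run it from a do-file, since the `///` continuations only work there. It also assumes medcodeid is a double holding each code exactly: doubles represent integers exactly only up to 2^53 (about 9.007e15), and several codes in the list have 17 digits, so keeping medcodeids as strings may be safer in general (`help inlist()` gives the limits on the number of arguments, which are much tighter for strings).

```stata
* Five medcodeids from the dataset 1 extract (patid omitted for brevity)
clear
input double medcodeid
380389013
976481000006110
282610015
133205018
304071000000115
end
format medcodeid %20.0f

* inlist(z, a, b, ...) is 1 when z equals any of a, b, ... and 0 otherwise
generate byte GFR = inlist(medcodeid, 976481000006110, 1942831000006114, ///
    5451521000006118, 371441000000114, 12621921000006110, 1854991000006119, ///
    8352981000006116, 1540241000006111, 133205018, 8250311000006118, ///
    1942821000006111, 12680441000006112, 1744631000006112, 8069731000006118, ///
    12621931000006112, 1866321000006117, 8294821000006118)

list medcodeid GFR
```

    Only observations 2 and 4 (976481000006110 and 133205018) come out as 1; 380389013, the medcodeid of observation 1 in the extract, now gets 0.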

    More at https://journals.sagepub.com/doi/10....6867X231162009
    Last edited by Nick Cox; 01 Aug 2023, 07:24.



    • #3
      Thank you. It seems very simple in retrospect! Although I had made all the errors you point out in your article before getting to this point 😂.

      Thanks so much for your help

      Jemima



      • #4
        See also https://www.stata.com/support/faqs/d...s-for-subsets/ for a good approach.
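        For part (b), one way to avoid retyping codes at all is to merge the labels from dataset 2 onto dataset 1 by medcodeid. The sketch below is not necessarily the approach in the linked FAQ; it builds two tiny stand-in datasets with `input` so that it runs on its own. The variable name `term`, the placeholder code 12345 and its "CKD" label are hypothetical, and `merge m:1` assumes each medcodeid appears only once in dataset 2 (if a code belongs to several groups, the code list would need restructuring first).

```stata
* Stand-in for dataset 2: one row per medcodeid with a label in term.
* The first two codes come from the GFR list in the question;
* 12345 "CKD" is a made-up placeholder, not a real medcodeid.
clear
input double medcodeid str8 term
976481000006110 "GFR"
133205018 "GFR"
12345 "CKD"
end
tempfile codelist
save `codelist'

* Stand-in for dataset 1 (medcodeids only)
clear
input double medcodeid
380389013
976481000006110
133205018
12345
end

* m:1 because many dataset 1 rows can share one code-list row;
* keep(master match) keeps every dataset 1 row, and rows with
* no matching code are left with term equal to ""
merge m:1 medcodeid using `codelist', keep(master match) nogenerate

* Indicators follow directly from the attached label
generate byte GFR = term == "GFR"
generate byte CKD = term == "CKD"
format medcodeid %20.0f
list
```

        With the real files, the first `input` block would be replaced by loading dataset 2 (keeping only medcodeid and its label variable) and saving it, and the second by loading dataset 1; the resulting `term` variable can then be used for grouping directly.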

