Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find observation matching a value and return value from that observation

    Dear Statalist,

    I'm running an ungoing panel survey where new data are added continuously, so I need to write code to add the new data to existing datasets. I've developed a strategy for merging new data with existing datasets. I've determined that using a long format is best for my situation.

    The respondents answer the survey after receiving an intervention with a follow-up survey on the same intervention a year after. The same respondent may receive several interventions and the interventions may be of two different kinds. Sometimes the follow-up data is missing.

    I will have a unique identifier for the respondent ID (ID), the intervention date (Date), whether the entry is a follow-up (Followup) and which intervention group the data is on (Group). I’ve made a dataex example below to develop my code on.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte ID int Date byte(Follow-up Group)
    1 22738 0 0
    1 22312 0 1
    1 22312 1 1
    2 21947 0 0
    2 21947 1 0
    3 21785 0 0
    3 22758 0 0
    3 21785 1 0
    3 22758 1 0
    4 22166 0 1
    4 22404 0 1
    5 21530 0 0
    5 21530 1 0
    6 22670 0 1
    6 22670 1 1
    7 22706 0 0
    7 23013 0 0
    7 23013 1 0
    end
    format %td Date
    What I need to create is a variable identifying the intervention number from the above data. This is to ensure, I can work with just ‘first interventions’ or ‘second interventions’ etc. and their folllowups when I work with the data.

    I can almost achieve that with this code:
    Code:
    sort ID Date Follow , stable
    bysort  ID  Follow : generate number =_n if Group<1
    bysort  ID   Follow  : replace number =_n if Group
    But it fails when there is no follow-up data for the first intervention, assigning the ‘number’ value of the first intervention to the second intervention follow-up.

    I figure a solution could be to restrict the above to the initial surveys and not follow-ups (e.g. &if Follow<1 to the bysort commands above) and then create code for the follow-ups where the Stata finds the matching ID and Date and then copy whatever number is assigned for the intervention. I tried the vlookup ado but it won’t work with my date information.

    Solutions?

  • #2
    It's difficult to follow your explanation. Maybe add a third variable showing how you would like your output to look like in the dataex example?

    Comment


    • #3
      That's a good idea. Here's the same dataset with some added observations and the desired number that I need to generate:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte ID int Date byte Followup float desirednumber
      1 22312 0 1
      1 22738 0 2
      1 22312 1 1
      2 21947 0 1
      2 21947 1 1
      3 21785 0 1
      3 22758 0 2
      3 21785 1 1
      3 22758 1 2
      4 22404 0 1
      4 22166 0 2
      5 21530 0 1
      5 21530 1 1
      6 22670 0 1
      6 22670 1 1
      7 22706 0 1
      7 23013 0 2
      7 23013 1 2
      8 21673 0 1
      8 19390 0 2
      8 21277 0 3
      8 19390 1 2
      9 22706 0 1
      9 21785 0 2
      9 21785 1 2
      end
      format %td Date
      Whenever Followup=1 number should be the same as the corresponding observation with the same ID and Date. The code I have been able to write (above) almost gets it right but fails on observations 18, 22 and 25.

      Comment


      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte ID int Date byte Followup float desirednumber
        1 22312 0 1
        1 22738 0 2
        1 22312 1 1
        2 21947 0 1
        2 21947 1 1
        3 21785 0 1
        3 22758 0 2
        3 21785 1 1
        3 22758 1 2
        4 22404 0 1
        4 22166 0 2
        5 21530 0 1
        5 21530 1 1
        6 22670 0 1
        6 22670 1 1
        7 22706 0 1
        7 23013 0 2
        7 23013 1 2
        8 21673 0 1
        8 19390 0 2
        8 21277 0 3
        8 19390 1 2
        9 22706 0 1
        9 21785 0 2
        9 21785 1 2
        end
        format %td Date
        
        gen long seq=_n
        bysort ID (seq): gen wanted= Followup != Followup[_n-1]
        by ID: replace wanted= wanted[_n-1]+1 if !wanted
        bys ID Date (seq): replace wanted= wanted[1]
        Res.:

        Code:
        . sort seq
        
        . l, sep(0)
        
             +-----------------------------------------------------+
             | ID        Date   Followup   desire~r   seq   wanted |
             |-----------------------------------------------------|
          1. |  1   01feb2021          0          1     1        1 |
          2. |  1   03apr2022          0          2     2        2 |
          3. |  1   01feb2021          1          1     3        1 |
          4. |  2   02feb2020          0          1     4        1 |
          5. |  2   02feb2020          1          1     5        1 |
          6. |  3   24aug2019          0          1     6        1 |
          7. |  3   23apr2022          0          2     7        2 |
          8. |  3   24aug2019          1          1     8        1 |
          9. |  3   23apr2022          1          2     9        2 |
         10. |  4   04may2021          0          1    10        1 |
         11. |  4   08sep2020          0          2    11        2 |
         12. |  5   12dec2018          0          1    12        1 |
         13. |  5   12dec2018          1          1    13        1 |
         14. |  6   25jan2022          0          1    14        1 |
         15. |  6   25jan2022          1          1    15        1 |
         16. |  7   02mar2022          0          1    16        1 |
         17. |  7   03jan2023          0          2    17        2 |
         18. |  7   03jan2023          1          2    18        2 |
         19. |  8   04may2019          0          1    19        1 |
         20. |  8   01feb2013          0          2    20        2 |
         21. |  8   03apr2018          0          3    21        3 |
         22. |  8   01feb2013          1          2    22        2 |
         23. |  9   02mar2022          0          1    23        1 |
         24. |  9   24aug2019          0          2    24        2 |
         25. |  9   24aug2019          1          2    25        2 |
             +-----------------------------------------------------+

        Comment


        • #5
          Thank you for your response, Andrew. It's helpful but I'm afraid the solution does not work when I extend the data with further cases (I am sorry my initial example did not hold all possible situations).


          I tried adding a lot more rows with ID=8 and here the solution breaks down.

          To complicate matters I also need to take account of two different groups (types of interventions) where I would like the numbering to follow position within the group.
          Luckily I found a solution by using sort below.

          Here's an extended example with group added
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte ID int Date byte(Group Followup) float(wanted desired)
          1 22312 1 1 1 1
          1 22312 1 0 1 1
          1 22738 0 0 1 1
          2 21947 0 0 1 1
          2 21947 0 1 1 1
          3 21785 0 1 1 1
          3 21785 0 0 1 1
          3 22758 0 1 1 2
          3 22758 0 0 1 2
          4 22166 1 0 1 1
          4 22404 1 0 2 2
          5 21530 0 1 1 1
          5 21530 0 0 1 1
          6 22670 1 0 1 1
          6 22670 1 1 1 1
          7 22706 0 0 1 1
          7 23013 0 0 2 2
          7 23013 0 1 2 2
          8 19390 0 0 1 1
          8 19390 0 1 1 1
          8 21277 0 0 2 2
          8 21530 0 0 3 3
          8 21530 0 1 3 3
          8 21673 1 0 3 1
          8 21785 0 0 2 4
          8 21785 1 0 2 2
          8 22166 1 0 2 3
          8 22404 0 0 1 5
          8 22706 0 1 1 6
          8 22706 0 0 1 6
          9 21785 0 0 2 1
          9 21785 0 1 2 1
          9 22706 0 0 1 2
          end
          format %td Date

          Here's what seems to be a valid solution:
          Code:
          bysort ID Group Followup (Date): generate number=_n
          sort ID Group Date Follow
          replace number=number[_n-1] if Followup
          It seems to work:
          Code:
          gen correct=0
          replace correct=1 if (number==desired)
          sort ID Date Follow
          list ID Date Group Followup number desired correct , sep(0)
              +----------------------------------------------------------------+
               | ID        Date   Group   Followup   number   desired   correct |
               |----------------------------------------------------------------|
            1. |  1   01feb2021       1          0        1         1         1 |
            2. |  1   01feb2021       1          1        1         1         1 |
            3. |  1   03apr2022       0          0        1         1         1 |
            4. |  2   02feb2020       0          0        1         1         1 |
            5. |  2   02feb2020       0          1        1         1         1 |
            6. |  3   24aug2019       0          0        1         1         1 |
            7. |  3   24aug2019       0          1        1         1         1 |
            8. |  3   23apr2022       0          0        2         2         1 |
            9. |  3   23apr2022       0          1        2         2         1 |
           10. |  4   08sep2020       1          0        1         1         1 |
           11. |  4   04may2021       1          0        2         2         1 |
           12. |  5   12dec2018       0          0        1         1         1 |
           13. |  5   12dec2018       0          1        1         1         1 |
           14. |  6   25jan2022       1          0        1         1         1 |
           15. |  6   25jan2022       1          1        1         1         1 |
           16. |  7   02mar2022       0          0        1         1         1 |
           17. |  7   03jan2023       0          0        2         2         1 |
           18. |  7   03jan2023       0          1        2         2         1 |
           19. |  8   01feb2013       0          0        1         1         1 |
           20. |  8   01feb2013       0          1        1         1         1 |
           21. |  8   03apr2018       0          0        2         2         1 |
           22. |  8   12dec2018       0          0        3         3         1 |
           23. |  8   12dec2018       0          1        3         3         1 |
           24. |  8   04may2019       1          0        1         1         1 |
           25. |  8   24aug2019       1          0        2         2         1 |
           26. |  8   24aug2019       0          0        4         4         1 |
           27. |  8   08sep2020       1          0        3         3         1 |
           28. |  8   04may2021       0          0        5         5         1 |
           29. |  8   02mar2022       0          0        6         6         1 |
           30. |  8   02mar2022       0          1        6         6         1 |
           31. |  9   24aug2019       0          0        1         1         1 |
           32. |  9   24aug2019       0          1        1         1         1 |
           33. |  9   02mar2022       0          0        2         2         1 |
               +----------------------------------------------------------------+

          Comment

          Working...
          X