Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding records of one variable (e.g. A, B, C...) where another variable has only one value (e.g. 1 and not 2)

    I have a dataset in which two variables are being counted:
    - varA, with values A, B, C... (subject id)
    - varB, with values 1, 2, 3... (visit type identifier, where visit #1 is screening visit, and #2, #3 and after are interview visits)

    Some subjects will decline participation at the screening visit, and therefore have only a record where varB==1, but no varB==2.

    I wish to count the number of subjects who did not have any visits after the screening visit.
    Last edited by Michael McCulloch; 28 Aug 2014, 15:03.

  • #2
    Code:
    gen revisit=.
    replace revisit=0 if varB==1
    replace revisit=1 if varB>1 & varB!=.
    sort varA
    egen revisit2=max(revisit), by(varA)
    will get you a dummy variable by case, with 0 for those with no revisit, and 1 for those with a revisit

    Comment


    • #3
      Some discussions in

      http://www.stata.com/support/faqs/da...ions-in-group/

      http://www.stata-journal.com/sjpdf.h...iclenum=dm0033

      If I understand your question correctly, you want the distinct values of varA for which varB is only ever 1. If so, that is

      Code:
      bysort varA (varB) : gen always1 = varB[1] == 1 & varB[_N] == 1
      levelsof varB if always1
      tab varB if always1

      Comment


      • #4
        Come to think of it, that's not quite what you're after. After running the code above,

        Code:
        preserve
        collapse (max) revisit, by(varA)
        tab revisit
        restore

        Comment


        • #5
          Thanks Nick! Nice to be in touch again. You helped me tremendously during my dissertation days. Have just been spending a lot of time in SQL land... now back in Stata land!
          Your code does just what I want, if I change the last line to "tab varB if always1".

          If it's okay to ask follow-up questions in this new addition of Statalist, how would I modify your code if, in addition to the above, I also wanted to count those where:
          - varA could also be 1 or 99, but still varB is always1?
          - varB could be 2, but visit#1 was not recorded?

          And... thanks Ben too.

          Comment


          • #6
            Ben & Nick, there's a difference of n=2 in the number counted between your code (16 vs 18). I'm going to do a manual count to validate. Thanks to you both.

            Comment


            • #7
              I'm a bit confused about the data. Is varA numeric or string? At first it sounded like string, but the last message makes it sound numeric.

              I thought varA was the id variable. Why are you interested then in varA equaling 1 or 99? What about all the other possible id values? Is 99 an MD code?

              Listing some of the records might help.

              With regards to the original question, it seems like you could also do

              Code:
              egen maxvisit = max(varB), by(varA)
              Subjects where maxvisit = 1 are those who only made the screening visit.

              Your code does just what I want, if I change the last line to "tab varB if always1"
              How is that different from what Nick typed?
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam

              Comment


              • #8
                Sorry Richard, I meant to say varB could be 1 or 99!

                Data look like this:
                varA

                Comment


                • #9
                  Sorry Richard, I meant to say varB could be 1 or 99!

                  Data look like this:
                  varA varB
                  1 1
                  2 1
                  3 1
                  3 2
                  3 3
                  4 3
                  5 1
                  5 3
                  5 3

                  Comment


                  • #10
                    Richard, your concise code returns the same answer as Nick's. Thank you!

                    Comment


                    • #11
                      For the 1 & 99 bit, you can probably do something like

                      Code:
                      egen maxvisit = max(varB), by(varA)
                      gen always1 = maxvisit == 1 | maxvisit == 99
                      Some creative person can probably cut that down to one line!
                      -------------------------------------------
                      Richard Williams, Notre Dame Dept of Sociology
                      Stata Version: 17.0 MP (2 processor)

                      EMAIL: [email protected]
                      WWW: https://www3.nd.edu/~rwilliam

                      Comment


                      • #12
                        The names varA and varB mean nothing much and have already caused error and confusion. Let's recast to identifier id and event type event. Revising (correcting!) my solution in #3

                        Code:
                        bysort id (event) : gen always1 = event[1] == 1 & event[_N] == 1
                        levelsof id if always1
                        tab id if always1
                        If the interesting values of event can be 1 or 99, and other intermediate values are observed, then the sorting trick used above is no use without modification. Well, I have others. The first work-around is just to define

                        Code:
                          
                        gen OK = event == 1 | event == 99
                        and to work on that variable instead. Alternatively, consider

                        Code:
                        egen alwaysl1or99 = min(event == 1 | event == 99), by(id)
                        Clearly the expression

                        Code:
                        event == 1 | event == 99
                        is true (evaluates as 1) whenever event is 1 or 99 and false (0) otherwise. Further -- and this is useful -- the minimum over an identifier being 1 means that said expression always evaluates to 1 (because if it were ever 0, the minimum would be 0). So min() identifies instances of "all" (or always). The converse is that max() on a true-or-false expression identifies "any" (or some).

                        This was spelled out in excruciating detail in http://www.stata.com/support/faqs/da...ble-recording/

                        Code:
                        event == 1 | event == 99
                        is not too painful to type. An alternative

                        Code:
                        inlist(event, 1, 99)
                        is easier to extend to more complicated definitions.
                        Last edited by Nick Cox; 29 Aug 2014, 03:01.

                        Comment


                        • #13
                          Many thanks Nick, for the clear explication, the suggestion about using meaningful variable names, and also the alternate code extendable to more complicated definitions. Very much appreciated.

                          Comment

                          Working...
                          X