  • calculate mean of percentage of multiple values from three variables

    Hello members,
    I have a variable "pathogen" identifying the pathogen, a variable "patient_id" identifying the index patient with that pathogen, and a third variable "contact_positive" indicating whether a contact person (someone who was in contact with the index patient) tested negative (0) or positive (1).
    pathogen patient_id contact_positive
    1 55 0
    1 55 0
    1 55 0
    1 55 1
    1 57 1
    1 57 0
    1 58 1
    1 60 0
    1 60 0
    1 60 0
    2 60 0
    2 60 1
    2 62 0
    2 62 1
    I would now like to know, sorted by pathogen, what is the percentage of contact_positive=1 per patient and then calculate the mean out of these percentages.

    Can you please help me with that? Thank you!

  • #2
    Code:
    egen contact_positive_mean = mean(contact_positive), by(patient_id)
    summ contact_positive_mean


    • #3
      The mean over all observations need not equal the mean over all patient means. If you want the latter, see the last lines here.


      Code:
      egen contact_positive_mean = mean(contact_positive), by(patient_id)
      summ contact_positive_mean  
      
      egen tag = tag(patient_id)
      su contact_positive_mean if tag
      Naturally in either case what you get are means as proportions, not percentages.
      Last edited by Nick Cox; 22 Dec 2023, 08:40.
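
      A minimal sketch of the calculation asked for in #1, assuming the example data there are in memory. Because patient 60 appears under both pathogens in that example, the sketch groups and tags on pathogen and patient_id jointly rather than on patient_id alone. The variable names pct_positive and first are made up for illustration.

      ```stata
      * Percentage of positive contacts for each patient within each pathogen
      egen pct_positive = mean(100 * contact_positive), by(pathogen patient_id)

      * Mark one observation per pathogen-patient pair so each patient counts once
      egen first = tag(pathogen patient_id)

      * Mean of the per-patient percentages, separately for each pathogen
      tabstat pct_positive if first, by(pathogen) statistics(mean n)
      ```

      With the example data in #1, the per-patient percentages are 25, 50, 100 and 0 for pathogen 1 (mean 43.75) and 50 and 50 for pathogen 2 (mean 50).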


      • #4
        Thank you very much! Really appreciated. I have one last question: I have different patients (each patient has one patient ID), and they have multiple diagnosed pathogens. How can I keep each pathogen only once per patient?

        Now:
        patient id pathogen
        54 1
        54 2
        54 2
        54 3
        54 4
        60 1
        60 1
        62 2
        64 4
        64 4
        64 4
        64 5
        64 6
        66 2
        66 2
        What I want:
        patient id pathogen
        54 1
        54 2
        54 3
        54 4
        60 1
        62 2
        64 4
        64 5
        64 6
        66 2


        • #5
          Code:
          h duplicates
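
          For orientation, a hedged sketch of what that help file covers: the duplicates subcommands accept a variable list, so duplicates can be defined on patient_id and pathogen only, and the report and tag subcommands remove nothing from the data. The variable name dup is made up for illustration.

          ```stata
          * Count surplus copies of each patient-pathogen pair; no observations are dropped
          duplicates report patient_id pathogen

          * dup = number of additional copies of each pair (0 means the pair is unique)
          duplicates tag patient_id pathogen, generate(dup)
          list patient_id pathogen dup, sepby(patient_id)
          ```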


          • #6
            But I have other variables in that dataset with duplicates which I don't want to delete. Is there a more specific command?


            • #7
              The tagging technique in #3 was already an answer.


              • #8
                Does that mean I use

                egen tag = tag(patient_id) and then h duplicates if tag?


                • #9
                  As I understand it you want to see

                  Code:
                  clear
                  input patient_id    pathogen
                  54    1
                  54    2
                  54    2
                  54    3
                  54    4
                  60    1
                  60    1
                  62    2
                  64    4
                  64    4
                  64    4
                  64    5
                  64    6
                  66    2
                  66    2
                  end
                  
                  egen tag = tag(patient_id pathogen)
                  
                  list patient_id pathogen if tag, sepby(patient_id)
                  
                       +---------------------+
                       | patien~d   pathogen |
                       |---------------------|
                    1. |       54          1 |
                    2. |       54          2 |
                    4. |       54          3 |
                    5. |       54          4 |
                       |---------------------|
                    6. |       60          1 |
                       |---------------------|
                    8. |       62          2 |
                       |---------------------|
                    9. |       64          4 |
                   12. |       64          5 |
                   13. |       64          6 |
                       |---------------------|
                   14. |       66          2 |
                       +---------------------+
                  See also the groups command:

                  https://www.statalist.org/forums/for...updated-on-ssc

                  Code:
                  SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                          (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                          Q1/18   SJ 18(1):291
                          groups exited with an error message if weights were specified;
                          this has been corrected
                  
                  SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                          (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                          Q3/17   SJ 17(3):760--773
                          presents command for listing group frequencies and percents and
                          cumulations thereof; for various subsetting and ordering by
                          frequencies, percents, and so on; for reordering of columns;
                          and for saving tabulated data to new datasets

                  Code:
                  . groups patient_id pathogen
                  
                    +---------------------------------------+
                    | patien~d   pathogen   Freq.   Percent |
                    |---------------------------------------|
                    |       54          1       1      6.67 |
                    |       54          2       2     13.33 |
                    |       54          3       1      6.67 |
                    |       54          4       1      6.67 |
                    |       60          1       2     13.33 |
                    |---------------------------------------|
                    |       62          2       1      6.67 |
                    |       64          4       3     20.00 |
                    |       64          5       1      6.67 |
                    |       64          6       1      6.67 |
                    |       66          2       2     13.33 |
                    +---------------------------------------+
                  Last edited by Nick Cox; 22 Dec 2023, 16:20.


                  • #10
                    Ok, thank you! But I don't just want a list but a new dataset with which I can do other analyses afterwards.... How can I do that?


                    • #11
                      The help for groups explains how to get a new dataset of its results.
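
                      Using official commands only, one possible route to a new dataset is sketched below, assuming the example data from #9 are in memory. The variable name first and the filename patient_pathogen are hypothetical. Dropping observations also drops the values of any other variables on those rows, so the sketch works inside preserve and restore and leaves the data in memory as they were.

                      ```stata
                      * Mark the first observation of each patient-pathogen pair
                      egen first = tag(patient_id pathogen)

                      preserve
                          * Keep one row per pair and only the two identifying variables
                          keep if first
                          keep patient_id pathogen
                          * patient_pathogen is a hypothetical filename
                          save patient_pathogen, replace
                      restore
                      ```

                      An alternative is contract patient_id pathogen, freq(n_obs), which reduces the data in memory to one row per pair plus a frequency variable (n_obs is also a made-up name).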
