Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • comparing data when one observation is recorded more than once

    Sorry about the barrage of questions but I am struggling with this. I have a file which look like this
    IDnumber date_of_exam SCI MCI dementia
    1 16dec2003 1 0 0
    1 20nov2010 0 1 0
    1 20dec2012 0 0 1
    2 10jan2004 0 1 0
    2 15feb2006 - - -
    2 19apr2010 0 0 1
    3 02mar2013 0 1 0
    4 04apr2014 0 0 1
    5 9sep2013 0 1 0
    5 20aug2016 - - -
    6 20jan2004 0 1 0
    6 25feb2006 0 0 1
    6 02apr2010 - - -
    6 20dec2012 - - -
    date_of_exam is the date of examination.

    SCI (subjective cognitive impairment) MCI (mild cognitive impairment) and dementia are diagnoses. I want to compare the diagnosis at the first date_of_exam with the diagnosis at the very last date_of_exam for each IDnumber.

    So in the case of IDnumber 1, I would like stata to tell me that on 16dec2003 diagnosis was SCI and on 20dec2012 the diagnosis was dementia.

    Similarly in the case of IDnumber 2 I would like stata to tell me that the diagnosis was MCI on the first date_of_exam 10jan2004 and the diagnosis on the last date_of_exam 19apr2010 diagnosis was dementia.

    Note please that some IDnumber s have only one date of exam like in IDnumber 3 and 4 and it is necessay to know that only one date of exam was recorded for these IDs.

    In the case of IDnumber 5 , the diagnosis recorded on the first date_of_exam was MCI but there is no diagnosis recorded on the last date_of_exam (missing data). In this dataset a missing value in the last date_of_exam means that the diagnosis stayed the same as the preceding date_of exam. Therefore here the diagnosis on the last date_of_exam 20aug2016 was MCI as recorded on 19sep2013 and I need to know that.


    Also in IDnumber 6 first date_of_exam recorded MCI on 20jan2004. Last date_of_exam diagnosis recorded as missing on 20dec2012. In this dataset this means that on last date_of_exam 20dec2012 the diagnosis was dementia as recorded last on 25feb2006. I would like stata to tell me on first date_of_exam the diagnosis 20jan2004 was MCI and on the last day of exam 20dec2012 the diagnosis was dementia.


    Hope this is clear. Would really appreciate your help with the best way forward with this dataset.

    Many thanks for your help 😊

    Last edited by Yehani Wedatilake; 11 Oct 2021, 04:32.

  • #2
    The exact code depends on details of your data set that cannot be gleaned from the tableau you have shown. Here I imagine you have a Stata data set in which date_of_exam is a real Stata date variable, and that SCI, MCI, and dementia are actually numeric variables, where missing values are given by Stata's system missing or extended missing value, not a string values with hyphen characters denoted missing. If that's not the case, you will need to adapt your data set accordingly.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte idnumber float exam_date byte(sci mci dementia)
    1 16055 1 0 0
    1 18586 0 1 0
    1 19347 0 0 1
    2 16080 0 1 0
    2 16847 . . .
    2 18371 0 0 1
    3 19419 0 1 0
    4 19817 0 0 1
    5 19610 0 1 0
    5 20686 . . .
    6 16090 0 1 0
    6 16857 0 0 1
    6 18354 . . .
    6 19347 . . .
    end
    format %td exam_date
    
    xtset idnumber exam_date
    
    //  REPLACE MISSING VALUES WITH PRECEDING VALUES
    foreach v of varlist sci-dementia {
        replace `v' = L1.`v' if missing(`v')
    }
    
    //  CREATE VARIABLES FOR FIRST AND LAST VALUES
    foreach v of varlist sci-dementia {
        by idnumber (exam_date): gen first_`v' = `v'[1]
        by idnumber (exam_date): gen last_`v' = `v'[_N]
    }
    In the future, you should not leave it to others to imagine your data set's important features: you should use the -dataex- command to actually provide them with a faithful working example, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    It is also unclear what you mean when you say you "want Stata to tell you" those values. Here I have interpreted that as meaning you want to create new variables containing the first and final values for those three diagnosis variables. But perhaps you meant to reduce the data set to one observation per person--which is also doable. Or perhaps you wanted to list those results out in the Results window. Etc. The information is all there--but you need to be clearer about how you want it arranged.
    Last edited by Clyde Schechter; 11 Oct 2021, 10:13.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      The exact code depends on details of your data set that cannot be gleaned from the tableau you have shown. Here I imagine you have a Stata data set in which date_of_exam is a real Stata date variable, and that SCI, MCI, and dementia are actually numeric variables, where missing values are given by Stata's system missing or extended missing value, not a string values with hyphen characters denoted missing. If that's not the case, you will need to adapt your data set accordingly.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte idnumber float exam_date byte(sci mci dementia)
      1 16055 1 0 0
      1 18586 0 1 0
      1 19347 0 0 1
      2 16080 0 1 0
      2 16847 . . .
      2 18371 0 0 1
      3 19419 0 1 0
      4 19817 0 0 1
      5 19610 0 1 0
      5 20686 . . .
      6 16090 0 1 0
      6 16857 0 0 1
      6 18354 . . .
      6 19347 . . .
      end
      format %td exam_date
      
      xtset idnumber exam_date
      
      // REPLACE MISSING VALUES WITH PRECEDING VALUES
      foreach v of varlist sci-dementia {
      replace `v' = L1.`v' if missing(`v')
      }
      
      // CREATE VARIABLES FOR FIRST AND LAST VALUES
      foreach v of varlist sci-dementia {
      by idnumber (exam_date): gen first_`v' = `v'[1]
      by idnumber (exam_date): gen last_`v' = `v'[_N]
      }
      In the future, you should not leave it to others to imagine your data set's important features: you should use the -dataex- command to actually provide them with a faithful working example, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      It is also unclear what you mean when you say you "want Stata to tell you" those values. Here I have interpreted that as meaning you want to create new variables containing the first and final values for those three diagnosis variables. But perhaps you meant to reduce the data set to one observation per person--which is also doable. Or perhaps you wanted to list those results out in the Results window. Etc. The information is all there--but you need to be clearer about how you want it arranged.
      Thank you very much indeed. This is great! Sorry about the delayed response my stata kept crashing when I launched your example and it turned out I had to update it.

      I take your points on -dataex. and being specific about what I would require. Yes you are correct- ideally I would like to reduce the dataset to one observation per person with the diagnosis on the first and last date of exam in the same row.


      I have looked at your example and it seems that it works very well for IDnumber 1 and 2. However when it comes to 3 this person has only had one exam_date and was recorded as mci. So the first_mci vairable should be recorded as 1 and last_mci variable should be recorded as missing as there has not been any further follow up. It is important to know if there has only been one exam_date and no follow up. If last_mci is recorded it would mean that this person had follow up and was recorded as mci on a subsequent date_of _exam.

      The same applies to idnumber 4 as it should only be recorded as first_dementia and other last first variables recorded as missing.

      Regarding idnumber 5, there have been two exam_date s. When a diagnosis is missing for subsequent exam_dates this implies that the the diagnosis stayed the same as the one immediately preceding. So I would expect it to be recorded as 1 for first_mci and 1 for last_mci on second date of exam not 1 for first_mci on both occasions.


      Same with idnumber 6 where it should be recorded as 1 for first mci and 1 for last_dementia. (missing on exam_date 20dec2012 means that the diagnosis stayed the same as the last recorded on 25feb2006 which is dementia)


      I would appreciate your further advice on this and how to condense the dataset into one observation per person. Many thanks again.

      Comment


      • #4
        The revised statement of the problem actually makes it easier:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte idnumber float exam_date byte(sci mci dementia)
        1 16055 1 0 0
        1 18586 0 1 0
        1 19347 0 0 1
        2 16080 0 1 0
        2 16847 . . .
        2 18371 0 0 1
        3 19419 0 1 0
        4 19817 0 0 1
        5 19610 0 1 0
        5 20686 . . .
        6 16090 0 1 0
        6 16857 0 0 1
        6 18354 . . .
        6 19347 . . .
        end
        format %td exam_date
        
        isid idnumber exam_date, sort
        
        collapse (firstnm) first_sci = sci first_mci = mci first_dementia = dementia ///
            (lastnm) last_sci = sci last_mci = mci last_dementia = dementia ///
            (count) n_exams = exam_date, by(idnumber)
        replace last_sci = . if n_exams == 1
        replace last_mci = . if n_exams == 1
        replace last_dementia = . if n_exams == 1

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          The revised statement of the problem actually makes it easier:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte idnumber float exam_date byte(sci mci dementia)
          1 16055 1 0 0
          1 18586 0 1 0
          1 19347 0 0 1
          2 16080 0 1 0
          2 16847 . . .
          2 18371 0 0 1
          3 19419 0 1 0
          4 19817 0 0 1
          5 19610 0 1 0
          5 20686 . . .
          6 16090 0 1 0
          6 16857 0 0 1
          6 18354 . . .
          6 19347 . . .
          end
          format %td exam_date
          
          isid idnumber exam_date, sort
          
          collapse (firstnm) first_sci = sci first_mci = mci first_dementia = dementia ///
          (lastnm) last_sci = sci last_mci = mci last_dementia = dementia ///
          (count) n_exams = exam_date, by(idnumber)
          replace last_sci = . if n_exams == 1
          replace last_mci = . if n_exams == 1
          replace last_dementia = . if n_exams == 1
          Thanks again...hmmm is there something missing here - I get some error messages.

          Comment


          • #6
            The code shown runs with your example data and gives no error messages.

            Saying "I get some error messages" with no further information gives me nothing to go on to try to troubleshoot this. There must be something in your real data that differs materially from your example data and is causing problems. Please post back with a new example that reproduces the problem(s) you are encountering. Also show the exact code you are using with that example, and show the exact output you are getting from Stata, including any error messages.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              The code shown runs with your example data and gives no error messages.

              Saying "I get some error messages" with no further information gives me nothing to go on to try to troubleshoot this. There must be something in your real data that differs materially from your example data and is causing problems. Please post back with a new example that reproduces the problem(s) you are encountering. Also show the exact code you are using with that example, and show the exact output you are getting from Stata, including any error messages.

              ___ ____ ____ ____ ____ (R)
              /__ / ____/ / ____/
              ___/ / /___/ / /___/ 16.1 Copyright 1985-2019 StataCorp LLC
              Statistics/Data analysis StataCorp
              4905 Lakeway Drive
              College Station, Texas 77845 USA
              800-STATA-PC https://www.stata.com
              979-696-4600 [email protected]
              979-696-4601 (fax)

              Stata license: 5-user network perpetual
              Serial number: 301606208681
              Licensed to: Aldring
              Aldring og Helse

              Notes:
              1. Unicode is supported; see help unicode_advice.

              . * Example generated by -dataex-. For more info, type help dataex

              .
              . clear

              .
              . input byte idnumber float exam_date byte(sci mci dementia)

              idnumber exam_date sci mci dementia
              1.
              . 1 16055 1 0 0
              2.
              . 1 18586 0 1 0
              3.
              . 1 19347 0 0 1
              4.
              . 2 16080 0 1 0
              5.
              . 2 16847 . . .
              6.
              . 2 18371 0 0 1
              7.
              . 3 19419 0 1 0
              8.
              . 4 19817 0 0 1
              9.
              . 5 19610 0 1 0
              10.
              . 5 20686 . . .
              11.
              . 6 16090 0 1 0
              12.
              . 6 16857 0 0 1
              13.
              . 6 18354 . . .
              14.
              . 6 19347 . . .
              15.
              . end

              .
              . format %td exam_date

              .
              .
              .
              . isid idnumber exam_date, sort
              (data now sorted by idnumber exam_date)

              .
              .
              .
              . collapse (firstnm) first_sci = sci first_mci = mci first_dementia = dementia ///
              / invalid name
              r(198);

              .
              . (lastnm) last_sci = sci last_mci = mci last_dementia = dementia ///
              ( is not a valid command name
              r(199);

              .
              . (count) n_exams = exam_date, by(idnumber)
              ( is not a valid command name
              r(199);

              .
              . replace last_sci = . if n_exams == 1
              variable last_sci not found
              r(111);

              .
              . replace last_mci = . if n_exams == 1
              variable last_mci not found
              r(111);

              .
              . replace last_dementia = . if n_exams == 1
              variable last_dementia not found
              r(111);



              .
              Last edited by Yehani Wedatilake; 15 Oct 2021, 02:33.

              Comment


              • #8
                you are apparently using the code interactively but it is meant to be run from a do file and can only be run successfully from a do file; if you want to run it interactively, you will need to do some work including eliminating all the slash marks (which, as used, are only allowed in do file) and ensuring that everything that is mean to be a single interactive command actually is entered that way - using as a do file will be easier

                Comment

                Working...
                X