  • How to identify inconsistencies (decreases) in the value of a variable that must be increasing? Longitudinal height monitoring

    Hi, guys

    I have a database with weight and height information for children at different follow-up dates over five years. For height in particular (variable "alt_en"), I would like to identify when this measure decreased between follow-up records. A decrease indicates an inconsistency in the data, since a child's height does not decrease over time.

    In the example below, the same individual (id) has 34 height (alt_en) records from 2016 to 2019. Height is expected to always increase (or at least stay stable, but never decrease). The same kind of inconsistency occurs for other ids.
    How can I systematically identify these inconsistencies in height across the years in the complete database?

    Thank you all for your help.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte sexo_en float(ano_acomp_en datanasc_en dataacomp_en idade_meses_en alt_en seq_juncao_en max_juncao_en)
    59268007 1 2016 19308 20471  38.20945 102  1 34
    59268007 1 2016 19308 20487  38.73511 103  2 34
    59268007 1 2016 19308 20528  40.08214 107  3 34
    59268007 1 2016 19308 20569  41.42916 110  4 34
    59268007 1 2016 19308 20587  42.02053 104  5 34
    59268007 1 2016 19308 20612  42.84189 104  6 34
    59268007 1 2016 19308 20662   44.4846 106  7 34
    59268007 1 2016 19308 20682  45.14169 106  8 34
    59268007 1 2016 19308 20706  45.93018 108  9 34
    59268007 1 2017 19308 20828   49.9384 111 10 34
    59268007 1 2017 19308 20852   50.7269 110 11 34
    59268007 1 2017 19308 20885  51.81109 113 12 34
    59268007 1 2017 19308 20936  53.48665 110 13 34
    59268007 1 2017 19308 20957  54.17659 112 14 34
    59268007 1 2017 19308 20984  55.06365 112 15 34
    59268007 1 2017 19308 21018   56.1807 112 16 34
    59268007 1 2017 19308 21039  56.87064 116 17 34
    59268007 1 2017 19308 21104  59.00616 116 18 34
    59268007 1 2018 19308 21223  62.91581 119 19 34
    59268007 1 2018 19308 21257  64.03285 119 20 34
    59268007 1 2018 19308 21285 64.952774 119 21 34
    59268007 1 2018 19308 21299  65.41273 118 22 34
    59268007 1 2018 19308 21320  66.10267 119 23 34
    59268007 1 2018 19308 21349  67.05544 119 24 34
    59268007 1 2018 19308 21382  68.13963 119 25 34
    59268007 1 2018 19308 21474  71.16222 120 26 34
    59268007 1 2019 19308 21594  75.10472 121 27 34
    59268007 1 2019 19308 21628  76.22176 121 28 34
    59268007 1 2019 19308 21692  78.32443 127 29 34
    59268007 1 2019 19308 21725  79.40862 127 30 34
    59268007 1 2019 19308 21760  80.55852 127 31 34
    59268007 1 2019 19308 21781  81.24846 127 32 34
    59268007 1 2019 19308 21819  82.49692 127 33 34
    59268007 1 2019 19308 21850   83.5154 129 34 34
    end
    format %td datanasc_en
    format %td dataacomp_en
    label values sexo_en sexo
    label def sexo 1 "masculino", modify
    ------------------ copy up to and including the previous line ------------------

  • #2
    Andressa, hi.

    There are many ways to address this. Have you tried something like this?

    Code:
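        cap drop inconsistency
        * order each child's records by age at measurement
        sort id idade_meses_en
        * 1 = height lower than at the previous record, 0 = same or higher
        by id: gen inconsistency = cond(alt_en >= alt_en[_n-1], 0, 1)
        * no comparison for a child's first record or when either height is missing
        by id: replace inconsistency = . if _n == 1 | missing(alt_en, alt_en[_n-1])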
    An inconsistency value of 1 means that the height measured at time t is smaller than the previous one, at t-1, which is biologically implausible. More sophisticated methods exist for identifying and handling such inconsistencies, but since you seem to have several instances of this problem, I would recommend checking each flagged record manually.



    Hope this helps.



    • #3
      I don't know. If I understand this correctly, this is a child followed from about age 3 years to about age 7 years. Their reported height would depend on:

      which shoes, if any, the child is wearing

      condition of hair

      wearing hat

      child standing up straight or not

      exact protocol of measurement

      Any expectation that every measurement exceeds all previous ones needs to be tempered by awareness of measurement error.
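Before treating every decrease as an error, it may help to see how large the decreases actually are. A minimal sketch, assuming the variable names from the -dataex- example in the first post: it enters the 34 heights of id 59268007 through a matrix only to keep the block short, orders the records by seq_juncao_en, and tabulates the negative changes. On the full data you would skip the data step and sort by id and idade_meses_en instead.

```stata
* Toy data: the 34 heights (cm) of id 59268007 from the first post, in record order
clear
set obs 34
gen long id = 59268007
gen byte seq_juncao_en = _n
matrix h = (102,103,107,110,104,104,106,106,108,111,110,113,110,112,112,112,116,116,119,119,119,118,119,119,119,120,121,121,127,127,127,127,127,129)
gen float alt_en = h[1, _n]

* change in height (cm) since the previous record of the same child
sort id seq_juncao_en
by id: gen float diff_alt = alt_en - alt_en[_n-1] if _n > 1

* how many decreases there are, and how large they are
tabulate diff_alt if diff_alt < 0
```

In this example the four decreases are 6, 3, 1 and 1 cm, which makes it easier to judge which of them could plausibly come from the sources of measurement error listed above.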



      • #4
        Thank you both so much for your accurate guidance and help.

        Tiago, this code worked. Thank you very much.

        Nick, you're right, thanks for the notes. However, my database contains secondary data from an information system, so it is not possible to perform this check.



        • #5
          Hello Stata community.

          I would like to ask for help with the following situation. The code suggested by Tiago works very well for identifying inconsistent heights (heights that decrease).

          However, when I exclude the heights that decrease, new negative height differences appear (that is, other heights that are lower than the previous one).
          The example below, for id 80279171, illustrates this. The eighth height measurement (90 cm) is lower than the seventh (93 cm), and the command correctly flags the 90 cm measurement as inconsistent. However, the ninth and tenth measurements (91 cm each) are higher than the eighth (90 cm) but lower than the seventh (93 cm). So when I delete the measurement flagged as inconsistent (90 cm), I create another inconsistency (93 cm followed by 91 cm). This has been happening with other ids too.

          How can I flag these measurements (for later deletion) so that each id (child) is left with height measurements that never decrease over time?

          I have also seen that some authors relax this rule, tolerating decreases in height of up to 2 cm. This is because secondary data collected in health services may contain measurement problems related to the evaluator's training, the instruments used, and so on.

          Thank you all for your help.



          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long id int ano_acomp_ca float(datanasc_ca dataacomp_ca idade_meses_ca datanasc_en dataacomp_en ano_acomp_en idade_meses_en seq_EN3 max_EN3 alt alt_incons)
          80279171 2015 20065 20102 1.2156057 20065 20102 2015 1.2156057  1 10 58 .
          80279171 2015 20065 20102 1.2156057 20065 20529 2016 15.244352  2 10 78 0
          80279171 2015 20065 20102 1.2156057 20065 20620 2016 18.234085  3 10 82 0
          80279171 2015 20065 20102 1.2156057 20065 20842 2017  25.52772  4 10 84 0
          80279171 2015 20065 20102 1.2156057 20065 20872 2017 26.513346  5 10 85 0
          80279171 2015 20065 20102 1.2156057 20065 20912 2017 27.827517  6 10 89 0
          80279171 2015 20065 20102 1.2156057 20065 21011 2017  31.08008  7 10 93 0
          80279171 2015 20065 20102 1.2156057 20065 21055 2017  32.52567  8 10 90 1
          80279171 2015 20065 20102 1.2156057 20065 21111 2017   34.3655  9 10 91 0
          80279171 2015 20065 20102 1.2156057 20065 21116 2017 34.529774 10 10 91 0
          end
          format %td datanasc_ca
          format %td dataacomp_ca
          format %td datanasc_en
          format %td dataacomp_en
          ------------------ copy up to and including the previous line ------------------
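One way to avoid the cascade described in #5 is to compare each height not with the previous record but with the highest earlier height of the same child, that is, a running maximum. Every record below that maximum is flagged in a single pass, so the 90 cm record and both 91 cm records of id 80279171 are flagged together. The local tol is an assumption standing in for the 2 cm allowance mentioned in #5 (0 means any decrease is flagged). A minimal sketch using the example's variable names, with seq_EN3 as the record order; on the full data you would skip the input step and sort by id and idade_meses_en.

```stata
* Toy data: the ten heights (cm) of id 80279171 from the example in #5
clear
input long id byte seq_EN3 float alt
80279171  1 58
80279171  2 78
80279171  3 82
80279171  4 84
80279171  5 85
80279171  6 89
80279171  7 93
80279171  8 90
80279171  9 91
80279171 10 91
end

local tol = 0    // allowed decrease in cm (assumption): 0 = strict, 2 = the -2 cm rule

sort id seq_EN3
* running maximum: highest height recorded so far for this child
by id: gen float runmax = alt if _n == 1
by id: replace runmax = max(alt, runmax[_n-1]) if _n > 1

* flag = 1 if height is more than tol cm below the highest earlier height
by id: gen byte flag = alt < runmax[_n-1] - `tol' if _n > 1 & !missing(alt)

list id seq_EN3 alt runmax flag, noobs sepby(id)
```

With tol at 0, dropping every flagged record leaves each child with a series that never decreases. With `local tol = 2`, only the 90 cm record is flagged, since 91 cm is not more than 2 cm below 93 cm. The design choice of trusting earlier values has a cost: if the high reading (here 93 cm) were itself the error, every later record below it would be flagged, so the manual review suggested in #2 still applies.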



          • #6
            Originally posted by Andressa Freire
            Nick, you're right, thanks for the notes. However, my database contains secondary data from an information system, so it is not possible to perform this check.
            You may have misunderstood the thrust of Nick’s point. It’s very unlikely that you would have data to check other than the recorded height. But measurement error in those recorded heights is easy to come by, and it could be as much as 5 cm (consider the difference between a child measured in boots and barefoot within the same year, when their actual height likely hasn’t changed much).

            More to the point, though: why are you deleting these records at all? Why not use the data as recorded? After all, the overall trend will always have to be positive, even if values appear to decrease over shorter intervals.
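To illustrate the point about the overall trend: a minimal sketch that summarizes each child's series by a least-squares slope of height on age (cm per month), computed with egen so that the data stay in memory. It assumes the variable names and uses the records of id 80279171 from the example in #5; on the full data you would skip the input step.

```stata
* Toy data: age (months) and height (cm) of id 80279171 from the example in #5
clear
input long id float(idade_meses_en alt)
80279171 1.2156057 58
80279171 15.244352 78
80279171 18.234085 82
80279171  25.52772 84
80279171 26.513346 85
80279171 27.827517 89
80279171  31.08008 93
80279171  32.52567 90
80279171   34.3655 91
80279171 34.529774 91
end

* least-squares slope of height on age, by child:
* sum((x - mean x) * (y - mean y)) / sum((x - mean x)^2)
bysort id: egen double mx = mean(idade_meses_en) if !missing(alt, idade_meses_en)
by id: egen double my = mean(alt) if !missing(alt, idade_meses_en)
by id: egen double sxy = total((idade_meses_en - mx) * (alt - my))
by id: egen double sxx = total((idade_meses_en - mx)^2)
gen double slope = sxy / sxx    // cm per month; missing if fewer than 2 distinct ages

* one line per child
egen byte first = tag(id)
list id slope if first, noobs
```

For this child the slope is positive even though the 90 cm record is lower than the one before it.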
