  • Systematic choice of records of interest in big data

    Hi, guys

    I have a database in which the same id can have several food consumption records (from 1 to 16 records per id in the complete database). In the example below, each id has 4 records. I need to keep only 1 record per id, and I decided to keep the last one (seq_ca6m == max_ca6m). However, note that for id 11302767 the last record is missing for the scenario variable (cenario), because this record does not fit into any of its categories. In this case, I would like to keep the non-missing record closest to the last position. How can I do this systematically?
    Additional information: I use the following decision tree to choose the food consumption record:
    1) If the id has only 1 food consumption record, select that record.
    2) If the id has multiple records, choose the last record. If that record is missing for the scenario variable, choose the non-missing record closest to the last one.
    Thank you all for your help.
    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte sexo_ca int ano_acomp_ca float(datanasc_ca dataacomp_ca idade_meses_ca seq_ca6m max_ca6m cenario)
      210714 0 2017 20874 20950   2.49692 1 4 1
      210714 0 2017 20874 20996 4.0082135 2 4 1
      210714 0 2017 20874 21020  4.796715 3 4 2
      210714 0 2017 20874 21049  5.749487 4 4 1
    11302767 0 2018 21144 21243  3.252567 1 4 0
    11302767 0 2018 21144 21271  4.172485 2 4 0
    11302767 0 2018 21144 21297  5.026694 3 4 0
    11302767 0 2018 21144 21325  5.946612 4 4 .
    29640336 1 2016 20661 20730 2.2669406 1 4 2
    29640336 1 2016 20661 20768 3.5154004 2 4 0
    29640336 1 2016 20661 20794   4.36961 3 4 1
    29640336 1 2017 20661 20824  5.355236 4 4 2
    39151761 0 2017 21167 21174 .22997946 1 4 0
    39151761 0 2018 21167 21215  1.577002 2 4 2
    39151761 0 2018 21167 21242 2.4640656 3 4 2
    39151761 0 2018 21167 21271  3.416838 4 4 2
    end
    format %td datanasc_ca
    format %td dataacomp_ca
    label values sexo_ca sexo
    label def sexo 0 "feminino", modify
    label def sexo 1 "masculino", modify
    label values cenario cenario
    label def cenario 0 "LM exclusivo", modify
    label def cenario 1 "introducao de outros liquidos", modify
    label def cenario 2 "IA precoce", modify
    ------------------ copy up to and including the previous line ------------------

  • #2
    Code:
    gen byte has_cenario = !missing(cenario)
    bysort id (has_cenario seq_ca6m): keep if _n == _N
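
    To see why the sort order does the work, here is a minimal sketch on a subset of the example from #1 (two ids, only the variables involved). Within each id, records with a missing cenario sort first (has_cenario == 0), so the last observation is the non-missing record with the highest seq_ca6m, or simply the last record if every cenario is missing. This variant flags the chosen record instead of dropping the others; the variable name chosen is an assumption made for illustration, and the storage types are simplified.

```stata
* Subset of the -dataex- example in #1: two ids, three variables
clear
input long id byte(seq_ca6m cenario)
  210714 1 1
  210714 2 1
  210714 3 2
  210714 4 1
11302767 1 0
11302767 2 0
11302767 3 0
11302767 4 .
end

* 0 = cenario missing, 1 = cenario observed
gen byte has_cenario = !missing(cenario)

* Missing records sort first within id, so the last observation is the
* latest record with an observed cenario. `chosen' is a hypothetical name.
bysort id (has_cenario seq_ca6m): gen byte chosen = (_n == _N)

list id seq_ca6m cenario if chosen, sepby(id)
```

    On this subset the flagged records are seq_ca6m == 4 for id 210714 and seq_ca6m == 3 for id 11302767. Flagging rather than dropping makes it easy to inspect the picks before committing to keep if chosen.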

    • #3
      Perfect! Thank you so much, Clyde

      • #4
        Hello, Statalist community. Your help has done a lot to help me progress in Stata. I have a situation here that I can't completely solve.
        In my database, I have children (id) with 2 or more anthropometric measurements. For my model (I intend to use a linear mixed-effects regression), these anthropometric measurements must be at least 1 month apart. For this I created the following variables:
        * inter_caen: date interval between the food consumption record (exposure) and the nutritional status record (waste)
        * within the same id: subtract the first line's value from the value of each line
        bysort id (inter_caen): gen inter_EN = inter_caen - inter_caen[1]
        * within the same id: subtract the previous line's value from the value of each line
        bysort id (inter_caen): gen inter_EN1 = inter_caen - inter_caen[_n-1]

        Starting from this, I created a variable to choose the records that meet my condition: the selected anthropometry records must be at least one month (30 days) apart. I used this code:
        bysort id : gen inter_valid1 = cond(inter_EN > 30 | mod(inter_EN, 30) == 0, inter_EN, .) & inter_EN1>=30

        This code works for almost all cases, but there are cases, such as id 77631138, where it does not. Note that the second record is 28 days after the first and the third is 34 days after the first. Both get 0 for the generated variable inter_valid1, but the third record should be valid because it meets the condition of a minimum of 30 days relative to the first record. How could I solve this?

        Thank you all for your help.

        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id float(idade_meses_en cenario peso alt inter_caen seq_EN1 max_EN1 inter_EN inter_EN1)
        77631138 1.6427104 0  4.57 56 -34 1 4   0   .
        77631138  2.562628 0  5.56 58  -6 2 4  28  28
        77631138  2.759754 0  5.56 58   0 3 4  34   6
        77631138 11.301848 0 87.98 71 260 4 4 294 260
        77632633  .9856262 1  3.75 50 -31 1 4   0   .
        77632633 2.0041068 1  4.76 53   0 2 4  31  31
        77632633 3.0554416 1  5.49 57  32 3 4  63  32
        77632633  5.223819 1  6.56 61  98 4 4 129  66
        77647867  1.905544 1   6.3 60   0 1 4   0   .
        77647867 2.9240246 1   7.6 63  31 2 4  31  31
        77647867  3.449692 1     8 64  47 3 4  47  16
        77647867 10.349076 1  11.6 73 257 4 4 257 210
        77648623 1.8398356 0  5.38 57   0 1 4   0   .
        77648623  2.858316 0  5.95 58  31 2 4  31  31
        77648623  5.026694 0   6.8 64  97 3 4  97  66
        77648623  5.946612 0   6.7 65 125 4 4 125  28
        end
        label values cenario cenario
        label def cenario 0 "LM exclusivo", modify
        label def cenario 1 "Substitutos do leite materno", modify
        ------------------ copy up to and including the previous line ------------------

        • #5
          Code:
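          drop inter_EN1    // NOT NEEDED
          * ref = inter_EN of the most recently selected record within id
          by id (inter_EN), sort: gen ref = inter_EN if _n == 1
          * select a record once it is at least 30 days after the last selected one
          by id (inter_EN): replace ref = cond(inter_EN - ref[_n-1] >= 30, ///
              inter_EN, ref[_n-1]) if _n > 1
          gen byte select = (inter_EN == ref)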
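
          To trace how the running reference behaves, here is a minimal sketch on a subset of the data from #4 (two ids, inter_EN only). Each record is compared with the last selected record rather than with the immediately preceding one, which is what the case of id 77631138 needs. The comparison uses >= 30 so that a gap of exactly 30 days counts as valid; that is my reading of "a minimum of 30 days" in #4, and > 30 would require strictly more.

```stata
* Subset of the -dataex- example in #4: two ids, days since first record
clear
input long id float inter_EN
77631138   0
77631138  28
77631138  34
77631138 294
77647867   0
77647867  31
77647867  47
77647867 257
end

* The first record of each id is always selected and starts the reference
by id (inter_EN), sort: gen ref = inter_EN if _n == 1

* -replace- runs in observation order, so ref[_n-1] already holds the
* updated reference from the previous record of the same id
by id (inter_EN): replace ref = cond(inter_EN - ref[_n-1] >= 30, ///
    inter_EN, ref[_n-1]) if _n > 1

* A record is selected when it became the new reference
gen byte select = (inter_EN == ref)

list id inter_EN ref select, sepby(id)
```

          For id 77631138 the reference goes 0, 0, 34, 294, so the 28-day record is skipped and the 34-day record is selected. For id 77647867 it goes 0, 31, 31, 257, so the 47-day record is skipped because it is only 16 days after the selected 31-day record.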

          • #6

            Thank you so much for your help, as always, Clyde.
