Systematic choice of records of interest in big data

Andressa Freire

Join Date: Aug 2022

Posts: 33
#1


16 Jul 2023, 19:08

Hi, guys

I have a dataset in which the same id can have several food consumption records (from 1 to 16 records per id in the complete dataset). In the example below, each id has 4 records. I need to keep only 1 record per id, and I decided to keep the last one (seq_ca6m == max_ca6m). However, note that for id 11302767 the last record is missing on cenario, because it does not fit into any category of that variable. In this case, I would like to keep the non-missing record closest to the last position. How can I do this systematically?
Additional information: I use the following decision rule to choose the food consumption record:
1) For an id with a single food consumption record, keep that record.
2) For an id with multiple records, keep the last record. If that record is missing on cenario, keep the non-missing record closest to the last one.
Thank you all for your help.
----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id byte sexo_ca int ano_acomp_ca float(datanasc_ca dataacomp_ca idade_meses_ca seq_ca6m max_ca6m cenario)
  210714 0 2017 20874 20950   2.49692 1 4 1
  210714 0 2017 20874 20996 4.0082135 2 4 1
  210714 0 2017 20874 21020  4.796715 3 4 2
  210714 0 2017 20874 21049  5.749487 4 4 1
11302767 0 2018 21144 21243  3.252567 1 4 0
11302767 0 2018 21144 21271  4.172485 2 4 0
11302767 0 2018 21144 21297  5.026694 3 4 0
11302767 0 2018 21144 21325  5.946612 4 4 .
29640336 1 2016 20661 20730 2.2669406 1 4 2
29640336 1 2016 20661 20768 3.5154004 2 4 0
29640336 1 2016 20661 20794   4.36961 3 4 1
29640336 1 2017 20661 20824  5.355236 4 4 2
39151761 0 2017 21167 21174 .22997946 1 4 0
39151761 0 2018 21167 21215  1.577002 2 4 2
39151761 0 2018 21167 21242 2.4640656 3 4 2
39151761 0 2018 21167 21271  3.416838 4 4 2
end
format %td datanasc_ca
format %td dataacomp_ca
label values sexo_ca sexo
label def sexo 0 "feminino", modify
label def sexo 1 "masculino", modify
label values cenario cenario
label def cenario 0 "LM exclusivo", modify
label def cenario 1 "introducao de outros liquidos", modify
label def cenario 2 "IA precoce", modify

------------------ copy up to and including the previous line ------------------

Clyde Schechter

Join Date: Apr 2014
Posts: 30164

16 Jul 2023, 19:20

Code:

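* has_cenario is 1 when cenario is non-missing, 0 otherwise
gen byte has_cenario = !missing(cenario)
* within id, sort non-missing records last, in sequence order, and keep the final one
bysort id (has_cenario seq_ca6m): keep if _n == _N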
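
If dropping records is too destructive at this stage, one possible variant, offered only as a minimal sketch, flags the chosen record instead and then checks the result. It is meant to be run instead of the code above, on the example data as posted. The names keep_rec, n_keep, and any_cenario are illustrative and do not come from the original data.

```stata
* Sketch only: flag the record to keep rather than dropping the others.
* keep_rec, n_keep and any_cenario are hypothetical names.
gen byte has_cenario = !missing(cenario)
bysort id (has_cenario seq_ca6m): gen byte keep_rec = (_n == _N)

* Check 1: exactly one flagged record per id
bysort id: egen byte n_keep = total(keep_rec)
assert n_keep == 1

* Check 2: the flagged record is missing on cenario only when
* every record for that id is missing on cenario
bysort id: egen byte any_cenario = max(has_cenario)
assert has_cenario == any_cenario if keep_rec

list id seq_ca6m cenario if keep_rec, noobs
```

On the example data this flags seq_ca6m == 4 for ids 210714, 29640336, and 39151761, and seq_ca6m == 3 for id 11302767, whose fourth record is missing on cenario.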


Andressa Freire

Join Date: Aug 2022

Posts: 33
#3

26 Jul 2023, 16:46

Perfect! Thank you so much, Clyde
Andressa Freire

Join Date: Aug 2022

Posts: 33
#4

02 Aug 2023, 13:44

Hello, Statalist community. Your help has done a lot for my progress in Stata. I have a situation here that I can't completely solve.
In my dataset, I have children (id) with 2 or more anthropometric measurements. For my model (I intend to use linear mixed-effects regression), these anthropometric measurements must be at least 1 month apart. For this I created the following variables:
* inter_caen: date interval between the food consumption record (exposure) and the nutritional status record (waste)
* within the same id: subtract the value in the first row from the value in each row
bysort id: gen inter_EN = inter_caen - inter_caen[1]
* within the same id: subtract the value in the previous row from the value in each row
bysort id: gen inter_EN1 = inter_caen - inter_caen[_n-1]

From these, I created a variable to pick the records that meet my condition: the selected anthropometry records must be at least one month (30 days) apart. I used this code:
bysort id : gen inter_valid1 = cond(inter_EN > 30 | mod(inter_EN, 30) == 0, inter_EN, .) & inter_EN1>=30

This code works in almost all cases, but there are cases, such as id 77631138, where it does not. Note that the second record is 28 days after the first and the third is 34 days after the first. Both get 0 on the generated variable inter_valid1, but the third record should be valid because it is at least 30 days after the first record. How could I solve this?

Thank you all for your help.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float(idade_meses_en cenario peso alt inter_caen seq_EN1 max_EN1 inter_EN inter_EN1)
77631138 1.6427104 0  4.57 56 -34 1 4   0   .
77631138  2.562628 0  5.56 58  -6 2 4  28  28
77631138  2.759754 0  5.56 58   0 3 4  34   6
77631138 11.301848 0 87.98 71 260 4 4 294 260
77632633  .9856262 1  3.75 50 -31 1 4   0   .
77632633 2.0041068 1  4.76 53   0 2 4  31  31
77632633 3.0554416 1  5.49 57  32 3 4  63  32
77632633  5.223819 1  6.56 61  98 4 4 129  66
77647867  1.905544 1   6.3 60   0 1 4   0   .
77647867 2.9240246 1   7.6 63  31 2 4  31  31
77647867  3.449692 1     8 64  47 3 4  47  16
77647867 10.349076 1  11.6 73 257 4 4 257 210
77648623 1.8398356 0  5.38 57   0 1 4   0   .
77648623  2.858316 0  5.95 58  31 2 4  31  31
77648623  5.026694 0   6.8 64  97 3 4  97  66
77648623  5.946612 0   6.7 65 125 4 4 125  28
end
label values cenario cenario
label def cenario 0 "LM exclusivo", modify
label def cenario 1 "Substitutos do leite materno", modify

------------------ copy up to and including the previous line ------------------

Clyde Schechter

Join Date: Apr 2014
Posts: 30164

02 Aug 2023, 14:11

Code:

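drop inter_EN1    // NOT NEEDED
* ref carries the inter_EN value of the most recently selected record within id
by id (inter_EN), sort: gen ref = inter_EN if _n == 1
* move ref forward only when the current record is more than 30 days past it
by id (inter_EN): replace ref = cond(inter_EN - ref[_n-1] > 30, ///
    inter_EN, ref[_n-1]) if _n > 1
* a record is selected when it is its own reference point
gen byte select = (inter_EN == ref)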
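
The question asks for a minimum of 30 days, while the code above uses a strict > 30, so a record exactly 30 days after the last selected one would not be selected. If a gap of exactly 30 days should count as valid, one possible variant is sketched below. This is an assumption about the intended rule, not something confirmed in the thread. The names ref2 and select2 are illustrative, chosen so the sketch can run alongside ref and select.

```stata
* Sketch only: same carry-forward idea, but a gap of exactly 30 days counts.
* ref2 and select2 are hypothetical names.

* ref2 holds the inter_EN value of the most recently selected record
by id (inter_EN), sort: gen ref2 = inter_EN if _n == 1

* >= instead of >: replace works through the observations in order, so
* ref2[_n-1] already reflects the decision made for the previous record
by id (inter_EN): replace ref2 = cond(inter_EN - ref2[_n-1] >= 30, ///
    inter_EN, ref2[_n-1]) if _n > 1

gen byte select2 = (inter_EN == ref2)

list id inter_EN ref2 select2 if id == 77631138, noobs
```

For id 77631138 (inter_EN of 0, 28, 34, 294), ref2 becomes 0, 0, 34, 294, so the first, third, and fourth records are selected. No gap in the example data is exactly 30 days, so both versions select the same records there.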


Andressa Freire

Join Date: Aug 2022

Posts: 33
#6

02 Aug 2023, 16:57

Thank you so much for your help, as always, Clyde