How to keep IDs which are within a range in date format?

Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#1

How to keep IDs which are within a range in date format?

08 Jan 2024, 07:16

Hi everyone,

From my dataset, I would like to keep only those IDs for which we have contracts that run between 1 January 2021 and November 2023 without interruption, please.

I want to keep the IDs that have no interruption during the interlude above (i.e. that remain in my sample from start to finish).

I don't want to keep the IDs that have some interruption or that enter in my sample later than 1 January 2021.

Here's a -dataex- as an example:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long(id idcontrato) double(date_contract_start date_contract_end) 1253 1253 19100 19266 1253 1410 19267 22431 1253 1055841 22432 22645 1253 1815013 22646 22676 1253 1904434 22677 22735 1253 2413928 22736 23010 1253 2880202 23011 23100 1253 3189647 23101 . 2195 2195 19465 20458 2195 80657 20459 21258 2195 1365966 22544 22645 2195 1726472 22646 22676 2195 2073891 22677 22735 2195 2277485 22736 22918 2195 2613231 22919 23010 2195 2863016 23011 23100 2195 3175831 23101 23314 2195 3343382 23315 . 2239 2239 19477 21608 2239 351357 21609 21615 end format %td date_contract_start format %td date_contract_end

-id- refers to household IDs,

-idcontrato- refers to contract ID,

-date_contract_start- is the beginning of a given contract,

-date_contract_end- is the end of a given contract.

The missing values in the "date_contract_end" variable mean that the contract in question is still in force and we should keep them.

Thank you in advance for your help.
All the best,

Michael
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30141

08 Jan 2024, 09:30

Code:

by id (date_contract_start), sort: egen byte gap = ///
    max(date_contract_start[_n+1] > date_contract_end + 1 & _n < _N)
    
by id (date_contract_start): drop if gap | (date_contract_start[1] > td(1jan2021)) ///
     | (date_contract_end[_N] < td(1nov2023))

Comment

Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#3

09 Jan 2024, 00:45

Hi Clyde Schechter,

Beautiful! It works perfectly well. Thank you so much for your help!

All the best,
Michael
Comment
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#4

09 Jan 2024, 00:49

I have just a small question, by curiosity and for my understanding in the code provided in #2:
Could you explain me the last part of the first chunk please, i.e. the part in bold and green below:

Code:

by id (date_contract_start), sort: egen byte gap = /// max(date_contract_start[_n+1] > date_contract_end + 1 & _n < _N)

Thank you in advance again for your help and time.
All the best,

Michael
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#5

09 Jan 2024, 08:57

There is a gap if the next contract start date is more than 1 day after the current contract's end date. As for the _n < _N part, consider what happens when _n == _N. That is the last observation for the given id, and therefore date_contract_start[_n+1] does not exist, which means that its value is missing. In Stata, a missing value is always greater than any real number. So the date_contract_start[_n+1] > date_contract_end + 1 part of the condition is always true when _n == _N. But that does not constitute a gap because it is the end of the data.
1 like
Comment
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#6

09 Jan 2024, 09:06

Hi Clyde Schechter,

Beautiful! I couldn't ask for anything clearer than that. Thank you for your time!
Lovely day,

Michael
Comment

Announcement

How to keep IDs which are within a range in date format?

Comment

Comment

Comment

Comment

Comment