  • How to check across several datasets which IDs leave or enter the sample?

    Hi everyone,

    I have a question:

    I have several datasets on electricity consumption in Spain, from January 2021 to November 2023. My aim is:
    1. I'd like to see which IDs leave my sample and which return. For example, I'd like to see whether, in February 2021, any IDs have already left my sample or any new ones have come in.
    2. I would like to do that for all my monthly files, from January 2021 to November 2023. These monthly files contain household IDs.
    Contract IDs are stored in a separate dataset. That dataset includes some contracts that began well before January 2021 (going back to May 2011).

    Below is an example for January 2021 (Dataset 1), followed by the dataset that contains contract IDs in addition to household IDs (Dataset 2):

    Dataset 1:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(id fecha_consumo)
    1001 20210101
    1001 20210102
    1001 20210103
    1001 20210104
    1001 20210105
    1001 20210106
    1001 20210107
    1001 20210108
    1001 20210109
    1001 20210110
    1001 20210111
    1001 20210112
    1001 20210113
    1001 20210114
    1001 20210115
    end
    • id is the household ID.
    • fecha_consumo is the date on which the electricity consumption was recorded.
    Dataset 2:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(id idcontrato) double(date_contract_start date_contract_end)
    1001    1001 18887 21700
    1001  451697 21701 22431
    1001 1236132 22432 22645
    1001 1730454 22646 22676
    1001 2082075 22677 22735
    1001 2172904 22736 23010
    1001 2872183 23011 23069
    1001 3107888 23070     .
    1005    1005 18800 21639
    1005  420392 21640 21651
    1005  432684 21652 22066
    1005  720923 22067 22431
    1005 1124767 22432 22456
    1005 1288758 22457 22645
    1005 1742918 22646 22676
    end
    format %td date_contract_start
    format %td date_contract_end
    • id is the household ID.
    • date_contract_start is the date on which the contract starts for a given household.
    • date_contract_end is the date on which the contract ends for a given household.
    Could someone help me with this, please? Thank you very much in advance.
    Michael
    Last edited by Michael Duarte Goncalves; 18 Dec 2023, 01:28.
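
One possible way to approach the first aim, as a rough sketch rather than a definitive answer: reduce the consumption data to one row per household and month, then flag the months in which a household appears after an absence or disappears afterwards. The toy rows for ids 1005 and 1007, the variable names month, enters, leaves and returns, and the shortened window ending in 2021m4 are all assumptions made for illustration. The sketch also assumes that fecha_consumo is a number of the form YYYYMMDD, as in Dataset 1.

```stata
* Sketch only: toy data in the layout of Dataset 1. The rows for ids 1005 and
* 1007 and the variables month, enters, leaves, returns are illustrative.
clear
input long(id fecha_consumo)
1001 20210101
1001 20210215
1001 20210403
1005 20210110
1005 20210120
1007 20210205
1007 20210310
end

* Convert the YYYYMMDD number to a Stata daily date, then to a monthly date
generate int month = mofd(date(string(fecha_consumo, "%8.0f"), "YMD"))
format month %tm

* Keep one row per household and month in which it appears
keep id month
duplicates drop

* Assumed observation window (2021m1 to 2023m11 in the real data)
local first = tm(2021m1)
local last  = tm(2021m4)

* enters: present this month but absent the month before
bysort id (month): generate byte enters = (_n == 1) | (month - month[_n-1] > 1)
replace enters = 0 if month == `first'

* leaves: present this month but absent the month after
by id: generate byte leaves = (_n == _N) | (month[_n+1] - month > 1)
replace leaves = 0 if month == `last'

* returns: an entry that is not the household's first appearance
by id: generate byte returns = (enters == 1) & (_n > 1)

list, sepby(id)
tabulate month enters
```

With the real data, the monthly files would first need to be combined (for example with append) before the flags are computed, and the local macro last would be set to tm(2023m11). Households present in the first month of the window are not counted as entering, and those present in the last month are not counted as leaving, since nothing is observed outside the window. The contract dataset is not used here.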