Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating new variable based on observations from two different dataset

    I have 2 datasets with information on type and date of infection of each patients. They are from two different databases and I am trying to identify the earliest infection.

    I have merged and identified the date of earliest infection:
    Code:
    egen earliest_infdate=rowmin(inf_date1 inf_date2)
    Now that I have the earliest infection date, I cant figure out which command should I input to get the infection type based on the date I've identified

    This is the example data:

    HTML Code:
          patid            inf_date1        inf_HES       inf_date2       inf_cprd    _merge           earliest_infdate
          2.231e+08        2007/12/19        Brain        2007/12/21       Chest     Matched (3)        2007/12/19
          5.215e+08        2014/12/24        Skin         2015/01/13       Chest     Matched (3)         2014/12/24
          9.407e+08        2011/08/04        Skin                                    Master only (1)      2011/08/04
          1.002e+09        2014/03/14        Skin         2014/03/01       Chest     Matched (3)         2014/03/01
    For example, looking at row 1:

    Patient was recorded as having a brain infection on the 1st database but chest on the second database. The dates showed that the brain infection occured first and I would like to generate a new variable which shows all of the earlier infections. Would appreciate any help, thanks




  • #2
    Given what you want to do, it would have been better to -append- the data sets rather than -merge- them. Given that you didn't do that, we can instead get you what -append- would have created by going to long layout. From there, it's very simple.

    Code:
    rename inf_date1 inf_date_HES
    rename inf_date2 inf_date_cprd
    drop _merge earliest_infdate
    reshape long inf_ inf_date_, i(patid) j(source) string
    
    rename *_ *
    by patid (inf_date), sort: gen earliest_infection = inf[1]
    by patid (inf_date): gen earliest_date = date[1]
    Note: The above code is untested. I am confident the gist of it is correct, but it may contain typos or other errors. I did not test the code because the tableau shown in #1 requires extensive surgery to import into Stata and I lack the time and patience for that. In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
    Last edited by Clyde Schechter; 10 Nov 2022, 09:58.

    Comment


    • #3
      Thank you very much. Can I ask what are the commands to input if I were to -append- the data?

      Comment


      • #4
        Well, without seeing example data from both data sets, I can't be exact, but it would be something like this:

        Code:
        use dataset_2, clear
        rename inf_cprd inf
        rename inf_date2 inf_date
        gen source == "cprd"
        tempfile holding
        save `holding'
        
        use dataset_1, clear
        rename inf_HES inf
        rename inf_date1 inf_date
        gen source == "HES"
        
        append using `holding'

        Comment

        Working...
        X