Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subtracting times across observations and applying weights

    Hello Everyone,

    I am new to STATA and would really appreciate your help with a problem I am stuck on. I have a dataset A which looks like this:
    id date_time f_id
    1 14jan2020 06:34:43 other
    1 14jan2020 18:11:13 B
    1 14jan2020 20:32:56 C
    1 17jan2020 13:45:29 exit

    I have a dataset B which looks like this:
    date hour f_id weight
    08 Jan 20 0 A 0.94
    08 Jan 20 1 A 0.92
    08 Jan 20 2 A 0.91
    .
    .
    .
    .
    08 Jan 20 23 A 0.94
    08 Jan 20 0 B 0.87
    08 Jan 20 1 B 0.89
    .
    .
    08 Jan 20 23 B 0.78
    (and so I have weights for each hour of each f_id for each day in that year)

    What I want to accomplish is that for every id in dataset A, I want to calculate a variable weighted_sum which for id 1 would look like 49(mins spent in that hour)*(weight for the 18th hour of f_id B on 14th jan 2020) + 60*(weight for the 19th hour of f_id B on 14th jan 2020) + 32*(weight for the 20th hour of f_id B on 14th jan 2020) + 28*(weight for the 20th hour of f_id C on 14th jan 2020) + ....+ 45*(weight for the 13th hour of f_id C on 17th jan 2020).

    I am confused about how I should loop to subtract times, get hours, multiply appropriate hourly weights, and adding them. Any help would be greatly appreciated!! Thanks!!

    Best,
    Joe

  • #2
    Well, for starters, you won't get far with those dates and times in those formats. You will need to create real Stata internal format date and time variables in both data sets in order to work with them. That part is straightforward, if tedious:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id str18 date_time str5 f_id
    1 "14jan2020 06:34:43" "other"
    1 "14jan2020 18:11:13" "B"    
    1 "14jan2020 20:32:56" "C"    
    1 "17jan2020 13:45:29" "exit" 
    end
    tempfile dataset1
    save `dataset1'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 date byte hour str1 f_id float weight
    "08 Jan 20"  0 "A" .94
    "08 Jan 20"  1 "A" .92
    "08 Jan 20"  2 "A" .91
    "08 Jan 20" 23 "A" .94
    "08 Jan 20"  0 "B" .87
    "08 Jan 20"  1 "B" .89
    "08 Jan 20" 23 "B" .78
    end
    tempfile dataset2
    save `dataset2'
    
    //    PREPARE BOTH DATA FILES BY CREATING REAL STATA INTERNAL FORMAT
    //    DATE AND TIME VARIABLES
    use `dataset1', clear
    gen double sif_date_time = clock(date_time, "DMYhms")
    assert missing(date_time) == missing(sif_date_time)
    format sif_date_time %tc
    drop date_time
    rename sif_date_time date_time
    tempfile dataset1_sif
    save `dataset1_sif'
    
    use `dataset2', clear
    egen date_time = concat(date hour), punct(" ")
    replace date_time = date_time + ":00:00"
    gen double sif_date_time = clock(date_time, "DM20Yhms")
    assert missing(sif_date_time) == missing(date, hour)
    format sif_date_time %tc
    drop date hour date_time
    rename sif_date_time date_time
    The next step will be to put the two cleaned-up datasets together. But here is where I am completely baffled by your explanation. Why are you dredging up weights from the 8th of January in the second data set to apply to activities that are dated 14 January and 17 January? How do we know which dates in one dataset to connect with which dates in the other?

    Also where do the numbers 49, 60, 32, 28 and 45 come from. You seem to say they represent minutes of an hour spent in some activity, but I don't find those anywhere in the data set. So what's that all about.

    Once it is clear how to pair up the observations across the two data sets and where those numbers you want weighted averages of come from, putting the data sets together will almost certainly be an application of -rangejoin- (by Robert Picard, available from SSC), and then calculating the weighted averages themselves for each id will be an application of -collapse-.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment

    Working...
    X