Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pick particular observations from a certain interval (without a loop)

    Hi there,

    I have a dataset including time-series data with a binary variable (X) which is 0 most of the time and 1 else. I need to generate a new variable (X_new) that is based on X and picks the first observation that is 1 in an interval of [-10,+10] around that observation (so, all potential other observations being 1 in that interval shall be neglected). Also, I need another variable (T) that assigns the “distance” [-10,+10] to the event (i.e., where X_new is 1). Those intervals in T are required not to overlap (i.e., there need to be at least 20 observations between the 1s in X_new).

    Example of the data (the last two columns are needed):
    date X X_new T
    01.01.2019 0 0 .
    02.01.2019 0 0 .
    03.01.2019 0 0 .
    04.01.2019 0 0 -10
    05.01.2019 0 0 -9
    06.01.2019 0 0 -8
    07.01.2019 0 0 -7
    08.01.2019 0 0 -6
    09.01.2019 0 0 -5
    10.01.2019 0 0 -4
    11.01.2019 0 0 -3
    12.01.2019 0 0 -2
    13.01.2019 0 0 -1
    14.01.2019 1 1 0
    15.01.2019 0 0 1
    16.01.2019 0 0 2
    17.01.2019 0 0 3
    18.01.2019 1 0 4
    19.01.2019 0 0 5
    20.01.2019 0 0 6
    21.01.2019 1 0 7
    22.01.2019 0 0 8
    23.01.2019 0 0 9
    24.01.2019 0 0 10
    25.01.2019 0 0 .
    26.01.2019 0 0 -10
    27.01.2019 1 0 -9
    28.01.2019 0 0 -8
    29.01.2019 0 0 -7
    30.01.2019 0 0 -6
    31.01.2019 0 0 -5
    01.02.2019 0 0 -4
    02.02.2019 0 0 -3
    03.02.2019 0 0 -2
    04.02.2019 0 0 -1
    05.02.2019 1 1 0
    06.02.2019 0 0 1
    07.02.2019 0 0 2
    08.02.2019 0 0 3
    09.02.2019 0 0 4
    10.02.2019 0 0 5
    11.02.2019 0 0 6
    12.02.2019 0 0 7
    13.02.2019 0 0 8
    14.02.2019 0 0 9
    15.02.2019 0 0 10

    Anybody has an idea how to solve this with in a fast way?
    I managed to do this in a loop, but since I need to run this on several GB of data, this will take weeks. :/

    Thanks a lot!
    Last edited by Rolf Miller; 13 Feb 2019, 04:19.

  • #2
    In future, please use dataex to present data examples. Your date variable needs to be converted to a Stata date prior to attempting any solution. You could reduce the complexity of the problem by working only with the sample of ones. See if the following helps.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 date byte(x x_new t)
    "01.01.2019" 0 0   .
    "02.01.2019" 0 0   .
    "03.01.2019" 0 0   .
    "04.01.2019" 0 0 -10
    "05.01.2019" 0 0  -9
    "06.01.2019" 0 0  -8
    "07.01.2019" 0 0  -7
    "08.01.2019" 0 0  -6
    "09.01.2019" 0 0  -5
    "10.01.2019" 0 0  -4
    "11.01.2019" 0 0  -3
    "12.01.2019" 0 0  -2
    "13.01.2019" 0 0  -1
    "14.01.2019" 1 1   0
    "15.01.2019" 0 0   1
    "16.01.2019" 0 0   2
    "17.01.2019" 0 0   3
    "18.01.2019" 1 0   4
    "19.01.2019" 0 0   5
    "20.01.2019" 0 0   6
    "21.01.2019" 1 0   7
    "22.01.2019" 0 0   8
    "23.01.2019" 0 0   9
    "24.01.2019" 0 0  10
    "25.01.2019" 0 0   .
    "26.01.2019" 0 0 -10
    "27.01.2019" 1 0  -9
    "28.01.2019" 0 0  -8
    "29.01.2019" 0 0  -7
    "30.01.2019" 0 0  -6
    "31.01.2019" 0 0  -5
    "01.02.2019" 0 0  -4
    "02.02.2019" 0 0  -3
    "03.02.2019" 0 0  -2
    "04.02.2019" 0 0  -1
    "05.02.2019" 1 1   0
    "06.02.2019" 0 0   1
    "07.02.2019" 0 0   2
    "08.02.2019" 0 0   3
    "09.02.2019" 0 0   4
    "10.02.2019" 0 0   5
    "11.02.2019" 0 0   6
    "12.02.2019" 0 0   7
    "13.02.2019" 0 0   8
    "14.02.2019" 0 0   9
    "15.02.2019" 0 0  10
    end
    
    
    gen date2 = date(date, "DMY")
    format date2 %td
    sort date2
    gen xn= x==1 &_n>10
    gen max=date2+20
    format max %td
    preserve
    keep if xn==1
    qui sum xn
    forval i=1/`r(N)'{
    drop if date2< max[`i'] &_n>`i'
    }
    tempfile tomerge
    save `tomerge'
    restore
    merge 1:1 * using `tomerge'
    gen xnew2=_merge==3
    drop _merge

    Comment

    Working...
    X