  • Means of variable in period before a categorical variable change

    Dear Statlisters,
    I have a dataset which includes the daily Covid-19 cases in Italy from November 2020 to November 2021. In this period the government assigned a color (red, orange, yellow, white) to each region based on the number of cases. In other words, the dataset looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float date_calendar long(region color) double nuovi_positivi
    22225 1 2 395
    22226 1 2 432
    22227 1 2 584
    22228 1 2 625
    22229 1 2 746
    22230 1 3 662
    22231 1 3 541
    22232 1 3 683
    22233 1 3 939
    22234 1 3 470
    22235 1 3 642
    22237 1 3 641
    22238 1 3 649
    22239 1 3 705
    22240 1 3 541
    22241 1 4 560
    22242 1 4 640
    22243 1 4 536
    22244 1 4 623
    22245 1 4 570
    22246 1 4 510
    22247 1 4 532
    22248 1 4 413
    22249 1 4 556
    22250 1 4 396
    22251 1 4 381
    22252 1 4 395
    22253 1 4 408
    22254 1 4 376
    22255 1 4 294
    22256 1 4 124
    22257 1 4 312
    22258 1 4 264
    22259 1 4 227
    22260 1 4 262
    22261 1 4 277
    22262 1 3 344
    22264 1 3 100
    22265 1 3 257
    22266 1 3 246
    22267 1 3 227
    22268 1 3 156
    22269 1 3 216
    22270 1 2  64
    22271 1 2  86
    22272 1 2 255
    22273 1 4 223
    22274 1 4 339
    22275 1 4  34
    22276 1 4  25
    22277 1 3  41
    22278 1 3  47
    22279 1 3 278
    22280 1 4 456
    22281 1 4 410
    22282 1 4  23
    22283 1 4 207
    22284 1 3 121
    22285 1 4 213
    22286 1 4 365
    22287 1 2 229
    22288 1 2 160
    22289 1 3 400
    22290 1 3 315
    22291 1 2 120
    22292 1 2 152
    22293 1 2 314
    22294 1 2 256
    22295 1 2 240
    22296 1 2 196
    22297 1 3 285
    22298 1 3 107
    22299 1 3 113
    22300 1 3 279
    22301 1 3 212
    22302 1 3 318
    22303 1 3 221
    22304 1 3 325
    22305 1 3  69
    22306 1 3 152
    22307 1 3 344
    22308 1 3 268
    22309 1 3 343
    22310 1 3 385
    22311 1 3 402
    22312 1 2 161
    22313 1 2 210
    22314 1 2 449
    22315 1 2 526
    22316 1 2 276
    22317 1 2 509
    22318 1 2 436
    22319 1 2 184
    22320 1 2 241
    22321 1 2 314
    22322 1 2 540
    22323 1 2 357
    22324 1 2 508
    22325 1 3 222
    22326 1 3 533
    end
    format %td date_calendar
    label values region reg
    label def reg 1 "Abruzzo", modify
    label values color color
    label def color 2 "Gialla", modify
    label def color 3 "Arancione", modify
    label def color 4 "Rossa", modify
    What I need is the mean of new positives in each period immediately before a red zone occurred (for every region). I have been trying to do this for some time without success. Any help would be appreciated.

  • #2
    Code:
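    * Number each run of consecutive days with the same color within a region
    by region (date_calendar), sort: gen period = sum(color != color[_n-1])
    * Copy one row per region-period into a separate frame
    frame put region period color, into(periods)
    frame periods {
        duplicates drop
        * Flag periods that are immediately followed by a red ("Rossa") period
        by region (period), sort: gen byte before_red = color[_n+1] == "Rossa":color
    }
    * Link back to the periods frame and bring the flag into the daily data
    frlink m:1 region period, frame(periods)
    frget before_red, from(periods)

    * Mean of new cases within each period that precedes a red zone
    by region period (date_calendar), sort: ///
        egen wanted = mean(nuovi_positivi) if before_red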
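
    For readers on a Stata version without frames, the same idea can probably be sketched with by-group subscripting alone. This is only a sketch, assuming the example data from #1 is in memory; the variable names spell, last_before_red, pre_red and mean_pre_red are made up for illustration and are not part of the original answer.

    ```stata
    * Number each run of consecutive days with the same color within a region
    bysort region (date_calendar): gen long spell = sum(color != color[_n-1])

    * Mark the last non-red day whose next day (within the region) is red
    by region (date_calendar): gen byte last_before_red = ///
        color != "Rossa":color & color[_n+1] == "Rossa":color

    * Spread that marker over the whole spell it belongs to
    bysort region spell: egen byte pre_red = max(last_before_red)

    * Mean of new cases within each spell that precedes a red zone
    bysort region spell: egen double mean_pre_red = mean(nuovi_positivi) if pre_red
    ```

    Under these assumptions, mean_pre_red should match wanted on every observation; something like assert mean_pre_red == wanted if pre_red can be used to check that after running both versions.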



    • #3
      Thank you so much, it worked perfectly.
