Turning half-year and quarterly data into monthly data

River Huang

Join Date: Mar 2016
Posts: 1908

Turning half-year and quarterly data into monthly data

22 Feb 2017, 00:48

Suppose that I have many variables (in the following example, only one variable x1 is present) for a lot of companies (id) across many years (year).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float tm long x1
1101 564        .
1101 565        .
1101 566        .
1101 567        .
1101 568        .
1101 569 11922113
1101 570        .
1101 571        .
1101 572        .
1101 573        .
1101 574        .
1101 575  9817626
1101 576        .
1101 577        .
1101 578 12423117
1101 579        .
1101 580        .
1101 581 11760556
1101 582        .
1101 583        .
1101 584 10615331
1101 585        .
1101 586        .
1101 587 13033535
1102 564        .
1102 565        .
1102 566        .
1102 567        .
1102 568        .
1102 569  6037845
1102 570        .
1102 571        .
1102 572        .
1102 573        .
1102 574        .
1102 575  6207117
1102 576        .
1102 577        .
1102 578 12815413
1102 579        .
1102 580        .
1102 581 20180703
1102 582        .
1102 583        .
1102 584 17554606
1102 585        .
1102 586        .
1102 587 16931368
end
format %tm tm

In the year 2007 (and before), the data on x1 (and many others) are in half-year frequency; while in the year 2008 (and after) the data on x1 (and many others) are in quarterly frequency.
I'd like to construct monthly data so that in the same half-year (2007 and before) or the same quarter, the monthly observations are the same as the end observations of the half-year or quarterly observations. Any suggestions?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Tags: None

River Huang

Join Date: Mar 2016
Posts: 1908

22 Feb 2017, 02:27

The expected results are "y1" as below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float tm long(x1 y1)
1101 564        . 11922113
1101 565        . 11922113
1101 566        . 11922113
1101 567        . 11922113
1101 568        . 11922113
1101 569 11922113 11922113
1101 570        .  9817626
1101 571        .  9817626
1101 572        .  9817626
1101 573        .  9817626
1101 574        .  9817626
1101 575  9817626  9817626
1101 576        . 12423117
1101 577        . 12423117
1101 578 12423117 12423117
1101 579        . 11760556
1101 580        . 11760556
1101 581 11760556 11760556
1101 582        . 10615331
1101 583        . 10615331
1101 584 10615331 10615331
1101 585        . 13033535
1101 586        . 13033535
1101 587 13033535 13033535
1102 564        .  6037845
1102 565        .  6037845
1102 566        .  6037845
1102 567        .  6037845
1102 568        .  6037845
1102 569  6037845  6037845
1102 570        .  6207117
1102 571        .  6207117
1102 572        .  6207117
1102 573        .  6207117
1102 574        .  6207117
1102 575  6207117  6207117
1102 576        . 12815413
1102 577        . 12815413
1102 578 12815413 12815413
1102 579        . 20180703
1102 580        . 20180703
1102 581 20180703 20180703
1102 582        . 17554606
1102 583        . 17554606
1102 584 17554606 17554606
1102 585        . 16931368
1102 586        . 16931368
1102 587 16931368 16931368
end
format %tm tm

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Comment

Charlie Joyez

Join Date: Dec 2014
Posts: 421

22 Feb 2017, 02:46

It seems you want to fill missing monthly values with the next non missing value (half-year before 2007; quarterly after).
I hope you know why you're that, because giving arbitrary values to missing values is generally not a good idea.

Anyway here with your data you can do:

Code:

gen year = yofd(dofm(tm))
gen x2=.
bysort id year (tm) : replace x2=x1[6] if _n<=6  & year<2008
bysort id year (tm) : replace x2=x1[_N] if _n<=12 & _n>6  & year<2008
bysort id year (tm) : replace x2=x1[3] if _n<=3  & year>=2008
bysort id year (tm) : replace x2=x1[6] if _n<=6 & _n>3  & year>=2008
bysort id year (tm) : replace x2=x1[9] if _n<=9 & _n>6  & year>=2008
bysort id year (tm) : replace x2=x1[12] if _n<=12 & _n>9  & year>=2008

format %13.0g x2

But again, I doubt it would be really useful,
Best,
Charlie

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#4

22 Feb 2017, 02:57

Originally posted by Charlie Joyez View Post

It seems you want to fill missing monthly values with the next non missing value (half-year before 2007; quarterly after).
I hope you know why you're that, because giving arbitrary values to missing values is generally not a good idea.

But again, I doubt it would be really useful,
Best,
Charlie

Charlie, Thanks for the suggestion. It seems to work with the example but I need to test your suggested procedure in a much larger dataset. In fact, it is not uncommon to replace (missing, unavailable) monthly observations (in the same quarter or half-year) with observed data of quarterly or half-year frequency.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

22 Feb 2017, 03:39

You can do this easily. Here is another way using mipolate (SSC): For more on mipolate, see its help or http://www.statalist.org/forums/foru...-interpolation or other mentions here.

Code:

clear
input int id float tm long(x1 y1)
1101 564        . 11922113
1101 565        . 11922113
1101 566        . 11922113
1101 567        . 11922113
1101 568        . 11922113
1101 569 11922113 11922113
1101 570        .  9817626
1101 571        .  9817626
1101 572        .  9817626
1101 573        .  9817626
1101 574        .  9817626
1101 575  9817626  9817626
1101 576        . 12423117
1101 577        . 12423117
1101 578 12423117 12423117
1101 579        . 11760556
1101 580        . 11760556
1101 581 11760556 11760556
1101 582        . 10615331
1101 583        . 10615331
1101 584 10615331 10615331
1101 585        . 13033535
1101 586        . 13033535
1101 587 13033535 13033535
1102 564        .  6037845
1102 565        .  6037845
1102 566        .  6037845
1102 567        .  6037845
1102 568        .  6037845
1102 569  6037845  6037845
1102 570        .  6207117
1102 571        .  6207117
1102 572        .  6207117
1102 573        .  6207117
1102 574        .  6207117
1102 575  6207117  6207117
1102 576        . 12815413
1102 577        . 12815413
1102 578 12815413 12815413
1102 579        . 20180703
1102 580        . 20180703
1102 581 20180703 20180703
1102 582        . 17554606
1102 583        . 17554606
1102 584 17554606 17554606
1102 585        . 16931368
1102 586        . 16931368
1102 587 16931368 16931368
end
format %tm tm
 
mipolate x1 tm, by(id) gen(y2) backward

assert y1 == y2

But I agree with Charlie. For this kind of data, the procedure is arbitrary, the increase in sample size is illusory and the correlation (dependence) structure of the result is meaningless.

I don't doubt that this is a common procedure in some fields, presumably because people have data on different bases which they wish to relate. The more easily defensible procedure is to aggregate monthly data to coarser resolution.

Comment

Charlie Joyez

Join Date: Dec 2014

Posts: 421
#6

22 Feb 2017, 03:45

In fact, it is not uncommon to replace (missing, unavailable) monthly observations (in the same quarter or half-year) with observed data of quarterly or half-year frequency.

Ok, so good for you, but I guess it depends what you aim at. Because you are transforming some discrete interval into a (sort of) continuous one, but with large discontinuities, and stagnation parts, which could be confusing, because the "real" trend is probably smoother than the one here constructed.

Anyway, the code should be easily extended to several variables, if the break between 2007 and 2008 periodicity of values is the same. A loop could be useful for the extension e.g:

Code:

foreach v in x1 y1 x2 y2 .... { /* Add all variables you want*/ bysort id year (tm) : replace `v'=`v'[6] if _n<=6 & year<2008 bysort id year (tm) : replace `v'=`v'[_N] if _n<=12 & _n>6 & year<2008 bysort id year (tm) : replace `v'=`v'[3] if _n<=3 & year>=2008 bysort id year (tm) : replace `v'=`v'[6] if _n<=6 & _n>3 & year>=2008 bysort id year (tm) : replace `v'=`v'[9] if _n<=9 & _n>6 & year>=2008 bysort id year (tm) : replace `v'=`v'[12] if _n<=12 & _n>9 & year>=2008 }

The size of the dataset shouldn't change anything as long as the id / tm observation is unique.

Best,
Charlie
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#7

22 Feb 2017, 03:46

Originally posted by Nick Cox View Post

You can do this easily. Here is another way using mipolate (SSC): For more on mipolate, see its help or http://www.statalist.org/forums/foru...-interpolation or other mentions here.

But I agree with Charlie. For this kind of data, the procedure is arbitrary, the increase in sample size is illusory and the correlation (dependence) structure of the result is meaningless.

I don't doubt that this is a common procedure in some fields, presumably because people have data on different bases which they wish to relate. The more easily defensible procedure is to aggregate monthly data to coarser resolution.

Nick, thanks again. I agree with you that aggregation might be a better way.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Announcement

Turning half-year and quarterly data into monthly data

Comment

Comment

Comment

Comment

Comment

Comment