Creating different times series variables from one times series variable

Lisa Wilson

Join Date: Aug 2016

Posts: 158
#1

Creating different times series variables from one times series variable

06 Aug 2018, 20:46

Hi All
I have a string variable representing industry of the firm. There are 5 different industries. My dataset is quarterly cross section time series. I am interested in the aggregate profits of firms in each industry in a given quarter.
I measured aggregate profits in each industry in a given quarter with weights based on lagged market capitalization as following:

egen aggregate= wtmean(profit), by (fqdate industry) weight (lag_market)
duplicates drop fqdate, force

This now gives me a new variable “aggregate” that takes values representing the aggregate profit of each industry in a given quarter. I want now to create 5 time series where each time series represent each industry. In other words, I want to have 5 times series industry variables instead one industry variables.

Thanks for your help in advance
Lisa
Tags: None
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#2

06 Aug 2018, 22:31

Hi again
Thinking about it again, I believe that the duplicate drop fqdate here is not correct because it will keep only an aggregate representing one industry each quarter. I may use this after creating the mulyiple time series....
Any help?
Lisa
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#3

07 Aug 2018, 01:54

In #1 you use but do not explain the community-contributed egen function wtmean() from the package _gwtmean on SSC. (Please note FAQ Advice #12 on explaining references to community-contributed software.)

It would seem that aggregate profit is the sum not the mean (???) suggesting that

Code:

collapse (sum) profit [w=lag_market], by(industry qdate) egen panelid = group(industry), label

is a more direct route for what you want, although collapse does support means directly if that is indeed what you want. Then your data are in standard panel data structure, but if there is a good reason for different variables for different industries, then

Code:

separate profit, by(industry) veryshortlabel

is one possibility.
Comment
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#4

07 Aug 2018, 02:18

Thanks Nick
In fact, my industry aggregate should be value weighted cross sectional average rather than sum.
I changed

HTML Code:

sum

in the first line of your code to

HTML Code:

mean

or

Code:

wtmean

?!

The problem now is that the resulting dataset runs from fqdate say 1970q1 to 2015q4 and shows the industry and the aggregate profit in each quarter whereas all other industry series show missing values, then the time variable starts again from 1970q1 to 2015q4 and show the second industry and the aggregate profit in each quarter whereas all other industry series show missing values and so on...

I want the dataset to show time and series without missing values each time a series show values. Is this clear? Should I send a sample of my dataset to clarify this problem?
Thanks
Lisa

Last edited by Lisa Wilson; 07 Aug 2018, 02:30.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#5

07 Aug 2018, 02:25

The help for collapse answers your question. It supports weighted means by allowing weights. Means are the default in collapse, but you're at liberty to specify

Code:

collapse (mean)

wtmean is just David Kantor's name for his egen function; it's not generic Stata syntax.
Comment
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#6

07 Aug 2018, 02:31

Thanks Nick.
I have actually revised my post after doing that and encountered another problem. Do you have any suggestion to my revised post #4?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#7

07 Aug 2018, 02:46

I don't understand the difficulty without seeing enough original data to reproduce it. Usually missing data after collapse mean missing data in the original.
Comment

Lisa Wilson

Join Date: Aug 2016
Posts: 158

07 Aug 2018, 03:06

I use dataex to upload the data as following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long gvkey int(datadate fqdate) float(lag_market industry profit)
 8252 7760 84 122.05431 1 3.9592724
 5968 7760 84 587.43774 1  3.910057
 7754 7760 84   72.1735 1 1.8557103
 5073 7760 84  9954.955 1  3.363194
 7675 7760 84  859.8544 1 1.5741892
 2479 7760 84         . 1 2.9935496
10540 7760 84   393.312 1  .8415915
 5476 7760 84 164.28525 1   .575923
 7373 7760 84  281.1525 1 2.0901177
 6639 7760 84         . 1  2.313471
 1663 7760 84 1234.9847 1 1.5138514
11284 7760 84 2126.9133 1  .8250414
 1246 7760 84         . 1  2.528295
 8935 7760 84  692.3658 1  3.013576
 1919 7760 84   785.757 1  1.110183
 9985 7760 84         . 1   1.53541
 4062 7760 84   950.136 1  .4701737
 2090 7760 84    14.172 1    4.4822
 1108 7760 84   6.56425 1  5.216558
 4769 7760 84  164.9545 1  1.481566
 9388 7760 84  62.60475 1  1.484355
 6845 7760 84 10.660687 1 4.4851575
 2290 7760 84  908.6122 1  2.582276
 9504 7760 84         . 1  .9077477
 3746 7760 84 2154.0784 1 2.3640664
11017 7760 84  270.8359 1 1.6063673
 1120 7760 84  12.94875 1  4.463491
 6573 7760 84  60.89762 1  2.561683
 4535 7760 84         . 1  6.250216
 8253 7760 84   585.395 1 2.0204425
 8730 7760 84   194.523 1 1.4483584
 9284 7760 84 26.022125 1  5.037015
 8005 7760 84  752.8281 1  2.723059
 6475 7760 84  888.2648 1 1.3669474
 2919 7760 84    33.867 1  2.303609
 7154 7760 84 1950.9956 1 1.0388031
 5055 7760 84 1504.0603 1 2.1720443
10443 7760 84  6062.633 1 1.9180068
 5764 7760 84   62.1335 1   7.55559
10952 7760 84  193.0906 1  4.959033
13990 7760 84  61.26563 1  2.610246
 7230 7760 84 33.838375 1  1.395119
11266 7760 84         . 1 2.0601277
 3167 7760 84    103.76 1 2.2195196
 3239 7760 84       270 1  .3576266
 1344 7760 84         . 1 2.2038078
 6326 7760 84  60.42738 1   2.05855
 5256 7760 84  487.7955 1 1.1592128
11453 7760 84 113.92545 1 1.6651073
 5453 7760 84         . 1 2.9128954
 2157 7760 84  1353.401 1  2.691121
10131 7760 84   160.208 1 2.1311116
 5731 7760 84   21.4285 1  1.723308
 1498 7760 84  52.69012 1 4.6999025
 9463 7760 84 239.76974 1  2.163188
 8510 7760 84  1336.715 1  .7057932
10884 7760 84 184.17175 1  8.812179
 1935 7760 84     11.75 1  3.270246
 3657 7760 84    17.829 1 3.2670064
 3125 7760 84   84.2745 1  4.236088
 7241 7760 84  768.1357 1 1.1646166
 8543 7760 84  1253.523 1  4.291421
 2043 7760 84         . 1   3.02263
 9113 7760 84    3661.1 1   2.40946
 7325 7760 84  38.95975 1 2.6517644
 3642 7760 84 24.578314 1  6.568151
 3612 7760 84   128.439 1  .8942488
 1278 7760 84  304.4275 1  2.745142
 3869 7760 84   165.813 1 1.5869882
 1643 7760 84 262.41187 1 4.0497856
10915 7760 84         . 1  4.382927
 5726 7760 84     81.34 1  1.779362
 5268 7760 84  50.91275 1 4.0629444
10160 7760 84  249.2065 1 3.0018775
 9416 7760 84   47.7445 1  2.409673
 8875 7760 84  83.86088 1  4.166289
 5722 7760 84   98.0515 1  1.761272
 5772 7760 84    76.969 1  1.777035
 2033 7760 84 35.855637 1  2.632339
 7139 7760 84  346.3628 1 1.0844092
 1408 7760 84  983.5838 1   4.69546
 7891 7760 84  39.24375 1  2.792379
 5185 7760 84    44.084 1  2.815101
21504 7760 84   218.994 1 3.5999286
 2489 7760 84   505.494 1  4.109499
 7938 7760 84   40.3144 1  4.629065
 4945 7760 84 218.21213 1   3.34945
 4608 7760 84   110.448 1  .8892611
 3733 7760 84  83.41125 1 4.1231203
 9293 7760 84   133.055 1 1.4808687
 1599 7760 84  365.7795 1  2.812956
 6753 7760 84   50.8743 1 1.7619166
 6670 7760 84 143.19463 1 1.4494872
 6179 7760 84  22.65375 1 4.0960655
 6288 7760 84 217.22313 1  1.081877
 5669 7760 84  44.80766 1  5.685085
 9656 7760 84         . 1  5.221913
 8852 7760 84  572.1066 1 2.3024843
 1863 7760 84     8.577 1 2.3949826
 3863 7760 84   470.296 1  .8693243
end
format %d datadate
format %tq fqdate
label values industry industry
label def industry 1 "Consumer Durables, NonDurables, Wholesale, Retail, and Some Services (Laundries, Repair Shops)", modify

When I use now:

HTML Code:

collapse (mean) profit[w=lag_market], by (industry fqdate)

egen panelid=group(industry), label

separate profit, by (industry) veryshortlabel

I encounter the problem I was trying to describe (which is having missing values for other series each time a series has values)

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35651
#9

07 Aug 2018, 03:26

Yes; if an observation is in one industry, necessarily it cannot be in any other. You've not said why you want the separate variables, but it may be that for some analyses you want a reshape wide. For most Stata purposes, your data are better off in long layout.
Comment
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#10

07 Aug 2018, 03:50

I want to separate the variables because I need to measure the value weighted profit in each industry separately. I will then combine the resulting time series with another time series data that includes aggregaate investment in each quarter. I want to run a time series regression of aggregate investment on aggregate profit in each industry in order to understand which industry in contributing more to aggregate investment.

All what I need is a series of aggregate profit for each industry (as if each series representing a different variable).

If reshape is the way to go, how can this be implemented?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#11

07 Aug 2018, 04:11

I need to measure the value weighted profit in each industry separately.

Your present data layout (structure, format) qualifies.

I will then combine the resulting time series with another time series dataset that includes aggregate investment in each quarter.

We need to know the layout of that other dataset to advise how to do it.

I want to run a time series regression of aggregate investment on aggregate profit in each industry

We need to know what that means precisely but you can run regressions separately for each industry with your present data layout using if to select.

Notes: the word format is so overloaded that I try to avoid it here.

If reshape is the way to go, how can this be implemented?

Well documented. Start with the help.
Comment
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#12

07 Aug 2018, 06:21

The data set I will merge with is simply a time series of aggregate investment and the same time variable fqdate.The merge will be 1:1 based on fqdate.

I am struggling with how to add reshape to Nick's code. Shall I use reshape instead of Nick's code or I reshape the variables after collapse?

To clarify, I need a data set similar to this (where the values in ind1 to ind4 represent the value weighted average profit in each industry)
fqdate ind1 ind2 ind3 ind4

1980q1 0.2 0.3 2.1 2.1

1980q2 0.4 0.1 0.3 1.1

1980q3 0.1 0.8 2.3 0.4

Therefore after merging this dataset with aggregate investment, I will run a regression of investment on each of these series (i.e. each series is a separate variable). For example reg inv ind1 ind2 ind3 ind4, r
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#13

07 Aug 2018, 06:49

This really is the easiest kind of reshape wide. If you're totally dependent on people telling you the simplest things, you won't progress with Stata at all. What's the struggle? What code did you try? What happened?
Comment
Lisa Wilson

Join Date: Aug 2016

Posts: 158
#14

08 Aug 2018, 02:17

I tried the following now:

HTML Code:

collapse (mean) profit[w=lag_market], by (industry fqdate) egen panelid=group(industry), label reshape wide Tbtm, i(panelid) j(fqdate)

However, it did not produce the dataset I am looking for.

Do I implement reshape incorrectly?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35651
#15

08 Aug 2018, 03:19

I took your data example from #8 and tried your code from #14, but there is no variable Tbtm. Sorry to seem awkward but as yet I can't reproduce your results and "did not produce the dataset I am looking for" doesn't explain why the results you do get but we can't see are not as desired.

Old story this, but understanding how your code produces your results with real(istic) data is still elusive.
Comment

fqdate	ind1	ind2	ind3	ind4
1980q1	0.2	0.3	2.1	2.1
1980q2	0.4	0.1	0.3	1.1
1980q3	0.1	0.8	2.3	0.4

Announcement

Creating different times series variables from one times series variable

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment