Combining time variables in panel data

Hidde Steenbeek

Join Date: Jan 2019

Posts: 5
#1

Combining time variables in panel data

02 Jan 2019, 05:45

Hi,

This is my first time here on statalist, I am trying to do a portfolio analysis in Stata and ran into some problems. I am hoping one of you could help me.

1) I have a dataset with data for about 1500 companies and monthly return data, however this monthly return data is all in different variables. So a variable for January 2000, one for February 2000 etc until June 2018. However i want to forecast this data and therefore set it as panel data, however for this to work i would need to have one variable that has all returns as observations to use xtset. So my question is how can i combine the different variables into one such that i have for all companies 222 (18,5 years * 12 months) observations of return?

2) In the same data set i have other scores (environmental, social and governmental) for the companies, however these are yearly. I would also like to forecast these. I think that I would need to split these before declaring them as panel data because of the difference in time periods (monthly vs yearly), is that correct? Or is there someway to work around this?

Thanks in advance!

Kind regards,

Last edited by Hidde Steenbeek; 02 Jan 2019, 05:48.
Tags: data, forecast, panel, xtset
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

02 Jan 2019, 09:56

1) You will need to use the -reshape- command to do this. Run -help reshape-, read that, and then scroll back to the top and click on the link to the PDF documentation (in blue). Then read that chapter of the documentation. The -reshape- command takes some getting used to, and as you are not, I assume, familiar with it, you may find it difficult to figure out how to apply it to your data. If that proves to be the case, you can post back for specific help with that--but useful help would only be possible if you show an example of your data. If and when you do that, be sure to use the -dataex- command to show the example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

I have no comment on 2) as I do not understand how this data relates to your other data, nor how you want to use it.
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#3

02 Jan 2019, 11:49

Hidde:
welcome to this forum.
As an aside to Clyde's helpful advice, and despite the lack of any data excerpt/example of what you're experiencing, in order to analyse the data you describe at 1) and 2) you should consider using -year- as time variable.
As far as data described at 1) are concerned, you can extract the -year- component from them via the following code (let's assume that your date are in -string- format):

Code:

. set obs 1 . gen mystring="02jan2019" . gen eventdate = date(mystring, "DMY") . format eventdate %td . gen eventyear = yofd(eventdate) . format eventyear %ty

The, if you have repeated panel values within the same year, provided that you're not planning to use time-series related commands (eg, lags and leads), you can simply -xtset- your data with the -panelid- only and eventually run you regression.

Kind regards,
Carlo
(Stata 19.0)
Comment

David Benson

Join Date: Oct 2018
Posts: 489

02 Jan 2019, 12:48

Hidde Steenbeek , welcome to Statalist!

So, this will be easier (and we can provide more help) if you could share a sample of your data using Stata's dataex command. If you need help using Stata's dataex command there is a Youtube video on it here

But without your data, I took a stab at creating some toy data (I used Excel's randbetween() to create random integers between 10-500):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str9 firm_name int(jan2000 feb2000 mar2000 jan2001 feb2001 mar2001 jan2002 feb2002 mar2002)
1 "Google"    150 389  19 213 451 110 392 399 471
2 "Apple"      34  95 313 325 143  91  32 440  27
3 "Microsoft" 206 487 255 315 359 233 290 175  50
end

. list, noobs

  +----------------------------------------------------------------------------------------------------------+
  | id   firm_name   jan2000   feb2000   mar2000   jan2001   feb2001   mar2001   jan2002   feb2002   mar2002 |
  |----------------------------------------------------------------------------------------------------------|
  |  1      Google       150       389        19       213       451       110       392       399       471 |
  |  2       Apple        34        95       313       325       143        91        32       440        27 |
  |  3   Microsoft       206       487       255       315       359       233       290       175        50 |
  +----------------------------------------------------------------------------------------------------------+

Code:

. reshape long jan feb mar, i(id) j(year)
* jan, feb, and mar are "stubs" (i.e. jan2000 jan2001 feb2000 feb2001, etc)
* Think of 2000, 2001, and 2002 as "tails". j(year) is how you tell Stata what the tail represents (in this case years)
/*
(note: j = 2000 2001 2002)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                  11   ->       6
j variable (3 values)                     ->   year
xij variables:
                jan2000 jan2001 jan2002   ->   jan
                feb2000 feb2001 feb2002   ->   feb
                mar2000 mar2001 mar2002   ->   mar
-----------------------------------------------------------------------------
*/
. list, sepby(id) noobs

  +-----------------------------------------+
  | id   year   firm_name   jan   feb   mar |
  |-----------------------------------------|
  |  1   2000      Google   150   389    19 |
  |  1   2001      Google   213   451   110 |
  |  1   2002      Google   392   399   471 |
  |-----------------------------------------|
  |  2   2000       Apple    34    95   313 |
  |  2   2001       Apple   325   143    91 |
  |  2   2002       Apple    32   440    27 |
  |-----------------------------------------|
  |  3   2000   Microsoft   206   487   255 |
  |  3   2001   Microsoft   315   359   233 |
  |  3   2002   Microsoft   290   175    50 |
  +-----------------------------------------+

The above gets you halfway there (months are still separate variables).

Code:

* Adding m_ to months (so they all have the same prefix or "stub")
rename jan m_jan
rename feb m_feb
rename mar m_mar

reshape long m_, i(id year) j(month) string  // have to indicate "string" because new var "month" will be a string containing "jan", "feb", & "mar"
order firm_name, after(id)  // so id is first, firm_name 2nd in Stata

list, sepby(id) noobs

  +-------------------------------------+
  | id   firm_name   year   month    m_ |
  |-------------------------------------|
  |  1      Google   2000     feb   389 |
  |  1      Google   2000     jan   150 |
  |  1      Google   2000     mar    19 |
  |  1      Google   2001     feb   451 |
  |  1      Google   2001     jan   213 |
  |  1      Google   2001     mar   110 |
  |  1      Google   2002     feb   399 |
  |  1      Google   2002     jan   392 |
  |  1      Google   2002     mar   471 |
  |-------------------------------------|
  |  2       Apple   2000     feb    95 |
  |  2       Apple   2000     jan    34 |
  |  2       Apple   2000     mar   313 |
  |  2       Apple   2001     feb   143 |
  |  2       Apple   2001     jan   325 |
  |  2       Apple   2001     mar    91 |
  |  2       Apple   2002     feb   440 |
  |  2       Apple   2002     jan    32 |
  |  2       Apple   2002     mar    27 |
  |-------------------------------------|
  |  3   Microsoft   2000     feb   487 |
  |  3   Microsoft   2000     jan   206 |
  |  3   Microsoft   2000     mar   255 |
  |  3   Microsoft   2001     feb   359 |
  |  3   Microsoft   2001     jan   315 |
  |  3   Microsoft   2001     mar   233 |
  |  3   Microsoft   2002     feb   175 |
  |  3   Microsoft   2002     jan   290 |
  |  3   Microsoft   2002     mar    50 |
  +-------------------------------------+

Note that "feb" is listed before "jan" because month is sorted alphabetically. To combine month and year into a single variable and format it as a recognized Stata date:

Code:

egen year_month_str = concat(month year)  // the _str is my way of indicating this will be a string variable
gen yr_month = daily( year_month_str, "MY")
format yr_month %td
format yr_month %tdmCY // formatting as Mar2001
drop year_month_str  // no longer needed
sort id yr_month

* NOTE: you will still need to rename m_ to whatever it really represents
. list, sepby(id) noobs

  +------------------------------------------------+
  | id   firm_name   year   month    m_   yr_month |
  |------------------------------------------------|
  |  1      Google   2000     jan   150    Jan2000 |
  |  1      Google   2000     feb   389    Feb2000 |
  |  1      Google   2000     mar    19    Mar2000 |
  |  1      Google   2001     jan   213    Jan2001 |
  |  1      Google   2001     feb   451    Feb2001 |
  |  1      Google   2001     mar   110    Mar2001 |
  |  1      Google   2002     jan   392    Jan2002 |
  |  1      Google   2002     feb   399    Feb2002 |
  |  1      Google   2002     mar   471    Mar2002 |
  |------------------------------------------------|
  |  2       Apple   2000     jan    34    Jan2000 |
  |  2       Apple   2000     feb    95    Feb2000 |
  |  2       Apple   2000     mar   313    Mar2000 |
  |  2       Apple   2001     jan   325    Jan2001 |
  |  2       Apple   2001     feb   143    Feb2001 |
  |  2       Apple   2001     mar    91    Mar2001 |
  |  2       Apple   2002     jan    32    Jan2002 |
  |  2       Apple   2002     feb   440    Feb2002 |
  |  2       Apple   2002     mar    27    Mar2002 |
  |------------------------------------------------|
  |  3   Microsoft   2000     jan   206    Jan2000 |
  |  3   Microsoft   2000     feb   487    Feb2000 |
  |  3   Microsoft   2000     mar   255    Mar2000 |
  |  3   Microsoft   2001     jan   315    Jan2001 |
  |  3   Microsoft   2001     feb   359    Feb2001 |
  |  3   Microsoft   2001     mar   233    Mar2001 |
  |  3   Microsoft   2002     jan   290    Jan2002 |
  |  3   Microsoft   2002     feb   175    Feb2002 |
  |  3   Microsoft   2002     mar    50    Mar2002 |
  +------------------------------------------------+

Last edited by David Benson; 02 Jan 2019, 12:57.

Comment

Hidde Steenbeek

Join Date: Jan 2019

Posts: 5
#5

04 Jan 2019, 04:24

Thank you all for the responses! I used David's solution and that worked just fine for my first problem!
This is a small sample of my data created via dataex:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str12 ISIN str34 Name double(RIMonthly200001 RIMonthly200002 RIMonthly200101 RIMonthly200102) long(ME2000 ME2001) double(ESGScore2000 ESGScore2001) "ARP6806N1051" "HOLCIM ARGENTINA" 74.77 74.4 67.15 54.8 273812 129524 17.68 19.448 "ARP6823S1295" "MIRGOR 'C'" 12.22 11.03 10.58 10.33 9002 8213 14.79 16.269000000000002 "ARP9028N1016" "TELECOM ARGENTINA" 233.62 268.37 149.5 108.43 2993070 1784596 34.97 38.467 "ARP9308R1039" "TRANSPORTADORA DE GAS DEL SUR" 181.16 217.52 192.89 176.89 1112499 1034506 60.85 66.935 "ARP9897X1319" "YPF" 210.67 212.85 183.91 176.14 10380114 9297116 57 62.7 "ARSIDE010029" "TERNIUM ARGENTINA SOCIEDAD ANONIMA" 177.53 184.55 142.18 109.63 955715 375884 57.94 63.734 "ARSMIG010018" "SA SAN MIGUEL" 31.13 33.64 32.4 31.34 41945 26807 41.84 46.02400000000001 end

Note: In this data the RIMonthly200001 is the monthly return of January 2000 and 200002 would be February.
Note: The ME2000 is the market value for the year 2000 of the company and ESGScore is a certain score for the year 2000 of the company.

I would like to make a monthly forecast for the returns as well as the market value and ESG scores, however to forecast I assumed that I would first need to set it as panel data. Therefore I made the assumption that the market value and the ESG score would not change between months in a certain year. So i would like to create something that I would have the years and months specified in the way David did it in the previous post and that the market value (ME) and ESG score are for the years different but for the months the same. Is there any way to get to this? Or is there a better way to handle these kinds of problems?

Thanks in advance!
Comment

David Benson

Join Date: Oct 2018
Posts: 489

04 Jan 2019, 18:41

You could do it one of two ways:
1) Save the annual data into a new dataset and then merge that into the monthly data, matching on ISIN year.

Code:

* Using your data from #5
reshape long ME ESGScore, i( ISIN) j(year)
format ESGScore %9.2fc
format ME %12.0gc
list ISIN Name year ME ESGScore, sepby(ISIN) noobs

  +----------------------------------------------------------------------------------+
  |         ISIN                                 Name   year           ME   ESGScore |
  |----------------------------------------------------------------------------------|
  | ARP6806N1051                     HOLCIM ARGENTINA   2000      273,812      17.68 |
  | ARP6806N1051                     HOLCIM ARGENTINA   2001      129,524      19.45 |
  |----------------------------------------------------------------------------------|
  | ARP6823S1295                           MIRGOR 'C'   2000        9,002      14.79 |
  | ARP6823S1295                           MIRGOR 'C'   2001        8,213      16.27 |
  |----------------------------------------------------------------------------------|
  | ARP9028N1016                    TELECOM ARGENTINA   2000    2,993,070      34.97 |
  | ARP9028N1016                    TELECOM ARGENTINA   2001    1,784,596      38.47 |
  |----------------------------------------------------------------------------------|
  | ARP9308R1039        TRANSPORTADORA DE GAS DEL SUR   2000    1,112,499      60.85 |
  | ARP9308R1039        TRANSPORTADORA DE GAS DEL SUR   2001    1,034,506      66.94 |
  |----------------------------------------------------------------------------------|
  | ARP9897X1319                                  YPF   2000   10,380,114      57.00 |
  | ARP9897X1319                                  YPF   2001    9,297,116      62.70 |
  |----------------------------------------------------------------------------------|

* Drop any monthly vars, save this file as annual_data.dta
use monthly_data.dta, clear
merge ISIN year m:1 using annual_data.dta, nonotes keepusing(vars to bring over) keep(match master) gen(merge_annual)

2) You could actually do it at the same time as you reshape the monthly data. The annual data will remain constant over the year, just as the firm name and firm_id remain constant over all of the firm data in the resulting long format.

Code:

*  Using the data I used earlier, but adding ME and employees (as annual measures)
clear
input byte id str9 firm_name int(jan2000 feb2000 mar2000 jan2001 feb2001 mar2001 jan2002 feb2002 mar2002) float(ME2000 ME2001 emp2000 emp2001 ME2002 emp2002)
1 "Google"    150 389  19 213 451 110 392 399 471 10000 15000 500 750 14452 725
2 "Apple"      34  95 313 325 143  91  32 440  27  8762 12214 450 600 13500 589
3 "Microsoft" 206 487 255 315 359 233 290 175  50  5623  7521 325 422 12750 475
end

* 1st Reshaping to long (to get year) 
format ME* %12.0gc
reshape long jan feb mar ME emp, i(id) j(year)
list, sepby(id) noobs

  +--------------------------------------------------------+
  | id   year   firm_name   jan   feb   mar       ME   emp |
  |--------------------------------------------------------|
  |  1   2000      Google   150   389    19   10,000   500 |
  |  1   2001      Google   213   451   110   15,000   750 |
  |  1   2002      Google   392   399   471   14,452   725 |
  |--------------------------------------------------------|
  |  2   2000       Apple    34    95   313    8,762   450 |
  |  2   2001       Apple   325   143    91   12,214   600 |
  |  2   2002       Apple    32   440    27   13,500   589 |
  |--------------------------------------------------------|
  |  3   2000   Microsoft   206   487   255    5,623   325 |
  |  3   2001   Microsoft   315   359   233    7,521   422 |
  |  3   2002   Microsoft   290   175    50   12,750   475 |
  +--------------------------------------------------------+

* 2nd Reshaping to long (to get month) 
rename jan m_jan
rename feb m_feb
rename mar m_mar
reshape long m_, i(id year) j(month) string  
order firm_name, after(id) 

. list if year<=2001, sepby(id year) noobs

  +----------------------------------------------------+
  | id   firm_name   year   month    m_       ME   emp |
  |----------------------------------------------------|
  |  1      Google   2000     feb   389   10,000   500 |
  |  1      Google   2000     jan   150   10,000   500 |
  |  1      Google   2000     mar    19   10,000   500 |
  |----------------------------------------------------|
  |  1      Google   2001     feb   451   15,000   750 |
  |  1      Google   2001     jan   213   15,000   750 |
  |  1      Google   2001     mar   110   15,000   750 |
  |----------------------------------------------------|
  |  2       Apple   2000     feb    95    8,762   450 |
  |  2       Apple   2000     jan    34    8,762   450 |
  |  2       Apple   2000     mar   313    8,762   450 |
  |----------------------------------------------------|
  |  2       Apple   2001     feb   143   12,214   600 |
  |  2       Apple   2001     jan   325   12,214   600 |
  |  2       Apple   2001     mar    91   12,214   600 |
  |----------------------------------------------------|
  |  3   Microsoft   2000     feb   487    5,623   325 |
  |  3   Microsoft   2000     jan   206    5,623   325 |
  |  3   Microsoft   2000     mar   255    5,623   325 |
  |----------------------------------------------------|
  |  3   Microsoft   2001     feb   359    7,521   422 |
  |  3   Microsoft   2001     jan   315    7,521   422 |
  |  3   Microsoft   2001     mar   233    7,521   422 |
  +----------------------------------------------------+

Comment

Hidde Steenbeek

Join Date: Jan 2019

Posts: 5
#7

05 Jan 2019, 06:39

Thanks for the response, David! Your first method worked great and we got it to merge and have it sorted the way we liked.
At the moment we are trying to forecast the data. Thanks!
1 like
Comment

Announcement