
  • Matching data with different time intervals

    Dear Statalist,

    For my research, I need to link accounting data with stock price data (both from WRDS) for a period of more than 50 years. I have used the merged (CCM) Compustat database for this and have retrieved the variables I need for my accounting data and the variables for my stock price data.

    However, the accounting data is quarterly data while the stock price data is monthly data. I would like to link these 2 data sets given that they have several unique identifiers (GVKEY, LPERMNO, LPERMCO).
    Note: GVKEY is basically an identifier for a firm (so Apple would have GVKEY 1, Microsoft GVKEY 2, Oracle GVKEY 3, and so on); the same goes for LPERMCO.

    Now I have used the -merge- commands (1:1, 1:m, m:1) for this, but it keeps giving me the error: variables gvkey/lpermco do not uniquely identify the observations.

    So if I use:
    merge 1:1 (or m:1 or 1:m) gvkey using CRSP.dta
    It will give me the above error.

    What I would like to do is merge these 2 datasets and afterwards have one dataset with monthly data only. I would assign the quarterly data to the appropriate quarters in which they belong in their respective fiscal year. So in the case of variable "X" I would end up with:
    GVKEY Year Month X
    1 1990 4 67
    1 1990 5 67
    1 1990 6 67
    2 1990 5 30
    Here is an example from my quarterly (accounting) dataset, here "atq" = Total Assets:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double lpermco long datadate str58 conm str10 cusip double atq
    "001118" 25563 8125 "ADAMS DRUG INC"              "006203103"  85.812
    "001118" 25563 8216 "ADAMS DRUG INC"              "006203103"  87.108
    "001118" 25563 8308 "ADAMS DRUG INC"              "006203103"  91.059
    "001118" 25563 8400 "ADAMS DRUG INC"              "006203103" 100.637
    "001118" 25563 8490 "ADAMS DRUG INC"              "006203103"  99.764
    "001118" 25563 8581 "ADAMS DRUG INC"              "006203103"  98.414
    "001118" 25563 8673 "ADAMS DRUG INC"              "006203103" 103.948
    "001118" 25563 8765 "ADAMS DRUG INC"              "006203103" 104.944
    "001118" 25563 8856 "ADAMS DRUG INC"              "006203103" 102.939
    "001118" 25563 8947 "ADAMS DRUG INC"              "006203103" 106.404
    "001119" 20023 8125 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8216 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8308 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8400 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8490 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8581 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8673 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8765 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8856 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 8947 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    "001119" 20023 9039 "ADAMS DIVERSIFIED EQUITY FD" "006212104"       .
    end
    format %d datadate


    And here is an example from my monthly (stock price) dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double lpermco long datadate str58 conm str10 cusip double prccm
    "001118" 25563 8796 "ADAMS DRUG INC"              "006203103" 13.375
    "001118" 25563 8825 "ADAMS DRUG INC"              "006203103" 14.625
    "001118" 25563 8856 "ADAMS DRUG INC"              "006203103"  14.25
    "001118" 25563 8886 "ADAMS DRUG INC"              "006203103"  14.25
    "001118" 25563 8917 "ADAMS DRUG INC"              "006203103"  12.25
    "001118" 25563 8947 "ADAMS DRUG INC"              "006203103"     13
    "001118" 25563 8978 "ADAMS DRUG INC"              "006203103"   14.5
    "001118" 25563 9009 "ADAMS DRUG INC"              "006203103"  16.25
    "001118" 25563 9039 "ADAMS DRUG INC"              "006203103"  18.25
    "001118" 25563 9070 "ADAMS DRUG INC"              "006203103"  24.25
    "001118" 25563 9100 "ADAMS DRUG INC"              "006203103" 24.125
    "001119" 20023 8066 "ADAMS DIVERSIFIED EQUITY FD" "006212104"  13.25
    "001119" 20023 8094 "ADAMS DIVERSIFIED EQUITY FD" "006212104"   13.5
    "001119" 20023 8125 "ADAMS DIVERSIFIED EQUITY FD" "006212104" 13.875
    "001119" 20023 8155 "ADAMS DIVERSIFIED EQUITY FD" "006212104"   14.5
    "001119" 20023 8186 "ADAMS DIVERSIFIED EQUITY FD" "006212104" 13.625
    "001119" 20023 8216 "ADAMS DIVERSIFIED EQUITY FD" "006212104" 12.625
    "001119" 20023 8247 "ADAMS DIVERSIFIED EQUITY FD" "006212104" 12.875
    "001119" 20023 8278 "ADAMS DIVERSIFIED EQUITY FD" "006212104"  14.25
    "001119" 20023 8308 "ADAMS DIVERSIFIED EQUITY FD" "006212104"   14.5
    "001119" 20023 8339 "ADAMS DIVERSIFIED EQUITY FD" "006212104" 15.375
    end
    format %d datadate

  • #2
    Well, just visually scanning your data it is very obvious that gvkey does not uniquely identify the observations in either data set. And if you think about your code, merging on just gvkey, were that possible, would fail to restrict matches to those where the dates belong together. So your problem is that your merge key needs to be more than just gvkey, it must include some kind of date variable.

    Now, you cannot just use datadate, because it is just a straight daily date in the stock data, but is the last date of the quarter in the accounting data. So you need to create a new variable in both data sets that is a proper Stata internal format quarterly date. Then you can merge on the combination of gvkey and that variable.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double(lpermno lpermco) long datadate double(fqtr fyearq) str58 conm str10 cusip double atq
    "012994" 10001 7953 15065 3 2001 "GAS NATURAL INC"             "367204104"   68.008
    "012994" 10001 7953 14791 4 2000 "GAS NATURAL INC"             "367204104"   50.553
    "012994" 10001 7953 17074 1 2007 "GAS NATURAL INC"             "367204104"   59.815
    "012994" 10001 7953 11322 2 1991 "GAS NATURAL INC"             "367204104"    21.71
    "012994" 10001 7953 15521 4 2002 "GAS NATURAL INC"             "367204104"   56.855
    "012994" 10001 7953 18443 2 2010 "GAS NATURAL INC"             "367204104"  117.984
    "012994" 10001 7953 16070 2 2004 "GAS NATURAL INC"             "367204104"   68.046
    "012994" 10001 7953 17347 4 2007 "GAS NATURAL INC"             "367204104"    50.52
    "012994" 10001 7953 21000 2 2017 "GAS NATURAL INC"             "367204104"   183.84
    "012994" 10001 7953 10500 1 1989 "GAS NATURAL INC"             "367204104"   12.126
    "012994" 10001 7953 10865 1 1990 "GAS NATURAL INC"             "367204104"   18.088
    "012994" 10001 7953 15430 3 2002 "GAS NATURAL INC"             "367204104"   60.603
    "012994" 10001 7953 19813 1 2014 "GAS NATURAL INC"             "367204104"  212.147
    "012994" 10001 7953 13787 1 1998 "GAS NATURAL INC"             "367204104"   45.408
    "019049" 10002 7954 13879 4 1997 "BANCTRUST FINANCIAL GRP INC" "05978R107"  369.595
    "019049" 10002 7954 11778 1 1992 "BANCTRUST FINANCIAL GRP INC" "05978R107"        .
    "019049" 10002 7954 12234 2 1993 "BANCTRUST FINANCIAL GRP INC" "05978R107"        .
    "019049" 10002 7954 15978 3 2003 "BANCTRUST FINANCIAL GRP INC" "05978R107"  681.627
    "019049" 10002 7954 17439 3 2007 "BANCTRUST FINANCIAL GRP INC" "05978R107" 1331.299
    "019049" 10002 7954 16801 4 2005 "BANCTRUST FINANCIAL GRP INC" "05978R107" 1305.497
    "019049" 10002 7954 13330 2 1996 "BANCTRUST FINANCIAL GRP INC" "05978R107"  241.518
    "019049" 10002 7954 13239 1 1996 "BANCTRUST FINANCIAL GRP INC" "05978R107"  240.782
    "019049" 10002 7954 14334 1 1999 "BANCTRUST FINANCIAL GRP INC" "05978R107"  503.982
    "019049" 10002 7954 10592 4 1988 "BANCTRUST FINANCIAL GRP INC" "05978R107"        .
    "019049" 10002 7954 16617 2 2005 "BANCTRUST FINANCIAL GRP INC" "05978R107" 1320.542
    end
    format %d datadate
    gen qdate = qofd(datadate)
    format qdate %tq
    
    tempfile accounting
    save `accounting'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double(lpermno lpermco) long datadate str58 conm str10 cusip double(prccm cshoq)
    "012994" 10001 7953 20727 "GAS NATURAL INC"             "367204104"   7.67 10.516
    "012994" 10001 7953 20758 "GAS NATURAL INC"             "367204104"  12.35      .
    "012994" 10001 7953 20788 "GAS NATURAL INC"             "367204104"   12.5      .
    "012994" 10001 7953 20819 "GAS NATURAL INC"             "367204104"  12.55  10.52
    "012994" 10001 7953 20850 "GAS NATURAL INC"             "367204104"  12.65      .
    "012994" 10001 7953 20878 "GAS NATURAL INC"             "367204104"  12.65      .
    "012994" 10001 7953 20909 "GAS NATURAL INC"             "367204104"   12.7  10.52
    "012994" 10001 7953 20939 "GAS NATURAL INC"             "367204104"   12.5      .
    "012994" 10001 7953 20970 "GAS NATURAL INC"             "367204104"   12.7      .
    "012994" 10001 7953 21000 "GAS NATURAL INC"             "367204104" 12.925  10.52
    "012994" 10001 7953 21031 "GAS NATURAL INC"             "367204104"  12.95      .
    "012994" 10001 7953 21062 "GAS NATURAL INC"             "367204104"   13.1      .
    "019049" 10002 7954  9527 "BANCTRUST FINANCIAL GRP INC" "05978R107"   11.5      .
    "019049" 10002 7954  9555 "BANCTRUST FINANCIAL GRP INC" "05978R107" 13.125      .
    "019049" 10002 7954  9586 "BANCTRUST FINANCIAL GRP INC" "05978R107"     14      .
    "019049" 10002 7954  9616 "BANCTRUST FINANCIAL GRP INC" "05978R107"  14.75      .
    "019049" 10002 7954  9647 "BANCTRUST FINANCIAL GRP INC" "05978R107"   14.5      .
    "019049" 10002 7954  9677 "BANCTRUST FINANCIAL GRP INC" "05978R107" 12.625    1.4
    "019049" 10002 7954  9708 "BANCTRUST FINANCIAL GRP INC" "05978R107" 13.125      .
    "019049" 10002 7954  9739 "BANCTRUST FINANCIAL GRP INC" "05978R107" 13.125      .
    "019049" 10002 7954  9769 "BANCTRUST FINANCIAL GRP INC" "05978R107"     14    1.4
    "019049" 10002 7954  9800 "BANCTRUST FINANCIAL GRP INC" "05978R107"  13.25      .
    "019049" 10002 7954  9830 "BANCTRUST FINANCIAL GRP INC" "05978R107"  13.25      .
    "019049" 10002 7954  9861 "BANCTRUST FINANCIAL GRP INC" "05978R107"  12.25    1.4
    "019049" 10002 7954  9892 "BANCTRUST FINANCIAL GRP INC" "05978R107" 13.375      .
    "019049" 10002 7954  9920 "BANCTRUST FINANCIAL GRP INC" "05978R107"  13.25      .
    end
    format %d datadate
    
    gen qdate = qofd(datadate)
    format qdate %tq
    
    merge m:1 gvkey qdate using `accounting'



    • #3
      Thank you very much for your explanation Mr. Schechter, that is exactly what I have been trying but failing to do.
      I had generated a month and year variable using: gen month = month(datadate) and gen year = year(datadate).

      After this, I tried to match the gvkey with each month in each year by:
      gen key = gvkey*month*year
      (I'm rather unsure of how to combine those 3 in Stata however, not surprisingly I got the following error)

      insufficient memory
      Stata could not obtain sufficient memory either because set max_memory is set too low or because the operating system said no. Stata needed more memory to process
      character strings, probably strLs.

      If you just issued a replace command or any other command that changes existing values in the data, the change may be partially made. You need to check or to assume the
      worst.

      In any case, you are pushing the limits and this error is likely to reoccur.

      Aside: The error occurred during generate. The variable being generated has been dropped. Your data are undamaged.
      Alas, that is what I came up with and it didn't work. Your solution, however, has worked great so far, up until the -merge- command.

      After running the -merge m:1 gvkey qdate using filename- command from your example I get the following error:
      variables gvkey qdate do not uniquely identify observations in the using data



      • #4
        Well, there are a few possibilities here. The code ran just fine with your example data: I tested it before posting it.

        One possibility is that you have reversed the order of the data sets. In my code, the stock price data is in memory before the -merge- and the accounting data is the -using- dataset. Perhaps you have that reversed? If so, with the accounting data starting in memory you can -merge 1:m gvkey qdate using name_of_the_stock_price_dataset-. This would be the simpler possibility.
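        For concreteness, a minimal sketch of that reversed ordering (the file names are placeholders, and it assumes qdate has already been created and saved in the stock price file):

        Code:
        * accounting data: one record per gvkey-quarter
        use name_of_the_accounting_dataset, clear
        gen qdate = qofd(datadate)
        format qdate %tq
        * each accounting record matches many monthly stock price records
        merge 1:m gvkey qdate using name_of_the_stock_price_dataset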

        If that's not it, another possibility is that your accounting data set does in fact contain multiple observations having the same gvkey and qdate. Given what you describe about the data, that sounds like it means the data are erroneous: there should be only one quarterly report per quarter for the same firm, right? So, at this point you need to identify the offending observations and then fix the data. To identify the offending observations you can run:

        Code:
        use name_of_the_accounting_dataset, clear
        gen qdate = qofd(datadate)
        format qdate %tq
        duplicates tag gvkey qdate, gen(flag)
        browse if flag
        to bring them up in the browser. It may be that there are simply flat-out duplicate observations that contain the same values for all variables. In that case -duplicates drop- will be your fix. Re-save the data set and try the merge again. But if you have multiple observations for a given gvkey in the same quarter and they have different values for some variables, then you have a more serious problem. You need to discover how to reconcile the inconsistencies. That will depend on your understanding of the data themselves and how they were assembled. You might have to go back to the publisher of the dataset to get help with this process.
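        For the simple case where the offending observations are exact duplicates, a minimal sketch of the fix (placeholder file names, following the same tempfile pattern as #2):

        Code:
        use name_of_the_accounting_dataset, clear
        gen qdate = qofd(datadate)
        format qdate %tq
        * remove observations that are identical on every variable
        duplicates drop
        tempfile accounting
        save `accounting'

        * repeat the merge with the stock price data in memory
        use name_of_the_stock_price_dataset, clear
        gen qdate = qofd(datadate)
        format qdate %tq
        merge m:1 gvkey qdate using `accounting'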

        Another possibility is that I (and you, if I have not misunderstood you) are misunderstanding the data. That is, perhaps there are multiple quarterly accounting reports for some firms in the same quarter, and it is supposed to be that way. In that case, you need to rethink how you plan to work with this data, as a merge between the two data sets is not possible: there is no way for Stata to know which accounting report for a given gvkey in a given quarter to match with a given stock price record. It might be that the solution is to create duplicates for those stock price records so that each stock price record is paired with every accounting report for that firm in that quarter. That can be done with the -joinby- command, but don't go this route unless you are sure this is the correct way to structure your data for analysis. Another possibility is that the different quarterly reports for the same firm in the same quarter are distinguishable by some other variable(s) that appears in both data sets and can be used to identify which stock price records go with which accounting reports. In that case, the distinguishing variable(s) need to be added to the merge key.
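        If it does turn out that multiple same-quarter reports are legitimate and each one should be paired with every stock price record for that firm-quarter, a hedged sketch of the -joinby- route would look like this (placeholder file names; the accounting file is assumed to already contain qdate):

        Code:
        use name_of_the_stock_price_dataset, clear
        gen qdate = qofd(datadate)
        format qdate %tq
        * form all pairwise combinations of stock price records and accounting
        * reports that share the same gvkey and calendar quarter
        joinby gvkey qdate using name_of_the_accounting_dataset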

        As you can see, there are several possible problems here. All but the first fall under the general rubric of "the data are not what you think they are." You'll have to pursue these possibilities to discern which applies to your situation and take appropriate actions.



        • #5
          First of all, I would like to sincerely thank you for taking the time to reply to my (novice level) question in such a detailed manner.
          You were indeed right in your assumption that my accounting data might contain duplicate observations. With the code you offered, I managed to find these duplicates and see what they contained. They were essentially as you said, flat-out duplicate observations, as they contained the same values for all variables.
          In my (accounting) data set of about 900,000 observations there were 8000 duplicates and I thankfully got rid of them all!

          After doing this, the -merge- command worked effortlessly and my data was merged exactly as I intended it to be!

          I'm thankful, however, for the other helpful suggestions you have made and I'm learning a lot from them.

          I would like to check one more thing:

          I now have monthly stock price data and quarterly accounting data matched (accounting variables like: total assets, liabilities, common equity, depreciation etc.) from the 1960s until 2018.
          However, there are almost no firms with quarterly accounting data from 1960-1975 (I suppose the quarterly data for this period is hard to come by), but there is annual accounting data for these firms in this same period (1960-1975).
          So in order to avoid leaving a big gap in my data set (for this period), I instead want to opt for the annual accounting data and fill the gaps during the 1960-1975 period with this type of data.

          I managed to download a data set containing these annual accounting variables and now want to try and merge this with the already merged data set that I have (as we did above), so that later on I can fill in the gaps caused by missing quarterly data, with annual accounting data.

          So in a nutshell (for the firms that have missing values for the quarterly accounting variables) after merging the annual accounting data set I would end up with something like the following:
          GVKEY Year Month Total Assets
          1 1963 1 3000
          1 1963 2 3000
          1 1963 3 3000
          1 1963 4 3000
          1 1963 5 3000
          1 1963 6 3000
          1 1963 7 3000
          1 1963 8 3000
          1 1963 9 3000
          1 1963 10 3000
          1 1963 11 3000
          1 1963 12 3000
          2 1963 1 245
          2 1963 2 245
          2 1963 3 245
          2 1963 4 245
          etc..


          So the way I want to achieve this is by the following

          use name_of_annual_accounting_dataset, clear
          gen ydate = yofd(datadate)
          format ydate %ty
          duplicates tag gvkey ydate, gen(flag)
          browse if flag
          duplicates drop gvkey ydate, force
          This way I would get rid of the duplicates in the annual accounting data set.
          Then I would create the variable ydate again in the "main" data set (the now-merged data set, in which we previously merged the quarterly accounting data with the monthly stock price data).

          At last I would run the command
          merge m:1 gvkey ydate using name_of_annual_accounting_dataset
          Would this be the right way of going about this? Unfortunately I don't have Stata on this computer at the moment so I'm not able to produce a -dataex- for the annual accounting data, but essentially it's the same data set as the quarterly accounting data that I posted in the first post, with the only difference being that it now contains 1 observation per firm, per year (instead of 4 observations per firm, per year in the quarterly data set).

          So I'm just wondering if I would be doing it right in this way or if there's something else I could be doing. If a -dataex- of the annual accounting data is needed, I would be glad to post it as soon as possible.



          • #6
            It's not allowing me to edit my previous post so I will post an example of the Annual Accounting data set down here:

            Note: at = Total Assets, conm = Company name, fyear = fiscal year, cusip/lpermco = another identifier (like gvkey)

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str6 gvkey double lpermco long datadate double fyear str8 tic str10 cusip str58 conm double at
            "001118" 25563 6209 1976 "ADG.1" "006203103" "ADAMS DRUG INC"     54.696
            "001118" 25563 6574 1977 "ADG.1" "006203103" "ADAMS DRUG INC"     53.112
            "001118" 25563 6939 1978 "ADG.1" "006203103" "ADAMS DRUG INC"     54.287
            "001118" 25563 7304 1979 "ADG.1" "006203103" "ADAMS DRUG INC"     64.105
            "001118" 25563 7670 1980 "ADG.1" "006203103" "ADAMS DRUG INC"     65.566
            "001118" 25563 8035 1981 "ADG.1" "006203103" "ADAMS DRUG INC"     87.187
            "001118" 25563 8400 1982 "ADG.1" "006203103" "ADAMS DRUG INC"    100.637
            "001118" 25563 8765 1983 "ADG.1" "006203103" "ADAMS DRUG INC"    104.944
            "001120" 22757 1095 1962 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    10.4
            "001120" 22757 1460 1963 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    10.8
            "001120" 22757 1826 1964 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    11.8
            "001120" 22757 2191 1965 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    15.6
            "001120" 22757 2556 1966 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    16.7
            "001120" 22757 2921 1967 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    17.7
            "001120" 22757 3287 1968 "ALL.2" "006284103" "ADAMS-MILLIS CORP"    22.7
            "001120" 22757 3652 1969 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  26.424
            "001120" 22757 4017 1970 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  33.676
            "001120" 22757 4382 1971 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  38.732
            "001120" 22757 4748 1972 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  49.696
            "001120" 22757 5113 1973 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  49.781
            "001120" 22757 5478 1974 "ALL.2" "006284103" "ADAMS-MILLIS CORP"  41.078
            end
            format %d datadate



            • #7
              Yes, this looks like the right approach. You need to create the variable ydate in both the accounting and the stock price data sets (unless it already exists).

              One subtle safety point. I would not use -duplicates drop gvkey ydate, force-. The reason is that you are relying on your visual appraisal from the browser that all 8000 of the observations involved with duplicate gvkey and ydate are complete duplicates on all variables. I wouldn't trust my own visual inspection on that; I wouldn't trust anybody else's either. It may well be the case, but surely in a set of 8000 observations something could be overlooked on visual inspection. And if something was overlooked, then you will be selecting one observation from each pair that has a real difference somewhere at random and irreproducibly. That's what the force option does.

              So what I would do is -duplicates drop-, with no variables specified and no -force- option. That will remove all and only those observations that are duplications for every variable: we don't have to worry about Stata's inspection of the data the way we have to worry about our visual inspections. Stata will definitely get that right.

              And if it turns out that there really are some duplications of gvkey and ydate that differ on some other variables, when you hit the -merge- command Stata will complain that the observations are not uniquely identified in the accounting data set and will halt. You will then have to go back and once again inspect those observations that have duplication of gvkey and ydate (there shouldn't be very many this time) and figure out what to do with those.
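              A minimal sketch of that safer sequence, with placeholder file names and the ydate variable from #5:

              Code:
              use name_of_annual_accounting_dataset, clear
              gen ydate = yofd(datadate)
              format ydate %ty
              * drop only observations that are identical on every variable;
              * any remaining gvkey-ydate duplicates will make the later -merge- halt, as intended
              duplicates drop
              tempfile annual
              save `annual'

              use name_of_the_merged_monthly_dataset, clear
              gen ydate = yofd(datadate)
              format ydate %ty
              merge m:1 gvkey ydate using `annual'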

              Remember, the goal in coding is not to suppress error messages. Error messages keep you from being led down the garden path by erroneous data (or by your own logic or coding errors). You should code so as to maximize the chance of getting an error message whenever something is wrong, and as soon as possible. You want problems surfaced early so you can fix them before they propagate more damage later on. You don't want code to hide problems, only to have them show up much later in some embarrassing and possibly disastrous way (as when you are presenting your results and somebody in the audience notices that they can't possibly be right), far downstream where it will be much harder to figure out what went wrong and how to fix it. My motto as a programmer is: fail early and fail often.



              • #8
                Clyde Schechter Thank you very much, also for the warning on the use of -force- option in the duplicates command.
                The reason why I used -, force- was indeed because simply using -duplicates drop- was resulting in 0 duplicates being dropped (and rightfully so).

                After taking another look at the duplicates subset, I found that there was basically only 1 differentiating variable in the duplicates. The values for all variables indeed looked like they were duplicates, except for the values of 1 irrelevant variable (I know this variable is irrelevant because it’s basically a security-level identifier variable which I won’t be using in my dataset). After dropping this variable completely from my dataset, I could use the regular -duplicates drop- command which resulted in the dropping of around 7700 observations.
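                In code, that amounted to something like the following minimal sketch (security_id is a hypothetical name standing in for the security-level identifier that was dropped):

                Code:
                * drop the irrelevant security-level identifier (hypothetical name)
                drop security_id
                * the remaining near-duplicates are now exact duplicates and can be dropped safely
                duplicates drop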

                This “forced” dropping of observations would have definitely resulted in problems at a later stage and thus I’m thankful to get the warning and prevent this from happening.

                Now when it comes to the -merge- command, after playing around with it using the syntax you have given me, I realize that a large subset of my data (around 66%) has merged as it should (meaning: the accounting data is assigned to the right stock price data at the right dates). The dates are extremely important in this and I’ve noticed that for some firms/observations the dates and the observations do not align as they should.

                I suspect this is because Stata works with calendar years which works out well for most firms (as their Fiscal years go from Jan-Dec). But when we have firms that differ from this (i.e. they have a Fiscal year that goes from May 2010 - April 2011 instead of Jan 2010 - Dec 2010) the merging goes sideways.

                Unfortunately, in my case, about 33% of my whole data set has Fiscal years that do not end in December of each year but rather in any of the other 11 calendar months (I know this because there is a variable in my dataset that specifically refers to the “Fiscal year ending month” of each firm/stock in my data and this variable ranges from 1 to 12). Let’s call this variable “FYearEnding”

                So in this case, the merging of both datasets goes well for stocks that have their “FYearEnding” in December (that is, a value of 12 for this variable). Thus, the accounting values do align with the right dates in my stock price dataset, as they should.

                But whenever a firm/stock has a different value than “12” for “FYearEnding”, the data is still merged but the accounting variables are not aligned with the right dates in the Stock price dataset.
                So after the merging I would get for example the following case: Quarterly Accounting values for several account variables (let’s say: Total Assets, Total Liabilities, Net income) that are supposed to be released/reported in May 2010 (this is the value for the variable “datadate”) but in my merged dataset they already show up for the months April, May, June (whereas they should show up for the months: May, June, July).
                So as you can see, there is a difference of 1 month in this case, but sometimes it could be a difference of 2 months. It just depends on which value the FYearEnding variable takes I’m guessing.

                The good thing is that I have this variable “FYearEnding” in both my Accounting and Stock price datasets.

                The bad thing is, I’ve been struggling with this issue for the past few days trying to solve it by myself, but for the life of me I cannot come up with the right commands to do so. I would be very grateful for any pointers.

                I will provide a new Dataex of my (unmerged) Accounting and Stock price data below, which includes this “FYearEnding” variable (this variable comes with the dataset already and does not need generating). I will also include the “qdate” variable which we generated previously.

                The Accounting dataset also includes another variable, FiscalYrQtr, which specifies exactly which fiscal year and quarter the observation belongs to. Notice that it differs from the variable “datadate”, as “datadate” is simply the calendar date on which the firms released their quarterly reports.

                The FiscalYrQtr is unfortunately not included in the Stock price dataset (otherwise I could’ve probably merged based on this variable and “Gvkey”).

                My (Quarterly) Accounting dataset:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 gvkey long datadate float qdate str6 FiscalYrQtr double FYearEnding str8 Ticker str58 Company_Name double(Total_Assets Total_Liabilities)
                "001004" 21243 232 "2017Q3"  5 "AIR"  "AAR CORP"            1512.2   597
                "001004" 21335 233 "2017Q4"  5 "AIR"  "AAR CORP"            1524.7 588.4
                "001004" 21427 234 "2018Q1"  5 "AIR"  "AAR CORP"            1537.8 608.7
                "001004" 21518 235 "2018Q2"  5 "AIR"  "AAR CORP"            1604.2 668.6
                "001004" 21608 236 "2018Q3"  5 "AIR"  "AAR CORP"            1546.5 647.1
                "001004" 21700 237 "2018Q4"  5 "AIR"  "AAR CORP"            1517.2 611.3
                "001011"  8490  92 "1983Q1" 12 "ACSE" "ACS ENTERPRISES INC"  4.554 1.727
                "001011"  8581  93 "1983Q2" 12 "ACSE" "ACS ENTERPRISES INC"  4.277 1.401
                "001011"  8673  94 "1983Q3" 12 "ACSE" "ACS ENTERPRISES INC"  4.403 1.563
                "001011"  8765  95 "1983Q4" 12 "ACSE" "ACS ENTERPRISES INC"  4.689 1.922
                "001011"  8856  96 "1984Q1" 12 "ACSE" "ACS ENTERPRISES INC"  4.973 2.062
                "001011"  8947  97 "1984Q2" 12 "ACSE" "ACS ENTERPRISES INC"  5.321 2.393
                end
                format %d datadate
                format %tq qdate

                My (Monthly) Stock Price dataset:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 gvkey long datadate float qdate double FYearEnding str8 Ticker str58 Company_Name double Price
                "001004" 21243 232  5 "AIR"  "AAR CORP"                 42.58
                "001004" 21274 232  5 "AIR"  "AAR CORP"                 44.11
                "001004" 21304 233  5 "AIR"  "AAR CORP"                  43.3
                "001004" 21335 233  5 "AIR"  "AAR CORP"                 44.69
                "001004" 21365 233  5 "AIR"  "AAR CORP"                 46.49
                "001004" 21396 234  5 "AIR"  "AAR CORP"                 47.41
                "001004" 21427 234  5 "AIR"  "AAR CORP"                 46.67
                "001004" 21457 234  5 "AIR"  "AAR CORP"                 47.89
                "001004" 21488 235  5 "AIR"  "AAR CORP"                 47.58
                "001004" 21518 235  5 "AIR"  "AAR CORP"                 43.69
                "001004" 21549 235  5 "AIR"  "AAR CORP"                 37.34
                "001011"  8490  92 12 "ACSE" "ACS ENTERPRISES INC" 2.37499967
                "001011"  8520  93 12 "ACSE" "ACS ENTERPRISES INC" 2.49999947
                "001011"  8551  93 12 "ACSE" "ACS ENTERPRISES INC" 3.99999938
                "001011"  8581  93 12 "ACSE" "ACS ENTERPRISES INC" 3.24999942
                "001011"  8612  94 12 "ACSE" "ACS ENTERPRISES INC" 2.49999947
                "001011"  8643  94 12 "ACSE" "ACS ENTERPRISES INC" 2.37499967
                "001011"  8673  94 12 "ACSE" "ACS ENTERPRISES INC" 2.49999947
                "001011"  8704  95 12 "ACSE" "ACS ENTERPRISES INC" 2.49999947
                "001011"  8734  95 12 "ACSE" "ACS ENTERPRISES INC" 2.12499949
                "001011"  8765  95 12 "ACSE" "ACS ENTERPRISES INC"       1.75
                "001011"  8796  96 12 "ACSE" "ACS ENTERPRISES INC"        1.5
                "001011"  8825  96 12 "ACSE" "ACS ENTERPRISES INC"      1.375
                "001011"  8856  96 12 "ACSE" "ACS ENTERPRISES INC"      1.125
                "001011"  8886  97 12 "ACSE" "ACS ENTERPRISES INC"      1.875
                "001011"  8917  97 12 "ACSE" "ACS ENTERPRISES INC"      1.625
                "001011"  8947  97 12 "ACSE" "ACS ENTERPRISES INC"      1.625
                "001011"  8978  98 12 "ACSE" "ACS ENTERPRISES INC"          2
                "001011"  9009  98 12 "ACSE" "ACS ENTERPRISES INC"          2
                end
                format %d datadate
                format %tq qdate
                So in this case I tried to give an example of companies that have a different fiscal year ending month (FYearEnding): one company's fiscal year ends in May of every year and the other's ends in December. You can also see the difference between the calendar dates and the actual fiscal year & quarter a company is in from the accounting dataset when you compare the variables "qdate" (actual calendar date) and "FiscalYrQtr" (fiscal year and quarter of the observed company for which they are reporting the values for Total_Assets and Total_Liabilities).

                So in a nutshell, the table below is what I want to end up with after the merging process, so that both the accounting and stock price variables are aligned with the correct "Datadate" (calendar date) even though for some companies the fiscal periods end in different months.
                In the merge I have done now, this is not the case: the firm that ends its fiscal year in May (FYearEnding = 5) has its values for Total_Assets and Total_Liabilities not aligned with the right "Datadate" value where they belong. And of course this is not just about the firms that have FYearEnding of 5; it is the case for all firms with FYearEnding of anything other than 12 (December).

                What I want to end up having:
                Company_Name Ticker Datadate Price Total_Assets Total_Liabilities
                AAR CORP AIR 28feb2018 42.58 1512.2 597
                AAR CORP AIR 31mar2018 44.11 1512.2 597
                AAR CORP AIR 30apr2018 43.3 1512.2 597
                AAR CORP AIR 31may2018 44.69 1524.7 588.4
                AAR CORP AIR 30jun2018 46.49 1524.7 588.4
                AAR CORP AIR 31jul2018 47.41 1524.7 588.4
                ....
                ....
                ....
                ACS ENTERPRISES INC ACSE 31mar1984 1.125 4.973 2.062
                ACS ENTERPRISES INC ACSE 30apr1984 1.875 4.973 2.062
                ACS ENTERPRISES INC ACSE 31may1984 1.625 4.973 2.062
                ACS ENTERPRISES INC ACSE 30jun1984 1.625 5.321 2.393
                ACS ENTERPRISES INC ACSE 31jul1984 2 5.321 2.393
                ACS ENTERPRISES INC ACSE 31aug1984 2 5.321 2.393
                Sorry for the long post, but I just wanted to explain everything in as much detail as possible to get the right answers. Thanks again for your time.



                • #9
                  I have always found this fiscal year business confusing, and while I appreciate the long explanation, I don't think I fully understand it, and probably never will. But here is how to calculate the FiscalYrQtr from datadate and FYearEnding (in both data sets):

                  Code:
                  gen mdate = mofd(datadate)
                  format mdate %tm
                  gen FiscalYrQtr = qofd(dofm(mdate - mod(FYearEnding, 12)))
                  format FiscalYrQtr %tq
                  Then you can do the -merge- on gvkey and FiscalYrQtr and things should come out the way you want them to.

                  Note: The FiscalYrQtr variable created here is numeric, with a Stata quarterly date display format, not a string. But to the human eye, it matches the string version of the variable you show.

                  Just be careful, further in the data, not to confuse FiscalYrQtr with a real calendar date. The real calendar quarter is still in qdate. The fiscal year quarter is in FiscalYrQtr. They are often different and not even different in a way that is consistent across different firms. Since you work with this kind of thing all the time, I suppose you know that anyway--but I would have to constantly be reminding myself of that.
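                  Pulling the pieces together, here is a sketch of the full fiscal-quarter merge under the same assumptions (placeholder file names; I use fqdate as the new variable name so the generated numeric quarter does not collide with the existing string FiscalYrQtr in the accounting data):

                  Code:
                  use name_of_the_accounting_dataset, clear
                  gen mdate = mofd(datadate)
                  * shift the month back by the fiscal year-end offset, then take the quarter
                  gen fqdate = qofd(dofm(mdate - mod(FYearEnding, 12)))
                  format fqdate %tq
                  tempfile accounting
                  save `accounting'

                  use name_of_the_stock_price_dataset, clear
                  gen mdate = mofd(datadate)
                  gen fqdate = qofd(dofm(mdate - mod(FYearEnding, 12)))
                  format fqdate %tq
                  merge m:1 gvkey fqdate using `accounting'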


