Merge a dataset of a list of dates

Matteo Bagnara

Join Date: Jun 2018

Posts: 27
#1

Merge a dataset of a list of dates

28 Jun 2018, 08:40

Hi everybody,
My database contains is of the form:

DATE--------ID------RETURN MKTRETURN RISKFREE
02jan1986-12222 0.02------- 0.011-------------0.001

I furthermore have a list of dates in an Excel file of the form:

LIST1 ------LIST2 LIST3 LIST1+LIST2 LIST2+LIST3 LIST1+LIST3 LIST1+LIST2+LIST3
03jul1987 09sep1943....

I would like to create a dummy variable in the return dataset for every list of dates that is 1 when the corresponding date is equal to one of the dates in one list. For example: if 02jan1986 is a date contained in list1 but not in the other lists, the first dummy variable dummy variable should be 1 and all the other variables should be zero for that return and that date.

My solution would be to create a esparated dta file for every list, create a dummy variable which is always equalt to one, rename the list of date as "DATE" and merge the return database (the first one) with this new database for every list of dates, so that at the end I would have 1 for the dates that match and a missing value otherwise, so that the dummy variable I desire would be basically done. However, in trying to do this,

merge m:1 DATE using Listofdate1.dta

I got the error: variable DATE does not uniquely identify observations in the using data. The every list of dates has unique values, except for the last four lists, which are combinations of the former original lists. However, I get this error even if I use a list with unique values. DATE is not unique in the database I start with.
Hope someone can help

Thank you in advance
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

28 Jun 2018, 18:32

When I welcomed you to Statalist a few days ago, after answering your question I neglected to give you some important advice for future questions, and my failure to do that means your current question is not expressed as usefully as it could be and thus has not gotten the attention you wish for it.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

It this case, I find it difficult to imagine what your data are like from your description. The FAQ recommends posting sample data, and samples from both your datasets would be immensely helpful. The first step will be to import your Excel data into Stata and saving it as a Stata dataset for future use. Do not follow your plan of creating multiple separate datasets. Just be sure you have gone through the effort of getting your data into Stata, because any answer will crucially depend on the details of the actual data in Stata, not the potential data in Excel.

Once all your data is in Stata datasets, post samples of the data from each dataset.

Please be sure to use the dataex command to show your example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex to read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use dataex.
Comment
Matteo Bagnara

Join Date: Jun 2018

Posts: 27
#3

29 Jun 2018, 02:21

Dear mr. Lisowski,
thank you for your thorough reply. Below and example of my data (the original dataset with returns data; file name: exdata) and an example of the dataset for the lists of dates (filename exdates).

Attached Files

Last edited by Matteo Bagnara; 29 Jun 2018, 02:25.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

29 Jun 2018, 05:06

From the output of help dataex we read

Code:

dataex is for producing a data example to include in a post on Statalist. Make sure that you have read the FAQ before posting. Users who read your post will be able to copy the code generated by dataex and re-create the dataset shown. ... Copy what is produced by dataex in the Stata Results window to your post on Statalist. Make sure to include the [CODE] and [/CODE] lines. You can use the Preview button, just to the left of the Post Reply button, to verify within Statalist that the data example is correctly formatted.

From the Statalist FAQ we read

12.5 Posting attachments: please don't...
...
In particular, please do not post screenshots. Many members will not be able to read them at all; they usually can't be read easily; and they do not allow copy and paste of data or code, which is highly desirable to allow experienced members to make precise suggestions for your questions.

Unfortunately I now leave home for several days and will not be able to respond further until my return. If you create a new post following the instructions from help dataex to copy-and-paste the output into a post, perhaps someone else will be able to help. Do paste the entire output of dataex as instructed; your screenshots have eliminated the formatting information that follows the data. You can limit the amount of output produced (the default is 100 observations) to, for example, 20 observations by using the if clause:

Code:

dataex in 1/20

as the examples in help dataex show.
Comment

Matteo Bagnara

Join Date: Jun 2018
Posts: 27

02 Jul 2018, 03:02

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date PFNumber Return MeR SMB HML RF)
-12237  1  -.8 .1 -.24 -.28 .009
-12237  2 1.36 .1 -.24 -.28 .009
-12237  3 -.44 .1 -.24 -.28 .009
-12237  4 1.32 .1 -.24 -.28 .009
-12237  5 -.57 .1 -.24 -.28 .009
-12237  6  .37 .1 -.24 -.28 .009
-12237  7 1.73 .1 -.24 -.28 .009
-12237  8 -.02 .1 -.24 -.28 .009
-12237  9 -.45 .1 -.24 -.28 .009
-12237 10  .09 .1 -.24 -.28 .009
-12237 11  -.3 .1 -.24 -.28 .009
-12237 12  .34 .1 -.24 -.28 .009
-12237 13 -.58 .1 -.24 -.28 .009
-12237 14 -.34 .1 -.24 -.28 .009
-12237 15 -.68 .1 -.24 -.28 .009
-12237 16    0 .1 -.24 -.28 .009
-12237 17   .3 .1 -.24 -.28 .009
-12237 18 -.32 .1 -.24 -.28 .009
-12237 19  .02 .1 -.24 -.28 .009
-12237 20  .41 .1 -.24 -.28 .009
end
format %td date

Last edited by Matteo Bagnara; 02 Jul 2018, 03:04.

Comment

Matteo Bagnara

Join Date: Jun 2018
Posts: 27

02 Jul 2018, 03:09

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(InflationSW FOMC UnInfl UnFOMC InflFOMC All Unemployment)
-710 6592 -717 -717 -710 -717 -717
-675 6634 -689 -689 -675 -689 -689
-651 6655 -661 -661 -651 -661 -661
-618 6683 -633 -633 -618 -633 -633
-589 6711 -612 -612 -589 -612 -612
-556 6746 -574 -574 -556 -574 -574
-527 6774 -542 -542 -527 -542 -542
-497 6802 -511 -511 -497 -511 -511
-469 6837 -479 -479 -469 -479 -479
-435 6865 -448 -448 -435 -448 -448
-402 6900 -417 -417 -402 -417 -417
-374 6928 -387 -387 -374 -387 -387
-343 6977 -353 -353 -343 -353 -353
-315 7019 -325 -325 -315 -325 -325
-282 7047 -296 -296 -282 -296 -296
-254 7082 -269 -269 -254 -269 -269
-224 7132 -235 -235 -224 -235 -235
-196 7166 -205 -205 -196 -205 -205
-163 7201 -171 -171 -163 -171 -171
-133 7219 -143 -143 -133 -143 -143
end
format %td InflationSW
format %td FOMC
format %td UnInfl
format %td UnFOMC
format %td InflFOMC
format %td All
format %td Unemployment

So what I posted is first the dataset with returns whereas the second one is the one of dates from which I have to create a dummy variable for each one of them if the date of the first dataset with returns is in the corresponding list (e.g. if the first return date is not part of the "InfaltionSW list, the corresponding dummy should be zero, whereas it will be one if the date is part of the list).

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

02 Jul 2018, 07:43

Thank you for the presentation of your sample data.

Based on your explanation in post #1, I have prepared modified sample data which will better demonstrate the workings of the code I show below. Based on my understanding of post #1, I think your datasets look like the following. In the exdates dataset, I believe that the UnInfl list will contain the entries in the Unemployment list followed by the entries in the InflationSW list, and so on for the other combined lists. In the exdata dataset, I show two dates with just two PFNumbers per date, and have modified the so one matches a date in the exdates dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Unemployment InflationSW FOMC UnInfl UnFOMC InflFOMC All)
-717 -710 6592 -717 -717 -710 -717
-689 -675 6634 -689 -689 -675 -689
   .    .    . -710 6592 6592 -710
   .    .    . -675 6634 6634 -675
   .    .    .    .    .    . 6592
   .    .    .    .    .    . 6634
end
format %td Unemployment
format %td InflationSW
format %td FOMC
format %td UnInfl
format %td UnFOMC
format %td InflFOMC
format %td All
save exdates, replace

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date PFNumber Return MeR SMB HML RF)
-717 1  -.8 .1 -.24 -.28 .009
-717 2 1.36 .1 -.24 -.28 .009
-716 1 -.44 .1 -.24 -.28 .009
-716 2 1.32 .1 -.24 -.28 .009
end
format %td date
save exdata, replace

Below is the code that I think does what you want. But note the following: You do not need the combined lists. Once you have your dummy variables for the three "basic" lists, you can use those to construction the combined dummies.

Code:

use exdata, clear
save newexdata, replace

foreach var in Unemployment InflationSW FOMC {
    use exdates, clear
    keep `var'
    drop if missing(`var')
    duplicates drop
    rename `var' date
    merge 1:m date using newexdata, keep(match using)
    generate byte in`var' = _merge==3
    drop _merge
    save newexdata, replace
    }

generate byte inUnInfl   = inUnemployment==1 | inInflationSW==1
generate byte inUnFOMC   = inUnemployment==1 | inFOMC==1
generate byte inInflFOMC = inInflationSW==1  | inFOMC==1
generate byte inAll      = inUnInfl==1       | inFOMC==1
    
sort date PFNumber
save newexdata, replace

Code:

. use exdates, clear

. list, clean noobs

    Unemplo~t   Inflati~W        FOMC      UnInfl      UnFOMC    InflFOMC         All  
    14jan1958   21jan1958   18jan1978   14jan1958   14jan1958   21jan1958   14jan1958  
    11feb1958   25feb1958   01mar1978   11feb1958   11feb1958   25feb1958   11feb1958  
            .           .           .   21jan1958   18jan1978   18jan1978   21jan1958  
            .           .           .   25feb1958   01mar1978   01mar1978   25feb1958  
            .           .           .           .           .           .   18jan1978  
            .           .           .           .           .           .   01mar1978  

. use newexdata, clear

. list date inUnemployment inInflationSW inFOMC inUnInfl inUnFOMC inInflFOMC inAll, clean noobs

         date   inUnem~t   inInfl~W   inFOMC   inUnInfl   inUnFOMC   inInfl~C   inAll  
    14jan1958          1          0        0          1          1          0       1  
    14jan1958          1          0        0          1          1          0       1  
    15jan1958          0          0        0          0          0          0       0  
    15jan1958          0          0        0          0          0          0       0

But with that said, let me add the following. If you are going to be using these dummy variables in a regression, you should be using factor variable notation. Instead of

Code:

regress y inUnemployment inInflationSW inFOMC inUnInfl inUnFOMC inInflFOMC inAll

you would write

Code:

regress y inUnemployment##inInflationSW##inFOMC

which would give you all the interaction effects as well as the main effects. This is described thoroughly in section 11.4.3 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.

Comment

Matteo Bagnara

Join Date: Jun 2018

Posts: 27
#8

02 Jul 2018, 10:17

Dear Mr. Lisowski,
thank you for your answer. I really appreciate your help.
I am aware I can generate the "combination" lists from the original 3 dummies, it is just that I had them already in the excel sheet and thus I could import them all.
For the regession, I will keep in mind your suggestion.
Comment

Announcement