Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge a dataset of a list of dates

    Hi everybody,
    My database contains is of the form:

    DATE--------ID------RETURN MKTRETURN RISKFREE
    02jan1986-12222 0.02------- 0.011-------------0.001

    I furthermore have a list of dates in an Excel file of the form:

    LIST1 ------LIST2 LIST3 LIST1+LIST2 LIST2+LIST3 LIST1+LIST3 LIST1+LIST2+LIST3
    03jul1987 09sep1943....

    I would like to create a dummy variable in the return dataset for every list of dates that is 1 when the corresponding date is equal to one of the dates in one list. For example: if 02jan1986 is a date contained in list1 but not in the other lists, the first dummy variable dummy variable should be 1 and all the other variables should be zero for that return and that date.

    My solution would be to create a esparated dta file for every list, create a dummy variable which is always equalt to one, rename the list of date as "DATE" and merge the return database (the first one) with this new database for every list of dates, so that at the end I would have 1 for the dates that match and a missing value otherwise, so that the dummy variable I desire would be basically done. However, in trying to do this,

    merge m:1 DATE using Listofdate1.dta

    I got the error: variable DATE does not uniquely identify observations in the using data. The every list of dates has unique values, except for the last four lists, which are combinations of the former original lists. However, I get this error even if I use a list with unique values. DATE is not unique in the database I start with.
    Hope someone can help

    Thank you in advance

  • #2
    When I welcomed you to Statalist a few days ago, after answering your question I neglected to give you some important advice for future questions, and my failure to do that means your current question is not expressed as usefully as it could be and thus has not gotten the attention you wish for it.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    It this case, I find it difficult to imagine what your data are like from your description. The FAQ recommends posting sample data, and samples from both your datasets would be immensely helpful. The first step will be to import your Excel data into Stata and saving it as a Stata dataset for future use. Do not follow your plan of creating multiple separate datasets. Just be sure you have gone through the effort of getting your data into Stata, because any answer will crucially depend on the details of the actual data in Stata, not the potential data in Excel.

    Once all your data is in Stata datasets, post samples of the data from each dataset.

    Please be sure to use the dataex command to show your example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex to read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use dataex.


    Comment


    • #3
      Dear mr. Lisowski,
      thank you for your thorough reply. Below and example of my data (the original dataset with returns data; file name: exdata) and an example of the dataset for the lists of dates (filename exdates).











      Attached Files
      Last edited by Matteo Bagnara; 29 Jun 2018, 02:25.

      Comment


      • #4
        From the output of help dataex we read
        Code:
            dataex is for producing a data example to include in a post on Statalist.  Make sure that you
            have read the FAQ before posting.  Users who read your post will be able to copy the code
            generated by dataex and re-create the dataset shown.
        
        ...
        
            Copy what is produced by dataex in the Stata Results window to your post on Statalist.  Make
            sure to include the [CODE] and [/CODE] lines.  You can use the Preview button, just to the left
            of the Post Reply button, to verify within Statalist that the data example is correctly
            formatted.
        From the Statalist FAQ we read

        12.5 Posting attachments: please don't...
        ...
        In particular, please do not post screenshots. Many members will not be able to read them at all; they usually can't be read easily; and they do not allow copy and paste of data or code, which is highly desirable to allow experienced members to make precise suggestions for your questions.
        Unfortunately I now leave home for several days and will not be able to respond further until my return. If you create a new post following the instructions from help dataex to copy-and-paste the output into a post, perhaps someone else will be able to help. Do paste the entire output of dataex as instructed; your screenshots have eliminated the formatting information that follows the data. You can limit the amount of output produced (the default is 100 observations) to, for example, 20 observations by using the if clause:
        Code:
        dataex in 1/20
        as the examples in help dataex show.

        Comment


        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(date PFNumber Return MeR SMB HML RF)
          -12237  1  -.8 .1 -.24 -.28 .009
          -12237  2 1.36 .1 -.24 -.28 .009
          -12237  3 -.44 .1 -.24 -.28 .009
          -12237  4 1.32 .1 -.24 -.28 .009
          -12237  5 -.57 .1 -.24 -.28 .009
          -12237  6  .37 .1 -.24 -.28 .009
          -12237  7 1.73 .1 -.24 -.28 .009
          -12237  8 -.02 .1 -.24 -.28 .009
          -12237  9 -.45 .1 -.24 -.28 .009
          -12237 10  .09 .1 -.24 -.28 .009
          -12237 11  -.3 .1 -.24 -.28 .009
          -12237 12  .34 .1 -.24 -.28 .009
          -12237 13 -.58 .1 -.24 -.28 .009
          -12237 14 -.34 .1 -.24 -.28 .009
          -12237 15 -.68 .1 -.24 -.28 .009
          -12237 16    0 .1 -.24 -.28 .009
          -12237 17   .3 .1 -.24 -.28 .009
          -12237 18 -.32 .1 -.24 -.28 .009
          -12237 19  .02 .1 -.24 -.28 .009
          -12237 20  .41 .1 -.24 -.28 .009
          end
          format %td date
          Last edited by Matteo Bagnara; 02 Jul 2018, 03:04.

          Comment


          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(InflationSW FOMC UnInfl UnFOMC InflFOMC All Unemployment)
            -710 6592 -717 -717 -710 -717 -717
            -675 6634 -689 -689 -675 -689 -689
            -651 6655 -661 -661 -651 -661 -661
            -618 6683 -633 -633 -618 -633 -633
            -589 6711 -612 -612 -589 -612 -612
            -556 6746 -574 -574 -556 -574 -574
            -527 6774 -542 -542 -527 -542 -542
            -497 6802 -511 -511 -497 -511 -511
            -469 6837 -479 -479 -469 -479 -479
            -435 6865 -448 -448 -435 -448 -448
            -402 6900 -417 -417 -402 -417 -417
            -374 6928 -387 -387 -374 -387 -387
            -343 6977 -353 -353 -343 -353 -353
            -315 7019 -325 -325 -315 -325 -325
            -282 7047 -296 -296 -282 -296 -296
            -254 7082 -269 -269 -254 -269 -269
            -224 7132 -235 -235 -224 -235 -235
            -196 7166 -205 -205 -196 -205 -205
            -163 7201 -171 -171 -163 -171 -171
            -133 7219 -143 -143 -133 -143 -143
            end
            format %td InflationSW
            format %td FOMC
            format %td UnInfl
            format %td UnFOMC
            format %td InflFOMC
            format %td All
            format %td Unemployment
            So what I posted is first the dataset with returns whereas the second one is the one of dates from which I have to create a dummy variable for each one of them if the date of the first dataset with returns is in the corresponding list (e.g. if the first return date is not part of the "InfaltionSW list, the corresponding dummy should be zero, whereas it will be one if the date is part of the list).

            Comment


            • #7
              Thank you for the presentation of your sample data.

              Based on your explanation in post #1, I have prepared modified sample data which will better demonstrate the workings of the code I show below. Based on my understanding of post #1, I think your datasets look like the following. In the exdates dataset, I believe that the UnInfl list will contain the entries in the Unemployment list followed by the entries in the InflationSW list, and so on for the other combined lists. In the exdata dataset, I show two dates with just two PFNumbers per date, and have modified the so one matches a date in the exdates dataset.
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(Unemployment InflationSW FOMC UnInfl UnFOMC InflFOMC All)
              -717 -710 6592 -717 -717 -710 -717
              -689 -675 6634 -689 -689 -675 -689
                 .    .    . -710 6592 6592 -710
                 .    .    . -675 6634 6634 -675
                 .    .    .    .    .    . 6592
                 .    .    .    .    .    . 6634
              end
              format %td Unemployment
              format %td InflationSW
              format %td FOMC
              format %td UnInfl
              format %td UnFOMC
              format %td InflFOMC
              format %td All
              save exdates, replace
              
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(date PFNumber Return MeR SMB HML RF)
              -717 1  -.8 .1 -.24 -.28 .009
              -717 2 1.36 .1 -.24 -.28 .009
              -716 1 -.44 .1 -.24 -.28 .009
              -716 2 1.32 .1 -.24 -.28 .009
              end
              format %td date
              save exdata, replace
              Below is the code that I think does what you want. But note the following: You do not need the combined lists. Once you have your dummy variables for the three "basic" lists, you can use those to construction the combined dummies.
              Code:
              use exdata, clear
              save newexdata, replace
              
              foreach var in Unemployment InflationSW FOMC {
                  use exdates, clear
                  keep `var'
                  drop if missing(`var')
                  duplicates drop
                  rename `var' date
                  merge 1:m date using newexdata, keep(match using)
                  generate byte in`var' = _merge==3
                  drop _merge
                  save newexdata, replace
                  }
              
              generate byte inUnInfl   = inUnemployment==1 | inInflationSW==1
              generate byte inUnFOMC   = inUnemployment==1 | inFOMC==1
              generate byte inInflFOMC = inInflationSW==1  | inFOMC==1
              generate byte inAll      = inUnInfl==1       | inFOMC==1
                  
              sort date PFNumber
              save newexdata, replace
              Code:
              . use exdates, clear
              
              . list, clean noobs
              
                  Unemplo~t   Inflati~W        FOMC      UnInfl      UnFOMC    InflFOMC         All  
                  14jan1958   21jan1958   18jan1978   14jan1958   14jan1958   21jan1958   14jan1958  
                  11feb1958   25feb1958   01mar1978   11feb1958   11feb1958   25feb1958   11feb1958  
                          .           .           .   21jan1958   18jan1978   18jan1978   21jan1958  
                          .           .           .   25feb1958   01mar1978   01mar1978   25feb1958  
                          .           .           .           .           .           .   18jan1978  
                          .           .           .           .           .           .   01mar1978  
              
              . use newexdata, clear
              
              . list date inUnemployment inInflationSW inFOMC inUnInfl inUnFOMC inInflFOMC inAll, clean noobs
              
                       date   inUnem~t   inInfl~W   inFOMC   inUnInfl   inUnFOMC   inInfl~C   inAll  
                  14jan1958          1          0        0          1          1          0       1  
                  14jan1958          1          0        0          1          1          0       1  
                  15jan1958          0          0        0          0          0          0       0  
                  15jan1958          0          0        0          0          0          0       0
              But with that said, let me add the following. If you are going to be using these dummy variables in a regression, you should be using factor variable notation. Instead of
              Code:
              regress y inUnemployment inInflationSW inFOMC inUnInfl inUnFOMC inInflFOMC inAll
              you would write
              Code:
              regress y inUnemployment##inInflationSW##inFOMC
              which would give you all the interaction effects as well as the main effects. This is described thoroughly in section 11.4.3 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.

              Comment


              • #8
                Dear Mr. Lisowski,
                thank you for your answer. I really appreciate your help.
                I am aware I can generate the "combination" lists from the original 3 dummies, it is just that I had them already in the excel sheet and thus I could import them all.
                For the regession, I will keep in mind your suggestion.

                Comment

                Working...
                X