Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata 13 - merge 2 files using date

    Hi all,

    I am trying to merge two files by date, Dataset - 1 that has multiple identical dates, and Dataset 2 that has single instances dates with variables, that will need to be assigned multiple times. Essentially I am using the date as an identifier, because the different companies will experience the factor I am trying to merge identically and so no differentiation is required on a company basis. I have searched this forum and found similar threads but not one that directly answered my question.

    The merge m:m function yields no matches, and if helpful gives the below. All the other combinations m:1 etc result in errors. A simple version of what I am trying to achieve has been inserted below.

    As an aside I am not allowed to match these up outside STATA in excel for instance which I'm aware would be much, much easier.

    Result # of obs.
    -----------------------------------------
    not matched 554,625
    from master 554,427 (_merge==1)
    from using 198 (_merge==2)

    matched 0 (_merge==3)
    Dataset 1 Dataset 2 Merge
    Date Company Date Macro Factor Date Company Macro Factor 1
    30/09/1965 Company A 30/09/1965 3 30/09/1965 Company A 3
    30/09/1965 Company B 31/12/1965 5 30/09/1965 Company B 3
    31/12/1965 Company C 31/12/1965 Company C 5
    Thanks in advance,

    Toby
    Last edited by Toby Jones; 19 Apr 2018, 09:59.

  • #2
    What you need based on your description and what you show is
    Code:
    use dataset1, clear
    merge m:1 Date using datsaet2
    You seem to imply that you tried that but got errors. If that is the case, then you need to show the exact -merge m:1- command you tried and the exact output that you got from Stata. We can't troubleshoot imaginary code and imaginary output.

    The -merge m:m- command (not function) is a trap for the unwary. It produces useless gibberish in nearly all circumstances. Many of us have requested that StataCorp eliminate it altogether because it serves almost no useful purpose. I have used Stata on a daily basis since 1994 and in that entire time I have only encountered one situation where -merge m:m- would have produced a sensible, useful result. Even in that situation, however, there was a better way to get the same result. It seems unlikely that StataCorp will, in fact, eliminate -merge m:m-. So you would do well to erase its existence from your mind.

    As an aside I am not allowed to match these up outside STATA in excel for instance which I'm aware would be much, much easier.
    "Easier" is in the eye of the beholder. In my opinion it is much, much easier to do this kind of operation in Stata.

    Be that as it may, you should never do any data management or analysis in Excel, unless it is just "for fun." If you plan to use or rely on results, or have others do so, then you need an audit trail of everything that you do from start to finish. Excel is inherently incapable of providing that. Excel is fine as a data container to share data among people who do not share a common statistical package. And it is also OK for human-readable displays of results. But it is simply an inadequate tool for data management and analysis. Not only should you not use it that way yourself, you should refuse to accept work done in Excel from anyone else.

    Comment


    • #3
      Welcome to Statalist.

      Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      Rather than Excel pictures, it would be useful to have samples of your data as it appears in Stata. Please read FAQ #12 and help those whose help you seek by posting example data using the dataex command. If you are running Stata 15.1 or later, it is already installed. For earlier versions of Stata, install dataex by typing ssc install dataex. Type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

      Show some data from each of your datasets, and be sure that some of the dates in the sample data for your first dataset appear in the sample data for your second dataset, so that the correct technique can be demonstrated.

      The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

      Comment


      • #4
        Hi Clyde,

        Thanks a lot for your response. I mean to type that I received zero matches also, as below.

        use "Dataset1", clear

        . merge m:1 date using "Dataset2"

        Result # of obs.
        -----------------------------------------
        not matched 554,625
        from master 554,427 (_merge==1)
        from using 198 (_merge==2)

        matched 0 (_merge==3)
        -----------------------------------------

        Can it be the date var? After converting the "date" var in both files to double I assumed this would mean that STATA views them the same, however I see that they are in different formats (please correct my stata semantics here if necessary).

        Dataset 1 dataex:

        input double date
        9.544608e+11
        9.544608e+11

        Dataset 2 dataex:

        clear
        input double date
        2007
        2099
        2191
        2281
        2372

        Current merge result dataex:

        clear
        input double(date Var1)
        9.544608e+11 .
        9.544608e+11 .
        9.544608e+11 .
        9.544608e+11 .
        9.544608e+11 .

        Many thanks.

        Comment


        • #5
          Clearly, your date variable is not the same in the two data sets, and that is what you need to fix.

          The numbers in the first data set make some sense to me. For example 9.544608e+11 is the Stata internal format clock value for 31mar1990 at midnight.

          The origin of the numbers in dataset2 is less clear. At first they look like years, but I doubt you are working with data from hundreds of years into the future. I suspect that these numbers are Stata internal format dates, because, for example, 2372 is the value for 30jun1966.

          Here's a hunch:

          Code:
          use dataset1
          rename date date_time
          format date_time %tc
          gen date = dofc(date_time)
          format date %td
          merge m:1 date using dataset2

          Comment


          • #6
            That looks absolutely perfect ran first time. Thanks a lot Clyde, and for the heads up on the m:m command, erase it I shall. And thank you William I've installed dataex.

            Toby

            Comment

            Working...
            X