Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge data1 to data2 between certain dates

    Dear Stata Users,

    I need to merge data1 to data2 such that data1 is merged to data2 between two date ("start" and "end"). For this I open data2 and use the following command:

    Code:
    rangejoin data1_date start end using "D:\Research\date1.dta", by(lenderid)
    My question is whether what I am doing is right and how to keep only merged observations? Please, help me with this issue.

    Best regards,
    Alberto

  • #2
    I need to merge data1 to data2 such that data1 is merged to data2 between two date ("start" and "end").
    This description is too vague to determine what you want.

    What your -rangejoin- command does is this: it pairs up each observation in data2 with any observations in date1 that have the same value of variable lenderid and where the variable data1_date in dataset data2 falls between the values of variables start and end in dataset data1. Is that what you wanted? If not, give a clear and complete description of what you did want.

    As for eliminating observations that did not match, just -drop- them after the -rangejoin-. You can identify them because they will have all missing values on all variables that appear in data1.dta but not in data2.dta. (For variables other than lenderid that occur in both data1 and data2, -rangejoin- puts a suffix _U on the variable that comes from data1.dta, and these will also have missing values for unmatched observations.)

    Comment


    • #3
      Thank you for the reply! I proceeded exactly as you mentioned. However, what I find in my data is that after I run "rangejoin" command specified above it merges almost all "lenderid", but that should not be the case as there different numbers of "lenderid" in data1 and data2. As as example I attach both datasets here (I did not use -dataex- as it limits the number of obs and I wanted to present the whole sample). Can you, please help me to merge "data1" obs to "data2" so that "data1" lies between "start" and "end" from "data2"?

      Thank you!
      Attached Files

      Comment


      • #4
        We do not need to see all of your data as csv, and csv data is not helpful.

        Show, using dataex, small subsets of data2 and data1 which when joined using your rangejoin command do not give the results you anticipate. Then we can look at those results and try to understand and answer your question.

        Comment


        • #5
          Quoted from a subsequent posting of this same topic, please continue the discussion here.

          The thing is that when I use small subsamples the results do hold. However, once I use the whole the sample I encounter the issue stated above.
          Then when you use the whole sample, find examples of results that you do not anticipate, and find the input that would have been joined in data2 and data1 and post those along with the results. Note that you can use an if clause with dataex to select particular observations from your datasets.
          Last edited by William Lisowski; 12 Mar 2019, 08:48.

          Comment

          Working...
          X