Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets

    Hi stata experts,

    I am trying to merge two datasets. The first dataset (A) has 39,082 observations and the second (B) one has 56,521. I merge A with B. This is the result:

    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	4.8 KB
ID:	1551934

    (total number of observations 70,497)
    The dataset B contains two dummy variables and I match the two datasets based on CIK-code. After merging I want to remove the observations from dataset B that did not match in my dataset A. So essentially, I would want to have 39,082 observations in the end again.
    Does someone know what I can do to achieve this?

    Kind regards,
    Sarah

  • #2
    Your math does not add up. Do you really have 39,082 observations in the master dataset? The results suggest that you have 29,886+10,186=40,072 observations. To drop observations not matched in the using dataset

    Code:
    drop if _merge==2
    or using the option

    Code:
    merge ..., keep(match master)
    in the merge command.

    Comment


    • #3
      actually, my interpretation is a little different than Andrew Musau's: I'm guessing that you used either "merge m:1" or "merge 1:m" and that is why the numbers don't add up; you need to show us the exact code you used

      Comment


      • #4
        Hey #2 the math indeed didn't seem to add up. That's why I was so confused. I found this function through merging manually that I had to click type of merge: one to one by observation.
        This is the code it generated:
        merge 1:1 _n using "location", assert(master using match)
        keep if _merge==3

        With this command I did keep the 39,082 observations. I am just not sure how to interpret the _n that stata put in the command. Does someone know what it means?

        Comment


        • #5
          _n is the observation number.

          merge 1:1 _n using "location", assert(master using match)
          keep if _merge==3
          What the command does is to match the first observation in one dataset to the first observation in the other, the second to the second, third to third and so on. Is this what you intend? Usually, one merges based on identifiers, which could be variables common to both datasets.

          Comment


          • #6
            Oh! This is not what I meant at all! I want to merge with my identifier.

            Comment


            • #7
              I have one identifier which is a string variable that represents the fiscal year, month of the fiscal year and CIK code. It looks something like this: 2017Jun0001694741. So, in this case what type of merge command should I use in order to keep my 39,082 observations from the master dataset?

              Comment


              • #8
                Provided that the identifier uniquely identifies observations in each of the datasets, you can run the following command

                Code:
                merge 1:1 id using "location"
                where you replace "id" above with the name of the identifier (or identifiers - could be more than 1). To find out whether the identifier is unique, in each of the datasets before merging, run

                Code:
                isid id
                If you get an error message, then you need to merge 1:m or m:1. See examples in

                Code:
                help merge

                Comment


                • #9
                  When using the code isid id Stata says that the identifier does not uniquely identify the observations. Is this because it is a string variable?
                  When trying to merge 1:m and m:1 and m:m I get error messages.

                  Comment


                  • #10
                    Run

                    Code:
                    dataex in 1/50
                    in each of the datasets and then copy and paste the outputs here.

                    Comment


                    • #11
                      I have merged it the same way as in post #1. But after the merge I use these commands:

                      tostring _merge, gen(me)
                      destring me, replace
                      gen keepit=0
                      replace keepit=1 if me==1
                      replace keepit=2 if me ==3
                      drop if keepit<1

                      I end up with 29,886+10,186=40,072 observations. Still more than the 39,082. But I think this way I am still keeping the original 39,082 from dataset A right?

                      Comment


                      • #12
                        Dataset A:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str17 identifier0 double fyear str10 cik
                        "2014May0000001750" 2014 "0000001750"
                        "2015May0000001750" 2015 "0000001750"
                        "2016May0000001750" 2016 "0000001750"
                        "2017May0000001750" 2017 "0000001750"
                        "2018May0000001750" 2018 "0000001750"
                        "2014Dec0000006201" 2014 "0000006201"
                        "2015Dec0000006201" 2015 "0000006201"
                        "2016Dec0000006201" 2016 "0000006201"
                        "2017Dec0000006201" 2017 "0000006201"
                        "2018Dec0000006201" 2018 "0000006201"
                        "2019Dec0000006201" 2019 "0000006201"
                        "2014Dec0000003197" 2014 "0000003197"
                        "2015Dec0000003197" 2015 "0000003197"
                        "2016Dec0000003197" 2016 "0000003197"
                        "2017Dec0000003197" 2017 "0000003197"
                        "2018Dec0000003197" 2018 "0000003197"
                        "2019Dec0000003197" 2019 "0000003197"
                        "2014Nov0001230869" 2014 "0001230869"
                        "2014Nov0001230869" 2014 "0001230869"
                        "2015Nov0001230869" 2015 "0001230869"
                        "2015Nov0001230869" 2015 "0001230869"
                        "2016Nov0001230869" 2016 "0001230869"
                        "2016Nov0001230869" 2016 "0001230869"
                        "2017Nov0001230869" 2017 "0001230869"
                        "2017Nov0001230869" 2017 "0001230869"
                        "2018Nov0001230869" 2018 "0001230869"
                        "2018Nov0001230869" 2018 "0001230869"
                        "2019Nov0001230869" 2019 "0001230869"
                        "2019Nov0001230869" 2019 "0001230869"
                        "2014Dec0000764622" 2014 "0000764622"
                        "2015Dec0000764622" 2015 "0000764622"
                        "2016Dec0000764622" 2016 "0000764622"
                        "2017Dec0000764622" 2017 "0000764622"
                        "2018Dec0000764622" 2018 "0000764622"
                        "2019Dec0000764622" 2019 "0000764622"
                        "2014Dec0000706688" 2014 "0000706688"
                        "2015Dec0000706688" 2015 "0000706688"
                        "2016Dec0000706688" 2016 "0000706688"
                        "2017Dec0000706688" 2017 "0000706688"
                        "2018Dec0000706688" 2018 "0000706688"
                        "2019Dec0000706688" 2019 "0000706688"
                        "2014Dec0000001800" 2014 "0000001800"
                        "2015Dec0000001800" 2015 "0000001800"
                        "2016Dec0000001800" 2016 "0000001800"
                        "2017Dec0000001800" 2017 "0000001800"
                        "2018Dec0000001800" 2018 "0000001800"
                        "2019Dec0000001800" 2019 "0000001800"
                        "2014Dec0000001961" 2014 "0000001961"
                        "2015Dec0000001961" 2015 "0000001961"
                        "2016Dec0000001961" 2016 "0000001961"
                        "2017Dec0000001961" 2017 "0000001961"
                        "2018Dec0000001961" 2018 "0000001961"
                        "2019Dec0000001961" 2019 "0000001961"
                        "2014Jun0000002034" 2014 "0000002034"
                        "2015Jun0000002034" 2015 "0000002034"
                        "2016Jun0000002034" 2016 "0000002034"
                        "2017Jun0000002034" 2017 "0000002034"
                        "2018Jun0000002034" 2018 "0000002034"
                        "2014Dec0000002062" 2014 "0000002062"
                        "2015Dec0000002062" 2015 "0000002062"
                        "2016Dec0000002062" 2016 "0000002062"
                        "2017Dec0000002062" 2017 "0000002062"
                        "2018Dec0000002062" 2018 "0000002062"
                        "2014Dec0000002098" 2014 "0000002098"
                        "2015Dec0000002098" 2015 "0000002098"
                        "2016Dec0000002098" 2016 "0000002098"
                        "2017Dec0000002098" 2017 "0000002098"
                        "2018Dec0000002098" 2018 "0000002098"
                        "2019Dec0000002098" 2019 "0000002098"
                        "2014Dec0000002186" 2014 "0000002186"
                        "2015Dec0000002186" 2015 "0000002186"
                        "2016Dec0000002186" 2016 "0000002186"
                        "2017Dec0000002186" 2017 "0000002186"
                        "2018Dec0000002186" 2018 "0000002186"
                        "2019Dec0000002186" 2019 "0000002186"
                        "2014Dec0000002178" 2014 "0000002178"
                        "2015Dec0000002178" 2015 "0000002178"
                        "2016Dec0000002178" 2016 "0000002178"
                        "2017Dec0000002178" 2017 "0000002178"
                        "2018Dec0000002178" 2018 "0000002178"
                        "2019Dec0000002178" 2019 "0000002178"
                        "2014Dec0000002488" 2014 "0000002488"
                        "2015Dec0000002488" 2015 "0000002488"
                        "2016Dec0000002488" 2016 "0000002488"
                        "2017Dec0000002488" 2017 "0000002488"
                        "2018Dec0000002488" 2018 "0000002488"
                        "2019Dec0000002488" 2019 "0000002488"
                        "2014Dec0000351483" 2014 "0000351483"
                        "2015Dec0000351483" 2015 "0000351483"
                        "2016Dec0000351483" 2016 "0000351483"
                        "2017Dec0000351483" 2017 "0000351483"
                        "2018Dec0000351483" 2018 "0000351483"
                        "2019Dec0000351483" 2019 "0000351483"
                        "2014Dec0000002809" 2014 "0000002809"
                        "2015Dec0000002809" 2015 "0000002809"
                        "2016Dec0000002809" 2016 "0000002809"
                        "2017Dec0000002809" 2017 "0000002809"
                        "2018Dec0000002809" 2018 "0000002809"
                        "2019Dec0000002809" 2019 "0000002809"
                        "2014Sep0000002969" 2014 "0000002969"
                        end

                        Dataset B:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str17 identifier0 float AUCH double restatement str10 cik
                        "2013Apr0000027093" 1 0 "0000027093"
                        "2013Apr0000771856" 1 0 "0000771856"
                        "2013Apr0000870778" 1 0 "0000870778"
                        "2013Apr0000880562" 1 0 "0000880562"
                        "2013Apr0000890898" 1 0 "0000890898"
                        "2013Apr0000925583" 1 0 "0000925583"
                        "2013Apr0001060910" 1 0 "0001060910"
                        "2013Apr0001092796" 1 0 "0001092796"
                        "2013Apr0001137083" 1 . "0001137083"
                        "2013Apr0001161070" 1 0 "0001161070"
                        "2013Apr0001223026" 1 0 "0001223026"
                        "2013Apr0001314414" 1 . "0001314414"
                        "2013Apr0001365784" 1 0 "0001365784"
                        "2013Apr0001388126" 1 0 "0001388126"
                        "2013Apr0001388485" 1 0 "0001388485"
                        "2013Apr0001400728" 1 0 "0001400728"
                        "2013Apr0001412130" 1 . "0001412130"
                        "2013Apr0001414767" 1 0 "0001414767"
                        "2013Apr0001489644" 1 0 "0001489644"
                        "2013Apr0001493040" 1 0 "0001493040"
                        "2013Apr0001568628" 1 0 "0001568628"
                        "2013Apr0001579541" 1 0 "0001579541"
                        "2013Feb0000818850" 1 . "0000818850"
                        "2013Feb0000818851" 1 0 "0000818851"
                        "2013Feb0000878198" 1 0 "0000878198"
                        "2013Feb0000878200" 1 0 "0000878200"
                        "2013Feb0000878201" 1 0 "0000878201"
                        "2013Feb0000892992" 1 0 "0000892992"
                        "2013Feb0000895464" 1 0 "0000895464"
                        "2013Feb0001074952" 1 0 "0001074952"
                        "2013Feb0001084597" 1 0 "0001084597"
                        "2013Feb0001087786" 1 . "0001087786"
                        "2013Feb0001195738" 1 0 "0001195738"
                        "2013Feb0001195739" 1 . "0001195739"
                        "2013Feb0001360564" 1 0 "0001360564"
                        "2013Feb0001372184" 1 0 "0001372184"
                        "2013Feb0001402486" 1 . "0001402486"
                        "2013Feb0001404943" 1 0 "0001404943"
                        "2013Feb0001436568" 1 . "0001436568"
                        "2013Feb0001454978" 1 . "0001454978"
                        "2013Feb0001454979" 1 0 "0001454979"
                        "2013Feb0001454980" 1 . "0001454980"
                        "2013Feb0001454981" 1 . "0001454981"
                        "2013Feb0001493212" 1 0 "0001493212"
                        "2013Feb0001516079" 1 0 "0001516079"
                        "2013Feb0001517389" 1 0 "0001517389"
                        "2013Feb0001529516" 1 0 "0001529516"
                        "2013Feb0001542013" 1 0 "0001542013"
                        "2013Jan0000067215" 1 . "0000067215"
                        "2013Jan0000093456" 1 . "0000093456"
                        "2013Jan0000798081" 1 0 "0000798081"
                        "2013Jan0000798081" 1 0 "0000798081"
                        "2013Jan0000875622" 1 . "0000875622"
                        "2013Jan0000875622" 1 . "0000875622"
                        "2013Jan0001285236" 1 0 "0001285236"
                        "2013Jan0001374135" 1 . "0001374135"
                        "2013Jan0001379246" 1 . "0001379246"
                        "2013Jan0001383097" 1 0 "0001383097"
                        "2013Jan0001400683" 1 0 "0001400683"
                        "2013Jan0001464165" 1 0 "0001464165"
                        "2013Jan0001484674" 1 0 "0001484674"
                        "2013Jan0001503658" 1 0 "0001503658"
                        "2013Jan0001504167" 1 . "0001504167"
                        "2013Jan0001520047" 1 . "0001520047"
                        "2013Jan0001542918" 1 . "0001542918"
                        "2013Jan0001549872" 1 0 "0001549872"
                        "2013Jan0001561865" 1 0 "0001561865"
                        "2013Jun0000013573" 1 0 "0000013573"
                        "2013Jun0000023666" 1 . "0000023666"
                        "2013Jun0000033533" 1 0 "0000033533"
                        "2013Jun0000073605" 1 0 "0000073605"
                        "2013Jun0000276776" 1 0 "0000276776"
                        "2013Jun0000310354" 1 0 "0000310354"
                        "2013Jun0000319156" 1 0 "0000319156"
                        "2013Jun0000705318" 1 . "0000705318"
                        "2013Jun0000750199" 1 . "0000750199"
                        "2013Jun0000750199" 1 . "0000750199"
                        "2013Jun0000826326" 1 0 "0000826326"
                        "2013Jun0000835403" 1 0 "0000835403"
                        "2013Jun0000838131" 1 0 "0000838131"
                        "2013Jun0000852747" 1 1 "0000852747"
                        "2013Jun0000863061" 1 . "0000863061"
                        "2013Jun0000873540" 1 . "0000873540"
                        "2013Jun0000880406" 1 0 "0000880406"
                        "2013Jun0000895516" 1 . "0000895516"
                        "2013Jun0000898770" 1 0 "0000898770"
                        "2013Jun0000918965" 1 0 "0000918965"
                        "2013Jun0000928366" 1 . "0000928366"
                        "2013Jun0000937136" 1 1 "0000937136"
                        "2013Jun0001016951" 1 0 "0001016951"
                        "2013Jun0001016951" 1 0 "0001016951"
                        "2013Jun0001018281" 1 0 "0001018281"
                        "2013Jun0001040721" 1 0 "0001040721"
                        "2013Jun0001041177" 1 0 "0001041177"
                        "2013Jun0001077800" 1 0 "0001077800"
                        "2013Jun0001082733" 1 0 "0001082733"
                        "2013Jun0001096555" 1 0 "0001096555"
                        "2013Jun0001099290" 1 0 "0001099290"
                        "2013Jun0001127993" 1 . "0001127993"
                        "2013Jun0001262976" 1 0 "0001262976"
                        end

                        Comment


                        • #13
                          You have duplicate observations in both datasets. Focusing on B, are there any differences between these observations in terms of other variables not included in the dataex example? In other words, can we drop duplicates without losing any information?

                          Code:
                          . duplicates list identifier0 cik, sepby(identifier0)
                          
                          Duplicates in terms of identifier0 cik
                          
                            +------------------------------------------------+
                            | group:   obs:         identifier0          cik |
                            |------------------------------------------------|
                            |      1     51   2013Jan0000798081   0000798081 |
                            |      1     52   2013Jan0000798081   0000798081 |
                            |------------------------------------------------|
                            |      2     53   2013Jan0000875622   0000875622 |
                            |      2     54   2013Jan0000875622   0000875622 |
                            |------------------------------------------------|
                            |      3     76   2013Jun0000750199   0000750199 |
                            |      3     77   2013Jun0000750199   0000750199 |
                            |------------------------------------------------|
                            |      4     90   2013Jun0001016951   0001016951 |
                            |      4     91   2013Jun0001016951   0001016951 |
                            +------------------------------------------------+

                          Comment


                          • #14
                            I have now removed duplicates from both datasets.

                            Dataset A:
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input double fyear str17 identifier0 str10 cik
                            2014 "2014May0000001750" "0000001750"
                            2015 "2015May0000001750" "0000001750"
                            2016 "2016May0000001750" "0000001750"
                            2017 "2017May0000001750" "0000001750"
                            2018 "2018May0000001750" "0000001750"
                            2014 "2014Dec0000001800" "0000001800"
                            2015 "2015Dec0000001800" "0000001800"
                            2016 "2016Dec0000001800" "0000001800"
                            2017 "2017Dec0000001800" "0000001800"
                            2018 "2018Dec0000001800" "0000001800"
                            2019 "2019Dec0000001800" "0000001800"
                            2014 "2014Dec0000001961" "0000001961"
                            2015 "2015Dec0000001961" "0000001961"
                            2016 "2016Dec0000001961" "0000001961"
                            2017 "2017Dec0000001961" "0000001961"
                            2018 "2018Dec0000001961" "0000001961"
                            2019 "2019Dec0000001961" "0000001961"
                            2014 "2014Jun0000002034" "0000002034"
                            2015 "2015Jun0000002034" "0000002034"
                            2016 "2016Jun0000002034" "0000002034"
                            2017 "2017Jun0000002034" "0000002034"
                            2018 "2018Jun0000002034" "0000002034"
                            2014 "2014Dec0000002062" "0000002062"
                            2015 "2015Dec0000002062" "0000002062"
                            2016 "2016Dec0000002062" "0000002062"
                            2017 "2017Dec0000002062" "0000002062"
                            2018 "2018Dec0000002062" "0000002062"
                            2014 "2014Dec0000002098" "0000002098"
                            2015 "2015Dec0000002098" "0000002098"
                            2016 "2016Dec0000002098" "0000002098"
                            2017 "2017Dec0000002098" "0000002098"
                            2018 "2018Dec0000002098" "0000002098"
                            2019 "2019Dec0000002098" "0000002098"
                            2014 "2014Dec0000002178" "0000002178"
                            2015 "2015Dec0000002178" "0000002178"
                            2016 "2016Dec0000002178" "0000002178"
                            2017 "2017Dec0000002178" "0000002178"
                            2018 "2018Dec0000002178" "0000002178"
                            2019 "2019Dec0000002178" "0000002178"
                            2014 "2014Dec0000002186" "0000002186"
                            2015 "2015Dec0000002186" "0000002186"
                            2016 "2016Dec0000002186" "0000002186"
                            2017 "2017Dec0000002186" "0000002186"
                            2018 "2018Dec0000002186" "0000002186"
                            2019 "2019Dec0000002186" "0000002186"
                            2014 "2014Dec0000002488" "0000002488"
                            2015 "2015Dec0000002488" "0000002488"
                            2016 "2016Dec0000002488" "0000002488"
                            2017 "2017Dec0000002488" "0000002488"
                            2018 "2018Dec0000002488" "0000002488"
                            2019 "2019Dec0000002488" "0000002488"
                            2014 "2014Dec0000002809" "0000002809"
                            2015 "2015Dec0000002809" "0000002809"
                            2016 "2016Dec0000002809" "0000002809"
                            2017 "2017Dec0000002809" "0000002809"
                            2018 "2018Dec0000002809" "0000002809"
                            2019 "2019Dec0000002809" "0000002809"
                            2014 "2014Sep0000002969" "0000002969"
                            2015 "2015Sep0000002969" "0000002969"
                            2016 "2016Sep0000002969" "0000002969"
                            2017 "2017Sep0000002969" "0000002969"
                            2018 "2018Sep0000002969" "0000002969"
                            2019 "2019Sep0000002969" "0000002969"
                            2014 "2014Dec0000003116" "0000003116"
                            2015 "2015Dec0000003116" "0000003116"
                            2016 "2016Dec0000003116" "0000003116"
                            2017 "2017Dec0000003116" "0000003116"
                            2018 "2018Dec0000003116" "0000003116"
                            2019 "2019Dec0000003116" "0000003116"
                            2014 "2014Dec0000003124" "0000003124"
                            2015 "2015Dec0000003124" "0000003124"
                            2016 "2016Dec0000003124" "0000003124"
                            2017 "2017Dec0000003124" "0000003124"
                            2018 "2018Dec0000003124" "0000003124"
                            2019 "2019Dec0000003124" "0000003124"
                            2014 "2014Sep0000003146" "0000003146"
                            2015 "2015Sep0000003146" "0000003146"
                            2016 "2016Sep0000003146" "0000003146"
                            2017 "2017Sep0000003146" "0000003146"
                            2018 "2018Sep0000003146" "0000003146"
                            2019 "2019Sep0000003146" "0000003146"
                            2014 "2014Dec0000003153" "0000003153"
                            2015 "2015Dec0000003153" "0000003153"
                            2016 "2016Dec0000003153" "0000003153"
                            2017 "2017Dec0000003153" "0000003153"
                            2018 "2018Dec0000003153" "0000003153"
                            2019 "2019Dec0000003153" "0000003153"
                            2014 "2014Dec0000003197" "0000003197"
                            2015 "2015Dec0000003197" "0000003197"
                            2016 "2016Dec0000003197" "0000003197"
                            2017 "2017Dec0000003197" "0000003197"
                            2018 "2018Dec0000003197" "0000003197"
                            2019 "2019Dec0000003197" "0000003197"
                            2017 "2017Dec0000003270" "0000003270"
                            2018 "2018Dec0000003270" "0000003270"
                            2014 "2014Dec0000003453" "0000003453"
                            2015 "2015Dec0000003453" "0000003453"
                            2016 "2016Dec0000003453" "0000003453"
                            2017 "2017Dec0000003453" "0000003453"
                            2018 "2018Dec0000003453" "0000003453"
                            end

                            Dataset B:

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input float AUCH double restatement str17 identifier0 str10 cik
                            1 0 "2013Apr0000027093" "0000027093"
                            1 0 "2013Apr0000771856" "0000771856"
                            1 0 "2013Apr0000870778" "0000870778"
                            1 0 "2013Apr0000880562" "0000880562"
                            1 0 "2013Apr0000890898" "0000890898"
                            1 0 "2013Apr0000925583" "0000925583"
                            1 0 "2013Apr0001060910" "0001060910"
                            1 0 "2013Apr0001092796" "0001092796"
                            1 0 "2013Apr0001137083" "0001137083"
                            1 0 "2013Apr0001161070" "0001161070"
                            1 0 "2013Apr0001223026" "0001223026"
                            1 0 "2013Apr0001314414" "0001314414"
                            1 0 "2013Apr0001365784" "0001365784"
                            1 0 "2013Apr0001388126" "0001388126"
                            1 0 "2013Apr0001388485" "0001388485"
                            1 0 "2013Apr0001400728" "0001400728"
                            1 0 "2013Apr0001412130" "0001412130"
                            1 0 "2013Apr0001414767" "0001414767"
                            1 0 "2013Apr0001489644" "0001489644"
                            1 0 "2013Apr0001493040" "0001493040"
                            1 0 "2013Apr0001568628" "0001568628"
                            1 0 "2013Apr0001579541" "0001579541"
                            1 0 "2013Feb0000818850" "0000818850"
                            1 0 "2013Feb0000878198" "0000878198"
                            1 0 "2013Feb0000892992" "0000892992"
                            1 0 "2013Feb0000895464" "0000895464"
                            1 0 "2013Feb0001074952" "0001074952"
                            1 0 "2013Feb0001084597" "0001084597"
                            1 0 "2013Feb0001087786" "0001087786"
                            1 0 "2013Feb0001195738" "0001195738"
                            1 0 "2013Feb0001360564" "0001360564"
                            1 0 "2013Feb0001402486" "0001402486"
                            1 0 "2013Feb0001404943" "0001404943"
                            1 0 "2013Feb0001436568" "0001436568"
                            1 0 "2013Feb0001454978" "0001454978"
                            1 0 "2013Feb0001493212" "0001493212"
                            1 0 "2013Feb0001516079" "0001516079"
                            1 0 "2013Feb0001517389" "0001517389"
                            1 0 "2013Feb0001529516" "0001529516"
                            1 0 "2013Feb0001542013" "0001542013"
                            1 0 "2013Jan0000067215" "0000067215"
                            1 0 "2013Jan0000093456" "0000093456"
                            1 0 "2013Jan0000798081" "0000798081"
                            1 0 "2013Jan0000875622" "0000875622"
                            1 0 "2013Jan0001285236" "0001285236"
                            1 0 "2013Jan0001374135" "0001374135"
                            1 0 "2013Jan0001379246" "0001379246"
                            1 0 "2013Jan0001383097" "0001383097"
                            1 0 "2013Jan0001400683" "0001400683"
                            1 0 "2013Jan0001464165" "0001464165"
                            1 0 "2013Jan0001484674" "0001484674"
                            1 0 "2013Jan0001503658" "0001503658"
                            1 0 "2013Jan0001504167" "0001504167"
                            1 0 "2013Jan0001520047" "0001520047"
                            1 0 "2013Jan0001542918" "0001542918"
                            1 0 "2013Jan0001549872" "0001549872"
                            1 0 "2013Jan0001561865" "0001561865"
                            1 0 "2013Jun0000013573" "0000013573"
                            1 0 "2013Jun0000023666" "0000023666"
                            1 0 "2013Jun0000073605" "0000073605"
                            1 0 "2013Jun0000276776" "0000276776"
                            1 0 "2013Jun0000310354" "0000310354"
                            1 0 "2013Jun0000319156" "0000319156"
                            1 0 "2013Jun0000705318" "0000705318"
                            1 0 "2013Jun0000750199" "0000750199"
                            1 0 "2013Jun0000826326" "0000826326"
                            1 0 "2013Jun0000835403" "0000835403"
                            1 0 "2013Jun0000838131" "0000838131"
                            1 0 "2013Jun0000863061" "0000863061"
                            1 0 "2013Jun0000873540" "0000873540"
                            1 0 "2013Jun0000880406" "0000880406"
                            1 0 "2013Jun0000895516" "0000895516"
                            1 0 "2013Jun0000898770" "0000898770"
                            1 0 "2013Jun0000918965" "0000918965"
                            1 0 "2013Jun0000928366" "0000928366"
                            1 1 "2013Jun0000937136" "0000937136"
                            1 0 "2013Jun0001016951" "0001016951"
                            1 0 "2013Jun0001018281" "0001018281"
                            1 0 "2013Jun0001040721" "0001040721"
                            1 0 "2013Jun0001041177" "0001041177"
                            1 0 "2013Jun0001077800" "0001077800"
                            1 0 "2013Jun0001096555" "0001096555"
                            1 0 "2013Jun0001099290" "0001099290"
                            1 0 "2013Jun0001127993" "0001127993"
                            1 0 "2013Jun0001262976" "0001262976"
                            1 0 "2013Jun0001301075" "0001301075"
                            1 0 "2013Jun0001304730" "0001304730"
                            1 0 "2013Jun0001317880" "0001317880"
                            1 0 "2013Jun0001329136" "0001329136"
                            1 0 "2013Jun0001332572" "0001332572"
                            1 0 "2013Jun0001335493" "0001335493"
                            1 0 "2013Jun0001343719" "0001343719"
                            1 0 "2013Jun0001353406" "0001353406"
                            1 0 "2013Jun0001363573" "0001363573"
                            1 0 "2013Jun0001368275" "0001368275"
                            1 0 "2013Jun0001372620" "0001372620"
                            1 0 "2013Jun0001375063" "0001375063"
                            1 0 "2013Jun0001379006" "0001379006"
                            1 0 "2013Jun0001387135" "0001387135"
                            1 0 "2013Jun0001398137" "0001398137"
                            end

                            Comment


                            • #15
                              I think the merge 1:1 worked now:
                              Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	4.8 KB
ID:	1552146

                              Dataset A had 33,468 observations before the merge and this is exactly 30,790+2,678.
                              So, when I do this command I should have all the 33,468 observations from dataset A right?:
                              tostring _merge, gen(me)
                              destring me, replace
                              gen keepit=0
                              replace keepit=1 if me==1
                              replace keepit=2 if me ==3
                              drop if keepit<1

                              Comment

                              Working...
                              X