Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Really need help,error in 1:m and m:1 when merge two datasets

    I need to merge file rst and file ID_mc together, key variable is mother's ID----IDind_m + wave,but 1:m and m:1 don't work,what should I do? I have been buffled by this problem several days, can't solve this, really wish help.
    Code:
    use rst1,clear
    (Written by R.              )
    
    . merge m:1 IDind_m wave using ID_mc1
    variables IDind_m wave do not uniquely identify observations in the using data
    r(459);
    merge 1:m IDind_m wave using ID_mc1
    variables IDind_m wave do not uniquely identify observations in the master data
    r(459);
    The first file is:
    dataex IDind IDind_m wave in 1/3

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(IDind IDind_m wave)
    111101011003 111101011002 2011
    111101012003 111101012002 2011
    111101013003 111101013002 2011
    end
    ------------------ copy up to and including the previous line ------------------
    The second file is:
    . dataex in 1/5

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(IDind_m IDind_c wave hhid_m hhid_c commid t1 t2)
    111101011002 111101011003 2011 111101011 111101011 111101 11 1
    111101012002 111101012003 2011 111101012 111101012 111101 11 1
    111101013002 111101013003 2011 111101013 111101013 111101 11 1
    111101014001 111101014003 2011 111101014 111101014 111101 11 1
    111101015002 111101015003 2011 111101015 111101015 111101 11 1
    end
    ------------------ copy up to and including the previous line ------------------



  • #2
    Spacey:
    like in your previous post (http://www.statalist.org/forums/foru...f-merging-data), the main reason of your problem is a suboptimal approach in creating -id- for your different variables.
    That said, you should consider using a different variable (or more than one variable) for -merge-.
    The mandatory advice, however, is to stay away from -merge m:m-.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Spacey:
      like in your previous post (http://www.statalist.org/forums/foru...f-merging-data), the main reason of your problem is a suboptimal approach in creating -id- for your different variables.
      That said, you should consider using a different variable (or more than one variable) for -merge-.
      The mandatory advice, however, is to stay away from -merge m:m-.
      Thanks so much.
      In ID_mc:
      Code:
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
           IDind_m |     59,848    4.00e+11    9.27e+10   1.11e+11   5.52e+11
           IDind_c |     59,848    4.00e+11    9.27e+10   1.11e+11   5.52e+11
              wave |     59,848    2002.596    6.992698       1991       2011
            hhid_m |     59,848    4.00e+08    9.27e+07   1.11e+08   5.52e+08
            hhid_c |     59,848    4.00e+08    9.27e+07   1.11e+08   5.52e+08
      -------------+---------------------------------------------------------
            commid |     59,848    399585.8    92687.17     111101     552304
                t1 |     59,848    39.75966    9.267507         11         55
                t2 |     59,848     1.75802    .4282856          1          2
      In rst:
      Code:
       sum
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
             IDind |    156,645    3.92e+11    9.46e+10   1.11e+11   5.52e+11
                a5 |    151,205    2.144466    1.934509          0         10
              wave |    156,645    2000.816    7.683592       1989       2011
              a5a1 |     29,992    1.019772    .1819866          0          3
              a5c1 |     32,531    1.018137    .1843308          0          9
      -------------+---------------------------------------------------------
             adult |     76,720    .5573514    .4967032          0          1
             child |     76,721    .1137759    .3175409          0          1
              hhid |    156,645    3.92e+08    9.46e+07   1.11e+08   5.52e+08
            commid |    156,645    392238.3    94636.55     111101     552304
                t1 |    156,645    39.03139    9.461549         11         55
      -------------+---------------------------------------------------------
                t2 |    156,645    1.700533     .458026          1          2
           IDind_f |     35,555    3.97e+11    9.18e+10   1.11e+11   5.52e+11
           IDind_m |     38,601    3.98e+11    9.18e+10   1.11e+11   5.52e+11
           IDind_s |     66,118    3.79e+11    9.89e+10   1.11e+11   5.52e+11
      SO, I have to use IDind_m + wave as key, I'm really confused,if this doesn't work, I can't go on the rest steps.
      I won't use m:m, it's absolutely wrong. Is there any other methods to merge these two datasets?
      Sigh.

      Comment


      • #4
        Spacey:
        try use -IDind_m + wave- as match variiables.
        You should also consider with -_merge- option you need for your analysis (see -help merge-, -merge result table-).
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Spacey:
          try use -IDind_m + wave- as match variiables.
          You should also consider with -_merge- option you need for your analysis (see -help merge-, -merge result table-).
          Yes, I use IDind_m + wave from the very beginning, try both 1:m and m:1......I merge other files have no problem....what should I do.....

          Comment


          • #6
            Spacey:
            as a last resort, you may want considering splitting your datasets by -wave-, -merge- them year per year and then -append-.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Spacey: I can't help here beyond Carlo's excellent advice.

              I want to underline something different. Wording such as "really need help" in title or post is not a good idea. Do you suppose that the other people don't need help or are less deserving? I don't imagine you do, but there is no ordering or priority here because people mark their cases as desperate or urgent.

              If anything, more people will ignore your question as from someone not understanding how forums work.

              We need concise and clear questions without special pleading.

              Sorry if this appears brutal or insensitive, but it's vital to successful continued participation on Statalist that you understand this.

              Comment


              • #8
                Let me add advice you may have been given already in other threads.

                Stata is telling you exactly what the problem is - in both of your datasets you have at least one case where two or more observations have the same pair of values for IDind_m and wave. We don't see these in the small sample of your data that you show, but Stata tells you they exist.

                Successful merging requires that for one of your two files, there is only 1 observation for any given pair of values for IDind_m and wave, while for the other file there can be multiple observations with any given combination.

                That you tried both merge 1:m and merge m:1, and furthermore that they both failed, suggests you do not understand what your files contain.

                Your next step is to better understand your data. Use of the duplicates command will help you understand how big the problem is in each dataset, and will allow you to view examples.

                Comment


                • #9
                  Originally posted by William Lisowski View Post
                  Let me add advice you may have been given already in other threads.

                  Stata is telling you exactly what the problem is - in both of your datasets you have at least one case where two or more observations have the same pair of values for IDind_m and wave. We don't see these in the small sample of your data that you show, but Stata tells you they exist.

                  Successful merging requires that for one of your two files, there is only 1 observation for any given pair of values for IDind_m and wave, while for the other file there can be multiple observations with any given combination.

                  That you tried both merge 1:m and merge m:1, and furthermore that they both failed, suggests you do not understand what your files contain.

                  Your next step is to better understand your data. Use of the duplicates command will help you understand how big the problem is in each dataset, and will allow you to view examples.
                  Thanks for your thread, I check duplicates in the ID_mc file:
                  Code:
                   bys IDind_m wave:  gen dup2 = cond(_N==1,0,_n)
                  
                  . tabulate dup2
                  
                         dup2 |      Freq.     Percent        Cum.
                  ------------+-----------------------------------
                            0 |     11,777       19.68       19.68
                            1 |     17,722       29.61       49.29
                            2 |     17,722       29.61       78.90
                            3 |      7,787       13.01       91.91
                            4 |      3,022        5.05       96.96
                            5 |      1,149        1.92       98.88
                            6 |        457        0.76       99.65
                            7 |        130        0.22       99.86
                            8 |         51        0.09       99.95
                            9 |         31        0.05      100.00
                  ------------+-----------------------------------
                        Total |     59,848      100.00
                  There are duplicates of mother's IDind, I think the reason is one mother maybe have many children,and the second reason is that she may take part in the survey for several years.
                  SO, how to solve this and to merge it with the master file? Many thanks.
                  Last edited by Spacey Shi; 20 Mar 2017, 20:10.

                  Comment


                  • #10
                    Originally posted by Carlo Lazzaro View Post
                    Spacey:
                    as a last resort, you may want considering splitting your datasets by -wave-, -merge- them year per year and then -append-.
                    Thanks.
                    I check duplicates of ID_mc file:
                    Code:
                    bys IDind_m wave: gen dup2 = cond(_N==1,0,_n)
                    
                    . tabulate dup2
                    
                    dup2 | Freq. Percent Cum.
                    ------------+-----------------------------------
                    0 | 11,777 19.68 19.68
                    1 | 17,722 29.61 49.29
                    2 | 17,722 29.61 78.90
                    3 | 7,787 13.01 91.91
                    4 | 3,022 5.05 96.96
                    5 | 1,149 1.92 98.88
                    6 | 457 0.76 99.65
                    7 | 130 0.22 99.86
                    8 | 51 0.09 99.95
                    9 | 31 0.05 100.00
                    ------------+-----------------------------------
                    Total | 59,848 100.00
                    And I don't understand how to split datasets by -wave-, -merge- them year per year and then -append-, can I have a small example of codes?
                    Many thanks.
                    Last edited by Spacey Shi; 20 Mar 2017, 20:11.

                    Comment


                    • #11
                      Originally posted by Spacey Shi View Post
                      . . . one mother maybe have many children,and the second reason is that she may take part in the survey for several years. SO, how to solve this and to merge it with the master file?
                      Try something like this:
                      Code:
                      use ID_mc1, clear
                      
                      rename IDind_c IDind // The name of the variable in rst1
                      
                      isid IDind_m IDind wave // confirmation of your assumption
                      
                      tempfile ID_mc1
                      quietly save `ID_mc1' // Prevent accidental changes in original
                      
                      // Now
                      use rst1
                      
                      isid IDind_m IDind wave
                      
                      merge 1:1 IDind_m IDind wave using `ID_mc1'

                      Comment


                      • #12
                        Originally posted by Joseph Coveney View Post

                        Try something like this:
                        Code:
                        use ID_mc1, clear
                        
                        rename IDind_c IDind // The name of the variable in rst1
                        
                        isid IDind_m IDind wave // confirmation of your assumption
                        
                        tempfile ID_mc1
                        quietly save `ID_mc1' // Prevent accidental changes in original
                        
                        // Now
                        use rst1
                        
                        isid IDind_m IDind wave
                        
                        merge 1:1 IDind_m IDind wave using `ID_mc1'
                        Thanks!
                        It works, but I don't understand what's the IDind contained now, and I need the IDind_c display in the merged file,but it isn't there.
                        Code:
                        . rename IDind_c IDind
                        
                        . isid IDind_m IDind wave
                        
                        . tempfile ID_mc1
                        
                        . quietly save `ID_mc1'
                        
                        . use rst1
                        (Written by R.              )
                        
                        . isid IDind_m IDind wave
                        variables IDind_m IDind wave should never be missing
                        r(459);
                        
                         merge 1:1 IDind_m IDind wave using `ID_mc1'
                        
                            Result                           # of obs.
                            -----------------------------------------
                            not matched                       161,925
                                from master                   129,361  (_merge==1)
                                from using                     32,564  (_merge==2)
                        
                            matched                            27,284  (_merge==3)
                            -----------------------------------------
                        
                        . sum
                        
                            Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                               IDind |    189,209    3.94e+11    9.42e+10   1.11e+11   5.52e+11
                                  a5 |    151,205    2.144466    1.934509          0         10
                                wave |    189,209    2001.533    7.558792       1989       2011
                                a5a1 |     29,992    1.019772    .1819866          0          3
                                a5c1 |     32,531    1.018137    .1843308          0          9
                        -------------+---------------------------------------------------------
                               adult |     76,720    .5573514    .4967032          0          1
                               child |     76,721    .1137759    .3175409          0          1
                                hhid |    156,645    3.92e+08    9.46e+07   1.11e+08   5.52e+08
                              commid |    189,209    394099.5    94227.56     111101     552304
                                  t1 |    189,209    39.21578    9.420671         11         55
                        -------------+---------------------------------------------------------
                                  t2 |    189,209     1.71624    .4508231          1          2
                             IDind_f |     35,555    3.97e+11    9.18e+10   1.11e+11   5.52e+11
                             IDind_m |     71,165    4.00e+11    9.18e+10   1.11e+11   5.52e+11
                             IDind_s |     66,118    3.79e+11    9.89e+10   1.11e+11   5.52e+11
                              hhid_m |     59,848    4.00e+08    9.27e+07   1.11e+08   5.52e+08
                        -------------+---------------------------------------------------------
                              hhid_c |     59,848    4.00e+08    9.27e+07   1.11e+08   5.52e+08
                              _merge |    189,209    1.460507    .7326962          1          3
                        
                        . tsset IDind wave
                        repeated time values within panel
                        r(451);
                        And I find the problem in the ID_mc file, like this picture:
                        Click image for larger version

Name:	bug.png
Views:	1
Size:	42.6 KB
ID:	1379369



                        The same mother has two children, so mother's IDind_m appears twice, induce the problem, and I can't figure out how to solve this
                        .
                        Last edited by Spacey Shi; 20 Mar 2017, 20:54.

                        Comment


                        • #13
                          Originally posted by Spacey Shi View Post
                          It works, but I don't understand what's the IDind contained now, and I need the IDind_c display in the merged file,but it isn't there.
                          Code:
                          rename IDind Dind_c
                          (If you wanted the children's ID variable to be named IDind_c, then you could have forgone the renaming in the using dataset and just renamed the variable in the master.)

                          Originally posted by Spacey Shi View Post
                          The same mother has two children, so mother's IDind_m appears twice, induce the problem, and I can't figure out how to solve this
                          What you need to do depends upon what you want to accomplish. You'll probably want to think about that first or ask whoever it was who asked you to do this analysis for guidance—the advice might be as simple as
                          Code:
                          bysort IDind_m wave (IDind_c): keep if _n == 1
                          to get a single observation for the mother's ID per wave, I don't know.

                          In the meantime, you've got serious problems with your data that you need to deal with: look at the error message from the isid command.

                          Comment

                          Working...
                          X