Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • variable date does not uniquely identify observations in the master data stata

    Hai
    I was trying to merge two data sets using "date variable" as common. One of the data sets is a panel data with multiple observations for a day and the another have weather observations for each day and some days there are no observations. When I tried to merge these two datasets, I got the error message "variable date does not uniquely identify observations in the master data stata".

    My master dataset looks like:
    date landwk dealr quantity price spp
    1-Jan-07 2007w1 10 40 1 5
    1-Jan-07 2007w1 10 583.1 0.840336 4
    1-Jan-07 2007w1 10 31.57143 1.12 1
    1-Jan-07 2007w1 10 1.565217 0.69 1
    2-Jan-07 2007w1 13 20.52174 0.7475 1
    2-Jan-07 2007w1 13 35.42857 1.12 1
    2-Jan-07 2007w1 10 86.57143 1.12 1
    2-Jan-07 2007w1 13 189.7143 1.12 1
    2-Jan-07 2007w1 12 0 0
    and my second dataset looks like:
    date wtempQuon atempQuon wspeedQuon wgustQuon
    1/1/2007 6.411864 7.785593 6.168644 8.516949
    1/10/2007 6.682353 NA 3.169608 5.35098
    1/11/2007 6.438938 NA 8.125664 11.41416
    1/12/2007 6.578302 8.581132 5.318868 7.106604
    1/13/2007 NA NA NA NA
    1/14/2007 6.366667 NA 99 NA
    1/15/2007 NA NA NA NA
    1/16/2007 NA 4.532353 6.720588 11.21765
    1/17/2007 NA NA 6.668182 10.91818
    1/18/2007 4.647748 -7.18468 5.54955 8.818919
    1/19/2007 4.663077 -3.34615 2.7 3.850769
    Can you please help me to merge these two datasets?
    Thanks in advance

  • #2
    What was the command you used? I think it is a many to 1 merge. Something like

    use masterdata
    merge m:1 date using weatherdata
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

    Comment


    • #3
      Thanks Richard, it worked. I tried 1:1 merging and 1:m merging, but somehow I did not try m:1 merging.

      Comment


      • #4
        Hi, I have a similar problem. I'm trying to merge two datasets with daily observations from 01/01/1990 to 31/12/2015. I checked the length of the datasets and it's exactly the same. Of course the merging variable (date) is also equal for both. I get the error code r(459) "variable date does not uniquely identify observations in the using data". I already have 1447 variables and I want to merge other 625 variables, all downloaded from Datastream, so I have not manipulated them. Before you ask I'm using Stata MP, so if I'm not wrong I should be allowed to work with more than 32000 variables.

        I'm struggling because I have not idea what could be the problem.

        Thanks for any help

        Stefano

        Comment


        • #5
          If either dataset is panel data, then you need to merge on an identifier too. Otherwise, use duplicates to find out about duplicates and/or see e.g.

          http://www.stata.com/support/faqs/da...ues/index.html

          Comment


          • #6
            Thanks Nick,
            I tried a "dangerous" solution. I created a new variable in both datasets that allowed me to merge. However, 2 items are not matched, even though in the Excel file the length of the columns is equal. Then I deleted the last two rows, since "apparently" the dataset is the same.

            Thanks

            Stefano

            Comment


            • #7
              Stefano:
              the 2 non-matched observations may alternatively belong to:file in memory (master; _merge==1) ;file on disk (using file; _merge==2); eventually, 1 for each file.
              It would be interesting to check whether what you got from your "dangerous" solution mirrors the goal of your -merge- procedure (I've some bad surprises in the past).
              Kind regards,
              Carlo
              (Stata 18.0 SE)

              Comment


              • #8

                Comment


                • #9
                  Stefano, there's a reason solutions like that are called "dangerous." If the two data sets are supposed to match up exactly and Stata says they don't (I've never known Stata to be wrong when it says this) then either something is wrong with the data or something is wrong with your expectations about the data. In either case, sweeping the problem under the rug is likely to lead to trouble later in your work with the data. If you are lucky, at some point you will get some obviously ridiculous result which will force you to go back and solve the data problem and then rerun everything. You will have wasted a lot of time doing rework, but you will be fine in the end. On the other hand, if you are unlucky, you will go along blithely unaware of the danger lurking. And then it will strike at the worst possible time. A client will rely on your analysis and will discover through applyling it that it's dead wrong. Or during a presentation, somebody in the audience will observe something strange about your results that you didn't notice and will call you out on it.

                  You should never do this kind of thing. It is likely, as Carlo points out, that the two unmatched items are in fact intended to match each other but one of them has a wrong date, perhaps due to a typo. With only two mismatched items like that, it is likely that the fix is fairly simple. Or it may be that if you look into the specifications of these two files from their source, you will find out that they are not intended to match exactly. Perhaps one covers a period from 1jan2005 through 31dec2005 and the other covers 31dec2004 through 30dec2005. Such data would properly have two unmatched observations when merged, and deleting them as you have done, would be appropriate. But unless you are sure that the two unmatched observations are spurious, deleting them is a really bad idea. You should either find out for sure that they don't belong, or you should find out why one (or both) of them is a data error and get the error corrected.

                  Comment


                  • #10
                    Hi I encountered a problem while reading the NSSO unit level for India with the help of dictionary file. I specified a variable in my dictionary using "D:\NSSO_70th\Data\Visit-1\AH0633V1.txt" {
                    _column(1) ccode %3s " "
                    _column(4) fsu %5s ""
                    _column(9) rnd %2s "". I ran a command "infile using Lev_06.dct, clear"", it read a variables say fsu rnd (and all other variables as well) fsu, as "float %9.0g" , whicseems quite strange to me. I shall be grateful if someone could address my issues

                    Comment


                    • #11
                      Devender Pratap Your question is not related to the topic of this thread. It is important that threads stay on topic so that people who are searching for help similar to what you are asking will be able to find whatever solutions come up. Please re-post your question in a new thread.

                      I would also recommend you make your question clearer. First there is formatting: your code should be posted between code delimiters so it is easily readable. (See FAQ #12 for details on how to use code delimiters). Next, this is an international, interdisciplinary forum. You should not assume that people will automatically understand what NSSO is. Spell it out, and if it refers to data that is available online for others to see, do provide a link. It would probably also be a good idea to show something of what the original data in AH0633V1.txt look like, so that somebody can see whether your dictionary file is an appropriate match to the data. With the limited information you have given, it is anybody's guess what the problem might be.

                      Comment


                      • #12
                        I'm having a similar issue, but the error message about variables not uniquely identifying observations in the master data persists when I run the merge command, even after specifying that it is a m:1 merge. Isn't that the point of specifying the m:1 format? Stata 14, so it will not go forward with the merge (Stata <11 would give the error message but go forward with the merge under the old syntax, I think).

                        Comment


                        • #13
                          Please use code delimiters (see FAQ #12 if you are not familiar with these) and post the exact code you ran and the exact output you got from Stata.

                          Comment


                          • #14
                            I loaded the master set, sorted by the identifying variables (a country code and the year) in the master and the using set, then tried to do the m:1 merge.

                            Code:
                            merge m:1 ccode1 year using "SWIID.dta"
                            where the master set has multiple instances of ccode1-year combinations, the using set has (or should have) only one instance of each ccode1-year combination, and I am trying to insert the using set's data into all instances in the master set.

                            Stata then stops the merge and gives me
                            Code:
                            variables ccode1 year do not uniquely identify observations in the using data
                            r(459);
                            This seems odd, as the duplicates command in the using data set yield no reported duplicates.

                            Comment


                            • #15
                              what, again, exact, duplicates command did you use? in particular, did you limit it to searching for duplicates on only those 2 variables? please show your exact code (copy-and-paste) if this is not clear

                              Comment

                              Working...
                              X