Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Variable year does not uniquely identify observations in the using data

    Hi everyone,

    I'm having trouble with merging two datasets. The first one "anes_timeseries_cdf_stata12.dta" has survey data and different questions. The second one "faminc_raw.dta" has fewer variables and all I'm interested in is the gini coefficient variable. So I'm trying to use the m:1 command to merge the second into the first, using the common variable they both have, "year", but I keep getting the message "Variable year does not uniquely identify observations in the using data". This was my command:

    use "anes_timeseries_cdf_stata12.dta"
    merge m:1 year using "faminc_raw.dta"
    The year variable for the first data set looks like this:
    Click image for larger version

Name:	image_10199.png
Views:	1
Size:	121.6 KB
ID:	1434663
    and the year variable for the second dataset looks like this:
    Click image for larger version

Name:	Screen Shot 2018-03-15 at 19.40.19.png
Views:	1
Size:	106.2 KB
ID:	1434665

    What am I doing wrong please?

    Thank you!

  • #2
    It seems that in your using data, you have 52 observations for each year. Why is that?

    Are there 52 weekly observations for each year? If that's the case, from which one do you want to get the gini coefficient for that year?

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    It might be clearer if you posted an example of your data set using the dataex command. If you are running Stata version 15.1, then dataex is part of your installation already. If running an earlier Stata, run ssc install dataex to get the command. Either way, run help dataex to read the simple instructions for using it. It takes just seconds to use dataex to provide a way for those who want to help you to create a complete and faithful replica of your example data in their own Stata. That, in turn, eliminates all sorts of questions that are left unanswered by descriptions or even by data listings and tables. And it enables people to test out code, so that you get the right answer the first time.

    As it stands, the very sparse description of your data means it's not possible for the reader to understand what other variables might explain the structure of your data.

    Comment


    • #3
      Sorry about that. This is for my first dataset:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(year state)
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      1964 .
      end
      label values state VCF0132_
      And this is for my second dataset:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float year str13 state float gini
      1964 "Alabama"     .4077555
      1966 "Alabama"     .3816237
      1968 "Alabama"    .37762225
      1970 "Alabama"     .3520307
      1972 "Alabama"     .3547146
      1974 "Alabama"     .3421975
      1976 "Alabama"     .3349688
      1978 "Alabama"      .348919
      1980 "Alabama"     .3424812
      1982 "Alabama"     .3659413
      1984 "Alabama"     .3592145
      1986 "Alabama"     .3579671
      1988 "Alabama"     .3621861
      1990 "Alabama"     .3682782
      1992 "Alabama"     .3569223
      1994 "Alabama"     .3809595
      1996 "Alabama"     .4401172
      1998 "Alabama"     .3764013
      2000 "Alabama"     .3672322
      2002 "Alabama"      .399734
      2004 "Alabama"     .4038898
      1964 "Alaska"     .25659016
      1966 "Alaska"       .366811
      1968 "Alaska"      .3190614
      1970 "Alaska"      .3319477
      1972 "Alaska"      .3338034
      1974 "Alaska"      .3370805
      1976 "Alaska"      .3604187
      1978 "Alaska"     .34519795
      1980 "Alaska"      .3727292
      1982 "Alaska"      .3660413
      1984 "Alaska"      .3850468
      1986 "Alaska"      .3796411
      1988 "Alaska"      .3660952
      1990 "Alaska"      .3790529
      1992 "Alaska"      .3723627
      1994 "Alaska"      .3652299
      1996 "Alaska"      .3769269
      1998 "Alaska"      .4102067
      2000 "Alaska"      .4068012
      2002 "Alaska"      .3843603
      2004 "Alaska"      .3865215
      1964 "Arizona"     .3613673
      1966 "Arizona"     .3777127
      1968 "Arizona"     .3059706
      1970 "Arizona"     .3202626
      1972 "Arizona"      .345401
      1974 "Arizona"    .35216725
      1976 "Arizona"     .3578327
      1978 "Arizona"     .3581539
      1980 "Arizona"     .3552242
      1982 "Arizona"     .3557354
      1984 "Arizona"     .3775792
      1986 "Arizona"     .3596863
      1988 "Arizona"     .3858389
      1990 "Arizona"    .38416785
      1992 "Arizona"      .370002
      1994 "Arizona"     .3855753
      1996 "Arizona"     .4357419
      1998 "Arizona"     .4655208
      2000 "Arizona"      .409562
      2002 "Arizona"     .4416676
      2004 "Arizona"     .4631516
      1964 "Arkansas"    .3362206
      1966 "Arkansas"    .4414435
      1968 "Arkansas"    .3409018
      1970 "Arkansas"    .3521984
      1972 "Arkansas"    .3712595
      1974 "Arkansas"   .37071675
      1976 "Arkansas"    .3614232
      1978 "Arkansas"    .3794985
      1980 "Arkansas"    .3514598
      1982 "Arkansas"    .3588502
      1984 "Arkansas"   .35628855
      1986 "Arkansas"    .3603032
      1988 "Arkansas"   .38558495
      1990 "Arkansas"    .3568946
      1992 "Arkansas"    .3749918
      1994 "Arkansas"    .3807168
      1996 "Arkansas"   .37542585
      1998 "Arkansas"    .4005397
      2000 "Arkansas"    .3853933
      2002 "Arkansas"   .39266065
      2004 "Arkansas"    .4249859
      1964 "California"  .3284921
      1966 "California"  .3251825
      1968 "California"  .3224367
      1970 "California"  .3226561
      1972 "California" .34635144
      1974 "California"  .3396491
      1976 "California"  .3448448
      1978 "California"  .3487724
      1980 "California"  .3499597
      1982 "California"  .3647356
      1984 "California"  .3650855
      1986 "California"  .3678169
      1988 "California" .37824175
      1990 "California" .38603935
      1992 "California"  .3821376
      1994 "California"   .400698
      end
      Why did the information from my first dataset come out like that?

      Comment


      • #4
        The first problem is that your data is organized by year and state, not just by year, so your merge should be by year and state, not just by year.

        The second problem is that in your second dataset, state is a string variable with the name of the state, while in your first dataset, state is a numeric variable that presumably contains one of the many versions of numeric state codes, and this variable has a value label associated with it to display the state names.

        Your third problem is that you apparently have 52 possible values for state. I'm guessing this is for 52 states, the District of Columbia, and perhaps a "total" for the entire country. But we don't know that for sure.

        You're going to need to convert one or the other of these state variable so that they can be matched by the merge command.

        Comment


        • #5
          Hi,

          Thank you! I was able to merge the to datasets. However, I noticed that with the first dataset a lot of observations came out in the form of "."

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str3 state
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          "."
          end

          Comment


          • #6
            This is a common problem leading to the merge problems - it is probably treating all those observations with missing for state as if they reflect a single state. Missing on date might have the same effects. It is probably best to either fix the missing values to whatever they should be, or delete them before trying to merge or xtset.

            Comment


            • #7
              You don’t tell us how you solved the problem I pointed out in post #4, but it appears to me that you converted, in the first dataset numeric state codes to state names, but somehow the missing state codes (.) were converted to a string "." instead of the string missing value "".

              You might find it helpful to
              Code:
              tab state
              in each of the datasets going into the merge, and make sure they are consistent.
              Last edited by William Lisowski; 16 Mar 2018, 15:14.

              Comment


              • #8
                Originally posted by Phil Bromiley View Post
                This is a common problem leading to the merge problems - it is probably treating all those observations with missing for state as if they reflect a single state. Missing on date might have the same effects. It is probably best to either fix the missing values to whatever they should be, or delete them before trying to merge or xtset.
                Hey, I don't really understand what you mean. Also, I tried regressing one variable from one dataset with another variable from another dataset and it said "no observations", I don't understand why if I've merged it correctly.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte ideology float gini
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                . .
                end
                label values ideology VCF0803_

                Comment


                • #9
                  I just realised that when I do the merge m:1 command, no observations get matched (it says 0). I don't know how to fix this

                  Comment


                  • #10
                    Copy the merge command and its output from the Stata Results window and paste it into your next post.

                    To assure maximum readability of results that you post, please copy them from the Results window or your log file into a code block in the Forum editor, as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

                    [code]
                    . sysuse auto, clear
                    (1978 Automobile Data)

                    . describe make price

                    storage display value
                    variable name type format label variable label
                    -----------------------------------------------------------------
                    make str18 %-18s Make and Model
                    price int %8.0gc Price
                    [/code]

                    will be presented in the post as the following:
                    Code:
                    . sysuse auto, clear
                    (1978 Automobile Data)
                    
                    . describe make price
                    
                                  storage   display    value
                    variable name   type    format     label      variable label
                    -----------------------------------------------------------------
                    make            str18   %-18s                 Make and Model
                    price           int     %8.0gc                Price

                    Comment


                    • #11
                      Hi,

                      Okay I was able to fix the problem, I think. A lot of the observations merged, but not all. Is there anything I can do about this? Or is it supposed to be like that?

                      . merge m:1 year statename using "ginidata3.dta"
                      (note: variable statename was float, now double to accommodate using data's values)
                      (label statename_label already defined)

                      Result # of obs.
                      -----------------------------------------
                      not matched 817
                      from master 793 (_merge==1)
                      from using 24 (_merge==2)

                      matched 28,026 (_merge==3)
                      -----------------------------------------

                      Comment


                      • #12
                        If after running the merge you
                        Code:
                        browse if _merge==2
                        Stata's Data Browser window will open displaying the 24 observations that were in the using data that had no match in the master data. And if you
                        Code:
                        browse if _merge==2
                        Stata's Data Browser window will open displaying the 793 observations that were in the master data that had no match in the using data. If your numeric statename variables have labels, so that they display the statename rather than the state code, be sure to click on the statename and see what the numeric value is, because Stata will be matching the numeric values, not the labels.

                        You can also
                        Code:
                        tab statename, nolabel
                        in each of the two datasets and see if there are numeric values that appear in one dataset but not the other.

                        Comment


                        • #13
                          Originally posted by William Lisowski View Post
                          If after running the merge you
                          Code:
                          browse if _merge==2
                          Stata's Data Browser window will open displaying the 24 observations that were in the using data that had no match in the master data. And if you
                          Code:
                          browse if _merge==2
                          Stata's Data Browser window will open displaying the 793 observations that were in the master data that had no match in the using data. If your numeric statename variables have labels, so that they display the statename rather than the state code, be sure to click on the statename and see what the numeric value is, because Stata will be matching the numeric values, not the labels.

                          You can also
                          Code:
                          tab statename, nolabel
                          in each of the two datasets and see if there are numeric values that appear in one dataset but not the other.
                          I can see the observations that have no match, but what do I do about that please?

                          Comment


                          • #14
                            And what were the results of
                            Code:
                            tab statename, nolabel
                            in each of the two datasets?

                            Comment


                            • #15
                              I was able to fix some of the non matching observations, but not all.

                              merge m:1 statename year using "faminc_smooth2.dta"
                              (note: variable statename was float, now double to accommodate using data's values)
                              (label statename_label already defined)

                              Result # of obs.
                              -----------------------------------------
                              not matched 26
                              from master 0 (_merge==1)
                              from using 26 (_merge==2)

                              matched 28,777 (_merge==3)
                              -----------------------------------------

                              When I used "tab statename, nolabel" for both datasets, it shows that they are now both labeled correctly. I'm not sure what else to do.

                              Comment

                              Working...
                              X