Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape Command showing "variable id does not uniquely identify the observations" error

    I am using Stata 14.1 on Windows 10, and am trying to convert a data in wide format into long format.

    After importing the data from Excel file to Stata, I have successfully converted the data from strings to real numbers so that I can run operations on them. Then, I have been able to confirm that the data is indeed in wide format by using the command "reshape wide" which displays 'already wide'.

    However, when I try to use the command "reshape long V, i(DES) j(time)" where V is the stem for years variable (named V8, V9,...V53), DES is variable with approximately 9 categories of income sources, and time is just a name for years (no such variable defined), I get the following error-

    (note: j = 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53)
    variable id does not uniquely identify the observations
    Your data are currently wide. You are performing a reshape long. You specified i(DES) and j(time). In the current wide form, variable DES should uniquely
    identify the observations. Remember this picture:

    long wide
    +---------------+ +------------------+
    | i j a b | | i a1 a2 b1 b2 |
    |---------------| <--- reshape ---> |------------------|
    | 1 1 1 2 | | 1 1 3 2 4 |
    | 1 2 3 4 | | 2 5 7 6 8 |
    | 2 1 5 6 | +------------------+
    | 2 2 7 8 |
    +---------------+
    Type reshape error for a list of the problem observations.



    Typing the "reshape error" command shows the following output-

    reshape error
    (note: j = 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53)

    i (DES) indicates the top-level grouping such as subject id.

    The data are currently in the wide form; there should be a single
    observation per i.

    1674 of 1674 observations have duplicate i values:

    +--------------------------------------------------------------------+
    | DES |
    |--------------------------------------------------------------------|
    1. | Employer contributions for employee pension and insurance funds 6/ |
    2. | Employer contributions for employee pension and insurance funds 6/ | ....

    53. | Employer contributions for employee pension and insurance funds 6/ |
    54. | Employer contributions for employee pension and insurance funds 6/ |
    55. | Employer contributions for government social insurance |
    |--------------------------------------------------------------------|
    56. | Employer contributions for government social insurance |
    57. | Employer contributions for government social insurance |
    58. | Employer contributions for government social insurance | ...




    I suspect there is an error because the variable DES should have dummy categories, but I am not sure how to proceed further.

    Any comments and suggestions would be welcome. Thank you.


    Attached Files

  • #2
    Stata's error message gives you the reason: DES does not uniquely identify observations, and so by itself cannot be used as the id for reshape. Try something like the following.

    .ÿversionÿ14.1

    .ÿ
    .ÿclearÿ*

    .ÿsetÿmoreÿoff

    .ÿ
    .ÿimportÿdelimitedÿCA30_1969_2014_AK.csv
    (53ÿvars,ÿ1678ÿobs)

    .ÿ
    .ÿquietlyÿdropÿinÿ1675/1678

    .ÿdropÿgeofipsÿregionÿtableÿlinecodeÿindustryclassification

    .ÿ
    .ÿforeachÿvarÿofÿvarlistÿv*ÿ{
    ÿÿ2.ÿÿÿÿÿÿÿÿÿlocalÿyearÿ:ÿvariableÿlabelÿ`var'
    ÿÿ3.ÿÿÿÿÿÿÿÿÿrenameÿ`var'ÿv`year'
    ÿÿ4.ÿ}

    .ÿ
    .ÿisidÿgeonameÿdescription

    .ÿencodeÿdescription,ÿgenerateÿ(des)

    .ÿencodeÿgeoname,ÿgenerateÿ(geo)

    .ÿdropÿdescriptionÿgeoname

    .ÿ
    .ÿquietlyÿreshapeÿlongÿv,ÿi(geoÿdes)ÿj(year)

    .ÿ
    .ÿquietlyÿreplaceÿvÿ=ÿ".l"ÿifÿvÿ==ÿ"(L)"

    .ÿquietlyÿreplaceÿvÿ=ÿ".n"ÿifÿvÿ==ÿ"(NA)"

    .ÿdestringÿv,ÿreplace
    vÿhasÿallÿcharactersÿnumeric;ÿreplacedÿasÿlong
    (37524ÿmissingÿvaluesÿgenerated)

    .ÿ
    .ÿlistÿinÿ1/2,ÿnoobs

    ÿÿ+---------------------------------------------------------------------------------------------------------+
    ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿgeoÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿdesÿÿÿyearÿÿÿÿÿÿÿÿvÿ|
    ÿÿ|---------------------------------------------------------------------------------------------------------|
    ÿÿ|ÿAlaskaÿstateÿtotalÿÿÿEmployerÿcontributionsÿforÿemployeeÿpensionÿandÿinsuranceÿfundsÿ6/ÿÿÿ1969ÿÿÿ143940ÿ|
    ÿÿ|ÿAlaskaÿstateÿtotalÿÿÿEmployerÿcontributionsÿforÿemployeeÿpensionÿandÿinsuranceÿfundsÿ6/ÿÿÿ1970ÿÿÿ168957ÿ|
    ÿÿ+---------------------------------------------------------------------------------------------------------+

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .

    Comment


    • #3
      Thank you Joseph, that did solve the problem.

      Comment


      • #4
        Dear all,
        I have a similar problem, as I cannot solve the following merger: (I double checked, but the id is unique).
        reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
        (note: j = 1 2 3 4 5 6 7 8 9)
        variable id does not uniquely identify the observations
        Your data are currently wide. You are performing a reshape long. You specified i(id) and j(vignr). In the current wide form, variable id should uniquely
        identify the observations. Remember this picture:

        long wide
        +---------------+ +------------------+
        | i j a b | | i a1 a2 b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1 1 1 2 | | 1 1 3 2 4 |
        | 1 2 3 4 | | 2 5 7 6 8 |
        | 2 1 5 6 | +------------------+
        | 2 2 7 8 |
        +---------------+
        Type reshape error for a list of the problem observations.

        My do file is the following:

        /Merger Vignettes



        // #0
        // program setup

        clear all
        macro drop _all
        version 12.1
        set more off

        // #1
        // load dataset with respondent data


        cd "XX"
        use "spss_probe.dta" , clear

        // #2
        // sort and save data

        sort deck
        save myfile,replace

        // #4
        // load dataset with the vignette setup

        use "vigid.dta"
        drop id_vig vignette
        order id_vignette, last

        // #5
        // reshape data to wide format and sort data

        reshape wide alter - id_vignette, i(deck) j(vignr)
        sort deck

        // #6
        // merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format

        merge deck using myfile
        drop _merge
        gen id_numeric = _n
        reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
        order id id_numeric, first
        I have added the two datasets.

        I would appreciate any assistance - thank you!
        Bettina
        vigid.dta
        spss_probe.dta


        Attached Files

        Comment


        • #5
          There's no duplicates in the spss_probe dataset, but when you merge, you end up with a number of missing values for id. These are also considered duplicates.

          Code:
          . duplicates report id
          
          Duplicates in terms of id
          
          --------------------------------------
             copies | observations       surplus
          ----------+---------------------------
                  1 |            7             0
                 11 |           11            10
          --------------------------------------
          
          . order id, first
          
          . list id-alter1
          
               +--------------------+
               | id   deck   alter1 |
               |--------------------|
            1. |  .      1       20 |
            2. |  .      2       20 |
            3. |  .      3       20 |
            4. |  .      4       20 |
            5. | 10      5       20 |
               |--------------------|
            6. | 24      6       20 |
            7. |  .      7       20 |
            8. |  .      8       20 |
            9. |  .      9       20 |
           10. | 18     10       20 |
               |--------------------|
           11. | 19     11       20 |
           12. |  .     12       20 |
           13. |  .     13       20 |
           14. | 11     14       20 |
           15. |  .     15       20 |
               |--------------------|
           16. |  .     16       20 |
           17. | 29     11       20 |
           18. |  4     11       20 |
               +---------------
          You may want to drop all observations with missing id's.
          Alternatively, you may want to load the spss_probe.dta into memory, merge with vigid, and drop the observations that have _merge==2, i.e., those that are only available in the using dataset (this would give the result you seem to expect)

          Comment


          • #6
            Thank you Jorrit for your quick reply.
            As I am quite new to stata, I do understand, why I need to drop those obersvations that are not connected to an ID. However, I do not understand how to drop the oberservations with _merge==2.
            Could you please specify the command?

            Bettina

            Comment


            • #7
              As to why you need to drop the observations with ID missing, this is because you have more than one ID missing, and Stata has no way of reshaping long such that the new list of observations is uniquely identified by a combination of earlier ID variable and the list of variables that were previously wide. This is probably not too helpful but difficult to put in words. The error message that Stata gives when a reshape cannot be done is probably the best way to explain it....

              As for a fix for your problem, answering this question properly also requires some understanding of the data transformation you are trying to do, and that is not very clear to me just yet. It helps that you attach your data, but still guessing a little bit what you want to end up with.
              One possibility, however, is that you wanted/should do is use your varibale id_numeric in your reshape, i.e.:
              Code:
              reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id_numeric) j(vignr)
              As for dropping observations, that would be:
              Code:
              drop if _merge==?
              where ? depends on what dataset is your master and which is your using dataset. Make sure to understand the values of the _merge variable by looking into help merge

              Comment


              • #8
                Thank you for patience!
                There are 16 possible decks of vignettes with 9 vignettes each in the first file. I only put in some data for the respondents, who each just answer one deck randomely , to check the merging procedure (7 respondents).

                I rewrote the do file as follows and it worked:
                / #6
                // merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format

                merge deck using myfile
                drop if _merge==1
                gen id_numeric = _n
                reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
                order id id_numeric, first

                Comment


                • #9
                  Dear Jorrit,
                  sorry to bother you gain - I now got my final data and the merger does not work. And I simply cannot figure out any difference in the trial and final data sets.
                  My syntax is as follows:
                  // #0
                  // program setup

                  clear all
                  macro drop _all
                  version 12.1
                  set more off

                  // #1
                  // load dataset with respondent data

                  cd "?"
                  use "data_final1.dta" , clear

                  // #2
                  // sort and save data

                  sort deck
                  save myfile,replace

                  // #4
                  // load dataset with the vignette setup

                  use "vigid.dta"
                  order id_vignette, last
                  drop id_vig vignette

                  // #5
                  // reshape data to wide format and sort data

                  reshape wide alter - id_vignette, i(deck) j(vignr)
                  sort deck

                  // #6
                  // merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format
                  //first trial
                  merge deck using myfile
                  drop _merge
                  gen id_numeric = _n
                  reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_ , i(id) j(vignr)
                  order id id_numeric, first
                  //second trial
                  merge deck using myfile
                  drop if _merge==1
                  gen id_numeric = _n
                  reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
                  order id id_numeric, first
                  ///third trial
                  merge deck using myfile
                  drop if _merge==2
                  gen id_numeric = _n
                  reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
                  order id id_numeric, first
                  I appreciate your help a lot!
                  Attached Files

                  Comment


                  • #10
                    You probably wanted:
                    Code:
                    reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_ , i(id_numeric) j(vignr)
                    The issue being that there are duplicates in terms of your id variable

                    Comment


                    • #11
                      Dear All,

                      It looks like I'm having a similar problem as well. I have tried to:

                      Code:
                       reshape long a, i(combined) j(prod, string)
                      But I am met with the error:

                      Code:
                      variable id does not uniquely identify the observations
                          Your data are currently wide.  You are performing a reshape long.  You specified i(combined) and j(prod).  In
                          the current wide form, variable combined should uniquely identify the observations.  Remember this picture:
                      
                               long                                wide
                              +---------------+                   +------------------+
                              | i   j   a   b |                   | i   a1 a2  b1 b2 |
                              |---------------| <--- reshape ---> |------------------|
                              | 1   1   1   2 |                   | 1   1   3   2  4 |
                              | 1   2   3   4 |                   | 2   5   7   6  8 |
                              | 2   1   5   6 |                   +------------------+
                              | 2   2   7   8 |
                              +---------------+
                          Type reshape error for a list of the problem observations.
                      r(9);
                      My goal of this code is to create a "prod" variable with all of the variable names (such as "White bread, 1 kg (supermarket)") inputted as individual entries underneath the "prod" variable. The reason I have 2 rows which (one of which has variable names aa, ab, ac, etc. and the other with the actual product names themselves, "White bread, 1 kg (supermarket)," is due to STATA's 32 character limit. It was causing me issues, so I tried to work around it by reshaping with an additional row.

                      I would also like to make another variable "store_type" in which it either says "supermarket" or "mid-priced store" and no longer having this information captured in the individual data entires themselves. So, to clarify, in the end I would have variables: year, country, city, currency, prod, and store_type. Under "prod" it would display "White bread, 1 kg," for example, and under "store_type" it would show either "supermarket" or "mid-priced store."

                      I'm sorry if this is a simple question--I'm just becoming more familiar with STATA and appreciate any and all suggestions you all may have.

                      Many thanks,
                      Jim
                      5EIU Data.xlsx
                      Submit EIU clean.do
                      EIU submit attempt.dta
                      Attached Files
                      Last edited by Jim Callegari; 10 Jan 2021, 14:01.

                      Comment


                      • #12
                        Post #11 was subsequently reposted into its own topic and responded to at

                        https://www.statalist.org/forums/for...e-observations
                        Last edited by William Lisowski; 10 Jan 2021, 15:56.

                        Comment


                        • #13
                          Please I need a help on this. Iam trying to reshape the data

                          reshape long yr,i(c_id s_id) j(year)
                          (note: j = 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019)
                          variable id does not uniquely identify the observations
                          Your data are currently wide. You are performing a reshape long. You specified i(c_id s_id) and
                          j(year). In the current wide form, variable c_id s_id should uniquely identify the observations.
                          Remember this picture:

                          long wide
                          +---------------+ +------------------+
                          | i j a b | | i a1 a2 b1 b2 |
                          |---------------| <--- reshape ---> |------------------|
                          | 1 1 1 2 | | 1 1 3 2 4 |
                          | 1 2 3 4 | | 2 5 7 6 8 |
                          | 2 1 5 6 | +------------------+
                          | 2 2 7 8 |
                          +---------------+
                          Type reshape error for a list of the problem observations.
                          r(9);

                          end of do-file

                          r(9);

                          Comment


                          • #14
                            The advice given to type


                            Code:
                            reshape error
                            is good. Equivalently

                            Code:
                            duplicates list c_id s_id

                            Comment


                            • #15
                              We are trying to generate a map (Espirito Santo- Brazil) with mean anual Tx of incidence of COVID. We did
                              shp2dta using C:\Users\rlman\Desktop\spmap\ES_Municipios_2022.sh p, database(esmun) coor(es78coor) genid(id) genc(c)
                              merge m:1 id_mun using esmun.dta
                              spmap Txmeano using es78coor, id(id_mun) fcolor(Blues)
                              Variable id_mun specified in option id() does not uniquely identify the observations

                              We did not get any solution looking about the r459 error.

                              Could anybody kindly help?

                              Thank you!

                              Comment

                              Working...
                              X