Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing from csv (wide format) to Stata (long format)

    Hello,

    I want to perform a panel data analysis of environmental degradation and economic growth.

    I am trying to transform the PRIMAP-hist database (national emissions time series) from csv (wide format) to Stata (long format).

    After import the csv data to Stata in wide format, I tried to reshape it by using the following code but I only got one column for all the observations:

    Code:
    reshape long v, i(category country entity scenario unit) j(year)


    I want to obtain a column for each "entity", but I do not know how to do perform that. I tried to use this code:

    Code:
    reshape long v entity, i(category country scenario unit) j(year)
    And I get this error

    Code:
    variable id does not uniquely identify the observations
        Your data are currently wide.  You are performing a reshape long.  You specified i(category country scenario unit) and j(year).  In
        the current wide form, variable category country scenario unit should uniquely identify the observations.  Remember this picture:
    
             long                                wide
            +---------------+                   +------------------+
            | i   j   a   b |                   | i   a1 a2  b1 b2 |
            |---------------| <--- reshape ---> |------------------|
            | 1   1   1   2 |                   | 1   1   3   2  4 |
            | 1   2   3   4 |                   | 2   5   7   6  8 |
            | 2   1   5   6 |                   +------------------+
            | 2   2   7   8 |
            +---------------+
        Type reshape error for a list of the problem observations.
    r(9);

    How can I obtain a regular panel data with this csv dataset? Sorry if this is a trivial question.
    Last edited by sladmin; 02 Mar 2024, 08:23. Reason: anonymize original poster

  • #2
    Data example using dataex please. Welcome to Statalist!

    Comment


    • #3
      Originally posted by Jared Greathouse View Post
      Data example using dataex please. Welcome to Statalist!
      Thanks for your reply! I will try to provide data example although I dont know if it will be correct, since I have a extensive database.

      This is what I get after importing it:
      Note: v6, v7, v8 etc are the years but Stata does not allow me to change v6, v7, v8 etc to 1850, 1851, 1852 etc in the wide format

      Code:
      . dataex category country entity scenario unit v6 v7 v8 v9 in 1/5
      
      ----------------------- copy starting from the next line -----------------------
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
      "IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378  .00039 .000402
      "IPC1A" "AFG" "CH4" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
      "IPC1A" "AGO" "CH4" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
      "IPC1A" "AIA" "CH4" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
      "IPC1A" "ALB" "CH4" "HISTCR" "Gg"   .0602   .0606   .0615    .063
      end
      After reshaping, I obtain

      Code:
      . reshape long v, i(category country entity scenario unit) j(year)
      (note: j = 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4
      > 6 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 8
      > 8 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
      >  123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 1
      > 54 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                    29728   -> 5.0e+06
      Number of variables                 173   ->       7
      j variable (168 values)                   ->   year
      xij variables:
                               v6 v7 ... v173   ->   v
      -----------------------------------------------------------------------------



      Code:
      . dataex category country entity scenario unit v in 1/5
      
      ----------------------- copy starting from the next line -----------------------
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9(category country) str11 entity str6 scenario str7 unit float v
      "IPC1" "ABW" "CH4" "HISTCR" "Gg" .000387
      "IPC1" "ABW" "CH4" "HISTCR" "Gg"   .0004
      "IPC1" "ABW" "CH4" "HISTCR" "Gg" .000413
      "IPC1" "ABW" "CH4" "HISTCR" "Gg" .000426
      "IPC1" "ABW" "CH4" "HISTCR" "Gg" .000439
      end


      But this is NOT what I want. I want to have columns for each "entity" (type of pollutant)

      Comment


      • #4

        In case I did not explain myself, this is a simplified example of my problem.

        Sorry if this is not appropiate, but I could not find other way to expose my issue with this reshaping outcome.

        Click image for larger version

Name:	e54cf8d5-2b0d-4e8c-8c5e-853ee994563b.jpg
Views:	1
Size:	279.0 KB
ID:	1711534

        Comment


        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
          "IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378  .00039 .000402
          "IPC1A" "AFG" "CH4" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
          "IPC1A" "AGO" "CH4" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
          "IPC1A" "AIA" "CH4" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
          "IPC1A" "ALB" "CH4" "HISTCR" "Gg"   .0602   .0606   .0615    .063
          "IPC1A" "ABW" "CO2" "HISTCR" "Gg" .000366 .000378  .00039 .000402
          "IPC1A" "AFG" "CO2" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
          "IPC1A" "AGO" "CO2" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
          "IPC1A" "AIA" "CO2" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
          "IPC1A" "ALB" "CO2" "HISTCR" "Gg"   .0602   .0606   .0615    .063
          end
          
          reshape long v, i(category country entity scenario unit) j(year)
          reshape wide v, i(country year) j(entity) string
          rename v* *
          Res.:

          Code:
          . sort country year
          
          . l, sepby(country)
          
               +-----------------------------------------------------------------+
               | country   year       CH4       CO2   category   scenario   unit |
               |-----------------------------------------------------------------|
            1. |     ABW      6   .000366   .000366      IPC1A     HISTCR     Gg |
            2. |     ABW      7   .000378   .000378      IPC1A     HISTCR     Gg |
            3. |     ABW      8    .00039    .00039      IPC1A     HISTCR     Gg |
            4. |     ABW      9   .000402   .000402      IPC1A     HISTCR     Gg |
               |-----------------------------------------------------------------|
            5. |     AFG      6     .0317     .0317      IPC1A     HISTCR     Gg |
            6. |     AFG      7     .0319     .0319      IPC1A     HISTCR     Gg |
            7. |     AFG      8     .0321     .0321      IPC1A     HISTCR     Gg |
            8. |     AFG      9     .0323     .0323      IPC1A     HISTCR     Gg |
               |-----------------------------------------------------------------|
            9. |     AGO      6      1.75      1.75      IPC1A     HISTCR     Gg |
           10. |     AGO      7      1.77      1.77      IPC1A     HISTCR     Gg |
           11. |     AGO      8       1.8       1.8      IPC1A     HISTCR     Gg |
           12. |     AGO      9      1.82      1.82      IPC1A     HISTCR     Gg |
               |-----------------------------------------------------------------|
           13. |     AIA      6    .00012    .00012      IPC1A     HISTCR     Gg |
           14. |     AIA      7    .00012    .00012      IPC1A     HISTCR     Gg |
           15. |     AIA      8   .000121   .000121      IPC1A     HISTCR     Gg |
           16. |     AIA      9   .000123   .000123      IPC1A     HISTCR     Gg |
               |-----------------------------------------------------------------|
           17. |     ALB      6     .0602     .0602      IPC1A     HISTCR     Gg |
           18. |     ALB      7     .0606     .0606      IPC1A     HISTCR     Gg |
           19. |     ALB      8     .0615     .0615      IPC1A     HISTCR     Gg |
           20. |     ALB      9      .063      .063      IPC1A     HISTCR     Gg |
               +-----------------------------------------------------------------+

          Comment


          • #6
            Originally posted by Andrew Musau View Post
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
            "IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378 .00039 .000402
            "IPC1A" "AFG" "CH4" "HISTCR" "Gg" .0317 .0319 .0321 .0323
            "IPC1A" "AGO" "CH4" "HISTCR" "Gg" 1.75 1.77 1.8 1.82
            "IPC1A" "AIA" "CH4" "HISTCR" "Gg" .00012 .00012 .000121 .000123
            "IPC1A" "ALB" "CH4" "HISTCR" "Gg" .0602 .0606 .0615 .063
            "IPC1A" "ABW" "CO2" "HISTCR" "Gg" .000366 .000378 .00039 .000402
            "IPC1A" "AFG" "CO2" "HISTCR" "Gg" .0317 .0319 .0321 .0323
            "IPC1A" "AGO" "CO2" "HISTCR" "Gg" 1.75 1.77 1.8 1.82
            "IPC1A" "AIA" "CO2" "HISTCR" "Gg" .00012 .00012 .000121 .000123
            "IPC1A" "ALB" "CO2" "HISTCR" "Gg" .0602 .0606 .0615 .063
            end
            
            reshape long v, i(category country entity scenario unit) j(year)
            reshape wide v, i(country year) j(entity) string
            rename v* *
            Res.:

            Code:
            . sort country year
            
            . l, sepby(country)
            
            +-----------------------------------------------------------------+
            | country year CH4 CO2 category scenario unit |
            |-----------------------------------------------------------------|
            1. | ABW 6 .000366 .000366 IPC1A HISTCR Gg |
            2. | ABW 7 .000378 .000378 IPC1A HISTCR Gg |
            3. | ABW 8 .00039 .00039 IPC1A HISTCR Gg |
            4. | ABW 9 .000402 .000402 IPC1A HISTCR Gg |
            |-----------------------------------------------------------------|
            5. | AFG 6 .0317 .0317 IPC1A HISTCR Gg |
            6. | AFG 7 .0319 .0319 IPC1A HISTCR Gg |
            7. | AFG 8 .0321 .0321 IPC1A HISTCR Gg |
            8. | AFG 9 .0323 .0323 IPC1A HISTCR Gg |
            |-----------------------------------------------------------------|
            9. | AGO 6 1.75 1.75 IPC1A HISTCR Gg |
            10. | AGO 7 1.77 1.77 IPC1A HISTCR Gg |
            11. | AGO 8 1.8 1.8 IPC1A HISTCR Gg |
            12. | AGO 9 1.82 1.82 IPC1A HISTCR Gg |
            |-----------------------------------------------------------------|
            13. | AIA 6 .00012 .00012 IPC1A HISTCR Gg |
            14. | AIA 7 .00012 .00012 IPC1A HISTCR Gg |
            15. | AIA 8 .000121 .000121 IPC1A HISTCR Gg |
            16. | AIA 9 .000123 .000123 IPC1A HISTCR Gg |
            |-----------------------------------------------------------------|
            17. | ALB 6 .0602 .0602 IPC1A HISTCR Gg |
            18. | ALB 7 .0606 .0606 IPC1A HISTCR Gg |
            19. | ALB 8 .0615 .0615 IPC1A HISTCR Gg |
            20. | ALB 9 .063 .063 IPC1A HISTCR Gg |
            +-----------------------------------------------------------------+

            Thanks so much for your answer!

            However, after running this code I get the following error again:

            Code:
            reshape wide v, i(country year) j(entity) string
            Code:
            . reshape wide v, i(country year) j(entity) string
            (note: j = CH4 CO2 FGASES FGASESAR4 HFCS HFCSAR4 KYOTOGHG KYOTOGHGAR4 N2O NF3 PFCS PFCSAR4 SF6)
            values of variable entity not unique within country year
                Your data are currently long.  You are performing a reshape wide.  You specified i(country year) and j(entity).  There are
                observations within i(country year) with the same value of j(entity).  In the long data, variables i() and j() together
                must uniquely identify the observations.
            
                     long                                wide
                    +---------------+                   +------------------+
                    | i   j   a   b |                   | i   a1 a2  b1 b2 |
                    |---------------| <--- reshape ---> |------------------|
                    | 1   1   1   2 |                   | 1   1   3   2  4 |
                    | 1   2   3   4 |                   | 2   5   7   6  8 |
                    | 2   1   5   6 |                   +------------------+
                    | 2   2   7   8 |
                    +---------------+
                Type reshape error for a list of the problem variables.
            r(9);
            Last edited by sladmin; 02 Mar 2024, 08:23. Reason: anonymize original poster

            Comment


            • #7
              If country, year, entity and v are the only variables in your dataset, then you have duplicate observations. You can resolve this by first eliminating these duplicates.

              Code:
              duplicates drop *, force
              Otherwise, there is some structure to your data that you are not adequately representing (maybe you do not have panel data but multiple observations per each time period). For help with this, we need a reproducible example (see FAQ Advice #12 for details).

              Comment


              • #8
                Before you go ahead with -duplicates drop *, force-, you should ask yourself why you have duplicate observations in your data set. Evidently you don't expect them to be there. So something went wrong in the creation of the dataset. And where one mistake has been made evident, others often lurk. So if I were you, I would do:
                Code:
                duplicates tag country year entity, gen(flag)
                browse if flag
                to see the offending observations. Perhaps seeing them will jog you into realizing that they really should be there, and that some other variable(s) need to be specified in -reshape-'s -i()- option to get unique observation identification. If not, then, rather than just forcing out the duplicates, review the data management that created this data set and fix the errors that led to this problem, as well as any others you find in the course of doing that.
                Last edited by Clyde Schechter; 29 Apr 2023, 15:36.

                Comment


                • #9
                  Originally posted by Andrew Musau View Post
                  If country, year, entity and v are the only variables in your dataset, then you have duplicate observations. You can resolve this by first eliminating these duplicates.

                  Code:
                  duplicates drop *, force
                  Otherwise, there is some structure to your data that you are not adequately representing (maybe you do not have panel data but multiple observations per each time period). For help with this, we need a reproducible example (see FAQ Advice #12 for details).
                  Sorry for the late reply, but no, the variables in my dataset are category country entity scenario unit and year, as I wrote in my first post.

                  Comment


                  • #10
                    Then provide an example with these extra variables that reproduces the problem.

                    Comment


                    • #11
                      Originally posted by Andrew Musau View Post
                      Then provide an example with these extra variables that reproduces the problem.
                      They are the variables from the data example in #3. I mean, I want to keep the rest unaltered while "reshaping only the entity columns", like in the picture I provided in #4.
                      Sorry if it seems confusing, it is the best way I can explain it.

                      Comment


                      • #12
                        The thing is that the code in #5 works perfectly for the data in #3. We need data that replicates your error. Sorry, I do not want to guess what the issue is - so maybe someone else can help you absent a reproducible example.

                        Comment


                        • #13
                          I can't see the problematic dataset either, but I have a guess that may help. Quite often with this kind of data the problematic observations are gunk or junk at the end of the dataset full of missing values that are a side-effect of something done in a spreadsheet version of the data.

                          Getting rid of any observations missing on all variables would do no harm if they exist.

                          Comment

                          Working...
                          X