Importing from csv (wide format) to Stata (long format)

Guest

Importing from csv (wide format) to Stata (long format)

27 Apr 2023, 12:05

Hello,

I want to perform a panel data analysis of environmental degradation and economic growth.

I am trying to transform the PRIMAP-hist database (national emissions time series) from csv (wide format) to Stata (long format).

After import the csv data to Stata in wide format, I tried to reshape it by using the following code but I only got one column for all the observations:

Code:

reshape long v, i(category country entity scenario unit) j(year)

I want to obtain a column for each "entity", but I do not know how to do perform that. I tried to use this code:

Code:

reshape long v entity, i(category country scenario unit) j(year)

And I get this error

Code:

variable id does not uniquely identify the observations
    Your data are currently wide.  You are performing a reshape long.  You specified i(category country scenario unit) and j(year).  In
    the current wide form, variable category country scenario unit should uniquely identify the observations.  Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem observations.
r(9);

How can I obtain a regular panel data with this csv dataset? Sorry if this is a trivial question.

Last edited by sladmin; 02 Mar 2024, 08:23. Reason: anonymize original poster

Tags: None

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

27 Apr 2023, 12:08

Data example using dataex please. Welcome to Statalist!
Comment

Guest

27 Apr 2023, 12:47

Originally posted by Jared Greathouse View Post

Data example using dataex please. Welcome to Statalist!

Thanks for your reply! I will try to provide data example although I dont know if it will be correct, since I have a extensive database.

This is what I get after importing it:
Note: v6, v7, v8 etc are the years but Stata does not allow me to change v6, v7, v8 etc to 1850, 1851, 1852 etc in the wide format

Code:

. dataex category country entity scenario unit v6 v7 v8 v9 in 1/5

----------------------- copy starting from the next line -----------------------

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
"IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378  .00039 .000402
"IPC1A" "AFG" "CH4" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
"IPC1A" "AGO" "CH4" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
"IPC1A" "AIA" "CH4" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
"IPC1A" "ALB" "CH4" "HISTCR" "Gg"   .0602   .0606   .0615    .063
end

After reshaping, I obtain

Code:

. reshape long v, i(category country entity scenario unit) j(year)
(note: j = 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4
> 6 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 8
> 8 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
>  123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 1
> 54 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    29728   -> 5.0e+06
Number of variables                 173   ->       7
j variable (168 values)                   ->   year
xij variables:
                         v6 v7 ... v173   ->   v
-----------------------------------------------------------------------------

Code:

. dataex category country entity scenario unit v in 1/5

----------------------- copy starting from the next line -----------------------

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9(category country) str11 entity str6 scenario str7 unit float v
"IPC1" "ABW" "CH4" "HISTCR" "Gg" .000387
"IPC1" "ABW" "CH4" "HISTCR" "Gg"   .0004
"IPC1" "ABW" "CH4" "HISTCR" "Gg" .000413
"IPC1" "ABW" "CH4" "HISTCR" "Gg" .000426
"IPC1" "ABW" "CH4" "HISTCR" "Gg" .000439
end

But this is NOT what I want. I want to have columns for each "entity" (type of pollutant)

Comment

Guest
#4

27 Apr 2023, 17:25

In case I did not explain myself, this is a simplified example of my problem.

Sorry if this is not appropiate, but I could not find other way to expose my issue with this reshaping outcome.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10141

28 Apr 2023, 00:08

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
"IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378  .00039 .000402
"IPC1A" "AFG" "CH4" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
"IPC1A" "AGO" "CH4" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
"IPC1A" "AIA" "CH4" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
"IPC1A" "ALB" "CH4" "HISTCR" "Gg"   .0602   .0606   .0615    .063
"IPC1A" "ABW" "CO2" "HISTCR" "Gg" .000366 .000378  .00039 .000402
"IPC1A" "AFG" "CO2" "HISTCR" "Gg"   .0317   .0319   .0321   .0323
"IPC1A" "AGO" "CO2" "HISTCR" "Gg"    1.75    1.77     1.8    1.82
"IPC1A" "AIA" "CO2" "HISTCR" "Gg"  .00012  .00012 .000121 .000123
"IPC1A" "ALB" "CO2" "HISTCR" "Gg"   .0602   .0606   .0615    .063
end

reshape long v, i(category country entity scenario unit) j(year)
reshape wide v, i(country year) j(entity) string
rename v* *

Res.:

Code:

. sort country year

. l, sepby(country)

     +-----------------------------------------------------------------+
     | country   year       CH4       CO2   category   scenario   unit |
     |-----------------------------------------------------------------|
  1. |     ABW      6   .000366   .000366      IPC1A     HISTCR     Gg |
  2. |     ABW      7   .000378   .000378      IPC1A     HISTCR     Gg |
  3. |     ABW      8    .00039    .00039      IPC1A     HISTCR     Gg |
  4. |     ABW      9   .000402   .000402      IPC1A     HISTCR     Gg |
     |-----------------------------------------------------------------|
  5. |     AFG      6     .0317     .0317      IPC1A     HISTCR     Gg |
  6. |     AFG      7     .0319     .0319      IPC1A     HISTCR     Gg |
  7. |     AFG      8     .0321     .0321      IPC1A     HISTCR     Gg |
  8. |     AFG      9     .0323     .0323      IPC1A     HISTCR     Gg |
     |-----------------------------------------------------------------|
  9. |     AGO      6      1.75      1.75      IPC1A     HISTCR     Gg |
 10. |     AGO      7      1.77      1.77      IPC1A     HISTCR     Gg |
 11. |     AGO      8       1.8       1.8      IPC1A     HISTCR     Gg |
 12. |     AGO      9      1.82      1.82      IPC1A     HISTCR     Gg |
     |-----------------------------------------------------------------|
 13. |     AIA      6    .00012    .00012      IPC1A     HISTCR     Gg |
 14. |     AIA      7    .00012    .00012      IPC1A     HISTCR     Gg |
 15. |     AIA      8   .000121   .000121      IPC1A     HISTCR     Gg |
 16. |     AIA      9   .000123   .000123      IPC1A     HISTCR     Gg |
     |-----------------------------------------------------------------|
 17. |     ALB      6     .0602     .0602      IPC1A     HISTCR     Gg |
 18. |     ALB      7     .0606     .0606      IPC1A     HISTCR     Gg |
 19. |     ALB      8     .0615     .0615      IPC1A     HISTCR     Gg |
 20. |     ALB      9      .063      .063      IPC1A     HISTCR     Gg |
     +-----------------------------------------------------------------+

Comment

Guest

28 Apr 2023, 05:01

Originally posted by Andrew Musau View Post

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9(category country) str11 entity str6 scenario str7 unit float(v6 v7 v8 v9)
"IPC1A" "ABW" "CH4" "HISTCR" "Gg" .000366 .000378 .00039 .000402
"IPC1A" "AFG" "CH4" "HISTCR" "Gg" .0317 .0319 .0321 .0323
"IPC1A" "AGO" "CH4" "HISTCR" "Gg" 1.75 1.77 1.8 1.82
"IPC1A" "AIA" "CH4" "HISTCR" "Gg" .00012 .00012 .000121 .000123
"IPC1A" "ALB" "CH4" "HISTCR" "Gg" .0602 .0606 .0615 .063
"IPC1A" "ABW" "CO2" "HISTCR" "Gg" .000366 .000378 .00039 .000402
"IPC1A" "AFG" "CO2" "HISTCR" "Gg" .0317 .0319 .0321 .0323
"IPC1A" "AGO" "CO2" "HISTCR" "Gg" 1.75 1.77 1.8 1.82
"IPC1A" "AIA" "CO2" "HISTCR" "Gg" .00012 .00012 .000121 .000123
"IPC1A" "ALB" "CO2" "HISTCR" "Gg" .0602 .0606 .0615 .063
end

reshape long v, i(category country entity scenario unit) j(year)
reshape wide v, i(country year) j(entity) string
rename v* *

Res.:

Code:

. sort country year

. l, sepby(country)

+-----------------------------------------------------------------+
| country year CH4 CO2 category scenario unit |
|-----------------------------------------------------------------|
1. | ABW 6 .000366 .000366 IPC1A HISTCR Gg |
2. | ABW 7 .000378 .000378 IPC1A HISTCR Gg |
3. | ABW 8 .00039 .00039 IPC1A HISTCR Gg |
4. | ABW 9 .000402 .000402 IPC1A HISTCR Gg |
|-----------------------------------------------------------------|
5. | AFG 6 .0317 .0317 IPC1A HISTCR Gg |
6. | AFG 7 .0319 .0319 IPC1A HISTCR Gg |
7. | AFG 8 .0321 .0321 IPC1A HISTCR Gg |
8. | AFG 9 .0323 .0323 IPC1A HISTCR Gg |
|-----------------------------------------------------------------|
9. | AGO 6 1.75 1.75 IPC1A HISTCR Gg |
10. | AGO 7 1.77 1.77 IPC1A HISTCR Gg |
11. | AGO 8 1.8 1.8 IPC1A HISTCR Gg |
12. | AGO 9 1.82 1.82 IPC1A HISTCR Gg |
|-----------------------------------------------------------------|
13. | AIA 6 .00012 .00012 IPC1A HISTCR Gg |
14. | AIA 7 .00012 .00012 IPC1A HISTCR Gg |
15. | AIA 8 .000121 .000121 IPC1A HISTCR Gg |
16. | AIA 9 .000123 .000123 IPC1A HISTCR Gg |
|-----------------------------------------------------------------|
17. | ALB 6 .0602 .0602 IPC1A HISTCR Gg |
18. | ALB 7 .0606 .0606 IPC1A HISTCR Gg |
19. | ALB 8 .0615 .0615 IPC1A HISTCR Gg |
20. | ALB 9 .063 .063 IPC1A HISTCR Gg |
+-----------------------------------------------------------------+

Thanks so much for your answer!

However, after running this code I get the following error again:

Code:

reshape wide v, i(country year) j(entity) string

Code:

. reshape wide v, i(country year) j(entity) string
(note: j = CH4 CO2 FGASES FGASESAR4 HFCS HFCSAR4 KYOTOGHG KYOTOGHGAR4 N2O NF3 PFCS PFCSAR4 SF6)
values of variable entity not unique within country year
    Your data are currently long.  You are performing a reshape wide.  You specified i(country year) and j(entity).  There are
    observations within i(country year) with the same value of j(entity).  In the long data, variables i() and j() together
    must uniquely identify the observations.

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem variables.
r(9);

Last edited by sladmin; 02 Mar 2024, 08:23. Reason: anonymize original poster

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10141
#7

29 Apr 2023, 05:25

If country, year, entity and v are the only variables in your dataset, then you have duplicate observations. You can resolve this by first eliminating these duplicates.

Code:

duplicates drop *, force

Otherwise, there is some structure to your data that you are not adequately representing (maybe you do not have panel data but multiple observations per each time period). For help with this, we need a reproducible example (see FAQ Advice #12 for details).
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30003
#8

29 Apr 2023, 15:34

Before you go ahead with -duplicates drop *, force-, you should ask yourself why you have duplicate observations in your data set. Evidently you don't expect them to be there. So something went wrong in the creation of the dataset. And where one mistake has been made evident, others often lurk. So if I were you, I would do:

Code:

duplicates tag country year entity, gen(flag) browse if flag

to see the offending observations. Perhaps seeing them will jog you into realizing that they really should be there, and that some other variable(s) need to be specified in -reshape-'s -i()- option to get unique observation identification. If not, then, rather than just forcing out the duplicates, review the data management that created this data set and fix the errors that led to this problem, as well as any others you find in the course of doing that.

Last edited by Clyde Schechter; 29 Apr 2023, 15:36.
Comment
Guest
#9

07 May 2023, 16:14

Originally posted by Andrew Musau View Post

If country, year, entity and v are the only variables in your dataset, then you have duplicate observations. You can resolve this by first eliminating these duplicates.

Code:

duplicates drop *, force

Otherwise, there is some structure to your data that you are not adequately representing (maybe you do not have panel data but multiple observations per each time period). For help with this, we need a reproducible example (see FAQ Advice #12 for details).

Sorry for the late reply, but no, the variables in my dataset are category country entity scenario unit and year, as I wrote in my first post.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10141
#10

08 May 2023, 03:08

Then provide an example with these extra variables that reproduces the problem.
Comment
Guest
#11

11 May 2023, 16:51

Originally posted by Andrew Musau View Post

Then provide an example with these extra variables that reproduces the problem.

They are the variables from the data example in #3. I mean, I want to keep the rest unaltered while "reshaping only the entity columns", like in the picture I provided in #4.
Sorry if it seems confusing, it is the best way I can explain it.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10141
#12

12 May 2023, 16:31

The thing is that the code in #5 works perfectly for the data in #3. We need data that replicates your error. Sorry, I do not want to guess what the issue is - so maybe someone else can help you absent a reproducible example.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35505
#13

13 May 2023, 03:43

I can't see the problematic dataset either, but I have a guess that may help. Quite often with this kind of data the problematic observations are gunk or junk at the end of the dataset full of missing values that are a side-effect of something done in a spreadsheet version of the data.

Getting rid of any observations missing on all variables would do no harm if they exist.
1 like
Comment

Announcement

Importing from csv (wide format) to Stata (long format)

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment