Adding observations to a dataset with .csv files and a loop

Benoit Decoco

Join Date: Jun 2022

Posts: 17
#1

Adding observations to a dataset with .csv files and a loop

05 Jul 2022, 07:14

Hello Stata users,

I am using Stata 17. I want to add observations and variables in a dataset A.

I have the following dataset A

Code:

Date Year urbanname urbancode etc.. 2010m1 2010 Beijing 1029 etc. 2010m2 2010 Beijing 1029 ... 2010m3 2010 Beijing 1029 ... .... 2015m1 2015 Paris 1030 etc

For different cities and different time periods. I would like to add observations of other cities (that are not in the initial dataset) which I have in different .csv files (datasets B,C, etc..). For each city that I want to add I have the following variables

1) the dates
2) the urbanname
3) the urbancode
4) other variables which I do not yet have in my initial dataset but that I want to add

What would be your advice on how to proceed ? I thought of doing it with R but the dataset A does not open well in RStudio and the variable Date is not well imported.
Ideally, i'd like to use a loop as I have 17 .csv files, each containing the information of a city.

Thank you very much for your help,

Benoît

Last edited by Benoit Decoco; 05 Jul 2022, 08:07.
Tags: append, data, merge multiple datasets
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

05 Jul 2022, 07:19

Please use dataex to present your data.
Comment

Benoit Decoco

Join Date: Jun 2022
Posts: 17

05 Jul 2022, 07:34

Dear Jared,

Unfortunately I have more than 1M observations I tried to use dataex but the result seems less clear than what I put in the first post.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(makeid year) strL urbanname long urbancode
  5 1854 "Bielefeld" 1
  1 1854 "Bielefeld" 1
  4 1854 "Bielefeld" 1
  6 1854 "Bielefeld" 1
  7 1854 "Bielefeld" 1
  3 1854 "Bielefeld" 1
  2 1854 "Bielefeld" 1
 17 1855 "Bielefeld" 1
  8 1855 "Bielefeld" 1
 12 1855 "Bielefeld" 1
 15 1855 "Bielefeld" 1
 13 1855 "Bielefeld" 1
 19 1855 "Bielefeld" 1
 11 1855 "Bielefeld" 1
 14 1855 "Bielefeld" 1
 16 1855 "Bielefeld" 1
  9 1855 "Bielefeld" 1
 18 1855 "Bielefeld" 1
 10 1855 "Bielefeld" 1
 21 1856 "Bielefeld" 1
 20 1856 "Bielefeld" 1
 22 1856 "Bielefeld" 1
 30 1856 "Bielefeld" 1
 25 1856 "Bielefeld" 1
 23 1856 "Bielefeld" 1
 27 1856 "Bielefeld" 1
 29 1856 "Bielefeld" 1
 24 1856 "Bielefeld" 1
 28 1856 "Bielefeld" 1
 26 1856 "Bielefeld" 1
 31 1856 "Bielefeld" 1
 32 1857 "Bielefeld" 1
 39 1857 "Bielefeld" 1
 42 1857 "Bielefeld" 1
 34 1857 "Bielefeld" 1
 43 1857 "Bielefeld" 1
 41 1857 "Bielefeld" 1
 37 1857 "Bielefeld" 1
 38 1857 "Bielefeld" 1
 33 1857 "Bielefeld" 1
 36 1857 "Bielefeld" 1
 35 1857 "Bielefeld" 1
 40 1857 "Bielefeld" 1
 53 1858 "Bielefeld" 1
 45 1858 "Bielefeld" 1
 44 1858 "Bielefeld" 1
 55 1858 "Bielefeld" 1
 50 1858 "Bielefeld" 1
 46 1858 "Bielefeld" 1
 52 1858 "Bielefeld" 1
 47 1858 "Bielefeld" 1
 48 1858 "Bielefeld" 1
 49 1858 "Bielefeld" 1
 51 1858 "Bielefeld" 1
 54 1858 "Bielefeld" 1
 61 1859 "Bielefeld" 1
 66 1859 "Bielefeld" 1
 58 1859 "Bielefeld" 1
 56 1859 "Bielefeld" 1
 65 1859 "Bielefeld" 1
 62 1859 "Bielefeld" 1
 59 1859 "Bielefeld" 1
 63 1859 "Bielefeld" 1
 60 1859 "Bielefeld" 1
 64 1859 "Bielefeld" 1
 67 1859 "Bielefeld" 1
 57 1859 "Bielefeld" 1
 70 1860 "Bielefeld" 1
 77 1860 "Bielefeld" 1
 74 1860 "Bielefeld" 1
 72 1860 "Bielefeld" 1
 79 1860 "Bielefeld" 1
 71 1860 "Bielefeld" 1
 75 1860 "Bielefeld" 1
 73 1860 "Bielefeld" 1
 78 1860 "Bielefeld" 1
 68 1860 "Bielefeld" 1
 69 1860 "Bielefeld" 1
 76 1860 "Bielefeld" 1
 91 1861 "Bielefeld" 1
 89 1861 "Bielefeld" 1
 88 1861 "Bielefeld" 1
 82 1861 "Bielefeld" 1
 84 1861 "Bielefeld" 1
 86 1861 "Bielefeld" 1
 80 1861 "Bielefeld" 1
 85 1861 "Bielefeld" 1
 83 1861 "Bielefeld" 1
 87 1861 "Bielefeld" 1
 90 1861 "Bielefeld" 1
 81 1861 "Bielefeld" 1
 94 1862 "Bielefeld" 1
102 1862 "Bielefeld" 1
 99 1862 "Bielefeld" 1
 93 1862 "Bielefeld" 1
100 1862 "Bielefeld" 1
 95 1862 "Bielefeld" 1
 92 1862 "Bielefeld" 1
 97 1862 "Bielefeld" 1
 98 1862 "Bielefeld" 1
end

Comment

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

05 Jul 2022, 07:58

I don't understand. Do these csv files already have the variables you've described?
Comment

Benoit Decoco

Join Date: Jun 2022
Posts: 17

05 Jul 2022, 08:17

I edited my initial post to make it clearer.
I have an initial dataset A which is described in the initial post. I have other datasets (B,C, ...) , each one describing a city that is not in dataset A. The datasets B,C,... have 3 variables (or columns) in common with the dataset A : date, urbancode, urbanname. So I want to add the other variables to dataset A.

My goal is to merge the rows of datasets B,C,... with the dataset A.

This is the dataex description of one of the datasets (B,C,...) which is on Jakarta.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str12 date str8 urbanname int urbancode float pm25
"15-août"     "Jarkarta" 1017         .
"15-décembre" "Jarkarta" 1017  84.16666
"15-juillet"   "Jarkarta" 1017         .
"15-octobre"   "Jarkarta" 1017         .
"15-septembre" "Jarkarta" 1017         .
"16-août"     "Jarkarta" 1017 117.98387
"16-avril"     "Jarkarta" 1017 105.28333
"16-décembre" "Jarkarta" 1017 68.935486
"16-février"  "Jarkarta" 1017  89.82758
"16-janvier"   "Jarkarta" 1017 102.48276
"16-juillet"   "Jarkarta" 1017 118.93549
"16-juin"      "Jarkarta" 1017 128.51666
"16-mai"       "Jarkarta" 1017 123.35484
"16-mars"      "Jarkarta" 1017 103.32258
"16-novembre"  "Jarkarta" 1017 118.73333
"16-octobre"   "Jarkarta" 1017  111.3387
"16-septembre" "Jarkarta" 1017 108.01667
"17-août"     "Jarkarta" 1017  98.37615
"17-avril"     "Jarkarta" 1017  78.38889
"17-décembre" "Jarkarta" 1017      51.1
"17-février"  "Jarkarta" 1017  78.10053
"17-janvier"   "Jarkarta" 1017     68.25
"17-juillet"   "Jarkarta" 1017 100.51613
"17-juin"      "Jarkarta" 1017  84.63333
"17-mai"       "Jarkarta" 1017 109.27586
"17-mars"      "Jarkarta" 1017  78.10789
"17-novembre"  "Jarkarta" 1017  67.02778
"17-octobre"   "Jarkarta" 1017   86.3023
"17-septembre" "Jarkarta" 1017    102.75
end

I hope this is clearer.

Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2172

05 Jul 2022, 08:31

See, this is why we ask for example data. Now that I have a good visual representation of what the issue is, I can understand what's going on much better. I think I know the issue, but my question for you now, is why is your data monthly in the first and third dataexes, but not in the second one?

Either way, you don't want to "merge" the rows. What you wanna do, is append them to your first dataset. Consider the full worked example

Code:

clear
* Example generated by -dataex-. For more info, type help dataex
clear
input float(time gdp) long region
132 .064902 22
133 .065123 22
134 .067379 22
135 .069164 22
136 .069452 22
137 .070135 22
138 .069298 22
139 .076049 22
140 .071104 22
141 .069214 22
142 .063417 22
143 .048752 22
144 .054664 22
145 .059415 22
146 .059393 22
147 .070151 22
148 .063613 22
149 .060897 22
150 .064957 22
151 .065703 22
152 .057225 22
153 .049896 22
154 .039404 22
155 .032541 22
156 .046425 22
157 .065049 22
158 .047784 22
159 .063896 22
end
format %tq time


tempfile Taiwan

sa `Taiwan', replace

* Example generated by -dataex-. For more info, type help dataex
clear
input float(time gdp) long region
132  .062 9
133  .059 9
134  .058 9
135  .062 9
136  .079 9
137  .068 9
138  .046 9
139  .052 9
140  .037 9
141  .029 9
142  .012 9
143  .015 9
144  .025 9
145  .036 9
146  .047 9
147  .059 9
148  .058 9
149  .072 9
150  .061 9
151  .014 9
152 -.032 9
153 -.061 9
154 -.081 9
155 -.065 9
156 -.029 9
157  .005 9
158  .039 9
159  .083 9
end
format %tq time


tempfile HongKong

sa `HongKong', replace

clear

foreach x in Taiwan HongKong {

ap using ``x''    
    
    
}

label values region region
label def region 9 "HongKong" 22 "Taiwan", modify

br

i save datasets as tempfiles (same variables names, all that jazz) and then I append them on top of one another.

Comment

Benoit Decoco

Join Date: Jun 2022

Posts: 17
#7

05 Jul 2022, 08:51

Thank you Jared !

The second dataex is also monthly data but it is written in french because of a R function I used, I am going to modify it to match the format of the first dataex.

I still have a little question : there are variables in the .csv files that do not exist in dataset A. Won't this be a problem when I append the .csv files to dataset A ?
In your example, imagine there is an extra variable in the Hongkong dataset that does not exist in the Taiwan dataset. How should I do ?

Thank you again for your help !

Last edited by Benoit Decoco; 05 Jul 2022, 08:55.
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2172

05 Jul 2022, 09:31

Code:

clear
* Example generated by -dataex-. For more info, type help dataex
clear
input float(time gdp) long region
132 .064902 22
133 .065123 22
134 .067379 22
135 .069164 22
136 .069452 22
137 .070135 22
138 .069298 22
139 .076049 22
140 .071104 22
141 .069214 22
142 .063417 22
143 .048752 22
144 .054664 22
145 .059415 22
146 .059393 22
147 .070151 22
148 .063613 22
149 .060897 22
150 .064957 22
151 .065703 22
152 .057225 22
153 .049896 22
154 .039404 22
155 .032541 22
156 .046425 22
157 .065049 22
158 .047784 22
159 .063896 22
end
format %tq time


tempfile Taiwan

sa `Taiwan', replace

* Example generated by -dataex-. For more info, type help dataex
clear
input float(time gdp) long region
132  .062 9
133  .059 9
134  .058 9
135  .062 9
136  .079 9
137  .068 9
138  .046 9
139  .052 9
140  .037 9
141  .029 9
142  .012 9
143  .015 9
144  .025 9
145  .036 9
146  .047 9
147  .059 9
148  .058 9
149  .072 9
150  .061 9
151  .014 9
152 -.032 9
153 -.061 9
154 -.081 9
155 -.065 9
156 -.029 9
157  .005 9
158  .039 9
159  .083 9
end
format %tq time

g taino =1


tempfile HongKong

sa `HongKong', replace

clear

foreach x in Taiwan HongKong {

ap using ``x''    
    
    
}

label values region region
label def region 9 "HongKong" 22 "Taiwan", modify

br

Nothing bad will happen, you'll just have lots of missing data. My advice to you is to not use R. Only use R for analysis, there are few things Stata can't do that R can.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35725
#9

05 Jul 2022, 10:53

Cross-posted at https://stackoverflow.com/questions/...files-in-stata

Please note our policy on cross-posting, which is that you should tell us about it. https://www.statalist.org/forums/help#crossposting
Comment

Announcement