percentage of new peers

Sima Parkam

Join Date: Jul 2023

Posts: 4
#1

percentage of new peers

27 Jul 2023, 03:04

Hello,

I have two datasets for grade 5 and grade 8 (for four cohorts). In other words, for each cohort I have data related to grade 5 and 8. There is a unique identifier and this variable is called SIDI_Invalsi in my data. I merge the two datasets based on this variable. Moreover, in my dataset I have class_id and school_id.
Now, I have a question.
I would like to compute the percentage of new peers for each student at grade 8.
I would be very thankful if you could help me with this.
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#2

27 Jul 2023, 04:48

Sima:
welcome to this forum.
Without further details (please see the FAQ on how to poste more effectively), you may want to take a look at the -pc- function available from -egen-.

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
Sima Parkam

Join Date: Jul 2023

Posts: 4
#3

27 Jul 2023, 23:42

Each of my datasets at grade 5 and 8 has the same structure and the same variables. They both include students' characteristics and parents' characteristics. But, the variable I can merge the two datasets based on it is SIDI_Invalsi which is the identifier. Grade 5 is important because I want to track students at grade 5 and 8.
Now, I want to compute when students move from primary school (grade 5) to middle school (grade 8), how many percent of their peers at grade 8 are new?

This is the commands I have tried so far:

use G5.dta, clear
merge 1:1 SIDI_Invalsi using G8.dta

after merging, 3 shows the matched observations, 2 shows using data , 1 shows master data

gen new_peers = (_merge == 2)
egen total_new = total(new_peers)
drop if _merge ==1

gen total_students = (_merge == 2 | _merge == 3)
egen total_students_count = total(total_students)

gen pct_new_peers = total_new / total_students_count * 100

But, the result I get is 28%! which I think the path I tried is not correct.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#4

28 Jul 2023, 00:03

Sima Parkam there is a lot to fix here.

First off, you need to help others help you. For this, providing a data extract using the -dataex- command (as suggested in the FAQ point #12) is invaluable. In this case, an extract of your merged dataset would have been a good thing to provide. Second, you need to provide more context. For example, you end #3 by saying the result you get is 28%, which is not a number that is nonsensical by itself. If it surprises you, we need to understand why.

Your code appears to be computing new students (proxied by the fact that their grade 8 records do not merge with grade 5, so they must be new) as a fraction of total students (new + those with both grade 5 and grade 8). This you say works out to 28%. Again, why is it surprising?

Also, if you just want the percentage, you do not need to compute new variables -- this is very inefficient. You need a single number, whereas a variable is a vector, and in your case, all the elements of it will take the same value. This can be very inefficient -- imagine you had data of millions of observations, all of which now store the same value in this column! You can work with scalars or macros instead.

Something like this: (WARNING: untested code, since there was no data extract to test it on!)

Code:

sum if _merge == 2 local total_new = r(sum) sum if inlist(_merge, 2, 3) local total_students = r(sum) local pct_new_peers = `total_new'/`total_students' * 100 dis "The percentage of new peers is `pct_new_peers'%."
1 like
Comment

Sima Parkam

Join Date: Jul 2023
Posts: 4

28 Jul 2023, 01:21

I want to compute the percentage of new peers at grade 8 in order to see if there is a correlation between the scores at grade 5 and 8. When the percentage of new peers are higher, it means the possibility of having correlation between the grade 5 and 8 test scores is very low. The only variable that we can merge our data based on it is the identifier.

My data for grade 8 look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long G8_school_id double G8_class_id long(G8_student_id identifier)
40407422 404074220805 174060   1989
41601169 416011690803 468079   4484
40606050 406060500803 240235   7403
40407422 404074220801 173981   8585
41903252 419032520801 532142   9625
21906249 219062490804  71298  11283
41204380 412043800806 363841  11514
42003026 420030260802 568175  12375
41204479 412044790803 369666  15985
40404020 404040200802 147197  16357
40407416 404074160804 173381  16365
40407422 404074220807 174091  16637
40407422 404074220801 173973  16638
40407422 404074220802 173994  16639
40407422 404074220806 174073  16640
40407422 404074220803 174012  16641
40407422 404074220803 174013  16642
40407422 404074220805 174054  16643
40407422 404074220804 174032  16644
40407422 404074220803 174014  16645
40407205 404072050805 155184  16646
40407422 404074220807 174097  16649
40407422 404074220807 174098  16650
40407422 404074220803 174018  16651
40407205 404072050805 155194  16652
40407422 404074220804 174039  16653
40407422 404074220807 174099  16654
40407422 404074220802 174001  16655
40407422 404074220803 174019  16656
40407484 404074840802 181300  16657
40407422 404074220807 174102  16658
40808056 408080560804 296552  17054
41801054 418010540804 499621  17332
41505064 415050640805 453803  17465
40206187 402061870807  92331  17858
41504560 415045600801 434849  19252
41804082 418040820804 512270  20412
20403086 204030860801  15646  22025
41604069 416040690803 483481  22232
41204669 412046690806 389967  22931
41204720 412047200803 395852  23816
41101028 411010280805 337428  24974
40607068 406070680802 249245  28054
40804053 408040530802 280618  29062
40809034 408090340801 299127  29836
20411029 204110290801  20917  32538
21504680 215046800806  43299  37144
40402110 404021100803 135727  39316
40407292 404072920802 160088  40758
40407408 404074080803 172488  40908
40402052 404020520802 130413  43406
40607065 406070650804 248940  73230
20407159 204071590802  18334  80357
41801035 418010350804 498786  87109
40206318 402063180808 101273 132497
41001094 410010940803 331576 137591
40206192 402061920801  92573 154654
41906244 419062440808 548370 163438
40801087 408010870806 269607 169227
40302023 403020230802 114627 177577
41204646 412046460808 387308 203048
40301115 403011150802 111794 206281
41909063 419090630803 559369 209339
20202025 202020250812 576782 209445
40407402 404074020802 171924 210449
40910023 409100230803 328666 210599
40402173 404021730801 140954 212979
41204487 412044870807 370461 213733
40801095 408010950801 270311 214383
40407341 404073410801 165697 214422
40204073 402040730804  85817 214548
41804096 418040960804 513621 214989
40202021 402020210803  80019 215781
41907048 419070480803 551657 216369
41702056 417020560801 495890 216790
41504562 415045620804 435050 221334
21504302 215043020801  42700 221340
20407118 204071180802  16545 222430
41505064 415050640805 453802 222494
40606123 406061230803 247357 222631
40407501 404075010805 183206 223321
40704041 407040410801 262697 223896
40411117 404111170805 204840 225619
42001086 420010860804 561874 225738
40206345 402063450805 104729 226440
40407542 404075420810 187603 227729
41906256 419062560802 548808 230442
40701010 407010100803 256019 230471
41801114 418011140801 502238 230477
40401108 404011080807 124695 231636
41204727 412047270803 396674 231726
41602060 416020600801 474752 231787
21503222 215032220811  42058 234672
41204517 412045170806 373828 238415
40204087 402040870803  87298 238584
40909035 409090350804 327191 238630
41504562 415045620807 435115 238989
40802027 408020270801 275638 239022
40909028 409090280802 326621 239141
40205036 402050360801  88399 239204
end

and this is the data related to grade 5:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long G5_school_id double(G5_class_id G5_student_id) long identifier
10101001 101010010501 10101001050102 5450035
10101001 101010010501 10101001050103 5450036
10101001 101010010501 10101001050104 5450038
10101001 101010010501 10101001050105 5450039
10101001 101010010501 10101001050106 5450042
10101001 101010010501 10101001050107 5450043
10101001 101010010501 10101001050108 5450044
10101001 101010010501 10101001050109 5449903
10101001 101010010501 10101001050110 5449881
10101001 101010010501 10101001050112 5450040
10101001 101010010501 10101001050113 5450045
10101001 101010010501 10101001050114 5450046
10101001 101010010501 10101001050115 5450047
10101001 101010010501 10101001050116 5450048
10101001 101010010501 10101001050117 5449891
10101001 101010010501 10101001050118 5450049
10101001 101010010501 10101001050119 5450050
10101001 101010010501 10101001050120 5450051
10101001 101010010501 10101001050121 5450052
10101001 101010010501 10101001050122 5449882
10101001 101010010501 10101001050123 5450053
10101001 101010010501 10101001050124 5450055
10101001 101010010501 10101001050125 5450054
10101001 101010010501 10101001050126 5450056
10101001 101010010501 10101001050127 5449883
10101003 101010030501 10101003050101 5450063
10101003 101010030501 10101003050102 5450069
10101003 101010030501 10101003050104 5450057
10101003 101010030501 10101003050105 5450060
10101003 101010030501 10101003050106 5450059
10101003 101010030501 10101003050107 5450061
10101003 101010030501 10101003050108 5450062
10101003 101010030501 10101003050110 5450065
10101003 101010030501 10101003050111 5449801
10101003 101010030501 10101003050112 5450066
10101003 101010030501 10101003050113 5450067
10101003 101010030501 10101003050114 5450068
10101003 101010030501 10101003050115 5450071
10101003 101010030501 10101003050116 5450070
10101003 101010030501 10101003050117 5450072
10101003 101010030501 10101003050120 5450074
10101003 101010030501 10101003050121 5450075
10101003 101010030501 10101003050122 5449798
10201001 102010010501 10201001050101 4954445
10201001 102010010501 10201001050102 4954446
10201001 102010010501 10201001050103 4954447
10201001 102010010501 10201001050104 4954451
10201001 102010010501 10201001050105 4954455
10201001 102010010501 10201001050106 4954456
10201001 102010010501 10201001050107 4954457
10201001 102010010501 10201001050108 4954458
10201001 102010010501 10201001050109 4954460
10201001 102010010501 10201001050110 5248133
10201001 102010010501 10201001050111 5079060
10201001 102010010501 10201001050112 4954464
10201001 102010010501 10201001050113 4954465
10201001 102010010501 10201001050114 4954468
10201001 102010010501 10201001050115 4954469
10201001 102010010501 10201001050116 4954474
10201001 102010010501 10201001050117 4954475
10201001 102010010501 10201001050118 4954477
10201001 102010010501 10201001050119 4918468
10201001 102010010501 10201001050120 5173699
10201001 102010010501 10201001050121 4954480
10201001 102010010501 10201001050122 5122649
10201001 102010010501 10201001050123 4954482
10201001 102010010501 10201001050124 4992265
10201001 102010010501 10201001050125 4954484
10201001 102010010501 10201001050126 5395421
10201001 102010010501 10201001050127 4954486
10201001 102010010501 10201001050128 4954488
10201001 102010010502 10201001050201 4954448
10201001 102010010502 10201001050202 4954449
10201001 102010010502 10201001050203 4954450
10201001 102010010502 10201001050204 4954452
10201001 102010010502 10201001050205 4954453
10201001 102010010502 10201001050206 5021983
10201001 102010010502 10201001050207 4954454
10201001 102010010502 10201001050208 5248132
10201001 102010010502 10201001050209 4954459
10201001 102010010502 10201001050210 4954461
10201001 102010010502 10201001050211 5021995
10201001 102010010502 10201001050212 5442573
10201001 102010010502 10201001050213 4954463
10201001 102010010502 10201001050214 5403648
10201001 102010010502 10201001050215 4956975
10201001 102010010502 10201001050216 4954466
10201001 102010010502 10201001050217 4954467
10201001 102010010502 10201001050218 4954470
10201001 102010010502 10201001050219 4954472
10201001 102010010502 10201001050220 4954473
10201001 102010010502 10201001050221 4954476
10201001 102010010502 10201001050222 4918467
10201001 102010010502 10201001050223 4954478
10201001 102010010502 10201001050224 4954479
10201001 102010010502 10201001050225 4954481
10201001 102010010502 10201001050226 4954485
10201001 102010010502 10201001050228 4954487
10201003 102010030501 10201003050101 5233110
10201003 102010030501 10201003050102 4999486
end

I hope I could provide the data example properly.

Comment

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#6

28 Jul 2023, 01:42

I'm afraid this data example is not helpful. Neither of the two datasets contain the variable you are using to merge them: SIDI_Invalsi

Also, from what I can see, the identifiers present in the data extract take very different values across the two datasets, so even if it is one of the variables in the extract, the merge is not likely to produce any matched observations.

Please provide a better data extract, perhaps using an if condition to subset to the same school. Or, as I suggested before, it may be simpler to provide an extract of the merged data, rather than the original grade 5 and 8 files.

A good way to check whether the data extract will be useful to others is to put yourself in their shoes: take your data extract like any of us would have to, run your code on it, and see if it demonstrates the problem you are trying to solve.

Last edited by Hemanshu Kumar; 28 Jul 2023, 01:45.
1 like
Comment

Sima Parkam

Join Date: Jul 2023
Posts: 4

28 Jul 2023, 07:37

This is the merged data only for mathematics (merged data of grade 5 and 8). I hope this time it works. I have the same merged data for reading and I need to merge these two merged datasets together.
My question is how can I then compute the percentage of new peers at grade 8?

Is it true if I use:

Code:

* egen peers_G5 = group(G5_CODICE_SCUOLA G5_CODICE_CLASSE)
egen peers_G8 = group(G8_CODICE_SCUOLA G8_CODICE_CLASSE)

bysort peers_G8: egen total_peers_G8 = count(SIDI_Invalsi)
bysort peers_G5 peers_G8: egen new_peers = total(peers_G5 != peers_G8)
gen pct_new_peers = new_peers / total_peers_G8 * 100

end

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long G8_CODICE_SCUOLA double G8_CODICE_CLASSE long(SIDI_Invalsi G8_CODICE_STUDENTE G5_CODICE_SCUOLA) double(G5_CODICE_CLASSE G5_CODICE_STUDENTE)
40407422 404074220805   1989 174060 40407422 404074220502 40407422050208
40407422 404074220801   8585 173981 40407422 404074220501 40407422050113
41204380 412043800806  11514 363841 41204699 412046990502 41204699050205
40404020 404040200802  16357 147197 40404020 404040200502 40404020050204
40407416 404074160804  16365 173381 40407416 404074160505 40407416050527
40407422 404074220807  16637 174091 40407422 404074220501 40407422050102
40407422 404074220801  16638 173973 40407422 404074220502 40407422050201
40407422 404074220802  16639 173994 40407422 404074220501 40407422050103
40407422 404074220806  16640 174073 40407422 404074220503 40407422050301
40407422 404074220803  16641 174012 40407422 404074220502 40407422050203
40407422 404074220803  16642 174013 40407422 404074220501 40407422050105
40407422 404074220805  16643 174054 40407422 404074220501 40407422050106
40407422 404074220804  16644 174032 40407422 404074220503 40407422050302
40407422 404074220803  16645 174014 40407422 404074220501 40407422050107
40407422 404074220807  16649 174097 40407422 404074220502 40407422050205
40407422 404074220807  16650 174098 40407422 404074220502 40407422050206
40407422 404074220803  16651 174018 40407422 404074220501 40407422050114
40407205 404072050805  16652 155194 40407422 404074220503 40407422050309
40407422 404074220804  16653 174039 40407422 404074220501 40407422050115
40407422 404074220807  16654 174099 40407422 404074220501 40407422050116
40407422 404074220802  16655 174001 40407422 404074220503 40407422050310
40407422 404074220803  16656 174019 40407422 404074220503 40407422050311
40407422 404074220807  16658 174102 40407422 404074220501 40407422050118
20403086 204030860801  22025  15646 10403007 104030070501 10403007050116
41204669 412046690806  22931 389967 41204482 412044820505 41204482050521
41204720 412047200803  23816 395852 41204720 412047200503 41204720050309
41101028 411010280805  24974 337428 41101028 411010280502 41101028050212
40607068 406070680802  28054 249245 40607068 406070680503 40607068050302
40407408 404074080803  40908 172488 40407408 404074080503 40407408050321
40206192 402061920801 154654  92573 40206192 402061920504 40206192050409
40301115 403011150802 206281 111794 40301115 403011150502 40301115050211
40801095 408010950801 214383 270311 40801095 408010950503 40801095050302
40407341 404073410801 214422 165697 40407341 404073410501 40407341050116
40606123 406061230803 222631 247357 40606123 406061230501 40606123050101
40206345 402063450805 226440 104729 40206242 402062420503 40206242050310
40407542 404075420810 227729 187603 40407233 404072330504 40407233050417
41204517 412045170806 238415 373828 41204584 412045840506 41204584050613
40204087 402040870803 238584  87298 10204010 102040100506 10204010050604
40410033 404100330804 242024 197938 40410033 404100330502 40410033050201
40302027 403020270806 251180 115061 40302027 403020270502 40302027050211
40203016 402030160804 266801  81322 40203014 402030140501 40203014050113
40409029 404090290801 272463 193302 40409029 404090290503 40409029050320
41101028 411010280806 275779 337432 41101028 411010280502 41101028050204
41101028 411010280806 275781 337434 41101028 411010280501 41101028050102
41101028 411010280806 275794 337433 41101028 411010280502 41101028050205
41101028 411010280806 275799 337449 41101028 411010280501 41101028050117
41101028 411010280802 275813 337360 41101028 411010280501 41101028050118
41202073 412020730803 295423 357020 41202016 412020160502 41202016050223
40907010 409070100801 301306 319082 40907010 409070100503 40907010050308
40302032 403020320802 304811 115482 40302015 403020150501 40302015050109
41201065 412010650803 312166 352420 41201065 412010650501 41201065050114
41201065 412010650801 312167 352379 41201065 412010650502 41201065050222
40407211 404072110802 312168 155844 40407211 404072110506 40407211050619
41102032 411020320806 316842 341658 41102032 411020320504 41102032050409
21001092 210010920801 316963  33967 11001035 110010350506 11001035050614
41104047 411040470801 323820 348473 41104047 411040470504 41104047050411
20904037 209040370801 330272  31830 10904017 109040170501 10904017050101
40410021 404100210802 330351 197162 40410021 404100210502 40410021050207
40410021 404100210801 330371 197139 40410021 404100210501 40410021050108
40901032 409010320803 333412 301381 40901032 409010320506 40901032050604
40807030 408070300801 351522 290774 40807030 408070300501 40807030050103
40605040 406050400802 358539 232540 41505064 415050640506 41505064050605
40304014 403040140801 369721 117824 40304014 403040140504 40304014050403
71205003 712050030803 391578 573646 41205045 412050450501 41205045050110
40407361 404073610804 397078 167763 40407361 404073610502 40407361050208
40205032 402050320804 397146  88062 40205032 402050320503 40205032050303
40205036 402050360803 404268  88431 40205036 402050360502 40205036050209
40604066 406040660804 411648 225216 40604066 406040660504 40604066050403
40206275 402062750805 419728 100066 40206275 402062750505 40206275050510
20902047 209020470801 424619  30803 10902028 109020280501 10902028050110
40807028 408070280805 427831 290605 40807018 408070180503 40807018050306
40407326 404073260803 444693 163945 40407326 404073260501 40407326050114
40302017 403020170804 459845 114220 40302015 403020150504 40302015050414
41203014 412030140801 468090 359028 41203014 412030140503 41203014050308
40409049 404090490802 472071 194985 40409014 404090140508 40409014050812
40905047 409050470804 494052 316293 40905047 409050470503 40905047050308
40407442 404074420805 503705 176627 40407442 404074420502 40407442050201
40808044 408080440801 505259 294941 40808044 408080440503 40808044050302
40204037 402040370802 513068  83148 40204037 402040370506 40204037050609
40407422 404074220801 521823 173988 40407422 404074220502 40407422050225
41204569 412045690803 535420 379463 11204425 112044250502 11204425050205
21204709 212047090803 535454  38454 11204425 112044250503 11204425050301
20407196 204071960801 577450  20076 10407025 104070250501 10407025050124
40604119 406041190801 577473 230620 40604117 406041170506 40604117050616
20407166 204071660801 577516  18850 10407023 104070230501 10407023050118
20804015 208040150803 577524  26988 40804044 408040440507 40804044050710
40302017 403020170801 577552 114146 40302015 403020150504 40302015050402
20407511 204075110802 577557  20121 40407460 404074600506 40407460050607
41104049 411040490801 577678 348603 11104009 111040090505 11104009050511
40301115 403011150803 577769 111810 40301115 403011150503 40301115050301
20206081 202060810802 579594   8224 10206018 102060180501 10206018050105
40407229 404072290807 580254 157342 10407034 104070340503 10407034050307
40504019 405040190803 581202 205830 40504019 405040190504 40504019050411
40504069 405040690802 581337 210323 40504066 405040660507 40504066050715
40504073 405040730801 581774 210788 40504073 405040730501 40504073050104
21204252 212042520801 584749  36655 11204093 112040930501 11204093050103
40504068 405040680805 585630 210266 40504068 405040680504 40504068050408
40406019 404060190805 586031 153201 40406019 404060190503 40406019050317
41201037 412010370803 592422 351262 11201006 112010060501 11201006050123
40602105 406021050801 604473 219435 41204545 412045450502 41204545050201
end

Announcement