Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • percentage of new peers

    Hello,

    I have two datasets for grade 5 and grade 8 (for four cohorts). In other words, for each cohort I have data related to grade 5 and 8. There is a unique identifier and this variable is called SIDI_Invalsi in my data. I merge the two datasets based on this variable. Moreover, in my dataset I have class_id and school_id.
    Now, I have a question.
    I would like to compute the percentage of new peers for each student at grade 8.
    I would be very thankful if you could help me with this.

  • #2
    Sima:
    welcome to this forum.
    Without further details (please see the FAQ on how to poste more effectively), you may want to take a look at the -pc- function available from -egen-.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Each of my datasets at grade 5 and 8 has the same structure and the same variables. They both include students' characteristics and parents' characteristics. But, the variable I can merge the two datasets based on it is SIDI_Invalsi which is the identifier. Grade 5 is important because I want to track students at grade 5 and 8.
      Now, I want to compute when students move from primary school (grade 5) to middle school (grade 8), how many percent of their peers at grade 8 are new?

      This is the commands I have tried so far:

      use G5.dta, clear
      merge 1:1 SIDI_Invalsi using G8.dta

      after merging, 3 shows the matched observations, 2 shows using data , 1 shows master data

      gen new_peers = (_merge == 2)
      egen total_new = total(new_peers)
      drop if _merge ==1

      gen total_students = (_merge == 2 | _merge == 3)
      egen total_students_count = total(total_students)

      gen pct_new_peers = total_new / total_students_count * 100

      But, the result I get is 28%! which I think the path I tried is not correct.

      Comment


      • #4
        Sima Parkam there is a lot to fix here.

        First off, you need to help others help you. For this, providing a data extract using the -dataex- command (as suggested in the FAQ point #12) is invaluable. In this case, an extract of your merged dataset would have been a good thing to provide. Second, you need to provide more context. For example, you end #3 by saying the result you get is 28%, which is not a number that is nonsensical by itself. If it surprises you, we need to understand why.

        Your code appears to be computing new students (proxied by the fact that their grade 8 records do not merge with grade 5, so they must be new) as a fraction of total students (new + those with both grade 5 and grade 8). This you say works out to 28%. Again, why is it surprising?

        Also, if you just want the percentage, you do not need to compute new variables -- this is very inefficient. You need a single number, whereas a variable is a vector, and in your case, all the elements of it will take the same value. This can be very inefficient -- imagine you had data of millions of observations, all of which now store the same value in this column! You can work with scalars or macros instead.

        Something like this: (WARNING: untested code, since there was no data extract to test it on!)

        Code:
        sum if _merge == 2
        local total_new = r(sum)
        
        sum if inlist(_merge, 2, 3)
        local total_students = r(sum)
        
        local pct_new_peers = `total_new'/`total_students' * 100
        
        dis "The percentage of new peers is `pct_new_peers'%."

        Comment


        • #5
          I want to compute the percentage of new peers at grade 8 in order to see if there is a correlation between the scores at grade 5 and 8. When the percentage of new peers are higher, it means the possibility of having correlation between the grade 5 and 8 test scores is very low. The only variable that we can merge our data based on it is the identifier.



          My data for grade 8 look like this:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long G8_school_id double G8_class_id long(G8_student_id identifier)
          40407422 404074220805 174060   1989
          41601169 416011690803 468079   4484
          40606050 406060500803 240235   7403
          40407422 404074220801 173981   8585
          41903252 419032520801 532142   9625
          21906249 219062490804  71298  11283
          41204380 412043800806 363841  11514
          42003026 420030260802 568175  12375
          41204479 412044790803 369666  15985
          40404020 404040200802 147197  16357
          40407416 404074160804 173381  16365
          40407422 404074220807 174091  16637
          40407422 404074220801 173973  16638
          40407422 404074220802 173994  16639
          40407422 404074220806 174073  16640
          40407422 404074220803 174012  16641
          40407422 404074220803 174013  16642
          40407422 404074220805 174054  16643
          40407422 404074220804 174032  16644
          40407422 404074220803 174014  16645
          40407205 404072050805 155184  16646
          40407422 404074220807 174097  16649
          40407422 404074220807 174098  16650
          40407422 404074220803 174018  16651
          40407205 404072050805 155194  16652
          40407422 404074220804 174039  16653
          40407422 404074220807 174099  16654
          40407422 404074220802 174001  16655
          40407422 404074220803 174019  16656
          40407484 404074840802 181300  16657
          40407422 404074220807 174102  16658
          40808056 408080560804 296552  17054
          41801054 418010540804 499621  17332
          41505064 415050640805 453803  17465
          40206187 402061870807  92331  17858
          41504560 415045600801 434849  19252
          41804082 418040820804 512270  20412
          20403086 204030860801  15646  22025
          41604069 416040690803 483481  22232
          41204669 412046690806 389967  22931
          41204720 412047200803 395852  23816
          41101028 411010280805 337428  24974
          40607068 406070680802 249245  28054
          40804053 408040530802 280618  29062
          40809034 408090340801 299127  29836
          20411029 204110290801  20917  32538
          21504680 215046800806  43299  37144
          40402110 404021100803 135727  39316
          40407292 404072920802 160088  40758
          40407408 404074080803 172488  40908
          40402052 404020520802 130413  43406
          40607065 406070650804 248940  73230
          20407159 204071590802  18334  80357
          41801035 418010350804 498786  87109
          40206318 402063180808 101273 132497
          41001094 410010940803 331576 137591
          40206192 402061920801  92573 154654
          41906244 419062440808 548370 163438
          40801087 408010870806 269607 169227
          40302023 403020230802 114627 177577
          41204646 412046460808 387308 203048
          40301115 403011150802 111794 206281
          41909063 419090630803 559369 209339
          20202025 202020250812 576782 209445
          40407402 404074020802 171924 210449
          40910023 409100230803 328666 210599
          40402173 404021730801 140954 212979
          41204487 412044870807 370461 213733
          40801095 408010950801 270311 214383
          40407341 404073410801 165697 214422
          40204073 402040730804  85817 214548
          41804096 418040960804 513621 214989
          40202021 402020210803  80019 215781
          41907048 419070480803 551657 216369
          41702056 417020560801 495890 216790
          41504562 415045620804 435050 221334
          21504302 215043020801  42700 221340
          20407118 204071180802  16545 222430
          41505064 415050640805 453802 222494
          40606123 406061230803 247357 222631
          40407501 404075010805 183206 223321
          40704041 407040410801 262697 223896
          40411117 404111170805 204840 225619
          42001086 420010860804 561874 225738
          40206345 402063450805 104729 226440
          40407542 404075420810 187603 227729
          41906256 419062560802 548808 230442
          40701010 407010100803 256019 230471
          41801114 418011140801 502238 230477
          40401108 404011080807 124695 231636
          41204727 412047270803 396674 231726
          41602060 416020600801 474752 231787
          21503222 215032220811  42058 234672
          41204517 412045170806 373828 238415
          40204087 402040870803  87298 238584
          40909035 409090350804 327191 238630
          41504562 415045620807 435115 238989
          40802027 408020270801 275638 239022
          40909028 409090280802 326621 239141
          40205036 402050360801  88399 239204
          end

          and this is the data related to grade 5:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long G5_school_id double(G5_class_id G5_student_id) long identifier
          10101001 101010010501 10101001050102 5450035
          10101001 101010010501 10101001050103 5450036
          10101001 101010010501 10101001050104 5450038
          10101001 101010010501 10101001050105 5450039
          10101001 101010010501 10101001050106 5450042
          10101001 101010010501 10101001050107 5450043
          10101001 101010010501 10101001050108 5450044
          10101001 101010010501 10101001050109 5449903
          10101001 101010010501 10101001050110 5449881
          10101001 101010010501 10101001050112 5450040
          10101001 101010010501 10101001050113 5450045
          10101001 101010010501 10101001050114 5450046
          10101001 101010010501 10101001050115 5450047
          10101001 101010010501 10101001050116 5450048
          10101001 101010010501 10101001050117 5449891
          10101001 101010010501 10101001050118 5450049
          10101001 101010010501 10101001050119 5450050
          10101001 101010010501 10101001050120 5450051
          10101001 101010010501 10101001050121 5450052
          10101001 101010010501 10101001050122 5449882
          10101001 101010010501 10101001050123 5450053
          10101001 101010010501 10101001050124 5450055
          10101001 101010010501 10101001050125 5450054
          10101001 101010010501 10101001050126 5450056
          10101001 101010010501 10101001050127 5449883
          10101003 101010030501 10101003050101 5450063
          10101003 101010030501 10101003050102 5450069
          10101003 101010030501 10101003050104 5450057
          10101003 101010030501 10101003050105 5450060
          10101003 101010030501 10101003050106 5450059
          10101003 101010030501 10101003050107 5450061
          10101003 101010030501 10101003050108 5450062
          10101003 101010030501 10101003050110 5450065
          10101003 101010030501 10101003050111 5449801
          10101003 101010030501 10101003050112 5450066
          10101003 101010030501 10101003050113 5450067
          10101003 101010030501 10101003050114 5450068
          10101003 101010030501 10101003050115 5450071
          10101003 101010030501 10101003050116 5450070
          10101003 101010030501 10101003050117 5450072
          10101003 101010030501 10101003050120 5450074
          10101003 101010030501 10101003050121 5450075
          10101003 101010030501 10101003050122 5449798
          10201001 102010010501 10201001050101 4954445
          10201001 102010010501 10201001050102 4954446
          10201001 102010010501 10201001050103 4954447
          10201001 102010010501 10201001050104 4954451
          10201001 102010010501 10201001050105 4954455
          10201001 102010010501 10201001050106 4954456
          10201001 102010010501 10201001050107 4954457
          10201001 102010010501 10201001050108 4954458
          10201001 102010010501 10201001050109 4954460
          10201001 102010010501 10201001050110 5248133
          10201001 102010010501 10201001050111 5079060
          10201001 102010010501 10201001050112 4954464
          10201001 102010010501 10201001050113 4954465
          10201001 102010010501 10201001050114 4954468
          10201001 102010010501 10201001050115 4954469
          10201001 102010010501 10201001050116 4954474
          10201001 102010010501 10201001050117 4954475
          10201001 102010010501 10201001050118 4954477
          10201001 102010010501 10201001050119 4918468
          10201001 102010010501 10201001050120 5173699
          10201001 102010010501 10201001050121 4954480
          10201001 102010010501 10201001050122 5122649
          10201001 102010010501 10201001050123 4954482
          10201001 102010010501 10201001050124 4992265
          10201001 102010010501 10201001050125 4954484
          10201001 102010010501 10201001050126 5395421
          10201001 102010010501 10201001050127 4954486
          10201001 102010010501 10201001050128 4954488
          10201001 102010010502 10201001050201 4954448
          10201001 102010010502 10201001050202 4954449
          10201001 102010010502 10201001050203 4954450
          10201001 102010010502 10201001050204 4954452
          10201001 102010010502 10201001050205 4954453
          10201001 102010010502 10201001050206 5021983
          10201001 102010010502 10201001050207 4954454
          10201001 102010010502 10201001050208 5248132
          10201001 102010010502 10201001050209 4954459
          10201001 102010010502 10201001050210 4954461
          10201001 102010010502 10201001050211 5021995
          10201001 102010010502 10201001050212 5442573
          10201001 102010010502 10201001050213 4954463
          10201001 102010010502 10201001050214 5403648
          10201001 102010010502 10201001050215 4956975
          10201001 102010010502 10201001050216 4954466
          10201001 102010010502 10201001050217 4954467
          10201001 102010010502 10201001050218 4954470
          10201001 102010010502 10201001050219 4954472
          10201001 102010010502 10201001050220 4954473
          10201001 102010010502 10201001050221 4954476
          10201001 102010010502 10201001050222 4918467
          10201001 102010010502 10201001050223 4954478
          10201001 102010010502 10201001050224 4954479
          10201001 102010010502 10201001050225 4954481
          10201001 102010010502 10201001050226 4954485
          10201001 102010010502 10201001050228 4954487
          10201003 102010030501 10201003050101 5233110
          10201003 102010030501 10201003050102 4999486
          end

          I hope I could provide the data example properly.

          Comment


          • #6
            I'm afraid this data example is not helpful. Neither of the two datasets contain the variable you are using to merge them: SIDI_Invalsi

            Also, from what I can see, the identifiers present in the data extract take very different values across the two datasets, so even if it is one of the variables in the extract, the merge is not likely to produce any matched observations.

            Please provide a better data extract, perhaps using an if condition to subset to the same school. Or, as I suggested before, it may be simpler to provide an extract of the merged data, rather than the original grade 5 and 8 files.

            A good way to check whether the data extract will be useful to others is to put yourself in their shoes: take your data extract like any of us would have to, run your code on it, and see if it demonstrates the problem you are trying to solve.
            Last edited by Hemanshu Kumar; 28 Jul 2023, 01:45.

            Comment


            • #7
              This is the merged data only for mathematics (merged data of grade 5 and 8). I hope this time it works. I have the same merged data for reading and I need to merge these two merged datasets together.
              My question is how can I then compute the percentage of new peers at grade 8?

              Is it true if I use:

              Code:
              * egen peers_G5 = group(G5_CODICE_SCUOLA G5_CODICE_CLASSE)
              egen peers_G8 = group(G8_CODICE_SCUOLA G8_CODICE_CLASSE)
              
              bysort peers_G8: egen total_peers_G8 = count(SIDI_Invalsi)
              bysort peers_G5 peers_G8: egen new_peers = total(peers_G5 != peers_G8)
              gen pct_new_peers = new_peers / total_peers_G8 * 100
              
              end
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input long G8_CODICE_SCUOLA double G8_CODICE_CLASSE long(SIDI_Invalsi G8_CODICE_STUDENTE G5_CODICE_SCUOLA) double(G5_CODICE_CLASSE G5_CODICE_STUDENTE)
              40407422 404074220805   1989 174060 40407422 404074220502 40407422050208
              40407422 404074220801   8585 173981 40407422 404074220501 40407422050113
              41204380 412043800806  11514 363841 41204699 412046990502 41204699050205
              40404020 404040200802  16357 147197 40404020 404040200502 40404020050204
              40407416 404074160804  16365 173381 40407416 404074160505 40407416050527
              40407422 404074220807  16637 174091 40407422 404074220501 40407422050102
              40407422 404074220801  16638 173973 40407422 404074220502 40407422050201
              40407422 404074220802  16639 173994 40407422 404074220501 40407422050103
              40407422 404074220806  16640 174073 40407422 404074220503 40407422050301
              40407422 404074220803  16641 174012 40407422 404074220502 40407422050203
              40407422 404074220803  16642 174013 40407422 404074220501 40407422050105
              40407422 404074220805  16643 174054 40407422 404074220501 40407422050106
              40407422 404074220804  16644 174032 40407422 404074220503 40407422050302
              40407422 404074220803  16645 174014 40407422 404074220501 40407422050107
              40407422 404074220807  16649 174097 40407422 404074220502 40407422050205
              40407422 404074220807  16650 174098 40407422 404074220502 40407422050206
              40407422 404074220803  16651 174018 40407422 404074220501 40407422050114
              40407205 404072050805  16652 155194 40407422 404074220503 40407422050309
              40407422 404074220804  16653 174039 40407422 404074220501 40407422050115
              40407422 404074220807  16654 174099 40407422 404074220501 40407422050116
              40407422 404074220802  16655 174001 40407422 404074220503 40407422050310
              40407422 404074220803  16656 174019 40407422 404074220503 40407422050311
              40407422 404074220807  16658 174102 40407422 404074220501 40407422050118
              20403086 204030860801  22025  15646 10403007 104030070501 10403007050116
              41204669 412046690806  22931 389967 41204482 412044820505 41204482050521
              41204720 412047200803  23816 395852 41204720 412047200503 41204720050309
              41101028 411010280805  24974 337428 41101028 411010280502 41101028050212
              40607068 406070680802  28054 249245 40607068 406070680503 40607068050302
              40407408 404074080803  40908 172488 40407408 404074080503 40407408050321
              40206192 402061920801 154654  92573 40206192 402061920504 40206192050409
              40301115 403011150802 206281 111794 40301115 403011150502 40301115050211
              40801095 408010950801 214383 270311 40801095 408010950503 40801095050302
              40407341 404073410801 214422 165697 40407341 404073410501 40407341050116
              40606123 406061230803 222631 247357 40606123 406061230501 40606123050101
              40206345 402063450805 226440 104729 40206242 402062420503 40206242050310
              40407542 404075420810 227729 187603 40407233 404072330504 40407233050417
              41204517 412045170806 238415 373828 41204584 412045840506 41204584050613
              40204087 402040870803 238584  87298 10204010 102040100506 10204010050604
              40410033 404100330804 242024 197938 40410033 404100330502 40410033050201
              40302027 403020270806 251180 115061 40302027 403020270502 40302027050211
              40203016 402030160804 266801  81322 40203014 402030140501 40203014050113
              40409029 404090290801 272463 193302 40409029 404090290503 40409029050320
              41101028 411010280806 275779 337432 41101028 411010280502 41101028050204
              41101028 411010280806 275781 337434 41101028 411010280501 41101028050102
              41101028 411010280806 275794 337433 41101028 411010280502 41101028050205
              41101028 411010280806 275799 337449 41101028 411010280501 41101028050117
              41101028 411010280802 275813 337360 41101028 411010280501 41101028050118
              41202073 412020730803 295423 357020 41202016 412020160502 41202016050223
              40907010 409070100801 301306 319082 40907010 409070100503 40907010050308
              40302032 403020320802 304811 115482 40302015 403020150501 40302015050109
              41201065 412010650803 312166 352420 41201065 412010650501 41201065050114
              41201065 412010650801 312167 352379 41201065 412010650502 41201065050222
              40407211 404072110802 312168 155844 40407211 404072110506 40407211050619
              41102032 411020320806 316842 341658 41102032 411020320504 41102032050409
              21001092 210010920801 316963  33967 11001035 110010350506 11001035050614
              41104047 411040470801 323820 348473 41104047 411040470504 41104047050411
              20904037 209040370801 330272  31830 10904017 109040170501 10904017050101
              40410021 404100210802 330351 197162 40410021 404100210502 40410021050207
              40410021 404100210801 330371 197139 40410021 404100210501 40410021050108
              40901032 409010320803 333412 301381 40901032 409010320506 40901032050604
              40807030 408070300801 351522 290774 40807030 408070300501 40807030050103
              40605040 406050400802 358539 232540 41505064 415050640506 41505064050605
              40304014 403040140801 369721 117824 40304014 403040140504 40304014050403
              71205003 712050030803 391578 573646 41205045 412050450501 41205045050110
              40407361 404073610804 397078 167763 40407361 404073610502 40407361050208
              40205032 402050320804 397146  88062 40205032 402050320503 40205032050303
              40205036 402050360803 404268  88431 40205036 402050360502 40205036050209
              40604066 406040660804 411648 225216 40604066 406040660504 40604066050403
              40206275 402062750805 419728 100066 40206275 402062750505 40206275050510
              20902047 209020470801 424619  30803 10902028 109020280501 10902028050110
              40807028 408070280805 427831 290605 40807018 408070180503 40807018050306
              40407326 404073260803 444693 163945 40407326 404073260501 40407326050114
              40302017 403020170804 459845 114220 40302015 403020150504 40302015050414
              41203014 412030140801 468090 359028 41203014 412030140503 41203014050308
              40409049 404090490802 472071 194985 40409014 404090140508 40409014050812
              40905047 409050470804 494052 316293 40905047 409050470503 40905047050308
              40407442 404074420805 503705 176627 40407442 404074420502 40407442050201
              40808044 408080440801 505259 294941 40808044 408080440503 40808044050302
              40204037 402040370802 513068  83148 40204037 402040370506 40204037050609
              40407422 404074220801 521823 173988 40407422 404074220502 40407422050225
              41204569 412045690803 535420 379463 11204425 112044250502 11204425050205
              21204709 212047090803 535454  38454 11204425 112044250503 11204425050301
              20407196 204071960801 577450  20076 10407025 104070250501 10407025050124
              40604119 406041190801 577473 230620 40604117 406041170506 40604117050616
              20407166 204071660801 577516  18850 10407023 104070230501 10407023050118
              20804015 208040150803 577524  26988 40804044 408040440507 40804044050710
              40302017 403020170801 577552 114146 40302015 403020150504 40302015050402
              20407511 204075110802 577557  20121 40407460 404074600506 40407460050607
              41104049 411040490801 577678 348603 11104009 111040090505 11104009050511
              40301115 403011150803 577769 111810 40301115 403011150503 40301115050301
              20206081 202060810802 579594   8224 10206018 102060180501 10206018050105
              40407229 404072290807 580254 157342 10407034 104070340503 10407034050307
              40504019 405040190803 581202 205830 40504019 405040190504 40504019050411
              40504069 405040690802 581337 210323 40504066 405040660507 40504066050715
              40504073 405040730801 581774 210788 40504073 405040730501 40504073050104
              21204252 212042520801 584749  36655 11204093 112040930501 11204093050103
              40504068 405040680805 585630 210266 40504068 405040680504 40504068050408
              40406019 404060190805 586031 153201 40406019 404060190503 40406019050317
              41201037 412010370803 592422 351262 11201006 112010060501 11201006050123
              40602105 406021050801 604473 219435 41204545 412045450502 41204545050201
              end

              Comment

              Working...
              X