Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Aggregating individual-level data to studysite-level data

    Dear all,

    I am still very new to Stata and would need your help with the following:
    I am currently trying to examine the effect of annual average sunset times on childrens test scores in 4 countries. I have individual-level data on test scores, childrens basic infos, their family background, ... (in total 1500 variables). In addition to that I have annual average sunset data for every studysite in each of the 4 countries. In total I have 31 studysites.
    Now I have to merge the sunset data with the rest of the data and aggregate it at the district level.
    I tried to use the collapse command but Stata always deletes most of my variables.
    Can someone tell me how to aggregate my data at the studysite level without loosing variables or without having to type every single variable I want to keep in the varlist?
    Also, how does Stata treat dummy variables in that case?

    Any help would be very much appreciated.

    Best,
    Barbara

    This is my sunset data for all 31 studysite:
    Region StudySite# Longitude Latitude Annual Average Sunset 2006 Round2 OC
    Addis Ababa 1 38,78553835 8,980483001 18:28:38
    Amhara 2 38,06226519 11,55349417 18:31:39
    Oromia 3 38,76559313 7,507149591 18:28:40
    SNNPR 4 36,81105223 6,466449445 18:36:27
    Tigray 5 38,44150724 13,77672819 18:30:14
    West Godavari 6 81,40801065 16,85990733 18:08:31
    Srikakulam 7 84,04901558 18,55646022 17:58:03
    YSR Kadapa 8 78,78213005 14,4198334 18:18:54
    Anantapur 9 77,57242415 14,44111801 18:23:44
    Karimnagar 10 79,27815044 18,48450185 18:17:07
    Mahbubnagar 11 78,15832288 16,45622938 18:21:30
    Hyderabad 12 78,49838124 17,33738175 18:20:11
    Tumbes 13 -80,54457718 -3,856347577 18:28:38
    Piura 14 -80,33597652 -5,127998224 18:31:39
    Amazonas 15 -78,05489176 -5,070728741 18:28:40
    San Martin 16 -76,71502707 -7,034172687 18:36:27
    Cajamarca 17 -78,74544306 -6,433003181 18:30:14
    La Libertad 18 -78,36833749 -7,921413268 18:08:31
    Ancash 19 -77,6699664 -9,407022341 17:58:03
    Huanuco 20 -76,02561441 -9,4161985 18:18:54
    Lima 21 -76,62717287 -11,78556757 18:23:44
    Junin 22 -74,87907398 -11,53821381 18:17:07
    Ayacucho 23 -74,08367452 -14,08861044 18:21:30
    Apurimac 24 -72,97537937 -14,02813587 18:20:11
    Arequipa 25 -72,47623414 -15,84292367 18:13:12
    Puno 26 -69,95071434 -14,9291106 18:20:12
    Phu Yen 27 109,0593083 13,16937219 17:47:44
    Ben Tre 28 106,465743 10,12054253 17:58:06
    Lao Cai 29 104,1128728 22,36517675 18:08:01
    Hung Yen 30 106,0604002 20,81451683 18:00:08
    Da Nang 31 108,0611039 16,06856346 17:51:52

    This is an extract of my individual-level dataset, Addid Ababa is studysite #1
    childid country round studysite typesite cdint sex age age_months hhsize dad_alive mum_alive dad_edu mum_edu score_math
    ET021042 Ethiopia 2 Addis Ababa Urban 22. Nov 06 Male 11 143 2 Yes Yes 9 3 6
    ET021025 Ethiopia 2 Addis Ababa Urban 22. Nov 06 Male 11 141 9 Yes Yes None None 4
    ET021045 Ethiopia 2 Addis Ababa Urban 23. Nov 06 Female 12 148 9 Yes Yes Universi 10 6
    ET021001 Ethiopia 2 Addis Ababa Urban 24. Nov 06 Female 11 142 8 Yes Yes 6 6 3
    ET021018 Ethiopia 2 Addis Ababa Urban 24. Nov 06 Female 12 146 8 Yes Yes Adult li 4 8
    ET021047 Ethiopia 2 Addis Ababa Urban 24. Nov 06 Female 11 143 7 Yes Yes 11 None 8
    ET021010 Ethiopia 2 Addis Ababa Urban 24. Nov 06 Male 12 144 6 Yes Yes 8 Adult li 7
    ET021023 Ethiopia 2 Addis Ababa Urban 25. Nov 06 Male 11 139 9 Yes Yes Adult li None 6
    ET021044 Ethiopia 2 Addis Ababa Urban 25. Nov 06 Female 11 143 8 No Yes 7 None 6
    ET021046 Ethiopia 2 Addis Ababa Urban 26. Nov 06 Male 11 139 8 Yes Yes 6 2 5
    ET021024 Ethiopia 2 Addis Ababa Urban 28. Nov 06 Female 12 149 5 Yes Yes 8 5 6
    ET021012 Ethiopia 2 Addis Ababa Urban 28. Nov 06 Male 11 142 7 Yes Yes 7 Adult li 9
    ET031039 Ethiopia 2 Addis Ababa Urban 29. Nov 06 Female 12 148 4 Yes Yes Religiou 3 9
    ET021043 Ethiopia 2 Addis Ababa Urban 30. Nov 06 Male 11 142 7 Yes Yes 5 5 6
    ET021020 Ethiopia 2 Addis Ababa Urban 02dec2006 Female 12 145 5 Yes Yes Universi 9 6
    ET021003 Ethiopia 2 Addis Ababa Urban 02dec2006 Male 11 142 6 Yes Yes Adult li None 6
    ET021013 Ethiopia 2 Addis Ababa Urban 02dec2006 Male 11 142 6 No Yes None Adult li 6
    ET021016 Ethiopia 2 Addis Ababa Urban 02dec2006 Male 12 144 6 Yes Yes 6 3 7
    ET021028 Ethiopia 2 Addis Ababa Urban 03dec2006 Female 11 139 2 Yes Yes 6 3 5
    ET021017 Ethiopia 2 Addis Ababa Urban 03dec2006 Male 12 149 5 Yes Yes 6 4 6
    ET021021 Ethiopia 2 Addis Ababa Urban 03dec2006 Male 12 146 7 Yes Yes 4 Adult li 7
    ET021030 Ethiopia 2 Addis Ababa Urban 03dec2006 Female 11 142 6 Yes Yes None None 0
    ET021022 Ethiopia 2 Addis Ababa Urban 04dec2006 Female 11 140 8 Yes Yes 8 2 7
    ET021006 Ethiopia 2 Addis Ababa Urban 05dec2006 Female 12 146 5 Yes Yes Universi 5 7
    ET021037 Ethiopia 2 Addis Ababa Urban 06dec2006 Female 11 142 5 Yes Yes 3 Adult li 7
    ET021032 Ethiopia 2 Addis Ababa Urban 09dec2006 Male 11 143 8 Yes Yes 4 3 8
    ET021033 Ethiopia 2 Addis Ababa Urban 09dec2006 Male 11 139 4 No Yes Adult li 4 4
    ET021007 Ethiopia 2 Addis Ababa Urban 09dec2006 Female 11 142 4 Yes Yes 12 12 8
    ET021014 Ethiopia 2 Addis Ababa Urban 10dec2006 Female 12 145 6 Yes Yes Adult li Adult li 7
    ET021034 Ethiopia 2 Addis Ababa Urban 12dec2006 Male 11 143 8 Yes Yes 8 3 7

  • #2
    I tried to use the collapse command but Stata always deletes most of my variables.
    Yes, -collapse- drops any variables that do not appear in the command's statistics list or by() option.

    So the solution is to include those variables in the command. There are two different situations:

    1. The variable is actually a constant within the levels to which you are aggregating. So if you are aggregating to district level and if you have a variable that takes the same value in every observation within any given district, then you can just add this variable to the statistics list with (first) [or (min) or (max) or (mean) or (last) if you prefer--it doesn't matter as they are the same].

    2. The variable actually varies within the aggregation level. In that case, just choosing the first for inclusion would be inappropriate because it would probably be an arbitrary choice and, in any case, it discards information. In this case, the solution is to add it to the -by()- option.

    I would illustrate with your data but I do not understand its organization; I cannot tell what identifies districts vs sites vs regions. But I hope the explanation above is clear enough that you can figure it out from there.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      It looks to me that you want to actually merge in the sunset time data into your individual-level dataset (not collapse it). You'll want to rename the variable to studysite in both datasets (or so the variable has the same name in both datasets.)

      Code:
      * This assumes named variable containing "Addis Ababa" to studysite in both datasets
      use Individual_level.dta, clear
      merge m:1 studysite using sunset_dataset.dta, keepusing(study_site_id latitude longitude avg_sunset) keep(match master) gen(merge_sunset)

      Comment

      Working...
      X