Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • converting agregate data to individual data

    DAuid tot_pop tot_male 0_4male 5_9male 10_14male 15_19male 20_24male 25_29male 30_34male 35_39male 40_44male 45_49male 50_54male 55_59male 60_64male 60_69male 70_74male 75_79male 80_84male 85_overmale total_female 0-4female 5_9female 10_14female 15_19female 20-24female 25_29female 30_34female 35_39female 40_44female 45_49female 50_54female 55_59female 60_64female 65_69female 70_74female 75_79female 80_84female 85_over_female
    10010165 540 245 5 10 15 20 20 15 10 15 15 25 35 25 20 5 5 5 0 0 295 15 15 10 25 35 20 15 10 30 30 30 25 15 5 5 0 0 5
    10010166 374 175 5 10 5 20 20 10 10 5 15 15 25 15 10 5 0 0 0 0 200 5 10 15 15 20 10 5 10 15 25 25 15 15 0 5 0 5 0
    10010167 511 250 15 15 15 15 40 25 10 10 15 25 25 15 20 5 0 0 5 0 260 10 5 10 25 30 25 15 15 15 30 30 25 15 0 5 5 0 0
    10010168 595 285 5 10 10 20 25 20 10 25 20 20 20 25 25 20 15 5 5 5 315 10 15 10 25 25 30 15 25 15 20 30 30 30 10 15 10 0 5
    10010169 326 160 5 10 15 15 10 10 10 15 15 20 10 10 5 5 0 0 0 0 170 5 5 10 20 15 10 10 10 20 15 15 10 5 5 5 0 5 0
    10010170 453 215 10 10 15 20 25 20 5 15 15 20 20 20 15 0 0 5 5 0 235 5 10 15 20 20 15 5 20 25 15 25 20 15 5 5 5 5 5
    10010171 563 260 10 15 20 30 20 20 15 10 25 30 25 10 15 5 5 5 5 0 300 5 20 25 20 35 20 10 15 35 30 20 15 20 10 5 5 0 0
    10010172 246 120 5 5 5 5 15 15 10 5 10 10 15 10 5 0 5 0 0 0 125 5 10 10 10 5 10 10 10 15 5 10 10 0 10 5 5 0 0
    10010173 984 465 20 40 45 40 40 30 35 35 50 50 35 15 15 10 10 5 0 0 515 30 35 40 40 45 50 35 35 60 60 30 20 15 10 10 10 0 0
    Hi All
    I have a agregate data in the above format. DAUID is a census dissemination area. The figures in each cell indicate number of individuals in each category. tot_pop=total population, tot_male= total male population and 0_4male: number of male between 0 and 4 years and so on.
    I want to combine this data with people with certain disease condition(confidential data, so I could not share here) to look at the effect of age and sex (among other variables) on the ocurrence of disease. The confidential data has information(age, sex) on people with disease only.

    I am trying to make a dataset like the below using the above dataset, where 0 indicates male, and 1 female (male_female) and age_gp (0 to 4 =1, 5-9=2, and so on).
    I am hoping that once I combine the disease dataset with controls, the below dataset by using DAUID (dissemination area) variable, I can see the effect of age, and sex on disease ocurrence.
    Can you please let me know the STATA data set to do this. I am using STATA 15.1.
    many thanks
    Yuba
    DAUID male_female age_gp
    10010165 0 1
    10010165 1 2
    10010165 1 6
    10010165 1 8
    10010165 0 3
    10010165 1 2
    10010165 1 5
    10010165 1 6
    10010165 1 6
    10010165 1 7
    10010165 1 8
    10010165 1 9
    10010165 1 1
    10010165 0 4
    10010165 0 5
    10010165 1 6
    10010165 1 7
    10010166 1 5
    10010166 1 3
    10010166 1 3
    10010166 1 3
    10010166 0 3
    10010166 0 3
    10010166 0 3

  • #2
    What you are asking for can be done, but it requires several steps, and to get the code right requires a usable example of the data. It appears that you have not yet even imported your data to Stata: I say that because the column headers in your data tableau are, for the most part, not legal Stata variable names. Please first import your data into Stata and then use the -dataex- command to post a usable example of your data. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    You may find that you have too many variables for -dataex-. If that turns out to be the case, I suggest you use -dataex- with only the DAUID and the age-group specific male variables. That will give an example that is workable and will illustrate the code well enough that it will not be hard for you to modify it so that it applies to your full data set.

    Comment


    • #3
      Many thanks Clyde for your suggestion. This is the dataset for male population.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(dauid tot_pop tot_male _4male _9male _14male _19male _24male _29male _34male _39male _44male _49male _54male _59male _64male _69male _74male) int(_79male _overmale)
      10010165  540 245  5 10 15 20 20 15 10 15 15 25 35 25 20  5  5  5  0
      10010166  374 175  5 10  5 20 20 10 10  5 15 15 25 15 10  5  0  0  0
      10010167  511 250 15 15 15 15 40 25 10 10 15 25 25 15 20  5  0  0  0
      10010168  595 285  5 10 10 20 25 20 10 25 20 20 20 25 25 20 15  5  5
      10010169  326 160  5 10 15 15 10 10 10 15 15 20 10 10  5  5  0  0  0
      10010170  453 215 10 10 15 20 25 20  5 15 15 20 20 20 15  0  0  5  0
      10010171  563 260 10 15 20 30 20 20 15 10 25 30 25 10 15  5  5  5  0
      10010172  246 120  5  5  5  5 15 15 10  5 10 10 15 10  5  0  5  0  0
      10010173  984 465 20 40 45 40 40 30 35 35 50 50 35 15 15 10 10  5  0
      10010182  416 195 10 10 10 10 10 15 20 20 10 20 10 15 15 15  5  5  0
      10010184 1635 795 40 45 55 45 45 45 65 65 75 70 65 60 50 35 25 10  0
      10010187  725 325 20 10 15 15 10 35 20 20 30 30 30 25 25 10 10 10  5
      10010191  841 415 20 30 35 45 25 20 25 35 35 30 30 20 20 25 10 10  0
      10010192  359 185 10 10 15 10 15 20 10 15 25 10 10 20  5 10  5  0  0
      10010193  547 270 20 15 25 25 10 15 20 15 20 20 25 25 10 10  5  0  0
      10010194  257 135  5 10  5  5 10 10 10  5 15 10 10 15  5  5  5  5  0
      10010195  669 325 20 25 15 15 20 25 15 30 40 25 35 20 15 10 10  5  0
      10010196  572 265  5  5 15 15 15 15 10 20 25 20 15 20 25 15 20 10  5
      10010198  461 225 10 10 10 10 20 10 20 10 25 25 25 15  5 10  5 10  0
      10010199  553 185 15  5  5  5  5  5 10 10  5 10 10  5  5 10 15 15 25
      end
      and data for female population is

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(dauid tot_pop total_female _4fem _9female _14female _19female _24female _29female _34female _39female _44female _49female _54female _59female _64female _69female _74female _79female _84female)
      10010165  540 295 15 15 10 25 35 20 15 10 30 30 30 25 15  5  5  0  0
      10010166  374 200  5 10 15 15 20 10  5 10 15 25 25 15 15  0  5  0  5
      10010167  511 260 10  5 10 25 30 25 15 15 15 30 30 25 15  0  5  5  0
      10010168  595 315 10 15 10 25 25 30 15 25 15 20 30 30 30 10 15 10  0
      10010169  326 170  5  5 10 20 15 10 10 10 20 15 15 10  5  5  5  0  5
      10010170  453 235  5 10 15 20 20 15  5 20 25 15 25 20 15  5  5  5  5
      10010171  563 300  5 20 25 20 35 20 10 15 35 30 20 15 20 10  5  5  0
      10010172  246 125  5 10 10 10  5 10 10 10 15  5 10 10  0 10  5  5  0
      10010173  984 515 30 35 40 40 45 50 35 35 60 60 30 20 15 10 10 10  0
      10010182  416 220  5 15 15 10 15 15 20 20 10 20 10 30 10 10  5  0  0
      10010184 1635 835 40 50 40 50 45 60 60 75 75 70 70 70 45 30 20 15 15
      10010187  725 400 15 15 15 20 25 25 30 30 40 30 40 40 20 10  5 15  5
      10010191  841 425 25 25 30 30 30 25 40 35 35 30 25 25 25 20 15 10  5
      10010192  359 170 10  5 10 10 15 10 15 15 20 10 25 15  5 10  5  5  5
      10010193  547 280 10 15 20 30 20 25 20 15 25 30 20 15 10  5  5  5  0
      10010194  257 125  5  5 10  5 10 10  5 10 10 20  5  5  5  5  5  5  5
      10010195  669 340  5 15 15 25 20 20 20 35 40 30 35 25 20 10 10 10  5
      10010196  572 305 10 10 10 15 20 15 15 10 20 25 15 25 20 25 10 25 20
      10010198  461 240  5 10 15 20 10 15 20 15 25 25 25 10  5 10 10 10 10
      10010199  553 370 10 10  5  0 10 10 15  5 15 10 10 10 15 10 15 35 60
      10010200  435 225  5 10 10 10 15 15 15 10 20 35 15 15 15 10 10  5  5
      10010201  547 270 10  5 10 15 20 10 20 30 25 35 35 15 20 10  5  5  5
      10010202  474 245  5 10 10 15 25 10 15 10 30 15 30 25 15 10  5  5  0
      10010203  415 220 10  5 15 10 10 15 10 20 20 20 25 15 15 10  5  5  5
      10010204  563 305 10 15 15 20 20 20 30 20 25 25 20 25 20 10 10  5  5
      10010206  622 375 10 20 10 20 15 20 10 20 20 20 25 15 15 30 30 50 30
      10010207 1301 690 45 30 20 40 80 85 60 60 50 55 45 35 20 20 10 15 10
      10010208 1245 705 50 45 45 55 55 60 65 50 65 55 40 45 25 10 10 15  5
      10010209  540 290  5 10 15 15 30 20 15 20 20 30 25 20 10 10  5 10  5
      10010210  431 230 10 15 10 15 20 15 20 15 25 20 10 15 20 10  5  0  5
      end

      Comment


      • #4
        Thanks. That's good. The following code converts this into a data set of individual observations, specifying for each person the dauid they come from, their age group and sex, and the numbers match what is given in your data.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long(dauid tot_pop tot_male _4male _9male _14male _19male _24male _29male _34male _39male _44male _49male _54male _59male _64male _69male _74male) int(_79male _overmale)
        10010165  540 245  5 10 15 20 20 15 10 15 15 25 35 25 20  5  5  5  0
        10010166  374 175  5 10  5 20 20 10 10  5 15 15 25 15 10  5  0  0  0
        10010167  511 250 15 15 15 15 40 25 10 10 15 25 25 15 20  5  0  0  0
        10010168  595 285  5 10 10 20 25 20 10 25 20 20 20 25 25 20 15  5  5
        10010169  326 160  5 10 15 15 10 10 10 15 15 20 10 10  5  5  0  0  0
        10010170  453 215 10 10 15 20 25 20  5 15 15 20 20 20 15  0  0  5  0
        10010171  563 260 10 15 20 30 20 20 15 10 25 30 25 10 15  5  5  5  0
        10010172  246 120  5  5  5  5 15 15 10  5 10 10 15 10  5  0  5  0  0
        10010173  984 465 20 40 45 40 40 30 35 35 50 50 35 15 15 10 10  5  0
        10010182  416 195 10 10 10 10 10 15 20 20 10 20 10 15 15 15  5  5  0
        10010184 1635 795 40 45 55 45 45 45 65 65 75 70 65 60 50 35 25 10  0
        10010187  725 325 20 10 15 15 10 35 20 20 30 30 30 25 25 10 10 10  5
        10010191  841 415 20 30 35 45 25 20 25 35 35 30 30 20 20 25 10 10  0
        10010192  359 185 10 10 15 10 15 20 10 15 25 10 10 20  5 10  5  0  0
        10010193  547 270 20 15 25 25 10 15 20 15 20 20 25 25 10 10  5  0  0
        10010194  257 135  5 10  5  5 10 10 10  5 15 10 10 15  5  5  5  5  0
        10010195  669 325 20 25 15 15 20 25 15 30 40 25 35 20 15 10 10  5  0
        10010196  572 265  5  5 15 15 15 15 10 20 25 20 15 20 25 15 20 10  5
        10010198  461 225 10 10 10 10 20 10 20 10 25 25 25 15  5 10  5 10  0
        10010199  553 185 15  5  5  5  5  5 10 10  5 10 10  5  5 10 15 15 25
        end
        tempfile males
        save `males'
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long(dauid tot_pop total_female _4fem _9female _14female _19female _24female _29female _34female _39female _44female _49female _54female _59female _64female _69female _74female _79female _84female)
        10010165  540 295 15 15 10 25 35 20 15 10 30 30 30 25 15  5  5  0  0
        10010166  374 200  5 10 15 15 20 10  5 10 15 25 25 15 15  0  5  0  5
        10010167  511 260 10  5 10 25 30 25 15 15 15 30 30 25 15  0  5  5  0
        10010168  595 315 10 15 10 25 25 30 15 25 15 20 30 30 30 10 15 10  0
        10010169  326 170  5  5 10 20 15 10 10 10 20 15 15 10  5  5  5  0  5
        10010170  453 235  5 10 15 20 20 15  5 20 25 15 25 20 15  5  5  5  5
        10010171  563 300  5 20 25 20 35 20 10 15 35 30 20 15 20 10  5  5  0
        10010172  246 125  5 10 10 10  5 10 10 10 15  5 10 10  0 10  5  5  0
        10010173  984 515 30 35 40 40 45 50 35 35 60 60 30 20 15 10 10 10  0
        10010182  416 220  5 15 15 10 15 15 20 20 10 20 10 30 10 10  5  0  0
        10010184 1635 835 40 50 40 50 45 60 60 75 75 70 70 70 45 30 20 15 15
        10010187  725 400 15 15 15 20 25 25 30 30 40 30 40 40 20 10  5 15  5
        10010191  841 425 25 25 30 30 30 25 40 35 35 30 25 25 25 20 15 10  5
        10010192  359 170 10  5 10 10 15 10 15 15 20 10 25 15  5 10  5  5  5
        10010193  547 280 10 15 20 30 20 25 20 15 25 30 20 15 10  5  5  5  0
        10010194  257 125  5  5 10  5 10 10  5 10 10 20  5  5  5  5  5  5  5
        10010195  669 340  5 15 15 25 20 20 20 35 40 30 35 25 20 10 10 10  5
        10010196  572 305 10 10 10 15 20 15 15 10 20 25 15 25 20 25 10 25 20
        10010198  461 240  5 10 15 20 10 15 20 15 25 25 25 10  5 10 10 10 10
        10010199  553 370 10 10  5  0 10 10 15  5 15 10 10 10 15 10 15 35 60
        10010200  435 225  5 10 10 10 15 15 15 10 20 35 15 15 15 10 10  5  5
        10010201  547 270 10  5 10 15 20 10 20 30 25 35 35 15 20 10  5  5  5
        10010202  474 245  5 10 10 15 25 10 15 10 30 15 30 25 15 10  5  5  0
        10010203  415 220 10  5 15 10 10 15 10 20 20 20 25 15 15 10  5  5  5
        10010204  563 305 10 15 15 20 20 20 30 20 25 25 20 25 20 10 10  5  5
        10010206  622 375 10 20 10 20 15 20 10 20 20 20 25 15 15 30 30 50 30
        10010207 1301 690 45 30 20 40 80 85 60 60 50 55 45 35 20 20 10 15 10
        10010208 1245 705 50 45 45 55 55 60 65 50 65 55 40 45 25 10 10 15  5
        10010209  540 290  5 10 15 15 30 20 15 20 20 30 25 20 10 10  5 10  5
        10010210  431 230 10 15 10 15 20 15 20 15 25 20 10 15 20 10  5  0  5
        end
        tempfile females
        save `females'
        
        use `males', clear
        merge 1:1 dauid using `females', nogenerate
        
        //    AT THIS POINT YOUR DATA SET HAS BEEN RE-CREATED.  WHAT YOU WANT TO DO
        //        STARTS HERE
        
        //    GO LONG
        rename _4fem _4female // CORRECT IRREGULARITY IN VARIABLE NAMING SYSTEM
        drop tot*
        reshape long _, i(dauid) j(age_sex) string
        rename _ population
        
        //    SEPARATE AGE AND SEX
        replace age_sex = subinstr(age_sex, "male", "_male", .)
        replace age_sex = subinstr(age_sex, "fe_", "_fe", .)
        split age_sex, parse("_") gen(v)
        rename v1 age
        rename v2 sex
        
        //    CLEAN UP THE AGE GROUPS
        gen lower_age = real(age) - 4
        replace age = string(lower_age) + "-" + age
        replace age = "over 85" if missing(lower_age)
        drop lower_age age_sex
        
        //    NOW EXPAND TO INDIVIDUAL DATA
        drop if population == 0
        expand population
        drop population
        The end result is a data set of 15,025 individual observations.

        Comment


        • #5
          many thanks Clyde, it perfectly worked.

          Yuba

          Comment

          Working...
          X