
  • Create preference order for source of data

    Hello everyone,

    I have occupational data for a large set of countries over time that come from different types of sources.
    The three types are "Labour force survey", "Population census" and "Other household survey".
    My preferred data source is the labour force survey, so if it is available I would like to drop the other two types.
    The second preference is the population census and the third is other household surveys.

    Now the countries differ in their data sources. I attached the example of Botswana below, where all three types of sources are available. In this case I would like to keep only "Labour force survey" and drop the other two. For other countries I have only "Population census" and "Other household survey", so I would like to keep only "Population census" and drop "Other household survey".

    So the order of preference would be
    "Labour force survey" > "Population census" > "Other household survey"

    Is there code to make this selection automatically?

    Thanks and kind regards,
    Jonas



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str33 country str62 occupation str35 typeofsource
    "Botswana" "1. Legislators, senior officials and managers (ISCO-88)"        "Other household survey"
    "Botswana" "2. Professionals (ISCO-88)"                                     "Other household survey"
    "Botswana" "3. Technicians and associate professionals (ISCO-88)"           "Other household survey"
    "Botswana" "4. Clerks (ISCO-88)"                                            "Other household survey"
    "Botswana" "5. Service workers and shop and market sales workers (ISCO-88)" "Other household survey"
    "Botswana" "6. Skilled agricultural and fishery workers (ISCO-88)"          "Other household survey"
    "Botswana" "7. Craft and related trades workers (ISCO-88)"                  "Other household survey"
    "Botswana" "8. Plant and machine operators and assemblers (ISCO-88)"        "Other household survey"
    "Botswana" "9. Elementary occupations (ISCO-88)"                            "Other household survey"
    "Botswana" "Total (ISCO-88)"                                                "Other household survey"
    "Botswana" "X. Not elsewhere classified (ISCO-88)"                          "Other household survey"
    "Botswana" "1. Legislators, senior officials and managers (ISCO-88)"        "Labour force survey"   
    "Botswana" "2. Professionals (ISCO-88)"                                     "Labour force survey"   
    "Botswana" "3. Technicians and associate professionals (ISCO-88)"           "Labour force survey"   
    "Botswana" "4. Clerks (ISCO-88)"                                            "Labour force survey"   
    "Botswana" "5. Service workers and shop and market sales workers (ISCO-88)" "Labour force survey"   
    "Botswana" "6. Skilled agricultural and fishery workers (ISCO-88)"          "Labour force survey"   
    "Botswana" "7. Craft and related trades workers (ISCO-88)"                  "Labour force survey"   
    "Botswana" "8. Plant and machine operators and assemblers (ISCO-88)"        "Labour force survey"   
    "Botswana" "9. Elementary occupations (ISCO-88)"                            "Labour force survey"   
    "Botswana" "Total (ISCO-88)"                                                "Labour force survey"   
    "Botswana" "X. Not elsewhere classified (ISCO-88)"                          "Labour force survey"   
    "Botswana" "1. Legislators, senior officials and managers (ISCO-88)"        "Population census"     
    "Botswana" "2. Professionals (ISCO-88)"                                     "Population census"     
    "Botswana" "3. Technicians and associate professionals (ISCO-88)"           "Population census"     
    "Botswana" "4. Clerks (ISCO-88)"                                            "Population census"     
    "Botswana" "5. Service workers and shop and market sales workers (ISCO-88)" "Population census"     
    "Botswana" "6. Skilled agricultural and fishery workers (ISCO-88)"          "Population census"     
    "Botswana" "7. Craft and related trades workers (ISCO-88)"                  "Population census"     
    "Botswana" "8. Plant and machine operators and assemblers (ISCO-88)"        "Population census"     
    "Botswana" "9. Elementary occupations (ISCO-88)"                            "Population census"     
    "Botswana" "Total (ISCO-88)"                                                "Population census"     
    "Botswana" "X. Not elsewhere classified (ISCO-88)"                          "Population census"     
    end

  • #2
    Starting with your example data, the following example code may start you in a useful direction.
    Code:
    . generate byte priority = 0
    
    . replace priority = 3 if typeofsource == "Other household survey"
    (11 real changes made)
    
    . replace priority = 2 if typeofsource == "Labour force survey"
    (11 real changes made)
    
    . replace priority = 1 if typeofsource == "Population census"
    (11 real changes made)
    
    . assert priority > 0
    
    . 
    . bysort country occupation (priority): keep if _n==1
    (22 observations deleted)
    
    . list, clean noobs abbreviate(20)
    
         country                                                       occupation        typeofsource   priority  
        Botswana          1. Legislators, senior officials and managers (ISCO-88)   Population census          1  
        Botswana                                       2. Professionals (ISCO-88)   Population census          1  
        Botswana             3. Technicians and associate professionals (ISCO-88)   Population census          1  
        Botswana                                              4. Clerks (ISCO-88)   Population census          1  
        Botswana   5. Service workers and shop and market sales workers (ISCO-88)   Population census          1  
        Botswana            6. Skilled agricultural and fishery workers (ISCO-88)   Population census          1  
        Botswana                    7. Craft and related trades workers (ISCO-88)   Population census          1  
        Botswana          8. Plant and machine operators and assemblers (ISCO-88)   Population census          1  
        Botswana                              9. Elementary occupations (ISCO-88)   Population census          1  
        Botswana                                                  Total (ISCO-88)   Population census          1  
        Botswana                            X. Not elsewhere classified (ISCO-88)   Population census          1
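
The `assert priority > 0` step in the log above is worth keeping in any adaptation: string comparison with `==` is exact, so a spelling variant such as "Labor force survey" would match none of the `replace` rules and silently stay at 0. The sketch below illustrates that check on three made-up rows, with the priorities numbered in the preference order stated in #1 (lower number means more preferred). The third row is a hypothetical misspelling included only to show the check finding something; `count` is used instead of `assert` so the example runs to the end.

```stata
clear
input str10 country str35 typeofsource
"Botswana" "Labour force survey"
"Botswana" "Population census"
"Botswana" "Labor force survey"
end

* Priorities follow the stated preference: 1 = most preferred
generate byte priority = 0
replace priority = 1 if typeofsource == "Labour force survey"
replace priority = 2 if typeofsource == "Population census"
replace priority = 3 if typeofsource == "Other household survey"

* Count and show the rows that no rule matched
count if priority == 0
list typeofsource if priority == 0, noobs
```

Any row listed here needs its label cleaned before the `bysort ... keep` step is trusted.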



    • #3
      Thanks a lot William! This is definitely going in the right direction and the code works perfectly. However, I encountered one problem: for some countries, there are two different surveys that both fall into the same category but cover different time periods. With the code
      Code:
       bysort country occupation (priority): keep if _n==1
      it now keeps only the first survey and drops the second one, although it has the same priority number. I would like to keep both, but only if they are from the same type of survey.
      This is, for example, the case for the Czech Republic:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str33 country str62 occupation str76 survey str35 typeofsource
      "Czech Republic" "0. Armed forces (ISCO-88)"                                      "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "1. Legislators, senior officials and managers (ISCO-88)"        "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "2. Professionals (ISCO-88)"                                     "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "3. Technicians and associate professionals (ISCO-88)"           "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "4. Clerks (ISCO-88)"                                            "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "5. Service workers and shop and market sales workers (ISCO-88)" "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "6. Skilled agricultural and fishery workers (ISCO-88)"          "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "7. Craft and related trades workers (ISCO-88)"                  "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "8. Plant and machine operators and assemblers (ISCO-88)"        "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "9. Elementary occupations (ISCO-88)"                            "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "X. Not elsewhere classified (ISCO-88)"                          "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "Total (ISCO-88)"                                                "Labour Force Sample Survey" "Labour force survey"
      "Czech Republic" "0. Armed forces (ISCO-88)"                                      "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "1. Legislators, senior officials and managers (ISCO-88)"        "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "2. Professionals (ISCO-88)"                                     "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "3. Technicians and associate professionals (ISCO-88)"           "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "4. Clerks (ISCO-88)"                                            "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "5. Service workers and shop and market sales workers (ISCO-88)" "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "6. Skilled agricultural and fishery workers (ISCO-88)"          "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "7. Craft and related trades workers (ISCO-88)"                  "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "8. Plant and machine operators and assemblers (ISCO-88)"        "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "9. Elementary occupations (ISCO-88)"                            "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "X. Not elsewhere classified (ISCO-88)"                          "EU Labour Force Survey"     "Labour force survey"
      "Czech Republic" "Total (ISCO-88)"                                                "EU Labour Force Survey"     "Labour force survey"
      end

      Any ideas on that? :-)



      • #4
        However, I encountered one problem: In the case of some countries, there are two different surveys that both fall into the same category, but cover different time periods. ... Any ideas on that?
        Yes, my idea is that once the problem of different surveys of the same priority covering different time periods is solved, you will next find a country which has different surveys of different priorities covering different time periods, and you will realize that what you want to do is look at your data by country and time period and retain the highest priority survey for a country in each separate time period. :-)

        Please repost the data from the Czech Republic and include whatever variable or variables indicate the time period.



        • #5
          Yes, exactly. Here is the example for the Czech Republic again, including the time period. (For further steps in my analysis I kept it in wide format.)
          For clarification, I always want to keep only one type of source per country. As both surveys for the Czech Republic are labour force surveys, I want to keep both, but not merge them, as the sources are different.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str33 country str62 occupation str76 survey str35 typeofsource double(yr1990 yr1991 yr1992 yr1993 yr1994 yr1995 yr1996 yr1997 yr1998 yr1999 yr2000 yr2001 yr2002 yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010)
          "Czech Republic" "0. Armed forces (ISCO-88)"                                      "Labour Force Sample Survey" "Labour force survey" . . .  1.5  1.3  1.1  1.2    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "1. Legislators, senior officials and managers (ISCO-88)"        "Labour Force Sample Survey" "Labour force survey" . . .  4.4  5.2  6.2  6.7    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "2. Professionals (ISCO-88)"                                     "Labour Force Sample Survey" "Labour force survey" . . .  9.2  8.8  9.4  9.4    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "3. Technicians and associate professionals (ISCO-88)"           "Labour Force Sample Survey" "Labour force survey" . . . 17.9 18.2 17.9 17.9    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "4. Clerks (ISCO-88)"                                            "Labour Force Sample Survey" "Labour force survey" . . .  7.4  7.6  7.6  7.8    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "5. Service workers and shop and market sales workers (ISCO-88)" "Labour Force Sample Survey" "Labour force survey" . . . 10.6 11.2 11.2 11.4    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "6. Skilled agricultural and fishery workers (ISCO-88)"          "Labour Force Sample Survey" "Labour force survey" . . .  2.6  2.5  2.5  2.4    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "7. Craft and related trades workers (ISCO-88)"                  "Labour Force Sample Survey" "Labour force survey" . . . 22.9 22.2 21.6 21.1    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "8. Plant and machine operators and assemblers (ISCO-88)"        "Labour Force Sample Survey" "Labour force survey" . . . 13.2 13.3 12.9 12.8    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "9. Elementary occupations (ISCO-88)"                            "Labour Force Sample Survey" "Labour force survey" . . . 10.2  9.5  9.5  9.2    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "X. Not elsewhere classified (ISCO-88)"                          "Labour Force Sample Survey" "Labour force survey" . . .   .2   .2   .1   .1    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "Total (ISCO-88)"                                                "Labour Force Sample Survey" "Labour force survey" . . .  100  100  100  100    .    .    .    .    .    .    .    .    .    .    .    .    .    .
          "Czech Republic" "0. Armed forces (ISCO-88)"                                      "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .   .3   .2   .3   .4   .3   .3   .3   .3   .3   .4   .3   .3   .3   .3
          "Czech Republic" "1. Legislators, senior officials and managers (ISCO-88)"        "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  6.7  6.7  6.8  6.2  6.4  6.4  6.2  6.4  6.2  6.6  6.7  6.7  5.9  5.4
          "Czech Republic" "2. Professionals (ISCO-88)"                                     "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  9.7  9.6  9.9 10.9 10.8 10.3 10.2 10.5 10.8 10.7   11 11.1 11.7 10.5
          "Czech Republic" "3. Technicians and associate professionals (ISCO-88)"           "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    . 18.2 18.1 18.3 18.6 19.3   19 20.2 20.4 21.9   22 22.4 22.8   24 24.9
          "Czech Republic" "4. Clerks (ISCO-88)"                                            "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  8.1  8.1  7.9  7.7  7.9  8.5  8.1  7.9  7.5    7    7  7.1  7.4  7.9
          "Czech Republic" "5. Service workers and shop and market sales workers (ISCO-88)" "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    . 11.8 12.4 12.3   12 12.1 12.5 12.5 12.3 12.1 12.1 11.7 11.6 11.9 12.4
          "Czech Republic" "6. Skilled agricultural and fishery workers (ISCO-88)"          "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  2.2  2.2  2.1  2.1    2  1.9  1.9  1.8  1.6  1.5  1.5  1.4  1.3  1.4
          "Czech Republic" "7. Craft and related trades workers (ISCO-88)"                  "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    . 21.3 21.3 21.2 20.8 19.8 19.7 19.7 19.5 18.7 18.2 18.6 18.7 17.6 17.4
          "Czech Republic" "8. Plant and machine operators and assemblers (ISCO-88)"        "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    . 12.8 12.8   13   13 15.2 15.2   15 15.2 15.4   16 15.6 15.1 14.6 14.9
          "Czech Republic" "9. Elementary occupations (ISCO-88)"                            "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  8.7  8.7  8.3  8.3  6.2  6.1  5.8  5.7  5.6  5.6  5.3  5.4  5.4    5
          "Czech Republic" "X. Not elsewhere classified (ISCO-88)"                          "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .   .1    0    0   .1    0   .1    0    0   .1    0    0    0    0    0
          "Czech Republic" "Total (ISCO-88)"                                                "EU Labour Force Survey"     "Labour force survey" . . .    .    .    .    .  100  100  100  100  100  100  100  100  100  100  100  100  100  100
          end



          • #6
            First, let me apologize - the code in post #2 assigned the priorities in the wrong order - Labour force survey should have been 1, not 2.

            Consider the following made-up example based on the format shown in post #5. It's something of a worst-case scenario, but it demonstrates a technique that reduces the results to at most one non-missing value per year. When multiple values are available in a given year, it breaks ties first by priority and then alphabetically by survey name.
            Code:
            cls
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str6 country str15 occupation str19(survey typeofsource) byte(yr1995 yr1996 yr1997 yr1998 yr1999)
            "Mordor" "Total (ISCO-88)" "Mordor CPS"          "Population census"   100 100   .   .   .
            "Mordor" "Total (ISCO-88)" "Mordor ACS"          "Population census"     . 100 100 100 100
            "Mordor" "Total (ISCO-88)" "Middle Earth Labour" "Labour force survey"   .   .   .   . 100
            end
            
            reshape long yr, i(country occupation survey) j(year)
            rename yr value
            order country year
            drop if missing(value)
            
            generate byte priority = 0
            replace priority = 3 if typeofsource == "Other household survey"
            replace priority = 2 if typeofsource == "Population census"
            replace priority = 1 if typeofsource == "Labour force survey"
            assert priority > 0
            
            sort country year occupation priority survey
            list, noobs sepby(country year occupation)
            bysort country year occupation (priority survey): keep if _n==1
            list, noobs 
            
            drop priority
            rename value yr
            reshape wide yr, i(country occupation survey) j(year)
            list, noobs
            Code:
            . sort country year occupation priority survey
            
            . list, noobs sepby(country year occupation)
            
              +-------------------------------------------------------------------------------------------------+
              | country   year        occupation                survey          typeofsource   value   priority |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1995   Total (ISCO-88)            Mordor CPS     Population census     100          2 |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1996   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |  Mordor   1996   Total (ISCO-88)            Mordor CPS     Population census     100          2 |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1997   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1998   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1999   Total (ISCO-88)   Middle Earth Labour   Labour force survey     100          1 |
              |  Mordor   1999   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              +-------------------------------------------------------------------------------------------------+
            Code:
            . by country year occupation (priority survey): keep if _n==1
            (2 observations deleted)
            
            . list, noobs 
            
              +-------------------------------------------------------------------------------------------------+
              | country   year        occupation                survey          typeofsource   value   priority |
              |-------------------------------------------------------------------------------------------------|
              |  Mordor   1995   Total (ISCO-88)            Mordor CPS     Population census     100          2 |
              |  Mordor   1996   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |  Mordor   1997   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |  Mordor   1998   Total (ISCO-88)            Mordor ACS     Population census     100          2 |
              |  Mordor   1999   Total (ISCO-88)   Middle Earth Labour   Labour force survey     100          1 |
              +-------------------------------------------------------------------------------------------------+
            Code:
            . list, noobs
            
              +--------------------------------------------------------------------------------------------------------------------+
              | country        occupation                survey          typeofsource   yr1995   yr1996   yr1997   yr1998   yr1999 |
              |--------------------------------------------------------------------------------------------------------------------|
              |  Mordor   Total (ISCO-88)   Middle Earth Labour   Labour force survey        .        .        .        .      100 |
              |  Mordor   Total (ISCO-88)            Mordor ACS     Population census        .      100      100      100        . |
              |  Mordor   Total (ISCO-88)            Mordor CPS     Population census      100        .        .        .        . |
              +--------------------------------------------------------------------------------------------------------------------+
            Now, I did this by starting with a reshape long that made the problem trivial and ending with a reshape wide. But were it my data, I would likely have continued to work with the data in a long layout. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data.
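
As a small illustration of staying in the long layout, the sketch below enters a subset of the made-up Mordor rows directly in long form (with `priority` already assigned as in the code above), applies the same selection, and then uses `isid` to confirm that each country/occupation/year combination is left with exactly one row. Nothing beyond the values in this post is assumed.

```stata
clear
input str6 country str15 occupation str19(survey typeofsource) int year byte(value priority)
"Mordor" "Total (ISCO-88)" "Mordor CPS"          "Population census"   1995 100 2
"Mordor" "Total (ISCO-88)" "Mordor CPS"          "Population census"   1996 100 2
"Mordor" "Total (ISCO-88)" "Mordor ACS"          "Population census"   1996 100 2
"Mordor" "Total (ISCO-88)" "Mordor ACS"          "Population census"   1999 100 2
"Mordor" "Total (ISCO-88)" "Middle Earth Labour" "Labour force survey" 1999 100 1
end

* Within each country/year/occupation, keep the best priority;
* ties are broken alphabetically by survey name
bysort country year occupation (priority survey): keep if _n == 1

* isid exits with an error if the listed variables do not uniquely identify rows
isid country occupation year
list country year survey typeofsource, noobs sepby(country)
```

Because `isid` stops with an error when duplicates remain, it works as a cheap guard before any later merge or analysis step.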



            • #7
              Thank you so much for putting effort into this and setting up the example case, William! Unfortunately that doesn't solve the problem completely.
              Maybe I did not explain it sufficiently.
              Generally, my preferred data source is the labour force survey. If this source is present for a country, I want to drop every other type of source, so that each country always has only one type of source.
              In your Mordor example, I would have liked to drop every observation that is not from a labour force survey, which would be all the population census data.
              If, in another example, I had data from a population census and "Other household survey", I would like to keep only the population census data and drop "Other household survey",
              just as indicated in the preference list.

              For the cases where this leaves me with two different surveys from the same type of source (as for the Czech Republic, where I have two labour force surveys), I would like to keep data from both surveys.
              So my priority is unity of data source types over number of observations.

              Your code
              Code:
               
               bysort country occupation (priority): keep if _n==1
              works perfectly for keeping and dropping observations according to the priority list, but it also drops all observations from a second survey of the same category, which I would like to keep.
              So if the code could be adjusted to keep all observations as long as they are of the same type of source, that would be the solution I am looking for. :-)
              I hope I explained this well. ;-)



              • #8
                So my priority is unity of data source types over number of observations.
                Not the objective I would imagine having for my work, which is why I had so much trouble grasping what you want.

                Code:
                bysort country occupation (priority): keep if priority==priority[1]
                will keep all the observations for a country/occupation combination that have the same priority as the highest priority for that country/occupation.
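
Note that the groups in the command above are country/occupation pairs, so the best available priority is determined separately for each occupation. If some occupation category appeared only in a lower-priority source for a country (a possibility assumed here for illustration, not something shown in this thread), its rows would survive. The following is a hedged sketch of a country-level variant that computes the best priority once per country with `egen`. The data rows, the country name, and the variable name `best` are made up.

```stata
clear
input str10 country str20 occupation str20 survey byte priority
"Narnia" "1. Managers"     "LFS A"  1
"Narnia" "1. Managers"     "LFS B"  1
"Narnia" "1. Managers"     "Census" 2
"Narnia" "0. Armed forces" "Census" 2
end

* Best (numerically lowest) priority found anywhere in the country
bysort country: egen byte best = min(priority)

* Keep every row from that source type, whichever survey it came from
keep if priority == best
drop best
list, noobs
```

In this toy data the census-only "0. Armed forces" row is dropped, whereas grouping by country and occupation would have retained it. Which behaviour is wanted depends on whether unity of source type is meant per country or per country/occupation.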



                • #9
                  Thanks William, this is what I was looking for! :-)
