Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Is it possible to keep a variable after a -collapse- command ?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str22 country_name int year double var1 str125 sector str53 occupation
    "Albania" 2015 .398 "A. Agriculture; forestry and fishing" "1. Managers"                               
    "Albania" 2016 .067 "A. Agriculture; forestry and fishing" "1. Managers"                               
    "Albania" 2017    . "A. Agriculture; forestry and fishing" "1. Managers"                               
    "Albania" 2018    . "A. Agriculture; forestry and fishing" "1. Managers"                               
    "Albania" 2015 .237 "A. Agriculture; forestry and fishing" "2. Professionals"                          
    "Albania" 2016 .243 "A. Agriculture; forestry and fishing" "2. Professionals"                          
    "Albania" 2017 .193 "A. Agriculture; forestry and fishing" "2. Professionals"                          
    "Albania" 2018 .668 "A. Agriculture; forestry and fishing" "2. Professionals"                          
    "Albania" 2019 .413 "A. Agriculture; forestry and fishing" "2. Professionals"                          
    "Albania" 2015    . "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
    "Albania" 2016    . "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
    "Albania" 2017 .434 "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
    "Albania" 2018 .335 "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
    "Albania" 2019 .296 "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
    "Albania" 2015    . "A. Agriculture; forestry and fishing" "4. Clerical support workers"               
    end
    I would like to keep the last non-missing value by country-sector-occupation. However, when I run :

    Code:
    collapse (lastnm) var1, by (country_name sector occupation)
    I also loose my variable year, and therefore information on the year of the last non-missing observation kept. Is there any option that allows the command to keep this variable? I tried

    Code:
    collapse (lastnm) var1, by (country_name sector occupation (year))
    But nothing changes.

    Thanks a lot for any help you can provide.
    Regards,

    Adam

  • #2
    It seems to me from your data example that you already have what you want unless it is

    Code:
     
     collapse (lastnm) var1 year, by (country_name sector occupation)

    Comment


    • #3
      I tried

      Code:
      tempfile data_cleaned
      save `data_cleaned'
      
      collapse (lastnm) var1, by(country_name isic4 isco8)
      merge 1:m country_name isic4 isco8 var1 using `data_cleaned', keepusing(year) keep(3) nogenerate
      But I am afraid it is not the right way and I don't know how to check.
      Last edited by Adam Sadi; 01 Sep 2022, 06:06.

      Comment


      • #4
        Nick :

        Thank you for your help. However, if I understood the collapse command correctly, your code will keep the last nonmissing observation for both year and var1 separately.

        I would just like to keep the year of the last non-missing observation for var1. My data should look like something like this:


        Code:
         * Example generated by -dataex-. For more info, type help dataex
        clear input str22 country_name int year double var1 str125 sector str53 occupation
        "Albania" 2016 .067 "A. Agriculture; forestry and fishing" "1. Managers"
        "Albania" 2019 .413 "A. Agriculture; forestry and fishing" "2. Professionals"
        "Albania" 2019 .296 "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"        
         end
        Last edited by Adam Sadi; 01 Sep 2022, 06:13.

        Comment


        • #5
          Originally posted by Adam Sadi View Post
          Nick :

          Thank you for your help. However, if I understood the collapse command correctly, your code will keep the last nonmissing observation for both year and var1 separately.

          I would just like to keep the year of the last non-missing observation for var1. My data should look like something like this:


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear input str22 country_name int year double var1 str125 sector str53 occupation
          "Albania" 2016 .067 "A. Agriculture; forestry and fishing" "1. Managers"
          "Albania" 2019 .413 "A. Agriculture; forestry and fishing" "2. Professionals"
          "Albania" 2019 .296 "A. Agriculture; forestry and fishing" "3. Technicians and associate professionals"
          end
          Create a new year variable that is synchronized with var1 in terms of missing pattern. In other words:

          Code:
          gen year2 = year if !missing(var1)
          Then, add "year2" into the collapse list after var1, like suggest in #2, but use year2 instead.

          Comment


          • #6
            Ken : Thank you for your help. I now have the dataset that I wanted.

            While waiting for answers I also found an alternative way to code a solution to my problem:

            Code:
            tempfile data_cleaned
            save `data_cleaned'
            
            collapse (lastnm) var1, by(country_name sector occupation)
            merge 1:m country_name sector occupation var1 using `data_cleaned', keepusing(year) keep(3) nogenerate
            
            gen miss_var1 = missing(var1)
            by country_name sector occupation miss_var1 (year), sort: drop if _n != _N & miss_var1 == 1
            drop miss_var1
            I get to the same results than yours with one difference however. Some country-sector-occupation combination have missing values for var1 for all the available years (something I didn't mention in the original post). My code keeps the year with the most recent missing observation if this case happens. It might be useful to know if one wants to make descriptive statistics about missing data.

            Thanks a lot for your help.

            Comment


            • #7
              #3 and #5 are correct. Thanks to Ken Chui for the fix.

              Comment

              Working...
              X