
  • Create dummy that considers the first event of a repeated set

    Hello everyone,

    I want to create a dummy variable that flags the first salary after graduation. My dataset is at the individual level (id_persona). For each individual it records the graduation period (periodo) and the salary (sueldo) received in each month (mes) and year (anio).
    Id mes anio sueldo periodo
    17529056 4 2018 345 2015-2016
    17529056 5 2018 345 2015-2016
    17529056 8 2018 525 2015-2016
    17528012 2 2019 325 2017-2018
    17528012 3 2019 400 2017-2018
    17528012 4 2019 800 2017-2018
    17528012 5 2019 800 2017-2018
    17528012 6 2019 900 2017-2018
    17859404 1 2016 125 2014-2015
    17859404 2 2016 300 2014-2015
    17860950 6 2017 200 2013-2014
    17860950 7 2018 225 2013-2014
    I think I need a loop that finds the first month in which each person receives a salary and sets the dummy to 1 there. However, I have never worked with loops and I don't know how to write the code.
    Last edited by Katherine Oleas; 21 Dec 2023, 19:36.

  • #2
    As stated, your problem cannot be solved with the data provided. The reason is that you can only determine the graduation year, whereas your mes and anio variables give time to the month. So if, for example, id 17528012 had an observation with mes = 6 and anio = 2018, there would be no way to know whether it precedes or follows graduation, which took place at some unknown time in 2018. This kind of problem does not actually arise in the example data you show: all of the mes/anio combinations fall no earlier than the year after graduation. But that may not hold in the entire data set.
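One way to see whether the ambiguity described above matters in the full data is to count the observations dated in or before the graduation year. The following is only a sketch on a small subset of the example data; the variable name grad_yr is an assumption introduced here, and it assumes periodo always has the form "YYYY-YYYY".

```stata
* Illustrative check on a subset of the example data
clear
input long id byte mes int(anio sueldo) str9 periodo
17528012 2 2019 325 "2017-2018"
17528012 3 2019 400 "2017-2018"
17860950 6 2017 200 "2013-2014"
end

* grad_yr (hypothetical name): second year in periodo, e.g. 2018 from "2017-2018"
gen int grad_yr = real(substr(periodo, 6, 4))

* Salaries dated in the graduation year itself: cannot be placed
* before or after graduation with year-level information only
count if anio == grad_yr

* Salaries dated before the graduation year: clearly pre-graduation
count if anio < grad_yr
```

If both counts are zero in the real data, restricting attention to the year after graduation or later loses nothing.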

    Anyway, I have reinterpreted your question as finding the first salary received in the year following graduation or later.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id byte mes int(anio sueldo) str9 periodo
    17529056 4 2018 345 "2015-2016"
    17529056 5 2018 345 "2015-2016"
    17529056 8 2018 525 "2015-2016"
    17528012 2 2019 325 "2017-2018"
    17528012 3 2019 400 "2017-2018"
    17528012 4 2019 800 "2017-2018"
    17528012 5 2019 800 "2017-2018"
    17528012 6 2019 900 "2017-2018"
    17859404 1 2016 125 "2014-2015"
    17859404 2 2016 300 "2014-2015"
    17860950 6 2017 200 "2013-2014"
    17860950 7 2018 225 "2013-2014"
    end
    
    //    IDENTIFY GRADUATION YEAR
    split periodo, parse("-") destring gen(yr)
    rename yr1 school_start
    rename yr2 graduation_year
    
    //    CREATE A MONTHLY DATE VARIABLE
    gen mdate = ym(anio, mes)
    format mdate %tm
    
    //    FIND FIRST SALARY AFTER GRADUATION
    by id (mdate), sort: egen ptr = min(cond(yofd(dofm(mdate))>graduation_year, _n, .))
    by id (mdate): gen wanted = sueldo[ptr]
    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Thank you very much, Clyde. The code is helping me a lot. The only problem is that when I run

      Code:
      by id (mdate): gen wanted = sueldo[ptr]
      I get missing values and I don't know why, because as I understand it, this should pick up the salary at the position computed by the command:

      Code:
      by id (mdate), sort: egen ptr = min(cond(yofd(dofm(mdate))>graduation_year, _n, .))
      Last edited by Katherine Oleas; 22 Dec 2023, 07:52.



      • #4
        What version of Stata are you using? I probably should not have written the code the way I did. It is, in general, unsafe to use _n and _N with -egen- commands, because the -egen- functions sometimes change the sort order of the data. In some versions of Stata this problem actually bites, and in others it doesn't. This code worked on my setup, but it might not work on yours if your -egen, min()- function reorders the data.

        The following should be safe in any version of Stata:
        Code:
        //    IDENTIFY GRADUATION YEAR
        split periodo, parse("-") destring gen(yr)
        rename yr1 school_start
        rename yr2 graduation_year
        
        //    CREATE A MONTHLY DATE VARIABLE
        gen mdate = ym(anio, mes)
        format mdate %tm
        
        //    FIND FIRST SALARY AFTER GRADUATION
        by id (mdate), sort: gen ptr1 = cond(yofd(dofm(mdate)) > graduation_year, _n, .)
        by id (mdate), sort: egen ptr2 = min(ptr1)
        by id (mdate): gen wanted = sueldo[ptr2]
        This works correctly with your example data in my setup, and should work in any Stata. If this does not produce the intended results, please post back with a new data example that exhibits the difficulties you encounter.
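The code above returns the first post-graduation salary itself. If a 0/1 dummy marking that observation is wanted, as asked in #1, no loop is needed. The following is a minimal sketch that avoids -egen- and the pointer variable altogether; the names eligible and first_salary are assumptions introduced here, and it uses the same "year after graduation or later" rule.

```stata
* Illustrative sketch on a subset of the example data
clear
input long id byte mes int(anio sueldo) str9 periodo
17529056 4 2018 345 "2015-2016"
17529056 5 2018 345 "2015-2016"
17860950 6 2017 200 "2013-2014"
17860950 7 2018 225 "2013-2014"
end

* Graduation year is the second year in periodo
gen int graduation_year = real(substr(periodo, 6, 4))

* Monthly date, used only for ordering within person
gen mdate = ym(anio, mes)
format mdate %tm

* eligible (hypothetical name): salary dated after the graduation year
gen byte eligible = anio > graduation_year & !missing(anio, graduation_year)

* Within each person, the earliest eligible observation gets 1, all others 0
bysort id eligible (mdate): gen byte first_salary = eligible & (_n == 1)

list id mes anio sueldo first_salary, sepby(id)
```

If ptr2 from the code above is already in the data, -by id (mdate): gen byte first_salary = (_n == ptr2)- should give the same flag, since the comparison is false whenever ptr2 is missing.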



        • #5
          Thank you Clyde.
