Create dummy that considers the first event of a repeated set

Katherine Oleas

Join Date: Aug 2021
Posts: 80

21 Dec 2023, 19:19

Hello everyone,

I want to create a dummy variable that flags the first salary after graduation. My data are at the individual level (id_persona). For each individual I have the graduation period (periodo) and the salary (sueldo) received in each month (mes) and year (anio).

Id	mes	anio	sueldo	periodo
17529056	4	2018	345	2015-2016
17529056	5	2018	345	2015-2016
17529056	8	2018	525	2015-2016
17528012	2	2019	325	2017-2018
17528012	3	2019	400	2017-2018
17528012	4	2019	800	2017-2018
17528012	5	2019	800	2017-2018
17528012	6	2019	900	2017-2018
17859404	1	2016	125	2014-2015
17859404	2	2016	300	2014-2015
17860950	6	2017	200	2013-2014
17860950	7	2018	225	2013-2014

I think I need a loop that finds the first month in which each person receives a salary and sets the dummy to 1 there. However, I have never worked with loops and I don't know how to write the code.

Last edited by Katherine Oleas; 21 Dec 2023, 19:36.

Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#2

21 Dec 2023, 19:49

As stated, your problem cannot be solved with the data provided. The reason is that you can only determine the graduation year, whereas your mes and anio variables give time to the month. So if, for example, id 17528012 had an observation with mes = 6 and anio = 2018, there would be no way to know whether it precedes or follows graduation, which took place at some unknown time in 2018. This kind of problem does not actually arise in the example data you show: every mes-anio combination falls in a year after the graduation year. But that may not hold in the entire data set.
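One way to see whether this matters in the full data set is to count the observations whose anio equals the graduation year. The following is only a minimal sketch: the rows are made up (the June 2018 row is the hypothetical observation described above), the graduation year is assumed to be the second year in periodo, and the variable names graduation_year and ambiguous are illustrative.

```stata
* Minimal sketch with made-up rows; the June 2018 row is hypothetical
clear
input long id byte mes int(anio sueldo) str9 periodo
17528012 2 2019 325 "2017-2018"
17528012 3 2019 400 "2017-2018"
17528012 6 2018 150 "2017-2018"
end

* Assumption: the graduation year is the second year in periodo
gen int graduation_year = real(substr(periodo, 6, 4))

* Flag salaries dated in the graduation year itself: with only a
* graduation year, these cannot be placed before or after graduation
gen byte ambiguous = (anio == graduation_year)
count if ambiguous
list id mes anio sueldo periodo if ambiguous
```

If the count is zero in the real data, the year-level comparison loses nothing; otherwise those rows need a decision about how to treat them.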

Anyway, I have modified your question to finding the first salary obtained in the year following graduation or later.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id byte mes int(anio sueldo) str9 periodo
17529056 4 2018 345 "2015-2016"
17529056 5 2018 345 "2015-2016"
17529056 8 2018 525 "2015-2016"
17528012 2 2019 325 "2017-2018"
17528012 3 2019 400 "2017-2018"
17528012 4 2019 800 "2017-2018"
17528012 5 2019 800 "2017-2018"
17528012 6 2019 900 "2017-2018"
17859404 1 2016 125 "2014-2015"
17859404 2 2016 300 "2014-2015"
17860950 6 2017 200 "2013-2014"
17860950 7 2018 225 "2013-2014"
end

// IDENTIFY GRADUATION YEAR
split periodo, parse("-") destring gen(yr)
rename yr1 school_start
rename yr2 graduation_year

// CREATE A MONTHLY DATE VARIABLE
gen mdate = ym(anio, mes)
format mdate %tm

// FIND FIRST SALARY AFTER GRADUATION
by id (mdate), sort: egen ptr = min(cond(yofd(dofm(mdate))>graduation_year, _n, .))
by id (mdate): gen wanted = sueldo[ptr]

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#3

22 Dec 2023, 07:33

Thank you very much, Clyde. The code is helping me a lot. The only problem is that when I run

Code:

by id (mdate): gen wanted = sueldo[ptr]

I get missing values, and I don't know what the problem is. As I understand it, this line should pick up the salary at the position found by the command:

Code:

by id (mdate), sort: egen ptr = min(cond(yofd(dofm(mdate))>graduation_year, _n, .))

Last edited by Katherine Oleas; 22 Dec 2023, 07:52.
Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#4

22 Dec 2023, 09:41

What version of Stata are you using? I probably should not have written the code the way I did. It is, in general, unsafe to use _n and _N with -egen- commands, because the -egen- functions sometimes change the sort order of the data. In some versions of Stata this problem actually bites, and in others it doesn't. This code worked on my setup, but it might not work on yours if your -egen, min()- function reorders the data.

The following should be safe in any version of Stata:

Code:

// IDENTIFY GRADUATION YEAR
split periodo, parse("-") destring gen(yr)
rename yr1 school_start
rename yr2 graduation_year

// CREATE A MONTHLY DATE VARIABLE
gen mdate = ym(anio, mes)
format mdate %tm

// FIND FIRST SALARY AFTER GRADUATION
by id (mdate), sort: gen ptr1 = cond(yofd(dofm(mdate)) > graduation_year, _n, .)
by id (mdate), sort: egen ptr2 = min(ptr1)
by id (mdate): gen wanted = sueldo[ptr2]

This works correctly with your example data in my setup, and should work in any Stata. If this does not produce the intended results, please post back with a new data example that exhibits the difficulties you encounter.
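The code above stores the salary amount in wanted. For readers who want the 0/1 indicator requested in #1 instead, the following is one possible sketch that avoids combining _n with -egen- altogether. It is not part of the code posted in this thread. It assumes the same year-level rule (salary year strictly after the graduation year) and that the graduation year is the second year in periodo; the names after and first_salary are illustrative.

```stata
* Subset of the example data, enough to show the idea
clear
input long id byte mes int(anio sueldo) str9 periodo
17529056 4 2018 345 "2015-2016"
17529056 5 2018 345 "2015-2016"
17529056 8 2018 525 "2015-2016"
17860950 6 2017 200 "2013-2014"
17860950 7 2018 225 "2013-2014"
end

* Assumption: the graduation year is the second year in periodo
gen int graduation_year = real(substr(periodo, 6, 4))

* Monthly date, used only for ordering within person
gen mdate = ym(anio, mes)
format mdate %tm

* 1 for salaries dated in a year after the graduation year, 0 otherwise
gen byte after = (anio > graduation_year) & !missing(anio, graduation_year)

* Within each id, order the post-graduation rows by date and flag the earliest
bysort id after (mdate): gen byte first_salary = (after == 1) & (_n == 1)

list id mes anio sueldo first_salary, sepby(id)
```

Because the sort order is set by -bysort- on the same line that uses _n, no -egen- call can reorder the data in between.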
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#5

05 Jan 2024, 11:59

Thank you Clyde.