Create a dummy variable one year before and after an event with loop functions (e.g: forvalues)

RUBY JULY

Join Date: Mar 2023
Posts: 12

Create a dummy variable one year before and after an event with loop functions (e.g: forvalues)

07 Mar 2023, 20:37

Hi everyone,

I would like to create a dummy variable that equals 1 for one year after the CEO changes and one year before the CEO changes by using loop functions such as forvalues, but I do not have any ideas how to create that variable. I used execid (executives’ ID) and ceoann (CEO annual flag) to identify whether firms have CEO changes across years. Therefore, I created a variable: CEO_changes that equals 1 for firms having CEO changes and 0 otherwise. In particular:
gen lag_execid = l1.execid
gen CEO_changes = 1 if ceoann == “CEO” & execid != lag_execid
replace CEO_changes = 0 if ceoann == “CEO” & execid == lag_execid
However, I do not know how to create a dummy that equals one year before and 0 for one year after the CEO changes in next steps. Can you suggest me some ideas? Thank you.

gvkey	fyear	ceoann	execid	execid_n	lag_execide	CEO_changes
1004	2002		9252	9252
1004	2003		9251	9251	9252
1004	2004		9252	9252	9251
1004	2005		23781	23781	9252
1004	2006	CEO	9249	9249	23781	1
1004	2007	CEO	9249	9249	9249	0
1004	2008		23781	23781	9249
1004	2009		36199	36199	23781
1004	2010	CEO	9249	9249	36199	1
1004	2011		41787	41787	9249
1004	2012		33979	33979	41787
1004	2013		46404	46404	33979
1004	2014		46404	46404	46404
1004	2015	CEO	9249	9249	46404	1
1004	2016		51706	51706	9249
1034	2002
1034	2003
1034	2004
1034	2005
1034	2006
1038	2002
1045	2001		3662	3662
1045	2002	CEO	3661	3661	3662	1
1045	2003		3661	3661	3661
1045	2004		16654	16654	3661
1045	2005		16654	16654	16654
1045	2006		16654	16654	16654
1045	2007		28280	28280	16654
1045	2008	CEO	14591	14591	28280	1
1045	2009		28280	28280	14591
1045	2010		28280	28280	28280
1045	2011	CEO	14591	14591	28280	1
1045	2014		46189	46189
1045	2015		46190	46190	46189
1045	2016		46193	46193	46190

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

07 Mar 2023, 22:07

Suppose a firm experiences a CEO change in each of three consecutive years. Then the second of those years is both a year before and a year after a CEO change. How do you want to handle that situation? (And similar situations if there were even more than three consecutive years of CEO change.)

Also, why do you have two variables execid and execid_n which have the exact same values in every observation? I imagine that one of them is a string and the other is a numeric variable. But this is just an indication of the inadequacy of using a tabular display of data: it lacks important information. The effective and helpful way to show example data is with the -dataex- command. If you would like help with code, you should use -dataex- to post example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When you post back with answers to the questions posed here, please also include -dataex- output to exhibit your data.

Last edited by Clyde Schechter; 07 Mar 2023, 22:12.
1 like
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#3

08 Mar 2023, 16:06

Dear Clyde,

Thank you for your response. Yes, you asked very good questions. I expect that a firm does not experience a CEO change in each two or three consecutive years (because I do not know how to deal with the dummy if a firm have a CEO change in each two or three consecutive years). However, I figure that the above way that I used to create CEO_changes, was seemly wrong because it did not successfully remove non-CEO observations, especially missing values in the ceoann variable. I am trying to find a solution to fix that problem first (create CEO_changes correctly), then create a dummy later. Can you give some recommendations? Thanks a lot.
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(gvkey fyear) str2 ceoso str4 ceoann str6 execid long(execid_n becameceo leftofc) str4 pceo float(Pastceodum Ceodivdummypp lag_execide CEO_changes)
1004 2002 "" "" "09252" 9252 . . "" 0 1 . .
1004 2003 "Y" "" "09251" 9251 . . "" 0 1 9252 .
1004 2004 "Y" "" "09252" 9252 . . "" 0 1 9251 .
1004 2005 "Y" "" "23781" 23781 . . "" 0 1 9252 .
1004 2006 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 23781 1
1004 2007 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 9249 .
1004 2008 "Y" "" "23781" 23781 . . "" 0 1 9249 .
1004 2009 "Y" "" "36199" 36199 . . "" 0 1 23781 .
1004 2010 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 36199 1
1004 2011 "Y" "" "41787" 41787 . . "" 0 1 9249 .
1004 2012 "Y" "" "33979" 33979 . . "" 0 1 41787 .
1004 2013 "Y" "" "46404" 46404 . . "" 0 1 33979 .
1004 2014 "Y" "" "46404" 46404 . . "" 0 1 46404 .
1004 2015 "Y" "CEO" "09249" 9249 13431 21335 "CEO" . . 46404 1
1004 2016 "Y" "" "51706" 51706 . . "" . . 9249 .
1034 2002 "" "" "" . . . "" 0 1 . .
1034 2003 "Y" "" "" . . . "" 0 1 . .
1034 2004 "Y" "" "" . . . "" 0 1 . .
1034 2005 "Y" "" "" . . . "" 0 1 . .
1034 2006 "Y" "" "" . . . "" 1 1 . .
1038 2002 "" "" "" . . . "" . . . .
1045 2001 "" "" "03662" 3662 . . "" 1 1 . .
1045 2002 "" "CEO" "03661" 3661 14019 15819 "" 1 1 3662 1
1045 2003 "Y" "" "03661" 3661 14019 15819 "" 1 1 3661 .
1045 2004 "Y" "" "16654" 16654 . . "" 1 1 3661 .
1045 2005 "Y" "" "16654" 16654 . . "" 1 1 16654 .
1045 2006 "Y" "" "16654" 16654 . . "" 1 1 16654 .
1045 2007 "Y" "" "28280" 28280 . . "" 1 1 16654 .
1045 2008 "Y" "CEO" "14591" 14591 15820 18959 "" 1 1 28280 1
1045 2009 "Y" "" "28280" 28280 . . "" 1 1 14591 .
1045 2010 "Y" "" "28280" 28280 . . "" 1 1 28280 .
1045 2011 "Y" "CEO" "14591" 14591 15820 18959 "" 1 1 28280 1
1045 2014 "Y" "" "46189" 46189 . . "" . . . .
1045 2015 "Y" "" "46190" 46190 . . "" . . 46189 .
1045 2016 "Y" "" "46193" 46193 . . "" . . 46190 .
1075 2002 "" "CEO" "10739" 10739 14276 18017 "" 1 1 . .
1075 2003 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
1075 2004 "Y" "" "19092" 19092 . . "" 1 1 10739 .
1075 2005 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 19092 1
1075 2006 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
1075 2007 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
1075 2008 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
1075 2009 "Y" "" "39817" 39817 . . "" 0 1 10739 .
1075 2010 "Y" "" "33218" 33218 . . "" 1 1 39817 .
1075 2011 "Y" "" "12755" 12755 . . "" 1 1 33218 .
1075 2012 "Y" "CEO" "05835" 5835 18017 . "CEO" 1 1 12755 1
1075 2013 "Y" "" "39817" 39817 . . "" 1 1 5835 .
1075 2014 "Y" "" "12755" 12755 . . "" 1 1 39817 .
1075 2015 "Y" "" "12755" 12755 . . "" 1 1 12755 .
1075 2016 "Y" "" "12755" 12755 . . "" 1 1 12755 .
1078 2002 "" "" "20862" 20862 . . "" 0 1 . .
1078 2003 "Y" "" "18569" 18569 . . "" 0 1 20862 .
1078 2004 "Y" "" "20862" 20862 . . "" 0 1 18569 .
1078 2005 "Y" "" "24400" 24400 . . "" 0 1 20862 .
1078 2006 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 24400 1
1078 2007 "Y" "" "24400" 24400 . . "" 0 1 14300 .
1078 2008 "Y" "" "24400" 24400 . . "" 0 1 24400 .
1078 2009 "Y" "" "35171" 35171 . . "" 0 1 24400 .
1078 2010 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 35171 1
1078 2011 "Y" "" "42389" 42389 . . "" 0 1 14300 .
1078 2012 "Y" "" "24400" 24400 . . "" 0 1 42389 .
1078 2013 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 24400 1
1078 2014 "Y" "" "45356" 45356 . . "" 0 1 14300 .
1078 2015 "Y" "" "48541" 48541 . . "" 0 1 45356 .
1078 2016 "Y" "" "45356" 45356 . . "" 0 1 48541 .
1111 2004 "Y" "" "" . . . "" 0 1 . .
1111 2005 "Y" "" "" . . . "" 0 1 . .
1111 2006 "Y" "" "" . . . "" 0 1 . .
1161 2002 "" "CEO" "20918" 20918 15455 17730 "" 0 1 . .
1161 2003 "Y" "" "25959" 25959 17714 18637 "" 0 1 20918 .
1161 2004 "Y" "" "24474" 24474 . . "" 0 1 25959 .
1161 2005 "Y" "" "27476" 27476 . . "" 0 1 24474 .
1161 2006 "Y" "" "24474" 24474 . . "" 1 1 27476 .
1161 2007 "Y" "CEO" "20918" 20918 15455 17730 "" 1 1 24474 1
1161 2008 "Y" "" "24474" 24474 . . "" 1 1 20918 .
1161 2009 "Y" "" "37980" 37980 18637 18840 "" 0 1 24474 .
1161 2010 "Y" "CEO" "25959" 25959 17714 18637 "" 0 1 37980 1
1161 2011 "Y" "CEO" "37980" 37980 18637 18840 "" 0 1 25959 1
1161 2012 "Y" "" "42391" 42391 . . "" . . 37980 .
end
format %d becameceo
format %d leftofc
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

08 Mar 2023, 17:03

Well, a close and strict identification of CEO changes would be if the same gvkey has two consecutive years with ceoann == "CEO" and the two execid's are different. From there, it is straightforward to get the variable that indicates the years immediately before and after a ceo change.

Code:

xtset gvkey fyear gen byte is_ceo = (ceoann == "CEO") gen byte ceo_changed = execid_n != L1.execid_n & is_ceo == 1 & L1.is_ceo == 1 gen byte wanted = . replace wanted = 1 if F1.ceo_changed == 1 replace wanted = 0 if L1.ceo_changed == 1

But I have to say I don't understand your data, and it doesn't seem particularly well suited to the question you are asking. In most of the observations in your example, the data shows a non-CEO executive. So for most years, we don't know who the CEO is, we only know the identity of one other executive for that year. So it is possible that CEO's are changing often, but in years where the executive in the data set is not the CEO. Am I missing something here?
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#5

09 Mar 2023, 01:46

Dear Clyde,

Thank you very much your support and your code. It works well for my case now. However, I have a similar situation for location changes. I want to create a post that equals one for one year after location changes and zero for one year before location changes. My dataset includes firm, year, zip (old zipcode), ba_zip5 (new zipcode).
I created a post as below:
Gen location_changes = 1 if zip != ba_zip5 & ba_zip5 != . & zip != .
Gen byte post = .
replace post = 1 if F1.location_changes == 1
replace post = 0 if L1.location_changes == 1
When you look at zip and ba_zip5, you can see they repeat across years. Do I need to duplicates drop zip ba_zip5, force, then merge this sub-dataset with the original dataset?
Let me know your recommendations. Thank you.
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(gvkey fyear) long(zip ba_zip5) float location_changes
1004 1994 60191 60007 1
1004 1995 60191 60007 1
1004 1996 60191 60007 1
1004 1997 60191 60007 1
1004 1998 60191 60191 .
1004 1999 60191 60191 .
1004 2000 60191 60191 .
1004 2001 60191 60191 .
1004 2002 60191 60191 .
1004 2003 60191 60191 .
1004 2004 60191 60191 .
1004 2005 60191 60191 .
1004 2006 60191 60191 .
1004 2007 60191 60191 .
1004 2008 60191 60191 .
1004 2009 60191 60191 .
1004 2010 60191 60191 .
1004 2011 60191 60191 .
1004 2012 60191 60191 .
1004 2013 60191 60191 .
1004 2014 60191 60191 .
1004 2015 60191 60191 .
1004 2016 60191 60191 .
1034 1998 . 7024 .
1034 1999 . 7024 .
1034 2000 . 7024 .
1034 2001 . 7024 .
1034 2002 . 7024 .
1034 2003 . 7024 .
1034 2004 . 7024 .
1034 2005 . 7024 .
1034 2006 . 7024 .
1034 2007 . 7024 .
1036 2000 . 28202 .
1045 1994 76155 76155 .
1045 1995 76155 76155 .
1045 1996 76155 76155 .
1045 1997 76155 76155 .
1045 1998 76155 76155 .
1045 1999 76155 76155 .
1045 2000 76155 76155 .
1045 2001 76155 76155 .
1045 2002 76155 76155 .
1045 2003 76155 76155 .
1045 2004 76155 76155 .
1045 2005 76155 76155 .
1045 2006 76155 76155 .
1045 2007 76155 76155 .
1045 2008 76155 76155 .
1045 2009 76155 76155 .
1045 2010 76155 76155 .
1045 2011 76155 76155 .
1045 2014 76155 76155 .
1045 2015 76155 76155 .
1045 2016 76155 76155 .
1045 2017 76155 76155 .
1055 1994 . . .
1055 1995 . . .
1055 1996 . . .
end
[/CODE]
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#6

09 Mar 2023, 10:45

I think there is something wrong with your data, or you are not explain it clearly. Looking at the very first examples, with gvkey = 1004, in fyear's 1994 through 1997, if zip represents the "old zipcode" and ba_zip5 represents the "new zipcode," shouldn't zip become 60007 in 1995, since apparently the firm moved there from 60191 in 1994? Once you move to a new zipcode, shouldn't that new zipcode become the old zipcode for subsequent years (until you move again)? Either these variables mean something different from what I understand, or these are data errors.

Added:

Do I need to duplicates drop zip ba_zip5, force...,

No. I have never used -duplicates drop, force- in the entire time I have been a Stata user, and I never will. -duplicates drop, force- is a dangerous coding practice. While -duplicates drop, force- can be a quick and easy solution to certain problems, there is always a safer way to solve the same problem. Moreover, even in the situations where it gives a quick, correct solution, when you are thinking of using it, it usually means that your data set is incorrect and you have made errors earlier in the code that led up to it. Those errors should be fixed, and the generation of the data set revisited to see if other errors have been overlooked. -duplicates drop, force- sweeps the problem under the rug--which should never be done.

Last edited by Clyde Schechter; 09 Mar 2023, 10:53.
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#7

09 Mar 2023, 14:02

Dear Clyde,

Thank you very much for your response. I am sorry for not explaining my problem clearly for you to understand my data. Actually, I am doing a test of whether firms’ location changes affect our research question? To verify a firm has its location change, I collected data on headquarters-location changes from Bill McDonald’s website (ba_zip5) then I merged with my dataset. I used zip in my dataset and ba_zip5 (that I collected from Bill McDonald’s web) then I will compare them with each other to see whether a firm has a location change or not. Next, I will create a post that is equal to one for one year after location changes and zero for one year before location changes. How can I create that post for this case? Thank you
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#8

09 Mar 2023, 14:22

Well, I still don't get it. What does the zip in your original data set represent? You originally referred to it as the "old zipcode," which I took to mean the previous zip code. But it appears that it's just some alternative zip code that might just be a different location the firm also uses? Or a data error? Or what? I guess what I'm saying is that I no longer understand why a comparison between zipcode and ba_zip5 tells you whether a firm has "changed location."

Be that as it may, if I simply take it for granted that change location really is measured by a difference betwen zipcode and ba_zip5, then we have precisely the problem that I raised with regard to CEO changes, and which, in that context, you said doesn't happen. Right at the top of your example data in #5, you have a situation where there are four consecutive years with a change in location. So the second observation is the year after the first location change, but it is also the year before the third location change. The same can be said about the third observation, with respect to the second and fourth observations. So now you really do have to make a decision about how you want to handle this kind of situation.
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#9

09 Mar 2023, 16:16

Dear Clyde,

Thank you for your response. What if I change the definition of post such as POST equal to one for years after location changes, and equal to zero for years prior to location changes. Do you have any ideas to code for that variable? I checked data, and I have not found out any errors. Thank you.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#10

09 Mar 2023, 16:31

But again, looking at the first four observations in your example data, how would you handle that? The second and third observations are both before and after a location change. So, how would you set that variable for those two observations?
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#11

10 Mar 2023, 00:50

Dear Clyde,

Thank you for your suggestion. Yeah, you were right. I made an error in my dataset. I fixed it and had the following dataset. How can I create a post in this case? Thank you.
I created location_changes = 1 if zip_old != ZIP_new

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(gvkey fyear) str5(ZIP_new zip_old) float location_changes
1004 1994 "60191" "60007" 1
1004 1995 "60191" "60191" 0
1004 1996 "60191" "60191" 0
1004 1997 "60191" "60191" 0
1004 1998 "60191" "60191" 0
1004 1999 "60191" "60191" 0
1004 2000 "60191" "60191" 0
1004 2001 "60191" "60191" 0
1004 2002 "60191" "60191" 0
1004 2003 "60191" "60191" 0
1004 2004 "60191" "60191" 0
1004 2005 "60191" "60191" 0
1004 2006 "60191" "60191" 0
1004 2007 "60191" "60191" 0
1004 2008 "60191" "60191" 0
1004 2009 "60191" "60191" 0
1004 2010 "60191" "60191" 0
1004 2011 "60191" "60191" 0
1004 2012 "60191" "60191" 0
1004 2013 "60191" "60191" 0
1004 2014 "60191" "60191" 0
1004 2015 "60191" "60191" 0
1004 2016 "60191" "60191" 0
1034 1998 "08807" "7024" 1
1034 1999 "08807" "08807" 0
1034 2000 "08807" "08807" 0
1034 2001 "08807" "08807" 0
1034 2002 "08807" "08807" 0
1034 2003 "08807" "08807" 0
1034 2004 "08807" "08807" 0
1034 2005 "08807" "08807" 0
1034 2006 "08807" "08807" 0
1034 2007 "08807" "08807" 0
1036 2000 "28202" "28202" 0
end
[/CODE]
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

#12

10 Mar 2023, 10:51

Code:

xtset gvkey fyear

gen byte wanted = .
replace wanted = 1 if L1.location_changes == 1
replace wanted = 0 if F1.location_changes == 1

Comment

RUBY JULY

Join Date: Mar 2023

Posts: 12
#13

11 Mar 2023, 14:57

Dear Clyde,

Thanks a lot for your suppport. It works well now. Thank you.
Comment
RUBY JULY

Join Date: Mar 2023

Posts: 12
#14

17 Mar 2023, 01:47

Originally posted by Clyde Schechter View Post

Well, a close and strict identification of CEO changes would be if the same gvkey has two consecutive years with ceoann == "CEO" and the two execid's are different. From there, it is straightforward to get the variable that indicates the years immediately before and after a ceo change.

Code:

xtset gvkey fyear gen byte is_ceo = (ceoann == "CEO") gen byte ceo_changed = execid_n != L1.execid_n & is_ceo == 1 & L1.is_ceo == 1 gen byte wanted = . replace wanted = 1 if F1.ceo_changed == 1 replace wanted = 0 if L1.ceo_changed == 1

But I have to say I don't understand your data, and it doesn't seem particularly well suited to the question you are asking. In most of the observations in your example, the data shows a non-CEO executive. So for most years, we don't know who the CEO is, we only know the identity of one other executive for that year. So it is possible that CEO's are changing often, but in years where the executive in the data set is not the CEO. Am I missing something here?

Dear Clyde,

In the last example, I create a dummy that equals one for one year after CEO changes and zero for one year before CEO changes. If now I want to create a dummy that is equal to one for two years after CEO changes and zero for two years before CEO changes [-2,2]. I will create as below:

gen POST = .
replace POST = 0 if f1.ceo_changed==1
replace POST = 1 if l1.ceo_changed == 1
replace POST =0 if f2.ceo_changed == 1
replace POST = 1 if f2.ceo_changed == 1

is it right? or I just need to gen POST = .
replace POST =0 if f2.ceo_changed == 1
replace POST = 1 if f2.ceo_changed == 1

Thank you.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#15

17 Mar 2023, 09:39

Well, it is unclear what you want. If two years before and after a CEO change means both the year immediately before and the year before that, as well as the year immediately after and the year after that, then the code would be:

Code:

gen byte POST = . replace POST = 1 if inlist(1, L1.ceo_changed, L2.ceo_changed) replace POST = 0 if inlist(1, F1.ceo_changed, F2.ceo_changed)

But if what you mean is you want it to be 1 two years after, missing 1 year after, missing year of change, missing 1 year before, and 0 two years before, then it would be:

Code:

gen byte POST = . replace POST = 1 if L2.ceo_changed == 1 replace POST = 0 if F2.ceo_changed == 1

I can't discern from what you wrote which of these you want.
Comment

Announcement

Create a dummy variable one year before and after an event with loop functions (e.g: forvalues)

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment