Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a dummy variable one year before and after an event with loop functions (e.g: forvalues)

    Hi everyone,

    I would like to create a dummy variable that equals 1 for one year after the CEO changes and one year before the CEO changes by using loop functions such as forvalues, but I do not have any ideas how to create that variable. I used execid (executives’ ID) and ceoann (CEO annual flag) to identify whether firms have CEO changes across years. Therefore, I created a variable: CEO_changes that equals 1 for firms having CEO changes and 0 otherwise. In particular:
    gen lag_execid = l1.execid
    gen CEO_changes = 1 if ceoann == “CEO” & execid != lag_execid
    replace CEO_changes = 0 if ceoann == “CEO” & execid == lag_execid
    However, I do not know how to create a dummy that equals one year before and 0 for one year after the CEO changes in next steps. Can you suggest me some ideas? Thank you.
    gvkey fyear ceoann execid execid_n lag_execide CEO_changes
    1004 2002 9252 9252
    1004 2003 9251 9251 9252
    1004 2004 9252 9252 9251
    1004 2005 23781 23781 9252
    1004 2006 CEO 9249 9249 23781 1
    1004 2007 CEO 9249 9249 9249 0
    1004 2008 23781 23781 9249
    1004 2009 36199 36199 23781
    1004 2010 CEO 9249 9249 36199 1
    1004 2011 41787 41787 9249
    1004 2012 33979 33979 41787
    1004 2013 46404 46404 33979
    1004 2014 46404 46404 46404
    1004 2015 CEO 9249 9249 46404 1
    1004 2016 51706 51706 9249
    1034 2002
    1034 2003
    1034 2004
    1034 2005
    1034 2006
    1038 2002
    1045 2001 3662 3662
    1045 2002 CEO 3661 3661 3662 1
    1045 2003 3661 3661 3661
    1045 2004 16654 16654 3661
    1045 2005 16654 16654 16654
    1045 2006 16654 16654 16654
    1045 2007 28280 28280 16654
    1045 2008 CEO 14591 14591 28280 1
    1045 2009 28280 28280 14591
    1045 2010 28280 28280 28280
    1045 2011 CEO 14591 14591 28280 1
    1045 2014 46189 46189
    1045 2015 46190 46190 46189
    1045 2016 46193 46193 46190

  • #2
    Suppose a firm experiences a CEO change in each of three consecutive years. Then the second of those years is both a year before and a year after a CEO change. How do you want to handle that situation? (And similar situations if there were even more than three consecutive years of CEO change.)

    Also, why do you have two variables execid and execid_n which have the exact same values in every observation? I imagine that one of them is a string and the other is a numeric variable. But this is just an indication of the inadequacy of using a tabular display of data: it lacks important information. The effective and helpful way to show example data is with the -dataex- command. If you would like help with code, you should use -dataex- to post example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When you post back with answers to the questions posed here, please also include -dataex- output to exhibit your data.

    Last edited by Clyde Schechter; 07 Mar 2023, 22:12.

    Comment


    • #3
      Dear Clyde,

      Thank you for your response. Yes, you asked very good questions. I expect that a firm does not experience a CEO change in each two or three consecutive years (because I do not know how to deal with the dummy if a firm have a CEO change in each two or three consecutive years). However, I figure that the above way that I used to create CEO_changes, was seemly wrong because it did not successfully remove non-CEO observations, especially missing values in the ceoann variable. I am trying to find a solution to fix that problem first (create CEO_changes correctly), then create a dummy later. Can you give some recommendations? Thanks a lot.
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(gvkey fyear) str2 ceoso str4 ceoann str6 execid long(execid_n becameceo leftofc) str4 pceo float(Pastceodum Ceodivdummypp lag_execide CEO_changes)
      1004 2002 "" "" "09252" 9252 . . "" 0 1 . .
      1004 2003 "Y" "" "09251" 9251 . . "" 0 1 9252 .
      1004 2004 "Y" "" "09252" 9252 . . "" 0 1 9251 .
      1004 2005 "Y" "" "23781" 23781 . . "" 0 1 9252 .
      1004 2006 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 23781 1
      1004 2007 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 9249 .
      1004 2008 "Y" "" "23781" 23781 . . "" 0 1 9249 .
      1004 2009 "Y" "" "36199" 36199 . . "" 0 1 23781 .
      1004 2010 "Y" "CEO" "09249" 9249 13431 21335 "CEO" 0 1 36199 1
      1004 2011 "Y" "" "41787" 41787 . . "" 0 1 9249 .
      1004 2012 "Y" "" "33979" 33979 . . "" 0 1 41787 .
      1004 2013 "Y" "" "46404" 46404 . . "" 0 1 33979 .
      1004 2014 "Y" "" "46404" 46404 . . "" 0 1 46404 .
      1004 2015 "Y" "CEO" "09249" 9249 13431 21335 "CEO" . . 46404 1
      1004 2016 "Y" "" "51706" 51706 . . "" . . 9249 .
      1034 2002 "" "" "" . . . "" 0 1 . .
      1034 2003 "Y" "" "" . . . "" 0 1 . .
      1034 2004 "Y" "" "" . . . "" 0 1 . .
      1034 2005 "Y" "" "" . . . "" 0 1 . .
      1034 2006 "Y" "" "" . . . "" 1 1 . .
      1038 2002 "" "" "" . . . "" . . . .
      1045 2001 "" "" "03662" 3662 . . "" 1 1 . .
      1045 2002 "" "CEO" "03661" 3661 14019 15819 "" 1 1 3662 1
      1045 2003 "Y" "" "03661" 3661 14019 15819 "" 1 1 3661 .
      1045 2004 "Y" "" "16654" 16654 . . "" 1 1 3661 .
      1045 2005 "Y" "" "16654" 16654 . . "" 1 1 16654 .
      1045 2006 "Y" "" "16654" 16654 . . "" 1 1 16654 .
      1045 2007 "Y" "" "28280" 28280 . . "" 1 1 16654 .
      1045 2008 "Y" "CEO" "14591" 14591 15820 18959 "" 1 1 28280 1
      1045 2009 "Y" "" "28280" 28280 . . "" 1 1 14591 .
      1045 2010 "Y" "" "28280" 28280 . . "" 1 1 28280 .
      1045 2011 "Y" "CEO" "14591" 14591 15820 18959 "" 1 1 28280 1
      1045 2014 "Y" "" "46189" 46189 . . "" . . . .
      1045 2015 "Y" "" "46190" 46190 . . "" . . 46189 .
      1045 2016 "Y" "" "46193" 46193 . . "" . . 46190 .
      1075 2002 "" "CEO" "10739" 10739 14276 18017 "" 1 1 . .
      1075 2003 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
      1075 2004 "Y" "" "19092" 19092 . . "" 1 1 10739 .
      1075 2005 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 19092 1
      1075 2006 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
      1075 2007 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
      1075 2008 "Y" "CEO" "10739" 10739 14276 18017 "" 1 1 10739 .
      1075 2009 "Y" "" "39817" 39817 . . "" 0 1 10739 .
      1075 2010 "Y" "" "33218" 33218 . . "" 1 1 39817 .
      1075 2011 "Y" "" "12755" 12755 . . "" 1 1 33218 .
      1075 2012 "Y" "CEO" "05835" 5835 18017 . "CEO" 1 1 12755 1
      1075 2013 "Y" "" "39817" 39817 . . "" 1 1 5835 .
      1075 2014 "Y" "" "12755" 12755 . . "" 1 1 39817 .
      1075 2015 "Y" "" "12755" 12755 . . "" 1 1 12755 .
      1075 2016 "Y" "" "12755" 12755 . . "" 1 1 12755 .
      1078 2002 "" "" "20862" 20862 . . "" 0 1 . .
      1078 2003 "Y" "" "18569" 18569 . . "" 0 1 20862 .
      1078 2004 "Y" "" "20862" 20862 . . "" 0 1 18569 .
      1078 2005 "Y" "" "24400" 24400 . . "" 0 1 20862 .
      1078 2006 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 24400 1
      1078 2007 "Y" "" "24400" 24400 . . "" 0 1 14300 .
      1078 2008 "Y" "" "24400" 24400 . . "" 0 1 24400 .
      1078 2009 "Y" "" "35171" 35171 . . "" 0 1 24400 .
      1078 2010 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 35171 1
      1078 2011 "Y" "" "42389" 42389 . . "" 0 1 14300 .
      1078 2012 "Y" "" "24400" 24400 . . "" 0 1 42389 .
      1078 2013 "Y" "CEO" "14300" 14300 14245 . "CEO" 0 1 24400 1
      1078 2014 "Y" "" "45356" 45356 . . "" 0 1 14300 .
      1078 2015 "Y" "" "48541" 48541 . . "" 0 1 45356 .
      1078 2016 "Y" "" "45356" 45356 . . "" 0 1 48541 .
      1111 2004 "Y" "" "" . . . "" 0 1 . .
      1111 2005 "Y" "" "" . . . "" 0 1 . .
      1111 2006 "Y" "" "" . . . "" 0 1 . .
      1161 2002 "" "CEO" "20918" 20918 15455 17730 "" 0 1 . .
      1161 2003 "Y" "" "25959" 25959 17714 18637 "" 0 1 20918 .
      1161 2004 "Y" "" "24474" 24474 . . "" 0 1 25959 .
      1161 2005 "Y" "" "27476" 27476 . . "" 0 1 24474 .
      1161 2006 "Y" "" "24474" 24474 . . "" 1 1 27476 .
      1161 2007 "Y" "CEO" "20918" 20918 15455 17730 "" 1 1 24474 1
      1161 2008 "Y" "" "24474" 24474 . . "" 1 1 20918 .
      1161 2009 "Y" "" "37980" 37980 18637 18840 "" 0 1 24474 .
      1161 2010 "Y" "CEO" "25959" 25959 17714 18637 "" 0 1 37980 1
      1161 2011 "Y" "CEO" "37980" 37980 18637 18840 "" 0 1 25959 1
      1161 2012 "Y" "" "42391" 42391 . . "" . . 37980 .
      end
      format %d becameceo
      format %d leftofc

      Comment


      • #4
        Well, a close and strict identification of CEO changes would be if the same gvkey has two consecutive years with ceoann == "CEO" and the two execid's are different. From there, it is straightforward to get the variable that indicates the years immediately before and after a ceo change.
        Code:
        xtset gvkey fyear
        gen byte is_ceo = (ceoann == "CEO")
        gen byte ceo_changed = execid_n != L1.execid_n & is_ceo == 1 & L1.is_ceo == 1
        gen byte wanted = .
        replace wanted = 1 if F1.ceo_changed == 1
        replace wanted = 0 if L1.ceo_changed == 1
        But I have to say I don't understand your data, and it doesn't seem particularly well suited to the question you are asking. In most of the observations in your example, the data shows a non-CEO executive. So for most years, we don't know who the CEO is, we only know the identity of one other executive for that year. So it is possible that CEO's are changing often, but in years where the executive in the data set is not the CEO. Am I missing something here?

        Comment


        • #5
          Dear Clyde,

          Thank you very much your support and your code. It works well for my case now. However, I have a similar situation for location changes. I want to create a post that equals one for one year after location changes and zero for one year before location changes. My dataset includes firm, year, zip (old zipcode), ba_zip5 (new zipcode).
          I created a post as below:
          Gen location_changes = 1 if zip != ba_zip5 & ba_zip5 != . & zip != .
          Gen byte post = .
          replace post = 1 if F1.location_changes == 1
          replace post = 0 if L1.location_changes == 1
          When you look at zip and ba_zip5, you can see they repeat across years. Do I need to duplicates drop zip ba_zip5, force, then merge this sub-dataset with the original dataset?
          Let me know your recommendations. Thank you.
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(gvkey fyear) long(zip ba_zip5) float location_changes
          1004 1994 60191 60007 1
          1004 1995 60191 60007 1
          1004 1996 60191 60007 1
          1004 1997 60191 60007 1
          1004 1998 60191 60191 .
          1004 1999 60191 60191 .
          1004 2000 60191 60191 .
          1004 2001 60191 60191 .
          1004 2002 60191 60191 .
          1004 2003 60191 60191 .
          1004 2004 60191 60191 .
          1004 2005 60191 60191 .
          1004 2006 60191 60191 .
          1004 2007 60191 60191 .
          1004 2008 60191 60191 .
          1004 2009 60191 60191 .
          1004 2010 60191 60191 .
          1004 2011 60191 60191 .
          1004 2012 60191 60191 .
          1004 2013 60191 60191 .
          1004 2014 60191 60191 .
          1004 2015 60191 60191 .
          1004 2016 60191 60191 .
          1034 1998 . 7024 .
          1034 1999 . 7024 .
          1034 2000 . 7024 .
          1034 2001 . 7024 .
          1034 2002 . 7024 .
          1034 2003 . 7024 .
          1034 2004 . 7024 .
          1034 2005 . 7024 .
          1034 2006 . 7024 .
          1034 2007 . 7024 .
          1036 2000 . 28202 .
          1045 1994 76155 76155 .
          1045 1995 76155 76155 .
          1045 1996 76155 76155 .
          1045 1997 76155 76155 .
          1045 1998 76155 76155 .
          1045 1999 76155 76155 .
          1045 2000 76155 76155 .
          1045 2001 76155 76155 .
          1045 2002 76155 76155 .
          1045 2003 76155 76155 .
          1045 2004 76155 76155 .
          1045 2005 76155 76155 .
          1045 2006 76155 76155 .
          1045 2007 76155 76155 .
          1045 2008 76155 76155 .
          1045 2009 76155 76155 .
          1045 2010 76155 76155 .
          1045 2011 76155 76155 .
          1045 2014 76155 76155 .
          1045 2015 76155 76155 .
          1045 2016 76155 76155 .
          1045 2017 76155 76155 .
          1055 1994 . . .
          1055 1995 . . .
          1055 1996 . . .
          end
          [/CODE]

          Comment


          • #6
            I think there is something wrong with your data, or you are not explain it clearly. Looking at the very first examples, with gvkey = 1004, in fyear's 1994 through 1997, if zip represents the "old zipcode" and ba_zip5 represents the "new zipcode," shouldn't zip become 60007 in 1995, since apparently the firm moved there from 60191 in 1994? Once you move to a new zipcode, shouldn't that new zipcode become the old zipcode for subsequent years (until you move again)? Either these variables mean something different from what I understand, or these are data errors.

            Added:
            Do I need to duplicates drop zip ba_zip5, force...,
            No. I have never used -duplicates drop, force- in the entire time I have been a Stata user, and I never will. -duplicates drop, force- is a dangerous coding practice. While -duplicates drop, force- can be a quick and easy solution to certain problems, there is always a safer way to solve the same problem. Moreover, even in the situations where it gives a quick, correct solution, when you are thinking of using it, it usually means that your data set is incorrect and you have made errors earlier in the code that led up to it. Those errors should be fixed, and the generation of the data set revisited to see if other errors have been overlooked. -duplicates drop, force- sweeps the problem under the rug--which should never be done.
            Last edited by Clyde Schechter; 09 Mar 2023, 10:53.

            Comment


            • #7
              Dear Clyde,

              Thank you very much for your response. I am sorry for not explaining my problem clearly for you to understand my data. Actually, I am doing a test of whether firms’ location changes affect our research question? To verify a firm has its location change, I collected data on headquarters-location changes from Bill McDonald’s website (ba_zip5) then I merged with my dataset. I used zip in my dataset and ba_zip5 (that I collected from Bill McDonald’s web) then I will compare them with each other to see whether a firm has a location change or not. Next, I will create a post that is equal to one for one year after location changes and zero for one year before location changes. How can I create that post for this case? Thank you

              Comment


              • #8
                Well, I still don't get it. What does the zip in your original data set represent? You originally referred to it as the "old zipcode," which I took to mean the previous zip code. But it appears that it's just some alternative zip code that might just be a different location the firm also uses? Or a data error? Or what? I guess what I'm saying is that I no longer understand why a comparison between zipcode and ba_zip5 tells you whether a firm has "changed location."

                Be that as it may, if I simply take it for granted that change location really is measured by a difference betwen zipcode and ba_zip5, then we have precisely the problem that I raised with regard to CEO changes, and which, in that context, you said doesn't happen. Right at the top of your example data in #5, you have a situation where there are four consecutive years with a change in location. So the second observation is the year after the first location change, but it is also the year before the third location change. The same can be said about the third observation, with respect to the second and fourth observations. So now you really do have to make a decision about how you want to handle this kind of situation.

                Comment


                • #9
                  Dear Clyde,

                  Thank you for your response. What if I change the definition of post such as POST equal to one for years after location changes, and equal to zero for years prior to location changes. Do you have any ideas to code for that variable? I checked data, and I have not found out any errors. Thank you.

                  Comment


                  • #10
                    But again, looking at the first four observations in your example data, how would you handle that? The second and third observations are both before and after a location change. So, how would you set that variable for those two observations?

                    Comment


                    • #11
                      Dear Clyde,

                      Thank you for your suggestion. Yeah, you were right. I made an error in my dataset. I fixed it and had the following dataset. How can I create a post in this case? Thank you.
                      I created location_changes = 1 if zip_old != ZIP_new

                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input double(gvkey fyear) str5(ZIP_new zip_old) float location_changes
                      1004 1994 "60191" "60007" 1
                      1004 1995 "60191" "60191" 0
                      1004 1996 "60191" "60191" 0
                      1004 1997 "60191" "60191" 0
                      1004 1998 "60191" "60191" 0
                      1004 1999 "60191" "60191" 0
                      1004 2000 "60191" "60191" 0
                      1004 2001 "60191" "60191" 0
                      1004 2002 "60191" "60191" 0
                      1004 2003 "60191" "60191" 0
                      1004 2004 "60191" "60191" 0
                      1004 2005 "60191" "60191" 0
                      1004 2006 "60191" "60191" 0
                      1004 2007 "60191" "60191" 0
                      1004 2008 "60191" "60191" 0
                      1004 2009 "60191" "60191" 0
                      1004 2010 "60191" "60191" 0
                      1004 2011 "60191" "60191" 0
                      1004 2012 "60191" "60191" 0
                      1004 2013 "60191" "60191" 0
                      1004 2014 "60191" "60191" 0
                      1004 2015 "60191" "60191" 0
                      1004 2016 "60191" "60191" 0
                      1034 1998 "08807" "7024" 1
                      1034 1999 "08807" "08807" 0
                      1034 2000 "08807" "08807" 0
                      1034 2001 "08807" "08807" 0
                      1034 2002 "08807" "08807" 0
                      1034 2003 "08807" "08807" 0
                      1034 2004 "08807" "08807" 0
                      1034 2005 "08807" "08807" 0
                      1034 2006 "08807" "08807" 0
                      1034 2007 "08807" "08807" 0
                      1036 2000 "28202" "28202" 0
                      end
                      [/CODE]

                      Comment


                      • #12
                        Code:
                        xtset gvkey fyear
                        
                        gen byte wanted = .
                        replace wanted = 1 if L1.location_changes == 1
                        replace wanted = 0 if F1.location_changes == 1

                        Comment


                        • #13
                          Dear Clyde,

                          Thanks a lot for your suppport. It works well now. Thank you.

                          Comment


                          • #14
                            Originally posted by Clyde Schechter View Post
                            Well, a close and strict identification of CEO changes would be if the same gvkey has two consecutive years with ceoann == "CEO" and the two execid's are different. From there, it is straightforward to get the variable that indicates the years immediately before and after a ceo change.
                            Code:
                            xtset gvkey fyear
                            gen byte is_ceo = (ceoann == "CEO")
                            gen byte ceo_changed = execid_n != L1.execid_n & is_ceo == 1 & L1.is_ceo == 1
                            gen byte wanted = .
                            replace wanted = 1 if F1.ceo_changed == 1
                            replace wanted = 0 if L1.ceo_changed == 1
                            But I have to say I don't understand your data, and it doesn't seem particularly well suited to the question you are asking. In most of the observations in your example, the data shows a non-CEO executive. So for most years, we don't know who the CEO is, we only know the identity of one other executive for that year. So it is possible that CEO's are changing often, but in years where the executive in the data set is not the CEO. Am I missing something here?
                            Dear Clyde,

                            In the last example, I create a dummy that equals one for one year after CEO changes and zero for one year before CEO changes. If now I want to create a dummy that is equal to one for two years after CEO changes and zero for two years before CEO changes [-2,2]. I will create as below:

                            gen POST = .
                            replace POST = 0 if f1.ceo_changed==1
                            replace POST = 1 if l1.ceo_changed == 1
                            replace POST =0 if f2.ceo_changed == 1
                            replace POST = 1 if f2.ceo_changed == 1

                            is it right? or I just need to gen POST = .
                            replace POST =0 if f2.ceo_changed == 1
                            replace POST = 1 if f2.ceo_changed == 1

                            Thank you.

                            Comment


                            • #15
                              Well, it is unclear what you want. If two years before and after a CEO change means both the year immediately before and the year before that, as well as the year immediately after and the year after that, then the code would be:

                              Code:
                              gen byte POST = .
                              replace POST = 1 if inlist(1, L1.ceo_changed, L2.ceo_changed)
                              replace POST = 0 if inlist(1, F1.ceo_changed, F2.ceo_changed)
                              But if what you mean is you want it to be 1 two years after, missing 1 year after, missing year of change, missing 1 year before, and 0 two years before, then it would be:
                              Code:
                              gen byte POST = .
                              replace POST = 1 if L2.ceo_changed == 1
                              replace POST = 0 if F2.ceo_changed == 1
                              I can't discern from what you wrote which of these you want.

                              Comment

                              Working...
                              X