Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • generate dummy==1 for 1 year 2 year and 3 years before an event year

    Dear statalist,

    I have a set of data that spans over 2010-2019, there is a dummy, vf, that ==1 if the event vf happened in that year. vf can only happen once for each firm throughout the sample period.

    What I want to do:
    1. create 6 dummies, vf (already created), pre_vf_1, pre_vf_2, pre_vf_3, post_vf_1, post_vf_2, that ==1 in the year event vf happens, 1 year before vf happens, 2 years before, 3 years before, 1 year after, and 2 years after
    2. create 2 dummies, with pre_vf ==1 if it is 1 to 3 years prior to the year event vf happens, and post_vf ==1 if it is the year event vf happens and 1-2 years after the year event vf happens, so basically categorise the 6 dummies in no.1 to 2 dummies (3 years pre vs current + 2 years post)

    Some firms may not have 3 years prior to the event year. I have another variable, listingyear, which indicates the year the firm is listed (if the firm is not listed, then it won't be in my sample). If a firm is listed in 2012 and vf happens in 2013, then there is only 1 year in the pre-period. If that is the case, pre_vf_1 should be coded 1 while pre_vf_2 and pre_vf_3 should ==. as they do not exist. Whereas pre_vf is still coded 1 as it has a pre-period of 1 year in the 3-year pre-period. If another firm is listed in 2012 and vf also happens in 2012, then pre_vf_1, pre_vf_2, pre_vf_3, and pre_vf all should ==.

    There is also concern that some firms may not have 2 years post to the event year. If the firm is delisted, I exclude the year of delisting and the years after delisting from the sample. Say a firm with vf happen in 2012 and it is delisted in 2014, then post_vf_1 should ==1, post_vf_2should ==. , and post_vf should ==1

    I have data for other variables I needed to do regressions from 2007-2021, so if a firm with vf happens in 2010, pre_vf_1, pre_vf_2, and pre_vf_3 can be coded 1. If a firm with vf happens in 2019, post_vf_1 and post_vf_2 can be coded 1.

    Here is some data
    Code:
    year    firm    listingyear    vf
    2010    881    1998    1
    2011    881    1998    0
    2012    881    1998    0
    2013    881    1998    0
    2014    881    1998    0
    2015    881    1998    0
    2016    881    1998    0
    2017    881    1998    0
    2018    881    1998    0
    2019    881    1998    0
    2010    2074    2006    0
    2011    2074    2006    0
    2012    2074    2006    0
    2013    2074    2006    0
    2014    2074    2006    0
    2015    2074    2006    0
    2016    2074    2006    0
    2017    2074    2006    0
    2018    2074    2006    1
    2019    2074    2006    0
    2011    2641    2011    1
    2012    2641    2011    0
    2013    2641    2011    0
    2014    2641    2011    0
    2015    2641    2011    0
    2016    2641    2011    0
    2017    2641    2011    0
    2018    2641    2011    0
    2019    2641    2011    0
    2015    2772    2015    0
    2016    2772    2015    0
    2017    2772    2015    0
    2018    2772    2015    0
    2019    2772    2015    0
    2015    2773    2015    0
    2016    2773    2015    0
    2017    2773    2015    0
    2018    2773    2015    1
    2019    2773    2015    0
    I hope this is not too confusing. Thanks for any suggestions

  • #2
    It's not hard to do.

    Code:
    xtset firm year
    forvalues i = 1/3 {
        gen pre_vf_`i' = F`i'.vf
        gen post_vf_`i' = L`i'.vf
    }
    But, there is no reason to create these dummies in order to include them in a regression. You can just use L(1/3).vf, and F(1/3).vf in your list of regression variables.* Stata will then create virtual indicators ("dummies") on the fly and use them in the regression, without cluttering up your data set with a bunch of variables that convey no new information.

    *Actually, you can do that whole list even more compactly as L(-3/3).vf.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thanks for your code, it works well! I've made things unnecessarily complicated lol
      And I will make sure using dataex in later posts, thanks for this suggestion as well
      Last edited by Flora Yin; 26 Mar 2022, 21:11.

      Comment


      • #4
        Hi everyone,

        I have a very similar problem with the second requirement of post #1. Instead of 3 years pre and post the event year, I'm doing a two-year period. I need an indicator variable, "pre", to equal to 1 if it is 1-2 years prior to the event year, and another indicator variable, "post", to equal to 1 if it is 0-1 years after the event year. The code given above creates separate indicator variables for each pre and post year, but I would like two variables, pre and post, each spans over a two-year period. How can I do this?

        My sample period is 2012-2020, id is firm, and shock is the event. Here is a simple example of the data:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(id year shock)
        1 2012 0
        1 2013 0
        1 2014 0
        1 2015 1
        1 2016 0
        1 2017 0
        1 2018 0
        1 2019 0
        1 2020 0
        2 2012 0
        2 2013 1
        2 2014 0
        2 2015 0
        2 2016 0
        2 2017 0
        2 2018 0
        2 2019 0
        2 2020 0
        end
        Besides, I'm unsure what should I do if there are less than 2 years in the pre period (similar concern for post period), such as firm 2. should I code pre as 1 in 2012, so this firm only gets 1 year in the pre period, not 2; or should I collect additional data to make sure every firm would have 2 years of pre and post period?

        Thanks a lot for any advice!

        Comment


        • #5
          Code:
          xtset id year
          gen byte pre = inlist(1, L1.shock, L2.shock)
          gen byte post = inlist(1, shock, F1.shock)
          Besides, I'm unsure what should I do if there are less than 2 years in the pre period (similar concern for post period), such as firm 2. should I code pre as 1 in 2012, so this firm only gets 1 year in the pre period, not 2; or should I collect additional data to make sure every firm would have 2 years of pre and post period?
          If it is feasible to collect additional data, that is usually a good idea. But if it isn't, it seems to me that counting the 1 year after the shock is sensible: it is "semantically" consistent with the definition of the variable in other circumstances. I suppose there are real-world contexts in which that might not be appropriate, but none come immediately to mind. In the end, though, this is a substantive question, not a statistical one. So you should consult with somebody in your field or the previous literature to see how it is typically handled for this kind of problem.

          Comment


          • #6
            Hi Clyde,

            Thanks for your help. I think maybe the code for pre and post should be reversed. If I use the following code it works as I want (although it is conter-intuitive that post period uses lagged value while pre uses lead):
            Code:
            gen byte post = inlist(1, shock, L1.shock)
            gen byte pre = inlist(1, F1.shock, F2.shock)
            And thanks for your advice, I will find some published papers and see what they do in this regard.

            Comment


            • #7
              Hi Lucy, I'm absolutely not an expert, but according to my supervisor, he suggests collecting more data. Hope this helps.
              Last edited by Flora Yin; 28 Mar 2022, 10:16.

              Comment


              • #8
                Lucy Garcia is correct in #6: I had pre and post reversed.

                Comment


                • #9
                  Clyde Schechter Just an amendment in #1. What if I want to create a dummy for year 2 and after and not just for year 2? I understand that for the first three dummies, i can use the L(-2/1).vf, but for the fourth dummy, I need a variable that measures the impact two years and onwards, and not just in the second year after the adoption of a law.

                  Comment


                  • #10
                    I read #9 as a different question. This may help and if it doesn't please explain -- in terms of this data example or a better one -- what different thing you want.

                    There are other ways to do it e.g, in terms of L2.year being greater than the year of shock, which is the same idea in different form.

                    Code:
                    . clear
                    
                    . set obs 10 
                    Number of observations (_N) was 0, now 10.
                    
                    . gen year = 2010 + _n 
                    
                    . gen shock = year == 2015 
                    
                    . 
                    . list 
                    
                         +--------------+
                         | year   shock |
                         |--------------|
                      1. | 2011       0 |
                      2. | 2012       0 |
                      3. | 2013       0 |
                      4. | 2014       0 |
                      5. | 2015       1 |
                         |--------------|
                      6. | 2016       0 |
                      7. | 2017       0 |
                      8. | 2018       0 |
                      9. | 2019       0 |
                     10. | 2020       0 |
                         +--------------+
                    
                    . 
                    . 
                    . gen previous = year if shock 
                    (9 missing values generated)
                    
                    . replace previous = previous[_n-1] if missing(previous)
                    (5 real changes made)
                    
                    . 
                    . gen wanted = year >= previous + 2 
                    
                    . 
                    . list, sepby(previous)
                    
                         +----------------------------------+
                         | year   shock   previous   wanted |
                         |----------------------------------|
                      1. | 2011       0          .        0 |
                      2. | 2012       0          .        0 |
                      3. | 2013       0          .        0 |
                      4. | 2014       0          .        0 |
                         |----------------------------------|
                      5. | 2015       1       2015        0 |
                      6. | 2016       0       2015        0 |
                      7. | 2017       0       2015        1 |
                      8. | 2018       0       2015        1 |
                      9. | 2019       0       2015        1 |
                     10. | 2020       0       2015        1 |
                         +----------------------------------+

                    Comment


                    • #11
                      Thanks Nick Cox . Perhaps, you are correct. This is a bit different. I will try to explain the situation here.

                      So, I have a panel dataset and shock is at the state level. Different states adopted the law in different years. The data starts from 2002. The issue is that I do not have a balanced panel. So, let's say in a state (x) the law was implemented in 2006, so for a particular firm with the data from 2002 to 2018, the dummy variable should be 1 for 2008 to 2018. But if the data for a specific firm in a specific state (x) is starting from 2012, then this wanted should be one for this firm-year observation in 2012 and all the upcoming firm-year observations for this firm. But this is not what is happening right now.

                      I am attaching the sample data.

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      
                      input float id double year str8 state int yearadopted
                      
                       1 2002 "IL" 2005
                       1 2003 "IL" 2005
                       1 2004 "IL" 2005
                       1 2005 "IL" 2005
                       1 2006 "IL" 2005
                       1 2007 "IL" 2005
                       1 2008 "IL" 2005
                       1 2009 "IL" 2005
                       1 2010 "IL" 2005
                       1 2011 "IL" 2005
                       1 2012 "IL" 2005
                       1 2013 "IL" 2005
                       1 2014 "IL" 2005
                       1 2015 "IL" 2005
                       1 2017 "IL" 2005
                       1 2018 "IL" 2005
                       1 2019 "IL" 2005
                       1 2020 "IL" 2005
                       1 2021 "IL" 2005
                       2 2005 "MN" 2005
                       2 2006 "MN" 2005
                       2 2007 "MN" 2005
                       2 2008 "MN" 2005
                       2 2009 "MN" 2005
                       2 2010 "MN" 2005
                       3 2002 "NJ" 2005
                       3 2003 "NJ" 2005
                       3 2004 "NJ" 2005
                       3 2005 "NJ" 2005
                       3 2006 "NJ" 2005
                       3 2007 "NJ" 2005
                       4 2002 "MO" 2009
                       4 2003 "MO" 2009
                       5 2014 "TX" 2007
                       5 2015 "TX" 2007
                       5 2016 "TX" 2007
                       5 2017 "TX" 2007
                       6 2006 "OH" 2005
                       6 2007 "OH" 2005
                       6 2008 "OH" 2005
                       6 2009 "OH" 2005
                       6 2010 "OH" 2005
                       6 2011 "OH" 2005
                       6 2012 "OH" 2005
                       6 2013 "OH" 2005
                       6 2014 "OH" 2005
                       6 2015 "OH" 2005
                       6 2016 "OH" 2005
                       6 2017 "TX" 2007
                       6 2018 "TX" 2007
                       6 2019 "TX" 2007
                       6 2020 "TX" 2007
                       6 2021 "TX" 2007
                       6 2022 "TX" 2007
                       7 2002 "NY" 2005
                       7 2003 "NY" 2005
                       7 2004 "NY" 2005
                       7 2005 "NY" 2005
                       7 2006 "NY" 2005
                       8 2002 "SC" 2008
                       8 2003 "SC" 2008
                       8 2004 "SC" 2008
                       8 2005 "SC" 2008
                       8 2006 "SC" 2008
                       8 2007 "SC" 2008
                       8 2008 "SC" 2008
                       8 2009 "SC" 2008
                       8 2010 "SC" 2008
                       8 2011 "SC" 2008
                       8 2012 "SC" 2008
                       8 2013 "SC" 2008
                       8 2014 "SC" 2008
                       8 2015 "SC" 2008
                       8 2016 "SC" 2008
                       8 2017 "SC" 2008
                       8 2018 "SC" 2008
                       9 2002 "AZ" 2006
                       9 2003 "AZ" 2006
                       9 2004 "AZ" 2006
                       9 2005 "AZ" 2006
                       9 2006 "AZ" 2006
                       9 2007 "AZ" 2006
                       9 2008 "AZ" 2006
                       9 2009 "AZ" 2006
                       9 2010 "AZ" 2006
                       9 2011 "AZ" 2006
                       9 2012 "AZ" 2006
                       9 2013 "AZ" 2006
                       9 2014 "AZ" 2006
                       9 2015 "AZ" 2006
                       9 2016 "AZ" 2006
                       9 2017 "AZ" 2006
                       9 2018 "AZ" 2006
                       9 2019 "AZ" 2006
                       9 2020 "AZ" 2006
                       9 2021 "AZ" 2006
                       9 2022 "AZ" 2006
                      10 2011 "GA" 2005
                      10 2012 "GA" 2005
                      10 2013 "GA" 2005
                      end

                      Comment


                      • #12
                        Lack of balance is immaterial here so far as I can see and indeed so is your panel structure. As in my answer


                        Code:
                        gen wanted = (year - yearadopted) >= 2
                        is an indicator that is 1 if 2 or more years have passed since adoption and 0 otherwise. That applies across states and firms because your adoption year is already providing the state-specific detail needed.

                        But this is not what is happening right now.
                        I don't see what this refers to.






                        Comment


                        • #13
                          Thank you Nick Cox . This is exactly what I needed.

                          Comment

                          Working...
                          X