
  • Creating new variable using firm-level data

    Hi,

    Using the following dataset, I want to create a variable giving the number of new firms (co_code) entering in each year.
    (generated by -dataex-)
    input double(co_code year) float(n1 n2)
    3 2011 1 9
    3 2012 2 9
    3 2013 3 9
    3 2014 4 9
    3 2015 5 9
    3 2016 6 9
    3 2017 7 9
    3 2018 8 9
    3 2019 9 9
    11 2010 1 10
    11 2011 2 10
    11 2012 3 10
    11 2013 4 10
    11 2014 5 10
    11 2015 6 10
    11 2016 7 10
    11 2017 8 10
    11 2018 9 10
    11 2019 10 10
    13 2015 1 5
    13 2016 2 5
    13 2017 3 5
    13 2018 4 5
    13 2019 5 5
    16 2010 1 8
    16 2011 2 8
    16 2012 3 8
    16 2015 4 8
    16 2016 5 8
    16 2017 6 8
    16 2018 7 8
    16 2019 8 8
    17 2015 1 5
    17 2016 2 5
    17 2017 3 5
    17 2018 4 5
    17 2019 5 5
    end

    n1 = sequence number of the observation within a given co_code (1 for its first year)
    n2 = total number of observations for a given co_code

    I am considering the years 2010 to 2019, and not every firm (co_code) reports data for every year.
    My task is to count how many firms are added in 2011 (i.e., firms with no data for 2010), how many are added in 2012 (i.e., firms with no data for 2010 and 2011), how many are added in 2013 (i.e., firms with no data for 2010, 2011, and 2012), and so on through 2019.

    Any help with this would be appreciated.

    Thank you.

  • #2
    For a general (company, year) dataset, you need an indicator variable marking each company's first observation, and then the total of that indicator within each year.

    Code:
    . bysort co_code (year) : gen is_new = _n == 1

    . egen total_new = total(is_new), by(year)

    . tabdisp year, c(total_new)
    
    ----------------------
         year |  total_new
    ----------+-----------
         2010 |          2
         2011 |          1
         2012 |          0
         2013 |          0
         2014 |          0
         2015 |          2
         2016 |          0
         2017 |          0
         2018 |          0
         2019 |          0
    ----------------------
    In your particular set-up, where n1 == 1 already marks each firm's first year,

    Code:
    egen total_new = total(n1 == 1), by(year)
    would get you there in one step.
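
    As a rough check that the two routes agree, here is a self-contained sketch using the example data from #1. The variable names total_new2 and tag are mine, not from the original post, and the shortcut assumes n1 was built by numbering observations within co_code in year order. Note that a firm with gaps (co_code 16 has no data for 2013 and 2014) is counted as new only in its first observed year, and that the 2010 count is firms present at the start of the window rather than firms "added".

```stata
clear
input double(co_code year) float(n1 n2)
3 2011 1 9
3 2012 2 9
3 2013 3 9
3 2014 4 9
3 2015 5 9
3 2016 6 9
3 2017 7 9
3 2018 8 9
3 2019 9 9
11 2010 1 10
11 2011 2 10
11 2012 3 10
11 2013 4 10
11 2014 5 10
11 2015 6 10
11 2016 7 10
11 2017 8 10
11 2018 9 10
11 2019 10 10
13 2015 1 5
13 2016 2 5
13 2017 3 5
13 2018 4 5
13 2019 5 5
16 2010 1 8
16 2011 2 8
16 2012 3 8
16 2015 4 8
16 2016 5 8
16 2017 6 8
16 2018 7 8
16 2019 8 8
17 2015 1 5
17 2016 2 5
17 2017 3 5
17 2018 4 5
17 2019 5 5
end

* General route: flag each firm's first observed year, then total by year
bysort co_code (year) : gen byte is_new = _n == 1
egen total_new = total(is_new), by(year)

* Shortcut: reuse n1 (assumed to number observations within co_code by year)
egen total_new2 = total(n1 == 1), by(year)

* The two counts should match in every observation
assert total_new == total_new2

* Show one row per year (tag is an illustrative helper variable)
egen byte tag = tag(year)
sort year
list year total_new if tag, noobs
```

    If the assert passes silently, the shortcut and the general method give the same counts for these data.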

    For more on the technique, see

    https://www.stata.com/support/faqs/d...ble-recording/

    https://www.stata-journal.com/articl...article=dm0055



    • #3
      Thank you for the help.
