Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Copying information from current year to other years based on taxcode.

    Hello, I have data which looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year str11 taxcode str10 taxcode10
    2001 "000000000" ""          
    2005 "000000800" ""          
    2006 "000000800" ""          
    2005 "000000809" ""          
    2005 "000062723" ""          
    2007 "000718119" "000718119"
    2006 "001005091" ""          
    2007 "001005091" "001005091"
    2006 "001009069" ""          
    2007 "001009069" "001009069"
    2006 "001010417" ""          
    2006 "001010477" ""          
    2006 "001011043" ""          
    2007 "001011043" "001011043"
    2006 "001011639" ""          
    2007 "001011639" "001011639"
    2006 "001012731" ""          
    2006 "001013184" ""          
    2006 "001013267" ""          
    2007 "001013267" "001013267"
    2006 "001013405" ""          
    2007 "001013405" "001013405"
    2006 "001013422" ""          
    2007 "001013422" "001013422"
    2006 "001013434" ""          
    2006 "001013491" ""          
    2007 "001013491" "001013491"
    2006 "001013609" ""          
    2007 "001013609" "001013609"
    2006 "001014062" ""          
    2006 "001014282" ""          
    2002 "001030330" ""          
    2001 "001122040" ""          
    2002 "002404321" ""          
    2003 "002472506" ""          
    2005 "003028739" ""          
    2004 "004726657" ""          
    2005 "004726657" ""          
    2006 "004726657" ""          
    2001 "009540701" ""          
    2006 "010001031" ""          
    2007 "010001031" "0100010315"
    2001 "010001800" ""          
    2001 "010004095" ""          
    2001 "010010000" ""          
    2002 "010010000" ""          
    2003 "010010000" ""          
    2004 "010010000" ""          
    2004 "010010000" ""          
    2005 "010010000" ""          
    2005 "010010000" ""          
    2006 "010010000" ""          
    2006 "010010000" ""          
    2006 "010010000" ""          
    2007 "010010000" "0100100008"
    2007 "010010000" "0100100008"
    2007 "010010000" "0100100008"
    2001 "010010001" ""          
    2002 "010010001" ""          
    2003 "010010001" ""          
    2004 "010010001" ""          
    2004 "010010001" ""          
    2005 "010010001" ""          
    2005 "010010001" ""          
    2005 "010010001" ""          
    2006 "010010001" ""          
    2006 "010010001" ""          
    2007 "010010001" "0100100015"
    2007 "010010001" "0100100015"
    2001 "010010004" ""          
    2002 "010010004" ""          
    2003 "010010004" ""          
    2004 "010010004" ""          
    2005 "010010004" ""          
    2006 "010010004" ""          
    2007 "010010004" "0100100047"
    2002 "010010005" ""          
    2003 "010010005" ""          
    2004 "010010005" ""          
    2005 "010010005" ""          
    2006 "010010005" ""          
    2007 "010010005" "0100100054"
    2001 "010010006" ""          
    2002 "010010006" ""          
    2003 "010010006" ""          
    2004 "010010006" ""          
    2005 "010010006" ""          
    2006 "010010006" ""          
    2007 "010010006" "0100100061"
    2001 "010010007" ""          
    2002 "010010007" ""          
    2003 "010010007" ""          
    2004 "010010007" ""          
    2005 "010010007" ""          
    2001 "010010008" ""          
    2001 "010010008" ""          
    2002 "010010008" ""          
    2003 "010010008" ""          
    2004 "010010008" ""          
    2005 "010010008" ""          
    end
    The data is sorted by taxcode year, each taxcode represents a firm, and firms are observed over several years, the firm may or may not appear in every year. Notice that some firms have a 'taxcode10' entry in 2007, what I'm trying to do is to transfer this same taxcode10 entry to the same firm in observed in other years.

    For example the firm with the taxcode "010010006" has a 'taxcode10' entry in 2007, I would like this firm to have this same 'taxcode10' entry in all other years in which this firm appears in the data, and I would like to do this for all firms with a taxcode10 entry in 2007. Essentially 'taxcode10' is the 10-digit taxcode for the firm and 'taxcode' is the 9 digit taxcode which I'm trying to update.

    How can I implement something like this?

    Thanks,
    Jad
    Last edited by Jad Tamimi; 24 Apr 2024, 12:57.

  • #2
    Code:
    assert year == 2007 if !missing(taxcode10)
    by taxcode (taxcode10), sort: replace taxcode10 = taxcode10[_N]
    The code assumes, and verifies, that non-missing values of taxcode10 are found only in observations with year == 2007. If that is not the case, the code wil break at the first command. Post back if that happens, and the code can be modified to only extend the values of a non-missing taxcode10 to other years if the non-missing one is found in 2007.

    As an aside, I notice that there are many instances in the example data of the same firm (as identified by taxcode) has multiple observations in the same year. Are there other variables in the data set that distinguish these observation (perhaps divisions of the firm, or different months of the year, etc.)? Or is there some other good reason why these duplicates should exist? If not, this might mean that the data management that create this data set contains errors, and should be reviewed with an eye to finding and fixing those and any other errors found.

    Comment


    • #3
      Non-missing values of 'taxcode10' do indeed only occur in 2007. And yes there are some instances where the variable 'taxcode' does not function well as a unique identifier for the firm, the actual dataset contains more variables and I can use combinations of those variables to come up with a better identifier. Thanks alot Clyde!

      Comment

      Working...
      X