Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transferring data within a .dta file to other rows

    Hello everyone,

    I currently have a dataset in which I have the total assets for several firms over the years 1998 to 2015. These firms have committed fraudulent financial reporting within this period. I want to create a new row (variable) that represents the total assets of the years they have committed fraud. The idea is that even though firms have committed the fraud in differing years, the first year of the fraudulent activity corresponds with the first year of fraudulent activity of another firm. I want to analyse the effect of fraudulent activities on the readability of the annual reports, with size (Total assets) as one of the control variables. This control variable needs to correspond with the year of the fraudulent activity


    to illustrate

    this is how the data set may look like

    TA 1998 | TA 1999 | TA 2000 | TA 2001 | TA 2002 | TA 2003 | .... TA 2015 | First year of fraud | Last year of fraud | TA in year 1 of fraud | TA in year 2 of fraud | TA in year 3 of fraud

    100 | 120 | 130 | 140 | 150 | 160 | 200 | 1998 | 1999 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2000 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2001 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2001 | 2003 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2001 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2003 |

    and this is the result I would like to see

    TA 1998 | TA 1999 | TA 2000 | TA 2001 | TA 2002 | TA 2003 | .... TA 2015 | First year of fraud | Last year of fraud | TA in year 1 of fraud | TA in year 2 of fraud | TA in year 3 of fraud

    100 | 120 | 130 | 140 | 150 | 160 | 200 | 1998 | 1999 | 100 | 120 | . |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2000 | 130 | . | . |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2001 | 120 | 130 | 140 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2001 | 2003 | 140 | 150 | 160 |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2001 | 130 | 140 | . |
    100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2002 | 130 | 140 | 150 |

    I hope I have been clear in my explanation, and am curious to see if there is a solution
    Last edited by Bram van Zon; 25 May 2023, 08:59.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float firm_id byte ta1998 int(ta1999 ta2000 ta2001 ta2002 ta2003 ta2004 first_fraud_year last_fraud_year)
    1 100 120 130 140 150 160 200 1998 1999
    2 100 120 130 140 150 160 200 2000 2000
    3 100 120 130 140 150 160 200 1999 2001
    4 100 120 130 140 150 160 200 2001 2003
    5 100 120 130 140 150 160 200 2000 2001
    6 100 120 130 140 150 160 200 1999 2003
    end
    
    reshape long ta, i(firm_id) j(year)
    
    forvalues i = 1/3 {
        by firm_id, sort: egen ta_fraud_year_`i' = ///
            max(cond(year-first_fraud_year == `i'-1, ta, .))
    }
    
    reshape wide // PROBABLY YOU SHOULD SKIP THIS COMMAND
    This code leaves you with a wide data set, just as you started out. But most of Stata's data management and analysis commands work better, or only, with a long data layout. So you should probably skip the final -reshape wide- command. But if you really are going to be doing something where the wide layout works better, then leave the code as I show it.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      thank you for your response, however the problem does not seem to be solved. First off I get the message:
      "that variable year contains all missing values" I was wondering it would have to do something with the way my variable is named. The full name of total assets in my datasets is TotalAssetsFY1998USD and this for each year. Next up I was wondering where the 1/3 comes from in the for statement. Hope you can help!

      Comment


      • #4
        It had to do with the fact that USD was behind the year so that problem has been solved. However I am still not were I want to be as currently when running the for statement it gives the error unknown even function () r(133)

        Comment


        • #5
          looks to me, but I can't be sure since you did not show us "exactly" what you typed (please read the FAQ), like a typo - but to be sure, you need to copy-and-paste, within CODE blocks please, exactly what you typed and exactly what, and where in the output, Stata gave back the error message

          Comment


          • #6
            . reshape long TotalAssetsFY, i(id) j(year)
            (j = 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
            > 2015)

            Data Wide -> Long
            -----------------------------------------------------------------------------
            Number of observations 67 -> 1,206
            Number of variables 54 -> 38
            j variable (18 values) -> year
            xij variables:
            TotalAssetsFY1998 TotalAssetsFY1999 ... TotalAssetsFY2015->TotalAssetsFY
            -----------------------------------------------------------------------------

            . forvalues i = 1/3 {
            2. by id, sort: egen TotalAssets_fraud_year_`i' =
            3. max(cond(year-Firstyearoffraud == `i'-1, TotalAssets, .))
            4. }
            unknown egen function ()
            r(133);

            Comment


            • #7
              What the loop machinery is numbering as commands 2 and 3 should all be part of the same command line. That is, Stata is seeing

              Code:
              by id, sort: egen TotalAssets_fraud_year_`i' =
              and can see no egen function following, so it throws out you out. That's because what is needed has incorrectly been split on to the next line.

              In #2 Clyde was using /// to connect the lines. That's not allowed interactively, but you can't get the same result by omitting that syntax. You can get the same result by combining lines or by running code from the Do-file editor.

              Comment

              Working...
              X