Transferring data within a .dta file to other rows

Bram van Zon

Join Date: May 2023

Posts: 4
#1

Transferring data within a .dta file to other rows

25 May 2023, 08:56

Hello everyone,

I currently have a dataset in which I have the total assets for several firms over the years 1998 to 2015. These firms have committed fraudulent financial reporting within this period. I want to create a new row (variable) that represents the total assets of the years they have committed fraud. The idea is that even though firms have committed the fraud in differing years, the first year of the fraudulent activity corresponds with the first year of fraudulent activity of another firm. I want to analyse the effect of fraudulent activities on the readability of the annual reports, with size (Total assets) as one of the control variables. This control variable needs to correspond with the year of the fraudulent activity

to illustrate

this is how the data set may look like

TA 1998 | TA 1999 | TA 2000 | TA 2001 | TA 2002 | TA 2003 | .... TA 2015 | First year of fraud | Last year of fraud | TA in year 1 of fraud | TA in year 2 of fraud | TA in year 3 of fraud

100 | 120 | 130 | 140 | 150 | 160 | 200 | 1998 | 1999 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2000 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2001 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2001 | 2003 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2001 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2003 |

and this is the result I would like to see

TA 1998 | TA 1999 | TA 2000 | TA 2001 | TA 2002 | TA 2003 | .... TA 2015 | First year of fraud | Last year of fraud | TA in year 1 of fraud | TA in year 2 of fraud | TA in year 3 of fraud

100 | 120 | 130 | 140 | 150 | 160 | 200 | 1998 | 1999 | 100 | 120 | . |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2000 | 130 | . | . |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 1999 | 2001 | 120 | 130 | 140 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2001 | 2003 | 140 | 150 | 160 |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2001 | 130 | 140 | . |
100 | 120 | 130 | 140 | 150 | 160 | 200 | 2000 | 2002 | 130 | 140 | 150 |

I hope I have been clear in my explanation, and am curious to see if there is a solution

Last edited by Bram van Zon; 25 May 2023, 08:59.
Tags: categorical, data, regression
Clyde Schechter

Join Date: Apr 2014

Posts: 30168
#2

25 May 2023, 10:26

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float firm_id byte ta1998 int(ta1999 ta2000 ta2001 ta2002 ta2003 ta2004 first_fraud_year last_fraud_year) 1 100 120 130 140 150 160 200 1998 1999 2 100 120 130 140 150 160 200 2000 2000 3 100 120 130 140 150 160 200 1999 2001 4 100 120 130 140 150 160 200 2001 2003 5 100 120 130 140 150 160 200 2000 2001 6 100 120 130 140 150 160 200 1999 2003 end reshape long ta, i(firm_id) j(year) forvalues i = 1/3 { by firm_id, sort: egen ta_fraud_year_`i' = /// max(cond(year-first_fraud_year == `i'-1, ta, .)) } reshape wide // PROBABLY YOU SHOULD SKIP THIS COMMAND

This code leaves you with a wide data set, just as you started out. But most of Stata's data management and analysis commands work better, or only, with a long data layout. So you should probably skip the final -reshape wide- command. But if you really are going to be doing something where the wide layout works better, then leave the code as I show it.

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment
Bram van Zon

Join Date: May 2023

Posts: 4
#3

30 May 2023, 08:42

thank you for your response, however the problem does not seem to be solved. First off I get the message:
"that variable year contains all missing values" I was wondering it would have to do something with the way my variable is named. The full name of total assets in my datasets is TotalAssetsFY1998USD and this for each year. Next up I was wondering where the 1/3 comes from in the for statement. Hope you can help!
Comment
Bram van Zon

Join Date: May 2023

Posts: 4
#4

30 May 2023, 09:46

It had to do with the fact that USD was behind the year so that problem has been solved. However I am still not were I want to be as currently when running the for statement it gives the error unknown even function () r(133)
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4493
#5

30 May 2023, 09:52

looks to me, but I can't be sure since you did not show us "exactly" what you typed (please read the FAQ), like a typo - but to be sure, you need to copy-and-paste, within CODE blocks please, exactly what you typed and exactly what, and where in the output, Stata gave back the error message
Comment
Bram van Zon

Join Date: May 2023

Posts: 4
#6

01 Jun 2023, 03:47

. reshape long TotalAssetsFY, i(id) j(year)
(j = 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
> 2015)

Data Wide -> Long
-----------------------------------------------------------------------------
Number of observations 67 -> 1,206
Number of variables 54 -> 38
j variable (18 values) -> year
xij variables:
TotalAssetsFY1998 TotalAssetsFY1999 ... TotalAssetsFY2015->TotalAssetsFY
-----------------------------------------------------------------------------

. forvalues i = 1/3 {
2. by id, sort: egen TotalAssets_fraud_year_`i' =
3. max(cond(year-Firstyearoffraud == `i'-1, TotalAssets, .))
4. }
unknown egen function ()
r(133);
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35781
#7

01 Jun 2023, 05:13

What the loop machinery is numbering as commands 2 and 3 should all be part of the same command line. That is, Stata is seeing

Code:

by id, sort: egen TotalAssets_fraud_year_`i' =

and can see no egen function following, so it throws out you out. That's because what is needed has incorrectly been split on to the next line.

In #2 Clyde was using /// to connect the lines. That's not allowed interactively, but you can't get the same result by omitting that syntax. You can get the same result by combining lines or by running code from the Do-file editor.
Comment

Announcement

Transferring data within a .dta file to other rows

Comment

Comment

Comment

Comment

Comment

Comment