Programming a loop for non-time series dataset

Aicha SA

Join Date: Nov 2019

Posts: 13
#1


06 Nov 2019, 03:52

Dear all,

I turn to you for advice once again, as I am working for the first time on a dataset that is not in a time series (i.e. year and id are unique identifiers). My dataset looks a bit like the following:

id yearClosed tradeRecievables
113 2015 946440
113 2016 1037787
113 2014 6237883
113 2012 10739971
113 2013 8890283

id identifies a corporation, and there are thousands of them per sector, etc. What I am trying to do is generate a new variable that is the average of tradeRecievables[_n] - tradeRecievables[_n-1] within each id. The closing years differ across companies (some start in 2007, for example), so I need Stata to apply the calculation to each id's own set of yearClosed values, for all the ids available.
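A minimal sketch of that calculation, assuming each (id, yearClosed) pair occurs only once and that "previous" means the previous closing year present in the data for the same id. The five rows above are entered with `input`; `diff` and `mean_diff` are hypothetical variable names. Because successive differences telescope, the mean for company 113 equals (1037787 - 10739971)/4 = -2425546.

```stata
* Sketch: mean of successive changes in tradeRecievables within each id
* Assumption: (id, yearClosed) identifies observations uniquely
clear
input id yearClosed double tradeRecievables
113 2015 946440
113 2016 1037787
113 2014 6237883
113 2012 10739971
113 2013 8890283
end

isid id yearClosed            // stops with an error if an id-year pair repeats
sort id yearClosed            // order each company's years chronologically
* Change from the previous row of the same id (missing for each id's first year)
by id: generate double diff = tradeRecievables - tradeRecievables[_n-1]
* Company mean of those changes; egen mean() ignores the missing first change
by id: egen double mean_diff = mean(diff)
list, sepby(id)
```

The `isid` check matters: without it, `[_n-1]` would silently use whichever duplicate happens to sort first.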

Is there a way to do that?

Thank you so much in advance!
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

06 Nov 2019, 04:05

I wonder whether you wish something like:

Code:

by id, sort : egen float mymean = mean(tradeRecievables)
tsset id yearClosed
gen mylag1 = L.tradeRecievables
gen myvar = mymean-mylag1

Best regards,

Marcos
Aicha SA

Join Date: Nov 2019

Posts: 13
#3

06 Nov 2019, 05:43

Dear Marcos,

Many thanks for the code. However, I'm still getting a tsset error "repeated time values within panel"... Any ideas for a workaround?
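A sketch for locating the observations behind that error. The extracts in this thread contain no repeated id-year pairs, so one row is duplicated with `expand` purely for illustration (hypothetical); `dup` is a hypothetical variable name.

```stata
* Three rows from the extract in this post; the repeated 2013 row is hypothetical
clear
input id yearClosed double tradeRecievables
113 2012 10739971
113 2013 8890283
113 2014 6237883
end
expand 2 if yearClosed == 2013       // create one repeated id-year pair

capture tsset id yearClosed          // fails: repeated time values within panel
display "tsset return code: " _rc

duplicates report id yearClosed      // summary of how many pairs are repeated
duplicates tag id yearClosed, generate(dup)   // dup > 0 marks repeated pairs
list id yearClosed tradeRecievables if dup > 0, sepby(id)
```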

To illustrate further, below is another extraction from the dataset with two ids:

id yearClosed tradeRecievables
113 2012 10739971
113 2013 8890283
113 2014 6237883
113 2015 946440
113 2016 1037787
2232 2011 701725
2232 2012 575936
2232 2013 609605
2232 2015 476170
2232 2016 573786
2232 2017 599249
2232 2018 584737
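A sketch using the extract above, which has no repeated id-year pairs, so `tsset` succeeds. It uses Stata's `D.` (difference) operator rather than `[_n-1]`; `d_tr` and `mean_d` are hypothetical names.

```stata
* Extract from this post (no repeated id-year pairs)
clear
input id yearClosed double tradeRecievables
113 2012 10739971
113 2013 8890283
113 2014 6237883
113 2015 946440
113 2016 1037787
2232 2011 701725
2232 2012 575936
2232 2013 609605
2232 2015 476170
2232 2016 573786
2232 2017 599249
2232 2018 584737
end

tsset id yearClosed                          // declares panel and time; also sorts
* D. compares with the previous calendar year, so a gap gives a missing value
generate double d_tr = D.tradeRecievables
* Company mean of the non-missing changes
by id: egen double mean_d = mean(d_tr)
list, sepby(id)
```

With `D.`, the 2015 change for id 2232 is missing because 2014 is absent, so that company's mean is taken over five changes; `[_n-1]` would instead compare 2015 with 2013. Which of the two is appropriate is a research decision.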

Thanks again and best,
Aicha

Last edited by Aicha SA; 06 Nov 2019, 05:54. Reason: adding more info and fixing name autocorrect
Nick Cox

Join Date: Mar 2014

Posts: 35656
#4

06 Nov 2019, 06:08

Given your statement that id and year are unique identifiers, then

Code:

tsset id year

should work. If not, then they aren't unique identifiers.
Aicha SA

Join Date: Nov 2019

Posts: 13
#5

06 Nov 2019, 06:11

Hello Nick, I'm very sorry; I meant that they are not unique identifiers. I will fix my original message!
Aicha SA

Join Date: Nov 2019

Posts: 13
#6

06 Nov 2019, 06:14

Originally posted by Aicha SA

Dear all,

I turn to you for advice once again, as I am working for the first time on a dataset that is not in a time series (i.e. year and id are NOT unique identifiers). My dataset looks a bit like the following:

id yearClosed tradeRecievables
113 2015 946440
113 2016 1037787
113 2014 6237883
113 2012 10739971
113 2013 8890283

id identifies a corporation, and there are thousands of them per sector, etc. What I am trying to do is generate a new variable that is the average of tradeRecievables[_n] - tradeRecievables[_n-1] within each id. The closing years differ across companies (some start in 2007, for example), so I need Stata to apply the calculation to each id's own set of yearClosed values, for all the ids available.

Is there a way to do that?

Thank you so much in advance!

Just to clarify, I made a mistake in my original post: id and year are NOT unique identifiers.
Nick Cox

Join Date: Mar 2014

Posts: 35656
#7

06 Nov 2019, 06:19

If you have repeated years for the same identifier, then there isn't a guaranteed unique interpretation of the previous observation within each panel. Wasn't this point made in a previous thread?

There is no scope for a work-around unless you explain exactly what you want to happen.
Aicha SA

Join Date: Nov 2019

Posts: 13
#8

06 Nov 2019, 06:29

I did post before about the same dataset, yes. This is the first time I'm working with a multidimensional dataset.

Basically, each company (id) has different closing dates (the dates on which its accounts are closed), which of course repeat across different ids. I want to be able to make calculations to:

1) get the average change in trade receivables per company (i.e. [_n] - [_n-1]), which is difficult since the nature of the data does not allow for defining a time variable;

and 2) check whether a certain ratio condition (ratio3==1) holds for three years per id (see my previous post). I also need the proportion of companies for which ratio3 does not hold for three years.
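A sketch for point 2, assuming "holds for three years" means ratio3 == 1 in three consecutive calendar years and that id-year pairs are unique (so `tsset` works). The ids and `ratio3` values below are made up for illustration; `held3`, `ever3` and `firstobs` are hypothetical names.

```stata
* Hypothetical toy data: ids and ratio3 values are invented for illustration
clear
input id yearClosed ratio3
1 2012 1
1 2013 1
1 2014 1
1 2015 0
2 2012 1
2 2013 0
2 2014 1
2 2015 1
end

tsset id yearClosed
* 1 if ratio3 == 1 this year and in each of the two previous calendar years
* (a missing lag compares as false, so gaps break the run)
generate byte held3 = ratio3 == 1 & L.ratio3 == 1 & L2.ratio3 == 1
* 1 if the company has at least one such three-year run
by id: egen byte ever3 = max(held3)
* Keep one observation per company for company-level shares
egen byte firstobs = tag(id)
summarize ever3 if firstobs, meanonly
display "Share of companies with a 3-year run: " r(mean)
display "Share without one: " 1 - r(mean)
```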

I hope this clarifies the ask. I just need a way to deal with this kind of data. Not sure if loops would be the solution (if so an example would be greatly appreciated) but any ideas are desperately welcome!

Last edited by Aicha SA; 06 Nov 2019, 06:32.
Nick Cox

Join Date: Mar 2014

Posts: 35656
#9

06 Nov 2019, 07:10

Sorry, but I can't add more easily and helpfully.

Different panel lengths aren't the issue at all. It's the duplicates by identifier and year.

Also it's hard to make sense of the comment that "the nature of the data does not allow for defining a time variable" when year looks to me to be precisely that.

I think you're just restating the problem and asking for solutions, whereas the whole point is that you as researcher have the responsibility here. (Or, if you're the assistant or student, what does your supervisor advise?)

What you could do is

Code:

collapse ..., by(identifier year)

and then the previous observation is uniquely defined (or not in the dataset) but I have no way of knowing whether that is a good idea for your project or what kind of collapse makes sense for your project.
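As one concrete instance of that suggestion, here is a sketch that sums values within each repeated id-year. The `(sum)` statistic is an assumption; `(mean)`, `(last)` or another statistic may suit better depending on what the duplicates represent. The repeated 2013 row is hypothetical, created with `expand`, and `d_tr` is a hypothetical name.

```stata
* Three rows from the thread; the repeated 2013 row is hypothetical
clear
input id yearClosed double tradeRecievables
113 2012 10739971
113 2013 8890283
113 2014 6237883
end
expand 2 if yearClosed == 2013

* One observation per id-year; (sum) is an assumed choice of statistic
collapse (sum) tradeRecievables, by(id yearClosed)
tsset id yearClosed                          // now succeeds
generate double d_tr = D.tradeRecievables    // change from the previous calendar year
list
```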
Aicha SA

Join Date: Nov 2019

Posts: 13
#10

06 Nov 2019, 07:36

Thanks, Nick.

Unfortunately, both aspects are important for our analysis, as we also want to see how individual companies evolve over time, so collapsing would lose potentially interesting data. By that comment, I meant that I cannot define a unique time variable: each id's set of yearClosed values can be regarded as panel data on its own, so when all ids are put together the year values are bound to repeat. I also cannot process companies individually, as the dataset is too large.
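A sketch of one alternative that keeps every original observation: aggregate within each id-year with `egen`, compute changes on one tagged observation per id-year, and copy the company mean back to all rows. Summing duplicates is again an assumption, the repeated 2013 row is hypothetical, and `yr_total`, `tagged`, `d_total` and `mean_d` are hypothetical names.

```stata
* Three rows from the thread; the repeated 2013 row is hypothetical
clear
input id yearClosed double tradeRecievables
113 2012 10739971
113 2013 8890283
113 2014 6237883
end
expand 2 if yearClosed == 2013

* Id-year total, stored on every original observation (sum is an assumption)
bysort id yearClosed: egen double yr_total = total(tradeRecievables)
* Flag exactly one observation per id-year
egen byte tagged = tag(id yearClosed)
* Change from the previous closing year, computed on flagged observations only
bysort tagged id (yearClosed): generate double d_total = yr_total - yr_total[_n-1] if tagged
* Company mean of those changes, copied to all of its observations
bysort id: egen double mean_d = mean(d_total)
list, sepby(id)
```

Here `[_n-1]` refers to the previous closing year present for the id, not necessarily the previous calendar year.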

Thanks again for your responses.