
  • Programming a loop for non-time series dataset

    Dear all,

    I turn to you for advice once again, as I am working for the first time on a dataset that is not in a time series (i.e. year and id are unique identifiers). My dataset looks a bit like the following:

    id yearClosed tradeRecievables
    113 2015 946440
    113 2016 1037787
    113 2014 6237883
    113 2012 10739971
    113 2013 8890283

    Id = a corporation, and there are thousands per sector, etc. What I am trying to do is generate a new variable that is the average of tradeRecievables[_n] - tradeRecievables[_n-1], per id. The closing years are not the same across companies (some start at 2007, for example), so I need to tell Stata to run the command over the set of yearClosed values within each id, for all the ids available.

    Is there a way to do that?

    Thank you so much in advance!
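
    A minimal sketch of the calculation described above, assuming each id has at most one observation per yearClosed. The names change and meanChange are illustrative and not part of the original data. Note that [_n-1] refers to the previous observation within the id, not necessarily the previous calendar year, so gaps in yearClosed are skipped over.

    Code:
    * sort within id by closing year, then difference consecutive observations
    bysort id (yearClosed): gen double change = tradeRecievables - tradeRecievables[_n-1]
    * average the changes within each id; the missing change in each id's first row is ignored
    by id: egen double meanChange = mean(change)

    No explicit loop is needed here: the by prefix repeats the command separately for every id.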


  • #2
    I wonder whether you wish something like:

    Code:
    by id, sort : egen float mymean = mean(tradeRecievables)
    tsset id yearClosed
    gen mylag1 = L.tradeRecievables
    gen myvar = mymean-mylag1
    Best regards,

    Marcos



    • #3
      Dear Marcos,

      Many thanks for the code. However, I'm still getting a tsset error "repeated time values within panel"... Any ideas for a workaround?

      To illustrate further, below is another extraction from the dataset with two ids:

      id yearClosed tradeRecievables
      113 2012 10739971
      113 2013 8890283
      113 2014 6237883
      113 2015 946440
      113 2016 1037787
      2232 2011 701725
      2232 2012 575936
      2232 2013 609605
      2232 2015 476170
      2232 2016 573786
      2232 2017 599249
      2232 2018 584737

      Thanks again and best,
      Aicha
      Last edited by Aicha SA; 06 Nov 2019, 05:54. Reason: adding more info and fixing name autocorrect



      • #4
        Given your statement that id and year are unique identifiers, then


        Code:
        tsset id year
        should work. If not, then they aren't unique identifiers.
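
        A quick way to check that claim, sketched with the variable names from the example data: isid exits with an error if the variables do not uniquely identify the observations, and duplicates shows which id and yearClosed pairs repeat. The variable name dup is illustrative.

        Code:
        * exits with an error unless id and yearClosed jointly identify observations
        capture noisily isid id yearClosed
        * count surplus copies, then flag and inspect the offending observations
        duplicates report id yearClosed
        duplicates tag id yearClosed, generate(dup)
        list id yearClosed tradeRecievables if dup > 0, sepby(id)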



        • #5
          Hello Nick, I'm very sorry, I meant that they are not unique identifiers. I will fix my original message!



          • #6
            Originally posted by Aicha SA
            Dear all,

            I turn to you for advice once again, as I am working for the first time on a dataset that is not in a time series (i.e. year and id are NOT unique identifiers). My dataset looks a bit like the following:

            id yearClosed tradeRecievables
            113 2015 946440
            113 2016 1037787
            113 2014 6237883
            113 2012 10739971
            113 2013 8890283

            Id = a corporation, and there are thousands per sector, etc. What I am trying to do is generate a new variable that is the average of tradeRecievables[_n] - tradeRecievables[_n-1], per id. The closing years are not the same across companies (some start at 2007, for example), so I need to tell Stata to run the command over the set of yearClosed values within each id, for all the ids available.

            Is there a way to do that?

            Thank you so much in advance!
            Just to clarify, I made a mistake in my original post. id and year are NOT unique identifiers.



            • #7
              If you have repeated years for the same identifier, then there isn't a guaranteed unique interpretation to the previous observation within each panel. Wasn't this point made in a previous thread?

              There is no scope for a work-around unless you explain exactly what you want to happen.



              • #8
                I did post before about the same dataset, yes. This is the first time I'm working with a multidimensional dataset.

                Basically, each company (id) has different closing dates (dates of closing of accounts), which of course repeat across different ids. I want to be able to make calculations to:

                1) get the average change in trade receivables per company (so the average of [_n] - [_n-1]), which is difficult since the nature of the data does not allow for defining a time variable;

                and 2) calculate that a certain ratio (ratio3==1) holds for three years per id (previous post). I also need the ratio of companies for which ratio3 does not hold for 3 years.

                I hope this clarifies the ask. I just need a way to deal with this kind of data. Not sure if loops would be the solution (if so an example would be greatly appreciated) but any ideas are desperately welcome!
                Last edited by Aicha SA; 06 Nov 2019, 06:32.



                • #9
                  Sorry, but I can't add more easily and helpfully.

                  Different panel lengths aren't the issue at all. It's the duplicates by identifier and year.

                  Also it's hard to make sense of the comment that "the nature of the data does not allow for defining a time variable" when year looks to me to be precisely that.

                  I think you're just restating the problem and asking for solutions, whereas the whole point is that you as researcher have the responsibility here. (Or, if you're the assistant or student, what does your supervisor advise?)

                  What you could do is
                  Code:
                  collapse ..., by(identifier year)
                  and then the previous observation is uniquely defined (or not in the dataset) but I have no way of knowing whether that is a good idea for your project or what kind of collapse makes sense for your project.
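
                  As one hypothetical instance of that suggestion, the sketch below averages tradeRecievables over duplicate id and yearClosed pairs. The choice of (mean) is an assumption made only for illustration, and the names change and meanChange are likewise illustrative. collapse replaces the data in memory, so this is best run on a copy of the dataset.

                  Code:
                  * one observation per id and yearClosed; (mean) is an illustrative choice
                  collapse (mean) tradeRecievables, by(id yearClosed)
                  tsset id yearClosed
                  * D. is the change from the previous year; missing where the previous year is absent
                  gen double change = D.tradeRecievables
                  * average the yearly changes within each id
                  by id: egen double meanChange = mean(change)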



                  • #10
                    Thanks, Nick.

                    Unfortunately, both aspects are important for our analysis, as we also want to see the evolution of individual companies over time, so collapsing would cost us potentially interesting data. By that comment, I meant that I cannot define a unique time variable, as each id's yearClosed series can be regarded as panel data on its own, so when they are put together the year values are bound to repeat. I also cannot process companies individually, as the dataset is too large.

                    Thanks again for your responses.

