  • Keeping other variables with collapse

    I'm using Stata SE 13. I want to use the collapse command but keep other variables. For example, I have GDP, GDP growth, inflation, trade openness and so on. I'm using collapse to generate mean GDP growth and the first-year Gini coefficient by country and period. But I need to keep the other variables alongside the collapsed ones: inflation and trade openness disappear after the collapse. So, how can I keep these variables when using collapse?

  • #2
    Welcome to Statalist.

    One approach that comes to mind is using the egen command rather than collapse to generate the variables you need within the existing dataset.
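
    As a rough sketch of the egen route, assuming a country-year panel with a period identifier: all variable names and values here (country, period, year, gdp_growth, gini, inflation, openness) are made up for illustration and are not taken from the original poster's data.

    ```stata
    * Hypothetical example data: one row per country-year
    clear
    input str1 country period year gdp_growth gini inflation openness
    "A" 1 2000 2 30 5 60
    "A" 1 2001 4 31 6 62
    "A" 2 2002 1 32 4 64
    "A" 2 2003 3 33 3 66
    "B" 1 2000 5 40 9 80
    "B" 1 2001 7 41 8 82
    end

    * Mean GDP growth within each country-period, copied onto every row
    bysort country period: egen mean_growth = mean(gdp_growth)

    * Gini from the earliest year of each country-period
    bysort country period (year): gen first_gini = gini[1]
    ```

    Because nothing is dropped, inflation and openness stay in the dataset next to the new summary variables.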

    Another approach is using the merge command to add the variables you generate with the collapse back to the original dataset.
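
    A minimal sketch of the collapse-then-merge route, again with hypothetical variable names and made-up values. It assumes the data are sorted by year within each country-period so that (first) picks up the first-year Gini.

    ```stata
    * Hypothetical example data: one row per country-year
    clear
    input str1 country period year gdp_growth gini inflation openness
    "A" 1 2000 2 30 5 60
    "A" 1 2001 4 31 6 62
    "A" 2 2002 1 32 4 64
    "A" 2 2003 3 33 3 66
    "B" 1 2000 5 40 9 80
    "B" 1 2001 7 41 8 82
    end

    * Build the summaries in a temporary file, then return to the full data
    sort country period year
    preserve
    collapse (mean) mean_growth = gdp_growth (first) first_gini = gini, by(country period)
    tempfile summaries
    save `summaries'
    restore

    * Attach the summaries to every original row; all rows should match
    merge m:1 country period using `summaries', assert(match) nogenerate
    ```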

    Without an understanding of your data, it is difficult to give more concrete advice. Perhaps these ideas will aim you in a useful direction.

    If not, you might review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.


    • #3
      Thank you for your advice. I will try that approach.


      • #4
        Hello,

        I know this is a little late, but I found that if you want to retain some of the variables then you can just include them inside the by() option in collapse. For instance,

        Suppose you have a data set of cities located within countries and attributes for each city, such as

        y1 = city name
        y2 = country name
        x1 & x2 = variables you want to aggregate

        Code:
        collapse (sum) x1 x2, by(y2 y1)
        This seemed to work for me. I was able to keep the y2 variable and aggregate to just the y1 city level. I think collapse keeps one observation for each distinct combination of the variables in the by() option.


        • #5
          Well, be careful. That works only if y2 is constant within all observations having a given value of y1. But if there is some y1 that is associated with two different values of y2, then -collapse- will aggregate x1 and x2 only to the level of the y2 y1 pair, not all the way up to the y1 level. That is, you would end up with two different observations for the same y1, each corresponding to a different value of y2, and each containing only the sums of x1 and x2 for that y1 y2 pair, not the sums of x1 and x2 over all observations with that y1.
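
          To make the problem concrete, here is a small made-up example (the city and country values are purely illustrative) in which one y1 is paired with two different y2 values:

          ```stata
          * Hypothetical data: "Paris" appears under two different y2 values
          clear
          input str6 y1 str2 y2 x1 x2
          "Paris" "FR" 1 2
          "Paris" "FR" 3 4
          "Paris" "US" 5 6
          "Lima"  "PE" 7 8
          end

          collapse (sum) x1 x2, by(y2 y1)

          * Paris now occupies two rows, one per y2, each holding a partial sum
          list
          ```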

          You can verify that y2 is, in fact, always constant within y1 by running:

          Code:
          by y1 (y2), sort: assert y2[1] == y2[_N]
          Put that line into your code just before the -collapse- command. If the y2's are constant within y1's, then Stata will move along and do the -collapse-. But if there are exceptions, Stata will tell you that the "assertion is false" and give you a count of the number of exceptions, and then halt without doing the potentially erroneous collapse. (Then, of course, you have to find the exceptions and figure out what to do about them.)
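
          One possible way to track down the exceptions, sketched on made-up data (the variable name mixed is just a placeholder), is to flag the offending y1 groups instead of asserting:

          ```stata
          * Hypothetical data: "Paris" appears under two different y2 values
          clear
          input str6 y1 str2 y2 x1 x2
          "Paris" "FR" 1 2
          "Paris" "FR" 3 4
          "Paris" "US" 5 6
          "Lima"  "PE" 7 8
          end

          * Flag every observation whose y1 group has more than one y2 value
          bysort y1 (y2): gen byte mixed = y2[1] != y2[_N]

          * Inspect the offending groups before deciding how to handle them
          list y1 y2 x1 x2 if mixed, sepby(y1)
          count if mixed
          ```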
