Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variable for subset of data, based on a different subset of data

    Dear Statalisters,

    I have a question regarding the creation of a new variable for a subset of data, which new variable is itself based on another subset of data. Let me clarify with an example of my dataset, visualised using -dataex-:

    My dataset conceptually looks as follows:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(SIC YEAR CompID) byte MedianNetSales
    345 1999 10134 20
    345 1999 10135 20
    345 1999 10136 20
    345 1999 10137 20
    345 1999 10138 20
    345 2000  4123 10
    345 2000  4124 10
    345 2000  4125 10
    345 2000  4126 10
    345 2000  4127 10
    345 2001  4128  5
    345 2001  4129  5
    345 2001  4130  5
    345 2001  4131  5
    345 2001  4132  5
    678 2001 10135 50
    678 2001 10136 50
    678 2001 10137 50
    678 2001 10138 50
    678 2001 10139 50
    678 2002  4133 40
    678 2002  4134 40
    678 2002  4135 40
    678 2002  4136 40
    678 2002  4137 40
    678 2003  4138 60
    678 2003  4139 60
    678 2003  4140 60
    678 2003  4141 60
    678 2003  4142 60
    end
    This data describes a financial variable (MedianNetSales) for a number of companies (CompID) belonging to a specific industry (SIC) over time (YEAR). My goal is to create a new variable, "PreviousMedianNetSales", which for each group/subset defined by industry and year (SIC and YEAR) displays the MedianNetSales for that same industry, but in the previous year (SIC and YEAR - 1). For good measure, the result would look as follows:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(SIC YEAR CompID) byte(MedianNetSales PreviousMedianNetSales)
    345 1999 10134 20  .
    345 1999 10135 20  .
    345 1999 10136 20  .
    345 1999 10137 20  .
    345 1999 10138 20  .
    345 2000  4123 10 20
    345 2000  4124 10 20
    345 2000  4125 10 20
    345 2000  4126 10 20
    345 2000  4127 10 20
    345 2001  4128  5 10
    345 2001  4129  5 10
    345 2001  4130  5 10
    345 2001  4131  5 10
    345 2001  4132  5 10
    678 2001 10135 50  .
    678 2001 10136 50  .
    678 2001 10137 50  .
    678 2001 10138 50  .
    678 2001 10139 50  .
    678 2002  4133 40 50
    678 2002  4134 40 50
    678 2002  4135 40 50
    678 2002  4136 40 50
    678 2002  4137 40 50
    678 2003  4138 60 40
    678 2003  4139 60 40
    678 2003  4140 60 40
    678 2003  4141 60 40
    678 2003  4142 60 40
    end
    To provide some context: my aim is to calculate industry-wide sales growth. Should my intended approach be inefficient to that end, I would naturally appreciate any alternative suggestions.

    Is there anyone on the forum that could advice me on how to achieve this result? It would be much appreciated!

    Best,

    David


  • #2
    Dear Statalisters,

    In the meantime I have found what I think is a possible solution to my problem:

    by running the following command: - gen Previous = MedianNetSales[_n-1] if YEAR[_n] != YEAR[_n-1] & SIC[_n] == SIC[_n-1] -, I seem to obtain the desired results, albeit that this only yields values for the first observations per group/subset YEAR SIC (which, for my purposes, is not an issue).

    Am I correct in assuming that as long as my data is sorted on YEAR SIC, this approach works properly?

    Best,

    David
    Last edited by David van Hoogstraten; 11 Jun 2022, 14:56.

    Comment


    • #3
      Yes, that will work. A somewhat more readable code that does this, and also populates every observation for the subset, is:
      Code:
      rangestat (first) previous = MedianNetSales, by(SIC) interval(YEAR -1 -1)
      -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

      By the way, both your code and mine require that the variable MedianNetSales be constant across all companies within any combination of SIC and YEAR.

      Comment


      • #4
        Dear Clyde,

        Thank you kindly for your swift response and useful suggestion.

        Comment

        Working...
        X