"rangestat" with "excludeself" and "interval(t -1 0)"

Marco Tacchi

Join Date: Feb 2019
Posts: 24

"rangestat" with "excludeself" and "interval(t -1 0)"

06 Mar 2019, 07:01

Dear Statalisters,

I have a question about the results after the rangestat command.

Suppose I input the following dataset:

Code:

input firmid sales year industry
1 10 1999 11
2 5 1999 11
5 19 1999 11
3 7 1999 22
4 2 1999 22
6 4 1999 22
1 8 2000 11
2 6 2000 11
5 12 2000 11
3 9 2000 22
4 3 2000 22
6 3 2000 22
end

I would like to compute average sales by a given industry and a given year excluding the focal firm. In addition, I would like to compute the moving average of this number, say over the last 2 years.

Code:

rangestat (mean) sales_av=sales, by(industry) excludeself interval(year 0 0)
rangestat (mean) sales_ma=sales, by(industry) excludeself interval(year -1 0)
list
     +------------------------------------------------------+
     | firm   sales   year   industry   sales_av   sales_ma |
     |------------------------------------------------------|
  1. |    1      10   1999         11         12         12 |
  2. |    2       5   1999         11       14.5       14.5 |
  3. |    5      19   1999         11        7.5        7.5 |
  4. |    3       7   1999         22          3          3 |
  5. |    4       2   1999         22        5.5        5.5 |
     |------------------------------------------------------|
  6. |    6       4   1999         22        4.5        4.5 |
  7. |    1       8   2000         11          9       10.4 |
  8. |    2       6   2000         11         10       10.8 |
  9. |    5      12   2000         11          7        9.6 |
 10. |    3       9   2000         22          3        3.8 |
     |------------------------------------------------------|
 11. |    4       3   2000         22          6          5 |
 12. |    6       3   2000         22          6          5 |
     +------------------------------------------------------+

It is clear how one gets average sales (sales_av) of 10 for firm 2 in 2000, that is (8+12)/2 where the value of 6 for the focal firm is excluded from the calculation. The moving average sales (sales_ma) for firm 2 in year 2000 is 10.8. Shouldn't it be (10+14.5)/2=12.25 instead of 10.8?

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35726
#2

06 Mar 2019, 07:24

Nothing in your syntax specifies operating by firm. excludeself necessarily means this industry, not this firm.

I am not able to play with your data right now.
Comment
Marco Tacchi

Join Date: Feb 2019

Posts: 24
#3

06 Mar 2019, 07:36

Dear Nick,

Thank you for your reply.

I think it is possible to solve the problem by first using

Code:

rangestat (mean) sales_av=sales, by(industry) excludeself interval(year 0 0)

and then

Code:

xtset firmid year gen sales_av_ma2=(sales_av+l.sales_av)/2
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35726

06 Mar 2019, 09:18

#2 is partly wrong. Sorry about that. Feel free to enjoy the example of a second author who doesn't understand everything about their own program. excludeself means excluding the present observation. And this is documented! (Warning: Don't answer by phone away from your Stata.)

Code:

. rangestat (mean) sales_av=sales, by(industry) excludeself interval(year 0 0)

. rangestat (mean) sales_ma=sales, by(industry) excludeself interval(year -1 0)

.
. list, sepby(year industry)

     +--------------------------------------------------------+
     | firmid   sales   year   industry   sales_av   sales_ma |
     |--------------------------------------------------------|
  1. |      1      10   1999         11         12         12 |
  2. |      2       5   1999         11       14.5       14.5 |
  3. |      5      19   1999         11        7.5        7.5 |
     |--------------------------------------------------------|
  4. |      3       7   1999         22          3          3 |
  5. |      4       2   1999         22        5.5        5.5 |
  6. |      6       4   1999         22        4.5        4.5 |
     |--------------------------------------------------------|
  7. |      1       8   2000         11          9       10.4 |
  8. |      2       6   2000         11         10       10.8 |
  9. |      5      12   2000         11          7        9.6 |
     |--------------------------------------------------------|
 10. |      3       9   2000         22          3        3.8 |
 11. |      4       3   2000         22          6          5 |
 12. |      6       3   2000         22          6          5 |
     +--------------------------------------------------------+


. di (10 + 5 + 19 + 8 + 12) / 5
10.8

As means are easy, what you need to proceed are

totals by industry

totals by firm

counts by industry

counts by firm

mean for other firms in same industry = (total for industry MINUS total for this firm) / (count for industry MINUS count for this firm)

and that is a series of egen exercises.

Last edited by Nick Cox; 06 Mar 2019, 09:24.

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

06 Mar 2019, 11:00

Assuming that there is only one observation per firm in a given industry year by-group, the first rangestat call will return the expected results (mean excluding the focal firm). As Nicks points out, the excludeself option excludes the value of the current observation so rangestat is perhaps not the best tool to calculate the second measure. You can however use its cousin rangerun (from SSC) to get there.

To make it easier to visualize, I sorted the data by industry year firmid. For each observation in the sample, rangerun creates a dataset of observations that fall within the specified by-group and interval. It then runs a user-defined program on this subsample and extracts results from last observation of all new numeric variables. For convenience, rangerun can create scalars with the value of all numeric variables for the current focal observation (using the prefix() option). So for the second measure, this means dropping all observations from the focal firm and then calculating the mean. I've included extra commands to show what's going on. In a full production run, you would probably want to use sum sales, meanonly as that is much faster (and comment the display and list commands from the program).

Code:

clear all
input float(firmid sales year industry)
1 10 1999 11
2  5 1999 11
5 19 1999 11
1  8 2000 11
2  6 2000 11
5 12 2000 11
3  7 1999 22
4  2 1999 22
6  4 1999 22
3  9 2000 22
4  3 2000 22
6  3 2000 22
end

* assuming that there's only one firm observation per -industry year-
isid industry year firmid
rangestat (mean) sales_av=sales, by(industry) excludeself interval(year 0 0)

program mean_exclude

    dis "doing firm " rr_firmid " industry = " rr_industry " year = " rr_year
    drop if firmid == rr_firmid
    list, sep(0)
    sum sales
    gen sales_ma = r(mean)

end
rangerun mean_exclude, by(industry) interval(year -1 0) verbose sprefix(rr_)

list, sepby(year)

and the results from the final list:

Code:

. list, sepby(year)

     +--------------------------------------------------------+
     | firmid   sales   year   industry   sales_av   sales_ma |
     |--------------------------------------------------------|
  1. |      1      10   1999         11         12         12 |
  2. |      2       5   1999         11       14.5       14.5 |
  3. |      5      19   1999         11        7.5        7.5 |
     |--------------------------------------------------------|
  4. |      1       8   2000         11          9       10.5 |
  5. |      2       6   2000         11         10      12.25 |
  6. |      5      12   2000         11          7       7.25 |
     |--------------------------------------------------------|
  7. |      3       7   1999         22          3          3 |
  8. |      4       2   1999         22        5.5        5.5 |
  9. |      6       4   1999         22        4.5        4.5 |
     |--------------------------------------------------------|
 10. |      3       9   2000         22          3          3 |
 11. |      4       3   2000         22          6       5.75 |
 12. |      6       3   2000         22          6       5.25 |
     +--------------------------------------------------------+

.

Comment

Attaullah Shah

Join Date: Aug 2014
Posts: 1669

06 Mar 2019, 12:37

You can also use asrol (from SSC) that offers two solutions for excluding focal firms. The first one is to exclude only the current observation while the second one is to exclude all observation of the relevant variable if there are similar (duplicate) values of the rangevar elsewhere in the given window. An example will better explain the distinction between the two options. Consider the following data of 5 observations, where X is the variable of interest for which we would like to calculate arithmetic mean and year is the rangevar. Our calculations do not use any rolling window, therefore the option window is dropped.

Code:

    Example A:

         asrol X, stat(mean) xf(focal) gen(xfocal)

    Example B:

        asrol X, stat(mean) xf(year) gen(xfyear)
        
          +---------------------------------------+
          | year     X        xfocal       xfyear |
          |---------------------------------------|
          | 2001   100          350           350 |
          | 2002   200          325           325 |
          | 2003   300          300     266.66667 |
          | 2003   400          275     266.66667 |
          | 2004   500          250           250 |
          +---------------------------------------+

Explanation :

In Example A, we invoke the option xf() as xf(focal). asrol generates a new variable xfocal that contains the mean values of the rest of the observations in the given window, excluding the focal observation. Therefore, in the year 2001, xfocal variable has a value of 350, that is the average of the values of X in the years 2002, 2003, 2003, 2004 i.e. (200+300+400+500)/4 = 350. Similarly, the second observation of the xfocal variable is 325, that is (100+300+400+500)/4 = 325. Similar calculations are made when required statistics are estimated in a rolling window.

Example B differs from Example A in definition of the focal observation(s). In Example B, we invoke the option xf() as xf(year), where year is an existing numeric variable. With this option, the focal observation(s) is(are) defined as the current observation and other observations where the focal observation of the rangevar has duplicates. Our data set has two duplicate values in the rangevar, i.e., year 2003. Therefore, the mean values are calculated as shown bellow:

Code:

        +-------------------------------------------------------+       
        |       obs 1: (200 + 300 + 400 + 500)/4 = 350          |
        |       obs 2: (100 + 300 + 400 + 500)/4 = 325          |                       
        |       obs 3: (100 + 200 + 500 )     /3 = 266.66667    |               
        |       obs 4: (100 + 200 + 500 )     /3 = 266.66667    |       
        |       obs 5: (100 + 200 + 300 + 400)/4 = 250          |                       
        +-------------------------------------------------------+

Solving the problem at hand

Code:

  input firmid sales year industry
1 10 1999 11
2 5 1999 11
5 19 1999 11
3 7 1999 22
4 2 1999 22
6 4 1999 22
1 8 2000 11
2 6 2000 11
5 12 2000 11
3 9 2000 22
4 3 2000 22
6 3 2000 22
end
bys industry year: asrol sales, st(mean) xf(focal)

bys firmid: asrol mean_sales, st(mean) window(year 2) gen(mean2_sales)

list firmid sales year industry mean_sales mean2_sales

     +--------------------------------------------------------+
     | firmid   sales   year   industry   mean_s~s   mean2_~s |
     |--------------------------------------------------------|
  1. |      1      10   1999         11         12         12 |
  2. |      1       8   2000         11          9       10.5 |
  3. |      2       5   1999         11       14.5       14.5 |
  4. |      2       6   2000         11         10      12.25 |
  5. |      3       7   1999         22          3          3 |
     |--------------------------------------------------------|
  6. |      3       9   2000         22          3          3 |
  7. |      4       2   1999         22        5.5        5.5 |
  8. |      4       3   2000         22          6       5.75 |
  9. |      5      19   1999         11        7.5        7.5 |
 10. |      5      12   2000         11          7       7.25 |
     |--------------------------------------------------------|
 11. |      6       4   1999         22        4.5        4.5 |
 12. |      6       3   2000         22          6       5.25 |
     +--------------------------------------------------------+

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

Comment

Marco Tacchi

Join Date: Feb 2019

Posts: 24
#7

06 Mar 2019, 13:42

Dear all,

Thank you for your detailed answers. They helped me to solve the problem.
Comment
Monowar Maroon

Join Date: Apr 2021

Posts: 10
#8

08 Sep 2021, 18:04

Dear All,
I hope you are staying well and keeping safe. I am new to variable creating codes and would be very grateful if you could advise on the following issue that I have been struggling with for the last few days.
I have a panel dataset and need to calculate an innovation score for each firm within the same industry using the following formula:

Innovation score = (number of firms with a worse value + (number of firms with the same value as the current firm's value/2))/ total number of firms. For example, let say there 10 firms in one industry and the focal firm's value is 80. There is another firm that has the same value. All other firms have scores lower than 80. So the score will be = (8 + (2/2))/10 = 0.90.

I need to calculate for all firms within other industries.

Many thanks in advance and best wishes.
Comment

Announcement

"rangestat" with "excludeself" and "interval(t -1 0)"

Comment

Comment

Comment

Comment

Comment

Comment

Comment