
  • Generating dummy variable based on lagged quartile data

    Hi everyone,

    I am trying to generate a dummy variable that is conditional on whether the variable of interest (ma_score) is in the top quartile in both years t-2 and t-1. Specifically, the paper's instructions are as follows, "To identify high-ability managers, we first form quartiles (by industry and year) of the MA-Score. We define High-Ability Managers as those in the top quartile of MA-Score in both years t-2 and t-1. This approach reduces the likelihood that idiosyncratic performance in a single year affects our identification of high-ability managers. Note that we do not expect managerial ability to change in the short run. Rather, we consider the scores across 2 years to reduce possible measurement error".

    Can anyone help me with these instructions? I don't understand how to make the variable based on quartiles (by industry and year) as well.

    Sample data:
    clear
    input double year long gvkey float sic_2 double MA_SCORE_2018_w
    1984 1001 58 .1674201
    1985 1001 58 .0530939
    1983 1003 57 .048832
    1984 1003 57 .0081078
    1986 1003 57 .0695462
    1987 1003 57 .1106393
    1988 1003 57 .0730525
    1989 1003 57 .0283304
    1980 1004 50 -.0183764
    1981 1004 50 -.0333748
    1982 1004 50 -.0341477
    1983 1004 50 -.0444578
    1984 1004 50 -.0505183
    1985 1004 50 -.0110314
    1986 1004 50 -.0288378
    1987 1004 50 -.0385843
    1988 1004 50 -.0431447
    1989 1004 50 -.0293015
    1990 1004 50 -.0577608
    1991 1004 50 -.0493341
    1992 1004 50 -.0543336
    1993 1004 50 -.0513123
    1994 1004 50 -.0827768
    1995 1004 50 -.0793594
    1996 1004 50 -.0692754
    1997 1004 50 .0072931
    1998 1004 50 -.0387679
    1999 1004 50 -.0730754
    2000 1004 50 -.0696314
    2001 1004 50 -.020128
    2002 1004 50 -.0847194
    2003 1004 50 -.0698555
    2004 1004 50 -.0677616
    2005 1004 50 -.0726101
    2006 1004 50 -.0773012
    2007 1004 50 -.0549248
    2008 1004 50 -.089513
    2009 1004 50 -.0799284
    2010 1004 50 -.0529496
    2011 1004 50 -.0370836
    end

  • #2
    For any given industry-year combination, your example data contains either no observations or just one, so there are no quartiles to be formed by industry and year. I'll assume that your actual data set is much larger and does not suffer this limitation.
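
    One way to check this on the full data is to look at how many non-missing scores each industry-year cell holds. This is only a sketch: it assumes the variable names from the example data, and n_in_cell and cell_tag are names made up for illustration.

    ```stata
    * Count non-missing scores within each industry-year cell
    bysort sic_2 year: egen n_in_cell = count(MA_SCORE_2018_w)

    * Tag one observation per cell so that each cell is summarized once
    egen byte cell_tag = tag(sic_2 year)

    * Distribution of cell sizes across industry-year cells
    summarize n_in_cell if cell_tag, detail
    ```

    Cells with only a handful of scores cannot yield meaningful quartiles, so the low end of that distribution is the part to look at.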

    Also, your example data contains no variable identifying managers. For the purposes of illustrating the code here, I will pretend that gvkey identifies managers. But you will need to replace gvkey by the actual manager identifier variable to make this work for you.

    Code:
    by sic_2 year, sort: egen ma_quartile = xtile(MA_SCORE_2018_w), nq(4) // CREATE QUARTILES
    
    xtset gvkey year // REPLACE gvkey WITH MANAGER ID VARIABLE
    gen byte high_MA = (L1.ma_quartile == 4) & (L2.ma_quartile == 4) // TOP QUARTILE IN BOTH t-1 AND t-2
    Notes:

    1. Official Stata does not have an -egen, xtile()- function. It is, rather, to be found in the -egenmore- package, available from SSC. It is easier to go this route than to develop loops around the official -xtile- command.

    2. I understand you are just following instructions here. But if the concern is dealing with measurement error, this approach seems rather poorly designed for it. A much better way to reduce measurement error would be to use the average of the two lagging MA_SCORE_2018_w values, rather than making a dichotomous variable that discards information and adds noise.
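
    The averaging alternative in note 2 could be sketched as follows. It assumes the same -xtset- declaration as in the code above, and ma_lag_avg is a made-up variable name.

    ```stata
    * Declare the panel structure (replace gvkey with the manager ID variable)
    xtset gvkey year

    * Mean of the two lagged scores; the result is missing whenever
    * either lag is missing, including when there is a gap in the years
    gen double ma_lag_avg = (L1.MA_SCORE_2018_w + L2.MA_SCORE_2018_w) / 2
    ```

    Because the L. operators respect the -xtset- time variable, a firm with a gap in its years gets a missing value rather than an average taken from the wrong years.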



    • #3
      Hi Clyde Schechter, yes, the full dataset has adequate observations to form quartiles by industry and year. Also, gvkey identifies the firms, which in turn identifies the managers here. I tried running the quartile creation code, but it has been sitting for ages without showing any output. What could be the reason for that?



      • #4
        I tried running the quartile creation code, but it has been sitting for ages without showing any output. What could be the reason for that?
        I have not encountered that difficulty with -egen, xtile()-. If your data set is very large, this can take a long time as it requires sorting. I don't know what you mean by "ages" and I don't know how many observations your data set has, so I can't assess whether there is really a problem or you just need more patience.

        Here's another way to calculate the quartiles which should be faster. It also has the advantage that it will provide periodic updates on its progress in the Results window, so you will be able to know if something has gone wrong. I should also add that even this faster way can be sped up dramatically if you -drop- from your data set any variables that are not needed for further calculation.

        Code:
        capture program drop one_group
        program define one_group
            xtile ma_quartile = MA_SCORE_2018_w, nq(4)
            exit
        end
        
        runby one_group, by(sic_2 year) status
        -runby- is written by Robert Picard and me, and is available from SSC.

        Note: If there are any combinations of sic_2 and year that do not have enough observations to generate quartiles, these will be identified as "by-groups with errors" in the final output on the Results screen, and those observations also will not appear in the final data.
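
        To see how many observations are lost that way, the call can be wrapped in a before-and-after count. This is only a sketch to be run in place of the bare -runby- call, not in addition to it. It assumes the one_group program defined above and the variable names from the example data; n_before is a made-up local macro name. The -keep- line applies the earlier advice about dropping unneeded variables, so save the full data set first.

        ```stata
        * Keep only what the quartile step and a later merge need
        keep gvkey year sic_2 MA_SCORE_2018_w

        * Number of observations before -runby-
        count
        local n_before = r(N)

        * Quartiles within each industry-year cell, with progress reports
        runby one_group, by(sic_2 year) status

        * Number of observations left, and how many were lost
        count
        display "observations lost in by-groups with errors: " `n_before' - r(N)
        ```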



        • #5
          Thanks Clyde Schechter. It was an issue with the size of the dataset; the code finished after running for around 10 minutes. Thanks again.
