  • Having trouble creating a specific sample

    Hello everyone,

    I am using panel data from Compustat and I need to create a sample of high-growth firms (keep only the gvkey of HGFs). The first definition of an HGF which I need to try is at least a 20% increase in sales for 3 consecutive years and a minimum of 10 employees in the base year. The whole sample period is 7 years (2013-2019) and the 3-year high-growth period can be anywhere within that. After xtset gvkey fyear I was given the following code by a professor of mine (I can't ask him for clarifications as he is currently away):
    1. gen sale1 = L.sale
    2. gen growth_sales = (sale - sale1) / sale1
    3. gen suffgrowth = 1 if growth_sales >= 0.2
    4. bysort gvkey: gen baseyear_emp_temp = emp if fyear == (whatever the base year is)
    5. bysort gvkey: gen baseyear_emp = max(baseyear_emp_temp)
    6. gen hgf = 1 if suffgrowth == 1 & L.suffgrowth == 1 & L2.suffgrowth == 1 & baseyear_emp >= 10
    7. gen gvkey_hgf = gvkey if hgf == 1
    8. keep gvkey_hgf
    9. drop if gvkey_hgf != .
    I am not sure how "whatever the base year is" should be defined, since it should refer to the base year of the 3-year period rather than of the whole sample period. Even if I try 2013, I get an "invalid syntax" error on line 6.

    Also, I am interested to know whether there is any other way of creating a sample with these conditions.

    Any help would be appreciated!
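
One thing that may be worth checking in step 3 of the listing above: Stata treats a missing value as larger than any number, so `growth_sales >= 0.2` is also true where `growth_sales` is missing (for example in each firm's first year, which has no lagged sale). The sketch below uses a made-up three-row panel to show the effect and one possible guard; the variable name `suffgrowth_raw` is only illustrative.

```stata
* Toy panel (made-up numbers) to show how a missing growth rate is treated
clear
input long gvkey int fyear double sale
1 2013 100
1 2014 130
1 2015 120
end
xtset gvkey fyear

* Growth is missing in 2013 because there is no lagged sale
gen double growth_sales = (sale - L.sale) / L.sale

* As in step 3: missing compares as larger than any number,
* so 2013 is flagged even though no growth rate exists for it
gen suffgrowth_raw = 1 if growth_sales >= 0.2

* Guarded version: a 0/1 indicator that is 0 when growth is missing
gen byte suffgrowth = growth_sales >= 0.2 & !missing(growth_sales)

list gvkey fyear growth_sales suffgrowth_raw suffgrowth, sepby(gvkey)
```

With the guarded indicator, a test such as `L.suffgrowth == 1` can no longer be satisfied by a year in which growth could not be computed.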


  • #2
    Aleksandar:
    welcome to this forum.
    1) -emp- variable is missing;
    2) as you surmise, "whatever the base year is" should be made numerical.
    More details about your dataset (please see the FAQ) would help interested listers help you out. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo,

      Thank you for your reply!

      All variables that should go into this are present in the dataset at the outset:
      • gvkey - Global Company Key
      • sale - Sales/Turnover
      • fyear - Fiscal year
      • emp - Number of employees
      There was an empty line that seems to have been deleted, so the error I mentioned when using 2013 for the base year actually occurs in line 5. Again, I don't think it should be 2013, this is just something I tried.

      Have you got any ideas about how to complete the task?

      Many thanks,
      Aleksandar
      Last edited by Aleksandar Filimonov; 25 May 2022, 07:18.



      • #4
        Aleksandar:
        try this (suboptimal, I know) line of code and see if it works:
        Code:
        bysort gvkey: gen baseyear_emp_temp = emp if fyear ==2013
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi Carlo,

          As already mentioned, I did try this exact line, and I get an "invalid syntax" error message on the next line. Any idea why that may be?



          • #6
            Aleksandar:
            what you complained about was clear, but in the following toy example (that you can easily replicate) the code works:
            Code:
            . use "https://www.stata-press.com/data/r17/nlswork.dta"
            (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
            
            . bysort idcode: gen baseyear_emp_temp=grade if year==70
            
            . tab baseyear_emp_temp
            
            baseyear_em |
                 p_temp |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |          3        0.18        0.18
                      1 |          1        0.06        0.24
                      4 |          3        0.18        0.42
                      5 |          4        0.24        0.65
                      6 |         11        0.65        1.30
                      7 |         17        1.01        2.31
                      8 |         48        2.85        5.16
                      9 |         66        3.91        9.07
                     10 |        122        7.24       16.31
                     11 |        117        6.94       23.25
                     12 |        988       58.60       81.85
                     13 |         81        4.80       86.65
                     14 |         71        4.21       90.87
                     15 |         46        2.73       93.59
                     16 |         91        5.40       98.99
                     17 |         10        0.59       99.58
                     18 |          7        0.42      100.00
            ------------+-----------------------------------
                  Total |      1,686      100.00
            
            .
            .
            Therefore you have to provide more details and/or delve into the format of your variables: is there any -string- format variable?
            Please help interested listers help you out. Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hi Carlo,

              None of the variables are strings: gvkey - long; fyear - double; emp - double; sale - double; sale1 - float.

              This line itself works with my dataset as well. It is the next line where I get an error:
              Code:
              bysort gvkey: gen baseyear_emp = max(baseyear_emp_temp)
              Many thanks,
              Aleksandar



              • #8
                Aleksandar:
                try:
                Code:
                bysort gvkey: egen baseyear_emp = max(baseyear_emp_temp)
                Kind regards,
                Carlo
                (Stata 19.0)
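
For readers wondering why the switch from -gen- to -egen- matters: with -generate-, `max()` is a function that compares its arguments within a single observation, whereas -egen-'s `max()` takes the maximum over all observations in each by-group and copies it to every row of that group. A minimal sketch on made-up data follows; the variable `rowmax` and the numbers are illustrative only, and 2013 is used as the base year purely as an example.

```stata
* Toy data (made-up numbers): two firms, employment observed by year
clear
input long gvkey int fyear double emp
1 2013 12
1 2014 15
2 2013 8
2 2014 9
end

* Base-year employment, nonmissing only in the base year (2013 here)
gen baseyear_emp_temp = emp if fyear == 2013

* generate's max() works across its arguments, row by row
gen rowmax = max(emp, 10)

* egen's max() works down the observations of each firm, ignoring
* missing values, and spreads the result to all rows of that firm
bysort gvkey: egen baseyear_emp = max(baseyear_emp_temp)

list, sepby(gvkey)
```

Because `baseyear_emp_temp` is nonmissing in only one year per firm, the group maximum is simply that year's value.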



                • #9
                  Hi Carlo,

                  Thanks, that did resolve the error! However, I am still not sure that writing 2013 as the base year gives me the data under the conditions that I need. Do you have any suggestion how to make the 'base year' any year within the period? I tried the following:
                  Code:
                  bysort gvkey: gen baseyear_emp_temp = emp if fyear == 2013 | 2014 | 2015 | 2016 | 2017
                  But, again, I am not sure whether this is appropriate as there are some firms for which I only get 1 observation.

                  Thanks,
                  Aleksandar
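
A note on the condition tried in this post: in Stata, `fyear == 2013 | 2014 | 2015 | 2016 | 2017` is parsed as `(fyear == 2013) | 2014 | ...`, and any nonzero number counts as true, so the expression is true for every observation. If the aim is to test membership in a set of years, `inlist()` or `inrange()` are the usual tools. The sketch below uses a made-up list of years, and the variable names are illustrative.

```stata
* Toy data: a few fiscal years, some inside and some outside 2013-2017
clear
input int fyear
2013
2016
2018
2019
end

* Evaluates as (fyear == 2013) | 2014 | ..., which is 1 in every row
gen byte always_true = fyear == 2013 | 2014 | 2015 | 2016 | 2017

* Either of these tests membership in 2013-2017
gen byte in_list  = inlist(fyear, 2013, 2014, 2015, 2016, 2017)
gen byte in_range = inrange(fyear, 2013, 2017)

list
```

Applied to the line above, this means `baseyear_emp_temp` is set to `emp` in every year, so the subsequent -egen- maximum is the firm's highest employment over the whole panel rather than employment in a base year.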



                  • #10
                    Aleksandar:
                    you may want to try:
                    Code:
                    bysort gvkey: gen baseyear_emp_temp = emp if fyear <=2017
                    You can replace 2017 with 2019 if the latter is the last year in your dataset.

                    Kind regards,
                    Carlo
                    (Stata 19.0)
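
On the original question of letting the base year move with the 3-year window, one possible approach is to use time-series operators instead of a fixed year. The sketch below rests on assumptions that are not confirmed in the thread: the growth years are t-2, t-1 and t, the base year is taken to be t-3 (the year the first growth rate is measured from), and the data have been -xtset-. The numbers and the names `grew3` and `ever_hgf` are made up for illustration. Note also that step 9 of the original listing (`drop if gvkey_hgf != .`) would appear to drop the high-growth rows rather than keep them.

```stata
* Toy panel (made-up numbers): firm 1 grows by at least 20% in three
* consecutive years from a base of 12 employees; firm 2 does not
clear
input long gvkey int fyear double sale double emp
1 2013 100 12
1 2014 125 14
1 2015 160 15
1 2016 200 18
1 2017 205 18
2 2013 100 50
2 2014 130 52
2 2015 135 52
2 2016 170 55
2 2017 210 60
end
xtset gvkey fyear

gen double growth_sales = (sale - L.sale) / L.sale
gen byte suffgrowth = growth_sales >= 0.2 & !missing(growth_sales)

* Three consecutive growth years ending in the current year t
gen byte grew3 = suffgrowth == 1 & L.suffgrowth == 1 & L2.suffgrowth == 1

* Base year assumed to be t-3, so base-year employment is L3.emp;
* a missing L3.emp fails the test
gen byte hgf = grew3 == 1 & L3.emp >= 10 & !missing(L3.emp)

* Flag every observation of a firm that qualifies in at least one window
bysort gvkey: egen byte ever_hgf = max(hgf)
keep if ever_hgf == 1

* One row per high-growth firm
keep gvkey
duplicates drop
list
```

In this toy panel only firm 1 qualifies (window 2014-2016, base year 2013). If the base year should instead be the first growth year, replacing `L3.emp` with `L2.emp` would be the corresponding change. With data for 2013-2019, growth rates start in 2014, so under these assumptions the qualifying windows end in 2016 through 2019.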
