  • Fama and French industry classifications

    Dear Stata users,

    I have been struggling to convert my SIC codes into the 12 industries classified by Fama and French. I have encoded my SIC codes into a numeric variable:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long sic str4 SIC
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    287 "5080"
    189 "3661"
    189 "3661"
    189 "3661"
    189 "3661"
    189 "3661"
    226 "3844"
    226 "3844"
    226 "3844"
    89 "2834"
    89 "2834"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    248 "4512"
    264 "4922"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    165 "3564"
    219 "3825"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    373 "6799"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    192 "3670"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    263 "4911"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    385 "7359"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    89 "2834"
    70 "2621"
    end
    label values sic sic_code
    label def sic_code 70 "2621", modify
    label def sic_code 89 "2834", modify
    label def sic_code 165 "3564", modify
    label def sic_code 189 "3661", modify
    label def sic_code 192 "3670", modify
    label def sic_code 219 "3825", modify
    label def sic_code 226 "3844", modify
    label def sic_code 248 "4512", modify
    label def sic_code 263 "4911", modify
    label def sic_code 264 "4922", modify
    label def sic_code 287 "5080", modify
    label def sic_code 373 "6799", modify
    label def sic_code 385 "7359", modify


    However, when I type in the following code
    Code:
    gen ff_12= 1 if (sic>=100 & sic<=999)|(sic>=2000 & sic<=2399)|(sic>=2700 & sic<=2749)|(sic>=2770 & sic<=2779)|(sic>=3100 & sic<=3199)|(sic>=3940 & sic<=3989)
    replace ff_12 = 2 if (sic>=2500 & sic<=2519)|(sic>=2590 & sic<=2599)|(sic>=3630 & sic<=3659)|(sic==3710) | (sic==3711) | (sic==3714) | (sic==3716) | (sic==3750) | (sic==3751) | (sic==3792)|(sic>=3900 & sic<=3939)|(sic>=3990 & sic<=3999)
    replace ff_12= 3 if (sic>=2520 & sic<=2589)|(sic>=2600 & sic<=2699)|(sic>=2750 & sic<=2769)|(sic>=3000 & sic<=3099)|(sic>=3200 & sic<=3569)|(sic>=3580 & sic<=3629)|(sic>=3700 & sic<=3709)| (sic==3712) | (sic==3713) | (sic==3715)|(sic>=3717 & sic<=3749)|(sic>=3752 & sic<=3791)|(sic>=3793 & sic<=3799)|(sic>=3830 & sic<=3839)|(sic>=3860 & sic<=3899)
    replace ff_12= 4 if (sic>=1200 & sic<=1399)|(sic>=2900 & sic<=2999)
    replace ff_12= 5 if (sic>=2800 & sic<=2829)|(sic>=2840 & sic<=2899)
    replace ff_12= 6 if (sic>=3570 & sic<=3579)|(sic>=3660 & sic<=3692)|(sic>=3694 & sic<=3699)|(sic>=3810 & sic<=3829)|(sic>=7370 & sic<=7379)
    replace ff_12= 7 if (sic>=4800 & sic<=4899)
    replace ff_12= 8 if (sic>=4900 & sic<=4949)
    replace ff_12= 9 if (sic>=5000 & sic<=5999)|(sic>=7200 & sic<=7299)|(sic>=7600 & sic<=7699)
    replace ff_12= 10 if (sic>=2830 & sic<=2839)|(sic==3693)|(sic>=3840 & sic<=3859)|(sic>=8000 & sic<=8099)
    replace ff_12= 11 if (sic>=6000 & sic<=6999)
    replace ff_12= 12 if ff_12==.
    Stata reports that 0 real changes were made for industry codes 2 through 11, so only the 1st and the last industry classes are assigned.

    Afterwards, I tried to make sense of the file I attached to this post, but I found it very difficult to navigate, and in the end none of my SIC codes were translated into the industry classes. I tried to download the help file, but it does not open after I download it; a message says "The Help for this program was created in Windows Help format, which depends on a feature that isn't included in Windows 8.1 or Windows RT 8.1".

    I have been really struggling with this and would really appreciate it if someone could help me with this pressing issue. Thanks in advance, Alli (Mr)

  • #2
    Below I include the code you provide in a more readable format.

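    We can see in your sample data that nearly all of the values of the variable "sic" lie between 100 and 999 (the only exceptions, 70 and 89, are below 100), so this code assigns those observations to industry class 1 and lets the rest fall through to class 12.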
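
    One way to check the stored values, as a sketch that assumes the example data from the first post is in memory: the nolabel option makes tabulate show the integers actually held in the variable rather than the value labels that look like SIC codes.

    ```stata
    * Show the stored integer codes instead of their value labels
    tabulate sic, nolabel
    * Minimum and maximum of the stored values
    summarize sic
    ```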

    I think your problem is that you need to recode the 4-digit code given by the string variable SIC rather than the 2- or 3-digit encoded value currently in the variable sic. Here is some code that does that, without having to retype any of your 12 commands.
    Code:
    rename sic sic3
    destring SIC, generate(sic)
    gen     ff_12= 1 if (sic>=100 & sic<=999)|(sic>=2000 & sic<=2399)|(sic>=2700 & sic<=2749)|(sic>=2770 & sic<=2779)|(sic>=3100 & sic<=3199)|(sic>=3940 & sic<=3989)
    replace ff_12= 2 if (sic>=2500 & sic<=2519)|(sic>=2590 & sic<=2599)|(sic>=3630 & sic<=3659)|(sic==3710) | (sic==3711) | (sic==3714) | (sic==3716) | (sic==3750) | (sic==3751) | (sic==3792)|(sic>=3900 & sic<=3939)|(sic>=3990 & sic<=3999)
    replace ff_12= 3 if (sic>=2520 & sic<=2589)|(sic>=2600 & sic<=2699)|(sic>=2750 & sic<=2769)|(sic>=3000 & sic<=3099)|(sic>=3200 & sic<=3569)|(sic>=3580 & sic<=3629)|(sic>=3700 & sic<=3709)| (sic==3712) | (sic==3713) | (sic==3715)|(sic>=3717 & sic<=3749)|(sic>=3752 & sic<=3791)|(sic>=3793 & sic<=3799)|(sic>=3830 & sic<=3839)|(sic>=3860 & sic<=3899)
    replace ff_12= 4 if (sic>=1200 & sic<=1399)|(sic>=2900 & sic<=2999)
    replace ff_12= 5 if (sic>=2800 & sic<=2829)|(sic>=2840 & sic<=2899)
    replace ff_12= 6 if (sic>=3570 & sic<=3579)|(sic>=3660 & sic<=3692)|(sic>=3694 & sic<=3699)|(sic>=3810 & sic<=3829)|(sic>=7370 & sic<=7379)
    replace ff_12= 7 if (sic>=4800 & sic<=4899)
    replace ff_12= 8 if (sic>=4900 & sic<=4949)
    replace ff_12= 9 if (sic>=5000 & sic<=5999)|(sic>=7200 & sic<=7299)|(sic>=7600 & sic<=7699)
    replace ff_12= 10 if (sic>=2830 & sic<=2839)|(sic==3693)|(sic>=3840 & sic<=3859)|(sic>=8000 & sic<=8099)
    replace ff_12= 11 if (sic>=6000 & sic<=6999)
    replace ff_12= 12 if ff_12==.
    tab ff_12
    Code:
    . tab ff_12
    
          ff_12 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |         12       12.00       12.00
              6 |         17       17.00       29.00
              8 |         12       12.00       41.00
              9 |         11       11.00       52.00
             10 |         16       16.00       68.00
             11 |         11       11.00       79.00
             12 |         21       21.00      100.00
    ------------+-----------------------------------
          Total |        100      100.00
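
    As an aside, the range and equality tests in the commands above can be written more compactly with Stata's inrange() and inlist() functions. The sketch below is only an illustration: it uses a small made-up dataset and rewrites just the conditions for classes 2 and 10. The /// line continuations work in a do-file but not in the Command window.

    ```stata
    clear
    * Made-up SIC values, for illustration only
    input int sic
    2510
    3711
    3693
    2834
    4512
    end

    generate ff_12 = .
    * Class 2: same conditions as above, using inrange() and inlist()
    replace ff_12 = 2 if inrange(sic, 2500, 2519) | inrange(sic, 2590, 2599) | ///
        inrange(sic, 3630, 3659) | ///
        inlist(sic, 3710, 3711, 3714, 3716, 3750, 3751, 3792) | ///
        inrange(sic, 3900, 3939) | inrange(sic, 3990, 3999)
    * Class 10: same conditions as above
    replace ff_12 = 10 if inrange(sic, 2830, 2839) | sic == 3693 | ///
        inrange(sic, 3840, 3859) | inrange(sic, 8000, 8099)
    * 4512 matches neither condition and stays missing here
    list
    ```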
    On further reflection, it appears to me that you started with the string variable SIC and used the encode command to convert the values to the numeric variable sic. That was a mistake. The output of help encode tells us

    Do not use encode if varname contains numbers that merely happen to be stored as strings
    and directs us to the destring command that I used.
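
    The difference is easy to see on a toy example. This is a minimal sketch; the variable names sic_encoded and sic_numeric are made up for the illustration. encode numbers the distinct strings 1, 2, 3, ... in sorted order and attaches the original text as value labels, while destring converts the digits themselves.

    ```stata
    clear
    * Three SIC codes stored as strings
    input str4 SIC
    "5080"
    "2834"
    "3661"
    end

    * encode: arbitrary integers (1, 2, 3 in sorted order) with value labels
    encode SIC, generate(sic_encoded)
    * destring: the actual numbers 5080, 2834, 3661
    destring SIC, generate(sic_numeric)

    * nolabel shows the stored integers behind the value labels
    list SIC sic_encoded sic_numeric, nolabel
    ```

    With many distinct codes in the full dataset, the encoded integers run into the hundreds, which is why values such as 287 and 373 appear in the example data.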
    Last edited by William Lisowski; 10 Jan 2019, 07:13.



    • #3
      Thank you very much, William. It worked for me.

      Kind regards

      Alli (Mr)



      • #4
        You may also find the sicff package useful. For example:
        Code:
        ssc install sicff
        sicff sic, ind(12)
        This will create a new variable: ff_12
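
        If a hand-coded ff_12 from the earlier post already exists, the new variable name would clash with it. A hedged sketch of one way to keep both and compare them, assuming sic is the numeric 4-digit variable created with destring above (ff_12_manual is a made-up name):

        ```stata
        * Keep the hand-coded classification under a different name
        rename ff_12 ff_12_manual
        * sicff is a user-written command distributed through SSC
        ssc install sicff, replace
        sicff sic, ind(12)
        * Cross-tabulate the two classifications to spot any disagreements
        tabulate ff_12_manual ff_12, missing
        ```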
