Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Categorical variables to ordinal for dummy variables

    Hi all,
    I am a first time user of Stata currently doing my thesis which is exciting but at the same time challenging. I am excited to post here on this forum as I have read some great posts. Currently, I still clean up my data and convert variables into their correct form. My problem now is that I have 11 different owner types and 30 different industries that are to be grouped up into ordinal value. With these, I then want to be able to assign dummy variables as in the literature, for example, dummy = 1 if the industry is high tech, or dummy = 1 if owner type is a bank, in order to control for industry type and to test the effect of owner type respectively. I did the first step of encoding value labels from these string variables, and now I need to group these values, but I was not able to do this with recode. I hope my question makes sense.

    Thanks in advance.

    Regards

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str28 firm str15 country str38 industry double(rd sales rdintsales employees assets con) int age str15 type double innoscore float(year id size) long(indu ownertype)
    "ABENGOA"              "Spain"           "General Industrials"                    107.42      7356.47  .014602112154334889  24748 25246.594999999998 .5018  72 "Public"          49.27 2013  1 10.136447 13 10
    "ABENGOA"              "Spain"           "General Industrials"                    99.734     7150.567  .012864169504351118  24322          16627.199 .5131  73 "Public"          49.07 2014  1  9.718795 13 10
    "ABENGOA"              "Spain"           "General Industrials"                   133.654     5755.482  .023222034227541672  21923           9913.954 .4471  74 "Public"          49.19 2015  1  9.201698 13 10
    "ABENGOA"              "Spain"           "General Industrials"                   133.654     5755.482  .004438198091574821  20000               7243 .4471  75 "Public"          48.81 2016  1  8.887791 13 10
    "ACCENTURE"            "Ireland"         "Support Services"                   518.522238 22039.218163  .023527251927226177 275000 13596.033615316348 .0637  24 "Venture capital" 56.67 2013  2  9.517533 28 11
    "ACCENTURE"            "Ireland"         "Support Services"                   526.738065 26253.737144   .12412959381044487 305000 16287.168696441675  .068  25 "Venture capital" 59.13 2014  2  9.698133 28 11
    "ACCENTURE"            "Ireland"         "Support Services"                   574.576322 30232.788391  .019005072061796514 358000   18513.2994424012  .068  26 "Venture capital" 59.03 2015  2  9.826244 28 11
    "ACCENTURE"            "Ireland"         "Support Services"                   610.385141 33011.725445  .035528322321284736  44868  19188.06757590204  .068  27 "Venture capital" 58.13 2016  2  9.862044 28 11
    "ACTIA"                "France"          "Electronic & Electrical Equipment"       45.36      303.655   .14938005302069785   2680            304.505 .4891  27 "Corporate"       52.18 2013  3  5.718688  8  2
    "ACTIA"                "France"          "Electronic & Electrical Equipment"      22.055      339.893    .0648880677154281   2762                338 .4891  28 "Corporate"       53.59 2014  3  5.823046  8  2
    "ACTIA"                "France"          "Electronic & Electrical Equipment"      28.082      381.208    .0736658202346226   3067                363 .4891  29 "Corporate"       54.04 2015  3  5.894403  8  2
    "ACTIA"                "France"          "Electronic & Electrical Equipment"      34.416      521.648    .6454601605929586  16908                415 .4891  30 "Corporate"       54.18 2016  3  6.028278  8  2
    "ADIDAS"               "Germany"         "Personal Goods"                            128        14492   .00883245928788297  50728              12417  .075  64 "Bank"            56.02 2013  4  9.426822 25  1
    "ADIDAS"               "Germany"         "Personal Goods"                     103.780527        14534  .010369410264100569  53731              13343  .075  65 "Bank"            57.05 2014  4  9.498747 25  1
    "ADIDAS"               "Germany"         "Personal Goods"                            139        16915  .008217558380135975  55555              15176  .075  66 "Bank"            57.94 2015  4  9.627471 25  1
    "ADIDAS"               "Germany"         "Personal Goods"                            164        19291   .07755560817706691   4500              14522  .085  67 "Bank"            58.39 2016  4   9.58342 25  1
    "AEROPORTS DE PARIS"   "France"          "Industrial Transportation"              34.594     2754.457   .01255928119407927   9026               9792  .506  68 "Government"      52.18 2013  5  9.189321 19  6
    "AEROPORTS DE PARIS"   "France"          "Industrial Transportation"                  38         2791    .0136151916875672   8966              10448  .506  69 "Government"      53.59 2014  5  9.254166 19  6
    "AEROPORTS DE PARIS"   "France"          "Industrial Transportation"                  41         2916                 .016   8996              10592  .506  70 "Government"      54.04 2015  5  9.267855 19  6
    "AEROPORTS DE PARIS"   "France"          "Industrial Transportation"                  49         2947   .20851641770365464   6271              14276  .506  71 "Government"      54.18 2016  5  9.566335 19  6
    "AGFA-GEVAERT"         "Belgium"         "Electronic & Electrical Equipment"         146         2865   .05095986038394416  11047               2548 .7761  49 "Corporate"       51.69 2013  6  7.843064  8  2
    "AGFA-GEVAERT"         "Belgium"         "Electronic & Electrical Equipment"         146         2620                 .056  10506               2402 .7761  50 "Corporate"       50.91 2014  6  7.784057  8  2
    "AGFA-GEVAERT"         "Belgium"         "Electronic & Electrical Equipment"         144         2646   .05442176870748299  10241               2352    .5  51 "Corporate"       51.97 2015  6  7.763021  8  2
    "AGFA-GEVAERT"         "Belgium"         "Electronic & Electrical Equipment"         141         2537  .003452206213971185 156225               2233   .55  52 "Corporate"       49.85 2016  6  7.711101  8  2
    "AIRBUS"               "The Netherlands" "Aerospace & Defence"                      3581        59256  .060432698798433916 144061              96102   .11  13 "Corporate"       60.59 2013  7 11.473166  1  2
    "AIRBUS"               "The Netherlands" "Aerospace & Defence"                      3616        60713   .13137813211845104 138622             105782 .1095  14 "Corporate"       61.58 2014  7 11.569136  1  2
    "AIRBUS"               "The Netherlands" "Aerospace & Defence"                      3614        64450    .0560744763382467 136574             111133   .11  15 "Corporate"       58.29 2015  7 11.618483  1  2
    "AIRBUS"               "The Netherlands" "Aerospace & Defence"                      3281        66581   .04927832264459831 133782             113937   .11  16 "Corporate"       63.36 2016  7   11.6434  1  2
    "AKKA TECHNOLOGIES"    "France"          "Support Services"                       55.626      878.825   .06329587801894575  10784  
    end
    label values indu indu
    label def indu 1 "Aerospace & Defence", modify
    label def indu 4 "Beverages", modify
    label def indu 5 "Chemicals", modify
    label def indu 6 "Construction & Materials", modify
    label def indu 8 "Electronic & Electrical Equipment", modify
    label def indu 13 "General Industrials", modify
    label def indu 17 "Industrial Engineering", modify
    label def indu 18 "Industrial Metals & Mining", modify
    label def indu 19 "Industrial Transportation", modify
    label def indu 20 "Leisure Goods", modify
    label def indu 22 "Mining", modify
    label def indu 25 "Personal Goods", modify
    label def indu 26 "Pharmaceuticals & Biotechnology", modify
    label def indu 27 "Software & Computer Services", modify
    label def indu 28 "Support Services", modify
    label def indu 29 "Technology Hardware & Equipment", modify
    label values ownertype ownertype
    label def ownertype 1 "Bank", modify
    label def ownertype 2 "Corporate", modify
    label def ownertype 3 "Family", modify
    label def ownertype 4 "Financial", modify
    label def ownertype 6 "Government", modify
    label def ownertype 7 "Mutual", modify
    label def ownertype 10 "Public", modify
    label def ownertype 11 "Venture capital", modify

  • #2
    Hi Kristian, welcome to Statalist!

    And thanks for sharing some of your data using dataex!

    So the easiest way to create indicator variables is to use tabulate industry, gen(ind)

    Code:
    tabulate industry, gen(ind)
    tabulate ownertype, gen(own)
    
    . list id firm industry ind1- ind6 , sepby(id) noobs
    
      +-------------------------------------------------------------------------------------------------------+
      | id                 firm                            industry   ind1   ind2   ind3   ind4   ind5   ind6 |
      |-------------------------------------------------------------------------------------------------------|
      |  1              ABENGOA                 General Industrials      0      0      1      0      0      0 |
      |  1              ABENGOA                 General Industrials      0      0      1      0      0      0 |
      |  1              ABENGOA                 General Industrials      0      0      1      0      0      0 |
      |  1              ABENGOA                 General Industrials      0      0      1      0      0      0 |
      |-------------------------------------------------------------------------------------------------------|
      |  2            ACCENTURE                    Support Services      0      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services      0      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services      0      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services      0      0      0      0      0      1 |
      |-------------------------------------------------------------------------------------------------------|
      |  3                ACTIA   Electronic & Electrical Equipment      0      1      0      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment      0      1      0      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment      0      1      0      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment      0      1      0      0      0      0 |
      |-------------------------------------------------------------------------------------------------------|
      |  4               ADIDAS                      Personal Goods      0      0      0      0      1      0 |
      |  4               ADIDAS                      Personal Goods      0      0      0      0      1      0 |
      |  4               ADIDAS                      Personal Goods      0      0      0      0      1      0 |
      |  4               ADIDAS                      Personal Goods      0      0      0      0      1      0 |
      |-------------------------------------------------------------------------------------------------------|
      |  5   AEROPORTS DE PARIS           Industrial Transportation      0      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation      0      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation      0      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation      0      0      0      1      0      0 |
      |-------------------------------------------------------------------------------------------------------|
    
    
    . list id firm industry ownertype own1- own5 if id<=5, sepby(id) noobs
    
      +------------------------------------------------------------------------------------------------------------------+
      | id                 firm                            industry         ownertype   own1   own2   own3   own4   own5 |
      |------------------------------------------------------------------------------------------------------------------|
      |  1              ABENGOA                 General Industrials            Public      0      0      0      1      0 |
      |  1              ABENGOA                 General Industrials            Public      0      0      0      1      0 |
      |  1              ABENGOA                 General Industrials            Public      0      0      0      1      0 |
      |  1              ABENGOA                 General Industrials            Public      0      0      0      1      0 |
      |------------------------------------------------------------------------------------------------------------------|
      |  2            ACCENTURE                    Support Services   Venture capital      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services   Venture capital      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services   Venture capital      0      0      0      0      1 |
      |  2            ACCENTURE                    Support Services   Venture capital      0      0      0      0      1 |
      |------------------------------------------------------------------------------------------------------------------|
      |  3                ACTIA   Electronic & Electrical Equipment         Corporate      0      1      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment         Corporate      0      1      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment         Corporate      0      1      0      0      0 |
      |  3                ACTIA   Electronic & Electrical Equipment         Corporate      0      1      0      0      0 |
      |------------------------------------------------------------------------------------------------------------------|
      |  4               ADIDAS                      Personal Goods              Bank      1      0      0      0      0 |
      |  4               ADIDAS                      Personal Goods              Bank      1      0      0      0      0 |
      |  4               ADIDAS                      Personal Goods              Bank      1      0      0      0      0 |
      |  4               ADIDAS                      Personal Goods              Bank      1      0      0      0      0 |
      |------------------------------------------------------------------------------------------------------------------|
      |  5   AEROPORTS DE PARIS           Industrial Transportation        Government      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation        Government      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation        Government      0      0      1      0      0 |
      |  5   AEROPORTS DE PARIS           Industrial Transportation        Government      0      0      1      0      0 |
      +------------------------------------------------------------------------------------------------------------------+
    
    * You may not need to create them because you can use factor notation in your regressions
    * See help fvvarlist 
    . reg sales i.indu i.year
    
          Source |       SS           df       MS      Number of obs   =        28
    -------------+----------------------------------   F(8, 19)        =    310.94
           Model |  1.2044e+10         8  1.5055e+09   Prob > F        =    0.0000
        Residual |  91990092.4        19  4841583.81   R-squared       =    0.9924
    -------------+----------------------------------   Adj R-squared   =    0.9892
           Total |  1.2136e+10        27   449468430   Root MSE        =    2200.4
    
    ----------------------------------------------------------------------------------------------------
                                 sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                                  indu |
    Electronic & Electrical Equipment  |   -61223.2    1347.44   -45.44   0.000    -64043.42   -58402.98
                  General Industrials  |   -56245.5   1555.889   -36.15   0.000    -59502.01   -52988.99
            Industrial Transportation  |  -59897.89   1555.889   -38.50   0.000     -63154.4   -56641.37
                       Personal Goods  |     -46442   1555.889   -29.85   0.000    -49698.51   -43185.49
                     Support Services  |  -34865.63   1555.889   -22.41   0.000    -38122.15   -31609.12
                                       |
                                  year |
                                 2014  |   762.1996   1176.142     0.65   0.525    -1699.494    3223.893
                                 2015  |   2032.811   1176.142     1.73   0.100     -428.882    4494.504
                                 2016  |   3082.579   1176.142     2.62   0.017     620.8861    5544.273
                                       |
                                 _cons |    61280.6   1314.967    46.60   0.000     58528.35    64032.86
    ----------------------------------------------------------------------------------------------------

    Comment


    • #3
      Originally posted by David Benson View Post
      Hi Kristian, welcome to Statalist!

      And thanks for sharing some of your data using dataex!

      So the easiest way to create indicator variables is to use tabulate industry, gen(ind)

      Code:
      tabulate industry, gen(ind)
      tabulate ownertype, gen(own)
      
      . list id firm industry ind1- ind6 , sepby(id) noobs
      
      +-------------------------------------------------------------------------------------------------------+
      | id firm industry ind1 ind2 ind3 ind4 ind5 ind6 |
      |-------------------------------------------------------------------------------------------------------|
      | 1 ABENGOA General Industrials 0 0 1 0 0 0 |
      | 1 ABENGOA General Industrials 0 0 1 0 0 0 |
      | 1 ABENGOA General Industrials 0 0 1 0 0 0 |
      | 1 ABENGOA General Industrials 0 0 1 0 0 0 |
      |-------------------------------------------------------------------------------------------------------|
      | 2 ACCENTURE Support Services 0 0 0 0 0 1 |
      | 2 ACCENTURE Support Services 0 0 0 0 0 1 |
      | 2 ACCENTURE Support Services 0 0 0 0 0 1 |
      | 2 ACCENTURE Support Services 0 0 0 0 0 1 |
      |-------------------------------------------------------------------------------------------------------|
      | 3 ACTIA Electronic & Electrical Equipment 0 1 0 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment 0 1 0 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment 0 1 0 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment 0 1 0 0 0 0 |
      |-------------------------------------------------------------------------------------------------------|
      | 4 ADIDAS Personal Goods 0 0 0 0 1 0 |
      | 4 ADIDAS Personal Goods 0 0 0 0 1 0 |
      | 4 ADIDAS Personal Goods 0 0 0 0 1 0 |
      | 4 ADIDAS Personal Goods 0 0 0 0 1 0 |
      |-------------------------------------------------------------------------------------------------------|
      | 5 AEROPORTS DE PARIS Industrial Transportation 0 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation 0 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation 0 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation 0 0 0 1 0 0 |
      |-------------------------------------------------------------------------------------------------------|
      
      
      . list id firm industry ownertype own1- own5 if id<=5, sepby(id) noobs
      
      +------------------------------------------------------------------------------------------------------------------+
      | id firm industry ownertype own1 own2 own3 own4 own5 |
      |------------------------------------------------------------------------------------------------------------------|
      | 1 ABENGOA General Industrials Public 0 0 0 1 0 |
      | 1 ABENGOA General Industrials Public 0 0 0 1 0 |
      | 1 ABENGOA General Industrials Public 0 0 0 1 0 |
      | 1 ABENGOA General Industrials Public 0 0 0 1 0 |
      |------------------------------------------------------------------------------------------------------------------|
      | 2 ACCENTURE Support Services Venture capital 0 0 0 0 1 |
      | 2 ACCENTURE Support Services Venture capital 0 0 0 0 1 |
      | 2 ACCENTURE Support Services Venture capital 0 0 0 0 1 |
      | 2 ACCENTURE Support Services Venture capital 0 0 0 0 1 |
      |------------------------------------------------------------------------------------------------------------------|
      | 3 ACTIA Electronic & Electrical Equipment Corporate 0 1 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment Corporate 0 1 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment Corporate 0 1 0 0 0 |
      | 3 ACTIA Electronic & Electrical Equipment Corporate 0 1 0 0 0 |
      |------------------------------------------------------------------------------------------------------------------|
      | 4 ADIDAS Personal Goods Bank 1 0 0 0 0 |
      | 4 ADIDAS Personal Goods Bank 1 0 0 0 0 |
      | 4 ADIDAS Personal Goods Bank 1 0 0 0 0 |
      | 4 ADIDAS Personal Goods Bank 1 0 0 0 0 |
      |------------------------------------------------------------------------------------------------------------------|
      | 5 AEROPORTS DE PARIS Industrial Transportation Government 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation Government 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation Government 0 0 1 0 0 |
      | 5 AEROPORTS DE PARIS Industrial Transportation Government 0 0 1 0 0 |
      +------------------------------------------------------------------------------------------------------------------+
      
      * You may not need to create them because you can use factor notation in your regressions
      * See help fvvarlist
      . reg sales i.indu i.year
      
      Source | SS df MS Number of obs = 28
      -------------+---------------------------------- F(8, 19) = 310.94
      Model | 1.2044e+10 8 1.5055e+09 Prob > F = 0.0000
      Residual | 91990092.4 19 4841583.81 R-squared = 0.9924
      -------------+---------------------------------- Adj R-squared = 0.9892
      Total | 1.2136e+10 27 449468430 Root MSE = 2200.4
      
      ----------------------------------------------------------------------------------------------------
      sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
      -----------------------------------+----------------------------------------------------------------
      indu |
      Electronic & Electrical Equipment | -61223.2 1347.44 -45.44 0.000 -64043.42 -58402.98
      General Industrials | -56245.5 1555.889 -36.15 0.000 -59502.01 -52988.99
      Industrial Transportation | -59897.89 1555.889 -38.50 0.000 -63154.4 -56641.37
      Personal Goods | -46442 1555.889 -29.85 0.000 -49698.51 -43185.49
      Support Services | -34865.63 1555.889 -22.41 0.000 -38122.15 -31609.12
      |
      year |
      2014 | 762.1996 1176.142 0.65 0.525 -1699.494 3223.893
      2015 | 2032.811 1176.142 1.73 0.100 -428.882 4494.504
      2016 | 3082.579 1176.142 2.62 0.017 620.8861 5544.273
      |
      _cons | 61280.6 1314.967 46.60 0.000 58528.35 64032.86
      ----------------------------------------------------------------------------------------------------
      Hi David,

      Thank you for this and the detailed response! Is there a way to then use the generated ind variables to collapse a select group into a single variable? For example, 6 different industry indicators making up one group. Or am I missing something?


      Regards

      Comment


      • #4
        Hi Kristian Pal,

        Regarding your question:

        Is there a way to then use the generated ind variables to collapse a select group into a single variable? For example, 6 different industry indicators making up one group.
        I'm not quite sure what you mean--could you restate it? Are you talking about combining several groups into a single indicator variable?

        Code:
        * Create an indicator=1 if firm is in Electronic & Electrical Equip, Industrial Transport, or General Industrials
        gen d_industrial = 0
        replace d_industrial = 1 if ind2==1 | ind3==1 | ind4==1
        
        * Could also do as
        replace d_industrial = 1 if inlist(industry, "Electronic & Electrical Equipment", "Industrial Transportation", "General Industrials")

        Comment


        • #5
          Originally posted by David Benson View Post
          Hi Kristian Pal,

          Regarding your question:



          I'm not quite sure what you mean--could you restate it? Are you talking about combining several groups into a single indicator variable?

          Code:
          * Create an indicator=1 if firm is in Electronic & Electrical Equip, Industrial Transport, or General Industrials
          gen d_industrial = 0
          replace d_industrial = 1 if ind2==1 | ind3==1 | ind4==1
          
          * Could also do as
          replace d_industrial = 1 if inlist(industry, "Electronic & Electrical Equipment", "Industrial Transportation", "General Industrials")
          Hi David,
          Hope all is well. Thank you! I use your second suggestion and it seems to produce the result I was looking for, i.e. i can control for an industry group with this dummy. A note for anyone using inlist, the maximum count for strings is ten. One of my groups was 11, but I could solve this but simply doing 2 inlist functions separated by an "OR", |.

          Comment


          • #6
            As a follow up to this, I pooled owner-type into smaller groups from 10 types as per literature and end up with 5 dummies for owner types with the help of replace and inlist as you suggested.
            However, one of them is omitted due to colinearity when running an OLS regression. Is this to avoid the dummy trap, i.e. this becomes a reference variable? That is a mistake since this variable is of interest. Also, the VIF is not out of the ordinary. Stata does not explicitly omit if I use factor notation i.own in OLS, which is the un-pooled owner types, however only 9 coefficients are included in the regression?
            Last edited by Kristian Pal; 06 Mar 2019, 05:39.

            Comment

            Working...
            X