  • Multiple industry variables

    Hello,

    I'm using Stata 15 SE.

    I'm trying to control for industry fixed effects in my regressions and other analyses, but a firm can belong to several industries at the same time. My data look like this:

    firm_id  Industry1  Industry2   Industry3          Industry4
    A        Auto       Mining      .                  .
    B        Auto       publishing  Telecommunication  Services
    C        Mining     publishing  Services           .
    D        High-Tech  Services    .                  .

    As you can see, a firm can have multiple industry classifications; when it does, they are listed in alphabetical order.

    I'd like to control for industry in such a way that every firm represented in an industry is included in that industry. That is, if I run the regression:

    reg y x i.industry

    then every firm in the Auto industry contributes to the Auto coefficient, every firm in the Mining industry contributes to the Mining coefficient, and so on.

    Another way to think of this is that if I run the following command:

    bys industry: sum XXX

    every firm in the Auto industry should be included in the Auto summary, every firm in the Mining industry in the Mining summary, and so on.

    I couldn't find an answer to this question so I would really appreciate your thoughts on this.


    Thank you.
    Last edited by Ofir Gefen; 27 Feb 2020, 23:19.

  • #2
    What do you get with
    Code:
    regress y c.x i.industry i.firm_id
    and what don't you like about it?

    .
    . version 15.1

    .
    . clear *

    .
    . set seed `=strreverse("1538851")'

    . quietly set obs 100

    . generate byte pid = _n

    . generate double pid_u = rnormal()

    .
    . quietly expand 5

    . generate double x = runiform(-0., 0.5)

    . bysort pid: generate byte ind = _n

    .
    . // Each company belongs to an average of three industries
    . generate byte keep = runiformint(1, 5)

    . quietly keep if ind <= keep

    .
    . generate double y = x + (ind - 3) / 5 + pid_u + rnormal()

    .
    . xtreg y c.x i.ind, i(pid) fe

    Fixed-effects (within) regression               Number of obs     =        301
    Group variable: pid                             Number of groups  =        100

    R-sq:                                           Obs per group:
         within  = 0.0807                                         min =          1
         between = 0.0007                                         avg =        3.0
         overall = 0.0174                                         max =          5

                                                    F(5,196)          =       3.44
    corr(u_i, Xb)  = -0.0685                        Prob > F          =     0.0053

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   1.694076   .4818364     3.52   0.001     .7438266    2.644325
                 |
             ind |
              2  |   .2939701   .1539514     1.91   0.058    -.0096438     .597584
              3  |   .3108797   .1709137     1.82   0.070    -.0261863    .6479458
              4  |    .200375   .2050227     0.98   0.330    -.2039587    .6047088
              5  |   .2290177   .2678822     0.85   0.394    -.2992839    .7573193
                 |
           _cons |  -.3871508   .1582035    -2.45   0.015    -.6991505   -.0751511
    -------------+----------------------------------------------------------------
         sigma_u |  1.2997426
         sigma_e |   .9969693
             rho |  .62957712   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(99, 196) = 4.66                     Prob > F = 0.0000

    .
    . exit

    end of do-file


    .
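
    The simulated data above have one row per company-industry pair, with a single numeric ind variable. Data laid out as in #1, with Industry1 through Industry4 side by side, would probably need to be reshaped before i.industry can be used. A minimal sketch, assuming the variable names shown in #1, string industry variables, and empty slots stored either as an empty string or as a literal "."; the toy rows and the new names slot and industry are for illustration only:

    ```stata
    clear
    * Toy rows mirroring the layout in #1 (illustrative only)
    input str1 firm_id str20 Industry1 str20 Industry2 str20 Industry3 str20 Industry4
    "A" "Auto"      "Mining"     ""                  ""
    "B" "Auto"      "publishing" "Telecommunication" "Services"
    "C" "Mining"    "publishing" "Services"          ""
    "D" "High-Tech" "Services"   ""                  ""
    end

    * One row per firm-industry pair; slot records which column the value came from
    reshape long Industry, i(firm_id) j(slot)

    * Drop unused slots (assumed to be "" or a literal ".")
    drop if inlist(Industry, "", ".")

    * Numeric, value-labelled variable that can be used as i.industry
    encode Industry, generate(industry)

    * Each firm is now counted once in every industry it belongs to
    tabulate industry
    ```

    After the reshape, a firm appears once for each of its industries, so a command such as bysort industry: summarize picks it up in every one of them. Because firms are repeated across rows, the rows are not independent, which is worth keeping in mind when reading standard errors.
    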



    • #3
      Thank you for your reply.

      I get the following error:

      matsize too small
      You have attempted to create a matrix with too many rows or columns or
      attempted to fit a model with too many variables. You need to increase
      matsize; it is currently 10000. Use set matsize; see help matsize.

      If you are using factor variables and included an interaction that has lots
      of missing cells, either increase matsize or set emptycells drop to reduce
      the required matrix size; see help set emptycells.

      If you are using factor variables, you might have accidentally treated a
      continuous variable as a categorical, resulting in lots of categories. Use
      the c. operator on such variables.


      By the way, I have about 68,000 observations, and each firm can choose up to 8 industries. Average is about industries per firm.




      • #4
        I just dropped a large portion of the sample to try your suggestion.

        The output obviously goes on for thousands of rows like these, but you get the gist.

        Screen Shot 2020-02-28 at 3.02.00 PM.png



        • #5
          You saw in the output of #2 that there's no inherent problem with the analysis you're attempting when the dataset meets the conditions you've described. Why don't you show the data for a random handful of companies, along with the Stata command for the regression model that you're trying to fit? Maybe we can tease apart what's different about them and get at the problem.
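
          One way to post such a handful is with dataex. A minimal sketch, assuming one row per firm and using made-up toy data in place of the real file; the variable names y, x and shuffle and the seed are placeholders, and if dataex is not already available it can be installed with ssc install dataex:

          ```stata
          clear
          set seed 2020
          set obs 50                      // toy stand-in for the real firms
          generate long firm_id = _n
          generate str20 Industry1 = cond(runiform() < 0.5, "Auto", "Mining")
          generate str20 Industry2 = cond(runiform() < 0.5, "Services", "")
          generate double x = runiform()
          generate double y = x + rnormal()

          * Shuffle the rows reproducibly, then post the first few as a data example
          generate double shuffle = runiform()
          sort shuffle
          dataex firm_id Industry1 Industry2 y x in 1/5
          ```

          The output of dataex can be pasted straight into a post, and anyone reading it can recreate the same rows with copy and paste.
          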
