  • Heckpoisson or Zero Inflated Negative Binomial?

    Dear Statalist Users,

    I am trying to figure out which method is more appropriate for my research: heckpoisson or a zero inflated negative binomial regression.

    I am looking at the innovation output of startups measured as patent applications per year, depending on the type of parent company (all startups are spin-offs).

    In my sample of 160 firms, 55 have not applied for any patents at all. Hence, a zero-inflated negative binomial regression seems appropriate.
    However, one of the independent variables I want to look at is "Patent Overlap", a binary variable that equals 1 if the startup and its parent company have patents in the same field and 0 if not. This variable necessarily equals 0 if a startup did not apply for any patents at all. One of my colleagues pointed out that this creates a sample selection bias, and he recommended using a Heckman model.

    I am unsure what is the right choice here. Any advice?

    (I am happy to attach a dataex if that helps)

    Best wishes,
    Tom

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    I don't think your colleague is right. You have observed values for all the variables, so I don't think you have a selection problem. You might have an endogeneity problem, since the number of patents may influence overlap.
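
Both points can be checked directly in Stata. The sketch below uses six rows copied from the dataex example posted in #3; the subset is for illustration only, and the full sample would be used in practice. -misstable summarize- reports whether the outcome or the overlap indicator has missing values, and the tabulation shows whether patentoverlap really is 0 for every firm with a zero patent count. Note that patentsumt3 counts patents three years after the investment, so treating it as "no patents at all" is an assumption.

```stata
* Six rows copied from the -dataex- excerpt in #3 (illustrative subset only)
clear
input byte(patentsumt3 industry patentoverlap) double geoproxcvc byte age int priorpatent
 0 0 0   38.47 19  0
 4 0 1  696.96  4  1
 8 1 1  345.02 23  0
 0 1 0 1521.02 10  0
 0 1 1 1752.75  7  0
38 0 0 4146.89 22 27
end

* Selection in the Heckman sense requires the outcome to be unobserved
* for some firms; check for missing values first
misstable summarize patentsumt3 patentoverlap

* Is patentoverlap 0 for every firm with a zero patent count?
tabulate patentoverlap if patentsumt3 == 0

* Number of zero-patent firms that nevertheless show an overlap
count if patentsumt3 == 0 & patentoverlap == 1
```

If the count is above zero in the full sample, the overlap indicator is not mechanically determined by the zero outcome, at least for this outcome variable.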



    • #3
      Dear Phil,

      Thank you for your reply.

      Here is a dataex example of my variables:
      • patentsumt3 is the number of patents three years after the treatment (investment)
      • industry is a binary variable that equals 1 if the investor and the startup are from the same industry and 0 if not
      • patentoverlap is also a binary variable that equals 1 if the investor and the startup have patents in the same technological fields and 0 if not
      • geoproxcvc measures the distance in kilometers between the startup and the investor
      • priorpatent measures the number of patents that the startup had prior to the treatment (investment)
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(patentsumt3 industry patentoverlap) double geoproxcvc byte age int priorpatent
       0 0 0    38.47 19  0
       0 0 0     35.8  6  0
       4 0 0  8297.56 27  2
       0 0 0  1195.79  4  0
       4 0 1   696.96  4  1
       8 1 1   345.02 23  0
       0 1 0  1521.02 10  0
       1 0 0  4205.12 12  1
       1 1 0  2936.07 24  3
       1 0 0     16.9  8  0
      38 0 0  4146.89 22 27
      22 1 1   326.89  5  0
       0 1 0  4111.21  6  0
       3 1 1   999.99 20  2
       2 1 1  1521.02 15  0
       0 1 0  8778.75 10  0
       3 1 1  9527.51 14  1
       1 1 1  5946.63 13  0
       0 0 0   538.35 13  0
       1 0 0  3648.37 21  0
       3 0 1  8297.56  6  2
      10 1 1   469.43  7  0
      16 0 0  8312.62 10 24
       0 0 0   700.66 10  0
       9 0 1   337.74 13  0
       0 0 0   559.44 16  0
       2 1 1  1113.71 10  8
       0 1 0  3854.06 14  0
       0 1 0     5955  6  2
       0 1 1  1752.75  7  0
       4 0 1    11.41 14  0
       1 1 1   279.64 13  0
       0 1 0  6137.52 14  0
       1 1 1    91.56  5  0
       4 0 1  4325.07  5 11
       1 0 1 12420.53 10  1
       0 1 0  4142.55 20  3
      16 0 1 10830.96 10  1
       3 1 1  7781.53  9  0
       6 0 0   539.73  8  4
       4 1 1   645.94 10  4
       1 1 1   737.92 12  0
       0 0 0   680.04 12  0
       8 0 0  2776.26 24 18
       0 1 0   940.54 19  1
       2 0 0  2679.39 23 13
       2 1 1    41.91 15  2
       5 0 0  3483.46  7  3
      18 0 1   775.63 38 30
      92 1 1  3802.22 11 60
      end
      The endogeneity problem is a good hint. I did some further research and found advice to use -ivpoisson- instead of a negative binomial regression in order to account for the endogeneity issue. What are your thoughts?
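
For comparison, here is a minimal sketch of how the two approaches would be specified, assuming the dataex example above is in memory. The baseline models treat patentoverlap as exogenous. -ivpoisson- needs at least one excluded instrument, meaning a variable that affects patentoverlap but is not in the outcome equation. The posted data contain no obvious candidate, so overlap_iv below is a hypothetical variable name and that part is skipped when it is absent. The choice of covariates and of inflate(_cons) is also only an assumption.

```stata
* Assumes the -dataex- example above has been run and is in memory

* Baseline count models that treat patentoverlap as exogenous
nbreg patentsumt3 industry patentoverlap geoproxcvc age priorpatent

* Zero-inflated variant with a constant-only inflation equation;
* -capture noisily- because it may not converge on a 50-row excerpt
capture noisily zinb patentsumt3 industry patentoverlap geoproxcvc age priorpatent, inflate(_cons)

* IV Poisson (GMM, multiplicative errors) with patentoverlap treated as endogenous.
* overlap_iv is a HYPOTHETICAL instrument; it is not part of the posted data.
capture confirm variable overlap_iv
if _rc == 0 {
    ivpoisson gmm patentsumt3 industry geoproxcvc age priorpatent ///
        (patentoverlap = overlap_iv), multiplicative
}
else {
    display as text "No instrument named overlap_iv found; -ivpoisson- skipped."
}
```

-ivpoisson- targets the endogeneity of a regressor, whereas the zero-inflated model targets excess zeros. The two commands address different problems, so neither is a drop-in replacement for the other.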
