Heckpoisson or Zero Inflated Negative Binomial?

Tom Menard

Join Date: Jul 2018

Posts: 29
#1

10 Jul 2019, 08:09

Dear Statalist Users,

I am trying to figure out which method is more appropriate for my research: -heckpoisson- or a zero-inflated negative binomial regression.

I am looking at the innovation output of startups measured as patent applications per year, depending on the type of parent company (all startups are spin-offs).

In my sample of 160 firms, 55 have not applied for any patents at all, so a zero-inflated negative binomial regression seems appropriate.
However, one of the independent variables I want to look at is "Patent Overlap", a binary variable that equals 1 if the startup and its parent company have patents in the same field and 0 if not. This variable necessarily equals 0 if a startup did not apply for any patents at all. One of my colleagues pointed out that this creates a sample selection bias, and he recommended using a Heckman model.
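
For concreteness, the two candidates might be specified roughly as follows. This is only a sketch: the variable names are those of the dataex example posted later in this thread, and the regressors placed in the inflate() and select() equations, as well as the indicator anypatent, are illustrative assumptions rather than a recommended specification. Neither command is guaranteed to converge on a small sample.

```stata
* Sketch only. Variable names follow the dataex example later in this thread.
* Which regressors enter each equation is an illustrative assumption.

* Zero-inflated negative binomial: inflate() models the excess zeros
zinb patentsumt3 industry patentoverlap geoproxcvc priorpatent, ///
    inflate(priorpatent age)

* Poisson with sample selection.
* ASSUMPTION: anypatent is a hypothetical selection indicator that treats
* "has at least one patent" as the selection event the colleague has in mind.
generate byte anypatent = patentsumt3 > 0
heckpoisson patentsumt3 industry patentoverlap geoproxcvc priorpatent, ///
    select(anypatent = industry geoproxcvc priorpatent age)
```

Note that with this selection indicator the outcome equation of -heckpoisson- is fitted only on firms with at least one patent, which is a different question from modelling the zeros themselves.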

I am unsure which is the right choice here. Any advice?

(I am happy to attach a dataex if that helps)

Best wishes,
Tom
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

11 Jul 2019, 11:13

You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

I don't think your colleague is right. You have observed values for all the variables (a firm with zero patents still has an observed outcome, not a missing one), so I don't think you have a selection problem. You might have an endogeneity problem, since the number of patents may influence overlap.

Tom Menard

Join Date: Jul 2018
Posts: 29

15 Jul 2019, 07:21

Dear Phil,

Thank you for your reply.

Here is a dataex example of my variables:

patentsumt3 is the number of patents three years after the treatment (investment)
industry is a binary variable that equals 1 if the investor and the startup are from the same industry and 0 if not
patentoverlap is also a binary variable that equals 1 if the investor and the startup have patents in the same technological fields and 0 if not
geoproxcvc measures the distance in kilometers between the startup and the investor
priorpatent measures the number of patents that the startup had prior to the treatment (investment)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(patentsumt3 industry patentoverlap) double geoproxcvc byte age int priorpatent
 0 0 0    38.47 19  0
 0 0 0     35.8  6  0
 4 0 0  8297.56 27  2
 0 0 0  1195.79  4  0
 4 0 1   696.96  4  1
 8 1 1   345.02 23  0
 0 1 0  1521.02 10  0
 1 0 0  4205.12 12  1
 1 1 0  2936.07 24  3
 1 0 0     16.9  8  0
38 0 0  4146.89 22 27
22 1 1   326.89  5  0
 0 1 0  4111.21  6  0
 3 1 1   999.99 20  2
 2 1 1  1521.02 15  0
 0 1 0  8778.75 10  0
 3 1 1  9527.51 14  1
 1 1 1  5946.63 13  0
 0 0 0   538.35 13  0
 1 0 0  3648.37 21  0
 3 0 1  8297.56  6  2
10 1 1   469.43  7  0
16 0 0  8312.62 10 24
 0 0 0   700.66 10  0
 9 0 1   337.74 13  0
 0 0 0   559.44 16  0
 2 1 1  1113.71 10  8
 0 1 0  3854.06 14  0
 0 1 0     5955  6  2
 0 1 1  1752.75  7  0
 4 0 1    11.41 14  0
 1 1 1   279.64 13  0
 0 1 0  6137.52 14  0
 1 1 1    91.56  5  0
 4 0 1  4325.07  5 11
 1 0 1 12420.53 10  1
 0 1 0  4142.55 20  3
16 0 1 10830.96 10  1
 3 1 1  7781.53  9  0
 6 0 0   539.73  8  4
 4 1 1   645.94 10  4
 1 1 1   737.92 12  0
 0 0 0   680.04 12  0
 8 0 0  2776.26 24 18
 0 1 0   940.54 19  1
 2 0 0  2679.39 23 13
 2 1 1    41.91 15  2
 5 0 0  3483.46  7  3
18 0 1   775.63 38 30
92 1 1  3802.22 11 60
end
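
With the example loaded, a few descriptive checks may help frame the discussion. The sketch below only uses the variables in the example above. It counts the zero-patent firms, shows how patentoverlap is distributed among them, looks for missing values (which bears on the selection question), and compares the mean and variance of the outcome as a rough indication of overdispersion.

```stata
* Run after loading the dataex example above.

* How many firms have no patents three years after the investment?
count if patentsumt3 == 0

* Distribution of patentoverlap among the zero-patent firms
tabulate patentoverlap if patentsumt3 == 0

* Any missing values in the model variables?
misstable summarize patentsumt3 industry patentoverlap geoproxcvc priorpatent

* A variance far above the mean suggests overdispersion
summarize patentsumt3, detail
```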

The possible endogeneity problem is a good point. I did some further research and found advice to use -ivpoisson- instead of a negative binomial regression to account for the endogeneity of patentoverlap. What are your thoughts?
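
For reference, the kind of call I have in mind would look roughly like the sketch below. It treats patentoverlap as the endogenous regressor. The instrument z_instr is purely hypothetical: no such variable exists in my data yet, and finding a credible one is the hard part, so the commands will not run until it is created.

```stata
* Sketch only.
* ASSUMPTION: z_instr is a hypothetical instrument for patentoverlap;
* it is NOT in the posted example and must be created before this runs.

* GMM estimator with multiplicative errors
ivpoisson gmm patentsumt3 industry geoproxcvc priorpatent age ///
    (patentoverlap = z_instr), multiplicative

* Control-function alternative with the same hypothetical instrument
ivpoisson cfunction patentsumt3 industry geoproxcvc priorpatent age ///
    (patentoverlap = z_instr)
```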
