Hello,
This is my first time posting here, so apologies for any potential mistakes.
Short background: I examine how characteristics of citations made by a patent (i.e. backward citations) influence the number of citations that this patent subsequently receives (i.e. forward citations). This approach is similar to examining how the references cited by a journal article influence the number of times this article is subsequently cited.
Analyses: I am testing for a potential U-shaped effect of an independent variable (‘time gap’ of backward cites, named timgap) on a dependent variable (number of ‘forward citations’) using a negative binomial regression model. The dependent variable is a count variable (named fw4), with the following distribution:
Code:
sum fw4, detail

                     Citations received
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs              21,117
25%            0              0       Sum of Wgt.      21,117

50%            1                      Mean            2.48165
                        Largest       Std. Dev.       4.22759
75%            3             52
90%            7             70       Variance       17.87252
95%           10             81       Skewness       4.294934
99%           20             85       Kurtosis       37.63707
The independent variable (timgap) looks like this:
Code:
sum timgap, detail

                             Gap
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              0
 5%            1              0
10%            1              0       Obs              21,117
25%            1              0       Sum of Wgt.      21,117

50%            1                      Mean           1.724582
                        Largest       Std. Dev.      1.765048
75%            2             28
90%            3             38       Variance       3.115394
95%            4           45.5       Skewness       7.354439
99%          9.5             58       Kurtosis       113.7121
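Since the 99th percentile of timgap is 9.5 while the maximum is 58, the upper range of the variable is thinly populated. A minimal sketch of how this could be checked (I have not included the output here; the 9.5 cutoff is simply the 99th percentile from the summary above):

```stata
* Number of patents above the 99th percentile of timgap (9.5 in the summary above)
count if timgap > 9.5 & !missing(timgap)

* Their distribution across the upper range, up to the maximum of 58
tabulate timgap if timgap > 9.5 & !missing(timgap)
```

The `!missing(timgap)` condition matters because Stata treats missing values as larger than any number, so `timgap > 9.5` alone would also count missing observations.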
A sample of the data is as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(fw4 count) float(reusebeginofyear medianage timgap) byte teamsize float(patentgrant uniqueoffice shareint recbreadth pat_finalid) int pat_priy long pat_doc
1  8     7.875    5  1.5 1 1 1         0   .768595  28 1967 24645379
0  8     7.375    3    1 3 1 3         0  .9469435   4 2007 39709822
0 14 12.285714    4    1 2 1 3  .4285714         0  18 2003 34701225
1 10       9.9  3.5    1 4 1 5        .2 .43055555   1 2006 38535557
0 10       6.5    9    4 2 0 3        .2  .9921035  39 2007 39710045
3  8     1.375    2    1 3 1 2         0   .607438  17 2004 35060906
0 10       2.8    5    1 2 1 1        .2  .2520661   5 2004 35183594
3  9 20.333334   12    1 5 0 1 .11111111  .9238535 121 2004 36615448
1  8       .25  8.5  8.5 3 0 1         0   .838843 112 2000 18843534
3 13  4.923077    2    2 4 1 5  .3846154 .52076125   2 2002 32510676
3  8      .375 12.5   11 2 1 1         0  .9746667  23 1999 17011054
1  8      7.25    5  1.5 4 0 4         0  .9153979   1 2007 40381446
2  8       3.5    2    1 3 1 2         0  .9693205 116 2003 34102979
0  8     11.25  8.5    1 2 0 1         0   .934375  10 1995  7771531
1  8      .625    5  1.5 1 1 1         0      .505 118 2000 18685685
3  9  40.22222    9    1 3 1 3 .11111111  .9566575  17 2003 34394218
0  8    19.625    4    1 1 1 4         0   .607438   1 2002 32473712
3  9 12.444445   13    1 5 1 4 .11111111  .9823909  21 1997 17434323
1  8      4.75    7  1.5 1 0 1         0       .46   1 2004 35183815
6  8     6.625    6    1 3 1 2         0         0  10 2000  7628904
2  8        14 14.5    3 4 1 5         0  .9738292  81 2003 33427484
end
Important to note is that, here:
pat_finalid denotes the firm that applied for the patent
pat_doc denotes the unique patent identification number
pat_priy denotes the year of application of the patent
I excluded the coefficients of the dummies for pat_finalid and pat_priy to save space.
When I run the negative binomial regression I obtain the following:
Code:
nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap c.timgap#c.timgap i.pat_p i.pat_f, robust
Fitting Poisson model:
Iteration 0: log pseudolikelihood = -47246.278
Iteration 1: log pseudolikelihood = -46222.733
Iteration 2: log pseudolikelihood = -46205.947
Iteration 3: log pseudolikelihood = -46205.874
Iteration 4: log pseudolikelihood = -46205.874
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -44087.015
Iteration 1: log pseudolikelihood = -43010.087
Iteration 2: log pseudolikelihood = -43002.914
Iteration 3: log pseudolikelihood = -43002.914
Fitting full model:
Iteration 0: log pseudolikelihood = -39978.186
Iteration 1: log pseudolikelihood = -38095.319
Iteration 2: log pseudolikelihood = -37879.505
Iteration 3: log pseudolikelihood = -37875.829
Iteration 4: log pseudolikelihood = -37875.829
Negative binomial regression Number of obs = 21,117
Wald chi2(196) = 11631.79
Dispersion = mean Prob > chi2 = 0.0000
Log pseudolikelihood = -37875.829 Pseudo R2 = 0.1192
-----------------------------------------------------------------------------------
| Robust
fw4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
count | .0156753 .0010517 14.90 0.000 .013614 .0177367
shareint | -.3507485 .0451294 -7.77 0.000 -.4392006 -.2622965
recbreadth | .1487739 .0277731 5.36 0.000 .0943396 .2032082
teamsize | .0261769 .0046877 5.58 0.000 .0169891 .0353647
uniqueoffice | .109674 .004895 22.41 0.000 .10008 .119268
patentgrant | .2145301 .0243185 8.82 0.000 .1668666 .2621935
reusebeginofyear | .0082285 .0015958 5.16 0.000 .0051009 .0113562
medianage | -.0090597 .0027831 -3.26 0.001 -.0145146 -.0036049
timgap | -.0974457 .0127799 -7.62 0.000 -.1224938 -.0723975
|
c.timgap#c.timgap | .0028285 .000721 3.92 0.000 .0014152 .0042417
|
_cons | .8661081 .6416052 1.35 0.177 -.3914151 2.123631
------------------+----------------------------------------------------------------
/lnalpha | -.2696179 .0207389 -.3102654 -.2289704
------------------+----------------------------------------------------------------
alpha | .7636713 .0158377 .7332523 .7953521
-----------------------------------------------------------------------------------
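To make explicit what the two timgap coefficients imply, here is a short worked derivation. It is only a sketch of my understanding of the model: z'γ stands in for all other covariates and the constant (my notation, not Stata's), and the numbers are the point estimates from the table above.

```latex
\begin{align*}
% Linear predictor of the negative binomial model, with x = timgap
\ln \mathbb{E}[\mathit{fw4} \mid x, z] &= \beta_1 x + \beta_2 x^2 + z'\gamma \\[4pt]
% Turning point of the quadratic on the linear-predictor scale
x^{*} = -\frac{\hat\beta_1}{2\hat\beta_2}
      &= \frac{0.0974457}{2 \times 0.0028285} \approx 17.23 \\[4pt]
% Marginal effect on the expected count, the default scale of -margins-
\frac{\partial\, \mathbb{E}[\mathit{fw4} \mid x, z]}{\partial x}
      &= (\beta_1 + 2\beta_2 x)\,\exp(\beta_1 x + \beta_2 x^2 + z'\gamma)
\end{align*}
```

The sign of the marginal effect on the count scale is therefore the sign of β1 + 2·β2·x, but its magnitude (and, through the delta method, its standard error) also depends on the exponentiated linear predictor.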
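Subsequently, the predictive margins (predicted number of forward citations) across values of ‘time gap’ look like this:
Code: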
margins, at(timgap=(0(1)58))
Predictive margins Number of obs = 21,117
Model VCE : Robust
Expression : Predicted number of events, predict()
1._at : timgap = 0
2._at : timgap = 1
3._at : timgap = 2
4._at : timgap = 3
5._at : timgap = 4
6._at : timgap = 5
7._at : timgap = 6
8._at : timgap = 7
9._at : timgap = 8
10._at : timgap = 9
11._at : timgap = 10
12._at : timgap = 11
13._at : timgap = 12
14._at : timgap = 13
15._at : timgap = 14
16._at : timgap = 15
17._at : timgap = 16
18._at : timgap = 17
19._at : timgap = 18
20._at : timgap = 19
21._at : timgap = 20
22._at : timgap = 21
23._at : timgap = 22
24._at : timgap = 23
25._at : timgap = 24
26._at : timgap = 25
27._at : timgap = 26
28._at : timgap = 27
29._at : timgap = 28
30._at : timgap = 29
31._at : timgap = 30
32._at : timgap = 31
33._at : timgap = 32
34._at : timgap = 33
35._at : timgap = 34
36._at : timgap = 35
37._at : timgap = 36
38._at : timgap = 37
39._at : timgap = 38
40._at : timgap = 39
41._at : timgap = 40
42._at : timgap = 41
43._at : timgap = 42
44._at : timgap = 43
45._at : timgap = 44
46._at : timgap = 45
47._at : timgap = 46
48._at : timgap = 47
49._at : timgap = 48
50._at : timgap = 49
51._at : timgap = 50
52._at : timgap = 51
53._at : timgap = 52
54._at : timgap = 53
55._at : timgap = 54
56._at : timgap = 55
57._at : timgap = 56
58._at : timgap = 57
59._at : timgap = 58
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | 2.907145 .0561051 51.82 0.000 2.797181 3.017109
2 | 2.644691 .0295555 89.48 0.000 2.586764 2.702619
3 | 2.41958 .0303876 79.62 0.000 2.360022 2.479139
4 | 2.226188 .0445835 49.93 0.000 2.138806 2.31357
5 | 2.059873 .0581709 35.41 0.000 1.94586 2.173886
6 | 1.916795 .0691658 27.71 0.000 1.781233 2.052358
7 | 1.793775 .0778237 23.05 0.000 1.641243 1.946306
8 | 1.688172 .0847155 19.93 0.000 1.522133 1.854212
9 | 1.5978 .0904177 17.67 0.000 1.420585 1.775016
10 | 1.520845 .0954568 15.93 0.000 1.333753 1.707937
11 | 1.455808 .1003065 14.51 0.000 1.259211 1.652405
12 | 1.401458 .1053954 13.30 0.000 1.194887 1.608029
13 | 1.356791 .1111151 12.21 0.000 1.139009 1.574573
14 | 1.320999 .1178276 11.21 0.000 1.090061 1.551937
15 | 1.293447 .1258707 10.28 0.000 1.046745 1.54015
16 | 1.273655 .1355661 9.40 0.000 1.007951 1.53936
17 | 1.261281 .1472277 8.57 0.000 .9727198 1.549842
18 | 1.256112 .1611748 7.79 0.000 .9402152 1.572009
19 | 1.258061 .1777473 7.08 0.000 .9096829 1.606439
20 | 1.267161 .197323 6.42 0.000 .8804155 1.653907
21 | 1.283568 .2203371 5.83 0.000 .8517152 1.715421
22 | 1.307563 .2473036 5.29 0.000 .8228568 1.792269
23 | 1.339563 .2788378 4.80 0.000 .7930509 1.886075
24 | 1.380131 .3156826 4.37 0.000 .7614046 1.998858
25 | 1.429995 .358739 3.99 0.000 .7268791 2.13311
26 | 1.490065 .4091018 3.64 0.000 .6882402 2.29189
27 | 1.561467 .4681041 3.34 0.001 .6439999 2.478934
28 | 1.645573 .5373717 3.06 0.002 .592344 2.698802
29 | 1.744047 .618891 2.82 0.005 .5310434 2.957052
30 | 1.858901 .7150945 2.60 0.009 .4573412 3.26046
31 | 1.992558 .828968 2.40 0.016 .3678102 3.617305
32 | 2.147941 .9641859 2.23 0.026 .2581713 4.037711
33 | 2.328577 1.125284 2.07 0.039 .1230608 4.534093
34 | 2.538724 1.317878 1.93 0.054 -.0442682 5.121717
35 | 2.783539 1.548941 1.80 0.072 -.2523303 5.819408
36 | 3.069275 1.827167 1.68 0.093 -.5119064 6.650457
37 | 3.403542 2.163423 1.57 0.116 -.8366882 7.643773
38 | 3.795625 2.571345 1.48 0.140 -1.244119 8.835368
39 | 4.256887 3.068105 1.39 0.165 -1.756489 10.27026
40 | 4.801288 3.675405 1.31 0.191 -2.402373 12.00495
41 | 5.446031 4.420767 1.23 0.218 -3.218514 14.11058
42 | 6.212399 5.339235 1.16 0.245 -4.25231 16.67711
43 | 7.126811 6.475586 1.10 0.271 -5.565105 19.81873
44 | 8.222199 7.887259 1.04 0.297 -7.236545 23.68094
45 | 9.53976 9.648215 0.99 0.323 -9.370393 28.44991
46 | 11.13124 11.85406 0.94 0.348 -12.10229 34.36478
47 | 13.06191 14.62887 0.89 0.372 -15.61014 41.73396
48 | 15.4144 18.13428 0.85 0.395 -20.12814 50.95694
49 | 18.29377 22.58176 0.81 0.418 -25.96567 62.55321
50 | 21.83416 28.24902 0.77 0.440 -33.53289 77.20122
51 | 26.20756 35.50228 0.74 0.460 -43.37562 95.79075
52 | 31.63542 44.82649 0.71 0.480 -56.22288 119.4937
53 | 38.40407 56.86643 0.68 0.499 -73.05209 149.8602
54 | 46.8854 72.48303 0.65 0.518 -95.17872 188.9495
55 | 57.56452 92.83061 0.62 0.535 -124.3801 239.5092
56 | 71.07697 119.4635 0.59 0.552 -163.0672 305.2211
57 | 88.25913 154.4835 0.57 0.568 -214.523 391.0413
58 | 110.2166 200.7452 0.55 0.583 -283.2368 503.6701
59 | 138.4177 262.1421 0.53 0.597 -375.3715 652.2068
------------------------------------------------------------------------------
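I subsequently rely on utest from SSC in Stata 14.2. This tests for the presence of a U-shaped relationship by estimating the turning point (extreme point) of the curvilinear relationship, as well as the sign and statistical significance of the slope at both ends of the data range.
The utest command returns the following: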
Code:
generate timgapsq = timgap*timgap
(87 missing values generated)
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap timgapsq i.pat_p i.pat_f, robust
utest timgap timgapsq, prefix(fw4) fieller
Specification: f(x)=x^2
Extreme point: 17.2259
Test:
H1: U shape
vs. H0: Monotone or Inverse U shape
-------------------------------------------------
| Lower bound Upper bound
-----------------+-------------------------------
Interval | 0 58
Slope | -.0974457 .2306562
t-value | -7.62492 3.125554
P>|t| | 1.27e-14 .0008886
-------------------------------------------------
Overall test of presence of a U shape:
t-value = 3.13
P>|t| = .000889
95% Fieller interval for extreme point: [13.191711; 27.997666]
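As far as I understand, the quantities reported by utest can be cross-checked by hand from the same model. A minimal sketch, assuming timgapsq has been generated as above and that the main equation is named fw4 (as implied by prefix(fw4)); the label turning_point is just a name I chose:

```stata
* Re-fit the model with the explicit squared term
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap timgapsq i.pat_p i.pat_f, robust

* Turning point of the quadratic in the linear predictor: -b1/(2*b2)
nlcom (turning_point: -_b[fw4:timgap] / (2*_b[fw4:timgapsq]))

* Slope of the linear predictor at the lower bound (timgap = 0): b1
lincom [fw4]timgap

* Slope of the linear predictor at the upper bound (timgap = 58): b1 + 2*58*b2
lincom [fw4]timgap + 116*[fw4]timgapsq
```

Note that nlcom reports a delta-method confidence interval, which will generally differ from the Fieller interval reported by utest. The lincom tests are two-sided, whereas utest reports one-sided p-values.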
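However, I am confused that the average marginal effects tell a different story: beyond a time gap of 13, they are not statistically significant (p > 0.05):
Code: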
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap c.timgap#c.timgap i.pat_p i.pat_f, robust
margins, dydx(timgap) at(timgap=(0(1)58))
Average marginal effects Number of obs = 21,117
Model VCE : Robust
Expression : Predicted number of events, predict()
dy/dx w.r.t. : timgap
1._at : timgap = 0
2._at : timgap = 1
3._at : timgap = 2
4._at : timgap = 3
5._at : timgap = 4
6._at : timgap = 5
7._at : timgap = 6
8._at : timgap = 7
9._at : timgap = 8
10._at : timgap = 9
11._at : timgap = 10
12._at : timgap = 11
13._at : timgap = 12
14._at : timgap = 13
15._at : timgap = 14
16._at : timgap = 15
17._at : timgap = 16
18._at : timgap = 17
19._at : timgap = 18
20._at : timgap = 19
21._at : timgap = 20
22._at : timgap = 21
23._at : timgap = 22
24._at : timgap = 23
25._at : timgap = 24
26._at : timgap = 25
27._at : timgap = 26
28._at : timgap = 27
29._at : timgap = 28
30._at : timgap = 29
31._at : timgap = 30
32._at : timgap = 31
33._at : timgap = 32
34._at : timgap = 33
35._at : timgap = 34
36._at : timgap = 35
37._at : timgap = 36
38._at : timgap = 37
39._at : timgap = 38
40._at : timgap = 39
41._at : timgap = 40
42._at : timgap = 41
43._at : timgap = 42
44._at : timgap = 43
45._at : timgap = 44
46._at : timgap = 45
47._at : timgap = 46
48._at : timgap = 47
49._at : timgap = 48
50._at : timgap = 49
51._at : timgap = 50
52._at : timgap = 51
53._at : timgap = 52
54._at : timgap = 53
55._at : timgap = 54
56._at : timgap = 55
57._at : timgap = 56
58._at : timgap = 57
59._at : timgap = 58
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
timgap |
_at |
1 | -.2832887 .041849 -6.77 0.000 -.3653113 -.2012661
2 | -.2427529 .031913 -7.61 0.000 -.3053013 -.1802045
3 | -.2084028 .0243079 -8.57 0.000 -.2560454 -.1607603
4 | -.1791522 .0185715 -9.65 0.000 -.2155518 -.1427527
5 | -.1541155 .014378 -10.72 0.000 -.1822958 -.1259352
6 | -.1325676 .0115064 -11.52 0.000 -.1551197 -.1100154
7 | -.1139121 .0097904 -11.64 0.000 -.1331008 -.0947233
8 | -.097656 .0090425 -10.80 0.000 -.115379 -.079933
9 | -.0833896 .0090162 -9.25 0.000 -.101061 -.0657182
10 | -.07077 .0094602 -7.48 0.000 -.0893116 -.0522283
11 | -.0595082 .0101912 -5.84 0.000 -.0794825 -.0395339
12 | -.0493586 .0111085 -4.44 0.000 -.071131 -.0275862
13 | -.0401102 .0121726 -3.30 0.001 -.063968 -.0162524
14 | -.0315793 .0133801 -2.36 0.018 -.0578037 -.0053548
15 | -.0236037 .0147491 -1.60 0.110 -.0525114 .005304
16 | -.0160375 .0163108 -0.98 0.325 -.0480062 .0159311
17 | -.0087468 .0181059 -0.48 0.629 -.0442336 .0267401
18 | -.0016052 .0201834 -0.08 0.937 -.0411639 .0379535
19 | .0055091 .0226016 0.24 0.807 -.0387893 .0498075
20 | .0127172 .0254295 0.50 0.617 -.0371237 .0625581
21 | .0201429 .0287488 0.70 0.484 -.0362037 .0764895
22 | .0279162 .0326574 0.85 0.393 -.0360912 .0919236
23 | .0361772 .0372734 0.97 0.332 -.0368773 .1092317
24 | .0450802 .0427395 1.05 0.292 -.0386876 .1288479
25 | .0547983 .0492295 1.11 0.266 -.0416899 .1512864
26 | .0655294 .0569563 1.15 0.250 -.0461029 .1771617
27 | .0775026 .0661807 1.17 0.242 -.0522093 .2072144
28 | .090986 .0772244 1.18 0.239 -.0603711 .2423431
29 | .1062968 .0904849 1.17 0.240 -.0710504 .2836439
30 | .1238125 .1064554 1.16 0.245 -.0848362 .3324613
31 | .1439865 .1257496 1.15 0.252 -.1024781 .3904512
32 | .1673656 .1491339 1.12 0.262 -.1249315 .4596627
33 | .1946132 .1775686 1.10 0.273 -.1534149 .5426413
34 | .2265379 .2122607 1.07 0.286 -.1894855 .6425612
35 | .2641297 .2547327 1.04 0.300 -.2351372 .7633966
36 | .3086058 .3069114 1.01 0.315 -.2929295 .9101412
37 | .3614689 .3712442 0.97 0.330 -.3661563 1.089094
38 | .4245811 .4508497 0.94 0.346 -.4590681 1.30823
39 | .5002591 .5497171 0.91 0.363 -.5771665 1.577685
40 | .5913963 .6729661 0.88 0.380 -.727593 1.910386
41 | .7016201 .8271925 0.85 0.396 -.9196473 2.322888
42 | .8354954 1.020924 0.82 0.413 -1.165478 2.836469
43 | .9987892 1.265225 0.79 0.430 -1.481006 3.478585
44 | 1.198815 1.574511 0.76 0.446 -1.887169 4.284799
45 | 1.444884 1.967625 0.73 0.463 -2.411591 5.301359
46 | 1.748898 2.469302 0.71 0.479 -3.090845 6.58864
47 | 2.126127 3.112124 0.68 0.494 -3.973524 8.225778
48 | 2.596246 3.939187 0.66 0.510 -5.124419 10.31691
49 | 3.184705 5.007713 0.64 0.525 -6.630232 12.99964
50 | 3.924555 6.393976 0.61 0.539 -8.607407 16.45652
51 | 4.858902 8.200058 0.59 0.553 -11.21292 20.93072
52 | 6.044189 10.56313 0.57 0.567 -14.65916 26.74754
53 | 7.55464 13.66824 0.55 0.580 -19.23462 34.3439
54 | 9.488271 17.76609 0.53 0.593 -25.33264 44.30918
55 | 11.97506 23.19769 0.52 0.606 -33.49158 57.4417
56 | 15.18811 30.42881 0.50 0.618 -44.45125 74.82747
57 | 19.35896 40.09829 0.48 0.629 -59.23224 97.95016
58 | 24.79866 53.08607 0.47 0.640 -79.24813 128.8454
59 | 31.92688 70.6093 0.45 0.651 -106.4648 170.3186
------------------------------------------------------------------------------
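I am not sure how to reconcile these results. On the one hand, the predictive margins and utest show evidence of a U-shaped relationship between time gap and citations received; on the other hand, the average marginal effects are not statistically significant beyond a value of 13.
I also ran an additional check using OLS regression with a log-transformed dependent variable, which produced results in line with a U-shaped relationship between ‘time gap’ and ‘forward cites’ (i.e. statistically significant linear and quadratic coefficients, predictive margins, and average marginal effects). I don’t report these results for the sake of brevity.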
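For reference, the OLS check was along the following lines. This is a sketch rather than the exact code: ln_fw4 is a placeholder variable name, and ln(1 + fw4) is an assumed transformation (fw4 contains zeros, so a plain log would drop those observations).

```stata
* Assumed transformation: log of one plus the citation count
generate ln_fw4 = ln(1 + fw4)

* OLS with the same covariates; c.timgap##c.timgap adds timgap and its square
regress ln_fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage c.timgap##c.timgap i.pat_p i.pat_f, vce(robust)

* Predictive margins and average marginal effects over the range of timgap
margins, at(timgap=(0(1)58))
margins, dydx(timgap) at(timgap=(0(1)58))
```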
So, in summary, my question is: how can I explain the discrepancy between the predictive margins (and utest) on the one hand and the average marginal effects on the other in my negative binomial regression?
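One check that might help locate the discrepancy, assuming it comes from the scale on which the effects are computed: utest works with the slope of the linear predictor, whereas margins by default reports effects on the predicted count. Requesting marginal effects on the linear-predictor scale should put both on a common footing. The at() values below are only illustrative points I picked from the range of timgap:

```stata
* Same model as above, using factor-variable notation for the squared term
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap c.timgap#c.timgap i.pat_p i.pat_f, robust

* Marginal effect of timgap on the linear predictor (log expected count)
margins, dydx(timgap) at(timgap=(0 13 17 20 30 58)) predict(xb)
```

If my reading is right, the values at timgap = 0 and timgap = 58 should match the slopes that utest reports at the lower and upper bounds.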
I would hugely appreciate any help you could give me with this problem.
