Hello,
This is my first time posting here, so apologies for any potential mistakes.
Short background: I examine how characteristics of citations made by a patent (i.e. backward citations) influence the number of citations that this patent subsequently receives (i.e. forward citations). This approach is similar to examining how the references cited by a journal article influence the number of times this article is subsequently cited.
Analyses: I am testing for a potential U-shaped effect of an independent variable (the ‘time gap’ of backward citations, named timgap) on a dependent variable (the number of ‘forward citations’) using a negative binomial regression model. The dependent variable is a count variable (named fw4), with the following distribution:
Code:
sum fw4, detail

                     Citations received
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs              21,117
25%            0              0       Sum of Wgt.      21,117

50%            1                      Mean            2.48165
                        Largest       Std. Dev.       4.22759
75%            3             52
90%            7             70       Variance       17.87252
95%           10             81       Skewness       4.294934
99%           20             85       Kurtosis       37.63707
The independent variable looks like this:
Code:
sum timgap, detail

                             Gap
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              0
 5%            1              0
10%            1              0       Obs              21,117
25%            1              0       Sum of Wgt.      21,117

50%            1                      Mean           1.724582
                        Largest       Std. Dev.      1.765048
75%            2             28
90%            3             38       Variance       3.115394
95%            4           45.5       Skewness       7.354439
99%          9.5             58       Kurtosis       113.7121
A sample of the data is as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(fw4 count) float(reusebeginofyear medianage timgap) byte teamsize float(patentgrant uniqueoffice shareint recbreadth pat_finalid) int pat_priy long pat_doc
1  8     7.875    5  1.5 1 1 1         0   .768595  28 1967 24645379
0  8     7.375    3    1 3 1 3         0  .9469435   4 2007 39709822
0 14 12.285714    4    1 2 1 3  .4285714         0  18 2003 34701225
1 10       9.9  3.5    1 4 1 5        .2 .43055555   1 2006 38535557
0 10       6.5    9    4 2 0 3        .2  .9921035  39 2007 39710045
3  8     1.375    2    1 3 1 2         0   .607438  17 2004 35060906
0 10       2.8    5    1 2 1 1        .2  .2520661   5 2004 35183594
3  9 20.333334   12    1 5 0 1 .11111111  .9238535 121 2004 36615448
1  8       .25  8.5  8.5 3 0 1         0   .838843 112 2000 18843534
3 13  4.923077    2    2 4 1 5  .3846154 .52076125   2 2002 32510676
3  8      .375 12.5   11 2 1 1         0  .9746667  23 1999 17011054
1  8      7.25    5  1.5 4 0 4         0  .9153979   1 2007 40381446
2  8       3.5    2    1 3 1 2         0  .9693205 116 2003 34102979
0  8     11.25  8.5    1 2 0 1         0   .934375  10 1995  7771531
1  8      .625    5  1.5 1 1 1         0      .505 118 2000 18685685
3  9  40.22222    9    1 3 1 3 .11111111  .9566575  17 2003 34394218
0  8    19.625    4    1 1 1 4         0   .607438   1 2002 32473712
3  9 12.444445   13    1 5 1 4 .11111111  .9823909  21 1997 17434323
1  8      4.75    7  1.5 1 0 1         0       .46   1 2004 35183815
6  8     6.625    6    1 3 1 2         0         0  10 2000  7628904
2  8        14 14.5    3 4 1 5         0  .9738292  81 2003 33427484
end
Note that:
pat_finalid denotes the firm that applied for the patent
pat_doc denotes the unique patent identification number
pat_priy denotes the year of application of the patent
I omit the coefficients of the pat_finalid and pat_priy dummies from the regression output to save space.
When I run the negative binomial regression I obtain the following:
Code:
nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap c.timgap#c.timgap i.pat_p i.pat_f, robust

Fitting Poisson model:

Iteration 0:   log pseudolikelihood = -47246.278
Iteration 1:   log pseudolikelihood = -46222.733
Iteration 2:   log pseudolikelihood = -46205.947
Iteration 3:   log pseudolikelihood = -46205.874
Iteration 4:   log pseudolikelihood = -46205.874

Fitting constant-only model:

Iteration 0:   log pseudolikelihood = -44087.015
Iteration 1:   log pseudolikelihood = -43010.087
Iteration 2:   log pseudolikelihood = -43002.914
Iteration 3:   log pseudolikelihood = -43002.914

Fitting full model:

Iteration 0:   log pseudolikelihood = -39978.186
Iteration 1:   log pseudolikelihood = -38095.319
Iteration 2:   log pseudolikelihood = -37879.505
Iteration 3:   log pseudolikelihood = -37875.829
Iteration 4:   log pseudolikelihood = -37875.829

Negative binomial regression                    Number of obs     =     21,117
                                                Wald chi2(196)    =   11631.79
Dispersion           = mean                     Prob > chi2       =     0.0000
Log pseudolikelihood = -37875.829               Pseudo R2         =     0.1192

-----------------------------------------------------------------------------------
                  |               Robust
              fw4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
            count |   .0156753   .0010517    14.90   0.000      .013614    .0177367
         shareint |  -.3507485   .0451294    -7.77   0.000    -.4392006   -.2622965
       recbreadth |   .1487739   .0277731     5.36   0.000     .0943396    .2032082
         teamsize |   .0261769   .0046877     5.58   0.000     .0169891    .0353647
     uniqueoffice |    .109674    .004895    22.41   0.000       .10008     .119268
      patentgrant |   .2145301   .0243185     8.82   0.000     .1668666    .2621935
 reusebeginofyear |   .0082285   .0015958     5.16   0.000     .0051009    .0113562
        medianage |  -.0090597   .0027831    -3.26   0.001    -.0145146   -.0036049
           timgap |  -.0974457   .0127799    -7.62   0.000    -.1224938   -.0723975
                  |
c.timgap#c.timgap |   .0028285    .000721     3.92   0.000     .0014152    .0042417
                  |
            _cons |   .8661081   .6416052     1.35   0.177    -.3914151    2.123631
------------------+----------------------------------------------------------------
         /lnalpha |  -.2696179   .0207389                     -.3102654   -.2289704
------------------+----------------------------------------------------------------
            alpha |   .7636713   .0158377                      .7332523    .7953521
-----------------------------------------------------------------------------------
Subsequently, the predicted number of citations across values of ‘time gap’ (the predictive margins) looks like this:
Code:
margins, at(timgap=(0(1)58))

Predictive margins                              Number of obs     =     21,117
Model VCE    : Robust

Expression   : Predicted number of events, predict()

1._at        : timgap          =           0
2._at        : timgap          =           1
3._at        : timgap          =           2
4._at        : timgap          =           3
5._at        : timgap          =           4
6._at        : timgap          =           5
7._at        : timgap          =           6
8._at        : timgap          =           7
9._at        : timgap          =           8
10._at       : timgap          =           9
11._at       : timgap          =          10
12._at       : timgap          =          11
13._at       : timgap          =          12
14._at       : timgap          =          13
15._at       : timgap          =          14
16._at       : timgap          =          15
17._at       : timgap          =          16
18._at       : timgap          =          17
19._at       : timgap          =          18
20._at       : timgap          =          19
21._at       : timgap          =          20
22._at       : timgap          =          21
23._at       : timgap          =          22
24._at       : timgap          =          23
25._at       : timgap          =          24
26._at       : timgap          =          25
27._at       : timgap          =          26
28._at       : timgap          =          27
29._at       : timgap          =          28
30._at       : timgap          =          29
31._at       : timgap          =          30
32._at       : timgap          =          31
33._at       : timgap          =          32
34._at       : timgap          =          33
35._at       : timgap          =          34
36._at       : timgap          =          35
37._at       : timgap          =          36
38._at       : timgap          =          37
39._at       : timgap          =          38
40._at       : timgap          =          39
41._at       : timgap          =          40
42._at       : timgap          =          41
43._at       : timgap          =          42
44._at       : timgap          =          43
45._at       : timgap          =          44
46._at       : timgap          =          45
47._at       : timgap          =          46
48._at       : timgap          =          47
49._at       : timgap          =          48
50._at       : timgap          =          49
51._at       : timgap          =          50
52._at       : timgap          =          51
53._at       : timgap          =          52
54._at       : timgap          =          53
55._at       : timgap          =          54
56._at       : timgap          =          55
57._at       : timgap          =          56
58._at       : timgap          =          57
59._at       : timgap          =          58

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
           1 |   2.907145   .0561051    51.82   0.000     2.797181    3.017109
           2 |   2.644691   .0295555    89.48   0.000     2.586764    2.702619
           3 |    2.41958   .0303876    79.62   0.000     2.360022    2.479139
           4 |   2.226188   .0445835    49.93   0.000     2.138806     2.31357
           5 |   2.059873   .0581709    35.41   0.000      1.94586    2.173886
           6 |   1.916795   .0691658    27.71   0.000     1.781233    2.052358
           7 |   1.793775   .0778237    23.05   0.000     1.641243    1.946306
           8 |   1.688172   .0847155    19.93   0.000     1.522133    1.854212
           9 |     1.5978   .0904177    17.67   0.000     1.420585    1.775016
          10 |   1.520845   .0954568    15.93   0.000     1.333753    1.707937
          11 |   1.455808   .1003065    14.51   0.000     1.259211    1.652405
          12 |   1.401458   .1053954    13.30   0.000     1.194887    1.608029
          13 |   1.356791   .1111151    12.21   0.000     1.139009    1.574573
          14 |   1.320999   .1178276    11.21   0.000     1.090061    1.551937
          15 |   1.293447   .1258707    10.28   0.000     1.046745     1.54015
          16 |   1.273655   .1355661     9.40   0.000     1.007951     1.53936
          17 |   1.261281   .1472277     8.57   0.000     .9727198    1.549842
          18 |   1.256112   .1611748     7.79   0.000     .9402152    1.572009
          19 |   1.258061   .1777473     7.08   0.000     .9096829    1.606439
          20 |   1.267161    .197323     6.42   0.000     .8804155    1.653907
          21 |   1.283568   .2203371     5.83   0.000     .8517152    1.715421
          22 |   1.307563   .2473036     5.29   0.000     .8228568    1.792269
          23 |   1.339563   .2788378     4.80   0.000     .7930509    1.886075
          24 |   1.380131   .3156826     4.37   0.000     .7614046    1.998858
          25 |   1.429995    .358739     3.99   0.000     .7268791     2.13311
          26 |   1.490065   .4091018     3.64   0.000     .6882402     2.29189
          27 |   1.561467   .4681041     3.34   0.001     .6439999    2.478934
          28 |   1.645573   .5373717     3.06   0.002      .592344    2.698802
          29 |   1.744047    .618891     2.82   0.005     .5310434    2.957052
          30 |   1.858901   .7150945     2.60   0.009     .4573412     3.26046
          31 |   1.992558    .828968     2.40   0.016     .3678102    3.617305
          32 |   2.147941   .9641859     2.23   0.026     .2581713    4.037711
          33 |   2.328577   1.125284     2.07   0.039     .1230608    4.534093
          34 |   2.538724   1.317878     1.93   0.054    -.0442682    5.121717
          35 |   2.783539   1.548941     1.80   0.072    -.2523303    5.819408
          36 |   3.069275   1.827167     1.68   0.093    -.5119064    6.650457
          37 |   3.403542   2.163423     1.57   0.116    -.8366882    7.643773
          38 |   3.795625   2.571345     1.48   0.140    -1.244119    8.835368
          39 |   4.256887   3.068105     1.39   0.165    -1.756489    10.27026
          40 |   4.801288   3.675405     1.31   0.191    -2.402373    12.00495
          41 |   5.446031   4.420767     1.23   0.218    -3.218514    14.11058
          42 |   6.212399   5.339235     1.16   0.245     -4.25231    16.67711
          43 |   7.126811   6.475586     1.10   0.271    -5.565105    19.81873
          44 |   8.222199   7.887259     1.04   0.297    -7.236545    23.68094
          45 |    9.53976   9.648215     0.99   0.323    -9.370393    28.44991
          46 |   11.13124   11.85406     0.94   0.348    -12.10229    34.36478
          47 |   13.06191   14.62887     0.89   0.372    -15.61014    41.73396
          48 |    15.4144   18.13428     0.85   0.395    -20.12814    50.95694
          49 |   18.29377   22.58176     0.81   0.418    -25.96567    62.55321
          50 |   21.83416   28.24902     0.77   0.440    -33.53289    77.20122
          51 |   26.20756   35.50228     0.74   0.460    -43.37562    95.79075
          52 |   31.63542   44.82649     0.71   0.480    -56.22288    119.4937
          53 |   38.40407   56.86643     0.68   0.499    -73.05209    149.8602
          54 |    46.8854   72.48303     0.65   0.518    -95.17872    188.9495
          55 |   57.56452   92.83061     0.62   0.535    -124.3801    239.5092
          56 |   71.07697   119.4635     0.59   0.552    -163.0672    305.2211
          57 |   88.25913   154.4835     0.57   0.568     -214.523    391.0413
          58 |   110.2166   200.7452     0.55   0.583    -283.2368    503.6701
          59 |   138.4177   262.1421     0.53   0.597    -375.3715    652.2068
------------------------------------------------------------------------------
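I then rely on utest (from SSC) in Stata 14.2. It tests for the presence of a U-shaped relationship over the data range by estimating the sign and statistical significance of the slope at the lower and upper bounds, and it reports the extreme point (turning point) together with a Fieller confidence interval.
The utest command returns the following: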
Code:
generate timgapsq = timgap*timgap
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap timgapsq i.pat_p i.pat_f, robust
. utest timgap timgapsq, prefix(fw4) fieller
(87 missing values generated)

Specification: f(x)=x^2
Extreme point:  17.2259

Test:
     H1: U shape
 vs. H0: Monotone or Inverse U shape

-------------------------------------------------
                 |   Lower bound      Upper bound
-----------------+-------------------------------
Interval         |             0               58
Slope            |     -.0974457         .2306562
t-value          |      -7.62492         3.125554
P>|t|            |      1.27e-14         .0008886
-------------------------------------------------

Overall test of presence of a U shape:
     t-value =      3.13
     P>|t|   =   .000889

95% Fieller interval for extreme point: [13.191711; 27.997666]
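As a sanity check on the utest output, the extreme point and the slopes at the bounds can be reproduced by hand from the nbreg coefficients reported earlier. This is only a sketch: it assumes utest works on the linear index (my reading of its "Specification: f(x)=x^2" line), and the symbols are my own shorthand, with $\beta_1$ the timgap coefficient, $\beta_2$ the coefficient on the squared term, and $x$ the value of timgap.

```latex
\begin{align*}
x^{*} &= -\frac{\beta_1}{2\beta_2}
       = \frac{0.0974457}{2 \times 0.0028285} \approx 17.23, \\
\left.\frac{\partial\,(\beta_1 x + \beta_2 x^{2})}{\partial x}\right|_{x=0}
      &= \beta_1 = -0.0974457, \\
\left.\frac{\partial\,(\beta_1 x + \beta_2 x^{2})}{\partial x}\right|_{x=58}
      &= \beta_1 + 2\beta_2 \cdot 58 \approx 0.2307.
\end{align*}
```

These agree with the extreme point (17.2259) and the slopes (-.0974457 and .2306562) that utest reports, up to rounding of the displayed coefficients.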
However, I am a bit confused as to why the average marginal effects tell a different story: they are not statistically significant (p>0.05) for values of ‘time gap’ beyond 13:
Code:
quietly nbreg fw4 count shareint recb teamsize uniqueo patentgrant reuseb medianage timgap c.timgap#c.timgap i.pat_p i.pat_f, robust
margins, dydx(timgap) at(timgap=(0(1)58))

Average marginal effects                        Number of obs     =     21,117
Model VCE    : Robust

Expression   : Predicted number of events, predict()
dy/dx w.r.t. : timgap

1._at        : timgap          =           0
2._at        : timgap          =           1
3._at        : timgap          =           2
4._at        : timgap          =           3
5._at        : timgap          =           4
6._at        : timgap          =           5
7._at        : timgap          =           6
8._at        : timgap          =           7
9._at        : timgap          =           8
10._at       : timgap          =           9
11._at       : timgap          =          10
12._at       : timgap          =          11
13._at       : timgap          =          12
14._at       : timgap          =          13
15._at       : timgap          =          14
16._at       : timgap          =          15
17._at       : timgap          =          16
18._at       : timgap          =          17
19._at       : timgap          =          18
20._at       : timgap          =          19
21._at       : timgap          =          20
22._at       : timgap          =          21
23._at       : timgap          =          22
24._at       : timgap          =          23
25._at       : timgap          =          24
26._at       : timgap          =          25
27._at       : timgap          =          26
28._at       : timgap          =          27
29._at       : timgap          =          28
30._at       : timgap          =          29
31._at       : timgap          =          30
32._at       : timgap          =          31
33._at       : timgap          =          32
34._at       : timgap          =          33
35._at       : timgap          =          34
36._at       : timgap          =          35
37._at       : timgap          =          36
38._at       : timgap          =          37
39._at       : timgap          =          38
40._at       : timgap          =          39
41._at       : timgap          =          40
42._at       : timgap          =          41
43._at       : timgap          =          42
44._at       : timgap          =          43
45._at       : timgap          =          44
46._at       : timgap          =          45
47._at       : timgap          =          46
48._at       : timgap          =          47
49._at       : timgap          =          48
50._at       : timgap          =          49
51._at       : timgap          =          50
52._at       : timgap          =          51
53._at       : timgap          =          52
54._at       : timgap          =          53
55._at       : timgap          =          54
56._at       : timgap          =          55
57._at       : timgap          =          56
58._at       : timgap          =          57
59._at       : timgap          =          58

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
timgap       |
         _at |
           1 |  -.2832887    .041849    -6.77   0.000    -.3653113   -.2012661
           2 |  -.2427529    .031913    -7.61   0.000    -.3053013   -.1802045
           3 |  -.2084028   .0243079    -8.57   0.000    -.2560454   -.1607603
           4 |  -.1791522   .0185715    -9.65   0.000    -.2155518   -.1427527
           5 |  -.1541155    .014378   -10.72   0.000    -.1822958   -.1259352
           6 |  -.1325676   .0115064   -11.52   0.000    -.1551197   -.1100154
           7 |  -.1139121   .0097904   -11.64   0.000    -.1331008   -.0947233
           8 |   -.097656   .0090425   -10.80   0.000     -.115379    -.079933
           9 |  -.0833896   .0090162    -9.25   0.000     -.101061   -.0657182
          10 |    -.07077   .0094602    -7.48   0.000    -.0893116   -.0522283
          11 |  -.0595082   .0101912    -5.84   0.000    -.0794825   -.0395339
          12 |  -.0493586   .0111085    -4.44   0.000     -.071131   -.0275862
          13 |  -.0401102   .0121726    -3.30   0.001     -.063968   -.0162524
          14 |  -.0315793   .0133801    -2.36   0.018    -.0578037   -.0053548
          15 |  -.0236037   .0147491    -1.60   0.110    -.0525114     .005304
          16 |  -.0160375   .0163108    -0.98   0.325    -.0480062    .0159311
          17 |  -.0087468   .0181059    -0.48   0.629    -.0442336    .0267401
          18 |  -.0016052   .0201834    -0.08   0.937    -.0411639    .0379535
          19 |   .0055091   .0226016     0.24   0.807    -.0387893    .0498075
          20 |   .0127172   .0254295     0.50   0.617    -.0371237    .0625581
          21 |   .0201429   .0287488     0.70   0.484    -.0362037    .0764895
          22 |   .0279162   .0326574     0.85   0.393    -.0360912    .0919236
          23 |   .0361772   .0372734     0.97   0.332    -.0368773    .1092317
          24 |   .0450802   .0427395     1.05   0.292    -.0386876    .1288479
          25 |   .0547983   .0492295     1.11   0.266    -.0416899    .1512864
          26 |   .0655294   .0569563     1.15   0.250    -.0461029    .1771617
          27 |   .0775026   .0661807     1.17   0.242    -.0522093    .2072144
          28 |    .090986   .0772244     1.18   0.239    -.0603711    .2423431
          29 |   .1062968   .0904849     1.17   0.240    -.0710504    .2836439
          30 |   .1238125   .1064554     1.16   0.245    -.0848362    .3324613
          31 |   .1439865   .1257496     1.15   0.252    -.1024781    .3904512
          32 |   .1673656   .1491339     1.12   0.262    -.1249315    .4596627
          33 |   .1946132   .1775686     1.10   0.273    -.1534149    .5426413
          34 |   .2265379   .2122607     1.07   0.286    -.1894855    .6425612
          35 |   .2641297   .2547327     1.04   0.300    -.2351372    .7633966
          36 |   .3086058   .3069114     1.01   0.315    -.2929295    .9101412
          37 |   .3614689   .3712442     0.97   0.330    -.3661563    1.089094
          38 |   .4245811   .4508497     0.94   0.346    -.4590681     1.30823
          39 |   .5002591   .5497171     0.91   0.363    -.5771665    1.577685
          40 |   .5913963   .6729661     0.88   0.380     -.727593    1.910386
          41 |   .7016201   .8271925     0.85   0.396    -.9196473    2.322888
          42 |   .8354954   1.020924     0.82   0.413    -1.165478    2.836469
          43 |   .9987892   1.265225     0.79   0.430    -1.481006    3.478585
          44 |   1.198815   1.574511     0.76   0.446    -1.887169    4.284799
          45 |   1.444884   1.967625     0.73   0.463    -2.411591    5.301359
          46 |   1.748898   2.469302     0.71   0.479    -3.090845     6.58864
          47 |   2.126127   3.112124     0.68   0.494    -3.973524    8.225778
          48 |   2.596246   3.939187     0.66   0.510    -5.124419    10.31691
          49 |   3.184705   5.007713     0.64   0.525    -6.630232    12.99964
          50 |   3.924555   6.393976     0.61   0.539    -8.607407    16.45652
          51 |   4.858902   8.200058     0.59   0.553    -11.21292    20.93072
          52 |   6.044189   10.56313     0.57   0.567    -14.65916    26.74754
          53 |    7.55464   13.66824     0.55   0.580    -19.23462     34.3439
          54 |   9.488271   17.76609     0.53   0.593    -25.33264    44.30918
          55 |   11.97506   23.19769     0.52   0.606    -33.49158     57.4417
          56 |   15.18811   30.42881     0.50   0.618    -44.45125    74.82747
          57 |   19.35896   40.09829     0.48   0.629    -59.23224    97.95016
          58 |   24.79866   53.08607     0.47   0.640    -79.24813    128.8454
          59 |   31.92688    70.6093     0.45   0.651    -106.4648    170.3186
------------------------------------------------------------------------------
I am not sure how to reconcile these results. On the one hand, the predictive margins (and utest) show evidence of a U-shaped relationship between gap and citations received; on the other hand, the average marginal effects are not statistically significant beyond a value of 13.
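For reference, and as far as I understand the two commands, they may not be reporting the same quantity. The following is only a sketch under the usual log link of the negative binomial model; $\mathbf{z}_i'\boldsymbol{\gamma}$ is my own shorthand for all other covariates and dummies, and $\beta_1$, $\beta_2$ are the coefficients on timgap and its square.

```latex
\begin{align*}
E[\mathit{fw4}_i \mid x, \mathbf{z}_i]
  &= \exp\!\left(\beta_1 x + \beta_2 x^{2} + \mathbf{z}_i'\boldsymbol{\gamma}\right), \\
\frac{\partial E[\mathit{fw4}_i \mid x, \mathbf{z}_i]}{\partial x}
  &= \left(\beta_1 + 2\beta_2 x\right)
     \exp\!\left(\beta_1 x + \beta_2 x^{2} + \mathbf{z}_i'\boldsymbol{\gamma}\right).
\end{align*}
```

If this is right, utest evaluates the slope of the index, $\beta_1 + 2\beta_2 x$, at the interval bounds (0 and 58), whereas margins, dydx(timgap) averages the second expression over the sample at each value of timgap, so its delta-method standard errors also reflect the exponential term.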
I also ran an additional check using OLS regression with a log-transformed dependent variable, which produced results in line with a U-shaped relationship between ‘time gap’ and ‘forward cites’ (i.e. statistically significant linear and quadratic coefficients, predictive margins, and average marginal effects). I don’t report these results for the sake of brevity.
So, in summary, my question is: How can I explain the discrepancy between the predictive margins (and utest) and the average marginal effects in my negative binomial regression?
I would hugely appreciate any help you could give me with this problem.