I have a network dataset on a set of individuals who can unilaterally send messages, either individually or as groups, to other individuals or groups. Each observation corresponds to a message sent from one individual to another. The content of each message is labelled with one of five categories. I also have an index of the "saliency" of each message. Unfortunately, the groups are endogenous. I know a number of demographic characteristics for each individual, as well as how the network evolves over time.
My goal would be to estimate the probability for an individual to respond to a message as a function of past actions and the evolving structure of the network.
I understand there are several problems with a dataset of this sort: in particular, the network is endogenous, and there are potentially many processes running in parallel with the observed evolution of the communication structure. Even with exogenous regressors, standard errors would be inconsistent (but this could be fixed using Fafchamps and Gubert 2007, if I remember correctly).
In an ideal world, I would run a logit regression of a binary variable (call it "response", taking the value 1 if a message goes from an individual to another individual who contacted them before) on a set of dyadic-level measures (such as whether the two individuals live in the same area), a battery of lags capturing whether (and when) the receiver sent a message earlier, a dynamically adjusting measure of network centrality, and a time polynomial to fit possible time trends.
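For concreteness, the outcome and lag variables described above could be built along the following lines. This is only a minimal sketch in Python under stated assumptions, not the construction actually used for the estimates: the function name `build_dyad_features`, the `(day, sender, receiver)` input layout, the `receiver_sent_{w}d` column names, and the reading of the lags as "the receiver sent any message in the previous w days" are all hypothetical.

```python
from collections import defaultdict


def build_dyad_features(messages, windows=(1, 5, 30)):
    """Build a response flag and lag indicators for each message.

    messages: list of (day, sender, receiver) tuples, sorted by day.
    The layout and all variable names are illustrative assumptions.
    """
    pair_history = defaultdict(list)  # (a, b) -> days on which a wrote to b
    sent_history = defaultdict(list)  # a -> days on which a wrote to anyone
    rows = []
    for day, sender, receiver in messages:
        # response = 1 if the receiver contacted the sender on an earlier day
        earlier = [d for d in pair_history[(receiver, sender)] if d < day]
        row = {
            "day": day,
            "sender": sender,
            "receiver": receiver,
            "response": int(bool(earlier)),
        }
        # Lag indicators: did the receiver send anything in the last w days?
        for w in windows:
            row[f"receiver_sent_{w}d"] = int(
                any(0 < day - d <= w for d in sent_history[receiver])
            )
        rows.append(row)
        # Update histories only after the row is built, so a message
        # never counts as its own past.
        pair_history[(sender, receiver)].append(day)
        sent_history[sender].append(day)
    return rows


# Tiny made-up example: a writes to b, b writes back, a writes again later.
example = [(1, "a", "b"), (3, "b", "a"), (40, "a", "b")]
features = build_dyad_features(example)
```

Same-day messages are deliberately excluded from the "earlier" history here; whether that is appropriate depends on the time resolution of the data.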
The model's (pseudo) R^2 jumps from about 0.16 to 0.50 once I introduce the time polynomials. The significance and magnitudes of the pre-existing regressors are stable after the polynomials are introduced. I am wondering: is this because the more time passes, the more likely a message is to be received?
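As a reminder of what this statistic measures: the pseudo R^2 that Stata reports after logit is McFadden's, i.e. one minus the ratio of the fitted log likelihood to the log likelihood of a constant-only model. The sketch below (Python, with hypothetical helper names) backs out the null log likelihood implied by the figures in the output attached to this post. The last step assumes, purely for illustration, that the model without the polynomials was fit on the same sample, which may not hold here.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo R^2: 1 - ll_model / ll_null."""
    return 1.0 - ll_model / ll_null


def implied_null_loglik(ll_model, pseudo_r2):
    """Invert the definition to recover the constant-only log likelihood."""
    return ll_model / (1.0 - pseudo_r2)


# Figures taken from the regression output in this post (rounded as reported).
ll_model = -1213.6146
pseudo_r2 = 0.5037

ll_null = implied_null_loglik(ll_model, pseudo_r2)

# Rough log likelihood of the model without the time polynomials, ASSUMING
# the same estimation sample and a pseudo R^2 of about 0.16.
ll_without_trend = ll_null * (1.0 - 0.16)

print(f"implied null log likelihood:  {ll_null:.1f}")
print(f"implied log lik. at R2=0.16:  {ll_without_trend:.1f}")
```

Because the statistic is a ratio of log likelihoods rather than a share of explained variance, a large jump only says that the trend terms improve the fit to the observed 0/1 pattern considerably; on its own it does not identify why.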
More generally, is there any hope of extracting anything informative from a regression of this sort?
Thanks for your time!
Logistic regression                                     Number of obs = 152,160
                                                        Wald chi2(16) =       .
                                                        Prob > chi2   =       .
Log pseudolikelihood = -1213.6146                       Pseudo R2     =  0.5037

---------------------+----------------------------------------------------------------
                     |                 Robust
        response_bin | Coefficient  std. err.       z   P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
            age_diff |   -.0420463   .0069516   -6.05   0.000    -.0556712   -.0284215
         neigh_match |   -.0746072   .1194708   -0.62   0.532    -.3087657    .1595513
          ethn_match |    1.008889   .1245479    8.10   0.000     .7647799    1.252999
           sex_match |   -.1762324   .1198478   -1.47   0.141    -.4111298    .0586649
 eigen_cent_diff_std |   -6.781157   1.462708   -4.64   0.000    -9.648013   -3.914302
  eigen_cent_sum_std |     6.54639   1.155631    5.66   0.000     4.281394    8.811385
    message_sent_1De |   -2.260556   1.521529   -1.49   0.137    -5.242698    .7215858
    message_sent_5De |    1.283659   .6192781    2.07   0.038     .0698965    2.497422
   message_sent_30De |    .8581493   .2525828    3.40   0.001      .363096    1.353203
               trend |    .0029432   .0017652    1.67   0.095    -.0005164    .0064029
              trend2 |   -3.21e-06   1.78e-06   -1.81   0.071    -6.70e-06    2.76e-07
              trend3 |    1.54e-09   8.00e-10    1.93   0.054    -2.54e-11    3.11e-09
              trend4 |   -3.67e-13   1.76e-13   -2.08   0.038    -7.12e-13   -2.12e-14
              trend5 |    4.23e-17   1.87e-17    2.27   0.023     5.73e-18    7.89e-17
              trend6 |   -1.88e-21   7.58e-22   -2.48   0.013    -3.37e-21   -3.97e-22
               _cons |   -5.214207   .6676472   -7.81   0.000    -6.522771   -3.905642
---------------------+----------------------------------------------------------------
Note: 143336 failures and 0 successes completely determined.