I am a Stata novice, so please bear with me. Any help would be greatly appreciated.
I have a collection of research articles, the number of citations each paper has received, when it was released, the number of authors who have worked on the paper, and where these authors are from.
I am regressing the number of citations received (CIT) on the number of authors (AUT) and on the quality of the universities and research facilities involved. To measure quality, I have selected the top 5 universities and the top 5 non-academic research affiliates in the subject, and I have a dummy variable for each indicating whether a top 5 university or a top 5 affiliate was involved (UNID and AFFD).
I then have to control for the skewness of this data, as papers tend to receive most of their citations within the first 4-5 years. To do this I take the natural log (Ln) of citations and authors, and use the i. prefix to control for year.
So the full command I use is: reg lnCIT lnAUT UNID AFFD i.year, robust
When I do this I get p-values of near zero and an R-squared of about 0.38, which seem like quite good results.
We decided to go deeper, so I changed the dummy variables to counts of how many top 5 universities and top 5 affiliates worked on each paper (UNI and AFF).
Using the same form of regression: reg lnCIT lnAUT UNI AFF i.year, robust
This gave me slightly higher p-values, still less than 0.05, and about the same R-squared.
I then figured it would be best to take the Ln of both these new variables, as they are no longer dummies.
reg lnCIT lnAUT lnUNI lnAFF i.year, robust
This then gave me an R-squared of almost 0.5, but all the p-values shot up to much more than 0.05.
Can anyone help me interpret this, or point out anything I've missed? From everything I've learnt in Econometrics this should not be the case, but it is fair to say this is not my strongest subject.
Many thanks in advance
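One thing that may be worth checking for this last regression is the sample size. The sketch below assumes lnUNI and lnAFF were created with ln() from the raw counts UNI and AFF (the variable names are the ones used above; how they were built is an assumption). Stata's ln() returns missing for a zero count, and regress drops any observation with a missing value, so this regression may be running on far fewer papers than the earlier ones.

```stata
* Sketch only: assumes CIT, AUT, UNI, AFF, year and the logged variables are already in memory
* How many papers have no top 5 university or no top 5 affiliate?
count if UNI == 0 | AFF == 0

* How many observations are missing on either logged count?
count if missing(lnUNI, lnAFF)

* Compare the estimation sample size across the two count-based models
quietly reg lnCIT lnAUT UNI AFF i.year, robust
display "N with raw counts:    " e(N)
quietly reg lnCIT lnAUT lnUNI lnAFF i.year, robust
display "N with logged counts: " e(N)
```

If the two N values differ a lot, the R-squared and p-values of the two models are not directly comparable, because they are estimated on different sets of papers.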