
  • Issue with data collection time frame: need a suggestion on a proper test or justification.

    Dear community members,

    Please help me to address the following issue. I will keep my example simple:

    1. DV: rating score (0-100). The underlying data was collected in March 2014; the rating itself was published in May 2014.
    2. IV: number of Twitter followers (0 to infinity). Data collected at the end of July 2014.

    I am afraid that the reviewers may point out that the IV data was collected 3 months after the DV data was published, and thus the results are biased.

    Please suggest a proper way to justify this, or test somehow.

    My initial idea was to: (A) re-collect the IV data (which I did today for a random 20% of entities), and (B) run a paired t-test to see if there is any significant increase (or decrease) over time. The null was rejected; that is, the increase in the mean over time is significant. However, I am not sure this is the correct approach, because there are many possible factors that could drive this significant increase.
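The paired check described above can be sketched as follows. This is a Python illustration on simulated data (the thread itself uses Stata); the follower counts and growth rate below are hypothetical stand-ins for the re-collected 20% sample, not the actual study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data: follower counts for 20 re-sampled entities,
# measured at the original collection date and again today.
followers_then = rng.poisson(lam=500, size=20).astype(float)
followers_now = followers_then * rng.normal(loc=1.10, scale=0.05, size=20)  # ~10% growth

# Paired t-test on the before/after measurements.
t_stat, p_value = stats.ttest_rel(followers_now, followers_then)
# Note: a small p-value only says the mean follower count changed over
# the gap; it does not validate using the late measurement as the IV.
```

In Stata the equivalent would be a paired `ttest`. Either way, as the poster suspects, this test addresses whether the variable drifted, not whether the timing is defensible.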

    Thank you in advance,
    Anton

  • #2
    "I am afraid that the reviewers may point out that IV data was collected 3 months after DV data was published, and thus the results are biased."
    And I am afraid that the reviewers would be absolutely correct. There's no defensible way that I can see of using IV data collected after the DV data. The only "proper" analysis with the current data is to switch the roles of the current DV and IV and predict the later data from the earlier.

    But since you've already started sampling IV data prior to March 2014, I suggest that you continue and do a truly proper analysis of your study question.

    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      Thank you for the comment, Mr. Samuels.

      My data set consists of secondary data collected from six different sources. It is close to impossible to match all that data time-wise. So I'd still call it a "fundamental limitation" rather than a fatal error. Time to start looking for citations on similar situations, I guess.



      • #4
        In principle you could relate your response (your "DV") to the previous year's predictor (your "IV").
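One way to implement this lagging suggestion, sketched here in Python/pandas on a tiny hypothetical panel (entity names, years, and values are all illustrative):

```python
import pandas as pd

# Hypothetical panel: one row per entity per year.
df = pd.DataFrame({
    "entity": ["a", "a", "b", "b"],
    "year": [2013, 2014, 2013, 2014],
    "followers": [100, 150, 200, 260],
    "rating": [55, 60, 70, 72],
})

# Lag the predictor within each entity, so each year's rating is
# paired with the previous year's follower count.
df["followers_lag"] = df.groupby("entity")["followers"].shift(1)
paired = df.dropna(subset=["followers_lag"])
# 'paired' now matches 2014 ratings with 2013 follower counts,
# so the predictor unambiguously precedes the response.
```

The same reshaping is straightforward in Stata with `tsset` and the `L.` lag operator.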



        • #5
          Any chance you can get information on when these followers started following?

          Otherwise you might have to come up with an IV (instrumental variable) for your IV (independent variable).

          Best
          Daniel
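For readers unfamiliar with the instrumental-variable idea, here is a minimal simulated two-stage least squares sketch in Python. Everything below is hypothetical (the instrument, coefficients, and noise structure are invented for illustration, not taken from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated setup: z is an instrument observed before the outcome,
# u is an unobserved confounder, x is the endogenous regressor
# (measured too late), and y is the outcome. True effect of x on y: 2.0.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)
y = 2.0 * x + 1.5 * u + rng.normal(size=n)

def slope(a, b):
    """OLS slope from regressing b on a (with an intercept)."""
    A = np.column_stack([np.ones_like(a), a])
    return np.linalg.lstsq(A, b, rcond=None)[0][1]

naive_est = slope(x, y)  # biased upward: picks up the confounder u

# Two-stage least squares: regress x on z, then regress y on the
# fitted values x_hat, which carry only the exogenous variation in x.
A = np.column_stack([np.ones_like(z), z])
x_hat = A @ np.linalg.lstsq(A, x, rcond=None)[0]
iv_est = slope(x_hat, y)  # close to the true effect of 2.0
```

The hard part in practice is not the mechanics but finding a credible instrument: something that shifts the follower count without affecting the rating through any other channel.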



          • #6
            Thank you for the responses, Daniel and Nick.

            As a follow-up, I have contacted the rating organization (DV), and they told me that the data was actually released on July 14th, which, I assume, matches the time frame of my data collection for the IVs, which occurred on the 20th of July.



            • #7
              I don't think the data release date matters unless you're hypothesizing that something about the release of these ratings affected something else (in which case the ratings would be an explanatory variable not an outcome and your problem disappears). If I collect data in 2010 and wait 3 years to release it, it's still data that reflects the state of things in 2010. It doesn't magically become 2013 data.

              Basically it sounds like you're going to try to argue that a measurement taken after your outcome somehow explains your outcome. That's logically impossible.

              If your explanatory measures (taken after the outcome measurement) were the result of some slow-moving process, you might be able to argue that the explanatory variables measured 5 months after your outcome variable are a good proxy for what was actually happening at or before the time of the outcome measurement. In the context of anything related to the internet, however, you won't be able to convince a reviewer that the process is slow-moving enough that those 5 months don't matter.

              You really need to rethink this analysis.



              • #8
                Sarah, thank you for the comment.

                My DV is an authoritative rating score, which consists of multiple components. The data for some components is indeed lagged, but by no more than 3 months. The calculation of the rating score and the corresponding ranking occurred on the 14th of July. The IVs of the study are dynamic in nature, and I collected data on them as close as possible to the date the DV was published. The assumption (supported in the literature) is that the mean change in the IVs is insignificant over such a short period (5-7 days). Hence, I don't consider the DV as preceding the IVs in time. Or am I wrong? Thank you for your comments.
                Last edited by Anton Ivanov; 24 Oct 2014, 18:36.
