2 waves of data in one file, how do i split them up

thomas patrick

Join Date: Nov 2016

Posts: 8
#1


10 Dec 2016, 10:17

I have 2 waves of data (year 1 and year 2), and I want to use the year 2 dependent variable (DV) with the year 1 independent variables (IVs).

How do I separate these out?

So far I have run the following:

encode started, generate (wave)
replace wave = 1 in 1/411
replace wave = 2 in 412/764
tab wave, gen(wave_)

So now I have 2 waves, but I don't know how to pick the wave 1 IVs and the wave 2 DV.

Thank you for your input

Kind Regards.
Thomas
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#2

10 Dec 2016, 11:17

It sounds like you have separate observations for wave 1 and wave 2, with a variable wave that indicates which is which. Now you want to run some kind of estimation command with wave 1's values as the IV and wave 2's values as the DV. For this to even be possible, there must be some other variable that tracks which wave 1 observation corresponds to the same entity as which wave 2 observation. I'll assume you have such a variable, and I'll call it id here. This is one of the uncommon situations in Stata where wide layout will be better than the current long layout. You don't say what the name of the variable you want to analyze is, so I'll just call it x here for illustrative purposes. And, again to illustrate, I'll assume your analysis ultimately will be ordinary linear regression.

Code:

reshape wide x, i(id) j(wave)
//...
regress x2 x1 // and perhaps other variables

Note: In order for the -reshape- to go through, the observations having the same values of id will have to agree on all variables other than x and wave. But if some of those variables are time-varying this will not be the case. In that case, you have to decide how to proceed. You are trying to regress the wave 2 value of x against the wave 1 value. What role do those other variables play in the analysis? If they are going to be included, which value is to be included, the wave 1 version, or the wave 2 version, or perhaps both? You will have to manage your data accordingly. The possibilities here are numerous and it isn't possible to go through them all here. But these are issues you need to think about in order to proceed.
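To make the time-varying case concrete, here is a minimal sketch on invented toy data. It assumes the hypothetical names id, wave, and x used above, plus a made-up time-varying covariate z that is not in the original question. Listing every time-varying variable in -reshape wide- gives each one a wave 1 and a wave 2 column, so you can then choose which version enters the model.

```stata
* Toy data: 6 entities observed in 2 waves (all values invented for illustration)
clear
input id wave x z
1 1 10 5
1 2 12 6
2 1 20 7
2 2 19 9
3 1 15 4
3 2 18 4
4 1  8 6
4 2 11 5
5 1 25 3
5 2 22 4
6 1 17 8
6 2 16 7
end

* z is a hypothetical time-varying covariate; naming it here creates z1 and z2
reshape wide x z, i(id) j(wave)

* Wave 2 outcome regressed on wave 1 predictors (one possible choice among several)
regress x2 x1 z1
```

Whether z1, z2, or both belong in the model is the substantive decision described in the note above; the sketch only shows the mechanics.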
thomas patrick

Join Date: Nov 2016

Posts: 8
#3

10 Dec 2016, 14:09

Thank you Clyde for your feedback. I don't think I explained the set-up well . . . so let me try this again

I have a survey of teamwork measures that respondents took twice, one year apart (not all of the respondents match: 411 for year 1 and 352 for year 2).

I am doing HLM regression since the individual respondents are nested in teams, but the basic model has Coordination as the dependent variable.

All four of these variables (Coordination, Leadership, Solidarity, and Participation) come from the same respondents at year 1 and at year 2. My initial runs used just the year 1 data, so a cross-sectional design. A colleague suggested a sensitivity analysis where I use Coordination from year 2 and Leadership, Solidarity, and Participation from year 1 to see if the results hold.

Coordination (DV) = Leadership(IV1) Solidarity(IV2) Participation(IV3)

My challenge is that all of the data (year 1 and year 2) are in one spreadsheet, so 411 + 352 = 763 individuals. Year 1 is at the top with all of the variables in rows, and year 2 is at the bottom, also with all of the variables in rows. Some of the respondents may not be the same, but there is probably some overlap.

What I want to ask Stata is to use Coordination from year 2, and Leadership, Solidarity, and Participation from year 1, as a sensitivity analysis.

Does this make more sense?
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#4

10 Dec 2016, 14:26

It would have been helpful if you had posted a sample of the Stata data using -dataex-. As it is, you are leaving much to my imagination here. But I'll take a stab at it.

I assume there is a variable, called id, that identifies respondents. In particular if the same person is responding in both years, the same id is used in both of those observations. Evidently there is some team identifier as well. I'll call that one team.

Code:

by id, sort: keep if _N == 2 // ELIMINATE PEOPLE NOT SURVEYED BOTH YEARS
by id (team), sort: assert team[1] == team[_N] // VERIFY SAME TEAM EACH YEAR
sort id year
collapse (last) Coordination (first) Leadership Solidarity Participation team, by(id)
xtset team
xtreg Coordination Leadership Solidarity Participation, fe // OR re IF YOU PREFER
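To see what the -keep- and -collapse- steps above do, here is a small sketch on invented toy data. It assumes the variable names id, team, and year from the code above (your actual names may differ, for example wave instead of year) and uses only one predictor to keep it short.

```stata
* Toy data (values invented): respondent 2 answered in year 1 only
clear
input id team year Coordination Leadership
1 1 1 3.0 4.0
1 1 2 3.5 4.2
2 1 1 2.5 3.0
3 2 1 4.0 3.5
3 2 2 4.5 3.8
end

* Keep only respondents observed in both years (drops id 2)
by id, sort: keep if _N == 2

* Within each id, year 1 comes first and year 2 comes last
sort id year

* One row per respondent: year 2 outcome, year 1 predictor
collapse (last) Coordination (first) Leadership team, by(id)
list
```

After the collapse, each remaining respondent should have a single row holding the year 2 Coordination and the year 1 Leadership, which is the layout the sensitivity analysis needs.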