Finding change in a variable measured at three time points between two groups

Jake Pascoe

Join Date: Jan 2017

Posts: 6
#1

02 Jan 2017, 13:40

Hi all, I'm new here and will try to include as much detail as possible. If anything important is omitted, let me know and I shall edit. Thanks in advance for any help.

I have a longitudinal panel dataset obtained via surveys (questionnaires, interviews, etc.). The data were obtained from children at ages 3, 5, and 7 years. The same children were followed throughout, but some dropped out, so the number of observations decreases as age increases: for example, there were 15000 respondents at age 3, 14000 at age 5, and so on.
There are three dependent variables: the children's emotional dysregulation scores at ages 3, 5, and 7. Scores range from 1 to 3 and are not restricted to integers (a child could score 1.1, for example). The independent variable is language ability, coded as a binary variable (0 = child has language delay, 1 = child has capable language). The two groups are unequal in size (there are 14000 for language delayed, and 1000 for capable language). My code to reshape the data to long form and to create a time variable is as follows:

gen id_num = _n
order id_num, a(MCSID)
reshape long emotdysreg, i(MCSID) j(time)
gen age = 0 if time == 1
replace age = 2 if time == 2
replace age = 4 if time == 3
gen age2 = age*age
xtset id_num time

I would like to compare the change in emotional dysregulation over the three ages between the two groups of the IV, i.e., does language delay cause a change in emotional state over time, and is this change larger than the change observed in children with capable language?
Tags: panel data, regression, Suggestion, Time Series
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

02 Jan 2017, 14:19

So, although you do not say so explicitly, I infer from your code that the language ability variable is only measured once for each child. There is no hope of establishing causality here unless the language variable measurement precedes all of the emotional dysregulation measurements. Even if that much is true, this is clearly observational data, so causal conclusions will be tenuous and must rely on information external to just this study.

So let's forget about causality and just talk about associations. Perhaps we can rephrase your question as: does emotional dysregulation change over time and, if so, does it change more in children with language delay?

Now there are three other aspects of your data that confuse me.

1. In your original description you say that the data were obtained at ages 3, 5, and 7, yet your age variable takes on values of 0, 2, and 4. What am I missing here?

2. You also appear to be interested in a quadratic model, as you have generated an age-squared variable. But you have only three time points, and unless you expect that emotional dysregulation will exhibit a U-shaped or inverted-U-shaped relationship with the nadir (or peak) at age 5, it is very unlikely that even a huge data set such as yours will meaningfully distinguish a quadratic from a linear model. Nevertheless, in the solution I show below, I include the quadratic.

3. You have an id_num and another variable, MCSID. I don't understand the roles of these variables. On the one hand you -xtset- your data with id_num, but on the other hand you -reshape-d with MCSID as the i() variable. Assuming that the code ran without error messages, each of these variables uniquely identifies the children. So why do you need both? Is MCSID perhaps a string, and so not usable in -xtset-?

Anyway, here's how I would approach this:
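On the third point, if MCSID does turn out to be a string, a minimal sketch along the following lines would confirm that and build a numeric panel identifier from it. The name child_id is hypothetical, and the sketch assumes the data are already in long form with the variable time created by -reshape-.

```stata
* Show the storage type of MCSID (str# means it is a string)
describe MCSID

* Confirm that MCSID and time together uniquely identify observations
isid MCSID time

* Build a numeric identifier from MCSID (child_id is a hypothetical name)
egen long child_id = group(MCSID)

* Declare the panel structure using the numeric identifier
xtset child_id time
```

This ties the panel identifier directly to MCSID instead of to the row order the data happened to have before the -reshape-.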

Code:

xtreg emotdysreg i.language_delay##c.age##c.age, fe // OR re IF YOU PREFER (WHICH I OFTEN DO)
margins language_delay, at(age = (0 2 4))
marginsplot
margins language_delay, dydx(age) at(age = (0 2 4))

Notes:

1. With -fe-, your output will show the "main effect" of language_delay as omitted, because it is collinear with the fixed effect (assuming I was correct above when I posited that it is measured only once in each child). This is not a problem and you need not worry about it.

2. The output of the first -margins- command will show you the expected values of emotional dysregulation in each group at each age, and the -marginsplot- command will generate a nice graph showing those data. The second -margins- command will show you the marginal effect of a one-year increase in age, at each age, in each group. These outputs should be everything you need.
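For readers who prefer to see it written out, here is a sketch of the fixed-effects model and of the quantity the second -margins- command reports. The symbols are my own notation, not anything Stata prints: y_it is the emotional dysregulation score of child i at occasion t, D_i is the language variable, and alpha_i is the child-level fixed effect (which absorbs the main effect of D_i).

```latex
\[
y_{it} = \alpha_i + \beta_1\,\mathrm{age}_{it} + \beta_2\,\mathrm{age}_{it}^2
       + \gamma_1\,D_i\,\mathrm{age}_{it} + \gamma_2\,D_i\,\mathrm{age}_{it}^2
       + \varepsilon_{it}
\]
\[
\frac{\partial\,\mathrm{E}[y_{it}]}{\partial\,\mathrm{age}_{it}}
  = \beta_1 + 2\beta_2\,\mathrm{age}_{it}
  + D_i\left(\gamma_1 + 2\gamma_2\,\mathrm{age}_{it}\right)
\]
```

The second expression, evaluated at age = 0, 2, and 4 with D_i set to 0 and then to 1, is what -margins, dydx(age)- tabulates. The two groups differ in their rate of change only through gamma_1 and gamma_2.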

To understand this code, you first need to learn about factor variable notation. -help fvvarlist-, and the manual section linked therein. Then you also need to learn about the -margins- command. For that, I think the best introduction is https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. That will enable you to grasp what is going on here. Then, you can start learning more about the many wonderful things that -margins- can do in the corresponding section of the user's manuals.
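If you want a formal test for the second half of the question (does the change differ between groups), one possible sketch is below. It assumes the same variable names as the code above and is only one of several reasonable ways to do this.

```stata
* Refit the model from above
xtreg emotdysreg i.language_delay##c.age##c.age, fe

* Joint test that both group-by-age interaction terms are zero,
* i.e. that the trajectory over age is the same in both groups
testparm i.language_delay#c.age i.language_delay#c.age#c.age

* Between-group difference in the marginal effect of age, at each age
margins r.language_delay, dydx(age) at(age = (0 2 4))
```

The -testparm- result gives a single overall answer, while the contrasts from -margins- show at which ages the slopes differ and by how much.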