Bysort command for longitudinal data analysis: am I doing it right?

Wes Beliski

Join Date: Apr 2023
Posts: 7


27 Apr 2023, 04:04

Hi everyone,

I am currently analyzing longitudinal data. For each of my participants, the dependent variable was measured at 3 different time points. My dataset is therefore in long format, and it looks like this:

id	time	dependent variable	independent variable	covariate 1
1	1	value 1	value 1	value 1
1	2	value 2	value 2	value 2
1	3	value 3	value 3	value 3
2	1	value 1	value 1	value 1
2	2	value 2	value 2	value 2
2	3	value 3	value 3	value 3

My participants come from two different areas (either area 1 or area 2), and area is a fixed variable so it does not change over time. Now, I want to have separate results for my participants based on their living area (so separate results for participants living in area 1 vs. participants living in area 2). For this, I used the following command:

bysort area: [dependent variable] [independent variable] || id:

This definitely seems to work, because separate output is produced for participants living in area 1 vs. area 2.

I know that the data are now sorted based on living area, so everyone living in area 1 takes up the first rows of the dataset. This is also the case in my dataset. However, after executing this command, the observations of the same subject are not located together anymore. So, for example, my dataset now looks like this:

id	time	dependent variable	independent variable	covariate 1
5	2	value 2	value 2	value 2
8	1	value 1	value 1	value 1
18	2	value 2	value 2	value 2
33	3	value 3	value 3	value 3
5	1	value 1	value 1	value 1

etc...

The data are still the same, and every participant still has 3 observations at three different time points, but I don't understand why the data are sorted this way.

Therefore, my question is: are my results still valid, even though the data are sorted this way? Or should I use a different command?

I hope my question is clear... Thanks in advance for the reply!


Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

27 Apr 2023, 10:36

-bysort area: [dependent variable] [independent variable] || id:- doesn't contain any command after the -bysort area:- prefix. Presumably there is something else there that you omitted, perhaps -mixed- or one of the other -me- commands.

Anyway, the sort order of the data doesn't matter for those commands. And the sorting by area is necessary only for doing the analysis separately by area. So, yes, your results are OK.

Understand that in Stata, -sort-ing on one variable does not preserve any earlier sorting. That is, you seem to expect the result of the sort to leave observations sorted on id and time within area. But that's not how it works in Stata. If you were going to do something that did require that, you would have to do it explicitly: -bysort area (id time): command-. Or, do the sort separately and specify the stable option:

Code:

sort area, stable
by area: command

The -stable- option tells Stata that, when sorting on area, observations with the same value of area should keep their existing relative order. Without that option, it does not do so--in fact it randomizes the order within area. Note also that the -by area:- prefix does not itself call for a sort here. It relies on the sort already done by the preceding command.
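To make the two approaches concrete, here is a minimal sketch on a made-up toy panel. The variable names (id, time, area, obs_in_area, obs_in_area2) and the dataset itself are assumptions for illustration only, not the original poster's data.

```stata
* Toy long-format panel: 4 participants x 3 time points (hypothetical data)
clear
set obs 12
generate id   = ceil(_n/3)
generate time = mod(_n - 1, 3) + 1
generate area = cond(id <= 2, 1, 2)

* Plain sort on area: the order of id and time within each area is not guaranteed
sort area
list id time area, sepby(area)

* Sort on area, then on id and time within area; only area defines the by-groups
bysort area (id time): generate obs_in_area = _n
list id time area obs_in_area, sepby(area)

* Alternative: establish the id/time order first, then keep it with a stable sort
sort id time
sort area, stable
by area: generate obs_in_area2 = _n

* Both approaches should number the observations identically
assert obs_in_area == obs_in_area2
```

The variables in parentheses after -bysort area- affect only the sort order, not the grouping, so the command that follows still runs once per area.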
Wes Beliski

Join Date: Apr 2023

Posts: 7
#3

28 Apr 2023, 00:48

Dear Clyde,

Indeed, my command was bysort area: mixed [dependent variable] [independent variable] || id:

Thank you for the explanation; it's clear to me now.
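For anyone who wants to try the full command, here is a rough sketch on simulated data. Everything in it (the variable names y, x, u, the sample size, the seed, and the data-generating model) is an assumption made up for illustration, not the actual study.

```stata
* Simulated panel: 200 participants, 3 time points each (hypothetical data)
clear
set seed 12345
set obs 200
generate id   = _n
generate area = cond(id <= 100, 1, 2)   // time-invariant living area
generate u    = rnormal()               // participant-level random intercept
expand 3                                // three observations per participant
bysort id: generate time = _n
generate x = rnormal()
generate y = 1 + 0.5*x + u + rnormal()

* Random-intercept model, fitted separately for each area
bysort area: mixed y x || id:

* The same model for area 1 only, using an -if- qualifier instead of -by-
mixed y x if area == 1 || id:
```

The -if- version should reproduce the area 1 output of the -bysort- version, whatever order the observations happen to be in.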