Hi everyone,
I am currently analyzing longitudinal data. For each participant, the dependent variable was measured at three time points, so my dataset is in long format and looks like this:
id | time | dependent variable | independent variable | covariate 1 |
1 | 1 | value 1 | value 1 | value 1 |
1 | 2 | value 2 | value 2 | value 2 |
1 | 3 | value 3 | value 3 | value 3 |
2 | 1 | value 1 | value 1 | value 1 |
2 | 2 | value 2 | value 2 | value 2 |
2 | 3 | value 3 | value 3 | value 3 |
My participants come from two different areas (area 1 or area 2). Area is a fixed variable, so it does not change over time. I want separate results for participants living in area 1 and participants living in area 2, so I used the following command:
bysort area: [dependent variable] [independent variable] || id:
This seems to work: separate output is produced for participants living in area 1 and for those living in area 2.
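A minimal sketch of two ways to get per-area estimates while keeping each participant's rows together. It assumes the estimation command is `mixed` (the `|| id:` syntax points to a multilevel command, but the post does not name it) and uses simulated data with hypothetical variable names `y`, `x`, `id`, `time`, and `area`.

```stata
* Hypothetical simulated data: 20 participants x 3 time points
clear
set seed 2024
set obs 60
generate id   = ceil(_n/3)              // participants 1..20
generate time = _n - 3*(id - 1)         // time points 1..3
generate area = 1 + (id > 10)           // ids 1-10 in area 1, ids 11-20 in area 2
generate u    = rnormal() if time == 1  // participant-level random intercept
replace  u    = u[_n-1] if time > 1     // copy it to that participant's other rows
generate x    = rnormal()
generate y    = 1 + 0.5*x + u + rnormal()

* Option 1: sort on area, then id and time within area,
* but form the by-groups on area only
bysort area (id time): mixed y x || id:

* Option 2: fit one area at a time with -if- and compare side by side
mixed y x if area == 1 || id:
estimates store area1
mixed y x if area == 2 || id:
estimates store area2
estimates table area1 area2, b se
```

With `bysort area (id time):`, Stata sorts on `area id time` but still runs the command once per value of `area`, so the estimates should match a plain `bysort area:` call while the data keep the per-participant layout. The `if` version fits the same per-area models and lets `estimates table` show them next to each other.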
I understand that the data are now sorted by living area, so everyone living in area 1 occupies the first rows of the dataset, and that is indeed what I see. However, after running this command, the observations of the same participant are no longer next to each other. For example, my dataset now looks like this:
id | time | dependent variable | independent variable | covariate 1 |
5 | 2 | value 2 | value 2 | value 2 |
8 | 1 | value 1 | value 1 | value 1 |
18 | 2 | value 2 | value 2 | value 2 |
33 | 3 | value 3 | value 3 | value 3 |
5 | 1 | value 1 | value 1 | value 1 |
etc...
The data themselves are unchanged, and every participant still has three observations at three different time points, but I don't understand why the rows end up in this order.
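A likely explanation: `bysort area:` is shorthand for `by area, sort:`, which first sorts on `area` alone. Stata's `sort` does not keep observations that tie on the sort key in their previous order unless the `stable` option is given, so within each area the rows of different participants can come out in arbitrary order. The sketch below uses hypothetical simulated data (variable names are illustrative) to show two ways to restore the per-participant layout.

```stata
* Hypothetical data: 20 participants x 3 time points, two areas
clear
set obs 60
generate id   = ceil(_n/3)
generate time = _n - 3*(id - 1)
generate area = 1 + (id > 10)           // ids 1-10 in area 1, ids 11-20 in area 2

* The sort that -bysort area:- performs: on area alone, so rows that tie
* on area (different participants and time points) end up in arbitrary order
sort area

* Fix 1: sort explicitly on area, then participant, then time
sort area id time
list id time area in 1/6, noobs sepby(id)

* Fix 2: start from the id/time order and make the area sort stable,
* so rows keep their relative order within each area
sort id time
sort area, stable
```

Either fix only changes the row order; it does not change any values in the data.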
So my question is: are my results still valid even though the data are sorted this way, or should I use a different command?
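On validity: a random-intercept model groups observations by the variable after `||` (here `id`), not by their position in the dataset, so row order should not change the estimates beyond floating-point rounding. A minimal sketch that checks this on hypothetical simulated data, again assuming the command is `mixed`: fit the same area-1 model on rows ordered by `id time` and on randomly shuffled rows, then compare the two coefficient vectors with `mreldif()`.

```stata
* Hypothetical simulated data: 20 participants x 3 time points, two areas
clear
set seed 2024
set obs 60
generate id   = ceil(_n/3)
generate time = _n - 3*(id - 1)
generate area = 1 + (id > 10)
generate u    = rnormal() if time == 1  // participant-level random intercept
replace  u    = u[_n-1] if time > 1     // copy it to that participant's other rows
generate x    = rnormal()
generate y    = 1 + 0.5*x + u + rnormal()

* Fit for area 1 with rows ordered by participant and time
sort id time
mixed y x if area == 1 || id:
matrix b_ordered = e(b)

* Shuffle the rows into a random order and fit the same model again
generate double shuffle = runiform()
sort shuffle
mixed y x if area == 1 || id:
matrix b_shuffled = e(b)

* Relative difference between the two coefficient vectors (should be essentially 0)
display mreldif(b_ordered, b_shuffled)
```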
I hope my question is clear... Thanks in advance for the reply!