  • Bysort command for longitudinal data analysis: am I doing it right?

    Hi everyone,

    I am currently analyzing longitudinal data. For each of my participants, the dependent variable was measured at 3 different time points. Therefore, my dataset has a long format, and it looks like this:
    id   time   dependent variable   independent variable   covariate 1
    1    1      value 1              value 1                value 1
    1    2      value 2              value 2                value 2
    1    3      value 3              value 3                value 3
    2    1      value 1              value 1                value 1
    2    2      value 2              value 2                value 2
    2    3      value 3              value 3                value 3
    My participants come from two different areas (either area 1 or area 2), and area is a fixed variable so it does not change over time. Now, I want to have separate results for my participants based on their living area (so separate results for participants living in area 1 vs. participants living in area 2). For this, I used the following command:

    bysort area: [dependent variable] [independent variable] || id:

    This definitely seems to work, because separate output is produced for participants living in area 1 vs. area 2.

    I know that the data are now sorted based on living area, so everyone living in area 1 takes up the first rows of the dataset. This is also the case in my dataset. However, after executing this command, the observations of the same subject are not located together anymore. So, for example, my dataset now looks like this:
    id   time   dependent variable   independent variable   covariate 1
    5    2      value 2              value 2                value 2
    8    1      value 1              value 1                value 1
    18   2      value 2              value 2                value 2
    33   3      value 3              value 3                value 3
    5    1      value 1              value 1                value 1
    etc...

    The data are still the same, and every participant still has 3 observations at three different time points, but I don't understand why the data are sorted this way.

    Therefore, my question is: are my results still valid, even though the data are sorted this way? Or should I use a different command?

    I hope my question is clear... Thanks in advance for the reply!

  • #2
    -bysort area: [dependent variable] [independent variable] || id:- doesn't contain any command after the -bysort area:- prefix. Presumably there is something else there that you omitted, perhaps -mixed- or one of the other -me- commands.

    Anyway, the sort order of the data doesn't matter for those commands. And the sorting by area is necessary only for doing the analysis separately by area. So, yes, your results are OK.
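
    If you want to check this for yourself, here is a minimal sketch on simulated data. Everything in it is made up for illustration (the variable names y, x, u and shuffle, the seed, and the sample sizes); it assumes the command is -mixed- with a random intercept for id. The two sets of output should agree, apart from possibly trivial numerical differences.
    Code:
    * Simulated long-format data: 40 hypothetical participants, 3 time points each
    clear
    set seed 12345
    set obs 40
    generate id = _n
    generate area = cond(id <= 20, 1, 2)
    generate u = rnormal()                  // participant-level random intercept
    expand 3
    bysort id: generate time = _n
    generate x = rnormal()
    generate y = 1 + 0.5*x + u + rnormal()

    * Fit separately by area
    bysort area: mixed y x || id:

    * Put the rows in an arbitrary order, then fit again
    generate double shuffle = runiform()
    sort shuffle
    bysort area: mixed y x || id: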

    Understand that in Stata, -sort-ing on one variable does not preserve any earlier sorting. That is, you seem to expect the result of the sort to leave observations sorted on id and time within area. But that's not how it works in Stata. If you were going to do something that did require that, you would have to do it explicitly: -bysort area (id time): command-. Or, do the sort separately and specify the stable option:
    Code:
    sort area, stable
    by area: command
    The -stable- option tells Stata to keep observations that tie on area in the relative order they already had, so an earlier sort on id and time survives within each area. Without that option it does not do so; in fact, it randomizes the order within area. Note also that the -by area:- prefix does not itself call for a sort here. It relies on the sort already done by the preceding command.
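
    As a rough illustration of both approaches, here is a sketch on a toy dataset. The data (6 hypothetical participants, 3 time points each) and the variable first_in_area are invented for the example and stand in for your real variables.
    Code:
    * Toy long-format data: 6 hypothetical participants, 3 time points each
    clear
    set obs 6
    generate id = _n
    generate area = cond(id <= 3, 1, 2)
    expand 3
    bysort id: generate time = _n

    * Approach 1: sort within area on id and time explicitly.
    * Variables in parentheses set the order but do not define the by-groups.
    bysort area (id time): generate first_in_area = (_n == 1)
    list, sepby(area)

    * Approach 2: sort on id and time first, then do a stable sort on area,
    * which keeps the id/time order within each area
    sort id time
    sort area, stable
    list, sepby(area)
    Both listings should show each participant's three rows together, in time order, within each area.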

    • #3
      Dear Clyde,

      Indeed, my command was bysort area: mixed [dependent variable] [independent variable] || id:

      Thank you for the clear explanation; I understand it now.
