  • Longitudinal data - how to connect multiple observations pr. ID to one group?

    Hello,

    I'm currently writing my bachelor's thesis, in which I have to compare two treatment groups with regard to several blood test parameters, BMI and so forth. This is my first time working with Stata (and with data of this sort). The data I'm working with are gathered automatically from the patients' electronic journals. I have provided a fabricated and simplified example below. My real dataset consists of 32 variables with 1158 observations. As far as I've understood, the data are in "long form". Each patient has a unique ID (for example 16 and 25) and a different number of observations depending on their past hospital visits (some patients might have 1, others 7 and so on).

    I'm having some trouble figuring out how I can "categorize" each patient into their correct treatment group. Per the example, patient 16 is in the treatment groups "transdermal" and "begge" ("both") with a dosage of 350, while patient 25 is in the treatment group "transdermal" with a dosage of 200. How do I make sure those treatment groups are linked to all observations belonging to their respective patient IDs? Should I put 1/0 and dosage in every "line" belonging to each patient? And if so, won't that skew my data so it looks as if I have 1158 different patients receiving treatment (and not, say, 200 patients with different numbers of observations)?
    As of right now, if I were to analyse observations in the group "transdermal", I would only include patient 16's data from 1/30/2022 and patient 25's data from 9/28/2017, right?
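One possible way to link a treatment group to every observation of a patient is sketched below. It is only a sketch under a stated assumption: a patient's treatment does not change between visits, so any visit with a recorded treatment defines the group. The names `ever_td`, `td_dose` and `one_per_id` are hypothetical; the other variable names come from the example data.

```stata
* Assumption: a patient's treatment group does not change between visits,
* so any visit with a recorded treatment defines the group for that patient.

* 1 on every row of a patient who ever has Transdermal == 1, else 0
bysort DWEKBorger: egen byte ever_td = max(Transdermal == 1)

* Copy the recorded weekly dose to every row of the same patient
bysort DWEKBorger: egen double td_dose = max(Transdermaldosisugentlig)

* Flag exactly one row per patient, so patients (not visits) can be counted
egen byte one_per_id = tag(DWEKBorger)
tabulate ever_td if one_per_id
```

Filling in the group on every row does not by itself inflate the number of patients: counts restricted to `one_per_id` are per patient, and panel commands use the ID variable to keep visits from the same patient together.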

    Furthermore, how would you go about visualizing this? I would be interested in visualizing each treatment group's status with regard to a certain parameter over time, for example the LH level of patients receiving transdermal vs oral treatment over time (in the same plot). As far as my googling skills have taken me, I would have to use the xtline command with its overlay option. However, I'm not interested in the specific dates of the patient visits as a time unit. It doesn't have to be that specific, more along the lines of "visit 1, visit 2, visit 3" and so forth. Do I have to manually rewrite all dates in that form, or is there an easier way?
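A visit counter does not have to be typed in by hand; it can be generated from the sort order within each patient. The sketch below numbers the visits and then plots the mean LH per visit for two groups. It assumes that `strPLH` (the name as it appears in the example's `input` line) holds the LH value and that the group is fixed per patient; `visit`, `ever_td` and `mean_lh` are hypothetical names.

```stata
* Number each patient's visits in chronological order: 1, 2, 3, ...
bysort DWEKBorger (Konsultationstart): gen int visit = _n

* Assumption: patients who ever have Transdermal == 1 form one group
bysort DWEKBorger: egen byte ever_td = max(Transdermal == 1)

* Mean LH per group and visit, drawn as two lines in one plot
preserve
collapse (mean) mean_lh = strPLH, by(ever_td visit)
twoway (line mean_lh visit if ever_td == 1, sort) ///
       (line mean_lh visit if ever_td == 0, sort), ///
       legend(order(1 "Transdermal" 2 "Other")) ///
       xtitle("Visit number") ytitle("Mean LH")
restore
```

The `preserve`/`restore` pair keeps the original long dataset intact, since `collapse` replaces the data in memory with the group means.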

    I apologize in advance if my requests are confusing or perhaps seem a little ignorant. As I've stated, this is a first for me, and I fear I might be in a little over my head :-)

    I hope you get the drift of it all and are able to help - I would be eternally grateful!

    Thanks in advance

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long DWEKBorger double Konsultationstart str61 Diagnose double(BHæmoglobin strPLH strPFSH) byte(Alder Transdermal) double Transdermaldosisugentlig byte Peroral double Peroraldosisdaglig byte Begge
    16 1.7624736e+12 "Turners syndrom" 9.3 11.6 31 20 .   . . . .
    16 1.8016992e+12 "Turners syndrom"   .    .  . 21 .   . . . .
    16   1.95912e+12 "Turners syndrom" 9.2  3.3 10 22 1 350 0 . 1
    25  1.822176e+12 "45,X/46,XX"        .    .  . 50 1 200 0 . 1
    25  1.955232e+12 "45,X/46,XX"      9.1  5.6  . 53 .   . . . .
    end
    format %tcnn/dd/ccYY_hh:MM Konsultationstart

  • #2
    Helene:
    welcome to this forum.
    Some comments about your post:
    1) you have repeated observations for the same patients. Hence you're dealing with a panel dataset. By -xtset-ting your dataset you avoid the risk of each repeated observation per patient being treated as a unique observation for a different patient;
    2) unfortunately, from your description I find it difficult to suggest which command of the -xt- suite best fits your research goal, as I cannot tell whether your dependent variable is continuous or discrete;
    3) regardless of points 1) and 2), you seem to have many missing values in your dataset. As you may already be aware, Stata estimation commands apply listwise deletion: an observation with a missing value in any variable used by the command is dropped. Therefore, the second observation of patient 16 and the first one of patient 25 will be omitted from any calculation that involves the variables missing there.
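To see how much listwise deletion might cost before fitting anything, the missing values can be tabulated first. This is a minimal sketch using the three lab variables from the example data; `n_miss` is a hypothetical name.

```stata
* How many values are missing in each lab variable?
misstable summarize BHæmoglobin strPLH strPFSH

* Number of missing lab values on each row
egen byte n_miss = rowmiss(BHæmoglobin strPLH strPFSH)

* Rows that would survive a command using all three variables
count if n_miss == 0
```

A command that uses only some of these variables drops fewer rows, so the count above is a lower bound for models with fewer variables.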

    As an aside, if you feel that this forum can lend you a hand, post with no worries. As one of the most prolific members of this list (Nick Cox) once wrote, in an English just a bit better than mine: "We are all beginners; some of us are only more experienced".
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hello again,

      Thank you very much for the kind reply, Carlo! I appreciate it.

      I thought I'd wait to post an update until I'd met with my supervisor. Unfortunately, we didn't reach a solution.

      In regards to what you wrote,
      1) I seem to have run into a few issues with -xtset-ting my dataset.
      I use
      Code:
      xtset DWEKBorger Konsultationstart
      and get the message
      Code:
      time variable must contain only integer values
      r(451);
      I tried to overcome this by creating a new time variable and changing the format,
      Code:
      gen tids = dofc(Konsultationstart)
      format tids %td
      and
      Code:
      format tids %tg
      but both formats give me the same error as before.
      I am quite unsure how to overcome this, as the numbers, when formatted with %tg, are whole numbers such as 21607, 21899 and 22259. How are those not integers?
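A display format only changes how values are shown, not what is stored, so switching between %td and %tg cannot affect what -xtset- sees. A hedged diagnostic sketch follows; it assumes the error comes from the variable actually passed to -xtset-, and `visit_date` is a hypothetical name chosen so as not to clash with the existing `tids`.

```stata
* Changing a display format never changes the stored values,
* so first check whether the datetime itself holds non-integers.
count if Konsultationstart != floor(Konsultationstart)

* Daily date derived from the %tc datetime, stored as an integer type
gen long visit_date = dofc(Konsultationstart)
format visit_date %td

* xtset also needs each patient-date pair to be unique
duplicates report DWEKBorger visit_date

* Declare the panel with the new integer time variable
xtset DWEKBorger visit_date
```

If the `count` is above zero, the original datetime contains fractional values in the full dataset, which would explain why the five-row excerpt does not reproduce the error.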

      2) I am pretty sure it's continuous!


      Furthermore - my supervisor thinks we should reshape the data to wide form instead. Do you think this would help?

      Again, thanks in advance!


      • #4
        Helene:
        1) I cannot replicate that issue with your original data excerpt:
        Code:
        . xtset DWEKBorger Konsultationstart
        
        Panel variable: DWEKBorger (unbalanced)
         Time variable: Konsultationstart, 11/7/2015 12:00 to 1/30/2022 12:00, but with gaps
                 Delta: .001 seconds
        2) if your regressand (aka dependent variable; y) is continuous, you may want to consider -xtreg-;
        3) -xtreg- (just like most Stata commands) prefers the -long- format.
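For illustration only, a random-effects -xtreg- call on the long data could look like the sketch below. The model itself is an assumption, not something proposed in the thread: it takes `strPLH` as the outcome, uses a hypothetical per-patient group indicator `ever_td`, and uses a hypothetical visit counter `visit` as the time variable.

```stata
* Patient-level group indicator (assumes the group is fixed per patient)
bysort DWEKBorger: egen byte ever_td = max(Transdermal == 1)

* Visit counter as the panel time variable (integer by construction)
bysort DWEKBorger (Konsultationstart): gen int visit = _n
xtset DWEKBorger visit

* Random-effects model of LH on treatment group and visit number
xtreg strPLH i.ever_td c.visit, re
```

The `re` option is used here because a group indicator that never changes within a patient cannot be estimated with fixed effects (`fe`).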
        Kind regards,
        Carlo
        (Stata 19.0)
