  • Subsample and main sample comparison

    Hi.
    I don't know whether this is possible. I have seen similar posts, but I haven't found an answer.

    I have a main sample of 2000 subjects, of whom only 500 were selected for laboratory tests. In addition, 800 of the 2000 participants were excluded because of missing data on some outcome variables, leaving 1200 participants.

    Main sample: 2000
    "Sample 1": 1200
    "Subsample": 500 (included within 1200 participants)

    Therefore, I will analyze some variables in the 1,200 participants of sample 1 and others in the 500 participants of the subsample. I would like to compare the characteristics of the participants in sample 1 (1,200) with those of the subsample (500), although the two groups are not independent (the subsample is contained in sample 1) and differ in size. Or should I compare the main sample (2,000) with the subsample (500)?


    If there is a way to do this in Stata, could you help me?
    Any advice would be greatly appreciated.

    CR
    Last edited by Carla RAS; 23 Jun 2020, 11:27. Reason: sample, sampling, STATA 15, subsample

  • #2
    You may create a binary variable, -selected-, with the 500 observations in the ‘yes’ category, and compare characteristics according to this variable. What goes into the ‘no’ category depends on what you wish to compare: either the other 1500 observations (those with missing values plus those without blood samples), or only the 700 observations in sample 1 without blood samples. Depending on the study design, the actual sample may be the 500 observations, with the rest reported in a flow chart of how the sample was obtained.
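A minimal sketch of the two possible definitions of the ‘no’ group. It builds simulated placeholder data so it runs on its own; the flag names -insample1-, -selected- and -selected_s1- are hypothetical, and in practice the flags would come from your own exclusion criteria and laboratory records.

```stata
* Simulated placeholder data: 2,000 rows standing in for the main sample
clear
set obs 2000
* Hypothetical flag: 1 for the 1,200 participants with complete outcome data
gen byte insample1 = _n <= 1200
* Hypothetical flag: 1 for the 500 with laboratory tests (all within sample 1)
gen byte selected = _n <= 500

* Option A: 500 selected vs. the other 1,500 of the main sample
tabulate selected

* Option B: 500 selected vs. the 700 in sample 1 without laboratory tests;
* the 800 excluded participants are set to missing in this flag
gen byte selected_s1 = selected if insample1 == 1
tabulate selected_s1
```

Either flag can then serve as the grouping variable in the comparisons; which one is appropriate depends on whether the reference population is the main sample or sample 1.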
    Best regards,

    Marcos


    • #3
      Hi. Thank you so much.

      I did something like this:

      Code:
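      * comp = 1 if all four laboratory results are present, 0 if any is missing
      * (missing() also catches extended missing values .a-.z, which "== ." does not)
      gen comp = .
      replace comp = 1 if !missing(hba1, il6, fib, fr)
      replace comp = 0 if missing(hba1, il6, fib, fr)

      This gives 1 for observations with all tests (500) and 0 for those with any result missing (1200).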

      But how should I proceed to compare the groups on characteristics such as gender, age, or physical activity in METs (chi-squared via -tab-, -ttest-, -ranksum-, -kwallis-)? I'd like to get a p-value to determine whether participants in the subsample had characteristics similar to those of the main sample (2000) or of sample 1 (1200). Note also that the 500 with blood tests are included in the 1200 of sample 1.

      Thank you in advance.
      Last edited by Carla RAS; 24 Jun 2020, 04:01.


      • #4
        You can type something like:

        Code:
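        * Continuous variable: two-sample t test of means
        ttest varname, by(comp)
        * Continuous or ordinal variable: Wilcoxon rank-sum (Mann-Whitney) test
        ranksum varname, by(comp)
        * Categorical variable: cross-tabulation with Pearson chi-squared test
        tabulate comp catvar, chi2

        Because the grouping variable is binary, the Kruskal-Wallis test (-kwallis-) is not needed: with two groups it reduces to the rank-sum test.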
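A minimal sketch that runs these tests for several characteristics at once and prints the p-values. The variable names -age-, -mets- and -gender- are hypothetical stand-ins for the characteristics mentioned in #3, and the data are simulated so the block runs on its own. The rank-sum p-value is computed from the returned z statistic (two-sided normal approximation).

```stata
* Simulated placeholder data (hypothetical variables age, mets, gender)
clear
set seed 42
set obs 1200
gen byte comp = _n <= 500
gen double age = 40 + 10*rnormal()
gen double mets = exp(rnormal())
gen byte gender = runiform() < 0.5

* Continuous variables: t test and rank-sum test for each
foreach v in age mets {
    quietly ttest `v', by(comp)
    local p_t = r(p)
    quietly ranksum `v', by(comp)
    * two-sided p-value from the normal-approximation z statistic
    local p_rs = 2*normal(-abs(r(z)))
    display "`v': t-test p = " %6.4f `p_t' ", rank-sum p = " %6.4f `p_rs'
}

* Categorical variable: Pearson chi-squared test
tabulate comp gender, chi2
display "gender: chi-squared p = " %6.4f r(p)
```

Running both the t test and the rank-sum test for each variable is only a convenience of this sketch; in practice one would usually choose the test that suits each variable's distribution.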
        Best regards,

        Marcos


        • #5
          I almost forgot: the command -egen- has useful machinery for generating variables that flag missing values, the function -rowmiss()- being one of them.
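A minimal sketch of the -rowmiss()- approach, applied to the four laboratory variables from #3 (hba1, il6, fib, fr). It generates simulated data so it runs on its own; the missingness pattern and the helper variable -nmiss- are assumptions for illustration only.

```stata
* Simulated placeholder data standing in for the four laboratory variables
clear
set seed 2020
set obs 2000
foreach v in hba1 il6 fib fr {
    gen double `v' = rnormal()
}
* Hypothetical missingness pattern: rows after 500 lack the il6 result
replace il6 = . if _n > 500

* rowmiss() counts, for each observation, how many listed variables are missing
egen nmiss = rowmiss(hba1 il6 fib fr)
* comp = 1 when no laboratory result is missing, 0 otherwise
gen byte comp = (nmiss == 0)
tabulate comp
```

Keeping -nmiss- as a separate variable also makes it easy to see how many tests each excluded participant is missing, which can be useful for the flow chart mentioned in #2.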
          Best regards,

          Marcos


          • #6
            Thank you very much, I really appreciate it.
