
  • Making predictor variables have the same frequency as the outcome variable (i.e. freq 39165 = study sample)

    Hi. I created a composite variable "TBHIV_Knowledge" with a total frequency of 39165 (the people who answered all 5 questions in my composite variable, which are named Prevention, Transmission1, Transmission2, Cure, and Coinfection). 39165 is my study sample. However, when I tabulate my predictor variables, they show higher frequencies (e.g. Age 66200, Sex 65000, Employment 39,468 and so on). How can I generate these variables so that each shows a total frequency of 39165 (i.e. each predictor variable has the same frequency as the outcome variable, TBHIV_Knowledge)?

    My data looks like this
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float TB_HIV_Knowledge byte(Prevention Transmission_1 Transmission_2 Cure Coinfection) int Age_Recode byte(Sex Residence_Type Region Marital_Status Employment) long Gross_Income2
    . . . . . .  1 1 1 1 . . .
    . . . . . .  0 2 2 5 . . .
    . . . . . .  0 1 1 7 . . .
    . . . . . .  0 2 1 7 . . .
    . . . . . .  0 2 1 7 . . .
    . . . . . .  0 2 1 7 . . .
    . . . . . .  2 2 1 5 . . .
    . . . . . .  1 1 1 7 . . .
    . . . . . .  3 2 1 1 . . .
    . . . . . . 10 2 1 7 . . .
    . . . . . .  7 2 1 7 . . .
    . . . . . .  3 2 1 7 . . .
    . . . . . . 10 1 1 7 . . .
    . . . . . .  0 2 1 7 . . .
    . . . . . .  0 1 1 7 . . .
    . . . . . .  0 2 1 7 . . .
    . . . . . .  3 2 1 1 . . .
    . . . . . .  3 1 2 3 . . .
    . . . . . .  0 1 2 4 . . .
    . . . . . .  5 2 2 4 . . .
    . . . . . . 10 . 2 5 . . .
    . . . . . .  0 2 2 5 . . .
    . . . . . . 10 2 2 5 . . .
    . . . . . . 10 2 1 5 . . .
    . . . . . .  0 2 1 8 . . .
    . . . . . .  1 1 1 8 . . .
    . . . . . .  0 2 1 8 . . .
    . . . . . .  0 2 2 8 . . .
    . . . . . .  2 1 1 8 . . .
    . . . . . .  3 1 1 8 . . .
    . . . . . .  0 1 1 8 . . .
    . . . . . .  0 1 1 8 . . .
    1 1 1 1 2 1  9 1 2 1 1 4 .
    . . . . . .  7 1 2 1 1 . .
    1 1 1 1 3 2  7 2 2 1 2 4 .
    1 1 1 1 1 2  2 1 2 1 2 4 .
    3 3 3 3 3 3  1 1 2 1 2 1 .
    . . . . . .  1 1 2 1 2 . .
    . . . . . .  0 1 2 1 . . .
    . . . . . .  0 2 2 1 . . .
    1 1 1 1 1 2  1 2 2 1 2 3 1
    . . . . . .  0 2 2 5 . . .
    1 1 1 1 1 2  4 2 2 1 1 3 .
    1 1 1 1 1 2 10 2 2 1 1 1 .
    1 1 1 1 1 2 10 1 2 1 1 1 .
    . . . . . .  0 1 2 1 . . .
    . . . . . .  2 1 2 1 2 1 .
    1 2 1 1 2 2 10 1 2 1 2 4 1
    2 2 3 1 3 2  4 2 2 1 2 4 .
    2 1 1 3 3 2  4 1 2 1 2 4 1
    2 2 2 2 2 2  3 2 2 1 2 1 .
    1 1 1 1 2 2  7 2 2 1 2 4 1
    1 1 1 1 3 2  3 1 2 1 2 1 .
    1 1 1 1 1 2  1 2 2 1 2 3 .
    1 1 1 1 1 2  6 2 2 1 2 4 .
    1 1 1 1 1 2  5 1 2 1 2 4 1
    . . . . . .  0 2 2 1 . . .
    . . . . . .  0 1 2 1 . . .
    . . . . . .  0 1 2 1 . . .
    2 3 2 1 2 3  2 2 2 1 1 4 1
    2 1 3 1 3 3  8 1 2 1 1 4 .
    2 1 3 3 3 3 10 2 2 1 1 1 .
    1 1 1 1 1 3  2 1 2 1 2 4 1
    . . . . . .  0 1 2 1 . . .
    1 1 1 1 3 2  5 1 2 1 1 4 1
    1 1 1 1 1 2  4 2 2 1 1 4 1
    1 1 1 1 2 2  1 1 2 1 2 4 1
    1 1 1 1 3 2  1 2 2 1 2 3 .
    . . . . . .  0 2 2 1 . . .
    . . . . . .  0 1 2 1 . . .
    1 1 1 1 1 2  7 1 2 1 2 2 1
    1 1 2 1 1 2  4 1 2 2 3 1 .
    1 1 1 1 2 2  4 2 2 1 2 1 .
    . . . . . .  0 1 2 1 . . .
    1 1 2 1 1 2  4 1 2 2 3 1 .
    . . . . . .  0 1 2 1 . . .
    . . . . . .  0 2 2 1 . . .
    . 3 3 . 3 3  9 1 2 1 3 1 .
    1 2 1 1 1 2  6 1 2 1 2 4 1
    1 1 1 1 1 2  5 2 2 1 2 4 1
    . . . . . .  0 1 2 1 . . .
    1 1 1 1 1 2  2 2 2 1 2 4 .
    3 3 3 3 3 3  9 1 2 1 1 4 1
    3 3 3 3 3 3 10 2 2 1 1 1 1
    2 1 3 1 3 2  1 2 2 1 2 3 .
    1 1 1 1 2 2  8 1 2 1 1 1 .
    1 1 1 1 3 2 10 1 2 1 1 4 3
    . . . . . .  0 1 2 1 . . .
    . . . . . .  0 1 2 1 . . .
    1 1 1 1 1 2  5 2 2 2 2 1 .
    . . . . . .  0 2 2 1 . . .
    . . . . . .  0 1 2 1 . . .
    1 1 1 1 1 1  4 1 1 1 2 4 1
    1 1 1 1 3 1  2 1 1 1 2 4 1
    1 1 1 1 1 2  2 1 1 1 2 4 1
    2 1 3 3 1 2  2 1 1 1 2 4 1
    2 2 2 3 2 2  2 1 1 1 2 4 1
    1 1 1 1 1 2  3 1 1 1 2 1 .
    . . . . . .  0 1 1 1 . . .
    1 1 1 1 1 2  9 2 1 1 2 1 1
    end
    label values TB_HIV_Knowledge TB_HIV_Knowledge
    label def TB_HIV_Knowledge 1 "True", modify
    label def TB_HIV_Knowledge 2 "False", modify
    label def TB_HIV_Knowledge 3 "Do Not Know", modify
    label values Prevention q3_2f
    label def q3_2f 1 "True", modify
    label def q3_2f 2 "False", modify
    label def q3_2f 3 "Do not know", modify
    label values Transmission_1 q3_1b
    label def q3_1b 1 "True", modify
    label def q3_1b 2 "False", modify
    label def q3_1b 3 "Do not know", modify
    label values Transmission_2 q3_1c
    label def q3_1c 1 "True", modify
    label def q3_1c 2 "False", modify
    label def q3_1c 3 "Do not know", modify
    label values Cure q3_3e
    label def q3_3e 1 "True", modify
    label def q3_3e 2 "False", modify
    label def q3_3e 3 "Do not know", modify
    label values Coinfection q3_4
    label def q3_4 1 "True", modify
    label def q3_4 2 "False", modify
    label def q3_4 3 "Do not know", modify
    label values Age_Recode Age_Recode
    label def Age_Recode 0 "Not_In_The-Study", modify
    label def Age_Recode 1 "15-19", modify
    label def Age_Recode 2 "20-24", modify
    label def Age_Recode 3 "25-29", modify
    label def Age_Recode 4 "30-34", modify
    label def Age_Recode 5 "35-39", modify
    label def Age_Recode 6 "40-44", modify
    label def Age_Recode 7 "45-49", modify
    label def Age_Recode 8 "50-54", modify
    label def Age_Recode 9 "55-59", modify
    label def Age_Recode 10 "60+", modify
    label values Sex sex_q
    label def sex_q 1 "Male", modify
    label def sex_q 2 "Female", modify
    label values Residence_Type Residence_Type
    label def Residence_Type 1 "Urban", modify
    label def Residence_Type 2 "Rural", modify
    label values Region province
    label def province 1 "Western Cape", modify
    label def province 2 "Eastern Cape", modify
    label def province 3 "Northern Cape", modify
    label def province 4 "Free State", modify
    label def province 5 "KwaZulu-Natal", modify
    label def province 7 "Gauteng", modify
    label def province 8 "Mpumalanga", modify
    label values Marital_Status Marital_Status
    label def Marital_Status 1 "Married", modify
    label def Marital_Status 2 "Never Married", modify
    label def Marital_Status 3 "No longer Married", modify
    label values Employment q1_7
    label def q1_7 1 "Unemployed", modify
    label def q1_7 2 "Sick/disabled and unable to work", modify
    label def q1_7 3 "Student/pupil/learner", modify
    label def q1_7 4 "Employed / Self Employed", modify
    label values Gross_Income2 Gross_Income2
    label def Gross_Income2 1 "Poor >R3500", modify
    label def Gross_Income2 3 "Working Class R8100-R22000", modify

  • #2
    Your question is not completely clear to me, but I think the following will do what you want:
    Code:
    * Constant-only regression; e(sample) flags observations with non-missing TB_HIV_Knowledge
    qui regress TB_HIV_Knowledge
    gen byte keep=e(sample)
    Now you can use the same sample as in your main variable by just adding "if keep" to your later commands; or, if you want to make a new dataset of the correct size:
    Code:
    keep if keep
    save newdata
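
    As a rough illustration of the "if keep" route, here is a minimal sketch on a small made-up dataset (the values are hypothetical and are not the poster's data). It flags the observations with a non-missing outcome and then restricts tabulations to them without dropping anything. A predictor that has missing values of its own will still show a smaller total unless the missing option is added.

    ```stata
    * Toy data with hypothetical values, for illustration only
    clear
    input float TB_HIV_Knowledge byte(Sex Employment)
    1 1 4
    . 2 .
    2 2 1
    . 1 .
    3 1 .
    end

    * Flag observations where the outcome is non-missing
    quietly regress TB_HIV_Knowledge
    generate byte keep = e(sample)

    * Restrict tabulations to the flagged observations; no data are dropped
    tabulate Sex if keep

    * Employment is missing for one flagged observation, so its total is smaller
    tabulate Employment if keep

    * The missing option shows that observation as its own row
    tabulate Employment if keep, missing
    ```

    Keeping the indicator rather than dropping observations leaves the full dataset available in case the sample definition changes later.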



    • #3
      Thank you very much, Rich Goldstein! The code worked perfectly. This is exactly what I wanted.



      • #4
        Originally posted by Sonwabile Mbuma
        Hi, may you please help? I need further assistance. Is there something more I can do besides dropping the lower frequencies (I need all the variables listed below)? The code was able to equalize Prevention, Transmission1, Transmission2, Cure, Coinfection, Age, Sex, Residence, and Province to 39165 (the frequency of TBHIV_Knowledge).
        However, although observations were deleted, the variables that had lower frequencies still show totals below 39165.

        Different Sample/frequencies
        Marital Status = 39157
        Employment = 39116
        Highest Education = 28119
        Gross Income = 19569



        • #5
          Originally posted by Rich Goldstein
          Hi, may you please help? I need further assistance. Is there something more I can do besides dropping the lower frequencies (I need all the variables listed below)? The code was able to equalize Prevention, Transmission1, Transmission2, Cure, Coinfection, Age, Sex, Residence, and Province to 39165 (the frequency of TBHIV_Knowledge).
          However, although observations were deleted, the variables that had lower frequencies still show totals below 39165.

          Different Sample/frequencies
          Marital Status = 39157
          Employment = 39116
          Highest Education = 28119
          Gross Income = 19569



          • #6
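            Use all relevant variables to generate the estimation sample, so that an observation is kept only if it is non-missing on every one of them. Assuming these are all the numeric variables in the dataset:

            Code:
            * List every numeric variable; the names are returned in r(varlist)
            qui ds, has(type numeric)
            * regress omits observations with a missing value in any listed variable
            qui regress `r(varlist)'
            gen sample= e(sample)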
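
            If the dataset also holds numeric variables that should not define the sample (for example identifiers or weights), one possible alternative is to name the model variables explicitly and count missing values per observation with egen's rowmiss() function. The sketch below uses made-up data; the local macro modelvars and the variable nmiss are illustrative names, not part of the original posts.

            ```stata
            * Toy data with hypothetical values, for illustration only
            clear
            input float TB_HIV_Knowledge byte(Sex Marital_Status Employment) long Gross_Income2
            1 1 2 4 1
            2 2 1 1 .
            . 1 2 . .
            3 2 . 4 1
            1 1 1 4 3
            end

            * Name only the variables that will enter the analysis (illustrative list)
            local modelvars TB_HIV_Knowledge Sex Marital_Status Employment Gross_Income2

            * Number of missing values per observation across those variables
            egen byte nmiss = rowmiss(`modelvars')

            * Complete cases: no missing value in any listed variable
            generate byte sample = (nmiss == 0)

            * Every tabulation restricted this way shares the same total
            count if sample
            tabulate Sex if sample
            tabulate Gross_Income2 if sample
            ```

            With either approach the common total can be no larger than the smallest per-variable frequency, so in the thread's data it would be at most the 19569 reported for Gross Income.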

