  • Problem with generating subsample using binary variable

    I have a primary dataset with 574 responses. I want to run descriptive statistics for one topic in the questionnaire, which has 8 questions answered on a Likert scale from 1 to 5 (never, rarely, sometimes, often, every day). I generated a binary variable for each question (never = 0, all other answers = 1) and then a combined binary variable, which has 442 observations (324 = 1, 118 = 0). Now I want to analyse this subset of 442. I used the code gen subsample=1 if var_bin==1 and, obviously, I saw only 324 observations. Can someone please help me get both groups of observations?

    tab var_bin

        var_bin |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        118       26.70       26.70
              1 |        324       73.30      100.00
    ------------+-----------------------------------
          Total |        442      100.00


    dtable i.var1 i.var2 i.var3 if subsample==1

                              Summary
    ---------------------------------
    N                             324
    var1
      8-18                 49 (15.6%)
      19-29                 24 (7.6%)
      40-49                37 (11.8%)
      50-59                55 (17.5%)
      60-69                81 (25.8%)
      70+                  68 (21.7%)
    var2
      Above 1k            111 (34.4%)
      Below 1k            210 (65.0%)
      No response            2 (0.6%)
    var3
      Above A level       156 (48.3%)
      Below A level       167 (51.7%)


    Can somebody please guide me through how to get all 442 responses?

    Thank you

  • #2
    You have a total of 442 observations for which the variable var_bin is non-missing. Of those, 324 have var_bin = 1.

    Digression: You then create a new variable, subsample, which is equal to 1 if var_bin = 1 and missing otherwise. That's a bad idea: 1/. coding is a setup for problems in Stata. It is better to code 1/0, with missing values reserved for cases where the value is undefinable. But this is actually of no consequence here. In fact, the variable subsample is not needed for the purpose you have shown; you could just use var_bin in the same way. End of digression.
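
    To make the 1/. versus 1/0 distinction concrete, here is a minimal sketch. It assumes var_bin is coded 0/1 with some missing values, as in #1; the name subsample_10 is hypothetical and used only for illustration.

    ```stata
    * 1/. coding, as in #1: subsample is missing whenever var_bin is 0 or missing
    generate subsample = 1 if var_bin == 1

    * 1/0 coding: 1 where var_bin is 1, 0 where var_bin is 0,
    * and missing only where var_bin itself is missing
    * (subsample_10 is a hypothetical name)
    generate byte subsample_10 = (var_bin == 1) if !missing(var_bin)

    * Check the result, showing missing values as their own row
    tabulate subsample_10, missing
    ```

    One reason 1/. coding causes trouble: Stata treats any non-zero value, including missing, as true, so a condition written as -if subsample- would also select the observations where subsample is missing.
    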

    Be that as it may, subsample, like var_bin, takes on the value 1 in 324 observations. You then run -dtable ... if subsample == 1-, so of course you get a table summarizing 324 observations. If you want the table to include all 442 responses, then instead of -if subsample == 1- use -if !missing(var_bin)-.
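
    Spelled out with the variables from #1, the change might look like the sketch below. The by() version is an optional extra, not something asked for in #1; it assumes you may also want the two var_bin groups side by side.

    ```stata
    * Confirm how many observations have a non-missing var_bin
    * (should match the total of 442 in the tabulation in #1)
    count if !missing(var_bin)

    * Descriptive table over all observations with non-missing var_bin
    dtable i.var1 i.var2 i.var3 if !missing(var_bin)

    * Optional: same table, with separate columns for var_bin = 0 and var_bin = 1
    dtable i.var1 i.var2 i.var3 if !missing(var_bin), by(var_bin)
    ```
    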

    One other suggestion that is not directly related to your question. Don't use variable names like var1, var2, var3,... Even if at this moment, having been working intensely with this dataset, you know what all of these variables actually represent, if you have to come back to this after being away from it even just for a week or so, you likely will have forgotten and will waste a lot of time relearning it. Even more important, if somebody who is an outsider to your work needs to read your code, it will be completely incomprehensible. Transparency in coding is important so that others can understand what we do and gain confidence in our results. So give variables names that explain what they are.
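
    As a rough illustration of descriptive naming, something like the following could be used. The new names and labels here are hypothetical guesses based only on the category labels shown in #1; substitute whatever the variables actually measure.

    ```stata
    * Hypothetical names: replace with names that match the real content
    rename var1 age_group
    rename var2 income_group
    rename var3 education

    * Variable labels document the meaning and appear in table output
    label variable age_group    "Age group (years)"
    label variable income_group "Income (above/below 1k)"
    label variable education    "Education (above/below A level)"
    ```
    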
    Last edited by Clyde Schechter; 23 Jul 2024, 13:07.


    • #3
      Thank you, Professor, for your help; it worked. Thanks for your suggestion too. I agree with your point regarding variable names (although I had already saved those names in my do-file before posting here). I will follow this advice from now on.
