
  • Adding and Multiplying Variables with Missing Data

    Hello!

    I am trying to create a new variable based on other variables. I need to add them, but I also need to give each variable a different weight. The problem I'm running into is that Stata seems to be ignoring every response that has at least one missing value (which is basically all of them). Specifically, the Q13_x# variables are about social media usage, and since not every respondent uses every social medium, many responses have missing data. I considered using egen's rowtotal() function, but then I don't know how to apply the weights. I also considered attaching weights to the variables themselves, but the weights are different for each new variable I'm trying to create. The commands I have been working with are below. However, every time I try to use these, it says that there are "No observations". Any advice? I'm still very new to Stata, so I really appreciate your help in advance.

    gen interactivity = (Q13_x1 * 1.00 + Q13_x2 * 1.00 + Q13_x3 * .67 + Q13_x4 * .67 + Q13_x5 * .67 + Q13_x6 * .33 + Q13_x7 * .33 + Q13_x8 * 1.00)
    gen socialcues = (Q13_x1 * .67 + Q13_x2 * 1.00 + Q13_x3 * 1.00 + Q13_x4 * .67 + Q13_x5 * 1.00 + Q13_x6 * 1.00 + Q13_x7 * 1.00 + Q13_x8 * 1.00)
    gen reach = (Q13_x1 * .67 + Q13_x2 * .67 + Q13_x3 * .33 + Q13_x4 * .33 + Q13_x5 * .33 + Q13_x6 * 1.00 + Q13_x7 * 1.00 + Q13_x8 * .67)
    gen tempstructure = (Q13_x1 * 1.00 + Q13_x2 * .33 + Q13_x3 * .33 + Q13_x4 * 1.00 + Q13_x5 * 1.00 + Q13_x6 * .33 + Q13_x7 * .33 + Q13_x8 * .33)
    Last edited by Jess Lee; 26 Apr 2017, 17:30.

  • #2
    There is no doubt about how Stata works here: any missing values in the variables included in the expressions defining the new variables will result in missing results. I am like Stata; if you tell me that Q13_x3 is missing I can't tell you what Q13_x3 * 0.67 is, or should be.

    Simply, what should happen in your view when any variable is missing? If you can give a reasonable procedure there will be code to implement it. Above all, if missing really means zero, then you can fix that directly with mvencode.
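
    A minimal sketch of the mvencode route, assuming that a missing Q13_x# really does mean "does not use this medium" (only the original poster can confirm that). The two data rows are made up for illustration. Note that mvencode changes the data in place, and it refuses to recode if a variable already contains real zeros unless the override option is added.

    ```stata
    * Made-up two-respondent example (hypothetical values, not the poster's data)
    clear
    input float(Q13_x1 Q13_x2 Q13_x3 Q13_x4 Q13_x5 Q13_x6 Q13_x7 Q13_x8)
    3 2 . 1 . 4 . 5
    . . . . . . . 2
    end

    * ASSUMPTION: missing means zero usage, so recode . to 0 in all eight items
    mvencode Q13_x1 Q13_x2 Q13_x3 Q13_x4 Q13_x5 Q13_x6 Q13_x7 Q13_x8, mv(0)

    * With no missing values left, the expression from post #1 gives a result
    * for every respondent (/// continues a line; use it in a do-file)
    gen interactivity = Q13_x1*1.00 + Q13_x2*1.00 + Q13_x3*.67 + Q13_x4*.67 ///
        + Q13_x5*.67 + Q13_x6*.33 + Q13_x7*.33 + Q13_x8*1.00

    list interactivity
    ```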



    • #3
      Like most things in Stata, doing this with wide layout data is like going into the boxing ring with one or both hands tied behind your back. It's easy in long.

      Code:
      //    CREATE A FILE OF WEIGHTS
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(_j interactivity_wt socialcues_wt reach_wt tempstructure_wt)
      1   1 .67 .67   1
      2   1   1 .67 .33
      3 .67   1 .33 .33
      4 .67 .67 .33   1
      5 .67   1 .33   1
      6 .33   1   1 .33
      7 .33   1   1 .33
      8   1   1 .67 .33
      end
      tempfile weights
      save `weights'
      
      //    ILLUSTRATIVE (MADE-UP) DATA
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(Q13_x1 Q13_x2 Q13_x3 Q13_x4 Q13_x5 Q13_x6 Q13_x7 Q13_x8)
      51 58 52 44 61 67 48 53
      50 60 57 50 53  . 42 36
       . 50 50 60 52  . 57 43
      51 47 56 50 43 46 63 43
      47 54 48 42  .  . 49 61
      51 55 46  . 47 67 51 54
       . 50 45 39  . 53 43 50
      52 63 49 56  . 45 60  .
      49 58 47 57 58 46 52 48
      56 31 53 44 45 57 65  .
      51 42 62 59  . 49 40 57
      49 44 62 53 48 43 50  .
      39 56 59 42 55 45 41  .
      51  . 42  . 33  . 58 65
      48 54 57 41 47 48 39 56
      55 46 49 40 51 40 46 57
      50 55 51 42  . 51 41 53
      49 51 40 49 50 38 45 48
      47 53 51 55 56 53 49 40
      58  . 49 51 49 50 50 41
      47 44 42 47 56 54 49 54
      51 44 45 48 60 44  . 49
       . 50 47 52 58 55 55 40
      52 57  . 58 47 55 49 47
       . 53 52  . 45  . 53 68
      end
      
      //    CREATE AN ID VARIABLE (ASSUMING NONE ALREADY EXISTS)
      gen long id = _n
      
      //    GO TO LONG LAYOUT
      reshape long Q13_x, i(id) j(_j)
      
      //    MERGE IN THE WEIGHTS
      merge m:1 _j using `weights', assert(match) nogenerate
      
      //    CALCULATE THE WEIGHTED AVERAGES
      foreach x in interactivity socialcues reach tempstructure {
          by id (_j), sort: egen `x' = total(`x'_wt*Q13_x)
          by id (_j): egen `x'_wt_total = total(`x'_wt*!missing(Q13_x))
          replace `x' = `x'/`x'_wt_total
      }
      
      //    CLEAN UP
      drop *_total *_wt
      
      //    GO BACK TO WIDE LAYOUT
      reshape wide Q13_x, i(id) j(_j)
      The first step is to create a data set of weights. If what you showed in your post is the actual set of variables and weights, then you can just use what's in the code above. But either way, save it as a real (permanent) file rather than the tempfile used here.

      The second data set shown is just made-up random numbers to illustrate the approach, since you didn't supply any example data of your own.

      I don't know what you will be doing next with this. The chances are excellent that your next steps, too, will be easier in long layout. So give serious consideration to omitting that final -reshape wide- command.

      Added: Crossed with Nick's post which makes some excellent points. The above code implicitly assumes that missing responses are simply omitted from the calculation of the weighted mean (with corresponding adjustment of the weights themselves). This is equivalent to treating the missing response as equal to the (weighted) mean of the non-missing responses and is sometimes referred to as ipsative mean imputation.
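
      In symbols (the notation here is introduced for illustration and is not from the posts): write x_{ij} for respondent i's value of Q13_xj and w_j for the weight of item j. The loop above then appears to compute, for each of the four new variables,

      ```latex
      \[
        \text{score}_i \;=\;
        \frac{\sum_{j \in O_i} w_j \, x_{ij}}{\sum_{j \in O_i} w_j},
        \qquad
        O_i = \{\, j : x_{ij} \text{ is not missing} \,\}
      \]
      ```

      The numerator corresponds to total(`x'_wt*Q13_x), because egen's total() treats missing values as zero, and the denominator corresponds to total(`x'_wt*!missing(Q13_x)). A respondent with no non-missing items has a zero denominator, and Stata returns missing for that division.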

      Also, corrected error in code.
      Last edited by Clyde Schechter; 26 Apr 2017, 18:11.



      • #4
        It dawns on me that I misunderstood your original post and thought you were looking to calculate weighted averages of the Q13_x* variables. But on a more careful reading, I see you wanted weighted sums. This makes Nick's question about the meaning of that when there are missing values more pressing.

        You can take my code and remove the two lines

        Code:
          
           by id (_j): egen `x'_wt_total = total(`x'_wt*!missing(Q13_x))
           replace `x' = `x'/`x'_wt_total
        and that will leave you with weighted sums that, in effect, treat the missing values as if they were zero. But there aren't many circumstances where such sums are useful. So I think you should clarify what you want.
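
        For comparison, here is a hedged sketch of the same missing-as-zero weighted sum computed directly in wide layout, without reshaping or recoding the data. This is an alternative, not the code from post #3. The two data rows are made up, and n_answered is a hypothetical helper variable; the weights are the interactivity weights from post #1.

        ```stata
        * Made-up two-respondent example (hypothetical values, not the poster's data)
        clear
        input float(Q13_x1 Q13_x2 Q13_x3 Q13_x4 Q13_x5 Q13_x6 Q13_x7 Q13_x8)
        3 2 . 1 . 4 . 5
        . . . . . . . .
        end

        * Weights for interactivity, in item order 1..8 (from post #1)
        local w 1.00 1.00 .67 .67 .67 .33 .33 1.00

        * Start at 0 and add weight*value only where the item is not missing
        gen double interactivity = 0
        forvalues j = 1/8 {
            local wj : word `j' of `w'
            replace interactivity = interactivity + `wj'*Q13_x`j' if !missing(Q13_x`j')
        }

        * Optional choice: a respondent who answered nothing gets missing, not 0
        * (Q13_x1-Q13_x8 assumes the items are stored contiguously, as they are here)
        egen byte n_answered = rownonmiss(Q13_x1-Q13_x8)
        replace interactivity = . if n_answered == 0

        list interactivity n_answered
        ```

        Repeating the block with a different weight list and variable name would give the other three sums. Whether a sum over only the answered items is meaningful is still the open question raised above.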




        • #5
          Thanks so much, all! This was really helpful and I got my data to do what I wanted. Have a good one!
