Generating new variables based on calculation from existing variables

Akissi Amon

Join Date: Oct 2017
Posts: 42

12 May 2018, 11:09

Dear STATALIST,

I am coming to you with an embarrassingly simple request.

I have three variables (resp1, resp2, resp3), representing 3 response options that each voter could pick (only one response) at each voting round. The votes were conducted in households, and in each household, all eligible inhabitants could vote (so more than one voter per household). The voting round is represented by variable round (4 voting rounds), and variable hhid is household ID. So in each household and at each round, you have x number of voters who picked resp1, y number who picked resp2, etc. Variable nbvoter is the number of voters per household.

All I wish to do is, for each voting round, calculate the total number of respondents who voted for each response option, so that I could say for instance: At round 1, n (30%) voters voted for resp1, n (10%) voters for resp2, and n (60%) voters for resp3.

What STATA is doing is, for resp1 for instance, calculating the number of times resp1 was not picked, the number of times it was picked by a single person, the number of times it was picked by 2 respondents, etc. (so frequencies). However, given that the variables are not categorical but continuous, this just does not make sense. For each voting round, and for each answer option, I need to compute the total number of respondents who picked, say, resp1, and divide it by the total number of voters. I will also have to use svy to compute 95% CIs around the computed proportions to account for repeated observations within households; the data are already svyset.

I have tried to collapse the data several times in different ways to generate new variables which would enable me to solve the issue, but to no avail. I would much appreciate your help.

Here is the dataset.

Thank you very much again.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 hhid float(resp1 resp2 resp3 nbvoter round)
"k001"    0 9  9  9 1
"a001"    1 9 16 17 1
"e001"    0 2  5  7 1
"1741001" 0 0 15 15 1
"2989001" 0 3  6  9 1
"2267001" 3 1  9 13 1
"3184001" 0 1  6  7 1
"s001"    1 3  9 13 1
"3018001" 0 2  3  5 1
"2759001" 0 1 20 21 1
"1479001" 4 7  5 16 1
"2835001" 2 9 14 16 1
"2441001" 0 9  8  8 1
"1760001" 0 3  7 10 1
"v001"    0 9 11 11 1
"y001"    0 9  6  6 1
"1713001" 0 1  7  8 1
"l001"    0 6  1  7 1
"3124001" 3 0 15 18 1
"n001"    2 2  4  8 1
"2761001" 0 4  7 11 1
"3612001" 0 1 12 13 1
"w001"    0 9 10 10 1
"3612002" 2 3  8 13 2
"3124002" 0 0  6  6 2
"1741002" 0 1  7  8 2
"3184002" 0 0  6  6 2
"2989002" 0 0  6  6 2
"y002"    0 2  7  9 2
"s002"    1 0  7  8 2
"2441002" 0 0 10 10 2
"e002"    1 1  4  6 2
"v002"    1 2  7 10 2
"2267002" 0 1  9 10 2
"1479002" 0 3  6  9 2
"2835002" 0 0  6  6 2
"w002"    0 1  9 10 2
"1713002" 0 2  8 10 2
"l002"    0 5  1  6 2
"3018002" 1 0  6  7 2
"a002"    0 4  1  5 2
"k002"    0 0  7  7 2
"2759002" 0 0  9  9 2
"n002"    0 3  6  9 2
"1760002" 1 0  7  8 2
"2761002" 0 0  9  9 2
"e003"    0 0  7  7 3
"2759003" 0 0  7  7 3
"3124003" 0 0 12 12 3
"2835003" 0 0  6  6 3
"n003"    0 0  6  6 3
"s003"    0 0 11 11 3
"w003"    1 0  7  8 3
"a003"    0 0  7  7 3
"2989003" 0 0  6  6 3
"3612003" 1 0  5  6 3
"1713003" 0 0  8  8 3
"2441003" 0 0  5  5 3
"k003"    0 1  7  8 3
"3018003" 1 2  7 10 3
"1479003" 0 2  8 10 3
"2267003" 2 1  8 11 3
"1741003" 0 1  6  7 3
"y003"    0 2  5  7 3
"1760003" 0 0  5  5 3
"3184003" 0 0  6  6 3
"l003"    0 0  8  8 3
"2761003" 1 0  5  6 3
"v003"    0 0  6  6 3
"a004"    0 0  6  6 4
"2441004" 0 1  8  9 4
"1713004" 0 1 10 11 4
"3184004" 0 0  6  6 4
"k004"    0 1  7  8 4
"1760004" 0 1  8  9 4
"3018004" 0 0  7  7 4
"1479004" 2 3  6 11 4
"1741004" 0 0 10 10 4
"2989004" 0 0  6  6 4
"3612004" 0 4  7 11 4
"2835004" 1 1  5  7 4
"n004"    0 0 12 12 4
"l004"    1 0  9 10 4
"v004"    0 0  5  5 4
"2267004" 0 1  9 10 4
"e004"    0 1  8  9 4
"y004"    0 1  7  8 4
"s004"    0 0  8  8 4
"2759004" 0 3  8 11 4
"w004"    3 2  5 10 4
"3124004" 0 0 13 13 4
"2761004" 0 0  9  9 4
end

Nick Cox

Join Date: Mar 2014
Posts: 35651

12 May 2018, 11:28

Is this part of what you want?

Code:

forval j = 1/3  {
    egen total`j' = total(resp`j') , by(round)
}

tabdisp round, c(total*)

----------------------------------------------
    round |     total1      total2      total3
----------+-----------------------------------
        1 |         16         100         205
        2 |          7          28         152
        3 |          6           9         158
        4 |          7          20         179
----------------------------------------------
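
The question also asks for each total as a share of all voters in the round. A minimal sketch of one way to get that from the wide layout, assuming nbvoter is the intended denominator; the names pct1 to pct3 are made up for illustration, and preserve/restore leave the data in memory untouched:

```stata
preserve
* One row per round: sum the votes and the voter counts
collapse (sum) resp1 resp2 resp3 nbvoter, by(round)
forvalues j = 1/3 {
    * pct`j' is a hypothetical name: percent of the round's voters choosing option j
    generate pct`j' = 100 * resp`j' / nbvoter
}
format pct? %5.1f
list round resp1 resp2 resp3 nbvoter pct1 pct2 pct3, noobs
restore
```

In the example data the three counts do not always add up to nbvoter (household k001 in round 1, for instance), so these percentages need not sum to 100.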

Code:

. drop total?

. reshape long resp , i(hhid) j(question)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       92   ->     276
Number of variables                   6   ->       5
j variable (3 values)                     ->   question
xij variables:
                      resp1 resp2 resp3   ->   resp
-----------------------------------------------------------------------------

. tab question [w=resp]
(frequency weights assumed)

   question |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         36        4.06        4.06
          2 |        157       17.70       21.76
          3 |        694       78.24      100.00
------------+-----------------------------------
      Total |        887      100.00

. tab round question [w=resp]
(frequency weights assumed)

           |             question
     round |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |        16        100        205 |       321
         2 |         7         28        152 |       187
         3 |         6          9        158 |       173
         4 |         7         20        179 |       206
-----------+---------------------------------+----------
     Total |        36        157        694 |       887

You can push tabulate further to get percents in rows and/or columns.
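
For instance, a sketch on the long layout created by reshape above. The row option asks tabulate for percentages within each round, and nofreq suppresses the counts. The denominator here is the number of votes recorded in the round, which is an assumption about what is wanted and is not the same thing as the sum of nbvoter.

```stata
* Row percentages: share of each round's votes going to each option
tabulate round question [fweight = resp], row nofreq

* Column percentages instead: how each option's votes split across rounds
tabulate round question [fweight = resp], column nofreq
```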

I don't follow any of this (NB edit)

What Stata is doing is, for resp1 for instance, calculating the number of times resp1 was not picked, the number of times it was picked by a single person, the number of times it was picked by 2 respondents, etc (so frequencies). However, given the variables are not categorical but continuous, it just does not make sense.

This seems to refer to some code you tried but don't show us. I don't see any continuous variables here in any case.

Akissi Amon

Join Date: Oct 2017

Posts: 42
#3

12 May 2018, 11:34

Dear Nick Cox,

Thank you so so much for such a rapid response. I am going to take a few minutes to look at all you did and will get back to you very shortly.

I apologise, I meant to say discrete variables not continuous variables.

Thank you very much.

Kind regards

Akissi Amon

Join Date: Oct 2017

Posts: 42
#4

12 May 2018, 12:13

Dear Nick Cox,

Thank you so much again for your help. I went for the second solution that you proposed. I just cross-checked the above data with Excel and I am finding exactly the same total values for each variable. Thank you so much.

The only thing I am now not sure about is the 95% CI. Normally, when it is survey data, I use robust standard errors (so the svy prefix), and I would use this command, for example:

svy: proportion varX

However, given the reshape and the new variables generated, I am not sure how to go about it. Could you please advise me?

Thank you very much.

Kind regards

Nick Cox

Join Date: Mar 2014

Posts: 35651
#5

12 May 2018, 13:09

Sorry, no. svy lies beyond territory I have ever surveyed.

Akissi Amon

Join Date: Oct 2017

Posts: 42
#6

12 May 2018, 13:12

Dear Nick Cox,

Thank you very much for your help again. I shall manually calculate the CIs.

Kind regards.
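
The thread leaves the svy question open. One possible route, offered only as a sketch and not as anything confirmed in the thread: starting from the long layout produced by reshape, expand the counts to one row per vote so that svy: proportion can treat question as an ordinary categorical variable. Two things below are assumptions: that the last three characters of hhid encode the round (so the leading part identifies the household), and that households are the only clustering level. The names hhstub and hh are made up for illustration, and the svyset line should be replaced by the survey's actual design.

```stata
* ASSUMPTION: hhid = household code + 3-character round suffix (e.g. "k001", "k002")
generate hhstub = substr(hhid, 1, strlen(hhid) - 3)
egen hh = group(hhstub)        // numeric household identifier (hypothetical name)

* -expand- leaves rows with a count of 0 in place once, so drop them first
drop if resp == 0
expand resp                    // now one row per vote

* ASSUMPTION: households as primary sampling units, no weights or strata
svyset hh
svy: proportion question, over(round)
```

Because the data are expanded in place, it may be safer to wrap this in preserve and restore, or to work on a copy of the dataset.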