create a new variable from weighted variables?

Cristian Popa

Join Date: Sep 2017

Posts: 21
#1


26 Sep 2019, 03:49

Hello everyone, I'm stuck on this one:

I have a data-set and a (sample) weight variable. Because of the sample design, I have to apply the weight in all my procedures.
Now I have to generate a new variable (v1) based on a condition on two other variables in the data-set; this new variable will be used later in some analyses (logistic regression, etc.):

gen byte v1 = 0
replace v1 = 1 if days > 300 & days < 500 & condition == 1

v1 is byte, days is continuous (number of days), condition is categorical, with 5 levels (0, 1, 2, 3, 4)

However, gen / egen / replace don't accept weights, and declaring the survey design doesn't help, since svy: doesn't work with gen or egen.

Generating v1 from unweighted days and condition doesn't seem right.

Is there a way around? Or something more general like "weight cases" in SPSS (turning weight on/off)? Or am I missing something?

Thank you!
Cristian
daniel klein

Join Date: Mar 2014

Posts: 3856
#2

26 Sep 2019, 04:02

Usually, you weight cases/observations not variables. Therefore, you use weights for analyses not for data management tasks.

In your example, neither the values of days nor that of condition should depend on the sampling weights. If you think this is not so, please explain why; say more about your dataset or better yet, provide an example.
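As a hedged sketch of that division of labour: the indicator is built with plain generate/replace, and the weights only enter once an analysis command runs. The simulated data and the names outcome and weight are assumptions for illustration, not part of the question.

```stata
* Simulated data; outcome and weight are hypothetical names.
clear
set seed 12345
set obs 1000
generate days      = ceil(600 * runiform())
generate condition = floor(5 * runiform())
generate weight    = 0.5 + runiform()
generate byte outcome = runiform() < 0.4

* Data management: no weights are needed to build the indicator.
generate byte v1 = 0
replace v1 = 1 if days > 300 & days < 500 & condition == 1

* Analysis: declare the design once, then prefix the analysis commands.
svyset [pweight = weight]
svy: tabulate v1
svy: logit outcome v1
```

Once svyset has been run, every svy: command picks up the weights, which is probably the closest Stata equivalent to switching weighting on.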

Best
Daniel
Cristian Popa

Join Date: Sep 2017

Posts: 21
#3

26 Sep 2019, 04:26

Who said "values"? Not the "values" but, for example, the frequencies. And, yes, the frequencies [of the cases within those variables] do affect the output variable.

PS: What do you mean by "data management"? Isn't weighting per se, and reporting weighted frequencies, a case of "data management"?

Regards,
C.

Last edited by Cristian Popa; 26 Sep 2019, 04:34.
daniel klein

Join Date: Mar 2014

Posts: 3856
#4

26 Sep 2019, 04:43

I am sorry, I do not understand what you mean. When you

Code:

gen byte v1 = 0
replace v1 = 1 if days > 300 & days < 500 & condition == 1

you are creating a new variable, v1, that holds values (0 and 1 up to that point).

In fact, variables always hold values. In general, those values could be frequencies but if this is so, you need to explain that. Moreover, frequencies are the result of analyses and the latter should be weighted. I do not see how the code would create frequencies.

Best
Daniel
Cristian Popa

Join Date: Sep 2017

Posts: 21
#5

26 Sep 2019, 05:35

The values are not the problem; they are exactly the same, weighted or unweighted. The problem is their frequency. A variable generated with a Boolean condition on two unweighted variables, and then weighted, might differ slightly [in terms of the frequencies of its values] from one generated from the same variables after they have been adjusted by a frequency weight. What I am looking for is the latter approach; that is what this topic is meant to be about...
So the question is: Is there a way to generate a new variable from existing variables using a weight (or a way around this)?
Thank you!

daniel klein

Join Date: Mar 2014
Posts: 3856
#6

26 Sep 2019, 06:06

Again, I am sorry but I do not follow. Please provide a simple example that illustrates what you have and what you (think) you want.

Here is an example that (hopefully) illustrates what I do not understand

Code:

// example data
clear
input days condition weight
365 1 .42
365 0 .73
  1 0 .42
  1 0 .73
end

// your code
gen byte v1 = 0
replace v1 = 1 if days > 300 & days < 500 & condition == 1

// the results
list

Here is the output

Code:

. list

     +-------------------------------+
     | days   condit~n   weight   v1 |
     |-------------------------------|
  1. |  365          1      .42    1 |
  2. |  365          0      .73    0 |
  3. |    1          0      .42    0 |
  4. |    1          0      .73    0 |
     +-------------------------------+

How would you want v1 to look instead? How would it depend on weight? And, why do you think that it should?
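One way to see where weight could enter, sketched on the same four observations: v1 itself is unchanged, but its weighted share differs from its unweighted share. Treating weight as an analytic weight is an assumption made only for illustration.

```stata
clear
input days condition weight
365 1 .42
365 0 .73
  1 0 .42
  1 0 .73
end

generate byte v1 = 0
replace v1 = 1 if days > 300 & days < 500 & condition == 1

* Unweighted share of v1 == 1: 1 of 4 observations.
summarize v1

* Weighted share: .42 / (.42 + .73 + .42 + .73) = .42 / 2.30
summarize v1 [aweight = weight]
```

The unweighted mean is 0.25 and the weighted mean is about 0.183; the variable is the same in both runs and only the analysis step changes.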

Best
Daniel


Cristian Popa

Join Date: Sep 2017

Posts: 21
#7

26 Sep 2019, 06:59

Daniel, clearly we're talking about completely different things. Please re-read my initial post; I think it is clear. If not, my bad. Thank you for your efforts!
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#8

26 Sep 2019, 07:28

It's not clear.
Daniel's suggestion that you "provide a simple example that illustrates what you have and what you (think) you want" really is the best way forward.
An actual data example is almost always clearer than lengthy descriptions.

You could also help clarify by turning the statement:

Code:

However, gen / egen / replace don't accept weights, and declaring the survey design doesn't help, since svy: doesn't work with gen or egen. Generating v1 from unweighted days and condition doesn't seem right.

into code that you tried, even if it didn't work. That would narrow down or eliminate a lot of guesswork for people trying to help answer this.
Cristian Popa

Join Date: Sep 2017

Posts: 21
#9

26 Sep 2019, 08:03

Thank you, too, Jorrit. I'm afraid an example wouldn't be much clearer than this, or than the fact that gen, egen and svy do not accept weights. The rest just revolves around this problem.

I'll leave this one here for a bit; maybe someone has considered this problem at some point and it makes sense to her/him. Otherwise I will delete this post entirely as useless.

Best,
C.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

26 Sep 2019, 09:30

I'm afraid deleting a post won't dovetail with the didactic approach of this forum. What is more, it cannot be done in the majority of circumstances.

Since you're new to the Stata Forum/Statalist, I kindly suggest reading the FAQ carefully. It conveys the, well, spirit of the forum. What is more, it contains excellent tips on how to share data/commands/output/pictures.

Meanwhile, I share an excerpt of the FAQ, exactly the one about deleting posts:

16.2 What can you delete

Starting a thread does not convey ownership of that thread. Re-opening a thread by yourself or others is always allowed, and encouraged when any one has something relevant to add, say by reporting another solution, an update of a program, or a very similar question. Lapse of time is often not important: for example, it's fine to announce an update of a program in the same thread a few years after the original post. A new post always bumps a thread temporarily to the top of the list, so that additions can be noticed and read in context.

You cannot delete a thread you started. Please don't mangle your own posts starting a thread, even if you solved your problem yourself or realised that the question was silly. Explain the solution, even if it was trivial. Often someone else will have the same problem.

You can edit posts within an hour of posting. This allows fixes of many kinds, such as typo corrections, extra detail, or improved wording. Such edits within an hour include being able to delete any post that does not start a thread.

Best regards,

Marcos
Nick Cox

Join Date: Mar 2014

Posts: 35700
#11

26 Sep 2019, 09:52

I don't understand what is being sought here either but the implication that generate and egen do not accept weights is only correct in the syntactic sense that qualifiers of the form [weightword=exp] are not allowed. A very good reason for that is that many if not most needs under that heading are likely to be soluble with statements such as

Code:

generate newvar = weightvar * oldvar

and it's not a great problem that some calculations may require two or more steps, say to scale weights to some sum.
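A minimal sketch of such a two-step calculation, scaling the weights to sum to 1 before multiplying. The toy data and the name wscaled are assumptions for illustration.

```stata
clear
input oldvar weightvar
10 .5
20 1.5
30 2
end

* Step 1: get the sum of the weights and rescale them to sum to 1.
summarize weightvar, meanonly
generate double wscaled = weightvar / r(sum)

* Step 2: multiply the variable by the rescaled weight.
generate double newvar = wscaled * oldvar
```

With weights scaled this way, the total of newvar equals the weighted mean of oldvar.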

Cristian Popa

Join Date: Sep 2017
Posts: 21

#12

26 Sep 2019, 11:23

Hello, Nick,

Thanks for the input.

Code:

 generate newvar = weightvar * oldvar

is just multiplying the value of the weight variable with the numeric code of the variable (or the value)

example:

Code:

gen newvar = sample * wt
list in 7/11

     +------------------------------------+
     | sample   pop         wt     newvar |
     |------------------------------------|
  7. |      1     2   .5555556   .5555556 |
  8. |      1     2   .5555556   .5555556 |
  9. |      1     2   .5555556   .5555556 |
 10. |      2     2   1.363636   2.727273 |
 11. |      2     2   1.363636   2.727273 |
     +------------------------------------+

Considering the mean frequency, it works, but it's not what I am looking for.
I will try a different strategy: sub-sampling to match the population.

Best regards,
Cristian

Last edited by Cristian Popa; 26 Sep 2019, 11:25.
