  • create a new variable from weighted variables?

    Hello everyone, I'm stuck on this one:

    I have a dataset and a (sample) weight variable. Because of the sample design, I have to weight all my procedures.
    Now I have to generate a new variable (v1) based on a condition involving two other variables in the dataset. This new variable will be used later in some analyses (logistic regression, etc.):

    gen byte v1 = 0
    replace v1 = 1 if days > 300 & days < 500 & condition == 1


    v1 is byte, days is continuous (number of days), condition is categorical, with 5 levels (0, 1, 2, 3, 4)

    However, gen, egen, and replace don't accept weights, and declaring the survey design doesn't help, since the svy: prefix doesn't work with gen or egen.

    Generating v1 from unweighted days and condition doesn't seem right.

    Is there a way around this? Or something more general, like "weight cases" in SPSS (turning weights on/off)? Or am I missing something?

    Thank you!
    Cristian


  • #2
    Usually, you weight cases/observations not variables. Therefore, you use weights for analyses not for data management tasks.

    In your example, neither the values of days nor those of condition should depend on the sampling weights. If you think this is not so, please explain why; say more about your dataset or, better yet, provide an example.

    Best
    Daniel
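
    A minimal sketch of that division of labor, assuming a hypothetical sampling weight variable wt and a hypothetical binary outcome variable named outcome (neither name appears in the original post). The flag is built without weights, and the weights enter only when the analysis is run. A real design may also need PSU and strata variables in svyset.

    ```stata
    // data management step: no weights involved
    generate byte v1 = 0
    replace v1 = 1 if days > 300 & days < 500 & condition == 1

    // analysis step: declare the design once, then prefix estimation commands
    // wt and outcome are hypothetical names used for illustration only
    svyset [pweight = wt]
    svy: logit outcome i.v1
    ```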


    • #3
      Who said "values"? Not the values, but their frequencies. And, yes, the frequencies [of the cases within those variables] do affect the output variable.

      PS: What do you mean by "data management"? Isn't weighting per se, and reporting weighted frequencies, a case of "data management"?

      Regards,
      C.
      Last edited by Cristian Popa; 26 Sep 2019, 04:34.


      • #4
        I am sorry, I do not understand what you mean. When you

        Code:
        gen byte v1 = 0
        replace v1 = 1 if days > 300 & days < 500 & condition == 1
        you are creating a new variable, v1, that holds values (0 and 1 up to that point).

        In fact, variables always hold values. In general, those values could be frequencies but if this is so, you need to explain that. Moreover, frequencies are the result of analyses and the latter should be weighted. I do not see how the code would create frequencies.

        Best
        Daniel


        • #5
          The values are not the problem; they are exactly the same, weighted or unweighted. The problem is their frequency. A variable generated with a Boolean condition on two unweighted variables, and then weighted, might differ slightly [in terms of the frequencies of its values] from one generated from the same variables after they have been adjusted by a frequency weight. What I am looking for is the latter approach; that is what this topic is meant to be about...
          So the question is: is there a way to generate a new variable from existing variables using weights (or a way around this)?
          Thank you!
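
          One way to check this concern, sketched with made-up data and a hypothetical integer frequency weight named fw: expand turns each frequency weight into that many physical copies of the row, which is roughly what "weight cases" amounts to for frequency weights. Because the condition is evaluated row by row, a flag generated before the expansion should match one generated after it.

          ```stata
          // made-up data; fw is a hypothetical integer frequency weight
          clear
          input days condition fw
          365 1 2
          365 0 3
            1 0 1
          400 1 4
          end

          // (a) generate the flag first, apply the weights later
          generate byte v1 = 0
          replace v1 = 1 if days > 300 & days < 500 & condition == 1

          // (b) materialize the weights as duplicated rows, then generate again
          expand fw
          generate byte v1_after = 0
          replace v1_after = 1 if days > 300 & days < 500 & condition == 1

          // stops with an error if the two flags differ on any row
          assert v1 == v1_after
          ```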


          • #6
            Again, I am sorry but I do not follow. Please provide a simple example that illustrates what you have and what you (think) you want.

            Here is an example that (hopefully) illustrates what I do not understand

            Code:
            // example data
            clear
            input days condition weight
            365 1 .42
            365 0 .73
              1 0 .42
              1 0 .73
            end
            
            // your code
            gen byte v1 = 0
            replace v1 = 1 if days > 300 & days < 500 & condition == 1
            
            // the results
            list
            Here is the output

            Code:
            . list
            
                 +-------------------------------+
                 | days   condit~n   weight   v1 |
                 |-------------------------------|
              1. |  365          1      .42    1 |
              2. |  365          0      .73    0 |
              3. |    1          0      .42    0 |
              4. |    1          0      .73    0 |
                 +-------------------------------+
            How would you want v1 to look instead? How would it depend on weight? And, why do you think that it should?

            Best
            Daniel
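
            A short sketch of where the weights do change something, reusing the four example observations above: the values of v1 are fixed by the data, but the share of cases with v1 equal to 1 depends on whether the summary is weighted. Treating weight as an analytic or sampling weight here is an assumption made for illustration.

            ```stata
            // same example data as above
            clear
            input days condition weight
            365 1 .42
            365 0 .73
              1 0 .42
              1 0 .73
            end

            generate byte v1 = 0
            replace v1 = 1 if days > 300 & days < 500 & condition == 1

            // unweighted share of v1 == 1: 1 of 4 observations
            summarize v1

            // weighted share: .42 / (.42 + .73 + .42 + .73) = .42 / 2.30
            summarize v1 [aweight = weight]
            mean v1 [pweight = weight]
            ```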


            • #7
              Daniel, clearly we're talking about completely different things. Please re-read my initial post; I think it is clear. If not, my bad. Thank you for your efforts!


              • #8
                It's not clear.
                Daniel's suggestion that you "provide a simple example that illustrates what you have and what you (think) you want" really is the best way forward.
                An actual data example is almost always clearer than a lengthy description.

                You could also help clarify by turning the statement:
                Code:
                However, gen egen / replace doesn't accept weights and declaring survey is not working since svy: doesn't work with gen egen.
                Generating v1 from unweighted days and condition doesn't seem right.
                into code that you tried, even if it didn't work. That would narrow down or eliminate a lot of guesswork for people trying to help answer this.


                • #9
                  Thank you, too, Jorrit. I'm afraid an example wouldn't be much clearer than this, or than the fact that gen, egen and svy do not accept weights. The rest just revolves around this problem.

                  I'll leave this one here for a bit; maybe someone has considered this problem at some point and it makes sense to her/him. Otherwise I will delete this post entirely as useless.

                  Best,
                  C.


                  • #10
                    I'm afraid deleting a post won't dovetail with the didactic approach of this forum. What is more, it cannot be done in the majority of circumstances.

                    Since you're new to the Stata Forum/Statalist, I kindly suggest that you read the FAQ carefully. It will convey the, well, spirit of the forum. What is more, it offers excellent tips on how to share data/commands/output/pictures.

                    Meanwhile, I share an excerpt of the FAQ, exactly the one about deleting posts:

                    16.2 What can you delete

                    Starting a thread does not convey ownership of that thread. Re-opening a thread by yourself or others is always allowed, and encouraged when any one has something relevant to add, say by reporting another solution, an update of a program, or a very similar question. Lapse of time is often not important: for example, it's fine to announce an update of a program in the same thread a few years after the original post. A new post always bumps a thread temporarily to the top of the list, so that additions can be noticed and read in context.

                    You cannot delete a thread you started. Please don't mangle your own posts starting a thread, even if you solved your problem yourself or realised that the question was silly. Explain the solution, even if it was trivial. Often someone else will have the same problem.

                    You can edit posts within an hour of posting. This allows fixes of many kinds, such as typo corrections, extra detail, or improved wording. Such edits within an hour include being able to delete any post that does not start a thread.
                    Best regards,

                    Marcos


                    • #11
                      I don't understand what is being sought here either but the implication that generate and egen do not accept weights is only correct in the syntactic sense that qualifiers of the form [weightword=exp] are not allowed. A very good reason for that is that many if not most needs under that heading are likely to be soluble with statements such as


                      Code:
                      generate newvar = weightvar * oldvar
                      and it's not a great problem that some calculations may require two or more steps, say to scale weights to some sum.
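
                      As an illustration of such a two-step calculation, here is a minimal sketch that rescales weights so they sum to the number of observations. The data and the names wt and wt_scaled are made up for the example, and the target sum is an assumption; any other target would work the same way.

                      ```stata
                      // made-up weights for illustration
                      clear
                      input wt
                      .5555556
                      .5555556
                      1.363636
                      1.363636
                      end

                      // step 1: get the count and the sum of the weights
                      summarize wt, meanonly

                      // step 2: rescale so that the weights sum to the number of observations
                      generate double wt_scaled = wt * r(N) / r(sum)

                      // check: the rescaled weights now sum to N (up to rounding)
                      summarize wt_scaled, meanonly
                      assert abs(r(sum) - r(N)) < 1e-6
                      ```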


                      • #12
                        Hello, Nick,

                        Thanks for the input.
                        Code:
                         generate newvar = weightvar * oldvar
                        is just multiplying the value of the weight variable with the numeric code of the variable (or the value)

                        example:
                        Code:
                        gen newvar = sample * wt
                        list in 7/11
                        
                             +------------------------------------+
                             | sample   pop         wt     newvar |
                             |------------------------------------|
                          7. |      1     2   .5555556   .5555556 |
                          8. |      1     2   .5555556   .5555556 |
                          9. |      1     2   .5555556   .5555556 |
                         10. |      2     2   1.363636   2.727273 |
                         11. |      2     2   1.363636   2.727273 |
                             +------------------------------------+
                        Considering the mean frequency, it works, but it's not what I am looking for.
                        I will try a different strategy: sub-sampling to match the population.

                        Best regards,
                        Cristian
                        Last edited by Cristian Popa; 26 Sep 2019, 11:25.
