  • generate weighted variables

    Hi everyone,

    I need some help.
    I want to analyze the change in life satisfaction between 2019 and 2020. Therefore, I want to create a new variable defined as life satisfaction in 2020 minus life satisfaction in 2019, with weights applied for both years.
    Weights are not allowed in the commands gen, egen and clonevar.
    How can I create a weighted life satisfaction variable for 2020 and 2019?
    I also tried this command: gen newvar_2019 = var2019 * w2019, but it didn't work.
    Life satisfaction is measured from 0 to 10 and my weight variables are w2019 and w2020.

    Thank you
    Kim

  • #2
    About the least effective way to ask for help is to say that something "didn't work." Stealing from Tolstoy, one might say that all working programs are the same; each failing program fails in its own way. What happened? Did Stata crash? Did it give you an error message--if so, what was the error message? Did it execute the command without any error messages but produce results different from what you hoped for? If so, how did they differ? Did something else happen?

    The most effective way to ask for help is to show the actual code you ran and all of the output, including any error messages, that Stata gave you. In addition, if it produced results that differed from what you wanted, use the -dataex- command to show an example of the starting data and results and then, unless it's completely obvious even to people who know nothing about your project, explain what the difference was. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
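As a minimal sketch of the -dataex- workflow described above: the example below uses the auto dataset that ships with Stata, which is an assumption made purely for illustration and is not part of this thread. It installs -dataex- only if it is missing, then lists three variables for the first five observations.

```stata
* Load an example dataset bundled with Stata (illustrative only)
sysuse auto, clear

* Install dataex from SSC only if it is not already available
capture which dataex
if _rc ssc install dataex

* Print the first five observations of three variables as pasteable -input- code
dataex make price mpg in 1/5
```

For the data discussed in this thread, the analogous call would presumably be -dataex life_sat19 life_sat20 w2019 w2020 in 1/10-.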



    • #3
      Thank you for your answer. I will try to explain my problem more clearly.
      First of all, here is a part of the output of the command dataex:

      input float(life_sat19 life_sat20 w2019 w2020)
      7 8 647.2 779.5
      5 6 1362.5 1122.8
      8 9 779.7 1349.7
      8 9 1164.1 1267.3
      9 6 997.1 930.4
      7 4 1001.6 764.1
      6 7 869.3 888
      5 5 1016.4 1097.6
      9 5 679.8 1222.8
      10 8 1132.7 907.8
      end

      I thought I could create a weighted life satisfaction variable for the year 2019 with the following commands:

      gen life_sat19_w = w2019 * life_sat19
      tab life_sat19_w in 1/10

      life_sat19_ |
                w |      Freq.     Percent        Cum.
      ------------+-----------------------------------
           4530.4 |          1       10.00       10.00
             5082 |          1       10.00       20.00
           5215.8 |          1       10.00       30.00
           6118.2 |          1       10.00       40.00
           6237.6 |          1       10.00       50.00
           6812.5 |          1       10.00       60.00
           7011.2 |          1       10.00       70.00
         8973.899 |          1       10.00       80.00
           9312.8 |          1       10.00       90.00
            11327 |          1       10.00      100.00
      ------------+-----------------------------------
            Total |         10      100.00

      But this is not what I want. I need a variable that looks like the following weighted output, but I do not know how to turn this into a new variable:

      tab life_sat19 [aweight=w2019]

       life_sat19 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                2 | .874568445        4.37        4.37
                5 | 3.47826542       17.39       21.76
                6 | 3.39014508       16.95       38.71
                7 | 3.53134491       17.66       56.37
                8 |  3.7440778       18.72       75.09
                9 |  3.8073171       19.04       94.13
               10 |1.174281251        5.87      100.00
      ------------+-----------------------------------
            Total |         20      100.00



      • #4
        I think you are misconceptualizing what weights are and how they work. Weights do not work by modifying the values a variable takes on. Weights work by modifying how the individual values the variable takes on are used in the algorithms applied to those variables. You cannot emulate a weighted analysis by doing an unweighted analysis of modified values of the variable.

        So my question becomes, what is it you were hoping to do with your new variables? If you explain that, we can probably identify a way to get you those results. But it almost certainly will not involve modifying the values of the life_sat variables.
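To illustrate the point about how weights work, here is a minimal sketch using the 2019 columns of the example data from #3. It assumes analytic weights, as in the -tab- call in #3; the variable wx and the scalar names are hypothetical helpers introduced only for this illustration. The weighted mean is the ratio sum(w*x)/sum(w), so the weights enter the formula rather than the data, and the unweighted mean of w*x is a different quantity altogether.

```stata
clear
input float(life_sat19 w2019)
7 647.2
5 1362.5
8 779.7
8 1164.1
9 997.1
7 1001.6
6 869.3
5 1016.4
9 679.8
10 1132.7
end

* Weighted mean as reported by Stata
quietly summarize life_sat19 [aweight = w2019]
scalar wmean = r(mean)

* The same quantity by hand: sum(w*x) / sum(w)
generate double wx = w2019 * life_sat19
quietly summarize wx
scalar num = r(sum)
scalar mean_wx = r(mean)
quietly summarize w2019
scalar den = r(sum)

display "weighted mean (summarize): " scalar(wmean)
display "weighted mean (by hand):   " scalar(num)/scalar(den)

* The unweighted mean of the rescaled variable is not a weighted mean
display "unweighted mean of w*x:    " scalar(mean_wx)
```

The first two displayed numbers should agree, while the third is on the scale of the weights and has no interpretation as a life satisfaction score.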



        • #5
          Okay, I will try to explain it differently.

          For my analysis, I need a variable that indicates the change in life satisfaction between 2019 and 2020. Therefore, I have used the following commands:

          gen life_sat_diff = life_sat20 - life_sat19
          tab life_sat_diff
          life_sat_di |
                   ff |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                   -7 |          1        5.00        5.00
                   -4 |          2       10.00       15.00
                   -3 |          3       15.00       30.00
                   -2 |          2       10.00       40.00
                   -1 |          1        5.00       45.00
                    0 |          2       10.00       55.00
                    1 |          6       30.00       85.00
                    2 |          1        5.00       90.00
                    4 |          1        5.00       95.00
                    7 |          1        5.00      100.00
          ------------+-----------------------------------
                Total |         20      100.00

          Negative values imply a reduction in life satisfaction and positive values an increase. Throughout my analysis, I used weights to extrapolate the results to the population.
          Is it possible to weight my life_sat_diff variable?



          • #6
            If you are looking to get an estimate of the change in satisfaction, and you want your estimate to reflect sampling differences in the two years, then I think you have to do something a bit more complicated.

            Code:
            clear
            input float(life_sat19 life_sat20 w2019 w2020)
            7 8 647.2 779.5
            5 6 1362.5 1122.8
            8 9 779.7 1349.7
            8 9 1164.1 1267.3
            9 6 997.1 930.4
            7 4 1001.6 764.1
            6 7 869.3 888
            5 5 1016.4 1097.6
            9 5 679.8 1222.8
            10 8 1132.7 907.8
            end
            
            gen long obs_no = _n
            rename w20* w*
            reshape long life_sat w, i(obs_no) j(year)
            svyset [pweight = w]
            svy: mean life_sat, over(year)
            
            * Difference in weighted means, 2020 minus 2019 (year is coded 19/20 after the rename and reshape)
            lincom _b[c.life_sat@20.year] - _b[c.life_sat@19.year]

            But I'm not sure this is right, because I find your data itself confusing. You have observations containing a value for both life_sat19 and life_sat20, which is only appropriate if they refer to the same person. But if the data really are paired measurements on the same person, then I do not understand why the sampling weights differ in 2019 and 2020: if the 2019 cohort is followed up in 2020 for repeat measure, the sampling weights should not change. This leads me to think that the life_sat19 and life_sat20 variables are not, in fact, measured on the same person. But if that's the case, they shouldn't be in the same observations the way you have them--it's a data layout that misrepresents the structure of the data itself. And you will have no end of difficulty doing correct analyses with the data arrayed this way. That's why the code above -reshape-s to long before proceeding, and from that point on treats the observations from 2019 as being independent of those from 2020.

            Anyway, you need to be clear whether these data represent repeated measurements in 2 years of the same cohort, or 2 different cross sections. If it's the former, then my code here is incorrect, but I cannot comprehend why there would be different weights for the two years. If it's the latter, then my code will do what you need (though it is not what you asked for), and you should keep the data in the long layout for all your analyses.
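If the data do turn out to be repeated measurements on the same people, one possible sketch of a paired analysis follows. This is not the code from #6, and it rests on two assumptions that are not confirmed anywhere in this thread: that each row is one person observed in both years, and that a single weight (here w2020, chosen only for illustration) is the appropriate one. Which weight to use in a panel setting should come from the survey's documentation.

```stata
clear
input float(life_sat19 life_sat20 w2019 w2020)
7 8 647.2 779.5
5 6 1362.5 1122.8
8 9 779.7 1349.7
8 9 1164.1 1267.3
9 6 997.1 930.4
7 4 1001.6 764.1
6 7 869.3 888
5 5 1016.4 1097.6
9 5 679.8 1222.8
10 8 1132.7 907.8
end

* Within-person change: only meaningful if both values belong to the same person
generate life_sat_diff = life_sat20 - life_sat19

* ASSUMPTION: w2020 is the correct weight for the paired sample
svyset [pweight = w2020]

* Weighted estimate of the mean change, with a design-based standard error
svy: mean life_sat_diff
```

The weight is applied when estimating the mean of the difference, not when constructing the difference itself, which is consistent with the explanation in #4.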
