  • Help needed with weights and graphs

    Hello everyone,

    I am having an issue. I have a dataset of around 500,000 observations from SOEP for the period 1990-2020, and I need to do some descriptive analysis of average annual working hours over that period. I am looking at the difference over time (the trend) in annual working hours between East and West for women. The variable I have for annual working hours is at the individual level, but I need the average. I generated the average with egen, using bysort on year and region, and drew a twoway line graph from the new variable. However, I need to include weights in this graph. Stata does not allow me to use weights when generating the new variable, and nothing changes in the graph when I add weights to the graph command. In other words, I need the weighted average annual working hours for 1990-2020 in East and West for women.

    I know there is another way to do this that does not require generating a new variable, lets me look at the average over time, and lets me include weights, but I cannot figure out what it is.

    Would appreciate any help.

  • #2
    If I understand this correctly you need to generate a new variable.

    The lack of direct support for weights in egen is a minor barrier, as any weighted mean can be got as the ratio of

    total(value * weight)

    and

    total(weight)


    Using weights within a graph command is indeed not the answer here. However, I can’t follow what your weights would be, so that is as far as I can go with advice.

    • #3
      Hello,

      Thank you for your feedback.
      I will try to explain my problem in more detail:
      As I said, I have data available for the annual work hours of individuals for the period 1990-2020. I am trying to analyze the difference in the annual work hours of women between East and West Germany during that time period. Therefore, I need to show a two-way line graph to see whether annual work hours in East and West Germany were different for mothers. I also need to include in my analysis the cross-sectional individual weight to compensate for unequal probabilities of selection and sample attrition.

      What I did was generate a new variable (avannualworkhours) using the following command: bysort year region: avannualworkhours= mean(annualworkhours) if sex==2 & annualworkhours>0.
      When I use the sum command on avannualworkhours for a specific time period (e.g. 2010) and include weights, the mean does not change and there is no standard deviation. Weights change the results only when I summarize avannualworkhours over the whole timeframe (1990-2020).
      So when I create a graph, the lines are the same whether I use weights or not.
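
      A minimal sketch of why this happens, using Stata's bundled auto data rather than SOEP (an assumption for illustration only; the car's weight variable stands in for a sampling weight). A variable created by egen's mean() is constant within each group, and a weighted average of a constant is that same constant, so weights can only matter once several groups are pooled:

      ```stata
      sysuse auto, clear

      * group mean of mpg: the same value for every car within a level of foreign
      egen grpmean = mean(mpg), by(foreign)

      * within one group, the unweighted and weighted means of grpmean coincide
      summarize grpmean if foreign == 1
      local unweighted = r(mean)
      summarize grpmean [aweight = weight] if foreign == 1
      assert reldif(r(mean), `unweighted') < 1e-6

      * weighting the original variable, by contrast, does change the mean
      summarize mpg if foreign == 1
      summarize mpg [aweight = weight] if foreign == 1
      ```

      The weights therefore have to enter when the group average itself is computed, not when the already-averaged variable is summarized or plotted.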

      I need to find a solution where I can look at the average or mean of the annual work hours for women during the period 1990-2020 in East and West Germany separately and create a graph that shows that there is a difference between these two regions and how the difference changed over time.

      Hoping for an answer.

      • #4
        Thanks for the reply but you're restating what is already clear from #1. Note for the record that

        Code:
        bysort year region: avannualworkhours= mean(annualworkhours) if sex==2 & annualworkhours>0.
        should be

        Code:
        bysort year region: egen avannualworkhours= mean(annualworkhours) if sex==2 & annualworkhours>0
        although you should know that already as the first command would not work.

        What you need to tell us is how "the cross-sectional individual weight to compensate for unequal probabilities of selection and sample attrition" is held.

        Here is a dopey example of a weighted mean as compared with an unweighted mean:

        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . egen numer = total(price * mpg), by(foreign rep78)
        
        . egen denom = total(price), by(foreign rep78)
        
        .
        . gen wtmean = numer / denom
        
        . egen mean = mean(mpg) , by(foreign rep78)
        
        . tabdisp rep78 foreign, c(mean wtmean)
        
        ------------------------------
        Repair    |
        record    |     Car origin    
        1978      | Domestic   Foreign
        ----------+-------------------
                1 |       21          
                  | 20.75715          
                  |
                2 |   19.125          
                  | 17.81852          
                  |
                3 |       19  23.33333
                  | 17.73544  23.21352
                  |
                4 | 18.44444  24.88889
                  | 18.07611  24.45236
                  |
                5 |       32  26.33333
                  | 32.10489  23.90356
                  |
                . |    23.25        14
                  | 23.59086        14
        ------------------------------
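
        The same ratio-of-totals idea, applied to the variables named in this thread, might look roughly like the following. This is a sketch under assumptions, not tested on SOEP: the names annualworkhours, weight, year, region and sex are taken from the posts above, region is assumed to be coded 1 and 2 (which code is East is not stated here), and touse, numer, denom, wtmean and tag are hypothetical helper names. The /// continuations mean it should be run from a do-file.

        ```stata
        * women with positive, nonmissing hours and a nonmissing weight
        * (missing counts as > 0 in Stata, hence the explicit !missing() guard)
        gen byte touse = sex == 2 & annualworkhours > 0 & !missing(annualworkhours, weight)

        * weighted mean per year-region cell = total(value * weight) / total(weight)
        egen double numer = total(annualworkhours * weight) if touse, by(year region)
        egen double denom = total(weight) if touse, by(year region)
        gen double wtmean = numer / denom

        * optional check for one cell against summarize with aweights
        summarize annualworkhours [aweight = weight] if touse & region == 1 & year == 1993
        assert reldif(r(mean), wtmean) < 1e-6 if touse & region == 1 & year == 1993

        * plot one observation per year-region cell
        egen byte tag = tag(year region) if touse
        twoway (line wtmean year if tag & region == 1, sort) ///
               (line wtmean year if tag & region == 2, sort), ///
               legend(order(1 "region == 1" 2 "region == 2")) ///
               ytitle("Weighted mean annual working hours")
        ```

        The touse indicator keeps the numerator and denominator on exactly the same observations, which matters because egen's total() silently treats missing values as zero.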
        Last edited by Nick Cox; 19 Feb 2023, 11:04.

        • #5
          Hello again,

          Thank you for your reply.

          The weights are already defined as part of the dataset and included and calculated by SOEP.
          Sorry there was a typo and I did not include the egen command, but the code I ran does include it and everything runs smoothly.

          Since I am looking at descriptive statistics: when I run, for instance, sum annualworkhours if annualworkhours>0 & sex==2 & region==1 & year==1993, I get the mean of the variable for the characteristics in the command. When I add [aweight=weight], the mean changes (as we would expect).

          My issue is that when I generate the new variable (the average of annual work hours) and create the graph, including the weight [aw=weight] in my command does nothing: the mean of the new variable does not change.

          Maybe my approach with generating the averageannualworkhours as a new variable is not the right one for what I want to achieve since the generated variable is the same for a particular year and region and including the weight does not result in any changes.

          Therefore, I need to find a way to create a graph that shows the average annual work hours (with weights) for all women over the timeframe I mentioned.
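
          One possible route that avoids adding a variable to the full dataset is collapse, which accepts aweights when computing means. The following is only a sketch under assumptions: the variable names are those quoted in this thread, region is assumed to be coded 1 and 2, wtmean is a hypothetical name for the result, and the /// continuations require a do-file.

          ```stata
          preserve

          * women with positive, nonmissing hours and a nonmissing weight
          keep if sex == 2 & annualworkhours > 0 & !missing(annualworkhours, weight)

          * reduce to one observation per year-region cell holding the weighted mean
          collapse (mean) wtmean = annualworkhours [aweight = weight], by(year region)

          twoway (line wtmean year if region == 1, sort) ///
                 (line wtmean year if region == 2, sort), ///
                 legend(order(1 "region == 1" 2 "region == 2")) ///
                 ytitle("Weighted mean annual working hours")

          restore
          ```

          For any single year and region, wtmean here should match the mean reported by sum annualworkhours [aweight=weight] with the same if conditions, since both compute the sum of weight times hours divided by the sum of weights.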

          Thank you for your feedback thus far. I apologize, as I am fairly new to Stata.

          Kind regards,

          • #6
            Sorry, but I don't think I can help further. I've already explained what to do as best I can.
