
  • Panel data: generate yearly average variable

    Hello there,

    I am currently working with a panel dataset and am having trouble with an issue I hope some of you will have an answer to.
    First of all, my id variable is called pidp and my time variable is wave. I have data for 27 years (the wave variable goes from wave = 1 to wave = 27).
    I am working with a further variable, fimnlabgrs_dv, which expresses each individual's labour income.

    I also have another variable, called yrbracket, which records each individual's age bracket: under 16, 16 to 24, 25 to 34, 35 to 49, 50 to 64, or 65 and older.

    I am interested in the yearly average labour income for every yrbracket category; that is, I want to work with the yearly average labour income for people between 16 and 24 years of age, the yearly average labour income for people between 25 and 34 years of age, and so on.
    What I'm looking for is a variable that expresses the average labour income for every category, for every year. That is, I want a variable with 27 observations, where every observation expresses the yearly average labour income for a specific category and a specific year (there need to be 27 observations, as I have data for 27 years). Of course, this means that there will be as many new variables as there are yrbracket categories, which means 6 variables. In this way, I will then be able to plot these new variables and compare them over time.

    Please do let me know if something was not clear in the description of my problem.

    Thank you in advance
    Last edited by Marco Sarandrea; 19 Aug 2021, 10:03.

  • #2
    If you have panel data and your goal is to graph mean income over time, you do not need to create extra variables. Stata has a command to create panel data line graphs. Otherwise, collapse will give you means by group. Here is an example.

    Code:
    webuse nlswork, clear
    gen wage= exp(ln_wage)
    *GRAPH MEAN WAGE OVER TIME BY RACE
    collapse wage, by(race year)
    xtset race year
    xtline wage, overlay scheme(s1mono) plot2opts(lp(-)) plot3opts(lp(-.-)) leg(row(1))
    [Image: Graph.png, line graph of mean wage over time by race]
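If the group means are wanted as a variable in the original data rather than as a collapsed dataset, one possible approach is egen with the mean() function. The following is only a sketch on the same nlswork example data; the names mean_wage and cell are illustrative and do not come from the thread, and the legend text assumes the usual coding of race in nlswork (1 white, 2 black, 3 other).

```stata
webuse nlswork, clear
gen wage = exp(ln_wage)

* Mean wage for each race-by-year cell, stored on every observation in that cell
bysort race year: egen mean_wage = mean(wage)

* Tag one observation per cell so that each mean is plotted only once
egen byte cell = tag(race year)

* One line per race category
twoway (line mean_wage year if cell & race == 1, sort) ///
       (line mean_wage year if cell & race == 2, sort) ///
       (line mean_wage year if cell & race == 3, sort), ///
       legend(order(1 "white" 2 "black" 3 "other") rows(1))
```

Unlike collapse, this keeps the individual-level observations, which may be convenient if they are needed later in the same session.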



    • #3
      Thank you so much for your reply Andrew!



      • #4
        Andrew Musau

        I was now wondering something else; I also have the variable gor_dv, expressing where a certain individual lives. Would it be possible to generate the same graph you showed me for the yearly average labour income for every yrbracket category and for where each person lives? So I would have the yearly average labour income for people between 16 and 24 years of age for Londoners, the yearly average labour income for people between 16 and 24 years of age for people in Manchester, and so on.

        I tried running the following code
        Code:
        collapse fimngrs_dv [pweight = weight], by( yrbracket year gor_dv )
        but when I go to xtset the id variable with the following code

        Code:
        xtset yrbracket year, yearly
        it gives me the following error: "repeated time values within panel r(451)"
        Last edited by Marco Sarandrea; 20 Aug 2021, 10:10.
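The error reported above can be reproduced on public data. This is a sketch under the assumption that collgrad in nlswork can stand in for gor_dv and race for yrbracket: after collapsing by three variables, each race-year pair appears once per collgrad value, so race and year alone no longer identify a row.

```stata
webuse nlswork, clear
gen wage = exp(ln_wage)
collapse wage, by(race collgrad year)

* Each race-year pair now occurs once per collgrad value
duplicates report race year

* xtset therefore finds repeated time values within the race panels
capture noisily xtset race year
display "xtset return code: " _rc

* The three by() variables together do identify the rows
isid race collgrad year, missok
```

The panel identifier passed to xtset has to be unique within each time value, which is why a single identifier built from both grouping variables is needed.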



        • #5
          Your panel identifier is a combination of two variables, so after the collapse, you want

          Code:
          egen pid = group(yrbracket gor_dv), label
          xtset pid year



          • #6
            I realize that the proposal in #5 will still throw an error. The sequence should be

            Code:
            preserve
            egen pid = group(yrbracket gor_dv), label
            collapse fimngrs_dv [pweight = weight], by(pid year)
            xtset pid year
            xtline fimngrs_dv, overlay
            restore



            • #7
              Andrew Musau

              Thank you again for your replies! The code works perfectly fine.
              However, I realised that, since I have many categories for the variable gor_dv, the graph generated by your suggested code is too full of lines to be interpreted.

              My goal is to ultimately generate one specific graph for every gor_dv category, showing how the yearly average labour income for the different age categories evolved over time for that specific gor_dv category (where gor_dv categories indicate where a person lives).

              The only way I can think of to generate these graphs is to extract smaller datasets from the main dataset, one for every gor_dv category. Then, I would be able to just run the code you suggested in #2 for every single smaller dataset.

              Do you think this would work? How should I generate these datasets? Or do you think there is an easier and better way to get around this problem?




              • #8
                You want to omit the -overlay- option in xtline to get separate graphs.

                Code:
                xtline fimngrs_dv
                An alternative design is to plot each group with the other groups as backdrop. You can do this using fabplot from the Stata Journal, by Nick Cox.

                Code:
                search fabplot
                From the example in #2:

                Code:
                webuse nlswork, clear
                gen wage= exp(ln_wage)
                *GRAPH MEAN WAGE OVER TIME BY RACE
                collapse wage, by(race year)
                fabplot line wage year, by(race) frontopts(lw(thick) lc(black)) xtitle("") scheme(s1color)
                [Image: Graph.png, fabplot of mean wage over time by race]


                "My goal is to ultimately generate one specific graph for every gor_dv category, showing how the yearly average labour income for the different age categories evolved over time for that specific gor_dv category (where gor_dv categories indicate where a person lives)."
                You choose groups using the -if- qualifier. So keeping the variable gor_dv in the dataset, the code could be, e.g.,

                Code:
                xtline fimngrs_dv if gor_dv==1
                * fabplot needs a by() variable; yrbracket must also be kept in the collapsed data
                fabplot line fimngrs_dv year if gor_dv==1, by(yrbracket)
                
                *OR
                xtline fimngrs_dv if gor_dv==1, overlay
                Last edited by Andrew Musau; 21 Aug 2021, 07:07.
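To draw one graph per category without typing each -if- condition by hand, a loop over the levels of the grouping variable is one option. The sketch below again uses nlswork, with collgrad standing in for gor_dv and race for yrbracket (an assumption made only for illustration); the local macro groups and the graph names are hypothetical. Listing the component variables in by() alongside pid keeps them in the collapsed data, so they remain available to -if-.

```stata
webuse nlswork, clear
gen wage = exp(ln_wage)

* One panel identifier per race-collgrad combination
egen pid = group(race collgrad), label

* Listing race and collgrad in by() keeps them in the collapsed data
collapse wage, by(pid race collgrad year)
xtset pid year

* One overlaid graph per collgrad value, each stored under its own name
levelsof collgrad, local(groups)
foreach g of local groups {
    xtline wage if collgrad == `g', overlay ///
        title("collgrad = `g'") name(g_collgrad`g', replace)
}
```

Each graph stays in memory under its name, so it could afterwards be exported or combined as needed.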



                • #9
                  Thank you so much! You really helped me a great deal.
