
  • Creating a change variable

    Hello,

    I have entered macro level data relating to the employment rates by occupation and sex as a percentage of the total employment level in the economy via the data editor in Stata. The variables that I have now are: the survey year, sex, occupation and the employment rates by occupation and sex. I now want to create a new variable that will calculate the change in this rate since last year for each sex and occupation. For example, in 2005 what is the % change in the employment level of women working in teaching since 2004? Is there a way that Stata can do this calculation for me so I don't have to do it manually?
    Thanks in advance

  • #2
    So, this is time series (not panel) data? I think you want something like

    Code:
    tsset year
    gen pctchange = emplwomen/L.emplwomen - 1
    Once you tsset the data, the L. notation lets you access the lagged values of variables. So, for 2005, it would take the 2005 value of the variable divided by the 2004 value.

    If, say, you had this kind of information for multiple countries, it would be a panel study, and the first command would be something like xtset country year.

    Looking at your question, it occurs to me that you might have multiple records for each year, e.g. 1 for women in farming, another for men in farming, etc. So, you may need to create a panel id, i.e. a unique value for each combination of gender and occupation. The code might be something like

    Code:
    egen panelid = group(sex occupation)
    xtset panelid year
    gen...
    If this doesn't work, then maybe show a few lines of your data and what you want to have happen.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Many thanks for this, Richard. I am not sure it applies in my case, though. The reason I am creating a new macro-level spreadsheet in Stata and generating this new variable is that I want to merge the change in the employment rate by sex and occupation into my individual-level panel dataset. The data I have now are not yet in the panel dataset I will use for my analysis. I am not going to analyse this dataset as such; I just want to create the change variable, merge it into my individual-level dataset and use it as an independent variable in my individual-level regression. Here is an example of the data in my new spreadsheet, from which I want to derive the change variable:
      Code:
      isco   yearofsurvey   sex   employmentrate_occupation_sex
      0      2005           1     .9343995
      0      2005           2     .0861992
      1      2005           1     2.271349
      1      2005           2     .3933558
      2      2005           1     6.39397
      2      2005           2     6.612916
      0      2006           1     .9093683
      0      2006           2     .1181143
      1      2006           1     2.730904
      1      2006           2     .5259166
      2      2006           1     6.349624
      2      2006           2     7.250315
      thanks again



      • #4
        It still looks to me like my suggestion would work. The code would be

        Code:
        gen nrec = _n
        egen panelid = group(isco sex)
        sort panelid yearofsurvey
        xtset panelid yearofsurvey
        gen pctchange = employmentrate_occupation_sex/L.employmentrate_occupation_sex - 1
        sort nrec
        This will add pctchange to each of your current records, and you can then remerge with your original panel data. It probably isn't necessary, but I added nrec so you can get back to your original sort order.
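
        As a self-contained check, the rows posted in #3 can be typed in with input and run through the same steps. This is only a sketch: it assumes nothing beyond the twelve rows and the variable names shown in #3, and it relies on xtset to put the data in panel-by-year order.

        ```stata
        clear
        input isco yearofsurvey sex employmentrate_occupation_sex
        0 2005 1 .9343995
        0 2005 2 .0861992
        1 2005 1 2.271349
        1 2005 2 .3933558
        2 2005 1 6.39397
        2 2005 2 6.612916
        0 2006 1 .9093683
        0 2006 2 .1181143
        1 2006 1 2.730904
        1 2006 2 .5259166
        2 2006 1 6.349624
        2 2006 2 7.250315
        end

        * One panel per occupation-by-sex combination
        egen panelid = group(isco sex)
        xtset panelid yearofsurvey

        * Proportional change since the previous year; missing in 2005 (no lag available)
        gen pctchange = employmentrate_occupation_sex/L.employmentrate_occupation_sex - 1

        list isco sex yearofsurvey pctchange, sepby(panelid)
        ```

        The 2005 rows come out missing because there is no 2004 value to divide by; only the 2006 rows get a number.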

        Anyway, why don't you think it will work? Maybe I am missing something. You could also just try it and see if it works.



        • #5
          If you are going to merge with an individual level data set, my guess is you will do something like

          Code:
          use myindivdata
          merge m:1 isco sex yearofsurvey using sexoccyeardata
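
          To see what that merge does before running it on real files, here is a small sketch that builds both datasets in memory. The individual ids and rows are made up for illustration, the pctchange values are rounded from the rows in #3, and a tempfile stands in for sexoccyeardata.

          ```stata
          * Macro-level file: one row per occupation, sex and year
          clear
          input isco yearofsurvey sex pctchange
          0 2006 1 -.0268
          0 2006 2 .3702
          end
          tempfile macro
          save `macro'

          * Hypothetical individual-level file: many people per occupation, sex and year
          clear
          input id isco yearofsurvey sex
          1 0 2006 1
          2 0 2006 1
          3 0 2006 2
          4 1 2006 1
          end

          * m:1 because many individuals share one macro-level row
          merge m:1 isco sex yearofsurvey using `macro', keep(master match)

          * _merge == 1 flags individuals with no macro-level row (here, id 4)
          tabulate _merge
          ```

          Checking _merge after the merge is a quick way to spot occupation, sex and year combinations that are missing from the macro-level file.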


          • #6
            Hi Richard and thanks again.
            I tried your code and I did not seem to be getting the correct rate (I was double checking by manually deducting the 2005 from the 2006 rate etc). I then noticed that you divide the current rate of employment by occupation and sex by its lag minus one by doing
            Code:
            gen pctchange = employmentrate_occupation_sex/L.employmentrate_occupation_sex - 1
            and I am confused as to why. Doesn't that mean I am taking the difference between 2007 and 2005 instead of 2007 and 2006 (i.e. going back 2 periods-the lag minus 1 instead of just 1)? Instead, I tried doing
            Code:
            . gen pctchange = employmentrate_occupation_sex- L.employmentrate_occupation_sex
            and I seem to be getting the correct numbers. Does what I did make sense? Thanks again.



            • #7
              No, I am just going back one time period. The minus 1 literally means to subtract 1 from the calculated value, e.g. if the current and lagged values were 63 and 60, the ratio would be 1.05, and 1.05 - 1 = .05, which (when multiplied by 100) is the percent increase in the value across time.

              The main difference is that I thought your question implied forming a ratio between two values, whereas you thought subtracting one from the other was the right thing to do. You said originally: "in 2005 what is the % change in the employment level of women working in teaching since 2004?" There is some ambiguity in what that means, and I have been told that I should make statements like "percentage point change" rather than "% change". For example, if employment went from 60% to 63%, you could say that there was a 5% increase (63/60 = 1.05). Or you could say that the employment rate went up by three percentage points.
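
              The two readings can be put side by side with the same 60 and 63. This is just the arithmetic above written as Stata scalars; the scalar names are made up for the illustration.

              ```stata
              * Two ways to describe a move from 60 to 63
              scalar previous = 60
              scalar current  = 63

              * Ratio-based: how large is the change relative to the starting value?
              display "percent change:          " 100 * (current/previous - 1)

              * Difference-based: how many points did the rate itself move?
              display "percentage-point change: " current - previous
              ```

              With a variable in an xtset dataset, the first corresponds to x/L.x - 1 and the second to x - L.x.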

              In any event, you know what you want, so if you are happy with the results then great.


              • #8
                Incidentally, if for some reason you did want to do a two year lag, you would use L2. instead, e.g.

                Code:
                gen pctchange = employmentrate_occupation_sex - L2.employmentrate_occupation_sex


                • #9
                  Many thanks Robert it is clear now



                  • #10
                    Richard Williams
                    Hi Richard,

                    I have panel data with a similar situation.
                    The data set contains company identifiers, fiscal year end and total assets.
                    I'm looking to create a change variable indicating the change of total assets from year t-1 to year t.

                    an extract of the data looks like this:
                    isin total_assets fye
                    "DE0003304002" 564254 Dec 31 05
                    "DE0003304002" 621866 17166 Dec 31 06
                    "DE0003304002" 967840 17531 Dec 31 07
                    "DE0003304002" 1044262 17897 Dec 31 08
                    "DE0003304101" 232795 17105 Oct 31 06
                    "DE0003304101" 270419 17470 Oct 31 07
                    "DE0003304101" 296583 17836 Oct 31 08
                    "DE0003304101" 290596 18201 Oct 31 09
                    Following your solution for Christiana, I wrote this code:

                    egen long panelid = group(isin)
                    xtset panelid fye
                    bysort panelid fye:gen chTA = total_assets-L.total_assets

                    It reports "1,034 missing values generated", which suggests the code was unsuccessful.

                    Do you have any suggestions?

                    Thank you in advance
                    Last edited by Lang Ding; 26 May 2017, 13:46.



                    • #11
                      How many panels do you have? The first record for each panel should be missing because there is no lagged value for it.

                      Also, I don't think you want the bysort prefix on the gen command. In my example the sorting was done before the xtset, not after.
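
                      One more thing worth checking, offered as a guess from the extract in #10 rather than a diagnosis: if fye is a daily date, then after xtset panelid fye the L. operator looks for the observation exactly one day earlier, which never exists in annual data, so every lag is missing. Here is a minimal sketch of one workaround, assuming each company has at most one fiscal year end per calendar year. The variable fyear is a made-up name, and the rows are the ones from #10 that show a numeric date.

                      ```stata
                      clear
                      input str12 isin total_assets fye
                      "DE0003304002" 621866 17166
                      "DE0003304002" 967840 17531
                      "DE0003304002" 1044262 17897
                      "DE0003304101" 232795 17105
                      "DE0003304101" 270419 17470
                      "DE0003304101" 296583 17836
                      "DE0003304101" 290596 18201
                      end
                      format fye %td

                      egen long panelid = group(isin)

                      * Use the calendar year of the fiscal year end as the time variable,
                      * so that L. means "previous year" rather than "previous day"
                      gen fyear = year(fye)
                      xtset panelid fyear

                      gen chTA = total_assets - L.total_assets
                      list isin fyear total_assets chTA, sepby(panelid)
                      ```

                      With this setup only the first record of each company should be missing. If a company changes its fiscal year end, two records can fall in the same calendar year and xtset will complain about repeated time values.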


                      • #12
                        The question in #10 was re-asked a half hour later, and received an answer shortly after that, at http://www.statalist.org/forums/foru...-between-years.

                        Bumping is discouraged here. This is not a help-desk with employees paid to answer user questions. This is a forum where people come and spend as much or as little time as they please. Patience is a virtue; and re-asking within a half hour is simply unreasonable in this setting. If several hours were to go by with no response, then asking again might make sense. But even then, before just re-posting the same question, it would be better to consider why no response has been forthcoming. That might lead you to revise the question when re-posting. The FAQ are a must-read and contain excellent advice on how to pose questions that have a good probability of drawing a timely and helpful response.
