Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating the percentage changes with longitudinal data (not a panel)

    Hi,

    My name is Jiwan. I'm currently working on Korean Household Income and Expenditure Survey, to see the effect of COVID-19 on households' changes in their income and public transfer by comparing 2019 vs. 2020 and 1Q vs. 2Q, with subgroups of income quantile and employment status.
    It is longitudinal data, but not panel data, with long-form.
    I would like to see the descriptive statistics of how their income has changed, calculating the percentage changes.

    First, I would like to make a table for the percentage changes in income for 2020-1Q(compared with 2019-1Q) and 2020-2Q(compared with 2019-2Q) separately, with a subgroup of 20-quantile. I used this code:

    Code:
    bys year qt: sum pincome if q20_all==1
    bys year qt: sum pincome if q20_all==2
    bys year qt: sum pincome if q20_all==3
    bys year qt: sum pincome if q20_all==4
    bys year qt: sum pincome if q20_all==5
    but I need to do extra work in excel to calculate the percentage changes.
    I'm wondering if there is a more effective way to get what I want within Stata.

    Second, I would like to examine how the stimulus payment was effective to compensate for the income loss.
    So I want to calculate Compensation=benefit/income loss.

    Then I need to calculate the decrease in income and the increase in social benefit, each amount compared to the previous year, 2019.
    That is, I want to calculate: [Social benefit(2020)-Social benefit(2019)]/[Income(2020)-Income(2019)]
    Also, I'd like to see how this figures differ by the subgroups: employment status and 20-quantile.

    Code:
    bys year q20_all: sum pincome if empstat1==1
    bys year q20_all: sum benefit if empstat1==1
    
    bys year q20_all: sum pincome if empstat1==2
    bys year q20_all: sum benefit if empstat1==2
    
    bys year q20_all: sum pincome if empstat1==3
    bys year q20_all: sum benefit if empstat1==3
    
    bys year q20_all: sum pincome if empstat1==4
    bys year q20_all: sum benefit if empstat1==4
    
    bys year q20_all: sum pincome if empstat1==5
    bys year q20_all: sum benefit if empstat1==5
    Again, I wonder if there exists a way to get this number without doing extra works in Excel.

    Thank you so much for your time and consideration.
    I'm looking forward to hearing from anyone who would kindly help me!

    Jiwan

  • #2
    but I need to do extra work in excel to calculate the percentage changes.
    Yes, you can calculate percentage changes in Stata. And you should NEVER do data analysis in Excel, except for playing around. Any serious work should be done in software that leaves an audit trail, so that others can verify what you did (and so that you, yourself, can explain and reproduce the work should the need arise at some future time when you no longer remember the details. You should never use Excel for data analysis: it is fine for sharing data among people who do not use the same statistical package, and it is good for creating certain kinds of data displays. But it is not an appropriate analysis tool. Not only should you never use it for that yourself, you should never accept or rely on work produced by others in Excel.

    With that out of my system, I would be happy to show you how to do this, but I don't understand the question. Percentage change between what and what? You're calculating a bunch of summary statistics for variables pincome and benefit in a large number of subsets of the data, but you do not explain which ones to pair up for the purpose of calculating a percentage change. Is it year on year? or q20_all on q20_all? Or empstat1 on empstat1? Or qt on qt? Or perhaps something else?

    Also, as you are asking for help with code, to avoid wasting your time and mine, show example data. That way, I can write code that will work with the data you have, rather than the data I imagine you might have. To show example data, use the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hi Clyde,

      I appreciate your comments and advice. I understand what you warn about using Excel and I totally get it.

      For further explains the variables,

      year: either 2019 or 2020
      qt: quarter (there are either 1st quarter (Jan-Mar) and 2nd quarter (April - June)
      q20_all : 5 quantile based on income for all households
      empstat1: 1(regular worker), 2(temporary worker), 3(daily worker), 4(self-employe with employee), 5(self-employee without employee)
      pincome: primary income
      benefit: social benefit

      To share the example data, it is as follows,

      I used the code
      Code:
      dataex year qt q20_all empstat1 pincome benefit
      and get this:

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(year qt) byte(q20_all empstat1) float pincome long benefit
      2019 1 1 .   233333       0
      2019 1 3 2  3645513       0
      2019 1 1 .        0       0
      2019 1 3 1  2619490  566667
      2019 1 1 1  1539387       0
      2019 1 1 .  1200000       0
      2019 1 2 .  1500000       0
      2019 1 5 1  7079654  500000
      2019 1 3 1  3085333       0
      2019 1 3 1  3385364       0
      2019 1 1 5  1100000       0
      2019 1 4 .  5000000  100000
      2019 1 5 1  7527653  100000
      2019 1 5 5 10483333       0
      2019 1 5 1  6997680       0
      2019 1 4 1  4830507       0
      2019 1 2 5  2270000       0
      2019 1 2 3  2261523  250000
      2019 1 3 1  3152387       0
      2019 1 1 .  1300000       0
      2019 1 3 5  3586500       0
      2019 1 5 .  6051654 1597820
      2019 1 4 1  4980000       0
      2019 1 5 1 10364918  302137
      2019 1 3 5  2923334       0
      2019 1 1 .   450000       0
      2019 1 3 .        0       0
      2019 1 5 2 11530000       0
      2019 1 2 3  1733333       0
      2019 1 3 5  3183333       0
      2019 1 5 3  8433333       0
      2019 1 1 .        0  264450
      2019 1 1 2   898333       0
      2019 1 1 .   200000  607980
      2019 1 1 3  1586667       0
      2019 1 5 1 10855284  100000
      2019 1 1 .        0       0
      2019 1 5 1  9524372       0
      2019 1 5 5 1.21e+07       0
      2019 1 5 1  8200000       0
      2019 1 5 1 12514927       0
      2019 1 2 5  1283333       0
      2019 1 4 5  4646666       0
      2019 1 2 3  2500000       0
      2019 1 1 .        0       0
      2019 1 5 1 11588484       0
      2019 1 5 1 11277657       0
      2019 1 1 .   900000       0
      2019 1 4 2  5093957       0
      2019 1 5 1 1.00e+07       0
      2019 1 1 .   866667       0
      2019 1 2 3  1600000       0
      2019 1 3 1  2000000       0
      2019 1 5 1  7722733       0
      2019 1 3 2  3800000       0
      2019 1 3 5  3500000       0
      2019 1 1 2  1200000       0
      2019 1 3 1  3981667       0
      2019 1 3 5  4083667       0
      2019 1 1 .   266667       0
      2019 1 1 3   763380       0
      2019 1 1 .        0       0
      2019 1 2 2  1760018       0
      2019 1 1 3   505000       0
      2019 1 3 1  3479820  199500
      2019 1 4 1  4749500       0
      2019 1 3 5  2100000 1121320
      2019 1 1 .   700000       0
      2019 1 2 5  2550000       0
      2019 1 2 5  2441867       0
      2019 1 4 2  3750000       0
      2019 1 2 .  1390000  730617
      2019 1 5 1 13016667       0
      2019 1 5 5  6285000       0
      2019 1 5 1  6118141       0
      2019 1 2 2  2197500       0
      2019 1 3 2  2139367 1114560
      2019 1 1 .        0       0
      2019 1 1 .        0       0
      2019 1 1 3   800000       0
      2019 1 1 .   166667       0
      2019 1 1 .        0 1122843
      2019 1 2 2  1616667  200490
      2019 1 1 3  1030000       0
      2019 1 4 2  5040000       0
      2019 1 2 .    78067       0
      2019 1 1 7   892500       0
      2019 1 1 5   766667       0
      2019 1 1 .   500000  245473
      2019 1 1 2  1349700       0
      2019 1 4 1  4623607       0
      2019 1 2 5  2328300       0
      2019 1 4 1  4889000       0
      2019 1 5 1  7840000       0
      2019 1 2 2  2000000       0
      2019 1 1 .        0       0
      2019 1 5 1 16716667       0
      2019 1 4 5  4716667       0
      2019 1 5 1 16666670       0
      2019 1 2 1  2073333       0
      end
      label values q20_all v108CD
      label def v108CD 1 "1st quantile", modify
      label def v108CD 2 "2nd quantile", modify
      label def v108CD 3 "3rd quantile", modify
      label def v108CD 4 "4th quantile", modify
      label def v108CD 5 "5th quantile", modify
      label values empstat1 v20CD
      label def v20CD 1 "regular worker", modify
      label def v20CD 2 "temporary worker", modify
      label def v20CD 3 "daily worker", modify
      label def v20CD 4 "self-employee with employee", modify
      label def v20CD 5 "self-employee without employee", modify
      ------------------ copy up to and including the previous line ------------------

      To further specify what I want to calculate, I show the table that I want to make:
      for the percentage changes for pincome values of each period (comparing 2019-1Q to 2020-1Q and 2019-2Q to 2020-2Q)
      Click image for larger version

Name:	table format..jpg
Views:	1
Size:	14.9 KB
ID:	1579578



      For the second question, I would like to calculate changes in pincome and benefit compared 2020 to 2019, with subgroups by employment status.
      Again, the table that I would like to make is as follows:
      Click image for larger version

Name:	table format2..jpg
Views:	1
Size:	43.8 KB
ID:	1579579


      I apologize if there are still some parts that I didn't articulate and would be happy to clarify more on my data or question.
      I sincerely appreciate your help!

      Jiwan

      Comment


      • #4
        OK. Your example data includes only year 2019 and qt 1, so in order to develop and test the code, I changed those variables so I would have both years and 2 quarters to work with. I also noticed that your empstat1 variable has a number of missing values, and also there is one observation where it is coded as "7", which seems odd. But I didn't change any of those things.

        The hardest part of this is re-arranging the data layout so as to get the tables displayed the way you want. The actual calculation of the percent change and compensation rate variables is very simple, as you will see below.

        Code:
        assert inlist(year, 2019, 2020)
        //  FIRST TABLE
        preserve
        collapse (mean) pincome, by(qt q20_all year)
        by qt q20_all (year): gen pct_change = 100*(pincome[2]-pincome[1])/pincome[1]
        reshape wide pincome, i(qt q20_all) j(year)
        rename pct_change pincomepct_change
        reshape long pincome, i(qt q20_all) j(stub) string
        tabdisp stub qt q20_all, c(pincome)
        
        restore, preserve
        //  SECOND TABLE
        collapse (mean) pincome benefit, by(year empstat1)
        by empstat1 (year), sort: gen benefit_change = benefit[2] - benefit[1]
        by empstat1 (year): gen pincome_change = pincome[2] - pincome[1]
        gen compensation_rate = benefit_change/pincome_change
        rename (benefit* pincome* compensation_rate ) cell=
        reshape long cell, i(empstat1 year) j(stub) string
        replace cell = . if year == 2020 & !inlist(stub, "pincome", "benefit")
        label define stub 1 "benefit"   2   "pincome"   3   "benefit_change"    ///
            4   "pincome_change"    5   "compensation_rate"
        encode stub, gen(stub1) label(stub1)
        tabdisp stub1 year empstat1, c(cell)
        restore
        Note: the formatting of the numbers in these tables is left to Stata's default. You can specify a -format()- option in the -tabdisp- commands to get them to look nicer.

        Comment


        • #5
          Clyde,

          Thank you so much for such incredible advice!

          While I was waiting for your reply, I tried putexcel to get what I want:

          Code:
          *1st quantile
          quietly: sum p_inc [aw=wgt] if yearqt==201914 & q20_all==1
          scalar mean1=r(mean)
          putexcel B4=(mean1)
          
          quietly: sum p_inc [aw=wgt] if yearqt==202014 & q20_all==1
          scalar mean2=r(mean)
          putexcel B5=(mean2)
          
          putexcel B6=((mean2-mean1)/mean1*100)
          
          quietly: sum p_inc [aw=wgt] if yearqt==201924 & q20_all==1
          scalar mean1=r(mean)
          putexcel C4=(mean1)
          
          quietly: sum p_inc [aw=wgt] if yearqt==202024 & q20_all==1
          scalar mean2=r(mean)
          putexcel C5=mean2
          
          putexcel C6=((mean2-mean1)/mean1*100)
          Code:
          quietly: sum benefit [aw=wgt] if year==2019 & worktype==1
          scalar mean1=r(mean)
          putexcel D8=(mean1)
          
          quietly: sum benefit [aw=wgt] if year==2020 & worktype==1
          scalar mean2=r(mean)
          putexcel E8=(mean2)
          
          putexcel D10=(mean2-mean1)
          
          quietly: sum p_inc [aw=wgt] if year==2019 & worktype==1
          scalar mean3=r(mean)
          putexcel D9=(mean3)
          
          quietly: sum p_inc [aw=wgt] if year==2020 & worktype==1
          scalar mean4=r(mean)
          putexcel E9=(mean4)
          
          putexcel D11=(mean4-mean3)
          
          putexcel D12=((mean2-mean1)/(mean4-mean3))
          But of course, it is also very inefficient, as I need to name all the cells.
          So I was wondering how to make it more efficient, and what you have just shared is exactly what I needed.
          You just saved so much of my efforts and time, including not only this analysis but also all my future works.

          I appreciate your consideration and kindness to share your knowledge!

          Jiwan

          Comment


          • #6
            I appreciate your consideration and kindness to share your knowledge!
            I don't normally respond to a thank-you remark, but I was struck by your mention of sharing knowledge. Sharing knowledge is exactly what this Forum, and others like it elsewhere, are about.

            Every day, I am reminded that if I have a fund of useful knowledge to share, it is because others have been willing to share their knowledge with me.

            Comment

            Working...
            X