  • Reshaping a dataset to be able to draw a bar graph of the evolution of a number between 10 years for countries

    Hi Stata people;

    I'm working with the 13.1 version of Stata, and currently, I have this data:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str11 Country int Year long Working_Population_Size
    "Austria"     2025  5484805
    "Austria"     2030  5355853
    "Austria"     2035  5240943
    "Belgium"     2025  6818769
    "Belgium"     2030  6826206
    "Belgium"     2035  6862112
    "Bulgaria"    2025  4002863
    "Bulgaria"    2030  3797467
    "Bulgaria"    2035  3622577
    "Czechia"     2025  6393110
    "Czechia"     2030  6277926
    "Czechia"     2035  6186985
    "Germany"     2025 49745126
    "Germany"     2030 47791925
    "Germany"     2035 46073050
    "Denmark"     2025  3429683
    "Denmark"     2030  3411768
    "Denmark"     2035  3350290
    "Estonia"     2025   793657
    "Estonia"     2030   780194
    "Estonia"     2035   772710
    "Greece"      2025  5976428
    "Greece"      2030  5697049
    "Greece"      2035  5330785
    "Spain"       2025 29364596
    "Spain"       2030 29366370
    "Spain"       2035 28877088
    "Finland"     2025  3156273
    "Finland"     2030  3131609
    "Finland"     2035  3126658
    "France"      2025 37683453
    "France"      2030 37588765
    "France"      2035 37460086
    "Croatia"     2025  2185274
    "Croatia"     2030  2081006
    "Croatia"     2035  2001075
    "Hungary"     2025  5723325
    "Hungary"     2030  5659115
    "Hungary"     2035  5506887
    "Ireland"     2025  3091817
    "Ireland"     2030  3211759
    "Ireland"     2035  3315087
    "Italy"       2025 34393569
    "Italy"       2030 33625169
    "Italy"       2035 32478515
    "Lithuania"   2025  1697684
    "Lithuania"   2030  1575211
    "Lithuania"   2035  1481324
    "Latvia"      2025  1063358
    "Latvia"      2030   982122
    "Latvia"      2035   919676
    "Netherlands" 2025 10548429
    "Netherlands" 2030 10452819
    "Netherlands" 2035 10296713
    "Poland"      2025 22599106
    "Poland"      2030 21864912
    "Poland"      2035 21367713
    "Portugal"    2025  5998876
    "Portugal"    2030  5771592
    "Portugal"    2035  5513315
    "Romania"     2025 10909511
    "Romania"     2030 10601538
    "Romania"     2035  9996723
    "Sweden"      2025  6018870
    "Sweden"      2030  6176791
    "Sweden"      2035  6319983
    "Slovenia"    2025  1231363
    "Slovenia"    2030  1205780
    "Slovenia"    2035  1189029
    "Slovakia"    2025  3325406
    "Slovakia"    2030  3200343
    "Slovakia"    2035  3121427
    end
    These data show the size of the working population (people aged 20 to 64; the values are counts of people, running into the millions) for each country (variable "Country") and for each year for that country (variable "Year"). My goal is a bar graph showing the percentage change in that population between 2025 and 2035 (whether an increase or a decline) for each country. I want to calculate the rate of change for each country and put that on a bar graph. I believe that having the figure for 2030 could also help in getting better calculations.

    Any help please? With many thanks!

  • #2
    There is much not explained here about exactly what you want, but this may help. It seems that you may seek help with the % change calculations too. Here % changes are from 2025 to 2030 and from 2025 to 2035. Change from 2030 to 2035 would be a different calculation.

    As often, I have great difficulty imagining that a bar chart would be a good idea here. Nor can I see that a reshape of the dataset is needed, as it is already in an ideal long layout.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str11 Country int Year long Working_Population_Size
    "Austria"     2025  5484805
    "Austria"     2030  5355853
    "Austria"     2035  5240943
    "Belgium"     2025  6818769
    "Belgium"     2030  6826206
    "Belgium"     2035  6862112
    "Bulgaria"    2025  4002863
    "Bulgaria"    2030  3797467
    "Bulgaria"    2035  3622577
    "Czechia"     2025  6393110
    "Czechia"     2030  6277926
    "Czechia"     2035  6186985
    "Germany"     2025 49745126
    "Germany"     2030 47791925
    "Germany"     2035 46073050
    "Denmark"     2025  3429683
    "Denmark"     2030  3411768
    "Denmark"     2035  3350290
    "Estonia"     2025   793657
    "Estonia"     2030   780194
    "Estonia"     2035   772710
    "Greece"      2025  5976428
    "Greece"      2030  5697049
    "Greece"      2035  5330785
    "Spain"       2025 29364596
    "Spain"       2030 29366370
    "Spain"       2035 28877088
    "Finland"     2025  3156273
    "Finland"     2030  3131609
    "Finland"     2035  3126658
    "France"      2025 37683453
    "France"      2030 37588765
    "France"      2035 37460086
    "Croatia"     2025  2185274
    "Croatia"     2030  2081006
    "Croatia"     2035  2001075
    "Hungary"     2025  5723325
    "Hungary"     2030  5659115
    "Hungary"     2035  5506887
    "Ireland"     2025  3091817
    "Ireland"     2030  3211759
    "Ireland"     2035  3315087
    "Italy"       2025 34393569
    "Italy"       2030 33625169
    "Italy"       2035 32478515
    "Lithuania"   2025  1697684
    "Lithuania"   2030  1575211
    "Lithuania"   2035  1481324
    "Latvia"      2025  1063358
    "Latvia"      2030   982122
    "Latvia"      2035   919676
    "Netherlands" 2025 10548429
    "Netherlands" 2030 10452819
    "Netherlands" 2035 10296713
    "Poland"      2025 22599106
    "Poland"      2030 21864912
    "Poland"      2035 21367713
    "Portugal"    2025  5998876
    "Portugal"    2030  5771592
    "Portugal"    2035  5513315
    "Romania"     2025 10909511
    "Romania"     2030 10601538
    "Romania"     2035  9996723
    "Sweden"      2025  6018870
    "Sweden"      2030  6176791
    "Sweden"      2035  6319983
    "Slovenia"    2025  1231363
    "Slovenia"    2030  1205780
    "Slovenia"    2035  1189029
    "Slovakia"    2025  3325406
    "Slovakia"    2030  3200343
    "Slovakia"    2035  3121427
    end
    
    local w Working_Population_Size
    bysort Country (Year) : gen pc_change = 100 * (`w' - `w'[1]) / `w'[1]
    separate pc_change, by(Year) veryshortlabel 
    
    graph dot (mean) pc_change203?, over(Country, label(labsize(small)) sort(pc_change2035)) ///
    marker(1, ms(Oh) msize(medlarge)) marker(2, ms(+) msize(medlarge)) ///
    legend(row(1) pos(6) order(1 "2030" 2 "2035")) yli(0, lp(solid) lc(gs12) lw(vthin)) ysc(alt) ///
    ytitle("% change in working population, 2025 to 2030 and 2035")
    [Figure: change.png, dot chart of % change in working population, 2025 to 2030 and 2035, by country]
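The post above notes that the change from 2030 to 2035 would be a different calculation. A minimal sketch of that step-by-step version follows, assuming the same layout as the example data (one row per country and year). Only two countries are copied here so the block runs on its own, and `pc_step` is a hypothetical variable name, not one used in the thread.

```stata
* Sketch only: two countries copied from the example data.
* Note that -clear- drops whatever is in memory.
clear
input str11 Country int Year long Working_Population_Size
"Austria" 2025 5484805
"Austria" 2030 5355853
"Austria" 2035 5240943
"Ireland" 2025 3091817
"Ireland" 2030 3211759
"Ireland" 2035 3315087
end

local w Working_Population_Size

* pc_step (hypothetical name) is the % change since the previous year
* in the data for the same country. It is missing for 2025 because
* there is no earlier value within each country.
bysort Country (Year) : gen pc_step = 100 * (`w' - `w'[_n-1]) / `w'[_n-1]

list, sepby(Country) noobs
```

The only difference from the calculation in #2 is the reference value: `[_n-1]` points at the previous year within each country, whereas `[1]` always points at 2025.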



    • #3
      Another take is just to show line graphs scaled by the 2025 value. But then 24 superimposed lines would just be spaghetti, so you need something else: either simply a panel (facet) for each country or, more subtly, a front-and-back plot in which each series is shown in turn in front with the others as backdrop.

      The code here uses myaxis and fabplot as written up in the Stata Journal.
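Both commands are community-contributed, so they need to be installed before the code will run. A minimal sketch, assuming an internet connection and that both packages are still distributed through SSC:

```stata
* Assumption: myaxis and fabplot are available from SSC.
ssc install myaxis
ssc install fabplot
```

If either is already installed, `ssc install` with the `replace` option updates it.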

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str11 Country int Year long Working_Population_Size
      "Austria"     2025  5484805
      "Austria"     2030  5355853
      "Austria"     2035  5240943
      "Belgium"     2025  6818769
      "Belgium"     2030  6826206
      "Belgium"     2035  6862112
      "Bulgaria"    2025  4002863
      "Bulgaria"    2030  3797467
      "Bulgaria"    2035  3622577
      "Czechia"     2025  6393110
      "Czechia"     2030  6277926
      "Czechia"     2035  6186985
      "Germany"     2025 49745126
      "Germany"     2030 47791925
      "Germany"     2035 46073050
      "Denmark"     2025  3429683
      "Denmark"     2030  3411768
      "Denmark"     2035  3350290
      "Estonia"     2025   793657
      "Estonia"     2030   780194
      "Estonia"     2035   772710
      "Greece"      2025  5976428
      "Greece"      2030  5697049
      "Greece"      2035  5330785
      "Spain"       2025 29364596
      "Spain"       2030 29366370
      "Spain"       2035 28877088
      "Finland"     2025  3156273
      "Finland"     2030  3131609
      "Finland"     2035  3126658
      "France"      2025 37683453
      "France"      2030 37588765
      "France"      2035 37460086
      "Croatia"     2025  2185274
      "Croatia"     2030  2081006
      "Croatia"     2035  2001075
      "Hungary"     2025  5723325
      "Hungary"     2030  5659115
      "Hungary"     2035  5506887
      "Ireland"     2025  3091817
      "Ireland"     2030  3211759
      "Ireland"     2035  3315087
      "Italy"       2025 34393569
      "Italy"       2030 33625169
      "Italy"       2035 32478515
      "Lithuania"   2025  1697684
      "Lithuania"   2030  1575211
      "Lithuania"   2035  1481324
      "Latvia"      2025  1063358
      "Latvia"      2030   982122
      "Latvia"      2035   919676
      "Netherlands" 2025 10548429
      "Netherlands" 2030 10452819
      "Netherlands" 2035 10296713
      "Poland"      2025 22599106
      "Poland"      2030 21864912
      "Poland"      2035 21367713
      "Portugal"    2025  5998876
      "Portugal"    2030  5771592
      "Portugal"    2035  5513315
      "Romania"     2025 10909511
      "Romania"     2030 10601538
      "Romania"     2035  9996723
      "Sweden"      2025  6018870
      "Sweden"      2030  6176791
      "Sweden"      2035  6319983
      "Slovenia"    2025  1231363
      "Slovenia"    2030  1205780
      "Slovenia"    2035  1189029
      "Slovakia"    2025  3325406
      "Slovakia"    2030  3200343
      "Slovakia"    2035  3121427
      end
      
      local w Working_Population_Size
      
      bysort Country (Year) : gen scaled = 100 * `w' / `w'[1]
      
      myaxis Country2=Country, sort(mean scaled) subset(Year==2035)
      
      twoway connect scaled Year, msize(medlarge) by(Country2, compact note("")) xsc(r(2024.5 2035.5))  ytitle(Working population as % of 2025) name(line1, replace)
      
      fabplot line scaled Year, by(Country2, compact) frontopts(recast(connect) msize(medlarge) lw(thick)) ytitle(Working population as % of 2025) lc(gs12) xsc(r(2024.5 2035.5)) name(line2, replace)
      [Figure: line1.png, connected line plots of working population as % of 2025, one panel per country]

      [Figure: line2.png, front-and-back plots of working population as % of 2025, one panel per country with the other series as backdrop]



      • #4
        Nick Cox Thanks for the help. To respond to what you suggested in #2: I'm interested in the evolution from 2025 to 2035. This is about the change in the size of the working class population, so a short period (5 years, from 2025 to 2030 or from 2030 to 2035) won't give much significant information; I'm interested in the change over the 10 years from 2025 to 2035. Yet I do believe that considering the 2030 value in the calculations could give more accurate rates of change from 2025 to 2035 (correct me on this if I'm wrong). I think it is possible to incorporate the 2030 figures in the calculation (perhaps through a mean calculation or something like that).
        Still, the dot chart you suggested is great. I do believe that a detailed view of the changes from 2025 to 2030 and from 2030 to 2035 gives more information (even though the trend didn't change from 2030 to 2035 for all the countries). Yet don't you think that such a dot chart could be hard for the common people to read and understand? That's why I always prefer bar charts; I believe they are simpler to read, but it is just a thought.



        • #5
          I am not an economist or demographer but have hung around Statalist enough to know some basics here, or so I think.

          Many of your questions are about your data which is fair enough. But the calculations have already been made. I have no idea exactly how they were made but presumably using some model or systematic method.

          So I have no idea why using 2030 values would make anything more "accurate" as the calculations have already been made. FWIW, calculating next year's working population reliably seems to me pretty hard for anywhere, and more generally 6 or 7 or 8 digit precision for projections over a decade seems pretty silly, but we are presumably all expected to know that too.
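One way to see why the 2030 values cannot sharpen the 10-year figure: the ratio of 2035 to 2025 is just the product of the two five-year ratios, so the 2030 value cancels out. A minimal sketch, assuming exactly three rows per country (2025, 2030, 2035) and using two countries copied from the example data; the `ratio_*` variable names are hypothetical.

```stata
* Sketch only: two countries copied from the example data.
clear
input str11 Country int Year long Working_Population_Size
"Austria" 2025 5484805
"Austria" 2030 5355853
"Austria" 2035 5240943
"Ireland" 2025 3091817
"Ireland" 2030 3211759
"Ireland" 2035 3315087
end

local w Working_Population_Size

* Within each country, rows 1, 2, 3 are 2025, 2030, 2035 after sorting.
bysort Country (Year) : gen double ratio_25_30 = `w'[2] / `w'[1]
by Country : gen double ratio_30_35 = `w'[3] / `w'[2]
by Country : gen double ratio_25_35 = `w'[3] / `w'[1]

* The 2030 value appears once in a numerator and once in a denominator,
* so the product of the two steps equals the 10-year ratio.
assert reldif(ratio_25_35, ratio_25_30 * ratio_30_35) < 1e-12

list Country ratio_* if Year == 2035, noobs
```

Averaging the two five-year percentage changes would give a different number, but it would be a summary of the same three projected values, not a more accurate one.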

          Similarly, I am at a loss to know what other calculations you are interested in. Scaling by population to relative values apart, there is just a choice of whether to use 2030 values, 2035 values, or both in graphs, and all my suggestions use both.

          Working class population and working population would in English-language social science terms mean quite different things.

          I don't see why dot charts should be hard for common people to read. I would like them to have a chance. In this thread the difficulty lies elsewhere. I don't think you have explained how a bar chart might or should be drawn at all, let alone one that showed clearly changes over two time periods for 24 countries.
          Last edited by Nick Cox; 13 Jul 2025, 07:25.
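For readers who still want the single 2025 to 2035 bar chart asked about in #1, here is a minimal sketch. It is not the approach recommended in the replies, and it shows only one time period. It assumes the first and last rows within each country are 2025 and 2035; `pc_2035` is a hypothetical variable name, and only four countries are copied from the example data so the block runs on its own.

```stata
* Sketch only: four countries copied from the example data.
clear
input str11 Country int Year long Working_Population_Size
"Austria" 2025 5484805
"Austria" 2030 5355853
"Austria" 2035 5240943
"Ireland" 2025 3091817
"Ireland" 2030 3211759
"Ireland" 2035 3315087
"Latvia"  2025 1063358
"Latvia"  2030  982122
"Latvia"  2035  919676
"Sweden"  2025 6018870
"Sweden"  2030 6176791
"Sweden"  2035 6319983
end

local w Working_Population_Size

* % change from the first year (2025) to the last year (2035) per country.
bysort Country (Year) : gen pc_2035 = 100 * (`w'[_N] - `w'[1]) / `w'[1]

* One bar per country, sorted by the size of the change.
* Restricting to Year == 2035 leaves one observation per bar.
graph hbar (mean) pc_2035 if Year == 2035, ///
    over(Country, sort(1) label(labsize(small))) ///
    yline(0, lc(gs12)) ///
    ytitle("% change in working population, 2025 to 2035")
```

Horizontal bars keep the country names readable; with all 24 countries the same command applies unchanged to the full dataset.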
