Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting the column percentages from two tab

    Dear Statalist, I need to calculate the difference between two percentages coming from two different tables for two years (see below). My aim is to get for instance: 50.21-43.48 which is the difference between the percentages of positions 11 (table 2020) and 11 (table 2017). The same for 63.92-57.74, and so on...

    Is there an easy way (avoiding to calculate first the variables for the 6 column totals for each table one by one, then building the count for each category and dividing all this. And thus making the difference between such a variables) for obtaining such a table of those differences in Stata? I give you a dataset example below for if it helps.

    Thanks in advance.

    Code:
               |                                x2
            x1 |         1          2          3          4          5          6 |     Total
    -----------+------------------------------------------------------------------+----------
             1 |       427      2,325        594        837        684          5 |     4,872 
               |     43.48      57.74      55.93      61.27      60.32      33.33 |     56.74 
    -----------+------------------------------------------------------------------+----------
             2 |       386      1,319        352        469        376          5 |     2,907 
               |     39.31      32.75      33.15      34.33      33.16      33.33 |     33.86 
    -----------+------------------------------------------------------------------+----------
             3 |       169        383        116         60         74          5 |       807 
               |     17.21       9.51      10.92       4.39       6.53      33.33 |      9.40 
    -----------+------------------------------------------------------------------+----------
         Total |       982      4,027      1,062      1,366      1,134         15 |     8,586 
               |    100.00     100.00     100.00     100.00     100.00     100.00 |    100.00
    Code:
               |                                x2
            x1 |         1          2          3          4          5          6 |     Total
    -----------+------------------------------------------------------------------+----------
             1 |       363      2,927        614        964        851          5 |     5,724 
               |     50.21      63.92      60.37      60.02      63.18      35.71 |     61.64 
    -----------+------------------------------------------------------------------+----------
             2 |       279      1,339        344        590        435          6 |     2,993 
               |     38.59      29.24      33.82      36.74      32.29      42.86 |     32.23 
    -----------+------------------------------------------------------------------+----------
             3 |        81        313         59         52         61          3 |       569 
               |     11.20       6.84       5.80       3.24       4.53      21.43 |      6.13 
    -----------+------------------------------------------------------------------+----------
         Total |       723      4,579      1,017      1,606      1,347         14 |     9,286 
               |    100.00     100.00     100.00     100.00     100.00     100.00 |    100.00
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(x1 x2 year)
    2 1 2017
    2 1 2020
    1 5 2020
    1 4 2020
    1 4 2020
    1 4 2020
    1 1 2017
    2 2 2017
    1 3 2020
    1 4 2020
    3 4 2020
    3 1 2020
    1 5 2017
    1 3 2020
    . 3 2017
    1 3 2017
    1 4 2017
    . 1 2017
    2 3 2017
    1 3 2017
    3 2 2020
    1 5 2017
    3 1 2017
    1 3 2020
    1 5 2020
    2 4 2020
    2 4 2020
    . 1 2017
    2 4 2017
    1 2 2020
    1 4 2020
    . 1 2020
    2 5 2017
    1 4 2020
    1 3 2020
    2 2 2020
    1 4 2017
    3 4 2020
    3 3 2017
    1 2 2017
    1 5 2017
    1 3 2020
    . 1 2017
    1 3 2017
    3 1 2020
    1 4 2020
    1 4 2020
    1 4 2020
    1 4 2017
    2 1 2020
    1 5 2020
    1 3 2017
    1 3 2017
    2 3 2017
    1 2 2017
    1 5 2017
    2 3 2020
    1 1 2017
    1 4 2017
    3 1 2017
    2 2 2020
    2 1 2017
    1 5 2017
    1 4 2020
    end

  • #2
    To get the difference between vectors, it helps if they are in aligned variables, so for that reason alone this benefits from a different layout. In what follows I ignore missing values, just as your tabulations do, and I arrange for implied zeros to be explicit. That seemingly won't bite with your full dataset, but does no harm otherwise. I don't do the subtraction here, but it is an easy next step.

    If I have got the roles of x1 and x2 the wrong way round, you just need to swap them in the calculation of totals.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(x1 x2 year)
    2 1 2017
    2 1 2020
    1 5 2020
    1 4 2020
    1 4 2020
    1 4 2020
    1 1 2017
    2 2 2017
    1 3 2020
    1 4 2020
    3 4 2020
    3 1 2020
    1 5 2017
    1 3 2020
    . 3 2017
    1 3 2017
    1 4 2017
    . 1 2017
    2 3 2017
    1 3 2017
    3 2 2020
    1 5 2017
    3 1 2017
    1 3 2020
    1 5 2020
    2 4 2020
    2 4 2020
    . 1 2017
    2 4 2017
    1 2 2020
    1 4 2020
    . 1 2020
    2 5 2017
    1 4 2020
    1 3 2020
    2 2 2020
    1 4 2017
    3 4 2020
    3 3 2017
    1 2 2017
    1 5 2017
    1 3 2020
    . 1 2017
    1 3 2017
    3 1 2020
    1 4 2020
    1 4 2020
    1 4 2020
    1 4 2017
    2 1 2020
    1 5 2020
    1 3 2017
    1 3 2017
    2 3 2017
    1 2 2017
    1 5 2017
    2 3 2020
    1 1 2017
    1 4 2017
    3 1 2017
    2 2 2020
    2 1 2017
    1 5 2017
    1 4 2020
    end
    
    drop if missing(x1, x2)
    contract x1 x2 year, zero 
    egen _total = total(_freq), by(x1 year)
    gen _wanted = 100 * _freq / _total
    reshape wide _*, i(x1 x2) j(year)
    
    list x? _f* _t* _w*, sepby(x1)
    
         +---------------------------------------------------------------------------+
         | x1   x2   _fr~2017   _fr~2020   _to~2017   _to~2020   _wa~2017   _wa~2020 |
         |---------------------------------------------------------------------------|
      1. |  1    1          2          0         18         19   11.11111          0 |
      2. |  1    2          2          1         18         19   11.11111   5.263158 |
      3. |  1    3          5          5         18         19   27.77778   26.31579 |
      4. |  1    4          4         10         18         19   22.22222   52.63158 |
      5. |  1    5          5          3         18         19   27.77778   15.78947 |
         |---------------------------------------------------------------------------|
      6. |  2    1          2          2          7          7   28.57143   28.57143 |
      7. |  2    2          1          2          7          7   14.28571   28.57143 |
      8. |  2    3          2          1          7          7   28.57143   14.28571 |
      9. |  2    4          1          2          7          7   14.28571   28.57143 |
     10. |  2    5          1          0          7          7   14.28571          0 |
         |---------------------------------------------------------------------------|
     11. |  3    1          2          2          3          5   66.66666         40 |
     12. |  3    2          0          1          3          5          0         20 |
     13. |  3    3          1          0          3          5   33.33333          0 |
     14. |  3    4          0          2          3          5          0         40 |
     15. |  3    5          0          0          3          5          0          0 |
         +---------------------------------------------------------------------------+
    Code:
    
    

    Comment


    • #3
      Dear Nick, this solve the problem. I assume changing the x1 by x2 in the total command gives row instead of col total. Thanks a lot for the help.

      Comment


      • #4
        Correct.

        Code:
        egen _total2 = total(_freq), by(x2 year)
        gives you the other marginal totals and indeed
        Code:
        egen _total_all = total(_freq), by(year)
        would give you the table totals.

        Comment

        Working...
        X