Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge the frequencies of 2 variables into 1?

    Hello everybody,
    in advance: I'm relatively unexperienced - so sorry in case of dumb questions ;-)

    I want to add the frequencies of 2 variables (using the same categories) (preferabily by creating a new Variable in order to calculate outliers).

    These are the Variables:

    DE_ETHN1

    Country specific ethnic group 1: |
    Germany | Freq. Percent Cum.
    ----------------------------------------+-----------------------------------
    4. Afghanistan | 1 0.06 0.06
    40. Austria | 7 0.43 0.49
    100. Bulgaria | 1 0.06 0.55
    150. Europe | 2 0.12 0.68
    156. China | 3 0.18 0.86
    170. Colombia | 1 0.06 0.92
    191. Croatia | 3 0.18 1.11
    250. France | 1 0.06 1.17
    276. Germany | 1,573 96.80 97.97
    356. India | 1 0.06 98.03
    380. Italy | 5 0.31 98.34
    528. Netherlands | 2 0.12 98.46
    616. Poland | 6 0.37 98.83
    643. Russian Federation | 6 0.37 99.20
    688. Serbia | 1 0.06 99.26
    724. Spain | 1 0.06 99.32
    756. Switzerland | 1 0.06 99.38
    760. Syrian Arab Republic | 2 0.12 99.51
    792. Turkey | 5 0.31 99.82
    804. Ukraine | 1 0.06 99.88
    960. Other (former nations, specific re | 2 0.12 100.00
    ----------------------------------------+-----------------------------------
    Total | 1,625 100.00

    DE_ETHN2

    Country specific ethnic group 2: |
    Germany | Freq. Percent Cum.
    ----------------------------------------+-----------------------------------
    12. Algeria | 1 0.70 0.70
    40. Austria | 16 11.19 11.89
    51. Armenia | 1 0.70 12.59
    56. Belgium | 1 0.70 13.29
    150. Europe | 9 6.29 19.58
    156. China | 3 2.10 21.68
    191. Croatia | 1 0.70 22.38
    246. Finland | 1 0.70 23.08
    250. France | 2 1.40 24.48
    300. Greece | 8 5.59 30.07
    348. Hungary | 1 0.70 30.77
    380. Italy | 15 10.49 41.26
    410. Korea, Republic of | 1 0.70 41.96
    499. Montenegro | 1 0.70 42.66
    528. Netherlands | 9 6.29 48.95
    586. Pakistan | 1 0.70 49.65
    616. Poland | 20 13.99 63.64
    643. Russian Federation | 19 13.29 76.92
    688. Serbia | 1 0.70 77.62
    724. Spain | 11 7.69 85.31
    760. Syrian Arab Republic | 1 0.70 86.01
    792. Turkey | 13 9.09 95.10
    840. United States | 1 0.70 95.80
    891. Yugoslavia | 1 0.70 96.50
    915. Kosovo | 1 0.70 97.20
    960. Other (former nations, specific re | 4 2.80 100.00
    ----------------------------------------+-----------------------------------
    Total | 143 100.00

    .

    I'd be very glad if anybody could help

  • #2
    Code:
    contract DE_ETHN1 DE_ETHN2
    gen `c(obs_t)' obs_no = _n
    reshape long DE_ETHN, i(obs_no)
    tab DE_ETHN [fweight = _freq]
    While the -tabulate- outputs you show were, I think, enough to guide a solution to this problem, it would have been easier if you had also shown example data. Please do so in the future when requesting help with code, and use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you very much Clyde- you are of great help
      Next time I'll use the dataex command!

      In case you have time left would you mind to explain the code you gave me? (But of course no worries if no, the code alone is very nice to have)

      + Is there any possibility to do this without kicking out all other data?
      Last edited by Clara Eul; 03 Aug 2023, 03:17.

      Comment


      • #4
        Code:
        contract DE_ETHN1 DE_ETHN2
        The contract command reduces the data set to one observation for each combination of DE_ETYHN1 and DE_ETHN2, along with a new variable, _freq, that gives the count of how many observations in the original data set have that combination.
        Code:
        gen `c(obs_t)' obs_no = _n
        This introduces a new variable, obs_no which simply counts down the observations starting from 1 to however many there are. This isn't really integral to the solution of your problem--it is just necessary because the -reshape- command that comes next requires something to put in the -i()- option, and this is it.
        Code:
        reshape long DE_ETHN, i(obs_no)
        Instead of having DE_ETHN1 DE_ETHN2 pairs, this converts the data to a different layout in which DE_ETHN1 and DE_ETHN2 are "stacked on top of each other" in a single variable DE_ETHN, along with which observation they came from (obs_no), the value of _freq from that obs_no, and a new variable _j which will be 1 or 2, distinguishing those things that came from DE_ETHN1 from those that came from DE_ETHN2.
        Code:
        tab DE_ETHN [fweight = _freq]
        Tabulates the frequency of the combined DE_ETHN variable, taking into account the fact that each current observation represents _freq observations from the original data.

        As this code does not involve any local macros, if you would like to "see it in action" you can run it one line at a time and see how each of the commands does what I have explained. My explanations may be clearer if you do that.

        Is there any possibility to do this without kicking out all other data?
        I can't think of a way to do this that does not remove the original data. However, you can -preserve- the data before you run the code and -restore- it afterward. If you are notn familiar with these commands, read -help preserve-.


        Comment


        • #5
          Thank you so much!!!

          Comment

          Working...
          X