Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Frequencies of Multiple Variables in one Table

    Hello everyone,

    I have a problem concerning displaying frequencies of different variables in a table. The problem sounds rather straightforward, but I haven't found a way to solve it yet. The (simplified) setting is as follows: I have two variables (and a person identifier) which contain the industry affiliation of a person before and after a certain event. Thus each line of the dataset contains the person identifier, a number (industry-code) indicating the industry the person was affiliated with before the event and a number indicating the industry a person was affiliated with after a certain event. What I would like to have now is a table which shows me how the people were distributed accross industries before and after the event, i.e. a table which shows me the share of people that was affiliated with the different industries before and after the event. To make it a little difficult, there might also be industries that no person was affiliated with before the event but there are persons affiliated with the industry after the event or the other way around. In a second step i would then also see the difference between the shares before and after the event for each industry. I have tried multiple things with tabulate and tabstat, but was not able to produce the desired result. If anyone has an idea or if anything is unclear I'm happy to clarify. Thank you for your help,

    cheers,
    J. Emmler

  • #2
    Only a moment ago I posted (almost) this in another thread.

    Your description will make sense to you, but it's much harder work for anyone else to read it, absorb it and imagine your situation.

    One good data example is worth a volume of exegesis.

    Please do read the FAQ Advice, as you were asked to do before posting, and come back with a data example.

    Comment


    • #3
      Sorry, for the confusing post. The data I'm working on is confidential so I can't post an excerpt but I will provide a "fake example". I hope the following example will clarify.

      The data that I have looks like this:
      Pers (personal identifier) industry_b industry_a
      5001 4 3
      5002 3 3
      5003 4 2
      5004 1 5
      5005 6 7
      5006 6 2
      5007 4 2
      5008 4 4
      5009 9 8
      5010 3 3
      - tabulate industry_b - would produce:
      Frequency Percent
      1 1 0.1
      3 2 0.2
      4 4 0.4
      6 2 0.2
      9 1 0.1
      - tabulate industry_a - would produce:
      Frequency Percent
      2 3 0.3
      3 3 0.3
      4 1 0.1
      5 1 0.1
      7 1 0.1
      8 1 0.1
      What I would like now is a table like this: The first row displays the industry code, the second row displays how many individuals worked in the industry before the event and the third row shows how many individuals worked in the industry after the event and the fourth column the difference between the two shares (this is optional, just for ease of seeing how the shares evolved):

      Industry Before After Difference
      1 0,1 0 0,1
      2 0 0,3 -0,3
      3 0,2 0,3 -0,1
      4 0,4 0,1 0,3
      5 0 0,1 -0,1
      6 0,2 0 0,2
      7 0 0,1 -0,1
      8 0 0,1 -0,1
      9 0,1 0 0,1

      I tried -tabstat- with the count option, but I can't handle the fact the there are industrie codes in "industry_b" that aren't in "industry_a" and vice versa. This is a lot of tables, I hope it became clearer what I want to accomplish.

      Comment


      • #4


        A fake example is fine. This works. There may well be a more direct way. (What you are labelling percents I see as proportions.)

        Code:
        . clear
        
        . input Pers industry_b   industry_a
        
                  Pers  industr~b  industr~a
          1. 5001    4       3
          2. 5002    3       3
          3. 5003    4       2
          4. 5004    1       5
          5. 5005    6       7
          6. 5006    6       2
          7. 5007    4       2
          8. 5008    4       4
          9. 5009    9       8
         10. 5010    3       3
         11. end
        
        .
        . reshape long industry_, i(Pers) j(type) string
        (note: j = a b)
        
        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                       10   ->      20
        Number of variables                   3   ->       3
        j variable (2 values)                     ->   type
        xij variables:
                          industry_a industry_b   ->   industry_
        -----------------------------------------------------------------------------
        
        . contract industry type
        
        . reshape wide _freq, i(industry) j(type) string
        (note: j = a b)
        
        Data                               long   ->   wide
        -----------------------------------------------------------------------------
        Number of obs.                       11   ->       9
        Number of variables                   3   ->       3
        j variable (2 values)              type   ->   (dropped)
        xij variables:
                                          _freq   ->   _freqa _freqb
        -----------------------------------------------------------------------------
        
        . mvencode _*, mv(0)
              _freqa: 3 missing values recoded
              _freqb: 4 missing values recoded
        
        . rename (_freq*) (freq_*)
        
        . egen pr_a = pc(freq_a) , prop
        
        . egen pr_b = pc(freq_b) , prop
        
        . gen diff = pr_b - pr_a
        
        . l
        
             +-------------------------------------------------+
             | indust~_   freq_a   freq_b   pr_a   pr_b   diff |
             |-------------------------------------------------|
          1. |        1        0        1      0     .1     .1 |
          2. |        2        3        0     .3      0    -.3 |
          3. |        3        3        2     .3     .2    -.1 |
          4. |        4        1        4     .1     .4     .3 |
          5. |        5        1        0     .1      0    -.1 |
             |-------------------------------------------------|
          6. |        6        0        2      0     .2     .2 |
          7. |        7        1        0     .1      0    -.1 |
          8. |        8        1        0     .1      0    -.1 |
          9. |        9        0        1      0     .1     .1 |
             +-------------------------------------------------+
        Here is the code in one:

        Code:
        clear
        input Pers industry_b    industry_a
        5001    4    3
        5002    3    3
        5003    4    2
        5004    1    5
        5005    6    7
        5006    6    2
        5007    4    2
        5008    4    4
        5009    9    8
        5010    3    3
        end
        
        reshape long industry_, i(Pers) j(type) string
        contract industry type
        reshape wide _freq, i(industry) j(type) string
        mvencode _*, mv(0)
        rename (_freq*) (freq_*)
        egen pr_a = pc(freq_a) , prop
        egen pr_b = pc(freq_b) , prop
        gen diff = pr_b - pr_a
        l, sep(0)

        Comment


        • #5
          Added in edit: crossed with Nick's post #4, a complete solution, but do see the advice in the last paragraph below.

          Here is some technique that may start you toward the solution you want. The key is using the reshape command to convert your data from a wide layout to a long one. This makes the tabulation straightforward, and will almost certainly make other analysis with this data easier. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(pers ind_b ind_a)
          5001 4 3
          5002 3 3
          5003 4 2
          5004 1 5
          5005 6 7
          5006 6 2
          5007 4 2
          5008 4 4
          5009 9 8
          5010 3 3
          end
          dataex
          exit
          reshape long ind_, i(pers) j(period) string
          rename ind_ industry
          list, sepby(pers)
          tabulate industry period, column
          You should run this example in your copy of Stata to understand what is happening. The output from the tabulate command is as follows.
          Code:
          . tab industry period, column
          
          +-------------------+
          | Key               |
          |-------------------|
          |     frequency     |
          | column percentage |
          +-------------------+
          
                     |        period
            industry |         a          b |     Total
          -----------+----------------------+----------
                   1 |         0          1 |         1
                     |      0.00      10.00 |      5.00
          -----------+----------------------+----------
                   2 |         3          0 |         3
                     |     30.00       0.00 |     15.00
          -----------+----------------------+----------
                   3 |         3          2 |         5
                     |     30.00      20.00 |     25.00
          -----------+----------------------+----------
                   4 |         1          4 |         5
                     |     10.00      40.00 |     25.00
          -----------+----------------------+----------
                   5 |         1          0 |         1
                     |     10.00       0.00 |      5.00
          -----------+----------------------+----------
                   6 |         0          2 |         2
                     |      0.00      20.00 |     10.00
          -----------+----------------------+----------
                   7 |         1          0 |         1
                     |     10.00       0.00 |      5.00
          -----------+----------------------+----------
                   8 |         1          0 |         1
                     |     10.00       0.00 |      5.00
          -----------+----------------------+----------
                   9 |         0          1 |         1
                     |      0.00      10.00 |      5.00
          -----------+----------------------+----------
               Total |        10         10 |        20
                     |    100.00     100.00 |    100.00
          Obviously more work is needed, and perhaps a different command than tabulate.

          With that said, let me recommend that before your next post you review the Statalist FAQ linked to from the top of the page, as Nick recommended. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, and to use the dataex command to provide sample data, as described in section 12 of the FAQ. In particular, dataex ensures that the data as read faithfully reproduce the data types, formats, labels, and value labels in your data.

          Comment


          • #6
            Hi, I have a problem with percetages. This is my first post.

            How can I get row percentages with the contract command, similar to the obtained with the tab command??

            In the auto dataset, I try "contract rep78 foreign , percent(percent)" but the result is very similar to "tab rep78 foreign, col", and I need the row percentages to store them in a new dataset.

            I hope somebody can help me.

            Thank you!!

            Comment


            • #7
              You could do something like the following.
              Code:
              contract rep78 foreign, freq(count)
              reshape wide count, i(rep78) j(foreign)
              forvalues i = 0/1 {
                  generate double pct`i' = 100 * count`i' / (count0 + count1)
              }
              There might be an official or user-written command that will do what you want in one step, but I can't think of any at the moment.

              Comment


              • #8
                The example given happens to have a binary column variable, so it can be done directly:

                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . egen pcforeign = mean(100 * foreign), by(rep78) 
                
                . tabdisp rep78, c(pcforeign) format(%4.2f)
                
                ----------------------
                Repair    |
                Record    |
                1978      |  pcforeign
                ----------+-----------
                        1 |       0.00
                        2 |       0.00
                        3 |      10.00
                        4 |      50.00
                        5 |      81.82
                        . |      20.00
                ----------------------
                Not a general solution, but worth a mention.

                Comment


                • #9
                  Thank you very much Nick and William, that was exactly what i wanted.

                  Comment


                  • #10
                    Note that in #4 using

                    Code:
                    contract industry type, zero
                    would make the mvencode statement unnecessary.

                    Comment


                    • #11
                      castro rene at post #6: Welcome to Statalist.

                      Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

                      In particular, while this topic did discuss the contract command, your question appears to have been overlooked in the discussion of the question that initiated this topic.

                      Instead of posting a new question in an existing topic, you should start a new topic with a descriptive title so that you attract members to read it who may be able to offer advice.

                      With that said, I do not think contract offers row percentages. Perhaps the table command with the replace option can do what you seek. Or perhaps more easily, you can add code to calculate the row percentages yourself following the contract command.

                      Comment


                      • #12
                        #7 and #8 are answers to #6.

                        Comment


                        • #13
                          Thanks!!! @Nick Cox and @Joseph Coveney

                          I resolved the problem temporary, creating two new variables with "collapse (count)" for the main variable and his category of interest respectively, and then dividing both into a new third variable.

                          But your suggestions looks simpler and faster. I will try them.

                          Comment

                          Working...
                          X