
  • Distribution of values among multiple variables

    Hello there,

    I have a question concerning my data set. I want to find out the total distribution of values which occur in multiple variables.

    My sample is structured as follows:
    A participant can have up to 4 different roles, which are represented in 4 different variables (RC1 (rolecode1) RC2 RC3 RC4). In addition, the participant can choose his position from 30 different rolecodes (e.g. Secretary, President, Vice President and so on). These values can also change from year to year, depending on changes of employment / promotion.

    Therefore the dataset looks something like this:

    participantid  year  RC1  RC2  RC3  RC4
    1              2017  S    VP   P    .
    1              2018  S    P    .    .
    1              2019  S    P    .    .
    2              2017  CB   EVP  P    D
    2              2018  CB   EVP  P    .
    2              2019  CEO  D    .    .


    Now my problem is that I can only look at the distribution of one variable at a time, by using: tabulate rolecode1
    But I don't know how to find the overall share of, for example, the rolecode "P" in the entire dataset. In my short example the share of "P" would be 0.33 (2 of 6 values) for rolecode2, 1 (3 of 3) for rolecode3, and 0.3125 (5 of 16 non-missing values) for the entire example.

    Does anybody have an idea how I can create an overview like the command 'tabulate' does, but for the distribution of all rolecodes in the 4 distinctive variables?

    Like this:
    Rolecodes  Frequency  Proportion
    S          3          0.1875
    CB         2          0.125
    CEO        1          0.0625
    VP         1          0.0625
    P          5          0.3125
    EVP        2          0.125
    D          2          0.125



    Your help is very much appreciated.

  • #2
    Kerstin:
    welcome to this forum.
    This seems a job for -collapse-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Please read and act on https://www.statalist.org/forums/help#stata in giving data examples.

      I don't quite agree with Carlo Lazzaro. At the simplest you want to pool RC? in a table, and for that tabm offers one basic method.

      Code:
      ssc install tab_chi 
      help tabm
      tabm is thus a search term to find many similar questions here on Statalist.
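
      For comparison, here is a minimal sketch of the same pooling idea using only official commands. It assumes that RC1-RC4 are string variables and that participantid and year together uniquely identify observations; neither assumption is confirmed by the data example in #1.

      ```stata
      * Sketch only: assumes RC1-RC4 are strings and that
      * participantid and year uniquely identify each observation.
      preserve

      * Stack RC1-RC4 into one variable RC; slot records which of the
      * four variables each value came from (1, 2, 3 or 4).
      reshape long RC, i(participantid year) j(slot)

      * tabulate ignores missing values by default, so empty slots
      * do not enter the frequencies or percents.
      tabulate RC

      restore
      ```

      The preserve and restore pair leaves the original wide layout untouched once the table has been shown.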



      • #4
        Nick:
        thanks for the hint.
        Admittedly, I was not aware of (your) community-contributed programme -tabm-.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you both very much. It worked perfectly with the tabm command!

          Nick Cox: Sorry for not providing a proper data example. I had hoped my example was sufficient, and I didn't realize that the spaces in my example tables had disappeared, which made them confusing.

          May I add one question?
          Now that I have the total distribution, I can see that about half of the 30 rolecodes appear less than 1% of the time. Is it possible to combine these (string) values from the 4 variables into a single value (something like "other job titles") if their total frequency is less than 1%?
          Because they are spread across the 4 variables, I don't know how to tag and rename them.



          • #6
            It's not the spacing that is problematic in #1. It's the use of undefined value labels.

            tabm has a documented replace option. You may most easily aggregate categories in the version of the data it produces.
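
            If the lumped categories are also needed back in the original four variables, one possible route with official commands is sketched below. It is not the tabm approach itself. It assumes that RC1-RC4 are string variables and that participantid and year uniquely identify observations; the names slot and share and the label "Other" are made up for illustration.

            ```stata
            * Sketch only: assumes RC1-RC4 are strings and that
            * participantid and year uniquely identify each observation.
            reshape long RC, i(participantid year) j(slot)

            * Denominator: number of non-missing rolecode values overall.
            count if !missing(RC)
            local total = r(N)

            * Under by:, _N is the number of observations in each RC group,
            * so share is each rolecode's proportion of all non-missing values.
            bysort RC: generate double share = _N / `total'

            * Lump every rolecode that makes up less than 1% of values.
            replace RC = "Other" if share < 0.01 & !missing(RC)

            tabulate RC

            * share varies within participantid and year, so drop it
            * before going back to the wide layout.
            drop share
            reshape wide RC, i(participantid year) j(slot)
            ```

            Working on a copy of the dataset first would be prudent, since the replace step overwrites the original rolecodes.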
