Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • % over total from multiple variables

    Hello, I am very new to Stata.

    I have a database in excel which I imported to Stata, and need to do some analysis.
    I need to know the % of the total over multiple variables:
    Year Location S.16 S.17 S.18 S.19 Total S.
    2022 A 5 0 0 0 5
    2023 A 6 0 7 0 13
    2022 B 0 5 0 9 14
    2023 B 30 0 0 0 30
    % over total S. 66% 8% 11% 15% 100%
    Total S. 41 5 7 9 62
    From the excel calculation above I know that I have 66% of S.16, etc. by dividing the total of S. by the total of each S.XX

    I can't figure how can I do it in Stata. I was thinking that tabout would help but in vain

    I would also need to analyse based on other variables for example location and/or Year. for example: What is my % of S.16...S.19 in location A in 2022?

    Many thanks in advance and thanks for the support.

    Jorge Bica


  • #2
    Do you want to get a table from existing variables -- or new variables from existing variables?

    Please give a data example and clarify precisely what you want. https://www.statalist.org/forums/help#stata is essential reading.

    Comment


    • #3
      Your display does not come from a Stata data set. It cannot possibly because Stata variable names cannot contain periods, and in a numeric variable you cannot have % signs. Moreover, you should not want to create such a data set in Stata. Stata is not a spreadsheet, and trying to use it as if it were usually ends in tears.

      If you don't actually have this data (that is the data itself, without the % over total S and total S rows) in a Stata data set it is premature to ask for help with code. I will proceed on the assumption that you have a Stata data set that looks like the results of running this code, which is -dataex- output:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year str2 location byte(s16 s17 s18 s19 totals)
      2022 "A "  5 0 0 0  5
      2023 "A "  6 0 7 0 13
      2022 "B "  0 5 0 9 14
      2023 "B " 30 0 0 0 30
      end
      Then to get the percentages and totals you want:
      Code:
      collapse (sum) s16-s19 totals
      foreach v of varlist s16-s19 {
          gen `v'_pct = 100*`v'/totals
      }
      To do the same separately for each location, just change the -collapse- command to
      Code:
      collapse (sum) s16-s19 totals, by(location)
      Analogous considerations apply to calculating separately for each year.

      As for doing it separately for each year and location, -by(location year)- will do that, but at least based on the data shown in your post, it would not make much sense to do that as there is only one observation for each location-year combination, so the totals are just the same as the original data.

      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Comment


      • #4
        Originally posted by Nick Cox View Post
        Do you want to get a table from existing variables -- or new variables from existing variables?

        Please give a data example and clarify precisely what you want. https://www.statalist.org/forums/help#stata is essential reading.
        Many thanks for the quick reply!

        The table I added is an example, given the dataset I have is much bigger.

        I would like to have a table with the results in red (which I calculated on excel):
        Year Location S.16 S.17 S.18 S.19 Total S.
        2022 A 5 0 0 0 5
        2023 A 6 0 7 0 13
        2022 B 0 5 0 9 14
        2023 B 30 0 0 0 30
        % over total S. 66% 8% 11% 15% 100%
        Total S. 41 5 7 9 62
        Many thanks again!
        JB

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Your display does not come from a Stata data set. It cannot possibly because Stata variable names cannot contain periods, and in a numeric variable you cannot have % signs. Moreover, you should not want to create such a data set in Stata. Stata is not a spreadsheet, and trying to use it as if it were usually ends in tears.

          If you don't actually have this data (that is the data itself, without the % over total S and total S rows) in a Stata data set it is premature to ask for help with code. I will proceed on the assumption that you have a Stata data set that looks like the results of running this code, which is -dataex- output:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int year str2 location byte(s16 s17 s18 s19 totals)
          2022 "A " 5 0 0 0 5
          2023 "A " 6 0 7 0 13
          2022 "B " 0 5 0 9 14
          2023 "B " 30 0 0 0 30
          end
          Then to get the percentages and totals you want:
          Code:
          collapse (sum) s16-s19 totals
          foreach v of varlist s16-s19 {
          gen `v'_pct = 100*`v'/totals
          }
          To do the same separately for each location, just change the -collapse- command to
          Code:
          collapse (sum) s16-s19 totals, by(location)
          Analogous considerations apply to calculating separately for each year.

          As for doing it separately for each year and location, -by(location year)- will do that, but at least based on the data shown in your post, it would not make much sense to do that as there is only one observation for each location-year combination, so the totals are just the same as the original data.

          In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
          Many many thanks Clyde!

          The the % over total S and total S rows was my calculations on excel. The data set is much bigger and does not have it.

          Your guidance worked perfectly!

          Is there a way to do it as a table instead?
          Sorry to bother.

          Many thanks again,
          JB

          Comment


          • #6
            Here's how I would do it for, as an example, the by-location calculations:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int year str2 location byte(s16 s17 s18 s19 totals)
            2022 "A " 5 0 0 0 5
            2023 "A " 6 0 7 0 13
            2022 "B " 0 5 0 9 14
            2023 "B " 30 0 0 0 30
            end
            
            collapse (sum) s16-s19 totals, by(location)
            foreach v of varlist s16-s19 {
                gen `v'_pct = 100*`v'/totals
            }
            
            rename (s16-s19) =_total
            
            reshape long s16_ s17_ s18_ s19_, i(location) j(statistic) string
            replace totals = 100 if statistic == "pct"
            rename *_ *
            list, noobs clean

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              Here's how I would do it for, as an example, the by-location calculations:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int year str2 location byte(s16 s17 s18 s19 totals)
              2022 "A " 5 0 0 0 5
              2023 "A " 6 0 7 0 13
              2022 "B " 0 5 0 9 14
              2023 "B " 30 0 0 0 30
              end
              
              collapse (sum) s16-s19 totals, by(location)
              foreach v of varlist s16-s19 {
              gen `v'_pct = 100*`v'/totals
              }
              
              rename (s16-s19) =_total
              
              reshape long s16_ s17_ s18_ s19_, i(location) j(statistic) string
              replace totals = 100 if statistic == "pct"
              rename *_ *
              list, noobs clean
              Many many thanks!

              I would never do it myself

              It worked perfect!

              Thanks again,
              JB

              Comment

              Working...
              X