Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating summed variable in panel (New User Question)

    I have a panel dataset with 37 counties and quarterly data for 13 years which includes the number of new jobs created by sector. I would like to sum each quarter for each county for each year in the data set and then create a new variable that divides each county's proportion of jobs by the overall total by sector. I think this is a two-step process (one to sum each job sector by year and one to sum each county by year) but have never run a loop in Stata before. I want to use the egen function but collapse as well (example:
    collapse (sum) x1 x2, by(id date) so that I can retain the raw data but also these collapsed totals for each year. Apologies for the very basic question, I'm a bit of a novice with Stata when it comes to creating variables and using loops and searching the archives didn't help very much.

    Here's a sample of the data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str16 county int year byte quarter int(agriculturefore miningquarrying)
    "Acadia" 1998 1 178 542
    "Acadia" 1998 2 193 539
    "Acadia" 1998 3 206 533
    "Acadia" 1998 4 231 489
    "Acadia" 1999 1 164 216
    "Acadia" 1999 2 194 198
    "Acadia" 1999 3 213 413
    "Acadia" 1999 4 261 409
    "Acadia" 2000 1 150 497
    "Acadia" 2000 2 188 485
    "Acadia" 2000 3 196 499
    "Acadia" 2000 4 274 501
    "Acadia" 2001 1 154 521
    "Acadia" 2001 2 178 536
    "Acadia" 2001 3 165 462
    end

    Thanks in advance for your assistance.


    Last edited by Davia Downey; 09 Jul 2018, 12:09.

  • #2
    I'm not sure I understand what you want. It might be this:

    Code:
    collapse (sum) agriculturefore miningquarrying, by(county year)
    gen agri_share = agriculturefore/(agriculturefore+miningquarrying)
    gen mining_share = miningquarrying/(agriculturefore+miningquarrying)
    If that's not it, please post back with a clearer explanation, or perhaps an example in which you have hand-worked the results.

    Note, by the way, that this code does not involve any explicit loops. One distinctive aspect of Stata is that many things that you would do using loops in other packages can be done without explicit looping.

    Comment


    • #3
      I also was not clear on what was desired. I thought he wanted county level proportions of across county totals for each sector.

      Code:
      egen tot_ag = total(ag)
      egen tot_min = total(min)
      collapse (sum) ag min (mean) tot*, by(county)
      gene prop_ag = ag/tot_ag
      generate prop_min = min/tot_min

      Comment


      • #4
        I got around the problem by using the following:


        //Summing Each Sector by Year//
        bysort year: egen yr_agri = sum(agriculturefore)
        bysort year: egen yr_mining = sum(miningquarrying)

        and then

        //Summing Each Sector by Year and County//
        bysort county year: egen agrisum = sum(agriculturefore)
        bysort county year: egen miningsum = sum(miningquarrying)


        The first set of variables provides the sum total of new jobs in each sector by the number of years in the dataset. The second provides the sum of job in each sector for each county. Does this make sense?

        Comment


        • #5
          I do have a followup question, however. I would like to now create a proportion variable dividing the county totals by the overall. I used the following commands but the results aren't what I expected.

          Code:

          generate test1 = (agrisum / yr_agri )*100

          by county, sort : egen float test2 = pc(agrisum/yr_agri)

          When I do the math, however the result isn't what I expected. See below:

          input str16 county int year float(yr_agri agrisum pctagribycountyyr test1 test2)

          "Acadia" 1998 23747 808 .014883712 3.402535 1.488371
          "Acadia" 1998 23747 808 .014883712 3.402535 1.488371
          "Acadia" 1998 23747 808 .014883712 3.402535 1.488371
          "Acadia" 1998 23747 808 .014883712 3.402535 1.488371
          "Acadia" 1999 25145 832 .014473725 3.308809 1.4473726
          "Acadia" 1999 25145 832 .014473725 3.308809 1.4473726
          "Acadia" 1999 25145 832 .014473725 3.308809 1.4473726
          "Acadia" 1999 25145 832 .014473725 3.308809 1.4473726
          end
          [/CODE]


          The second to last column includes the actual hand calculation of 808/23747 which is the 3.402535 column. The only thing I can figure out is that in the first two calculations, Stata is taking the entire sum of the yr_agri and agrisum columns and then dividing when what I want is the by year and by county proportions.

          Comment


          • #6
            I think you misunderstand how -egen, pc()- works. In any case, you have no need to use it.

            You can get each county's proportion for the year by:

            Code:
            gen wanted = agrisum/yr_agri
            I find myself very confused by your explanations, and I'm not sure this is what you are looking for, but since you yourself hand-calculated it as 808/23747, I think it is. If it's not, please post back with a clearer explanation.

            Comment


            • #7
              Apparently, the egen pc doesn't work the way I expected even though I read the stata pdf thoroughly . I'll go ahead with the hand calculation then. Thanks for your response.

              Comment


              • #8
                When you right -by x, sort: egen y = pc(z)-, Stata, working separately in each group of observations corresponding to a single value of x, calculates the total of z in that group, and then sets y = 100*z/that total.

                Comment


                • #9
                  I was the original author of egen, pc()

                  STB-50 dm70 . . . . . . . . . . . . . . . . Extensions to generate, extended
                  (help egenodd if installed) . . . . . . . . . . . . . . . . N. J. Cox
                  7/99 pp.9--17; STB Reprints Vol 9, pp.34--45
                  24 additional egen functions presented; includes various string,
                  data management, and statistical functions;
                  many of the egen functions added to Stata 7

                  The problem here is that many different calculations match the summary of wanting percentages as results and egen, pc() can't support them all! But Clyde is right: it was never intended specifically to match the calculation 100 * component/total where the component and total are existing variables, because you can do that directly.

                  That said, feeding pc() an expression is explicitly legal syntax and you can get results that make sense that way. This example may be a better explanation than any word summary.

                  Code:
                  . clear
                  
                  . set obs 7
                  obs was 0, now 7
                  
                  . gen one = 1
                  
                  . gen total = 7
                  
                  . list, sep(0)
                  
                       +-------------+
                       | one   total |
                       |-------------|
                    1. |   1       7 |
                    2. |   1       7 |
                    3. |   1       7 |
                    4. |   1       7 |
                    5. |   1       7 |
                    6. |   1       7 |
                    7. |   1       7 |
                       +-------------+
                  
                  . egen pc = pc(one/total)
                  
                  . list, sep(0)
                  
                       +------------------------+
                       | one   total         pc |
                       |------------------------|
                    1. |   1       7   14.28571 |
                    2. |   1       7   14.28571 |
                    3. |   1       7   14.28571 |
                    4. |   1       7   14.28571 |
                    5. |   1       7   14.28571 |
                    6. |   1       7   14.28571 |
                    7. |   1       7   14.28571 |
                       +------------------------+
                  Nevertheless, a verbal summary: pc() calculates totals and then each contribution as a percent of that total. If by: or by() is specified, that is the framework for the calculation.
                  Last edited by Nick Cox; 10 Jul 2018, 03:24.

                  Comment

                  Working...
                  X