Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Hi Clyde,

    As regards to your post #14, if psic is not always the lowest number among all sic,

    e.g. for firm X, psic=3500, and the other segment sic=3100, 3200.

    can I stiil use

    Code:
    // ASSIGN SEQUENCE NUMBERS TO EACH SEGMENT WITHIN A FIRM
    by gvkey sic, sort: gen int seq = 1 if _n == 1 & sic != psic
    by gvkey (sic): replace seq = sum(seq)
    instead of using
    isid gvkey sic gen not_psic = (sic != psic) // 0 FOR PSIC, 1 FOR OTHERS by gvkey (not_psic sic), sort: gen seq = _n-1 // SORT THE PSIC TO THE TOP OF THE LIST drop not_psic

    Comment


    • #17
      Hi Clyde ,

      Given that psic is not always the lowest number, would you still advise replace the code as you indicated in #14.

      Thanks,

      Rochelle

      Comment


      • #18
        Dear all,

        I have to change my data a little- leave out segments with missing sales. This will reduce my data size , so when i do reshape, the computer will not freeze.

        data have
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(gvkey sic psic) byte segsale
        1000 3500 3500 40 
        1000 3200 . 10 
        2000 3500 3500 15 
        1200 3300 . 12 
        1200 3500 3500 49 
        end
        my desired output is

        Code:
        clear
        input int(gvkey sic psic) byte segsale int(totsale_psic sic1) byte totsale_sic1 int sic2 byte totsale_sic2
        1000 3500 3500 40 104 . . . . 
        1000 3200 . 10 . 3200 10 . . 
        2000 3500 3500 15 104 . . . . 
        1200 3300 . 12 . . . 3300 12 
        1200 3500 3500 49 104 . . . . 
        end

        gvkey is firm identifier.
        psic is the primary segment id - segment with largest sales.
        each firm could operates in one segment, or multiple segment , e.g. firm 1000 operates in 2 segments 3500, 3200
        segsale is sales in each segment

        desired output shows new variables
        totsale_psic : this is the sum of segsale across the same psic
        totsale_sic1: the sum of segsale across the first non psic segment (in my case it is 3200, given only 1 firm has has segment in 3200 with sales 10, so total is 10)
        totsale_sic2: the sum of segsale across the second non psic segment (in my case it is 3300, given only 1 firm has segment 3300 with sales 12, so total is 12)

        my code (Clyde wrote the code for me for a different data), I want to ask you help to edit it to produce the output above.

        Code:
        use have,clear
        
        * combine segments if they have the same sic
        collapse (sum) segsale=segsale, by(gvkey sic) 
        * this above adds up sales, which means I need to redefine pri seg 
        
        *drop psic
        
        gen sale=float(segsale)
        bysort gvkey : egen maxsale=max(segsale)
        gen psic=sic if sale==maxsale
        drop maxsale sale
        
        // VERIFY EACH GVKEY HAS ONLY ONE NON-MISSINSG VALUE OF PSIC
        // AND SPREAD THAT TO ALL OBSERVATIONS OF THAT GVKEY
        by gvkey (psic), sort: assert psic == psic[1] if !missing(psic)
        by gvkey (psic): replace psic = psic[1]
        Code:
        // CALCULATE Across each PSIC cohort - TOTAL SALES IN EACH SEGMENT
        egen tot_sales_sic = total(segsale), by(psic sic)
        
        isid gvkey sic
        gen not_psic = (sic != psic) // 0 FOR PSIC, 1 FOR OTHERS
        by gvkey (not_psic sic), sort: gen seq = _n-1 // SORT THE PSIC TO THE TOP OF THE LIST
        drop not_psic
        
        * count # of segments
        summ seq, meanonly
        local seq_max = r(max)
        
        // RESHAPE WIDE
        drop segsale
        reshape wide tot_sales_sic sic, i(gvkey) j(seq)
        
        // MERGE BACK THE ORIGINAL DATA
        merge 1:m gvkey using UnStretch, assert(match) nogenerate
        sort gvkey sic
        order sic psic, after(gvkey)
        
        
        forvalues i = 0/`seq_max' {
            replace tot_sales_sic`i' = . if sic != sic`i'
            replace sic`i' = . if sic != sic`i'
        }

        Best,
        Rochelle

        Comment


        • #19
          The code you sent does not quite run with the example data. The -merge- command refers to a file UnStretch, which you do not create or show us in this post. Based on earlier posts in the thread, I'm guessing that UnStretch is a copy of the original data, so I've modified your code accordingly to experiment with it.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int(gvkey sic psic) byte segsale
          1000 3500 3500 40 
          1000 3200 . 10 
          2000 3500 3500 15 
          1200 3300 . 12 
          1200 3500 3500 49 
          end
          
          tempfile holding
          save `holding'
          
          * combine segments if they have the same sic
          collapse (sum) segsale=segsale, by(gvkey sic) 
          * this above adds up sales, which means I need to redefine pri seg 
          
          *drop psic
          
          gen sale=float(segsale)
          bysort gvkey : egen maxsale=max(segsale)
          gen psic=sic if sale==maxsale
          drop maxsale sale
          
          // VERIFY EACH GVKEY HAS ONLY ONE NON-MISSINSG VALUE OF PSIC
          // AND SPREAD THAT TO ALL OBSERVATIONS OF THAT GVKEY
          by gvkey (psic), sort: assert psic == psic[1] if !missing(psic)
          by gvkey (psic): replace psic = psic[1]
          
          // CALCULATE Across each PSIC cohort - TOTAL SALES IN EACH SEGMENT
          egen tot_sales_sic = total(segsale), by(psic sic)
          
          isid gvkey sic
          gen not_psic = (sic != psic) // 0 FOR PSIC, 1 FOR OTHERS
          by gvkey (not_psic sic), sort: gen seq = _n-1 // SORT THE PSIC TO THE TOP OF THE LIST
          stop
          drop not_psic
          
          * count # of segments
          summ seq, meanonly
          local seq_max = r(max)
          
          // RESHAPE WIDE
          drop segsale
          reshape wide tot_sales_sic sic, i(gvkey) j(seq)
          assert sic0 == psic
          drop sic0
          rename tot_sales_sic0 tot_sales_psic
          
          // MERGE BACK THE ORIGINAL DATA
          merge 1:m gvkey using `holding', assert(match) nogenerate
          sort gvkey sic
          order sic psic, after(gvkey)
          
          
          forvalues i = 0/`seq_max' {
              replace tot_sales_sic`i' = . if sic != sic`i'
              replace sic`i' = . if sic != sic`i'
          }
          is my starting point.

          Getting to the desired output you show is not just a matter of editing. You're asking for different results. In the data in #18, firm 1200 only has two segments: 3500 (its psic) and 3300. The code was designed to create sic1 total_sales_sic1 sic2 total_sales_sic2, etc. based on the number of segments within firms. So there is no reason why firm 1200 (or any other firm in your example) should have an sic2 and total_sales_sic2, as firm has only one segment other than its psic. Based on your original problem description, the output you are getting now is correct and the output you are now requesting would be wrong..

          Now maybe you actually want the sic1 sic2 etc., to refer to sic's within psic, not within firm. If that's what you want, the code can be changed to produce that. But before I do it, I want to be sure that's really what you're asking for, because it is different from what you originally described in the early posts.

          Comment


          • #20
            Dear Clyde,

            You are correct, the unstrech refers to data have in my post #18.

            data have has 3 firms identified by gvkey. these 3 firms form a cohort identified by psic. If I may, let us present 3500 is plastic toy manufacturer, so the three firms primary business is making plastic toys, but each of them also do other lines of business as denoted by 3200, 3300.

            data want in #18:

            totsale_psic is the sum of primary segment sale for this cohort of 3 firms
            totsale_sic1: is the sum of sales for 3200 segment or this cohort of 3 firms = (10+0+0)
            totsale_sic2: is the sum of sales for 3300 segment or this cohort of 3 firms = (0+0+12)

            so from data have to the data want , I need to add on 2 new sic variable , and the corresponding sum of sales across all three firms.

            My real data has other cohorts, e.g. another primary segment cohort psic=6781, it has 100 firms in this cohort, and they have 20 distinct sic, and I need to create the columns of 20 sic , and the total sales across the cohort. Later I would adapt this code to all other cohorts with different psic.

            so, data want is a data structure that list all sic in columns for each of the firm in the cohort. I hope this makes sense, and data want in #18 is the desired outcome.

            I want to thank you for again for reaching my long posts and help me continuously !!!

            p.s. I hope it makes sense to show the new data (different from post #1), which delete missing sales rows in data want , and help with reshape later on.

            Rochelle

            Comment


            • #21
              OK. The code below will do it. The only aspect of the results data set that I did not replicate is the order of observations. You have gvkey 2000 in the middle as observation #3, whereas I have it at the end because I have sorted on gvkey within psic. It isn't clear to me what the pattern is that puts 2000 in the middle. Except for that, however, this code produces your output exactly as you wanted.

              The code has changed from earlier versions in several places, so I have not highlighted them. Just treat this as an entirely new file of code, even though much of it is as before.

              Code:
              set more off
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(gvkey sic psic) byte segsale
              1000 3500 3500 40 
              1000 3200 . 10 
              2000 3500 3500 15 
              1200 3300 . 12 
              1200 3500 3500 49 
              end
              
              tempfile holding
              save `holding'
              
              * combine segments if they have the same sic
              collapse (sum) segsale=segsale, by(gvkey sic) 
              * this above adds up sales, which means I need to redefine pri seg 
              
              gen sale=float(segsale)
              bysort gvkey : egen maxsale=max(segsale)
              gen psic=sic if sale==maxsale
              drop maxsale sale
              
              // VERIFY EACH GVKEY HAS ONLY ONE NON-MISSINSG VALUE OF PSIC
              // AND SPREAD THAT TO ALL OBSERVATIONS OF THAT GVKEY
              by gvkey (psic), sort: assert psic == psic[1] if !missing(psic)
              by gvkey (psic): replace psic = psic[1]
              
              // CALCULATE Across each PSIC cohort - TOTAL SALES IN EACH SEGMENT
              egen tot_sales_sic = total(segsale), by(psic sic)
              
              isid psic gvkey sic
              gen not_psic = (sic != psic) // 0 FOR PSIC, 1 FOR OTHERS
              by psic not_psic sic, sort: gen seq = 1 if not_psic // SORT THE PSIC TO THE TOP OF THE LIST
              by psic: replace seq = sum(seq)
              drop not_psic
              
              * count # of segments
              summ seq, meanonly
              local seq_max = r(max)
              
              // RESHAPE WIDE
              drop segsale
              reshape wide tot_sales_sic sic, i(psic gvkey) j(seq)
              assert sic0 == psic
              
              // MERGE BACK THE ORIGINAL DATA
              merge 1:m gvkey using `holding', assert(match) nogenerate
              sort gvkey sic
              order sic psic, after(gvkey)
              
              
              forvalues i = 0/`seq_max' {
                  replace tot_sales_sic`i' = . if sic != sic`i'
                  replace sic`i' = . if sic != sic`i'
              }
              drop sic0
              rename tot_sales_sic0 tot_sales_psic
              order gvkey sic psic segsale tot_sales_psic, first
              gen byte not_psic = (sic != psic)
              sort psic gvkey not_psic sic 
              drop not_psic

              Comment


              • #22
                It works great for my reduced size data

                Sincerely,

                Rochelle

                Comment

                Working...
                X