Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subgroup count (Excel's minifs function)


    Dear Stata Forumers,


    I want to make a variable called experience, which counts all the previous investments made in the "company group" that came before the first_investment_date of a particular observation. In addition, experience should only be counted if the investment_amount per firm is unique.

    Thank you in advance for your help!

    Ben James



    *--------------------------start_inputs-----------*
    clear all
    input company fund investment_amount investment_date first_investment_date
    1 1 2000 10 2000
    1 2 2001 9 2001
    1 2 2002 8 2001
    2 1 2003 4 2003
    2 1 2004 3 2003
    3 1 2003 12 2003
    3 3 2004 14 2004
    4 1 2000 15 2000
    4 2 2000 6 2000
    end

    *--------------------------end_inputs-----------*


    *--------------------------start_desired_outcome-----------*
    clear all
    input company fund investment_date investment_amount first_investment_date experience
    1 1 2000 10 2000 0
    1 2 2001 9 2001 1
    1 2 2002 8 2001 1
    2 1 2003 4 2003 0
    2 1 2004 3 2003 0
    3 1 2003 12 2003 0
    3 3 2004 14 2004 1
    4 1 2000 15 2000 0
    4 2 2000 6 2000 0
    end
    *--------------------------end_desired_outcome-----------*
    Last edited by Ben James; 13 May 2016, 13:46.

  • #2
    There's an error in your initial data example, the investment_amount investment_date variables are inverted. Also, I don't understand the logic of only counting if the investment_amount per firm is unique (there are no such cases in your sample data).

    I'll assume that you want to count past experience in terms of first investment date and that a given first investment date counts only once. Then

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(company fund idate iamount first_idate)
    1 1 2000 10 2000
    1 2 2001  9 2001
    1 2 2002  8 2001
    2 1 2003  4 2003
    2 1 2004  3 2003
    3 1 2003 12 2003
    3 3 2004 14 2004
    4 1 2000 15 2000
    4 2 2000  6 2000
    end
    
    * we assume that first_idate is constant per company fund
    bysort company fund (first_idate): assert first_idate == first_idate[1] 
    
    * we assume that idate == first_idate for the first obs of a company fund
    by company fund: assert idate == first_idate if _n == 1
    
    * tag the first obs of a company on a given first_idate
    * this is the obs that will add experience
    bysort company first_idate: gen tocount = _n == 1
    
    * use a running sum to track past experience
    by company: gen iexp = sum(tocount) - 1
    
    sort company fund idate

    Comment


    • #3

      Thank you Robert!

      The reason I am asking is my question really consists of three experience variables (e.g. iexp1 iexp2 iexp3), of which:

      Output variables:
      iexp1: is what you showed above
      iexp2: is similar to iexp1 only then only counts unique projects (a new variable, see below) made
      iexp3: is the aggregate of all the iamounts made instead of the count at iexp1


      Input variables:
      * Example (with project)
      clear
      input float(company fund idate project iamount first_idate)
      1 1 2000 1 10 2000
      1 2 2001 2 9 2001
      1 2 2002 2 8 2001
      2 1 2003 1 4 2003
      2 1 2004 1 3 2003
      3 1 2003 1 12 2003
      3 3 2004 2 14 2004
      3 4 2006 3 22 2006
      4 1 2000 5 15 2000
      4 2 2000 6 6 2000
      End
      Last edited by Ben James; 14 May 2016, 02:10.

      Comment


      • #4
        I'm not sure I understand what you are expecting in iexp3. Since you refer to iexp1, I assume that you want a running sum of the total of iamount by first_idate, in the same way that iexp1 was calculated. That would look like:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(company fund idate project iamount first_idate)
        1 1 2000 1 10 2000
        1 2 2001 2  9 2001
        1 2 2002 2  8 2001
        2 1 2003 1  4 2003
        2 1 2004 1  3 2003
        3 1 2003 1 12 2003
        3 3 2004 2 14 2004
        3 4 2006 3 22 2006
        4 1 2000 5 15 2000
        4 2 2000 6  6 2000
        end
        
        * tag first_idate obs and count
        bysort company first_idate: gen tocount1 = _n == 1
        by company: gen iexp1 = sum(tocount1) - 1
        
        * tag first project obs and count
        bysort company project: gen tocount2 = _n == 1
        by company: gen iexp2 = sum(tocount2) - 1
        
        * aggretate of all iamounts on the same first_idate
        bysort company first_idate: egen fdatot = total(iamount)
        by company: gen iexp3 = sum(tocount1 * fdatot) - fdatot
        
        sort company fund idate

        Comment


        • #5
          Iexp3 was exactly what I was looking for.

          With iexp2 I meant to say that project is an addition to the contraints from variable iexp1, so iexp2 becomes: "only count all the previous investments made in the company group that came before the first_investment_date of a particular observation" if the project is also unique on top of that. (iexp2 by definition is always equal to or lower than the value of iexp2, due to an extra contraint)

          I have renamed iexp2 as calculated in your code above to iexp2_old, and named the variable I would like to calculate iexp_new:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(company fund idate project iamount first_idate iexp_old iexp_new)
          1 1 2000 1 10 2000 0 0
          1 2 2001 2  9 2001 1 1
          1 2 2002 2  8 2001 0 0
          2 1 2003 1  4 2003 0 0
          2 1 2004 1  3 2003 0 0
          3 1 2003 1 12 2003 0 0
          3 3 2004 2 14 2004 1 1
          3 4 2006 3 22 2006 2 2
          4 1 2000 5 15 2000 0 0
          4 2 2000 6  6 2000 1 0
          end

          Comment


          • #6
            Do you have the answer already, maybe somehow include first_idate in the bysort or by command of iexp2?
            Last edited by Jimmy Chung; 16 May 2016, 06:55.

            Comment


            • #7
              This does not seem to help either. Does anybody know the answer to my question below?


              Originally posted by Ben James View Post
              With iexp2 I meant to say that project is an addition to the contraints from variable iexp1, so iexp2 becomes: "only count all the previous investments made in the company group that came before the first_investment_date of a particular observation" if the project is also unique on top of that. (iexp2 by definition is always equal to or lower than the value of iexp2, due to an extra contraint)

              I have renamed iexp2 as calculated in your code above to iexp2_old, and named the variable I would like to calculate iexp_new:


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(company fund idate project iamount first_idate iexp_old iexp_new)
              1 1 2000 1 10 2000 0 0
              1 2 2001 2 9 2001 1 1
              1 2 2002 2 8 2001 0 0
              2 1 2003 1 4 2003 0 0
              2 1 2004 1 3 2003 0 0
              3 1 2003 1 12 2003 0 0
              3 3 2004 2 14 2004 1 1
              3 4 2006 3 22 2006 2 2
              4 1 2000 5 15 2000 0 0
              4 2 2000 6 6 2000 1 0
              end

              Comment

              Working...
              X