Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Total over values of a row up to a certain value (long format)

    Hi there,

    I have panel data set in long format with company data containing funding data over 72 periods. So the data looks something like that:

    Company, funding1, funding2, ..., funding72
    1, 0, 0, ..., 2000000
    2, 0, 100000, ..., 0
    .
    .
    .

    For each of the companies an event happened at a certain period [1 -72]. I'd like to calculate the total funding before that event happened. For convenience reasons I'd like to keep the long format if possible. My problem is that I can't extract the information of the period from the variable name funding*. Among others I have tested the following approach but obviously the value of the variable is taken and not the variable name.

    Code:
    gen priorFunding = 0
    foreach x of varlist funding* {
        local xyz substr(`x',7,.)
        destring `xyz', replace force
        if `xyz' < eventgperiod {
            priorFunding == priorFunding + `x'
            }        
        }

    Is there a way to extract the period of funding out of the variable name without converting to wide format? I could not find any solution in Statalist nor in other forums, but probably (or hopefully) I just did not know what to search for.

    Thanks a lot in advance!

    P.S. I'm using Stata/MP 13.0

  • #2
    In Stata parlance what you have is called wide (not long) format.

    Check the help for -dataex- and provide some minimal data example we can work with.

    I do not understand how the event is defined from your description.

    Comment


    • #3
      If your variables are indeed named funding1 through funding72 then the following variation on your code should start you in a useful direction.
      Code:
      generate priorFunding = 0
      forvalues pd = 1/72 {
          replace priorFunding = priorFunding + funding`pd' if `pd' < eventgperiod
          }
      I feel obliged to add that the experienced users here generally agree that, with few exceptions, Stata makes it much more convenient to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You may think a wide layout will be more convenient, but you have little experience with Stata. You should try to achieve what you need with the data organized in a long layout, and seek the help of Statalist in doing so.
      Code:
      reshape long funding, i(Company) j(period)
      after which this problem could be solved without looping.
      Code:
      by Company (period), sort: egen priorFunding = total(cond(period<eventgperiod,funding,0))
      Last edited by William Lisowski; 16 Jan 2019, 14:43.

      Comment


      • #4
        EDIT: This crossed with Joro & William's posts. Let me echo William in that it is usually far easier to work with data in long format in Stata. Also, his solution looks to be much easier than mine. :-)

        Hi Tim,

        So I would reshape it to long (your data are currently wide format), create the calculation, and then reshape back to wide if that's how you want it.

        Code:
        dataex id round_date round_number round_amt event  // data shared via -dataex-. To install: ssc install dataex
        clear
        input int id float(round_date1 round_amt1 event1 round_date2 round_amt2 event2 round_date3 round_amt3 event3 round_date4 round_amt4 event4 round_date5 round_amt5 event5)
        1 18274 12.9 0     .    . .     .    . .     .   . .     .   . .
        2 13226   18 0 13270    2 0 13271    2 1 13510 2.8 0     .   . .
        3 15519  2.4 0 16094 19.8 0 16584 64.2 0     .   . .     .   . .
        4 16649   .6 0 16833    1 0 17075  1.7 0 17257 2.1 1 17366 3.1 0
        5 16824  2.1 0 17170  2.1 0 17590   .8 1 18302  .5 0     .   . .
        end
        format %tdN/D/CY round_date1
        format %tdN/D/CY round_date2
        format %tdN/D/CY round_date3
        format %tdN/D/CY round_date4
        format %tdN/D/CY round_date5
        Code:
        *** Just listing first 3 rounds
        . list id round_date1 round_amt1 event1 round_date2 round_amt2 event2 round_date3 round_amt3 event3 , noobs abbrev(12)
        
          +----------------------------------------------------------------------------------------------------------------+
          | id   round_date1   round_amt1   event1   round_date2   round_amt2   event2   round_date3   round_amt3   event3 |
          |----------------------------------------------------------------------------------------------------------------|
          |  1    01/12/2010         12.9        0             .            .        .             .            .        . |
          |  2    03/18/1996           18        0    05/01/1996            2        0    05/02/1996            2        1 |
          |  3    06/28/2002          2.4        0    01/24/2004         19.8        0    05/28/2005         64.2        0 |
          |  4    08/01/2005           .6        0    02/01/2006            1        0    10/01/2006          1.7        0 |
          |  5    01/23/2006          2.1        0    01/04/2007          2.1        0    02/28/2008           .8        1 |
          +----------------------------------------------------------------------------------------------------------------+
        
        reshape long round_date round_amt event, i(id) j(round_number)
        . list, noobs sepby(id) abbrev(12)
        
          +----------------------------------------------------+
          | id   round_number   round_date   round_amt   event |
          |----------------------------------------------------|
          |  1              1   01/12/2010        12.9       0 |
          |  1              2            .           .       . |
          |  1              3            .           .       . |
          |  1              4            .           .       . |
          |  1              5            .           .       . |
          |----------------------------------------------------|
          |  2              1   03/18/1996          18       0 |
          |  2              2   05/01/1996           2       0 |
          |  2              3   05/02/1996           2       1 |
          |  2              4   12/27/1996         2.8       0 |
          |  2              5            .           .       . |
          |----------------------------------------------------|
          |  3              1   06/28/2002         2.4       0 |
          |  3              2   01/24/2004        19.8       0 |
          |  3              3   05/28/2005        64.2       0 |
          |  3              4            .           .       . |
          |  3              5            .           .       . |
          |----------------------------------------------------|
          |  4              1   08/01/2005          .6       0 |
          |  4              2   02/01/2006           1       0 |
          |  4              3   10/01/2006         1.7       0 |
          |  4              4   04/01/2007         2.1       1 |
          |  4              5   07/19/2007         3.1       0 |
          |----------------------------------------------------|
          |  5              1   01/23/2006         2.1       0 |
          |  5              2   01/04/2007         2.1       0 |
          |  5              3   02/28/2008          .8       1 |
          |  5              4   02/09/2010          .5       0 |
          |  5              5            .           .       . |
          +----------------------------------------------------+
        
        drop if round_date==. & round_amt==.   // don't need these
        bysort id (round_number): gen cum_total = sum( round_amt)  // adds up amount raised to date (for all rounds)
        clonevar total_to_event = cum_total
        bysort id (round_number): egen has_event = total(event)  // i if firm ever had event
        gsort id -event
        by id: gen event_round = round_number[1] if has_event==1  // what round did firm have event
        sort id round_number
        by id: replace total_to_event = total_to_event[event_round] if has_event==1 & round_number > event_round
        format cum_total total_to_event %10.1fc
        
        . list, noobs abbrev(14) sepby(id)
        
          +-----------------------------------------------------------------------------------------------------------+
          | id   round_number   round_date   round_amt   event   cum_total   total_to_event   has_event   event_round |
          |-----------------------------------------------------------------------------------------------------------|
          |  1              1   01/12/2010        12.9       0        12.9             12.9           0             . |
          |-----------------------------------------------------------------------------------------------------------|
          |  2              1   03/18/1996          18       0        18.0             18.0           1             3 |
          |  2              2   05/01/1996           2       0        20.0             20.0           1             3 |
          |  2              3   05/02/1996           2       1        22.0             22.0           1             3 |
          |  2              4   12/27/1996         2.8       0        24.8             22.0           1             3 |
          |-----------------------------------------------------------------------------------------------------------|
          |  3              1   06/28/2002         2.4       0         2.4              2.4           0             . |
          |  3              2   01/24/2004        19.8       0        22.2             22.2           0             . |
          |  3              3   05/28/2005        64.2       0        86.4             86.4           0             . |
          |-----------------------------------------------------------------------------------------------------------|
          |  4              1   08/01/2005          .6       0         0.6              0.6           1             4 |
          |  4              2   02/01/2006           1       0         1.6              1.6           1             4 |
          |  4              3   10/01/2006         1.7       0         3.3              3.3           1             4 |
          |  4              4   04/01/2007         2.1       1         5.4              5.4           1             4 |
          |  4              5   07/19/2007         3.1       0         8.5              5.4           1             4 |
          |-----------------------------------------------------------------------------------------------------------|
          |  5              1   01/23/2006         2.1       0         2.1              2.1           1             3 |
          |  5              2   01/04/2007         2.1       0         4.2              4.2           1             3 |
          |  5              3   02/28/2008          .8       1         5.0              5.0           1             3 |
          |  5              4   02/09/2010          .5       0         5.5              5.0           1             3 |
          +-----------------------------------------------------------------------------------------------------------+
        And that point you could collapse to get a list of how much each of these firms raised until the event, or reshape back to wide. Also note, if this funding takes place over different years, and it is spread out in your data, you will probably want to create a round_year variable and adjust your dollars for inflation.

        You can download a GDP deflator (for the US) here

        Code:
        reshape wide round_date round_amt event cum_total total_to_event, i(id) j(round_number)
        Last edited by David Benson; 16 Jan 2019, 14:54.

        Comment


        • #5
          William, David and Joro thank you all for your quick help!!! And sorry for mixing up long and wide format. I went for Williams approach and it works as desired. I agree with you, that calculating in long format is much easier. I'm transforming to long format in my do file in a later point but I needed the results in wide at that stage of the do file. Anyway as David mentioned, I could have transformed into long format and than transform it back.
          Also your suggestion to adjust for inflation was really helpful, David.
          Thanks again!

          Comment

          Working...
          X