Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Conditional sum on STATA

    Hi,

    I have a dataset with the following variables:
    - Brand (41 brands in total)
    - product identifier (identifies unique product and unique brand)
    - Date started
    - Date ended
    - Brand start
    - Brand end

    ex:
    product identifier / brand / date started / date ended / Brand start / Brand end
    101-------------------- A--------- 2001 ----------2002 ---------1990 ---------2010
    102-------------------- A ---------2008 ----------2010 ---------1990 ---------2010

    I would like to analyze how many different products are produced by a given brand, year by year (generate a new dataset).

    My initial idea was to create a duplicate my observations so that I now have a row for each year between brand start and brand end. I then wanted to sum columns X conditional on the brand name, and date started being before the year corresponding to the row, and date ended being after the year corresponding to the row.

    product identifier / brand / date started / date ended / Brand start / Brand end / Year / X / Value of interest
    101 ----------------------A---- 2001------------- 2002 -------------1990 ---------2010 ------1990-- 1-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1990-- 0-------------- ?
    ...

    102 ----------------------A---- 1008------------- 2010 -------------1990 ---------2010 ------1990-- 1-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1991-- 0-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1992-- 0-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1993-- 0-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1994-- 0-------------- ?
    ----------------------------A--------------------------------------------------------------------- ------1995-- 0-------------- ?

    Value of interest would be calculated by adding the value in the X column for each row that meet the criteria: for the value of interest, brand has to be A, date started has to be before 1990, date ended has to be after 1990.
    I did this without any problems in Excel but I have no idea of how to do it with STATA. Any suggestions?

    Thank you very much in advance.

    Josh.
    Last edited by Josh Ellon; 16 Feb 2019, 19:18.

  • #2
    I have a dataset with the following variables:
    - Brand (41 brands in total)
    - product identifier (identifies unique product and unique brand)
    - Date started
    - Date ended
    - Brand start
    - Brand end
    No doubt you do. But it is not a Stata data set, because none of those are legal variable names in Stata. So first import your data into Stata. Once you have done that, use the -dataex- command to post a sample of your data so that those who want to help you have something to work with. I would also suggest a clearer explanation of your desired result since, to be honest, I can't follow your description.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Hi Josh, and welcome to Statalist!

      1) So this sounds pretty straightforward to do in Stata, although you will probably need to share more about your data. As Clyde mentioned, the easiest way to do that would be to use Stata's dataex command. If you're not familiar with dataex (and most Stata users aren't) I created a Youtube tutorial here. (I made it too long--feel free to watch at 2x speed, and you may only need the first 6 minutes)

      2) You don't need to add a row between obs in Stata (you do in Excel to have room for your subtotals). You'll just use tabulate, table, or tabstat. See here

      Code:
      * I added some additional data to your original post
      dataex brand product_id year_start year_end brand_start brand_end // data shared via -dataex-. To install: ssc install dataex
      clear
      input str1 brand int(product_id year_start year_end brand_start brand_end)
      "A" 101 2001 2002 1990 2010
      "A" 102 2008 2010 1990 2010
      "A" 103 2001 2005 1990 2010
      "B" 104 2003 2008 1991 2010
      "B" 105 2006 2010 1991 2010
      "C" 106 2005 2006 1995 2010
      "C" 107 2002 2004 1995 2010
      "C" 108 2006 2009 1995 2010
      "C" 109 2006 2009 1995 2010
      "D" 110 2005 2008 2000 2010
      "D" 111 2006 2009 2000 2010
      "D" 112 2002 2005 2000 2010
      "D" 113 2006 2009 2000 2010
      end
      ------------------ copy up to and including the previous line ------------------
      
      order brand, first
      sort brand product_id
      . list brand product_id year_start year_end brand_start brand_end, noobs abbrev(14)
      
        +----------------------------------------------------------------------+
        | brand   product_id   year_start   year_end   brand_start   brand_end |
        |----------------------------------------------------------------------|
        |     A          101         2001       2002          1990        2010 |
        |     A          102         2008       2010          1990        2010 |
        |     A          103         2001       2005          1990        2010 |
        |     B          104         2003       2008          1991        2010 |
        |     B          105         2006       2010          1991        2010 |
        |----------------------------------------------------------------------|
        |     C          106         2005       2006          1995        2010 |
        |     C          107         2002       2004          1995        2010 |
        |     C          108         2006       2009          1995        2010 |
        |     C          109         2006       2009          1995        2010 |
        |     D          110         2005       2008          2000        2010 |
        |----------------------------------------------------------------------|
        |     D          111         2006       2009          2000        2010 |
        |     D          112         2002       2005          2000        2010 |
        |     D          113         2006       2009          2000        2010 |
        +----------------------------------------------------------------------+
      
      * NOTE: This is missing 2004 and 2007 because no products were started in those years in my toy data
      tabulate brand year_start
      
                 |                            year_start
           brand |      2001       2002       2003       2005       2006       2008 |     Total
      -----------+------------------------------------------------------------------+----------
               A |         2          0          0          0          0          1 |         3
               B |         0          0          1          0          1          0 |         2
               C |         0          1          0          1          2          0 |         4
               D |         0          1          0          1          2          0 |         4
      -----------+------------------------------------------------------------------+----------
           Total |         2          2          1          2          5          1 |        13
      
      . tabulate year_start brand
      
                 |                    brand
      year_start |         A          B          C          D |     Total
      -----------+--------------------------------------------+----------
            2001 |         2          0          0          0 |         2
            2002 |         0          0          1          1 |         2
            2003 |         0          1          0          0 |         1
            2005 |         0          0          1          1 |         2
            2006 |         0          1          2          2 |         5
            2008 |         1          0          0          0 |         1
      -----------+--------------------------------------------+----------
           Total |         3          2          4          4 |        13
      
      gen before_2005 = (year_start < 2005)  // 1 if product started before 2005, 0 otherwise
      gen after_2008   = (year_end > 2008) & year_end!=.
      
      . list, noobs sepby(brand) abbrev(12)
      
        +-------------------------------------------------------------------------------------------------+
        | brand   product_id   year_start   year_end   brand_start   brand_end   before_2005   after_2008 |
        |-------------------------------------------------------------------------------------------------|
        |     A          101         2001       2002          1990        2010             1            0 |
        |     A          102         2008       2010          1990        2010             0            1 |
        |     A          103         2001       2005          1990        2010             1            0 |
        |-------------------------------------------------------------------------------------------------|
        |     B          104         2003       2008          1991        2010             1            0 |
        |     B          105         2006       2010          1991        2010             0            1 |
        |-------------------------------------------------------------------------------------------------|
        |     C          106         2005       2006          1995        2010             0            0 |
        |     C          107         2002       2004          1995        2010             1            0 |
        |     C          108         2006       2009          1995        2010             0            1 |
        |     C          109         2006       2009          1995        2010             0            1 |
        |-------------------------------------------------------------------------------------------------|
        |     D          110         2005       2008          2000        2010             0            0 |
        |     D          111         2006       2009          2000        2010             0            1 |
        |     D          112         2002       2005          2000        2010             1            0 |
        |     D          113         2006       2009          2000        2010             0            1 |
        +-------------------------------------------------------------------------------------------------+
      Hopefully this will help you get started. It doesn't solve the SUMIF() part, but post a sample of your data and we can help you go from there.
      Last edited by David Benson; 16 Feb 2019, 19:43.

      Comment

      Working...
      X