Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • descriptive stats by nested group

    Attached is the format of my data.(SFQ2.dta)

    I have n 'customers' who buy one of the 5 'type of food' products multiple 'times' at a different 'price'. I want to be able to tabulate the following:

    What is the count of customers and their average price per customer paid for 5 type of food products? I have added an excel sheet example table of what I want.

    Attached Files

  • #2
    Guneet, your Excel table is probably wrong for item #4.

    Code:
    use "http://www.statalist.org/forums/filedata/fetch?id=1306758", clear
    destring customer foodtype price, replace
    
    tempfile tmp
    preserve
      sort foodtype customer
      by foodtype customer: keep if _n==1
      collapse (count)ncust=customer, by(foodtype)
      save `"`tmp'"'
    restore
    sort foodtype
    merge foodtype using `"`tmp'"', keep(ncust)
    drop _merge
    
    rename customer purchase
    table foodtype, c(mean price min ncust count purchase)
    Produces:
    Code:
    -------------------------------------------------
     foodtype | mean(price)   min(ncust)  N(purchase)
    ----------+--------------------------------------
            1 |       17.75            2            4
            2 |        9.75            2            4
            3 |          10            2            4
            4 |   17.666666            2            6
            5 |         9.5            2            4
    -------------------------------------------------
    I don't understand what variable times is. If this is the time indicator (period) then perhaps you want the table separately for different time periods. If it is multiple purchases, then perhaps you want to use this as weights.

    Best, Sergiy Radyakin

    Comment


    • #3
      Thank you so much Sergiy!!! Yeah entry for customer 4 was wrong.

      Can you please tell me why we had to use tempfile function? And when do we use ` sign instead of ' ?

      Comment


      • #4
        Originally posted by Guneet Kaur View Post
        Thank you so much Sergiy!!! Yeah entry for customer 4 was wrong.

        1) Can you please tell me why we had to use tempfile function?
        2) And when do we use ` sign instead of ' ?
        1) I wanted to reduce data from purchase level (customer-item pair) to item level. Tempfile reserves a temporary filename, which can be used by subsequent save and merge commands. Command tempfile allows programs to work on any computer, regardless of where Stata is installed, where your do file is or where your data is. Tempfiles are erased on completion of your program and signify to the person reading your program that the file being saved will in the end be discarded, it is an intermediate product only. Use of tempfiles is a good practice when you need to save something temporarily to the disk (to overcome the present limitation of Stata of 1 dataset in memory). But I envision the same can be done without the use of tempfiles or merges using observation subscripts. There are probably other ways as well.

        2) use of quotes is explained here: http://www.stata.com/help.cgi?quotes

        Note that your customer ID variable may remain a string, but in general I would advise to use the most basic types for storage, and your customer ID is all numeric. Hence destring at the beginning.

        Comment


        • #5
          Thank you Sergiy. I am getting an error at merge stage

          merge dnd_bin using `"`tmp'"', keep(ncust)
          (note: you are using old merge syntax; see [D] merge for new syntax)
          variable _merge already defined


          I tried using new syntax for merge but it does not seem to work. I took 1:1 merge syntax and got below error

          merge 1:1 dnd_bin using `"`tmp'"', keep(ncust)
          ncust: invalid resulttype
          results, specified in options assert() and keep(), are the integers 1 through 5, or master (equivalent to 1),
          using (2), match (3), match_update (4), or match_conflict (5). The last two arise only when option update is
          specified.
          r(198);


          Will you be able to help? Thanks!!

          Comment


          • #6
            Here's another way of doing this

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str8 customer str4 foodtype str5 price float times
            "1" "1" "13" 1 
            "1" "1" "14" 2 
            "2" "2" "14" 1 
            "2" "2" "16" 2 
            "3" "3" "2" 1 
            "3" "3" "4" 2 
            "4" "4" "21" 1 
            "4" "4" "22" 2 
            "4" "4" "31" 3 
            "5" "5" "8" 1 
            "5" "5" "9" 2 
            "6" "1" "21" 1 
            "6" "1" "23" 2 
            "7" "2" "4" 1 
            "7" "2" "5" 2 
            "8" "3" "12" 1 
            "8" "3" "22" 2 
            "9" "4" "9" 1 
            "9" "4" "11" 2 
            "9" "4" "12" 3 
            "10" "5" "10" 1 
            "10" "5" "11" 2 
            end
            
            * tag first purchase of foodtype by customer
            bysort foodtype customer: gen first = _n == 1
            
            * number of customer per food type
            by foodtype: egen ncustomer = total(first)
            
            * average price
            bysort foodtype: egen avprice = mean(real(price))
            
            * reduce to one observation per foodtype
            by foodtype: keep if _n == 1
            keep foodtype avprice ncustomer
            list
            
            * to get the table in the format in the spreadsheet
            destring foodtype, replace
            xpose, clear varname
            list

            Comment


            • #7
              Thank You sooooo much Robert!! very intuitive...It worked. Can you tell me a way in which I do not loose my original database and still get outputs in spreadsheet?

              Comment


              • #8
                I'm not sure I understand what you want here. You do not loose the original dataset, you can always reload it from where you started from. If you are asking for a way to combine two datasets the way you can have multiple spreadsheets in Excel, then you can't do that in Stata as there is only one dataset in memory at a time. This does not stop you from combining the information in one dataset however. If your original dataset is called "mydata.dta", then you can do the following, using an unmatched merge:

                Code:
                use "mydata.dta", clear
                
                * tag first purchase of foodtype by customer
                bysort foodtype customer: gen first = _n == 1
                
                * number of customer per food type
                by foodtype: egen ncustomer = total(first)
                
                * average price
                bysort foodtype: egen avprice = mean(real(price))
                
                * reduce to one observation per foodtype
                by foodtype: keep if _n == 1
                keep foodtype avprice ncustomer
                list
                
                * to get the table in the format in the spreadsheet
                destring foodtype, replace
                xpose, clear varname
                list
                
                * combine table with original data (save under new name!)
                merge 1:1 _n using "mydata.dta", nogen
                save "mydata+table.dta", replace

                Comment

                Working...
                X