No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating average variable


    I have a dataset recording all transaction at a market in a certain time period, including the quantity sold of a certain item per transaction. Also recorded is a unique customer ID for returning customers (‘cusn’) (1-258, with a lot of transactions where cusn is not specified). Now I need a new variable AVQUAN which gives the average quantity sold to that specific person in the entire time period. Thus, first I need to generate the sum of all quantitities sold per customer, and then average this over the number of occurrences of that ‘cusn’. Can anyone help with this? I've tried some options, but most require too much computing power.

    The relevant available variables are:

    ‘cusn’ a customer number for each unique customer (1-258) (with a lot of missing values)
    ‘quan’ is the quantity sold to the customer in that particular transaction

    As I see it; I first need to count the number of times a specific ‘cusn’ appears, then aggregate the total ‘quan’ over all these appearences, and then divide TOTALQUAN by the number of appearances. Can anyone help me with the quickest and most efficient code for this?

    I am fairly new to stata, but eager to learn. Excuse me if it's too simple for this forum, or if I should clarify.


  • #2
    It's easier than that:

    by cusn, sort: egen avg_quan = mean(quan)
    Note: This will treat all of the observations with missing values of cusn as if they were all one customer. But there is no approach to this problem that would not suffer the same limitation. If you prefer not to even calculate it for those observations, just stick -if !missing(cusn)- on the end of that line.

    -egen- is a basic data management command in Stata. Do read -help egen- to see the many things it can do for you. In addition, familiarize yourself with the entire Stata Users's Guide in the on-line manuals to learn other commands that come up all the time and are fundamental to the effective use of Stata. You won't be able to remember everything in it, but next time something like this comes up, it will "ring a bell" and you will know where in the help files and manuals to turn for the details.


    • #3
      Give an example of your data as urged in the FAQ Advice #12. It need not be real, but it should be realistic.


      • #4
        Thanks a lot, Clyde, seems to work perfectly. I encountered the egen command, but couldn't yet figure out how to use it in this context. Indeed, I need to familiarize myself with the user's guide as I go. Here, I just couldn't see the proper code combination yet, so thanks again. This does mean that with the following command I get the amount of transactions per customer, right?

        by cusn, sort: egen reg = count(cusn)
        @Nick, you're right of course. The relevant variables are like this;

        * Example generated by -dataex-. To install: ssc install dataex
        input int(cusn quan)
         78 300
         42 100
         57 400
        246 600
        128  60
        190 120
         37 100
        246 570
         11 100
        215 360
          7  60
        178  35
         11 120
         65  60
        238  60
        236  50
        176  60
         52 600
        195 100
          6 240
        Where cusn may appear several times, depending on the amount of transactions made by that particular customer, and 'quan' specifies the amount sold in that particular transaction. Thus; for each transaction, I need the total quantity sold by that customer divided by the total amount of transactions by that customer, as specified above.
        Last edited by Dennis Ferrer; 11 Jan 2017, 14:52.