  • Smart way to identify strange observations?

    Hi, I am using Stata 13 and I have some sales data from a few stores. I scaled the sales data by store size and plotted it over time. I just wanted to get a feel for the data. It looks a bit funny (see the graph below).
    [Image: Graph.png]

    A few stores seem to show no seasonal fluctuation over time. Is there a smart way to identify these stores systematically? My initial idea was to identify them by their variance. Here is what I did:

    Code:
    gen year_month = mofd(Date)
    format year_month %tm

    * scale monthly sales by store size
    gen monthly_sales_by_size = monthly_sales/Size

    * some graphs to get a feel for the data
    line monthly_sales_by_size year_month

    * spread by store (note: sd() gives the standard deviation, not the variance)
    bysort Store: egen sales_variance = sd(monthly_sales_by_size)
    sum sales_variance

    * drop stores with very little variation
    drop if sales_variance < 0.005
    line monthly_sales_by_size year_month, legend(size(medsmall))
    But that is such a crude way of doing it. Any ideas? I think it is a very interesting problem.

    Thank you in advance! /R
    Last edited by Rachel Sleeps; 05 May 2017, 08:39. Reason: forgot the tags

  • #2
    I think that you don't actually have a problem with your stores, but rather a problem with your plot.

    Are you inferring lack of seasonality from the straight lines that run all the way from the left edge to the right?

    The problem is that what you wanted was a plot that overlaid several stores on the same plot. What you have is a plot that is one long line for all the stores. The lines you see as running straight from left to right are actually running from the (right) end of one plot to the (left) beginning of the next.

    Without seeing your plot command I will not speculate further, but I think you're missing a by() option to produce separate plots for each store.
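
    A minimal sketch of that suggestion, assuming the variable names from the first post (Store, year_month, monthly_sales_by_size). Sorting first keeps each store's points in time order, and by() draws one panel per store, so no line can run from the end of one store to the start of the next:

    Code:
    * assumes the data are in memory with the variables from post #1
    sort Store year_month
    line monthly_sales_by_size year_month, by(Store)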


    • #3
      Code:
      sort Store year_month
      line monthly_sales_by_size year_month, legend(size(medsmall)) c(L)
      would probably remove the spurious connections. Otherwise what's especially odd? I would probably also try

      Code:
      gen Month = month(Date) 
      
      line monthly_sales_by_size Month, by(Store) c(L)
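
      A self-contained sketch of why c(L) helps, using made-up data (two hypothetical stores with three months each; the variable sales and all values are illustrative, not from this thread). connect(L) joins consecutive points only while the x variable is ascending, so after sorting, the line breaks where one store's series ends and the next begins:

      Code:
      * toy data, purely illustrative
      clear
      input Store year_month sales
      1 1 0.10
      1 2 0.12
      1 3 0.11
      2 1 0.20
      2 2 0.25
      2 3 0.22
      end

      * order points by time within each store
      sort Store year_month

      * without c(L), month 3 of store 1 would be joined to month 1 of store 2
      line sales year_month, c(L)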


      • #4
        Wow....what a rookie mistake....Thank you both!
