Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting something that isn't the unit of analysis?

    Relatively new Stata user. Please excuse if this question is too elementary, but I haven't been able to find the answer online in any Stata guidance surrounding the count function by group.

    Doing some analysis of a retail audit. The unit of analysis in the dataset is the outlet (6,000+ cases) with variables related to product availability for each outlet. However, what I am trying to answer is availability of a product by a geographic area, not outlet. The specific question is, how many towns have at least three outlets with X product, by region. So the variables in question are presence of a product dichotomous Y/N, town (hundreds of towns in the dataset, each with anywhere from 1-40 or so outlets associated with it), and region (25 total). Let's call these productavail (0s recoded to missing), town, and region.

    I've used the syntax:
    gsort town
    by town: egen newvaroutletcount = count (productavail)

    and that gives me the number of outlets in the town that have the product. By using the syntax:
    tab town newvaroutletcount if region=-1

    I can easily see the answer manually of how many towns within region 1 have the product and how many outlets within that town carry it. However, given that there are hundreds of towns, it is onerous for me to count how many of the towns in each region have 3+ outlets.

    I've tried creating a new variable that counts the towns that have newvaroutletcount>2, sometimes sorting by region, but when I do any counts of that newvar it is still counting the number of outlets, not the number of towns that meet the criteria of 3+. I want to be able to generate a simple crosstab that tells me how many towns meet that criteria by region.

    Please help! Thanks!

  • #2
    I do think these problems are challenging.

    Despite its name, the egen function count() is not the best tool here. It counts non-missing values and 0s and 1s qualify automatically.

    Counting the 1s of a binary indicator coded 1 or 0 is the same as getting their total, and its sibling total() is friendlier for your problem. (Missings are automatically ignored.)

    The problem of (not!) multiple counting can be addressed with yet another sibling tag(), introduced for that purpose in STB-50 (1999), but long since part of the official repertoire.

    Code:
    egen noutlets = total(productavail) , by(town)
    gen n3ormore = noutlets >= 3
    egen town_tag = tag(town)
    egen ninregion = total(tag & n3more), by(region)
    egen region_tag = tag(region)
    list region ninregion if region_tag
    should give you some helpful technique. See also http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
    Last edited by Nick Cox; 29 Oct 2015, 08:32.

    Comment


    • #3
      This has solved my problem AND made my life magically easy. I cannot thank you enough! Meg

      Comment


      • #4
        See also http://www.stata-journal.com/sjpdf.h...iclenum=dm0033 for a wider discussion.

        Comment

        Working...
        X