Relatively new Stata user. Please excuse if this question is too elementary, but I haven't been able to find the answer online in any Stata guidance surrounding the count function by group.
Doing some analysis of a retail audit. The unit of analysis in the dataset is the outlet (6,000+ cases) with variables related to product availability for each outlet. However, what I am trying to answer is availability of a product by a geographic area, not outlet. The specific question is, how many towns have at least three outlets with X product, by region. So the variables in question are presence of a product dichotomous Y/N, town (hundreds of towns in the dataset, each with anywhere from 1-40 or so outlets associated with it), and region (25 total). Let's call these productavail (0s recoded to missing), town, and region.
I've used the syntax:
gsort town
by town: egen newvaroutletcount = count (productavail)
and that gives me the number of outlets in the town that have the product. By using the syntax:
tab town newvaroutletcount if region=-1
I can easily see the answer manually of how many towns within region 1 have the product and how many outlets within that town carry it. However, given that there are hundreds of towns, it is onerous for me to count how many of the towns in each region have 3+ outlets.
I've tried creating a new variable that counts the towns that have newvaroutletcount>2, sometimes sorting by region, but when I do any counts of that newvar it is still counting the number of outlets, not the number of towns that meet the criteria of 3+. I want to be able to generate a simple crosstab that tells me how many towns meet that criteria by region.
Please help! Thanks!
Doing some analysis of a retail audit. The unit of analysis in the dataset is the outlet (6,000+ cases) with variables related to product availability for each outlet. However, what I am trying to answer is availability of a product by a geographic area, not outlet. The specific question is, how many towns have at least three outlets with X product, by region. So the variables in question are presence of a product dichotomous Y/N, town (hundreds of towns in the dataset, each with anywhere from 1-40 or so outlets associated with it), and region (25 total). Let's call these productavail (0s recoded to missing), town, and region.
I've used the syntax:
gsort town
by town: egen newvaroutletcount = count (productavail)
and that gives me the number of outlets in the town that have the product. By using the syntax:
tab town newvaroutletcount if region=-1
I can easily see the answer manually of how many towns within region 1 have the product and how many outlets within that town carry it. However, given that there are hundreds of towns, it is onerous for me to count how many of the towns in each region have 3+ outlets.
I've tried creating a new variable that counts the towns that have newvaroutletcount>2, sometimes sorting by region, but when I do any counts of that newvar it is still counting the number of outlets, not the number of towns that meet the criteria of 3+. I want to be able to generate a simple crosstab that tells me how many towns meet that criteria by region.
Please help! Thanks!
Comment