Calculating Frequencies of Already Used

Darren Johnson

Join Date: Sep 2019

Posts: 1
#1

11 Sep 2019, 03:07

Simply put - I have a dataset in CSV form (well, lots of them, but let's assume one) which I am planning to run through a Gini calculator, probably fastgini or ineqdeco.

Within this dataset are two particular variables of interest to me - zone and count. Zone is my "income" variable (it's taken from light data so it looks a little strange, but let's call it income all the same), and the other is my "count" variable. This is the number of times that specific "income" value has appeared across my dataset.

(Links to images showing what this dataset looks like, and what it looks like in the Data Editor once I have highlighted only the relevant variables, are below.)

https://puu.sh/EfTH4/dc88ad17ba.png

https://puu.sh/EfTHh/cfba24e3aa.png

So, what I'd like to do before going any further is find a method that takes the number in "count" as an indicator of the number of times the zone value it describes actually appeared in the dataset. For clarity, here's a "fake" example.

Zone   Count   Frequency   Percent
1      3       3           0.3
2      7       7           0.7

So in my data zone 1 appeared 3 times and zone 2 appeared 7 times, and I get my frequencies from that.

The problem with the existing methods I have tried using tabulate, such as

Code:

tab count

or anything similar, is that they only return the frequency with which each count value itself appears in the dataset. So in the above example Stata is obviously going to return:

Zone   Count   Frequency   Percent
1      3       3           .5
2      7       7           .5

Because when I tell it to tabulate for counts, there's only one 3 and one 7 appearing.

I feel there's something incredibly obvious I keep missing here. Searching for the keyword tabulate hasn't turned up much, and all frequency table searches lead to the use of tabulate or tabstat, but so far I have been unable to accomplish the above with either.

I also understand that the way I've actually produced this CSV data for Stata is anything but ideal (there would be no problem if my dataset simply had 3 individual entries), but unfortunately it's the only way I have been able to export data out of the program I am using, so I just have to accept the limitation.
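For reference, the one-row-per-occurrence layout described above can in principle be rebuilt from the compact form. The following is only a minimal sketch, assuming the variables are named zone and count as in the fake example and that count holds positive whole numbers; the two rows are the made-up values from that example, not real data.

```stata
* Toy data matching the fake example (assumed variable names)
clear
input zone count
1 3
2 7
end

* Replace each observation with as many copies as its count value
expand count

* Each zone value now appears once per occurrence
tabulate zone
```

After expand, the toy dataset has 10 observations, so an ordinary tabulate counts zone 1 three times and zone 2 seven times. With large counts this inflates the dataset considerably, so it may not be practical for the real files.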

I apologise again if I've missed something quite simple and/or any of this is in error, and thanks for your time.

Dropbox - statshazHaz1992NAME_EN_Hong Kong.csv - Simplify your life

https://www.dropbox.com

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

12 Sep 2019, 12:45

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Many of us won't open files from folks we don't know.

I don't fully understand your problem and data, but you might try frequency weights in tabulate.
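A minimal sketch of what that might look like, assuming the variables are named zone and count as in the post's fake example and that count contains non-negative integers (frequency weights require this). The two rows are the made-up values from the post, not the real data.

```stata
* Toy data matching the fake example in the post (assumed variable names)
clear
input zone count
1 3
2 7
end

* Treat each row as standing for count identical observations
tabulate zone [fweight = count]
```

On this toy data the table should report frequencies of 3 and 7 (30% and 70%) rather than one observation per row. Whether the same weight syntax carries over to the Gini commands mentioned in the post is something to check in their help files.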