Simply put - I have a dataset in csv form (well, lots of them, but let's assume one) which I am planning to run through a Gini calculator - probably fastgini or ineqdeco.
Within this data set are two particular variables of interest to me - zone and count. Zone is my "income" variable (it's taken from light data so it looks a little strange, but let's call it income all the same) - and the other is my "count" variable. This is the number of times that specific "income" value has appeared across my data-set.
(Links to images showing what this data set looks like, and how it looks in the Data Editor once I have highlighted only the relevant variables, are below.)
https://puu.sh/EfTH4/dc88ad17ba.png
https://puu.sh/EfTHh/cfba24e3aa.png
So, what I'd like to do before going any further is find a method that takes the number in "count" as an indicator of the number of times the zone value it describes actually appeared in the data-set. For clarity, here's a "fake" example.
Zone | Count | Frequency | Percent |
1 | 3 | 3 | 0.3 |
2 | 7 | 7 | 0.7 |
So in my data zone 1 appeared 3 times and zone 2 appeared 7 times, and I get my frequencies from that.
The problem with the existing methods I have tried using tabulate - so
Code:
tab count
or anything similar - is that they only return the frequency with which the count value itself appears in the dataset. So in the above example Stata is obviously going to return
Zone | Count | Frequency | Percent |
1 | 3 | 3 | .5 |
2 | 7 | 7 | .5 |
because when I tell it to tabulate count, there's only one 3 and one 7 appearing.
I feel there's something incredibly obvious I keep missing here - searching for the keyword tabulate hasn't turned up much, and all frequency table searches lead to tabulate or tabstat - but so far I have been unable to accomplish the above with either.
I also understand that the way I've actually produced this csv data for Stata is anything but ideal (there would be no problem if my dataset simply had 3 individual intr), but unfortunately it's the only way I have been able to export data out of the program I am using, so I just have to accept the limitation.
I apologise again if I've missed something quite simple and/or any of this is in error, and thanks for your time.
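For reference, here is a minimal sketch of the result I am after, using toy data typed in to match the fake example. It assumes count can be treated as a frequency weight, meaning a non-negative integer that says how many identical observations each row stands for. The toy values are illustrative and are not taken from my real file.

```stata
* Toy data matching the fake example: one row per distinct zone value,
* with count holding how many times that value occurred (assumed integer).
clear
input zone count
1 3
2 7
end

* Option 1: leave the data as they are and treat count as a frequency weight.
* Each row is counted count times, so zone 1 counts 3 times and zone 2 counts 7 times.
tabulate zone [fweight = count]

* Option 2: physically replicate each row count times, then tabulate as usual.
* After expand the dataset has 3 + 7 = 10 observations.
expand count
tabulate zone
```

Both tables should show frequencies of 3 and 7 (30% and 70%) for the toy data. Note that expand keeps a single copy of any row whose count is zero or negative, so rows like that would need dropping first. Whether fastgini or ineqdeco accept the same [fweight = count] syntax is something to confirm in each command's help file. Expanding the data sidesteps that question at the cost of a larger dataset.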