
  • Calculating Frequencies of Already Used

    Simply put, I have a dataset in csv form (well, lots of them, but let's assume one) that I plan to run through a Gini calculator, probably fastgini or ineqdeco.

    Within this dataset are two variables of particular interest to me: zone and count. Zone is my "income" variable (it's taken from light data, so it looks a little strange, but let's call it income all the same), and count is the number of times that specific "income" value appears across my dataset.
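    One way to feed a file like this to an inequality command is to pass count as a frequency weight, so that each row is treated as count people sharing the same income value. A minimal sketch on the two-row illustrative example from this post, assuming that zone holds the income values, that count holds whole numbers, and that the SSC versions of ineqdeco and fastgini accept fweights (worth confirming in help ineqdeco and help fastgini before relying on it):

```stata
* Install the user-written commands from SSC (only needed once)
ssc install ineqdeco
ssc install fastgini

* Illustrative data only: the two-row example from the question
clear
input zone count
1 3
2 7
end

* Each row stands for 'count' people sharing the income value 'zone'
fastgini zone [fweight=count]
ineqdeco zone [fweight=count]
```

    With real data, the input block would be replaced by importing the csv file (for example with import delimited); the weighted command lines stay the same.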

    (Links to images showing what this dataset looks like, and what it looks like in the Data Editor once I have highlighted only the relevant variables, are below.)

    https://puu.sh/EfTH4/dc88ad17ba.png

    https://puu.sh/EfTHh/cfba24e3aa.png

    So, what I'd like to do before going any further is find a method that takes the number in "count" as the number of times the zone value it describes actually appears in the dataset. For clarity, here's a "fake" example:
    Zone  Count  Frequency  Percent
    1     3      3          30.0
    2     7      7          70.0
    So in my data zone 1 appears 3 times and zone 2 appears 7 times, and the frequencies follow from that.

    The problem with the methods I have tried so far, which use tabulate, such as

    Code:
    tab count
    or anything similar, is that they only return how many times each count value itself appears in the dataset, so in the above example Stata returns:
    Count  Frequency  Percent
    3      1          50.0
    7      1          50.0
    That is because when I tabulate count, only one 3 and one 7 appear.

    I feel there's something incredibly obvious I keep missing here. Searching on the keyword tabulate hasn't turned up much, and all my searches for frequency tables lead to tabulate or tabstat, but so far I have been unable to accomplish the above with either.

    I also understand that the way I've produced this csv data for Stata is far from ideal (there would be no problem if my dataset simply had 3 individual intr), but unfortunately it's the only way I have been able to export data from the program I am using, so I just have to accept the limitation.
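    Stata's built-in expand command can recreate a one-row-per-observation layout from an aggregated file like this: it replaces each observation with as many copies as the expression gives. A minimal sketch on the two-row illustrative example from this post, assuming count holds whole numbers (expand treats values below 1, or missing, as 1, so rows with a count of 0 are dropped first):

```stata
* Illustrative data only: the two-row example from the question
clear
input zone count
1 3
2 7
end

* expand keeps one copy of rows whose count is below 1, so drop empty rows first
drop if count == 0

* Replicate each row so it appears 'count' times in total (10 rows here)
expand count

* Each row is now one observation, so an unweighted tabulation gives 30% / 70%
tabulate zone
```

    Expanding can produce a very large dataset when the counts are big, as they may be with light data; passing count as a frequency weight should give the same frequencies without creating the extra rows.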

    I apologise again if I've missed something quite simple or if any of this is in error, and thanks for your time.

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Many of us won't open files from folks we don't know.
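    For reference, a minimal dataex call is sketched below; the two-row input block is the illustrative example from the question, standing in for the real data, which would normally already be in memory. dataex ships with recent versions of Stata and can be installed from SSC for older ones.

```stata
* Illustrative data only: the two-row example from the question
clear
input zone count
1 3
2 7
end

* Print the listed variables as a copy-and-paste-able block for a forum post
dataex zone count
```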

    I don't fully understand your problem and data, but you might try frequency weights in tabulate.
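    As an illustration of that suggestion, a frequency weight tells tabulate to count each row count times. A minimal sketch on the two-row illustrative example from the question (typed in by hand, not real data):

```stata
clear
input zone count
1 3
2 7
end

* Unweighted: each count value appears once, so each gets 50%
tabulate count

* Frequency-weighted: zone 1 counts 3 times and zone 2 counts 7 times (30% and 70%)
tabulate zone [fweight=count]
```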
