  • How can I combine/collapse/contract/aggregate/append my data to merge (groups of) observations, with different conditions for different variables?

    In my data (example from another post) I have multiple entries for the same population ("entity"), and also for different years ("year"), with different values in the other variables.

    1. With "egen tag" and "egen ndistinct" I can find out how many different firms ("firm") each combination of "entity" and "year" (group?) has.
    (I will have to do this not only for "firm" but also for "value" and other variables.)

    2. Now I would like to know the frequencies for each value in "ndistinct" over all the combinations of "entity" and "year".
    (To get an idea of the variation within different variables and think about how to merge them in the most sensible way).

    3. Last, I need to make sure there is only one observation per combination of "entity" and "year", and therefore combine (merge, collapse, append...?) the values of the other variables. If there is a most frequent value, choose that one; if there is not, choose the highest or the first, or in some other cases the mean or the sum (to be defined for each variable).

    I am really struggling to find a way to do that. I hope my example is clear.

    Original data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte entity int year str1 firm float value
    1 2010 "A" 15
    1 2010 "A"  8
    1 2010 "B" 12
    1 2011 "B" 25
    1 2012 "B"  8
    2 2010 "A"  7
    2 2011 "A"  5
    2 2011 "A" 12
    2 2011 "C" 13
    2 2012 "A" 19
    2 2012 "B" 25
    2 2011 "B" 14
    2 2012 "C" 18
    2 2012 "D" 16
    end
    sort entity year
    list, sepby(entity year)
    1. Question
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte entity int year str1 firm float value
    1 2010 "A" 15
    1 2010 "A"  8
    1 2010 "B" 12
    1 2011 "B" 25
    1 2012 "B"  8
    2 2010 "A"  7
    2 2011 "A"  5
    2 2011 "A" 12
    2 2011 "C" 13
    2 2012 "A" 19
    2 2012 "B" 25
    2 2011 "B" 14
    2 2012 "C" 18
    2 2012 "D" 16
    end
    egen tag = tag(firm entity year)
    egen ndistinct = total(tag), by(entity year)
    
    sort entity year
    list, sepby(entity year)
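
Since the same count is needed for "value" and other variables, the two egen lines could be wrapped in a loop. This is only a minimal sketch, assuming the example data from this post; the variable names tag_* and nd_* are hypothetical and not part of the original data.

```stata
* Sketch: number of distinct values per entity-year, for several variables.
* Assumption: the example data from this post; tag_* and nd_* are made-up names.
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A"  8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B"  8
2 2010 "A"  7
2 2011 "A"  5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

foreach v of varlist firm value {
    * tag one observation per distinct (entity, year, `v') combination
    egen byte tag_`v' = tag(entity year `v')
    * sum the tags within entity-year to get the number of distinct values
    egen nd_`v' = total(tag_`v'), by(entity year)
    drop tag_`v'
}

sort entity year
list, sepby(entity year)
```

Further variables can be added to the varlist after "foreach v of varlist".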
    2. Question -> This is the table I would like to obtain
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte ndistinct freq
    1 3
    2 1
    3 1
    4 1
    end
    sort ndistinct freq
    list, sepby(ndistinct freq)
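
One possible way to get from the data to this table is to count each entity-year group once and then tabulate or contract. This is a sketch under the assumption that "freq" means the number of entity-year groups with a given "ndistinct"; the name grouptag is hypothetical.

```stata
* Sketch: frequency of each ndistinct value across entity-year groups.
* Assumption: the example data from this post; grouptag is a made-up name.
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A"  8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B"  8
2 2010 "A"  7
2 2011 "A"  5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

egen tag = tag(firm entity year)
egen ndistinct = total(tag), by(entity year)

* mark exactly one observation per entity-year group
egen grouptag = tag(entity year)

* frequency table on screen, counting each group once
tabulate ndistinct if grouptag

* the same frequencies as a dataset (this replaces the data in memory)
contract ndistinct if grouptag, freq(freq)
list
```

Wrapping the contract step in preserve and restore would keep the original data in memory.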
    3. Question?
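
For the third step, one option is to compute the summary for each variable with egen and then keep a single observation per group. This is a sketch, not a definitive solution: it assumes the most frequent firm is wanted, with ties resolved to the highest value (the maxmode option), and shows mean and sum for "value" as alternatives. The names firm_mode, value_mean and value_sum are hypothetical.

```stata
* Sketch: reduce the data to one observation per entity-year.
* Assumption: mode of firm (ties -> highest), mean and sum of value.
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A"  8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B"  8
2 2010 "A"  7
2 2011 "A"  5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

* most frequent firm within entity-year; with several modes, take the highest
egen firm_mode = mode(firm), by(entity year) maxmode

* alternative rules for a numeric variable
egen value_mean = mean(value), by(entity year)
egen value_sum = total(value), by(entity year)

* keep one observation per entity-year and only the summary variables
bysort entity year: keep if _n == 1
keep entity year firm_mode value_mean value_sum

* check that entity and year now uniquely identify observations
isid entity year
list, sepby(entity)
```

Replacing maxmode with minmode would pick the lowest value among tied modes instead.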
    Last edited by rezza cav; 08 Jul 2022, 07:54.

  • #2
    This is one of 6 threads in which this question was posted today. It is answered at https://www.statalist.org/forums/for...iven-condition.
