  • Assessing variation of different type of cars between 2 groups


    I am trying to assess the variation in the NUMBER of different types of cars used by high- vs low-income people. I do not have a threshold for what is acceptable and what isn't. For example, perhaps high-income people will use just 1 car, whereas low-income people will more commonly use 3 cars.

    Therefore, as a first step, for each person (id), I would like to calculate the total number of different cars used. I used the following, which gave me an error:

    Data set:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(type car_model id volvo fiat mercedes renault)
    1 1 12 1 0 0 0
    1 1 12 1 0 0 0
    1 2 17 0 1 0 0
    2 1 16 1 0 0 0
    2 2 19 0 1 0 0
    2 3 20 0 0 1 0
    2 4 21 0 0 0 1
    end
    label values type m
    label def m 1 "high income", modify
    label def m 2 "low income", modify
    label values car_model q
    label def q 1 "volvo", modify
    label def q 2 "fiat", modify
    label def q 3 "mercedes", modify
    label def q 4 "renault", modify


    Code:
    egen total_used = total(volvo fiat mercedes renault), by(id)
    Error:
    //egen total_used = count(volvo fiat mercedes renault), by(id)
    volvofiatmercedesrenault not found
    r(111);

    I would like to then do the following - any recommendations/advice welcome

    Code:
    tab type, sum(total_used)
    
    regress total_used i.type

  • #2
    The easy part is explaining the error. The -egen, count()- function does not add things up. It accepts a single expression (typically one variable), not a list of variables, and it returns the count of non-missing values of that expression (within id, given that you specified -by(id)-). So it is not at all what you want. You want to add some things up.
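
    To illustrate the point about arguments: -egen- functions such as total() and count() take one expression, so a list of indicator variables has to be combined with + before it is passed in. What follows is only a minimal sketch, assuming the four indicators are coded 0/1 with no missing values. The name total_used comes from the question, and the result counts car records per id, not distinct models.

    ```stata
    * Example data from the question (value labels omitted for brevity)
    clear
    input float(type car_model id volvo fiat mercedes renault)
    1 1 12 1 0 0 0
    1 1 12 1 0 0 0
    1 2 17 0 1 0 0
    2 1 16 1 0 0 0
    2 2 19 0 1 0 0
    2 3 20 0 0 1 0
    2 4 21 0 0 0 1
    end

    * total() takes ONE expression: add the indicators, do not list them
    egen total_used = total(volvo + fiat + mercedes + renault), by(id)

    * id 12 has two records, every other id has one
    list id car_model total_used, sepby(id)
    ```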

    But your explanation of what you want does not tell me clearly what you want to add up. For example, in your data, id 1 has two different observations, both of which have a Volvo. So what is the correct answer for id 1? Is it 2 (2 cars) or is it 1 (1 distinct model of car)? If the correct answer is 2, then I think the simplest way to get this is:
    Code:
    by id, sort: gen number_of_cars = _N
    If, however, you want the number of different types of car, it would be this:
    Code:
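    by id (car_model), sort: gen number_of_different_cars = sum(car_model != car_model[_n-1])
    by id (car_model): replace number_of_different_cars = number_of_different_cars[_N] // carry the final running count to every observation of the id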
    As an aside, I notice that you have separate indicator variables for the different models. Do you have a specific reason for using those variables? They are completely redundant with the information provided by the nice variable car_model. Most likely you can do everything you need with just the car_model variable.
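
    The count of different car types per id can also be obtained from car_model alone with -egen, tag()-, which marks one observation per distinct combination of its arguments. This is a sketch of that alternative, not the approach given above; first_of_model is a hypothetical helper name.

    ```stata
    * Example data from the question, keeping only the variables needed here
    clear
    input float(type car_model id)
    1 1 12
    1 1 12
    1 2 17
    2 1 16
    2 2 19
    2 3 20
    2 4 21
    end

    * tag() is 1 for exactly one observation per distinct (id, car_model) pair, 0 otherwise
    egen byte first_of_model = tag(id car_model)

    * summing the tags within id gives the number of distinct models for that person
    egen number_of_different_cars = total(first_of_model), by(id)

    list id car_model number_of_different_cars, sepby(id)
    ```

    In the example data every person uses a single model, so number_of_different_cars is 1 for all ids, including id 12, whose two records are both Volvos.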

    Added:
    I would like to then do the following - any recommendations/advice welcome

    Code:
    tab type, sum(total_used)

    regress total_used i.type
    This is going to produce incorrect results because you have multiple observations per id, so somebody like id 1 will get double-counted. For this, you need to reduce the data set to one observation per id.
    Code:
    by id, sort: keep if _n == 1
    tab type, sum(number_of_cars) // OR number_of_different_cars, WHICHEVER IS WHAT YOU NEED
    regress number_of_cars i.type // OR number_of_different_cars
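
    If dropping observations is undesirable, one possible variation is to flag a single observation per id and restrict the analysis with -if-. This is a sketch under that assumption; one_per_id is a hypothetical variable name.

    ```stata
    * Example data from the question, keeping only the variables needed here
    clear
    input float(type car_model id)
    1 1 12
    1 1 12
    1 2 17
    2 1 16
    2 2 19
    2 3 20
    2 4 21
    end

    * number of records (cars) per person, constant within id
    by id, sort: gen number_of_cars = _N

    * flag exactly one observation per id instead of deleting the others
    egen byte one_per_id = tag(id)

    * each person now contributes once to the summary and to the regression
    tab type if one_per_id, sum(number_of_cars)
    regress number_of_cars i.type if one_per_id
    ```

    The full data set stays in memory, which helps if later steps still need the record-level observations.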
    Last edited by Clyde Schechter; 15 Apr 2024, 11:46.
