Assessing variation of different type of cars between 2 groups

Rose Matthews

Join Date: Aug 2023

Posts: 148
#1

15 Apr 2024, 05:00

I am trying to assess the variation in the NUMBER of different types of cars used by high- vs low-income people. I do not have a threshold for what is acceptable and what isn't. For example, perhaps high-income people will use just 1 car, whereas low-income people will more commonly use 3 cars.

Therefore, as a first step, I would like to calculate, for each person (id), the total number of different cars used. I used the following, which gave me an error:

Data set:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(type car_model id volvo fiat mercedes renault)
1 1 12 1 0 0 0
1 1 12 1 0 0 0
1 2 17 0 1 0 0
2 1 16 1 0 0 0
2 2 19 0 1 0 0
2 3 20 0 0 1 0
2 4 21 0 0 0 1
end
label values type m
label def m 1 "high income", modify
label def m 2 "low income", modify
label values car_model q
label def q 1 "volvo", modify
label def q 2 "fiat", modify
label def q 3 "mercedes", modify
label def q 4 "renault", modify

Code:

egen total_used = total(volvo fiat mercedes renault), by(id)

Error:
//egen total_used = count(volvo fiat mercedes renault), by(id)
volvofiatmercedesrenault not found
r(111);

I would like to then do the following - any recommendations/advice welcome

Code:

tab type, sum(total_used)
regress total_used i.type
Clyde Schechter

Join Date: Apr 2014

Posts: 28624
#2

15 Apr 2024, 11:41

The easy part is explaining the error. The -egen, count()- function does not add things up. It allows only a single variable argument, not a list of variables, and it returns the count of non-missing values of that variable (within id, given that you specified -by(id)-). So it is not at all what you want. You want to add some things up.
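To illustrate the difference, here is a minimal sketch using the example data from #1. The names n_obs and cars_in_row are invented for illustration, and this is only one possible reading of "adding things up", not necessarily the calculation that is wanted.

```stata
* Sketch only: the example data from #1
clear
input float(type car_model id volvo fiat mercedes renault)
1 1 12 1 0 0 0
1 1 12 1 0 0 0
1 2 17 0 1 0 0
2 1 16 1 0 0 0
2 2 19 0 1 0 0
2 3 20 0 0 1 0
2 4 21 0 0 0 1
end

* count() takes one expression and counts its non-missing values within id
egen n_obs = count(car_model), by(id)

* rowtotal() is the -egen- function that accepts a variable list;
* it adds the listed variables within each observation
egen cars_in_row = rowtotal(volvo fiat mercedes renault)

* total() then sums a single expression over the observations of each id
egen total_used = total(cars_in_row), by(id)

list id car_model n_obs cars_in_row total_used, sepby(id)
```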

But your explanation of what you want does not tell me clearly what you want to add up. For example, in your data, id 12 has two different observations, both of which have a Volvo. So what is the correct answer for id 12? Is it 2 (2 cars) or is it 1 (1 distinct model of car)? If the correct answer is 2, then I think the simplest way to get this is:

Code:

by id, sort: gen number_of_cars = _N

If, however, you want the number of different types of car, it would be this:

Code:

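by id (car_model), sort: gen number_of_different_cars = sum(car_model != car_model[_n-1])
by id (car_model): replace number_of_different_cars = number_of_different_cars[_N] // carry the final count to every observation of the id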

As an aside, I notice that you have separate indicator variables for the different models. Do you have a specific reason for using those variables? They are completely redundant with the information provided by the nice variable car_model. Most likely you can do everything you need with just the car_model variable.
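As a sketch of how far car_model alone can go, the following counts distinct models per id without touching the indicator variables. It is an alternative route using -egen, tag()-, offered under the assumption that "different cars" means distinct values of car_model; the names first_of_model and n_distinct_models are invented for illustration.

```stata
* Sketch only: just the type, car_model and id columns from the example in #1
clear
input float(type car_model id)
1 1 12
1 1 12
1 2 17
2 1 16
2 2 19
2 3 20
2 4 21
end

* Flag exactly one observation for each distinct (id, car_model) pair
egen byte first_of_model = tag(id car_model)

* Sum the flags within id: the number of distinct models per person
egen n_distinct_models = total(first_of_model), by(id)

list id car_model n_distinct_models, sepby(id)
```

In the example data every person ends up with 1, because the only person observed twice has the same model both times.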

Added:

I would like to then do the following - any recommendations/advice welcome

Code:
tab type, sum(total_used)

regress total_used i.type

This is going to produce incorrect results because you have multiple observations per id, so somebody like id 12 will get double-counted. For this, you need to reduce the data set to one observation per id.

Code:

by id, sort: keep if _n == 1
tab type, sum(number_of_cars) // OR number_of_different_cars, WHICHEVER IS WHAT YOU NEED
regress number_of_cars i.type // OR number_of_different_cars
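If dropping observations is undesirable, a possible variation is to flag one observation per id and restrict the analysis with an -if- qualifier. This is a sketch under that assumption, not a change to the approach above; the name one_per_id is invented for illustration.

```stata
* Sketch only: just the type, car_model and id columns from the example in #1
clear
input float(type car_model id)
1 1 12
1 1 12
1 2 17
2 1 16
2 2 19
2 3 20
2 4 21
end
label define m 1 "high income" 2 "low income"
label values type m

* Number of observations (cars) per person, as in the code above
by id, sort: gen number_of_cars = _N

* Flag one observation per id instead of dropping the others
egen byte one_per_id = tag(id)

* Each person now contributes exactly once to the table and the regression
tabulate type if one_per_id, summarize(number_of_cars)
regress number_of_cars i.type if one_per_id
```

The full data set stays in memory, which may be convenient if later steps still need the car-level observations.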

Last edited by Clyde Schechter; 15 Apr 2024, 11:46.