Dear Statalist Community,
I have a database of cars sold new in Spain for several years up to 2019. I have several models that are similar, but not completely identical.
I know that brands should be avoided on some forums (Stack Overflow, for example). For this reason, I am removing the car brand from my -dataex- below, for consistency with other forums.
I want to build "fuzzy" groups of the more or less identical models with Julian Reif's -strgroup-. This command is available from:
Code:
net install strgroup, from("https://raw.githubusercontent.com/reifjulian/strgroup/master") replace
or from SSC:
Code:
ssc install strgroup, replace
To give more context, here is a dataex:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str170 modelo str9 periodo str4 cc str6 cilind str3 gd str9 pkw str17 cvf str5 co2 str4 cv str6 valor
"V 60 T6 Momentum Aut. 306"          "-2013"     "1969" "4" "G" "225" "13.2"  "157" "306" "36800"
"V 60 T6 Momentum AWD Aut."          "-2013"     "2953" "6" "G" "224" "19.79" "237" "304" "41200"
"V 60 T6 R-Design Momentum Aut. 306" "-2013"     "1969" "4" "G" "225" "13.2"  "157" "306" "38700"
"V 60 T6 R-Design Momentum AWD Aut." "-2013"     "2953" "6" "G" "224" "19.79" "237" "304" "43200"
"V 60 T6 Summum Aut. 306"            "-2013"     "1969" "4" "G" "225" "13.2"  "157" "306" "39400"
"V 60 T6 Summum AWD Aut."            "-2013"     "2953" "6" "G" "224" "19.79" "237" "304" "42900"
"V 70 T6 Momentum AWD "              "2007-2009" "2953" "6" "G" "209" "19.79" "270" "284" "39400"
"V 70 T6 R-Design AWD"               "2007-2009" "2953" "4" "G" "209" "16.83" "270" "284" "43700"
"V 70 T6 Summum AWD "                "2007-2009" "2953" "6" "G" "209" "19.79" "270" "284" "42000"
"V 70 2.0 Aut."                      "1997-2000" "1984" "5" "G" "93"  "14.49" ""    "127" "17800"
"V 70 2.0"                           "1997-2000" "1984" "5" "G" "93"  "14.49" ""    "127" "16900"
"V 70 2.0D Kinetic"                  "2007-2013" "1997" "4" "D" "100" "13.31" "157" "136" "28000"
"V 70 2.0D Momentum"                 "2007-2013" "1997" "4" "D" "100" "13.31" "157" "136" "30100"
"V 70 2.0D Summum"                   "2007-2013" "1997" "4" "D" "100" "13.31" "157" "136" "32900"
"V 70 2.0F Kinetic"                  "2007-2009" "1999" "4" "M" "107" "13.32" "206" "146" "27300"
"V 70 2.0F Momentum"                 "2007-2009" "1999" "4" "M" "107" "13.32" "206" "146" "29400"
"V 70 2.0F Summum"                   "2007-2009" "1999" "4" "M" "107" "13.32" "206" "146" "32200"
"V 70 2.3 T5 Optima Aut"             "2000-2004" "2319" "5" "G" "184" "15.92" ""    "250" "31300"
"V 70 2.4 140 Aut"                   "2000-2004" "2435" "5" "G" "103" "16.39" ""    "140" "21800"
"V 70 2.4 140 Optima Aut"            "2000-2004" "2435" "5" "G" "103" "16.39" ""    "140" "23800"
"V 70 2.4 140 Optima"                "2000-2004" "2435" "5" "G" "103" "16.39" ""    "140" "22800"
end
For example, I'd like to group the "V 60 T6" models, the "V 70 2.4" models, the "V 70 2.0" models, and so on throughout my dataset, if possible, subject to two conditions:
- I'd like to group them according to their cubic capacity (variable -cc-),
- and their commercial period (variable -periodo-).
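To make the request concrete, here is a rough, untested sketch of the kind of grouping I have in mind. It assumes that "according to" means an exact match on -cc- and -periodo-, that a threshold of 0.25 is a sensible starting point (it will almost certainly need tuning), and that the new variable names -fuzzy_id- and -model_group- are free to use; all of these are my assumptions, not something taken from the -strgroup- documentation.

```stata
* Sketch only: threshold and variable names are assumptions to be tuned.

* Remove stray leading/trailing blanks such as the one in "V 70 T6 Momentum AWD "
replace modelo = strtrim(modelo)

* Fuzzy-match the model names; strings whose normalized edit distance
* falls within the threshold receive the same fuzzy_id
strgroup modelo, generate(fuzzy_id) threshold(0.25)

* Split each fuzzy group further so that members also share
* the same cubic capacity and the same commercial period
egen model_group = group(fuzzy_id cc periodo)

* Inspect the result before trusting it
sort model_group modelo
list model_group modelo cc periodo, sepby(model_group)
```

With this design the string matching is done once on the whole sample and the two conditions are imposed afterwards with -egen, group()-, so two similar names with different -cc- or -periodo- end up in different groups.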
The end goal is to calculate the average price for each group of models. The price of each model above is stored in -valor-. I then need to merge this dataset with another one that comes from completely different sources and uses different naming conventions, which will merit another post from me soon.
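As a minimal sketch of the averaging step, assuming a group identifier already exists (here hypothetically called -model_group-) and noting that -valor- is stored as a string in the dataex above, something along these lines is what I am aiming for. The names -valor_num- and -mean_valor- are also just placeholders.

```stata
* Sketch only: model_group, valor_num and mean_valor are hypothetical names.

* valor is str6 in the example data, so convert it to numeric first
destring valor, generate(valor_num)

* Average price within each group, kept alongside the original rows
bysort model_group: egen mean_valor = mean(valor_num)

* Alternatively, reduce the data to one row per group before merging:
* collapse (mean) mean_valor = valor_num, by(model_group)
```

Keeping the mean as an extra variable leaves the individual models available for checking, whereas -collapse- would give one observation per group, which may be more convenient for the later merge.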
Thank you in advance for your help.
Best regards,
Michael