I'm pleased to announce that fencode is now available from SSC. Please install it using ssc install fencode.
What it does:
fencode provides frequency-based encoding of string and numeric variables. Unlike encode, which assigns codes alphabetically, fencode assigns numeric codes ordered by frequency (descending or ascending). The command accepts both string variables and labeled numeric variables with non-sequential codes, standardizing them to sequential order.
This functionality is particularly valuable for creating tables and graphs where frequency ordering improves interpretability, and for regression analysis where the most frequent category serves as a more meaningful base category.
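To make the idea concrete, here is a rough sketch of frequency-based encoding done by hand with built-in commands only. It is not fencode's actual implementation. The names origin, n_obs, origin_code and origin_lbl are made up for illustration, and the sketch assumes there are no missing values.

```stata
// Illustration only: hand-rolled frequency encoding, not fencode's source code
sysuse auto, clear
decode foreign, generate(origin)              // build a string variable to encode

bysort origin: generate long n_obs = _N       // number of observations per category
gsort -n_obs origin                           // most frequent first; ties broken alphabetically
generate long origin_code = sum(origin != origin[_n-1])   // running count of category changes

// Attach the original strings as value labels (hypothetical label name origin_lbl)
levelsof origin_code, local(codes)
foreach c of local codes {
    levelsof origin if origin_code == `c', local(name) clean
    label define origin_lbl `c' "`name'", modify
}
label values origin_code origin_lbl
tabulate origin_code
```

The point of the sketch is only to show what "codes ordered by frequency" means: the largest category gets code 1, the next largest gets code 2, and so on.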
Why frequency-based encoding matters
Alphabetical ordering serves many purposes well, but frequency ordering often aligns better with several analytical goals. When creating a frequency table with Stata's excellent table command, the output appears in the order of the underlying numeric codes, with no built-in option to reorder by frequency. With fencode, categories automatically appear in frequency order, making patterns in the data immediately apparent.
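A small sketch of the table case, assuming the default variable name (region_fencode) and the default descending order described in this post:

```stata
sysuse citytemp, clear
fencode region              // assumed to create region_fencode with the most frequent region coded 1
table region_fencode        // rows follow the numeric codes, so they appear in frequency order
```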
The same principle applies to data visualization and regression analysis. Bar charts ordered by frequency reveal the distribution structure at a glance, highlighting which categories dominate the data. In regression models, using the most frequent category as the base often provides a more natural reference point for interpretation (coefficients then represent deviations from the most common case rather than from an arbitrary alphabetical baseline).
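For the regression case, a minimal sketch under the same assumptions (default name region_fencode, most frequent category coded 1). The outcome variable tempjan is chosen only for illustration. The second regression uses Stata's built-in ib(freq). operator on the original variable, for comparison.

```stata
sysuse citytemp, clear
fencode region

// i. takes the lowest code as the base, which here is assumed to be the most frequent region
regress tempjan i.region_fencode

// Built-in alternative: ask for the most frequent category as the base directly
regress tempjan ib(freq).region
```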
The command also provides a sensible default: if generate() is not specified, the new variable is named varname_fencode, so there is no need to type gen() every time.
Code:
ssc install fencode
Quick examples
Code:
// Basic usage - creates sex_fencode with most frequent category as 1
webuse hbp2, clear
fencode sex

// Clean up messy codes - converts 2,3,4 to clean 1,2,3
sysuse voter, clear
fencode candidat

// Frequency-ordered graph without manual sorting
sysuse citytemp, clear
fencode region
graph bar (count), over(region_fencode)
Full documentation is available through help fencode after installation. I hope others find this useful for their data preparation and visualization needs.
Best,
Kabira
P.S. The name fencode combines "f" for frequency with "encode" - a portmanteau that describes exactly what it does: frequency-based encoding. I do hope that people use it!