
  • New on SSC: fencode (Frequency-based encoding of variables)

    I'm pleased to announce that fencode is now available from SSC. Please install it using:

    Code:
    ssc install fencode
    What it does:

    fencode provides frequency-based encoding of string and numeric variables. Unlike encode, which assigns codes alphabetically, fencode assigns numeric codes ordered by frequency (descending or ascending). The command accepts both string variables and labeled numeric variables with non-sequential codes, recoding the latter to sequential codes.
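
    To make the idea concrete, here is a minimal sketch of descending-frequency coding done by hand with built-in commands only. It is not how fencode is implemented, just an illustration of the ordering it produces; the variable names n_region, neg_n, and region_byfreq are made up for this example, and the sketch does not attach value labels.

    ```stata
    // Hand-rolled descending-frequency codes (illustrative names, no value labels)
    sysuse citytemp, clear
    bysort region: generate long n_region = _N   // number of observations in each category
    generate long neg_n = -n_region              // negate so larger categories sort first
    egen region_byfreq = group(neg_n region)     // 1 = most frequent; ties broken by region code
    tabulate region_byfreq
    ```

    Sorting on the negated count is only one way to get descending order; dropping the negation would give ascending frequency instead.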

    This functionality is particularly valuable for creating tables and graphs where frequency ordering improves interpretability, and for regression analysis where the most frequent category serves as a more meaningful base category.

    Why frequency-based encoding matters

    Alphabetical ordering serves many purposes well, but frequency ordering often aligns better with several analytical goals. When creating a frequency table with Stata's excellent table command, the output appears in the order of the underlying numeric codes, with no built-in option to reorder by frequency. With fencode, categories automatically appear in frequency order, making patterns in the data immediately apparent.

    The same principle applies to data visualization and regression analysis. Bar charts ordered by frequency reveal the distribution structure at a glance, highlighting which categories dominate the data. In regression models, using the most frequent category as the base often provides a more natural reference point for interpretation (coefficients then represent deviations from the most common case rather than from an arbitrary alphabetical baseline).
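
    As a rough sketch of both points, assuming fencode is installed and, as in the examples in this post, creates region_fencode with the most frequent category coded 1:

    ```stata
    // Assumes fencode is installed and names the new variable region_fencode by default
    sysuse citytemp, clear
    fencode region
    table region_fencode               // rows follow the new numeric codes
    regress tempjan i.region_fencode   // default base is the lowest code, here the most frequent category
    // Built-in alternative for the regression only: ib(freq). sets the most frequent level as base
    regress tempjan ib(freq).region
    ```

    The ib(freq). operator changes only the base level in that one model; it does not reorder tables or graphs.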

    Quick examples

    Code:
    // Basic usage - creates sex_fencode with most frequent category as 1
    webuse hbp2, clear
    fencode sex
    
    // Clean up messy codes - converts 2,3,4 to clean 1,2,3
    sysuse voter, clear  
    fencode candidat
    
    // Frequency-ordered graph without manual sorting
    sysuse citytemp, clear
    fencode region
    graph bar (count), over(region_fencode)
    If generate() is not specified, the new variable is named varname_fencode by default, so there is no need to type gen() every time.

    Full documentation is available through help fencode after installation. I hope others find this useful for their data preparation and visualization needs.

    Best,
    Kabira

    P.S. The name fencode combines "f" for frequency with "encode" - a portmanteau that describes exactly what it does: frequency-based encoding. I do hope that people use it!

  • #2
    See also https://journals.sagepub.com/doi/pdf...6867X211045582 for another command that will do this (and other tasks in the same territory).



    • #3
      Thank you, Nick, for pointing me to your article. I should have known you had addressed this problem already!

      You’re absolutely right that myaxis handles this case. For example:

      Code:
      webuse hbp2, clear
      fencode sex
      myaxis sex_myaxis = sex, sort(count) descending
      gives the same result.

      myaxis thoughtfully tackles the broader challenge of reordering categorical variables by any criterion, while fencode is narrower, focusing only on the frequency-encoding scenario I kept encountering in my own work.

      The only added value I can see for fencode is that it mirrors the familiar encode syntax. If users need broader flexibility (e.g. sorting by means, medians, or other criteria), then myaxis is the more general solution. I particularly appreciate the flexibility myaxis offers through the use of any egen function (not just frequency).

      And thank you (as always) for your contributions to the Stata community.



      • #4
        Thanks much for #3. I agree whole-heartedly with your assessment. Your command is clear and well focused and should be easy to use.



        • #5
          Thanks to both of you (Kabira Namit and Nick) for an informative thread. I now have both packages installed. 👍
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 19.5 (Windows)
