  • How to create a new variable based on existing data

    I have a list of 100 companies with 10 years of data. Each row is a different year, e.g. row 1 is 2010 for company X, row 10 is 2020 for company X, and row 11 is 2010 for company Y. I am trying to create a new variable that numbers the companies from 1 to 100, so rows 1-10 will be 1, rows 11-20 will be 2, and so on. What code can I write to create this? Note: not all companies have 10 years of data.

  • #2
    In Stata, what spreadsheet users call a row is called an observation.

    So, each company has 10 observations, unless it doesn't. That lack of balance is workable, but it rules out defining the new variable by a rule in which the identifier goes up by 1 with each block of 10 observations.

    Code:
    egen id = group(company), label
    will map company names to numeric identifiers from 1 up. The companies will be sorted alphabetically, which for most purposes is fine. If you have a compelling reason to keep the existing order of companies, you should try

    Code:
    gen ID = sum(company != company[_n-1])
    which assumes that the observations for each company are already stored together, and test that it worked as you wish by

    Code:
    tab ID year
    isid ID year 
    For creating integer sequences in blocks, see the help for egen and its function seq(), although, as said, fixed-length blocks may not be quite right for what you want, given that not all companies have 10 observations.
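
    As an illustration only, here is a minimal sketch on made-up data that compares the three approaches mentioned above. The company names "A", "B", "C", the years, and the variable name block are assumptions for the example, not taken from the original question; company "C" is deliberately given fewer observations to mimic the lack of balance.

```stata
* Toy data (hypothetical): three companies, "C" has only 2 years.
clear
input str1 company int year
"C" 2010
"C" 2011
"A" 2010
"A" 2011
"A" 2012
"B" 2010
"B" 2011
"B" 2012
end

* Alphabetical identifiers: A = 1, B = 2, C = 3.
egen id = group(company), label

* Identifiers in the existing order: C = 1, A = 2, B = 3.
* This relies on each company's observations being contiguous.
gen long ID = sum(company != company[_n-1])

* Fixed blocks of 3 observations: 1 1 1 2 2 2 3 3.
* The first block mixes "C" and "A" because "C" is short.
egen block = seq(), block(3)

list, sepby(company)

* Passes only if ID and year jointly identify observations.
isid ID year
```

    In this sketch id and ID both give one value per company, differing only in order, while block cuts across company boundaries as soon as one company has fewer observations than the block length.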

    See also

    FAQ . . . . . . . . . . . . . . . . . . . . . . Creating group identifiers
    . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and W. Gould
    3/01 How do I create individual identifiers numbered
    from 1 upwards?
    https://www.stata.com/support/faqs/d...p-identifiers/
    Last edited by Nick Cox; 17 Aug 2023, 00:50.
