How to create a new variable based on existing data

Jane McGrath

Join Date: Aug 2023

Posts: 13
#1


16 Aug 2023, 23:59

I have a list of 100 companies with 10 years of data. Each row is a different year, e.g. row 1 is 2010 for company X, row 10 is 2019 for company X and row 11 is 2010 for company Y. I am trying to create a new variable that numbers each company from 1 to 100, so rows 1-10 will be 1, rows 11-20 will be 2, and so on. What code can I write to create this? Note: not all companies have 10 years of data.

Nick Cox

Join Date: Mar 2014

Posts: 35778
#2

17 Aug 2023, 00:45

In Stata, what spreadsheet users call a row is called an observation.

So each company has 10 observations, unless it doesn't. That lack of balance is workable, but it rules out any rule that bumps the new variable up by 1 every block of 10 observations.

Code:

egen id = group(company), label

will map company names to numeric identifiers numbered from 1 upwards. The identifiers follow the alphabetical order of company names, which for most purposes is fine. If you have a compelling reason to keep the existing order of companies, try

Code:

gen ID = sum(company != company[_n-1])

and check that it worked as you wish with

Code:

tab ID year
isid ID year
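As a minimal sketch of both approaches side by side, here is a hypothetical toy dataset (the company names and years are invented for illustration) in which the companies are not in alphabetical order and the panel is unbalanced. The running sum is created first, while the data are still in their original order.

```stata
* Hypothetical toy panel: companies deliberately out of alphabetical
* order, and unbalanced (Zeta has 3 years, Alpha has 2)
clear
input str5 company int year
"Zeta"  2010
"Zeta"  2011
"Zeta"  2012
"Alpha" 2010
"Alpha" 2011
end

* Existing order: the running sum goes up by 1 whenever company
* differs from the previous observation (company[0] is "" for _n == 1)
gen ID = sum(company != company[_n-1])

* Alphabetical order of company names, with value labels attached
egen id = group(company), label

* Checks: each identifier-year pair should occur once only
tab ID year
isid ID year
list company year id ID
```

Here `ID` keeps the order in which companies appear (Zeta is 1, Alpha is 2), while `id` follows alphabetical order (Alpha is 1, Zeta is 2). `isid ID year` stops with an error if any identifier-year pair is duplicated.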

For creating integer sequences in blocks, see the help for egen and its function seq(), although, as said, that may not be quite right for what you want.
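A small illustration of that caveat, again on hypothetical toy data, with a block size of 3 standing in for 10: company Alpha has only two years, so seq() with the block() option puts the first observation of Beta into Alpha's block.

```stata
* Hypothetical toy data: Alpha has 2 years, Beta has 3
clear
input str5 company int year
"Alpha" 2010
"Alpha" 2011
"Beta"  2010
"Beta"  2011
"Beta"  2012
end

* Integers 1, 1, 1, 2, 2: a new number every 3 observations,
* regardless of which company each observation belongs to
egen block = seq(), block(3)

* Flag observations that share a block number with the previous
* observation but belong to a different company
gen byte mixed = block == block[_n-1] & company != company[_n-1]
count if mixed
list company year block mixed
```

Any nonzero count means the fixed-size blocks have drifted out of step with the companies, which is why a company-based identifier is safer for unbalanced data.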

See also

FAQ: Creating group identifiers
N. J. Cox and W. Gould, 3/01
How do I create individual identifiers numbered from 1 upwards?
https://www.stata.com/support/faqs/d...p-identifiers/

Last edited by Nick Cox; 17 Aug 2023, 00:50.