Converting a Continuous Variable to an Ordinal Variable (For Large Numbers)

Bisola Hamzat

Join Date: Mar 2021

Posts: 2
#1

24 Apr 2021, 17:38

Hi, I have a continuous variable (median household income) from my census data that I am trying to convert into an ordinal variable (i.e. I'm trying to make median household income groups). I attempted to split up the variable (FSA_medincome_household) into 12 groups based on an incremental income of $10,000, using this code:

egen FSA_medincome_household_743ord = cut(FSA_medincome_household_743), at(10000(10000)120000) label

The problem is that every observation of the new variable gets coded as missing. I am not sure whether this has something to do with the large values of the variable or with its long storage type.

Can anyone help me figure out what I am doing wrong?

Thanks in advance!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

24 Apr 2021, 19:47

The following example based on the single command you have shown us works as expected.

Code:

. describe medinc

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------
medinc          long    %12.0g

. egen incord = cut(medinc), at(10000(10000)120000) label
(2 missing values generated)

. generate incordnl = incord
(2 missing values generated)

. list, clean

       medinc    incord   incordnl
  1.     5000         .          .
  2.    15000    10000-          0
  3.    25000    20000-          1
  4.    35000    30000-          2
  5.    45000    40000-          3
  6.    55000    50000-          4
  7.    65000    60000-          5
  8.    75000    70000-          6
  9.    85000    80000-          7
 10.    95000    90000-          8
 11.   105000   100000-          9
 12.   115000   110000-         10
 13.   125000         .          .

But perhaps the problem is in your data. Let me guess one very common cause. Your Census data came from a text file or an Excel worksheet or something similar, and for some reason the values were imported as a string variable. And then you used encode to convert the string variable to a numeric variable.

Was that a good guess?

If so, that's the source of your problem. The encode command is designed for assigning numerical codes to non-numeric strings like "France", "Germany", "United States". The output of help encode instructs us:

Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.

So if you were using encode where you should have used destring, you need to go back to your original data and correctly convert the strings to numbers.
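To illustrate the guess above, here is a minimal sketch on made-up data. The variable names medinc_str, medinc_bad, and medinc are hypothetical and are not from the original poster's dataset. It shows that encode numbers the distinct strings 1, 2, 3, ... in alphabetical order, so every resulting code falls below the first cutpoint of 10000, whereas destring recovers the actual incomes.

```stata
clear
input str8 medinc_str
"5000"
"15000"
"25000"
"125000"
end

* encode assigns integer codes in alphabetical order of the strings:
* "125000" = 1, "15000" = 2, "25000" = 3, "5000" = 4
encode medinc_str, generate(medinc_bad)

* nolabel shows the underlying codes instead of the value labels
list medinc_str medinc_bad, nolabel clean

* the largest code is 4, far below 10000, so no observation can fall
* in any of the intervals defined by at(10000(10000)120000)
summarize medinc_bad

* destring converts the digits themselves to numbers
destring medinc_str, generate(medinc)
egen incord = cut(medinc), at(10000(10000)120000) label
list medinc_str medinc_bad medinc incord, clean
```

If the codes in the nolabel listing are small consecutive integers rather than incomes, that would be consistent with encode having been used at some point.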

If that wasn't a good guess, then your problem really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output. It's particularly helpful to use the dataex command to provide sample data, as described in section 12 of the FAQ.
Nick Cox

Join Date: Mar 2014

Posts: 35760
#3

25 Apr 2021, 06:04

William Lisowski gives excellent advice and here's some more. As documented at (e.g.) https://www.stata-journal.com/articl...article=dm0095 and as evident also from the function definitions

Code:

gen bin = 10000 * floor(whatever / 10000)

and

Code:

gen bin = 10000 * ceil(whatever / 10000)

are direct ways to get bins of width 10000 with lower or upper bin limits as given by floor() or ceil() respectively.

This method has various direct advantages, including

1. Missing values are mapped to missing, as should usually be desired.

2. The bins are self-documenting, which is good for graphs and tables. That doesn't rule out fancier value labels if desired so long as the limits are integers.

3. There is less or even no need to fuss about what the overall range is, once you have decided on a bin width.

4. The floor and ceil[ing] definitions make it evident -- including to non-Stata users who might read your code -- what happens at bin limits.

Evidently this applies "whatever value of 10000 you use", to adapt a comment attributed to William Feller.
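As a small worked sketch of the floor() and ceil() approach, the following uses made-up values. The variable names medinc, bin_lower, and bin_upper and the local macro width are illustrative assumptions, not names from the thread. It includes a value exactly on a bin limit and a missing value, to show how both are handled.

```stata
clear
input long medinc
5000
15000
20000
25000
125000
.
end

* bin width kept in a local macro so it is stated only once
local width 10000

* lower limit of the bin containing each value
generate long bin_lower = `width' * floor(medinc / `width')

* upper limit of the bin containing each value
generate long bin_upper = `width' * ceil(medinc / `width')

list, clean
```

A value exactly on a limit, such as 20000, maps to 20000 under both definitions, while 15000 maps to 10000 with floor() and to 20000 with ceil(). The missing value stays missing in both new variables, and 125000 still gets a bin without any overall range having been specified.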