How does Stata decide if a variable is continuous or ordinal?

Cynthia Tedore

Join Date: Sep 2014

Posts: 7
#1

04 Sep 2014, 04:39

Does it do this automatically? If it is stored as a byte or int then can I assume it's considered ordinal?
Nick Cox

Join Date: Mar 2014

Posts: 35720
#2

04 Sep 2014, 04:48

I can't think of any context in which Stata ever decides whether a variable is continuous or ordinal. Stata doesn't even have the concept of a continuous variable anywhere.

Storage type is essentially no more than that: storage type. You can hold measured variables as byte if the values allow or binary variables as double if you wish. The latter would be silly, but Stata would not act differently.
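To illustrate the point about storage type, here is a minimal sketch using the auto dataset that ships with Stata. The variable name foreign_d is made up for the illustration. Both logit calls should report the same estimates, because only the storage type differs, not the values.

```stata
sysuse auto, clear
* foreign is a 0/1 variable stored as byte
describe foreign
* a copy of the same values, held (wastefully) as double
generate double foreign_d = foreign
describe foreign_d
* the two fits use identical values, so storage type plays no role
logit foreign mpg
logit foreign_d mpg
```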

You, the user, get to decide, or rather to indicate, that a variable is ordinal if you feed that variable as response or outcome to a command such as ologit.

You may need to give much more context and explain why you are asking this if this answer doesn't help.
Cynthia Tedore

Join Date: Sep 2014

Posts: 7
#3

04 Sep 2014, 04:58

Thanks for your response! I am running a meglm, and specifying the family as ordinal, so I feel confident that the response variable is being treated as an ordinal variable. I am less sure about my independent variables, most of which are continuous, but one of which is ordinal. An example of one of the models I am running:

meglm escalation focalminusopponent order || males:, family(ordinal) link(cloglog)

escalation = four classes of escalation that a male spider can exhibit during a contest with another male (ordinal)
focalminusopponent = size difference between male opponents (continuous)
order = whether this was an individual male's first, second, or third trial (ordinal)
males = identity of each male, identified as a string

How do I specify that the variable order should be treated as an ordinal variable?
Nick Cox

Join Date: Mar 2014

Posts: 35720
#4

04 Sep 2014, 05:04

Covariates in any model will be treated literally, meaning numerically, unless you specify otherwise. An ordinal covariate will often be entered as a set of indicator variables. Check out "factor variables".
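As a sketch of the difference, using the auto dataset that ships with Stata rather than the data from #3: rep78 is a five-level repair record, entered first literally and then as a set of indicators. The commented-out meglm line is an untested analogue for the model in #3 and assumes that dataset is in memory.

```stata
sysuse auto, clear
* rep78 entered literally: a single slope per one-unit change in rep78
regress price mpg rep78
* rep78 entered as a factor variable: one coefficient per level,
* each relative to the base (lowest) level
regress price mpg i.rep78
* Untested analogue for the model in #3 (assumes that dataset is loaded):
* meglm escalation focalminusopponent i.order || males:, family(ordinal) link(cloglog)
```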
Cynthia Tedore

Join Date: Sep 2014

Posts: 7
#5

04 Sep 2014, 06:19

Thanks for pointing me in the right direction. So let me make sure I'm understanding correctly what I've read:
1) Stata treats categorical and ordinal variables equivalently. (If so, is it always valid to do so?)
2) Adding "i." to the beginning of a variable name in the command window during model specification converts that variable to a categorical variable.
Thanks!!
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#6

04 Sep 2014, 06:37

Originally posted by Cynthia Tedore

1) Stata treats categorical and ordinal variables equivalently. (If so, is it always valid to do so?)

I think you are coming from the SPSS point of view, where you can select whether a variable is nominal, ordinal, interval or ratio. Broadly speaking, in Stata you will mostly be using string or numeric variables (help data_types). You may have a numeric variable with values 1, 1.1, 1.3, 1.4 and use it as a grouping variable, where 1 and 1.1 denote groups equivalent to "Group A" and "Group B".

In principle, you define how a variable should be treated when you specify a command. With bys varname : or , by(varname), varname is used in the first case to sort and group the data and in the second case to group the output (more or less). The content of varname doesn't matter: Stata can group by numbers, strings, or a combination of those with missing values. So, on a conceptual level, contrary to SPSS, you specify the character of a variable when performing a command. Naturally, if you attempt to use a string variable where a numeric variable is required, Stata will produce an error.

If I remember correctly, in SPSS the selection dialog boxes provide access to variables depending on type, so it's not possible to introduce a numeric variable where only strings are accepted. Personally, I find Stata's approach much more sensible, as you may wish to treat a given variable (1, 1.1, 1.2 for example) as nominal, interval or ratio depending on the context.
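A short sketch of the two grouping forms mentioned above, using the auto dataset that ships with Stata. The string variable origin is made up for the illustration, to show that grouping works the same way for strings.

```stata
sysuse auto, clear
* by prefix: sorts on foreign, then runs summarize once per group
bysort foreign: summarize price
* by() option: a single table whose rows are grouped by foreign
tabstat price, by(foreign) statistics(mean sd n)
* grouping on a string variable works the same way
generate str8 origin = cond(foreign == 1, "Foreign", "Domestic")
bysort origin: summarize price
```

Neither form says anything about how a variable enters a model as a covariate; they only split the data or the output into groups.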

Kind regards,
Konrad
Version: Stata/IC 13.1
Nick Cox

Join Date: Mar 2014

Posts: 35720
#7

04 Sep 2014, 08:48

I think Konrad has most of the answer.

With the major exception of anova, where there is ancient syntax supported as a matter of continuity, Stata doesn't really have a concept of categorical variable either. Rather, what you do is specify which variables are to be treated as a bunch of indicators, for the purposes of a model fit. Thus "convert" is too strong a word. i.varname has no effects beyond the running of the command, other than through saved results.
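A small check of this, sketched with the auto dataset that ships with Stata: describe reports the same storage type and labels for rep78 before and after a model that uses i.rep78, and a different base level can be requested per command with the ib#. prefix.

```stata
sysuse auto, clear
describe rep78
* rep78 is expanded into indicators for this command only
regress price i.rep78
* the variable in memory is unchanged
describe rep78
* same variable, now with level 3 as the base, again for this command only
regress price ib3.rep78
```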
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#8

04 Sep 2014, 09:46

Note that sometimes you want to treat nominal and ordinal predictors differently (you virtually always want to treat them differently when they are outcomes). In fact, I once wrote a Stata program called -cascade- (use -search- to find and download it) to make "cascading" dummy variables (comparing each level to the preceding level rather than to a reference level). The point is that, using Stata, you can deal with these variables in any way you want, which is good.
Cynthia Tedore

Join Date: Sep 2014

Posts: 7
#9

04 Sep 2014, 09:46

Thanks very much for your comments! I have to admit I am a bit stumped by the bys varname : command that Konrad suggested, though... However, it sounds like using i.varname will accomplish what I'm aiming for, right?
Richard Williams

Join Date: Apr 2014

Posts: 5008
#10

04 Sep 2014, 19:31

I'll just add a few things.

The method you are using makes assumptions about the measurement of the dependent variable. If you are using ologit, it treats the DV as ordinal; if you are using logit, it treats it as binary. It is usually up to you to get it right though; Stata isn't going to whine at you if you use a binary or ordinal or multinomial dependent variable with regress.

Likewise, Stata wouldn't whine at you if you entered a multinomial variable (e.g. religion) as an independent variable without the i. prefix. It would just treat it as continuous, even if that were nonsensical.
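For instance, with the auto dataset that ships with Stata (a sketch, not a recommendation for any of these models), all three commands below run without complaint. Which one is appropriate for the outcome is left entirely to the user.

```stata
sysuse auto, clear
* binary outcome in a linear regression: Stata fits it as asked
regress foreign mpg
* the same outcome treated as binary
logit foreign mpg
* a five-level outcome (repair record) treated as ordinal
ologit rep78 mpg
```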

When you have a categorical variable, you generally want to use the i.varname notation rather than create the dummy variables yourself. Besides saving a little work, this is very advantageous for post-estimation commands like margins. By using i. notation, Stata will know that, say, a person can't be both ages 20-29 and ages 30-39 simultaneously. You should use the i. notation even when the variable is already coded as a 0/1 dummy.
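A minimal sketch of why this matters for margins, using the auto dataset that ships with Stata. Because foreign enters as i.foreign, margins can report a predictive margin for each level and a discrete change between levels.

```stata
sysuse auto, clear
* foreign is already a 0/1 dummy, but is still entered with i.
regress price i.foreign mpg
* predictive margins at each level of foreign
margins foreign
* discrete change from the base level, not a derivative
margins, dydx(foreign)
```

If foreign were entered without i., the margins foreign call would fail, since margins only accepts variables that were specified as factor variables in the model.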

In short, it is really up to you to make sure ordinal variables are treated as ordinal, categorical variables are treated as categorical, etc. Stata isn't going to save you if, say, you try to treat a 5 category nominal variable as though it were continuous.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Richard Williams

Join Date: Apr 2014
Posts: 5008

#11

04 Sep 2014, 19:37

Originally posted by Rich Goldstein

Note that sometimes you want to treat nominal and ordinal predictors differently (you virtually always want to treat them differently when they are outcomes). In fact, I once wrote a Stata program called -cascade- (use -search- to find and download it) to make "cascading" dummy variables (comparing each level to the preceding level rather than to a reference level). The point is that, using Stata, you can deal with these variables in any way you want, which is good.

I have not used Rich's -cascade- program, but I bet you could now do the same thing or something similar with the -contrast- command. Operators available with contrast (from the help):

Code:

      r.                     differences from the reference (base) level; the default
      a.                     differences from the next level (adjacent contrasts)
      ar.                    differences from the previous level (reverse adjacent contrasts)

    As-balanced operators
      g.                     differences from the balanced grand mean
      h.                     differences from the balanced mean of subsequent levels (Helmert contrasts)
      j.                     differences from the balanced mean of previous levels (reverse Helmert contrasts)
      p.                     orthogonal polynomial in the level values
      q.                     orthogonal polynomial in the level sequence

    As-observed operators
      gw.                    differences from the observation-weighted grand mean
      hw.                    differences from the observation-weighted mean of subsequent levels
      jw.                    differences from the observation-weighted mean of previous levels
      pw.                    observation-weighted orthogonal polynomial in the level values
      qw.                    observation-weighted orthogonal polynomial in the level sequence
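
As a sketch of how these operators might be used, with the auto dataset that ships with Stata: fit the model with ordinary i. notation, then ask contrast for comparisons of each level with the previous one. This is offered as something similar to "cascading" comparisons, not as a claim about what -cascade- itself does.

```stata
sysuse auto, clear
regress price i.rep78
* each level compared with the previous level (reverse adjacent contrasts)
contrast ar.rep78
* each level compared with the base level, for comparison
contrast r.rep78
```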


Cynthia Tedore

Join Date: Sep 2014

Posts: 7
#12

05 Sep 2014, 02:15

Thanks very much for the feedback! I tried the cascade program and put in the two new variables it constructed for me as fixed effects. This seemed to give me exactly the same result as when I set the middle value as the base using "ib2." notation.
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#13

05 Sep 2014, 02:47

1. re: Rich W's post (#11) - yes, of course you can; the article (STB-6) discusses how to use -test- following the use of either regular dummies or cascading dummies to reproduce the other

2. re: Cynthia's comment in #12; yes, if there are only 3 categories then choosing the middle category for your reference level with regular dummies will give the same result as cascading dummies; choosing the middle category, as you did, means that the two dummies in the model are each compared to that - so one is "up" and the other is "down"
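The three-category case can be sketched with the auto dataset that ships with Stata. The collapsed variable rep3 and the dummies ge2 and ge3 are made up for the illustration, and the cascading dummies are built by hand here rather than with -cascade-. The coefficient on ge2 should equal minus the coefficient on 1.rep3 (one comparison points "up", the other "down"), the coefficient on ge3 should equal the coefficient on 3.rep3, and the overall fit should be the same.

```stata
sysuse auto, clear
* collapse the five-level repair record into three ordered categories
recode rep78 (1/3 = 1) (4 = 2) (5 = 3), generate(rep3)
* hand-built cascading dummies: each compares a level with the one before it
generate byte ge2 = rep3 >= 2 if !missing(rep3)
generate byte ge3 = rep3 >= 3 if !missing(rep3)
regress price ge2 ge3
* regular indicators with the middle category as the base
regress price ib2.rep3
```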
Gedeao Locks

Join Date: Aug 2018

Posts: 10
#14

14 Feb 2019, 09:33

Dear all,

Along the same lines: I have age only in years and I'm clustering my SE by age cell. Results change if I enter age as a factor variable (i.age) rather than just inserting it in the regression. Does not using the factor prefix (i.) mean I'm not telling Stata to treat age as discrete?

Thanks in advance.

Best,
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#15

14 Feb 2019, 09:39

Originally posted by Gedeao Locks

Dear all,

Along the same lines: I have age only in years and I'm clustering my SE by age cell. Results change if I enter age as a factor variable (i.age) rather than just inserting it in the regression. Does not using the factor prefix (i.) mean I'm not telling Stata to treat age as discrete?

Thanks in advance.

Best,

You are correct. If you don't precede the age variable with i., Stata will treat it as continuous.

If age was coded in raw years, then this is probably not too far wrong. You could ask Stata to fit a quadratic specification for age if you thought the effect wasn't linear, or you could ask for a more complex specification involving splines, or fractional polynomials, or whatever.

If age was coded as a group variable (e.g. in 10-year age groups), then that's probably not what you want to do.
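
The alternatives can be sketched with the nlsw88 dataset that ships with Stata, where age is recorded in whole years. This is only an illustration of the syntax, not a suggestion about which specification suits your data.

```stata
sysuse nlsw88, clear
* age entered literally: a single linear slope
regress wage age
* age entered as a factor variable: one indicator per observed year of age
regress wage i.age
* quadratic in age, written with factor-variable notation
regress wage c.age##c.age
```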

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.