Path Analysis Using sem – Dummy Coding for Categorical Independent Variables

Sarah Won

Join Date: Jun 2025

Posts: 9
#1


15 Jun 2025, 21:21

Hello everyone,

Thank you so much in advance for your time and support.

I am conducting a path analysis using the sem command in Stata. All of my mediators and outcome variables are continuous. The independent variables include continuous, dichotomous, and categorical variables.

As I understand it, the sem command does not support factor variable notation (e.g., i.varname), so I created dummy variables manually for the categorical variables with three or more categories. I would appreciate it if you could review my approach and let me know if it is correct.

Example: Race/Ethnicity Variable

Code:

tab racehisp_2015

                          racehisp_2015 |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                   0 Non-Hispanic White |        894       61.53       61.53
                   1 Non-Hispanic Black |        405       27.87       89.40
2 Others (AI/AN/Asian/NHPI/Other/Hispan |        154       10.60      100.00
----------------------------------------+-----------------------------------
                                  Total |      1,453      100.00

To set Non-Hispanic White (0) as the reference group, I created the following dummy variables:

Code:

gen black_dummy_2015 = (racehisp_2015 == 1)
gen others_dummy_2015 = (racehisp_2015 == 2)

These dummy variables are coded as 1 if the participant belongs to the specified group, and 0 otherwise.

Code:

tab black_dummy_2015

black_dummy |
      _2015 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,048       72.13       72.13
          1 |        405       27.87      100.00
------------+-----------------------------------
      Total |      1,453      100.00

Code:

tab others_dummy_2015

others_dumm |
     y_2015 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,299       89.40       89.40
          1 |        154       10.60      100.00
------------+-----------------------------------
      Total |      1,453      100.00
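One caveat with the (racehisp_2015 == 1) form: in Stata a logical expression evaluates to 0, not missing, when racehisp_2015 is missing, so respondents with missing race would silently be placed in the reference group. Here every tabulation totals 1,453, which suggests racehisp_2015 has no missing values, but a guarded version costs nothing. The sketch below is an illustration under stated assumptions: it first simulates a dataset with the same frequencies as the tabulation (skip those simulation lines with the real data), then builds the indicators with an if qualifier and cross-checks them with -assert-.

```stata
* Simulated data with the same frequencies as the tabulation above
* (assumption for illustration only; skip these lines with the real data)
clear
set obs 1453
generate racehisp_2015 = cond(_n <= 894, 0, cond(_n <= 1299, 1, 2))

* Indicators stay missing when racehisp_2015 is missing,
* instead of being coded 0 and lumped in with the reference group
generate black_dummy_2015  = (racehisp_2015 == 1) if !missing(racehisp_2015)
generate others_dummy_2015 = (racehisp_2015 == 2) if !missing(racehisp_2015)

* Cross-check: Non-Hispanic White (0) has both indicators equal to 0,
* and every other group has exactly one indicator equal to 1
tabulate racehisp_2015 black_dummy_2015
assert black_dummy_2015 == 0 & others_dummy_2015 == 0 if racehisp_2015 == 0
assert black_dummy_2015 + others_dummy_2015 == 1 if inlist(racehisp_2015, 1, 2)
```

If either -assert- fails, Stata stops with an error, which makes a coding slip hard to miss.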

Questions:

1. Does this approach correctly treat Non-Hispanic White as the reference group?
I understand that each dummy variable includes Non-Hispanic White and the remaining group(s) in the 0 category. Is this appropriate for creating dummy variables?

2. Should all dichotomous variables in the model be coded as 0 and 1?
For example, should “1” consistently indicate the presence of a characteristic or condition, and “0” its absence?

Thank you again for your time and support. I would greatly appreciate your feedback.
Felix Bittmann

Join Date: Aug 2018

Posts: 691
#2

16 Jun 2025, 00:02

1. Your coding is fine. Simply include black_dummy_2015 and others_dummy_2015 in your model, and Non-Hispanic White will be the reference group. You can also save time by letting tab create the dummies for you:

Code:

tab racehisp_2015, gen(race_dummies)

2. Stata does not care whether 1 means "presence" or "absence" of a group or characteristic. Of course, a consistent pattern makes the interpretation of the results easier for you and helps you avoid mistakes.
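With tab, gen(), Stata creates one indicator per observed category, numbered in ascending order of value: race_dummies1 for 0 (Non-Hispanic White), race_dummies2 for 1 (Non-Hispanic Black), and race_dummies3 for 2 (Others). To keep Non-Hispanic White as the reference, leave race_dummies1 out of the model. A minimal sketch on simulated data; dv and iv1 are hypothetical placeholder variables and the simulated relationship is arbitrary.

```stata
* Simulated data (assumption for illustration only)
clear
set obs 1453
set seed 2025
generate racehisp_2015 = cond(_n <= 894, 0, cond(_n <= 1299, 1, 2))
generate iv1 = rnormal()                                       // hypothetical covariate
generate dv  = 0.5*iv1 + 0.3*(racehisp_2015 == 1) + rnormal()  // hypothetical outcome

* One indicator per category, in ascending order of value:
* race_dummies1 = 0 (White), race_dummies2 = 1 (Black), race_dummies3 = 2 (Others)
tabulate racehisp_2015, generate(race_dummies)

* Omit race_dummies1 so that Non-Hispanic White is the reference group
sem (dv <- iv1 race_dummies2 race_dummies3)
```

The coefficients on race_dummies2 and race_dummies3 are then contrasts with Non-Hispanic White, just as with the hand-made black_dummy_2015 and others_dummy_2015.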

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Sarah Won

Join Date: Jun 2025

Posts: 9
#3

16 Jun 2025, 08:14

Felix Bittmann Thank you so much for your review and suggestions. It is really helpful.
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#4

16 Jun 2025, 08:40

Another way to automate the creation of indicator ("dummy") variables is with the -xi- command or the -xi:- prefix. Although -xi- is largely obsolete, it still has a role in those commands that, like -sem-, don't support factor-variable notation. Its syntax looks a great deal like factor-variable notation. In your situation you could write your -sem- command as:

Code:

xi: sem (dv <- iv1 iv2 i.racehisp_2015)

replacing the placeholders dv, iv1, and iv2 with the actual corresponding variables in your model. Note that although this approach resembles factor-variable notation typographically, it does not enable you to subsequently use -margins- correctly. See -help xi- for a full explanation of the -xi- command.
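A self-contained sketch of the -xi:- approach, on simulated data with the same race/ethnicity frequencies as in #1. The variables iv1, iv2, and dv and their simulated relationships are hypothetical placeholders. By default -xi- omits the lowest value of racehisp_2015 (0, Non-Hispanic White), so that group becomes the reference. Because -xi- chooses the names of the _I* indicators itself, the sketch lists them with -describe- rather than assuming particular names.

```stata
* Simulated data (assumption for illustration only)
clear
set obs 1453
set seed 2025
generate racehisp_2015 = cond(_n <= 894, 0, cond(_n <= 1299, 1, 2))
generate iv1 = rnormal()                 // hypothetical covariates
generate iv2 = rnormal()
generate dv  = 0.5*iv1 - 0.2*iv2 + 0.3*(racehisp_2015 == 1) + rnormal()

* xi creates _I* indicators for racehisp_2015, omitting the lowest value (0),
* and substitutes them for the i. term before running sem
* (the space before ")" simply keeps the i. term a separate token, as a precaution)
xi: sem (dv <- iv1 iv2 i.racehisp_2015 )

* The indicators created by xi remain in the dataset
describe _I*
```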
Sarah Won

Join Date: Jun 2025

Posts: 9
#5

16 Jun 2025, 10:14

Clyde Schechter Thank you very much for suggesting an alternative approach. This is very helpful!
Bruce Weaver

Join Date: May 2014

Posts: 1131
#6

16 Jun 2025, 15:49

Note too that -gsem- allows factor variables. See the Intro 3 section of the [SEM] manual for more info.
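A minimal -gsem- sketch of a path model with one presumed mediator, again on simulated data; med, iv1, and dv are hypothetical names and the simulated effects are arbitrary. Factor-variable notation goes directly into each equation, and the ib0. prefix makes 0 (Non-Hispanic White) the base category explicitly. For continuous outcomes, -gsem- defaults to the Gaussian family with the identity link, so each equation here is a linear regression.

```stata
* Simulated data (assumption for illustration only)
clear
set obs 1453
set seed 2025
generate racehisp_2015 = cond(_n <= 894, 0, cond(_n <= 1299, 1, 2))
generate iv1 = rnormal()                                        // hypothetical exogenous variable
generate med = 0.4*iv1 + 0.3*(racehisp_2015 == 1) + rnormal()   // hypothetical presumed mediator
generate dv  = 0.5*med + 0.2*iv1 + rnormal()                    // hypothetical outcome

* No hand-made dummies needed: ib0. sets 0 (White) as the base explicitly
* (0 would also be the default base, being the lowest value)
gsem (med <- iv1 ib0.racehisp_2015) (dv <- med iv1 ib0.racehisp_2015)
```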

PS: You mentioned "mediators" in #1. IMO, it is better to call them putative or presumed mediators. The article by Fiedler et al. (2018) explains why I say that. I think this article should be required reading for anyone doing mediation analysis. YMMV.

Fiedler, K., Harris, C., & Schott, M. (2018). Unwarranted inferences from statistical mediation tests–An analysis of articles published in 2015. Journal of Experimental Social Psychology, 75, 95-102.

Last edited by Bruce Weaver; 16 Jun 2025, 15:56. Reason: Added the PS.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Sarah Won

Join Date: Jun 2025

Posts: 9
#7

16 Jun 2025, 18:48

Bruce Weaver Thank you so much for your invaluable information and suggestions regarding the terminology. This has been truly helpful!
