  • A question regarding the use of control variables in the form of dummy variables in panel analysis

    Hi everyone, I'm Hadi. Right now I'm working on my undergrad thesis. I'm planning to observe the relationship of the regional fiscal capacity ratio and local government social spending with SDG 1 performance (proxied by the 'percentage of poor population'), partially, from 2020-2024. I'm also planning to use 'control variables' in the form of 'dummy variables', specifically a regional categorization or island differentiator showing where the local governments are located, because in Indonesia there really is a difference between local governments on Java and those outside of Java. I'm worried that if this is not controlled for, the panel analysis results might show a correlation/influence just because of the inherent nature of those islands. Java tends to be more advanced in terms of infrastructure, etc. Is that possible in Stata? Oh, and I'm a beginner, this is my first time using Stata. I've never dealt with panel analysis before, usually just regular regression. Thanks.

  • #2
    Hi Hadi,

    Dummy independent variables are fine in a regression or panel regression as long as they are 0/1 coded. If you have multiple categories or groups, or two groups with values other than 0 and 1, then you want to dummy encode. You can do that with the i. prefix, so i.varname will dummy encode the variable in a regression. For example:

    Code:
    reg depvar indvar i.dummyvar
    would dummy encode the last variable.
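
    To make that concrete for the Java / outside-Java case, here is a minimal sketch on simulated data. All variable names (island, java, fiscal, poverty) and the numbers used to generate the data are made up for illustration; they are not from Hadi's dataset.

```stata
* Simulated cross-section; every name and number here is hypothetical
clear
set seed 12345
set obs 200

* Hypothetical island codes 1-4 with value labels
generate byte island = ceil(4 * runiform())
label define islandlbl 1 "Java" 2 "Sumatra" 3 "Kalimantan" 4 "Sulawesi"
label values island islandlbl

* A hand-made 0/1 dummy: 1 if on Java, 0 otherwise
generate byte java = (island == 1)

* Made-up regressor and outcome
generate fiscal = rnormal()
generate poverty = 10 - 2*java - 0.5*fiscal + rnormal()

* The 0/1 dummy can go straight into the model
regress poverty fiscal java

* For a multi-category variable, i. creates one indicator per level
* and leaves out a base level (the lowest value by default)
regress poverty fiscal i.island

* ib#. picks a different base level, here level 2
regress poverty fiscal ib2.island
```

    With i.island, each island coefficient is a difference from the base island, so which level you leave out changes how the output reads, not the fit of the model.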

    You say you have a panel model. Are you using the xt commands, like xtset and xtreg? There are two basic flavors of panel models, the fixed effects model and the random effects model. Note that xtreg fits the random effects model by default; you ask for fixed effects with the fe option. The fixed effects model automatically controls for all time-invariant differences between your local governments, including which island they are on, assuming the size of the effect of your time-varying variables is the same in every region. So if you add a regional dummy that never changes over time to a fixed effects model, it is perfectly collinear with the fixed effects and its coefficient gets "absorbed": the model has already controlled for those differences, and Stata will report the variable as "omitted". That's not a Stata thing, that is a consequence of the math. The random effects model can estimate coefficients on time-invariant variables, but at a cost: it assumes the unit-specific effects are uncorrelated with your explanatory variables. If you really need to estimate the effect of one of those cross-sectional regional difference variables directly because it is a core part of your research question, you can do that with a random effects model. I'd talk to your instructor first though.
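
    Here is a rough sketch of what that looks like in practice, again on simulated data. The panel structure (50 units, years 2020-2024) and all variable names (id, year, java, fiscal, poverty) are assumptions made up for the example.

```stata
* Simulated panel; every name and number here is hypothetical
clear
set seed 2024
set obs 50
generate id = _n

* Time-invariant location dummy and an unobserved unit effect
generate byte java = (id <= 20)
generate u = rnormal()

* Five annual observations per unit: 2020-2024
expand 5
bysort id: generate year = 2019 + _n

* Made-up time-varying regressor and outcome
generate fiscal = rnormal() + 0.5*u
generate poverty = 10 - 2*java - 0.5*fiscal + u + rnormal()

* Declare the panel structure before using xt commands
xtset id year

* Fixed effects: java never changes within a unit, so it is omitted
xtreg poverty fiscal java, fe

* Random effects: java gets a coefficient
xtreg poverty fiscal java, re
```

    In the fe output you should see a note that java was omitted because of collinearity, while the re output reports a coefficient for it.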

    I notice you say you only have about five years of data (2020-2024). Panel models are usually for controlling the cross-sectional variation (differences between regions) and looking only at the time-series variation (differences over time). If your data is annual and you only have five years, you might not have enough variation in the time component to see an effect, so your key coefficient might come out not statistically significant. In that case, you might want a model that focuses on the cross-sectional associations. I might use a normal regression with standard errors clustered by region using the vce(cluster regionvar) option, but you should ask your instructor what they think might be best. If you were looking more at the cross-sectional variation, you'd definitely want to include those cross-sectional control variables.
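
    As a sketch of that last option, this is roughly what a pooled regression with clustered standard errors could look like. The data are simulated and the names (id, year, java, fiscal, poverty) are hypothetical; here id plays the role of regionvar.

```stata
* Simulated panel; every name and number here is hypothetical
clear
set seed 99
set obs 50
generate id = _n
generate byte java = (id <= 20)
generate u = rnormal()

* Five annual observations per unit: 2020-2024
expand 5
bysort id: generate year = 2019 + _n

generate fiscal = rnormal()
generate poverty = 10 - 2*java - 0.5*fiscal + u + rnormal()

* Pooled OLS with the location dummy as a control; standard errors
* are clustered on the unit because observations repeat over years
regress poverty fiscal java, vce(cluster id)
```

    Clustering does not change the coefficients, only the standard errors, which now allow the errors for the same local government to be correlated across years.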
    Last edited by Daniel Schaefer; 29 Apr 2026, 12:51.
