Dealing with census year in multilevel model

Jean-Sebastien Bournival

Join Date: Aug 2022

Posts: 16
#1

11 Sep 2022, 07:35

Hi,
I have a hard time figuring this out. I have population data gathered from 7 consecutive censuses. I'm planning to use a model like this:

Code:

melogit y census x1 x2 x3 || dyad:

where "census" are the census years and my second level consists of dyads (parent-child pairs).

Although they are consecutive, it seems counterintuitive to treat census years as continuous since there is an interval of 10 years between each one. When running the model as continuous, I obtain a coefficient, but I'm not sure how to interpret it. I read somewhere that if time is taken at the same moment for everyone, it should be treated as a fixed effect. When I use i.census, my first census is omitted (which is correct in this case) but the last one also because of collinearity. Not sure what that means in this case. Since I'm following individuals in dyads over time (repeated measures), I assume that censoring in the last census is the issue.

Finally, I was wondering if time should not be considered at a higher level. I tested something like this:

Code:

melogit y x1 x2 x3 || year: || dyad:

but I got the message that initial values were not feasible.

Not sure how to deal with that. Any hint would be appreciated.
Tags: census year, multilevel
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

11 Sep 2022, 09:48

Although they are consecutive, it seems counterintuitive to treat census years as continuous since there is an interval of 10 years between each one. When running the model as continuous, I obtain a coefficient, but I'm not sure how to interpret it.

It would be more fruitful to think of this not as a contrast between continuous and discrete time, but as a contrast between a linear relationship between the log odds of y and time, vs an arbitrary relationship. When you use census as a "continuous" variable, your model is asserting that the log odds of y changes by the exact same amount during each inter-censal period. Since you have provided no explanation of what y is, only you can say whether that is a plausible assertion or not. If, on the other hand, you introduce it as i.census, a "discrete" variable, you are allowing the effect of the census on y to change arbitrarily from one census to the next. This latter approach is always plausible, since it asserts nothing. But if there really is a linear relationship between census and the log odds of y, the results will show that only indirectly and not in a particularly convenient way. It is also less statistically efficient because the effect is spread out over 6 degrees of freedom instead of 1. So you need to think about this issue and perhaps consult the literature and experts in your discipline for advice.
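
One way to put the two specifications side by side is a likelihood-ratio test, since the linear model is nested in the indicator model. The following is only a sketch on simulated data, because the thread does not include the real dataset: the census years 1860-1920, the single covariate x1, and all parameter values are assumptions made for illustration.

```stata
* Simulated stand-in for the poster's data (all values are made up)
clear
set seed 2022
set obs 300
generate dyad = _n
* dyad-level random intercept
generate u = rnormal()
* 7 records per dyad, one per hypothetical census (1860-1920)
expand 7
bysort dyad: generate census = 1850 + 10*_n
generate x1 = rnormal()
generate y = runiform() < invlogit(-0.5 + 0.03*(census - 1890) + 0.5*x1 + u)

* Linear in the log odds: a single slope per census-year unit
melogit y c.census x1 || dyad:
estimates store linear

* Unrestricted: one indicator for each census after the first
melogit y i.census x1 || dyad:
estimates store indicators

* Likelihood-ratio test of the linear restriction
lrtest linear indicators
```

A small p-value from lrtest would suggest that the linear restriction fits noticeably worse than the unrestricted indicators. Whether linearity is substantively plausible remains a question for the discipline, not for the test alone.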

When I use i.census, my first census is omitted (which is correct in this case) but the last one also because of collinearity. Not sure what that means in this case. Since I'm following individuals in dyads over time (repeated measures), I assume that censoring in the last census is the issue.

I don't know what you mean by censoring in the last census, but the most common explanation for an additional time variable being omitted due to collinearity is that one of the other model variables itself defines a time period. So if, say, variable x1 is an indicator variable for before vs after the 2008 financial crisis, the inclusion of that variable induces an extra collinearity with census and leads to one more variable dropping. This kind of thing comes up frequently, and if I had to make a small bet, I would bet that this is what is going on here.
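
A sketch of how such a period-defining covariate shows up, again on simulated data. The variable post and the 1900 cutoff are hypothetical, and a plain logit is used instead of melogit only to keep the example short.

```stata
* Simulated data: 100 records for each of 7 hypothetical censuses
clear
set seed 2022
set obs 700
generate census = 1860 + 10*mod(_n, 7)
* hypothetical covariate that is fully determined by the census year
generate post = census >= 1900
generate y = runiform() < 0.4

* Stata notes that one term is omitted because of collinearity,
* in addition to the base category of i.census
logit y post i.census

* The cross-tabulation shows why: post is constant within each census
tabulate census post
```

Cross-tabulating each covariate against census in this way is a quick check for a variable that is constant within census years.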

Finally, I was wondering if time should not be considered at a higher level. I tested something like this:

Not a good idea. You have only 7 census periods, and an N of 7 for a higher level is seldom helpful. Basically, to estimate the variation attributable to census, this model uses a sample size of 7 from census-space. That's too small to provide useful estimates.

I read somewhere that if time is taken at the same moment for everyone, it should be treated as a fixed effect.

I don't know where you would have read that, but, assuming you are remembering it correctly, all I can say is "don't believe everything you read."
Jean-Sebastien Bournival

Join Date: Aug 2022

Posts: 16
#3

11 Sep 2022, 12:23

Thank you for your thorough answer Mr Schechter.

My outcome is a dichotomous variable on whether or not 2 individuals from a dyad live in the same area at a specific time (census year). In the literature (mostly in demography and population studies), models are often limited to one year (surveys) or use event history models in which time is treated as discrete. Using continuous time, would you suggest adding time squared to validate the relationship between time and log odds? Or an interaction between time and the socioeconomic context?

For the "censoring" issue, your comment is very helpful. As covariates I only use age, SES, sex, role (parent or child), head (being head of household), prop (the proportion of male children in the household) and context (the socioeconomic context -> fewer categories than enumeration areas of the census), so I don't see where the problem is. I ran this simple model and, still, the last census is omitted:

Code:

melogit y i.census

Thank you
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#4

11 Sep 2022, 12:31

Using continuous time, would you suggest adding time squared to validate the relationship between time and log odds? Or an interaction between time and the socioeconomic context?

Adding a squared term allows you to model certain types of curvilinear relationships between time and log odds. This is more flexible than just linear, but doesn't go all the way to the unrestrained freedom of i.time. The question is whether this type of curvilinear model is plausible for the relationship. These quadratic relationships are parabolic, and depending on whether the turning point of the parabola is inside or outside the range of the observed time period, you could use it to model a relationship that is an upright or inverted parabola, or a curving arc. It would not allow you to model a relationship that oscillates. Not being a demographer, I cannot advise you whether the freedom and constraints of a quadratic model are appropriate. I think you will need to consult an expert in your field, or perhaps a demographer who follows this thread will respond. The same applies to an interaction between time and socioeconomic context. These are, at this point, substantive questions that are out of my field.
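
A minimal sketch of the quadratic specification and its turning point, on simulated data. The time variable t (decades since the first census, 0 to 6) and all coefficients are assumptions for illustration; the simulated curve is built to peak at t = 4, inside the observed range.

```stata
* Simulated data: 300 hypothetical dyads observed at 7 censuses
clear
set seed 2022
set obs 300
generate dyad = _n
generate u = rnormal()
expand 7
* t = decades since the first census (0, 1, ..., 6)
bysort dyad: generate t = _n - 1
generate y = runiform() < invlogit(-1 + 0.8*t - 0.1*t^2 + u)

* c.t##c.t enters both the linear and the squared term
melogit y c.t##c.t || dyad:

* Estimated turning point of the parabola, with a delta-method CI
nlcom (turning_point: -_b[t] / (2*_b[c.t#c.t]))
```

If the estimated turning point falls well outside the observed range of t, the fitted curve is a one-directional arc over the data and not a full parabola.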

Concerning the dropping of an extra time period variable, the culprit here is age. Age and year go up in parallel, so there is going to be a linear relationship between age and year (census) within each dyad, and this leads to the omission of a year variable for collinearity. Do note that this is nothing to be concerned about. It simply reflects the fact that any information that would be conveyed by that last time period variable is, in fact, already conveyed by the age variable and is properly accounted for in the modeling.
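
The within-person identity behind this point can be sketched on simulated data. The variables id and birthyear and the year ranges are hypothetical; the sketch only shows that census minus age is fixed for each individual, not how melogit handles it.

```stata
* Simulated data: 200 hypothetical individuals followed over 7 censuses
clear
set seed 2022
set obs 200
generate id = _n
generate birthyear = 1820 + floor(40*runiform())
expand 7
bysort id: generate census = 1850 + 10*_n
generate age = census - birthyear

* Within each person, census - age equals the birth year and never changes,
* so the within-person standard deviation of the gap is 0
bysort id: egen sd_gap = sd(census - age)
summarize sd_gap
```

Because age increases by exactly 10 between consecutive censuses for every person, age changes in lockstep with census year within individuals.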
Jean-Sebastien Bournival

Join Date: Aug 2022

Posts: 16
#5

11 Sep 2022, 12:51

Thank you Mr Schechter, this is really helpful!