Doubts regarding interpretation of Multilevel Logistic Regression Models

Rui Agostinho


08 Aug 2024, 12:06

Hello. I have a dataset with information on workers and industries. Each observation is a worker-year, uniquely identified by worker_id and year.

What I want to understand is what percentage of the impact on change_to_eship (my dependent variable) is due to worker-level variables, and what percentage is due to industry-level variables. change_to_eship is a binary variable equal to 1 if a worker transitions to entrepreneurship and 0 otherwise.

At the worker level, I have the following independent variables: tenure, gender, rganho, schooling_1d, nemp, and vn.

The industry-level independent variables are: tenure_median, vn_per_employee_median, secondary_education, higher_education, rganho_median, nemp_median, num_firms, and gender_industry.

Year is common to both the industry and worker levels.

I want to understand what percentage of the effect in change_to_eship is due to: tenure gender rganho schooling_1d nemp vn job_level_1d, and what percentage of the effect is due to tenure_median vn_per_employee_median secondary_education higher_education rganho_median nemp_median num_firms gender_industry.

To achieve this, I estimated three multilevel logistic regression models. The first includes only worker-level variables, the second only industry-level variables, and the third both worker-level and industry-level variables.

Code:

*Model 1 - only worker-level variables
melogit change_to_eship tenure gender rganho i.schooling_1d nemp vn i.year ///
    || industry_id:, vce(cluster industry_id)
estat icc

*Model 2 - only industry-level variables
melogit change_to_eship tenure_median vn_per_employee_median secondary_education higher_education ///
    rganho_median nemp_median num_firms gender_industry i.year ///
    || industry_id:, vce(cluster industry_id)
estat icc

*Model 3 - both worker- and industry-level variables
melogit change_to_eship tenure gender rganho i.schooling_1d nemp vn job_level_1d tenure_median ///
    vn_per_employee_median secondary_education higher_education rganho_median nemp_median num_firms ///
    gender_industry i.year || industry_id:, vce(cluster industry_id)
estat icc
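In equation form, each of the three melogit calls fits a random-intercept logit. As a sketch, write y_ij for change_to_eship of worker-year observation i in industry j, and x_ij for whichever covariates (including the year dummies) a given model includes:

$$
\operatorname{logit}\Pr(y_{ij}=1 \mid \mathbf{x}_{ij}, u_j) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim N(0,\sigma_u^2)
$$

The three models differ only in what enters x_ij. u_j is the industry random intercept requested by "|| industry_id:", and sigma_u^2 is its variance.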

The ICC was 0.013664 for Model 1, 0.1589 for Model 2, and 0.0098919 for Model 3.
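For reference, estat icc after melogit computes the ICC on the latent-response scale. On that scale the worker-level (residual) variance of the logistic distribution is fixed at pi^2/3 (about 3.29). With sigma_u^2 denoting the industry random-intercept variance:

$$
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\quad\Longleftrightarrow\quad
\sigma_u^2 = \frac{\mathrm{ICC}}{1-\mathrm{ICC}}\cdot\frac{\pi^2}{3}
$$

Plugging in the reported ICCs gives implied industry variances of roughly 0.0456 (Model 1), 0.622 (Model 2), and 0.0329 (Model 3). These variances are what remains after conditioning on each model's covariates, so each ICC describes the residual between-industry variation under that particular model.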

From what I understand, the ICC can be interpreted as the proportion of the total variance in the probability of becoming an entrepreneur that is attributable to industry-level factors. How, then, can the ICC of Model 1, which includes no industry-level variables, be larger than that of Model 3, which includes both worker- and industry-level variables? Also, since 15.89% of the variance is due to differences between industries when only industry-level variables are included, is it correct to state that industry-level variables alone explain a significant portion of the variance in the likelihood of transitioning to entrepreneurship?

Thank you very much for any help!
Erik Ruzek


08 Aug 2024, 14:09

With multilevel data, you have to remember that variables measured at the lowest level of the data hierarchy (individual workers) provide information not only at the worker level but also at the industry level. Each industry contains a collection of workers, and accordingly, industries can differ in their aggregate worker characteristics. If you want to know what the ICC is when based solely on within-industry differences, then you need to disentangle the within-industry from the between-industry differences in worker characteristics. The following is how to do that:
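Written out for a worker variable x observed for worker-year i in industry j, the split is:

$$
x_{ij} = \bar{x}_j + \left(x_{ij} - \bar{x}_j\right),
\qquad
\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij},
\qquad
\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_j\right) = 0
$$

Here n_j is the number of non-missing worker-year observations in industry j, because the mean is taken over all rows sharing an industry_id. The industry mean corresponds to the imn_ variables and the deviation to the cwi_ variables. Since the deviation sums to zero within every industry, it carries only within-industry variation, and all between-industry differences in x sit in the industry mean.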

Code:

foreach v of varlist tenure gender rganho schooling_1d nemp vn {
    * create industry mean (imn) variables for each worker variable
    bysort industry_id: egen imn_`v' = mean(`v')
    * create worker variables that remove variance due to industry mean differences (centered within industry - cwi)
    gen cwi_`v' = `v' - imn_`v'
}
* re-run model 1 with the cwi worker variables
* note: schooling_1d is centered here as if continuous (the original model 1 used i.schooling_1d)
melogit change_to_eship cwi_tenure cwi_gender cwi_rganho cwi_schooling_1d cwi_nemp cwi_vn i.year ///
    || industry_id:, vce(cluster industry_id)
estat icc

This will tell you the ICC when accounting solely for within-industry differences due to the worker variables in your model. Compare the variance estimate for industry_id in this model to the one from your original model 1, and calculate the ICC. The coefficients from the cwi version of model 1 are interpreted as average within-industry differences in the log-odds of the outcome associated with a 1-unit change in the predictor.
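One way to summarize such a comparison, offered as a rough guide, is the proportional change in the industry variance (PCV). It compares a reference model A with a model B that adds or changes predictors, recovering each sigma_u^2 from its reported ICC via the logistic level-1 variance pi^2/3:

$$
\mathrm{PCV} = \frac{\sigma_{u,A}^2 - \sigma_{u,B}^2}{\sigma_{u,A}^2},
\qquad
\sigma_u^2 = \frac{\mathrm{ICC}}{1-\mathrm{ICC}}\cdot\frac{\pi^2}{3}
$$

As an illustration, take the ICCs reported in the original question. Going from Model 1 (ICC 0.013664, sigma_u^2 about 0.0456) to Model 3 (ICC 0.0098919, sigma_u^2 about 0.0329) gives PCV of about (0.0456 - 0.0329)/0.0456, or roughly 0.28. Two caveats apply. First, Model 3 also adds job_level_1d, so that drop is not due only to industry-level variables. Second, because the level-1 variance in a logit model is fixed at pi^2/3, adding predictors rescales both sigma_u^2 and the coefficients, so such comparisons are approximate.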