Interpretation of Stata table

Rohini Pillai

Join Date: Dec 2019

Posts: 20
#1

12 Jan 2020, 06:30

Could someone please explain this simple Stata table? Many thanks in advance.

table SALARY, con (mean OCCUPATION sd OCCUPATION)

SALARY                     mean(OCCUPATION)   sd(OCCUPATION)
Below 20,000               2.563              0.899
Between 20,000 & 60,000    1.954              0.464
Above 60000                1.789              0.535
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

12 Jan 2020, 08:14

Rohini:
out of context, even trivial data are impossible to explain.
You should refer to the original source (article, working paper, or other) to get an idea of what is reported in the excerpt of the table you shared.

Kind regards,
Carlo
(Stata 19.0)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

12 Jan 2020, 08:31

I'm going to go a bit farther than Carlo and guess about your data based on the command you show.

Your variable SALARY is a categorical variable that takes 3 values whose value labels are "Below 20,000", "Between 20,000 & 60,000", and "Above 60000".

You have a variable OCCUPATION whose values are unknown to me. But very often occupation is coded as a categorical variable. For example, 1 means agricultural worker, 2 means service worker, 3 means manufacturing worker, ... .

Your table command divides your dataset into three groups using the value of SALARY, and for the observations in each category of SALARY it calculates the mean and standard deviation of the value of OCCUPATION.

In short, the table makes no sense to me. If indeed OCCUPATION is a categorical variable, it would have made more sense to do something like

Code:

tabulate SALARY OCCUPATION, row column

More generally, though, the output of help table explains how to understand what the table command requested. Note that "con" is an abbreviation for "contents".
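To make the grouping concrete, here is a minimal sketch using the auto dataset that ships with Stata. That dataset is an assumption for illustration only (it is not the data from this thread), with rep78 standing in for a numerically coded categorical variable. Note that contents() belongs to the table syntax of Stata 16 and earlier; from Stata 17 on, table uses statistic() instead, and the older syntax runs only under version control.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Stata 17 or newer: mean and sd of rep78 within each category of foreign
table foreign, statistic(mean rep78) statistic(sd rep78)

* The same request in the pre-17 syntax used in this thread
version 16: table foreign, contents(mean rep78 sd rep78)

* For two categorical variables, a cross-tabulation is usually more informative
tabulate foreign rep78, row column
```

On Stata 16 or older, only the contents() form will run, and the version prefix is unnecessary.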
Rohini Pillai

Join Date: Dec 2019

Posts: 20
#4

14 Jan 2020, 07:08

Thank you so much. Both SALARY and OCCUPATION are categorical variables. SALARY has 3 values, whose value labels are "Below 20,000", "Between 20,000 & 60,000", and "Above 60000". OCCUPATION has the value labels 1 Senior Manager, 2 Professionals, 3 Associate Professionals, 4 Others.
Rohini Pillai

Join Date: Dec 2019

Posts: 20
#5

14 Jan 2020, 07:09

tabulate SALARY OCCUPATION, row column
The result I got is just a frequency table.
Rohini Pillai

Join Date: Dec 2019

Posts: 20
#6

14 Jan 2020, 07:12

The context of the analysis is that SALARY is the independent variable, and I am trying to understand the impact of occupation and technical education on salary.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

14 Jan 2020, 10:12

Rohini:
Your last post is not clear to me.
As per your description, you seem to have in mind to perform an OLS where the dependent variable is -salary-, whereas -occupation- and -technical education- are the independent ones.
Hence, translating your statistical project into Stata code, it would look like:

Code:

regress salary i.occupation i.technical_education

You might also want to interact the two predictors:

Code:

regress salary i.occupation##i.technical_education

That said, the main (and well-known) problem with this kind of regression is that it is plagued by endogeneity, as it does not explicitly take into account -personal_ability-, which is correlated with both -technical_education- (other things being equal, smarter people obtain, on average, better marks) and salary (other things being equal, smarter people obtain, on average, higher wages).

Kind regards,
Carlo
(Stata 19.0)
Rohini Pillai

Join Date: Dec 2019

Posts: 20
#8

24 Jan 2020, 22:32

Thank you so much sir. I have performed a multinomial logistic regression on these variables. The purpose of table SALARY, con (mean OCCUPATION sd OCCUPATION) was to get some prior statistics. I have a large sample of 688 participants, whose technical education, salary and occupation are known. The idea is to get the statistics initially and then go on to perform regression so that the relationship is totally understood.

Last edited by Rohini Pillai; 24 Jan 2020, 22:53.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#9

25 Jan 2020, 02:45

Rohini:
thanks for clarifying.
As is often the case, I share William's take.
If all your variables are categorical, the means and standard deviations of their numeric codes are meaningless, as you can see from the following toy example (which draws heavily on my long-standing interest in tennis):

Code:

. set obs 10
number of observations (_N) was 0, now 10

. g A=1 in 1/5
(5 missing values generated)

. replace A=2 in 6/10
(5 real changes made)

. label define A 1 "Federer supporters" 2 "Nadal supporters"

. label val A A

. sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           A |         10         1.5    .5270463          1          2

. tab A

                 A |      Freq.     Percent        Cum.
-------------------+-----------------------------------
Federer supporters |          5       50.00       50.00
  Nadal supporters |          5       50.00      100.00
-------------------+-----------------------------------
             Total |         10      100.00

.

As you can see, -sum- produces totally uninformative results here.
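A small extension of the same toy example makes the point sharper. This is only a sketch, and the variable B is a hypothetical addition: it holds the same two groups under a different, equally arbitrary coding. The summary statistics change with the coding while the frequencies do not.

```stata
clear
set obs 10

* Codes 1 and 2, as in the example above
generate A = cond(_n <= 5, 1, 2)

* Hypothetical recoding: the same two groups, coded 1 and 7 instead
generate B = cond(_n <= 5, 1, 7)

* The means differ (1.5 for A, 4 for B) although the groups are identical
summarize A B

* The frequency tables show the same 5/5 split for both codings
tabulate A
tabulate B
```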

Kind regards,
Carlo
(Stata 19.0)


William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

25 Jan 2020, 08:59

Can you tell us exactly what your multinomial logistic regression command was?

Better yet, please copy the command and all the output from your Results window and paste it into a reply post, with the code delimiters [CODE] on the line before and [/CODE] on the line after, as in the following example,

[CODE]
// sample code
sysuse auto, clear
describe
[/CODE]

so that the result will be presented in a readable font

Code:

// sample code
sysuse auto, clear
describe
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

25 Jan 2020, 10:52

Rohini:
another detail that would be interesting to share is the following: how could you perform inference on those variables and yet have problems with the descriptive statistics, if they relate to the same dataset?

Kind regards,
Carlo
(Stata 19.0)

Rohini Pillai

Join Date: Dec 2019
Posts: 20

#12

05 Feb 2020, 21:36

Thank you, sirs. Let me explain the context. I have two different data sets from two different periods. In both periods I have the same variables: salary, technical education and occupation. Salary is the independent variable while the other two are the dependent variables. The first data set contains 236 observations and the second data set has 680 observations.
Initially, a chi-square test is done to determine whether there is an association (or relationship) between the two categorical variables Salary and Technical Education. The test is repeated for both data sets to compare the results across the two periods. The results are as follows.
Chi Square Test between Variables Salary and Technical Education

2011-12
                          TECHEDU
SALARY                    Graduate  Diploma  PG Diploma  No Tech Qualification  Total
Below 20,000              11        12       19          45                     87
Between 20,000 & 60000    40        14       36          40                     130
Above 60000               10        1        4           4                      19
Total                     61        27       59          89                     236

2017-18
                          TECHEDU
SALARY                    Graduate  Diploma  PG Diploma  No Tech Qualification  Total
Below 20,000              66        35       8           176                    285
Between 20,000 & 60000    165       20       35          147                    367
Above 60000               13        0        3           12                     28
Total                     244       55       46          335                    680

Null hypothesis H₀: Salary does not depend on technical education (Techedu is not associated with salary)
Alternative hypothesis H₁: Salary depends on technical education (Techedu is associated with salary)

Table:

              SALARY
              2011-12               2017-18
              CHI SQ    P           CHI SQ    P
TECHEDU       21.9      0.001       60.76     0.000
OCCUPATION    47.18     0.000       34.44     0.000
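As a cross-check that needs no dataset in memory, the 2011-12 counts from the cross-tabulation above can be typed into Stata's immediate command tabi. This is a sketch of one way to verify the figure, not necessarily how the statistic was originally obtained; the Pearson chi-square should come out at about 21.9, in line with the TECHEDU entry for 2011-12. The expected option is added here so that cells with small expected counts are visible.

```stata
* 2011-12 counts; rows = SALARY (Below, Between, Above)
* columns = Graduate, Diploma, PG Diploma, No Tech Qualification
tabi 11 12 19 45 \ 40 14 36 40 \ 10 1 4 4, chi2 expected
```

With the data in memory, the equivalent would be tabulate SALARY TECHEDU, chi2.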

Next a multinomial logistic regression is done for both data sets using the command below. Results summarised in table below.

mlogit SALARY ib4.TECHEDU i.OCCUPATION, base (1)

DEPENDENT VARIABLE: SALARY, 2011-12
                            BETWEEN 20000&60000   ABOVE 60000
INDEPENDENT VARIABLES       Coef       Std. Err.  Coef       Std. Err.
TECHEDU
  Graduate                  1.000      0.438      2.225      0.751
  Diploma                   -0.131     0.481      -0.51      1.2
  PG Diploma                0.437      0.391      0.579      0.795
OCCUPATION
  Professionals             -0.845     0.594      -2.2       0.7998
  Associate Professionals   -1.827     0.723      -2.679     1.261
  Others                    -4.351     1.1751     -16.059    612.736
CONSTANT                    1.216      0.590      -0.151     0.804
No of Observations          236
LR Chi2                     63.52
Log likelihood              -180.444
Prob > chi2                 0.000

DEPENDENT VARIABLE: SALARY, 2017-18
                            BETWEEN 20000&60000   ABOVE 60000
INDEPENDENT VARIABLES       Coef       Std. Err.  Coef       Std. Err.
TECHEDU
  Graduate                  0.991      0.192      1.178      0.465
  Diploma                   -0.458     0.307      -14.096    676.17
  PG Diploma                1.629      0.415      1.791      0.759
OCCUPATION
  Professionals             -0.0938    0.270      -1.11      0.545
  Associate Professionals   -0.228     0.288      -1.487     0.626
  Others                    -1.404     0.405      -15.62401  789.525
CONSTANT                    0.0863     0.242      -1.596     0.436
No of Observations          680
LR Chi2                     91.14
Log likelihood              -517.925
Prob > chi2                 0.000

Following this, relative risk ratios are calculated for both periods using the two data sets. The results are summarised below.

DEPENDENT VARIABLE: SALARY, 2011-12
                            BETWEEN 20000&60000   ABOVE 60000
INDEPENDENT VARIABLES       rrr        Std. Err.  rrr        Std. Err.
TECHEDU
  Graduate                  2.719      1.192      9.251      6.945
  Diploma                   0.877      0.422      0.600      0.720
  PG Diploma                1.548      0.606      1.784      1.418
OCCUPATION
  Professionals             0.429      0.255      0.111      0.089
  Associate Professionals   0.161      0.116      0.069      0.865
  Others                    0.013      0.015      1.60E-07   0.000
CONSTANT                    3.374      1.992      0.860      0.690
No of Observations          236
LR Chi2                     63.52
Log likelihood              -180.444
Prob > chi2                 0.000

DEPENDENT VARIABLE: SALARY, 2017-18
                            BETWEEN 20000&60000   ABOVE 60000
INDEPENDENT VARIABLES       rrr        Std. Err.  rrr        Std. Err.
TECHEDU
  Graduate                  2.693      0.517      3.247      1.509
  Diploma                   0.633      0.194      0.000      0.000
  PG Diploma                5.097      2.113      5.995      4.551
OCCUPATION
  Professionals             0.910      0.246      0.330      0.180
  Associate Professionals   0.796      0.229      0.226      0.142
  Others                    0.246      0.099      0.000      0.000
CONSTANT                    1.090      0.264      0.202      0.088
No of Observations          680
LR Chi2                     91.14
Log likelihood              -517.925
Prob > chi2                 0.000
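For readers following along, a relative-risk ratio is the exponentiated mlogit coefficient, so the same fitted model can be redisplayed in that metric without refitting. The sketch below assumes the thread's dataset is in memory with SALARY, TECHEDU and OCCUPATION coded as described earlier; it shows one way to obtain such ratios, not necessarily the exact steps the poster used.

```stata
* Assumption: the thread's dataset is in memory, coded as described above
mlogit SALARY ib4.TECHEDU i.OCCUPATION, baseoutcome(1)

* Replay the same fit as relative-risk ratios, without re-estimating
mlogit, rrr

* Check by hand: exponentiate a reported coefficient, for example the
* 2011-12 Graduate coefficient of 1.000 in the Between 20000&60000 equation
display exp(1.000)
```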

The overall effect of technical education (techedu) and occupation in both periods is tested using the following test commands; the results are summarised below.

test 1.TECHEDU 2.TECHEDU 3.TECHEDU
test 1.OCCUPATION 2.OCCUPATION 3.OCCUPATION

              2011-12               2017-18
              chi2      Pr > chi2   chi2      Pr > chi2
Techedu       13.48     0.036       46.18     0.000
Occupation    12.48     0.0135      6.47      0.167
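One detail worth checking in these joint tests: with ib4.TECHEDU the estimated levels are 1, 2 and 3, but with i.OCCUPATION the base level is 1 (Senior Manager), so the estimated levels are 2, 3 and 4. The OCCUPATION command as written therefore seems to leave out level 4 (Others). The following sketch, which assumes the thread's dataset is in memory and coded as described, lists the non-base levels explicitly. After mlogit, test applies each term to every outcome equation.

```stata
* Assumption: the thread's dataset is in memory, coded as described above
mlogit SALARY ib4.TECHEDU i.OCCUPATION, baseoutcome(1)

* TECHEDU has base level 4, so its estimated levels are 1, 2 and 3
test 1.TECHEDU 2.TECHEDU 3.TECHEDU

* OCCUPATION has base level 1, so its estimated levels are 2, 3 and 4
test 2.OCCUPATION 3.OCCUPATION 4.OCCUPATION
```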

Following this, the predicted probabilities are calculated for both variables using the margins command. This is done for both data sets to compare the results. The results are summarised below.

2011-12
                            BELOW 20000           BETWEEN 20000 & 60000 ABOVE 60000
                            Margin     Std. Err.  Margin     Std. Err.  Margin     Std. Err.
TECHEDU
  Graduate                  0.239      0.968      0.687      2.780      0.074      3.740
  Diploma                   0.513      0.308      0.477      0.288      0.010      0.559
  PG Diploma                0.370      0.454      0.607      0.739      0.022      1.184
  No Techedu                0.478      0.423      0.506      0.447      0.016      0.862
OCCUPATION
  Senior Managers           0.134      0.065      0.643      0.098      0.223      0.895
  Professionals             0.308      0.037      0.635      0.039      0.057      0.020
  Associate Professionals   0.530      0.112      0.409      0.110      0.061      0.059
  Others                    0.942      0.057      0.058      0.057      0.000      0.000
No of Observations          236

2017-18
                            BELOW 20000           BETWEEN 20000 & 60000 ABOVE 60000
                            Margin     Std. Err.  Margin     Std. Err.  Margin     Std. Err.
TECHEDU
  Graduate                  0.293      0.380      0.687      0.890      0.021      1.268
  Diploma                   0.645      0.065      0.355      0.065      0.000      0.000
  PG Diploma                0.179      0.270      0.797      1.171      0.023      1.433
  No Techedu                0.528      0.381      0.460      0.333      0.011      0.712
OCCUPATION
  Senior Managers           0.359      0.789      0.601      1.319      0.040      2.105
  Professionals             0.391      0.308      0.595      0.469      0.014      0.775
  Associate Professionals   0.424      0.251      0.565      0.333      0.011      0.579
  Others                    0.709      0.068      0.291      0.068      0.000      0.000
No of Observations          680

Next, the marginsplot command is used to graph the above results for both periods. The graphs are not copied here.
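The margins and marginsplot commands themselves are not shown in the post. A plausible sketch, under the assumption that the mlogit model above has just been fitted and that outcomes 1, 2 and 3 correspond to Below 20,000, Between 20,000 & 60,000 and Above 60000, would request one outcome at a time:

```stata
* Assumption: the mlogit model from this thread is the active estimation result
* Predicted probability of outcome 1 (assumed: Below 20,000) by TECHEDU level
margins TECHEDU, predict(outcome(1))
marginsplot

* Predicted probability of outcome 3 (assumed: Above 60000) by OCCUPATION level
margins OCCUPATION, predict(outcome(3))
marginsplot
```

For any given level, the predicted probabilities across the three outcomes should sum to 1, which is a quick sanity check on a table like the one above.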

So this is the sequence of operations I have done. Please let me know whether these make sense and whether I can get any meaningful interpretation from them.

Thank you so much for helping me out.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

06 Feb 2020, 00:14

Rohini:
I find your post a bit puzzling.
1) you're still mistaking dependent for independent variables (your dependent variable is -salary-);
2) if you have two datasets (and provided that they do not relate to the same sample, otherwise you would have panel data), why not -append- them and run -mlogit- on the resulting dataset, including -i.year- as a predictor on the right-hand side of your equation?

Kind regards,
Carlo
(Stata 19.0)
Rohini Pillai

Join Date: Dec 2019

Posts: 20
#14

06 Feb 2020, 04:53

Sir, thank you so much.

I am sorry, the statement "Salary is the independent variable while the other two are the dependent variables" was a typo. The dependent variable is salary, and the independent variables are occupation and technical education.

Regarding appending the two datasets and performing mlogit on the new dataset, I hadn't thought about it that way, since the dataset is huge, with more than 140 variables. Would you be able to help me with the mlogit command for this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#15

06 Feb 2020, 06:16

Rohini:
I think you should -append- first, and then run -mlogit- on the resulting dataset.
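A minimal sketch of that two-step approach follows. The file names are hypothetical placeholders, the year values are arbitrary period markers, and the sketch assumes the three variables carry the same names and codings in both files. Loading only the needed variables sidesteps the concern about the datasets having more than 140 variables.

```stata
* Hypothetical file names; replace them with your own
use SALARY TECHEDU OCCUPATION using "survey_2011_12.dta", clear
generate year = 2011

* Bring in only the same three variables from the second period
append using "survey_2017_18.dta", keep(SALARY TECHEDU OCCUPATION)
replace year = 2017 if missing(year)

* Pooled multinomial logit with a period indicator
mlogit SALARY ib4.TECHEDU i.OCCUPATION i.year, baseoutcome(1)
```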

Kind regards,
Carlo
(Stata 19.0)