
  • Interpretation of Stata table

    Could someone please explain this simple Stata table? Many thanks in advance.

    table SALARY, con (mean OCCUPATION sd OCCUPATION)


    SALARY                     mean(OCCUPATION)   sd(OCCUPATION)
    Below 20,000                          2.563            0.899
    Between 20,000 & 60,000               1.954            0.464
    Above 60000                           1.789            0.535

  • #2
    Rohini:
    out of context, even trivial data are impossible to explain.
    You should refer to the original source (article; working paper; else) to get an idea of what is reported in the excerpt of table you shared.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I'm going to go a bit farther than Carlo and guess about your data based on the command you show.

      Your variable SALARY is a categorical variable that takes 3 values whose value labels are "Below 20,000", "Between 20,000 & 60,000", and "Above 60000".

      You have a variable OCCUPATION whose values are unknown to me. But very often occupation is coded as a categorical variable. For example, 1 means agricultural worker, 2 means service worker, 3 means manufacturing worker, ... .

      Your table command divides your dataset into three groups using the value of SALARY, and for the observations in each category of SALARY it calculates the mean and standard deviation of the value of OCCUPATION.

      In short, the table makes no sense to me. If OCCUPATION is indeed a categorical variable, it would have made more sense to do something like
      Code:
      tabulate SALARY OCCUPATION, row column
      More generally, though, the output of help table explains how to understand what the table command requested. Note that "con" is an abbreviation for "contents".
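To make concrete what a mean and standard deviation of category codes look like next to a cross-tabulation, here is a minimal sketch on Stata's bundled auto data. The dataset is only a stand-in for the poster's: `foreign` plays the role of SALARY and `rep78` the role of OCCUPATION. Using `tabstat` instead of `table` is an assumption of convenience; it reports the same by-group mean and standard deviation.

```stata
* Stand-in data: the poster's SALARY and OCCUPATION are not available here
sysuse auto, clear

* Mean and SD of the numeric codes of rep78 within each category of foreign;
* this averages category codes, which is what the table in post #1 did
tabstat rep78, by(foreign) statistics(mean sd)

* Cross-tabulation with row and column percentages, as suggested above
tabulate foreign rep78, row column
```

The first command returns numbers that depend entirely on how the categories happen to be coded, while the second shows how the observations are actually distributed across the cells.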



      • #4
        Thank you so much. Both SALARY and OCCUPATION are categorical variables. SALARY has 3 values, with value labels "Below 20,000", "Between 20,000 & 60,000", and "Above 60000". OCCUPATION has the value labels 1 Senior Manager, 2 Professionals, 3 Associate Professionals, 4 Others.



        • #5

          With tabulate SALARY OCCUPATION, row column, the result I got is just a frequency table.



          • #6
            The context of the analysis is that SALARY is the independent variable, and I am trying to understand the impact of occupation and technical education on salary.



            • #7
              Rohini:
              I'm not clear about your last post.
              From your description, you seem to have in mind an OLS regression where the dependent variable is -salary-, and -occupation- and -technical education- are the independent ones.
              Translated into Stata code, that would look like:
              Code:
              regress salary i.occupation i.technical_level education
              You might also want to interact the two predictors:

              Code:
              regress salary i.occupation##i.technical_level education
              That said, the main (and well-known) problem with this kind of regression is that it is plagued by endogeneity, because it does not explicitly take into account -personal_ability-, which is correlated with both -technical_education- (other things being equal, smarter people obtain, on average, better marks) and salary (other things being equal, smarter people obtain, on average, higher wages).
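For readers unfamiliar with factor-variable notation, the `##` operator in the second command is shorthand for both main effects plus their interaction. The sketch below illustrates this on Stata's bundled auto data; `price`, `foreign` and `rep78` are stand-ins chosen for illustration, not the poster's variables.

```stata
* Stand-in data: price, foreign and rep78 replace the poster's variables
sysuse auto, clear

* Factorial notation: main effects plus the interaction in one term
regress price i.foreign##i.rep78
scalar r2_short = e(r2)

* The same model written out term by term
regress price i.foreign i.rep78 i.foreign#i.rep78
scalar r2_long = e(r2)

display "R-squared, ## form:     " r2_short
display "R-squared, written out: " r2_long
```

Both commands fit the same model, so the two R-squared values coincide.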
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you so much sir. I have performed a multinomial logistic regression on these variables. The purpose of table SALARY, con (mean OCCUPATION sd OCCUPATION) was to get some prior statistics. I have a large sample of 688 participants, whose technical education, salary and occupation are known. The idea is to get the statistics initially and then go on to perform regression so that the relationship is totally understood.
                Last edited by Rohini Pillai; 24 Jan 2020, 22:53.



                • #9
                  Rohini:
                  thanks for clarifying.
                  As is often the case, I share William's take.
                  If all your variables are categorical, their means and standard deviations are meaningless, as you can see from the following toy example (which draws heavily on my old but lasting interest in tennis):
                  Code:
                  . set obs 10
                  number of observations (_N) was 0, now 10
                  
                  . g A=1 in 1/5
                  (5 missing values generated)
                  
                  . replace A=2 in 6/10
                  (5 real changes made)
                  
                  . label define A 1 "Federer supporters" 2 "Nadal supporters"
                  
                  . label val A A
                  
                  . sum
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                             A |         10         1.5    .5270463          1          2
                  
                  . tab A
                  
                                   A |      Freq.     Percent        Cum.
                  -------------------+-----------------------------------
                  Federer supporters |          5       50.00       50.00
                    Nadal supporters |          5       50.00      100.00
                  -------------------+-----------------------------------
                               Total |         10      100.00
                  
                  .
                  As you can see, -sum- produces totally uninformative results here.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Can you tell us exactly what your multinomial logistic regression command was?

                    Better yet, please copy the command and all the output from your Results window and paste it into a reply post, with the code delimiters [CODE] on the line before and [/CODE] on the line after, like the following example.

                    [CODE]
                    // sample code
                    sysuse auto, clear
                    describe
                    [/CODE]

                    so that the result will be presented in a readable font
                    Code:
                    // sample code
                    sysuse auto, clear
                    describe



                    • #11
                      Rohini:
                      another detail that would be interesting to share is the following: how could you perform inference on those variables and yet have problems with the descriptive statistics, if they relate to the same dataset?
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Thank you, sirs. Let me explain the context. I have two different data sets from two different periods. In both periods I have the same variables: salary, technical education and occupation. Salary is the independent variable while the other two are the dependent variables. The first data set contains 236 observations and the second data set has 680 observations.
                        Initially, a chi-square test is done to determine whether there is an association (or relationship) between the two categorical variables Salary and Technical Education. The test is repeated on both data sets to compare the results across the two periods. The results follow.
                        Chi-square test between variables Salary and Technical Education (cell counts by TECHEDU)

                        2011-12
                        SALARY                    Graduate   Diploma   PG Diploma   No Tech Qualification   Total
                        Below 20,000                    11        12           19                      45      87
                        Between 20,000 & 60000          40        14           36                      40     130
                        Above 60000                     10         1            4                       4      19
                        Total                           61        27           59                      89     236

                        2017-18
                        SALARY                    Graduate   Diploma   PG Diploma   No Tech Qualification   Total
                        Below 20,000                    66        35            8                     176     285
                        Between 20,000 & 60000         165        20           35                     147     367
                        Above 60000                     13         0            3                      12      28
                        Total                          244        55           46                     335     680

                        Null hypothesis H0: Salary does not depend on technical education (TECHEDU is not associated with salary)
                        Alternative hypothesis H1: Salary depends on technical education (TECHEDU is associated with salary)


                        Table: chi-square tests of association with SALARY
                                         2011-12                2017-18
                                         Chi-sq     P           Chi-sq     P
                        TECHEDU          21.9       0.001       60.76      0.000
                        OCCUPATION       47.18      0.000       34.44      0.000
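As a check on the 2011-12 TECHEDU result, the Pearson chi-square can be recomputed directly from the published cell counts with Stata's immediate command `tabi`. This is a sketch that assumes the counts in the cross-tabulation in this post are transcribed correctly; it needs no dataset in memory.

```stata
* Counts copied from the 2011-12 SALARY x TECHEDU cross-tabulation in this post.
* Rows: Below 20,000; Between 20,000 & 60000; Above 60000.
* Columns: Graduate; Diploma; PG Diploma; No Tech Qualification.
tabi 11 12 19 45 \ 40 14 36 40 \ 10 1 4 4, chi2

* The test statistic and p-value are left behind in r()
display "chi2 = " %6.2f r(chi2) "   p = " %5.3f r(p)
```

Run on these counts, the statistic should agree, up to rounding, with the 21.9 reported for TECHEDU in 2011-12. The same command with the 2017-18 counts would check the second period.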
                        Next a multinomial logistic regression is done for both data sets using the command below. Results summarised in table below.


                        mlogit SALARY ib4.TECHEDU i.OCCUPATION, base (1)

                        DEPENDENT VARIABLE: SALARY
                                                                    2011-12                                     2017-18
                        INDEPENDENT VARIABLES        BETWEEN 20000&60000           ABOVE 60000   BETWEEN 20000&60000           ABOVE 60000
                        TECHEDU                          Coef   Std. Err       Coef   Std. Err       Coef   Std. Err       Coef   Std. Err
                          Graduate                      1.000      0.438      2.225      0.751      0.991      0.192      1.178      0.465
                          Diploma                      -0.131      0.481      -0.51        1.2     -0.458      0.307    -14.096     676.17
                          PG Diploma                    0.437      0.391      0.579      0.795      1.629      0.415      1.791      0.759
                        OCCUPATION
                          Professionals                -0.845      0.594       -2.2     0.7998    -0.0938      0.270      -1.11      0.545
                          Associate Professionals      -1.827      0.723     -2.679      1.261     -0.228      0.288     -1.487      0.626
                          Others                       -4.351     1.1751    -16.059    612.736     -1.404      0.405  -15.62401    789.525
                        CONSTANT                        1.216      0.590     -0.151      0.804     0.0863      0.242     -1.596      0.436
                        No of Observations                                                 236                                         680
                        LR Chi2                                                          63.52                                       91.14
                        Log likelihood                                                -180.444                                    -517.925
                        Prob > chi2                                                      0.000                                       0.000
                        Following this, relative risk ratios are calculated for both periods using the two data sets. The results are summarised below.
                        DEPENDENT VARIABLE: SALARY
                                                                    2011-12                                     2017-18
                        INDEPENDENT VARIABLES        BETWEEN 20000&60000           ABOVE 60000   BETWEEN 20000&60000           ABOVE 60000
                        TECHEDU                           rrr   Std. Err        rrr   Std. Err        rrr   Std. Err        rrr   Std. Err
                          Graduate                      2.719      1.192      9.251      6.945      2.693      0.517      3.247      1.509
                          Diploma                       0.877      0.422      0.600      0.720      0.633      0.194      0.000      0.000
                          PG Diploma                    1.548      0.606      1.784      1.418      5.097      2.113      5.995      4.551
                        OCCUPATION
                          Professionals                 0.429      0.255      0.111      0.089      0.910      0.246      0.330      0.180
                          Associate Professionals       0.161      0.116      0.069      0.865      0.796      0.229      0.226      0.142
                          Others                        0.013      0.015   1.60E-07      0.000      0.246      0.099      0.000      0.000
                        CONSTANT                        3.374      1.992      0.860      0.690      1.090      0.264      0.202      0.088
                        No of Observations                                                 236                                         680
                        LR Chi2                                                          63.52                                       91.14
                        Log likelihood                                                -180.444                                    -517.925
                        Prob > chi2                                                      0.000                                       0.000
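The relative risk ratios carry the same information as the coefficients: each rrr is the exponentiated coefficient, and its delta-method standard error is the rrr multiplied by the coefficient's standard error. A short sketch using two entries from the 2011-12 Graduate row of the coefficient table:

```stata
* Graduate, 2011-12, outcome "Between 20000 & 60000": coef 1.000, SE 0.438
display "rrr = " %6.3f exp(1.000)
display "SE  = " %6.3f exp(1.000) * 0.438

* Graduate, 2011-12, outcome "Above 60000": coef 2.225, SE 0.751
display "rrr = " %6.3f exp(2.225)
display "SE  = " %6.3f exp(2.225) * 0.751
```

These agree with the reported rrr values up to rounding of the published coefficients. Adding the `rrr` option to `mlogit` displays the ratios directly, so no second estimation step is needed.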

                        The overall effect of technical education (TECHEDU) and of occupation in each period is tested using the test commands below. The results are summarised after the commands.

                        test 1.TECHEDU 2.TECHEDU 3.TECHEDU
                        test 1.OCCUPATION 2.OCCUPATION 3.OCCUPATION
                                         2011-12                 2017-18
                                         chi2      Pr > chi2     chi2      Pr > chi2
                        Techedu          13.48     0.036         46.18     0.000
                        Occupation       12.48     0.0135        6.47      0.167
                        Following this, the predicted probabilities are calculated for both variables using the margins command. This is done for both data sets to compare the results. The results are summarised in the tables below.
                        2011-12
                                                             BELOW 20000 BETWEEN 20000 & 60000           ABOVE 60000
                        TECHEDU                        Margin   Std. Err     Margin   Std. Err     Margin   Std. Err
                          Graduate                      0.239      0.968      0.687      2.780      0.074      3.740
                          Diploma                       0.513      0.308      0.477      0.288      0.010      0.559
                          PG Diploma                    0.370      0.454      0.607      0.739      0.022      1.184
                          No Techedu                    0.478      0.423      0.506      0.447      0.016      0.862
                        OCCUPATION
                          Senior Managers               0.134      0.065      0.643      0.098      0.223      0.895
                          Professionals                 0.308      0.037      0.635      0.039      0.057      0.020
                          Associate Professionals       0.530      0.112      0.409      0.110      0.061      0.059
                          Others                        0.942      0.057      0.058      0.057      0.000      0.000
                        No of Observations                236

                        2017-18
                                                             BELOW 20000 BETWEEN 20000 & 60000           ABOVE 60000
                        TECHEDU                        Margin   Std. Err     Margin   Std. Err     Margin   Std. Err
                          Graduate                      0.293      0.380      0.687      0.890      0.021      1.268
                          Diploma                       0.645      0.065      0.355      0.065      0.000      0.000
                          PG Diploma                    0.179      0.270      0.797      1.171      0.023      1.433
                          No Techedu                    0.528      0.381      0.460      0.333      0.011      0.712
                        OCCUPATION
                          Senior Managers               0.359      0.789      0.601      1.319      0.040      2.105
                          Professionals                 0.391      0.308      0.595      0.469      0.014      0.775
                          Associate Professionals       0.424      0.251      0.565      0.333      0.011      0.579
                          Others                        0.709      0.068      0.291      0.068      0.000      0.000
                        No of Observations                680
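For readers who want to see the shape of the margins step, here is a minimal sketch that rebuilds the 2011-12 SALARY by TECHEDU counts from this post as frequency-weighted cells. It is not the model reported above: OCCUPATION cannot be reconstructed from the published summaries, so it is left out, and the lowercase variable names `salary`, `techedu` and `n` are assumptions made for the illustration.

```stata
* 2011-12 SALARY x TECHEDU counts from this post, one row per cell
* (salary: 1 Below 20,000, 2 Between, 3 Above; techedu: 4 = no qualification)
clear
input salary techedu n
1 1 11
1 2 12
1 3 19
1 4 45
2 1 40
2 2 14
2 3 36
2 4 40
3 1 10
3 2 1
3 3 4
3 4 4
end

* TECHEDU-only model, reference category 4, base outcome 1
quietly mlogit salary ib4.techedu [fweight = n], baseoutcome(1)

* Predicted probability of each salary band at each level of techedu
margins techedu, predict(outcome(1))
margins techedu, predict(outcome(2))
margins techedu, predict(outcome(3))
```

Because this stripped-down model has a single categorical predictor, each margin simply reproduces an observed row proportion (for example 11/61 for Graduate in the lowest band), so the numbers will differ from the occupation-adjusted margins in the tables above.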

                        Finally, the marginsplot command is used to graph the above results for both periods. The graphs are not copied here.

                        So this is the sequence of operations I have done. Please let me know whether these steps make sense and whether I can get any meaningful interpretation from them.

                        Thank you so much for helping me out.



                        • #13
                          Rohini:
                          I find your post a bit puzzling.
                          1) you're still mistaking dependent for independent variables (your dependent variable is -salary-);
                          2) if you have two datasets (and provided that they do not relate to the same sample, otherwise you would have panel data), why not -append- them and run -mlogit- on the resulting dataset, including -i.year- as a predictor on the right-hand side of your equation?
                          Kind regards,
                          Carlo
                          (Stata 19.0)



                          • #14
                            Sir, thank you so much.

                            I am sorry, the statement "Salary is the independent variable while the other two are the dependent variables" was a typo. The dependent variable is salary and the independent variables are occupation and technical education.

                            Regarding appending the two data sets and performing mlogit on the new data set, I hadn't thought about it that way, since the data set is huge, with more than 140 variables. Would you be able to help me with the mlogit command for this?



                            • #15
                              Rohini:
                              I think you should -append- first, and then go -mlogit- on the resulting dataset.
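A minimal sketch of that sequence follows. It is self-contained because it rebuilds the two SALARY by TECHEDU tables from post #12 as frequency-weighted cells, so OCCUPATION is necessarily left out. The variable names `salary`, `techedu`, `n` and `year`, the tempfile name `wave1`, and the coding of the periods as 2011 and 2017 are all assumptions made for the illustration.

```stata
* First period: 2011-12 cell counts from post #12
clear
input salary techedu n
1 1 11
1 2 12
1 3 19
1 4 45
2 1 40
2 2 14
2 3 36
2 4 40
3 1 10
3 2 1
3 3 4
3 4 4
end
generate year = 2011
tempfile wave1
save "`wave1'"

* Second period: 2017-18 counts (the empty Diploma / Above 60000 cell is omitted)
clear
input salary techedu n
1 1 66
1 2 35
1 3 8
1 4 176
2 1 165
2 2 20
2 3 35
2 4 147
3 1 13
3 3 3
3 4 12
end
generate year = 2017

* Stack the 2011-12 rows under the 2017-18 rows
append using "`wave1'"

* Pooled model: salary is the outcome, year enters as a predictor
mlogit salary ib4.techedu i.year [fweight = n], baseoutcome(1)
```

With the real files, the same pattern would apply: load the first dataset, generate the period indicator, load and tag the second, -append-, and then add `i.OCCUPATION` to the right-hand side. Having more than 140 variables is not an obstacle, since -append- matches variables by name.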
                              Kind regards,
                              Carlo
                              (Stata 19.0)

