Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Numerical variable treated as string

    Dear Users, I need your help for something that I guess would be easy for you, but really puzzling to me.

    I need to generate a var (NewVar) that contains for each category of an already existent variable (VarA) the mean values of another variable (VarB).
    What I did is the following:

    bys VarA: egen NewVar =mean((VarB)& !mi(VarB))
    collapse NewVar, by( VarA)

    This seems to work,and I obtain a dataset collapsed with two variables: NewVar (type: float) e VarA (type: int).

    The problem comes when I do:

    sort NewVar
    edit

    I do this because I want to create a new variable that ranks VarA by values of NewVar (1 to the category of VarA with the lowest value on NewVar, etc etc), and so I need to select the edit result and paste it on excel (to avoid me tyiping hundreds of times the code of varA and the correspondent position in the ranking).

    When I past the selection on facebook, the name of VarA appears as its 'label' (they are professions with associated each an Isco value), and not as correspondent numerical Isco, so I can not proceed with creating the ranking variable.

    Is it something related to the format type of the variable? I had a look to the faqby Cox, but this not seems the case: the command -destring- never works (Stata says that the variable is already numerical), and also forcing it to change type by -recast- does not change anything.
    Could you advise me something? Thanks in advance!

    Here some info on the variables
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int isco08h float isco08h_edu
      0  .25263157
     11   .5140244
     21          0
     31   .0391791
    111   .6024479
    112   .3796181
    121  .37253875
    122   .5175528
    131   .2173913
    132   .4659853
    133   .3128981
    134   .4913536
    141  .17859188
    142  .18693693
    143  .21981004
    211    .735843
    212   .7504409
    213   .7078636
    214   .6882623
    215   .4916793
    216   .3910538
    221   .8398227
    222   .7427094
    223   .3235294
    225   .5972222
    226  .57970506
    231   .7800233
    232   .4326591
    233   .7711642
    234   .7585078
    235   .6598737
    241   .6072804
    242  .47872865
    243   .3547469
    251  .54953396
    252    .295091
    261   .7798126
    262   .7251799
    263   .7427453
    264   .3988573
    265   .5344768
    311   .3306946
    312     .15362
    313   .1823318
    314   .7781418
    315  .52060896
    321   .4920245
    322   .9804015
    323   .1734694
    324   .1799629
    325    .608609
    331   .3630693
    332   .2635719
    333   .2845085
    334  .31903395
    335  .53774136
    341   .4953917
    342  .27296668
    343   .3530816
    351   .3044389
    352   .3272096
    411  .15577087
    412   .1975213
    413  .25170067
    421   .3708681
    422   .2588185
    431  .25247103
    432  .27215442
    441  .29308143
    511  .25473145
    512  .13747428
    513  .27876723
    514  .12759966
    515  .17792097
    516   .2395429
    521   .2736954
    522   .2588691
    523  .24619457
    524   .2107879
    531  .12883943
    532  .20065747
    541  .29246205
    611  .28790832
    612  .27000618
    613  .28796017
    621  .28708947
    622    .311973
    711  .19623983
    712    .194553
    713  .23169906
    721  .20255394
    722    .215664
    723   .2038805
    731   .2725832
    732   .3102977
    741 .066343375
    742  .12451716
    751  .29407528
    752   .2710541
    753  .27651244
    end
    label values isco08h isco3d_VL
    label def isco3d_VL 0 "ARMED FORCES OCCUPATIONS", modify
    label def isco3d_VL 11 "Commissioned armed forces officers", modify
    label def isco3d_VL 21 "Non-commissioned armed forces officers", modify
    label def isco3d_VL 31 "Armed forces occupations, other ranks", modify
    label def isco3d_VL 111 "Legislators and senior officials", modify
    label def isco3d_VL 112 "Managing directors and chief executives", modify
    label def isco3d_VL 121 "Business services and administration managers", modify
    label def isco3d_VL 122 "Sales, marketing and development managers", modify
    label def isco3d_VL 131 "Production managers in agriculture, forestry and fisheries", modify
    label def isco3d_VL 132 "Manufacturing, mining, construction, and distribution managers", modify
    label def isco3d_VL 133 "Information and communications technology service managers", modify
    label def isco3d_VL 134 "Professional services managers", modify
    label def isco3d_VL 141 "Hotel and restaurant managers", modify
    label def isco3d_VL 142 "Retail and wholesale trade managers", modify
    label def isco3d_VL 143 "Other services managers", modify
    label def isco3d_VL 211 "Physical and earth science professionals", modify
    label def isco3d_VL 212 "Mathematicians, actuaries and statisticians", modify
    label def isco3d_VL 213 "Life science professionals", modify
    label def isco3d_VL 214 "Engineering professionals (excluding electrotechnology)", modify
    label def isco3d_VL 215 "Electrotechnology engineers", modify
    label def isco3d_VL 216 "Architects, planners, surveyors and designers", modify
    label def isco3d_VL 221 "Medical doctors", modify
    label def isco3d_VL 222 "Nursing and midwifery professionals", modify
    label def isco3d_VL 223 "Traditional and complementary medicine professionals", modify
    label def isco3d_VL 225 "Veterinarians", modify
    label def isco3d_VL 226 "Other health professionals", modify
    label def isco3d_VL 231 "University and higher education teachers", modify
    label def isco3d_VL 232 "Vocational education teachers", modify
    label def isco3d_VL 233 "Secondary education teachers", modify
    label def isco3d_VL 234 "Primary school and early childhood teachers", modify
    label def isco3d_VL 235 "Other teaching professionals", modify
    label def isco3d_VL 241 "Finance professionals", modify
    label def isco3d_VL 242 "Administration professionals", modify
    label def isco3d_VL 243 "Sales, marketing and public relations professionals", modify
    label def isco3d_VL 251 "Software and applications developers and analysts", modify
    label def isco3d_VL 252 "Database and network professionals", modify
    label def isco3d_VL 261 "Legal professionals", modify
    label def isco3d_VL 262 "Librarians, archivists and curators", modify
    label def isco3d_VL 263 "Social and religious professionals", modify
    label def isco3d_VL 264 "Authors, journalists and linguists", modify
    label def isco3d_VL 265 "Creative and performing artists", modify
    label def isco3d_VL 311 "Physical and engineering science technicians", modify
    label def isco3d_VL 312 "Mining, manufacturing and construction supervisors", modify
    label def isco3d_VL 313 "Process control technicians", modify
    label def isco3d_VL 314 "Life science technicians and related associate professionals", modify
    label def isco3d_VL 315 "Ship and aircraft controllers and technicians", modify
    label def isco3d_VL 321 "Medical and pharmaceutical technicians", modify
    label def isco3d_VL 322 "Nursing and midwifery associate professionals", modify
    label def isco3d_VL 323 "Traditional and complementary medicine associate professionals", modify
    label def isco3d_VL 324 "Veterinary technicians and assistants", modify
    label def isco3d_VL 325 "Other health associate professionals", modify
    label def isco3d_VL 331 "Financial and mathematical associate professionals", modify
    label def isco3d_VL 332 "Sales and purchasing agents and brokers", modify
    label def isco3d_VL 333 "Business services agents", modify
    label def isco3d_VL 334 "Administrative and specialised secretaries", modify
    label def isco3d_VL 335 "Regulatory government associate professionals", modify
    label def isco3d_VL 341 "Legal, social and religious associate professionals", modify
    label def isco3d_VL 342 "Sports and fitness workers", modify
    label def isco3d_VL 343 "Artistic, cultural and culinary associate professionals", modify
    label def isco3d_VL 351 "Information and communications technology operations and user support technicians", modify
    label def isco3d_VL 352 "Telecommunications and broadcasting technicians", modify
    label def isco3d_VL 411 "General office clerks", modify
    label def isco3d_VL 412 "Secretaries (general)", modify
    label def isco3d_VL 413 "Keyboard operators", modify
    label def isco3d_VL 421 "Tellers, money collectors and related clerks", modify
    label def isco3d_VL 422 "Client information workers", modify
    label def isco3d_VL 431 "Numerical clerks", modify
    label def isco3d_VL 432 "Material-recording and transport clerks", modify
    label def isco3d_VL 441 "Other clerical support workers", modify
    label def isco3d_VL 511 "Travel attendants, conductors and guides", modify
    label def isco3d_VL 512 "Cooks", modify
    label def isco3d_VL 513 "Waiters and bartenders", modify
    label def isco3d_VL 514 "Hairdressers, beauticians and related workers", modify
    label def isco3d_VL 515 "Building and housekeeping supervisors", modify
    label def isco3d_VL 516 "Other personal services workers", modify
    label def isco3d_VL 521 "Street and market salespersons", modify
    label def isco3d_VL 522 "Shop salespersons", modify
    label def isco3d_VL 523 "Cashiers and ticket clerks", modify
    label def isco3d_VL 524 "Other sales workers", modify
    label def isco3d_VL 531 "Child care workers and teachers' aides", modify
    label def isco3d_VL 532 "Personal care workers in health services", modify
    label def isco3d_VL 541 "Protective services workers", modify
    label def isco3d_VL 611 "Market gardeners and crop growers", modify
    label def isco3d_VL 612 "Animal producers", modify
    label def isco3d_VL 613 "Mixed crop and animal producers", modify
    label def isco3d_VL 621 "Forestry and related workers", modify
    label def isco3d_VL 622 "Fishery workers, hunters and trappers", modify
    label def isco3d_VL 711 "Building frame and related trades workers", modify
    label def isco3d_VL 712 "Building finishers and related trades workers", modify
    label def isco3d_VL 713 "Painters, building structure cleaners and related trades workers", modify
    label def isco3d_VL 721 "Sheet and structural metal workers, moulders and welders, and related workers", modify
    label def isco3d_VL 722 "Blacksmiths, toolmakers and related trades workers", modify
    label def isco3d_VL 723 "Machinery mechanics and repairers", modify
    label def isco3d_VL 731 "Handicraft workers", modify
    label def isco3d_VL 732 "Printing trades workers", modify
    label def isco3d_VL 741 "Electrical equipment installers and repairers", modify
    label def isco3d_VL 742 "Electronics and telecommunications installers and repairers", modify
    label def isco3d_VL 751 "Food processing and related trades workers", modify
    label def isco3d_VL 752 "Wood treaters, cabinet-makers and related trades workers", modify
    label def isco3d_VL 753 "Garment and related trades workers", modify

  • #2
    Are you copying and pasting from the data editor? That is a very bad idea. What is presented in the data-editor is necessarily a simplification of your data. You have run into one aspect, which is that it by default displays the value labels instead of the underlying numeric code. But there are other issues, having to do with precision and reproducibility of your research. So, don't do that! Instead, you can export your dataset directly to Excel format. For more type in Stata help export.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Dear Marteen,
      thanks for your answer.

      Actually I do not get why paste-copying from data editor it is a bad idea, and what does it mean that
      What is presented in the data-editor is necessarily a simplification of your data
      In this case, I have two variables, one with name of jobs (and Isco code associated to them, that it is the value I would like to obtain) and the other with a numerical value.
      When i copy-paste, I don't think I am loosing any relevant information to me. Could you better explain what you mean?

      Also, I tried the command export, but this does not solve the problem. It basically creates an excel file looking exactly at the data-editor, with the job variable on one column (with its labels, and not numerical values as I wish) and the mean value of NewVar on the other.

      What do you suggest to show the Isco codes instead of their labels?

      Thanks a lot, G.

      Comment


      • #4
        Your problem is that by default Stata exports value labels instead of the values. To solve that just add the nolabel option to export excel.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Dear Maarten,
          thanks, this worked perfectly., many thanks.

          Anyway, I would like to know something more on your advise about not copy-pasting from data-editor, it what cases could this be harmful?

          Thanks a lot again, best, G.

          Comment


          • #6
            Note that this may not work in the way you really want:

            Code:
            bys VarA: egen NewVar =mean((VarB)& !mi(VarB))
            The egen context is not crucial. What is the mean of the expression here?

            Code:
            . clear 
            
            . set obs 10 
            number of observations (_N) was 0, now 10
            
            . gen y = cond(_n <= 7, _n, .) 
            (3 missing values generated)
            
            . gen y2 = y & !missing(y) 
            
            . list 
            
                 +--------+
                 | y   y2 |
                 |--------|
              1. | 1    1 |
              2. | 2    1 |
              3. | 3    1 |
              4. | 4    1 |
              5. | 5    1 |
                 |--------|
              6. | 6    1 |
              7. | 7    1 |
              8. | .    0 |
              9. | .    0 |
             10. | .    0 |
                 +--------+
            
            . su y2 
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                      y2 |         10          .7    .4830459          0          1
            It is the number of non-missing values. Is that what you want? Note that the egen function count() does this for you directly.



            Comment


            • #7
              Actually I do not get why paste-copying from data editor it is a bad idea,
              for the most part because it is not [automatically] reproducible and hence opens up for human errors (selected not everything or too much, pasted in the wrong place, copied but forgot to paste, etc). Stata provides a lot of tools for automatic reports creation and output to Excel/Word and other document types.

              So, avoid manual copy-paste as much as possible both on input and output operations.

              Comment


              • #8
                Dear all,
                thank you for your feedback.

                The egen context is not crucial. What is the mean of the expression here?
                Well, actually this command seems working to me. I obtain the variable that I want (a new variable that has for each category of VarA the mean value of VarB).

                What it is still strange is the second step. Indeed, I have to create this new variable for two waves of my dataset, ranking it and calculating the Spearman coefficient.
                Thus, I did:

                Code:
                preserve
                drop if refyear != 1998
                bys VarA : egen NewVar=mean((VarB)& !mi(VarB))
                collapse NewVar, by( VarA)
                sort NewVar
                save s98, replace
                restore
                
                preserve
                drop if refyear != 2013
                bys VarA: egen NewVar=mean((VarB)& !mi(VarB))
                collapse NewVar, by( VarA)
                sort NewVar
                save s13, replace
                restore
                
                use "s98.dta", clear
                merge 1:1 VarA using s13
                spearman NewVar_98 NewVar_13
                The problem is that the Spearman is totally out-of-range, and this is due to the fact that in the dataset s98 NewVar has a lot of high value, as following:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int isco08h float isco08h_edu_98
                 .c .8474885
                921 .9230769
                211 .9230769
                343 .9565217
                933 .9642857
                112 .9642857
                621 .9666666
                962  .968254
                932 .9722222
                751 .9772727
                end
                label values isco08h isco3d_VL
                label def isco3d_VL 112 "Managing directors and chief executives", modify
                label def isco3d_VL 211 "Physical and earth science professionals", modify
                label def isco3d_VL 343 "Artistic, cultural and culinary associate professionals", modify
                label def isco3d_VL 621 "Forestry and related workers", modify
                label def isco3d_VL 751 "Food processing and related trades workers", modify
                label def isco3d_VL 921 "Agricultural, forestry and fishery labourers", modify
                label def isco3d_VL 932 "Manufacturing labourers", modify
                label def isco3d_VL 933 "Transport and storage labourers", modify
                label def isco3d_VL 962 "Other elementary workers", modify
                label def isco3d_VL .c "Not available", modify
                This is fictious, because by coding
                Code:
                mean VarB if refyear == 1998, over (VarA)
                the values for each category of VarA are totally different (the correct ones).
                For 2013 this does not seems to happen.
                Do you have any clue on why?

                Thanks a lot, G.

                Comment


                • #9
                  My post in #6 needs correction. The mean returned is the fraction of non-missing values. It is not, in general, the mean of the variable specified.

                  Hence I am not surprised that later results based on that calculation make no sense to you.

                  Comment


                  • #10
                    To echo Nick Cox's excellent point, it seems that your -egen- code is not doing what you want it to do. Your -egen- code is taking the mean of a bunch of 0's and 1's because your expression "(VarB) & !mi(VarB)" will evaluate to either 0 or 1, not the value of VarB if VarB is not missing.

                    With -egen mean-, you do not need to indicate not missing as the function automatically excludes missing values from the mean calculation. Indeed, if you meant to say "find the mean of VarB for each VarA group, but leave out any missing values of VarB," the code should look something more like the following:
                    Code:
                    bys VarA: egen NewVar=mean(VarB) if !mi(VarB)
                    But the above code is no different from just saying [EDIT: See Nick's post #11, directly below. This statement by me is incorrect.]
                    Code:
                    bys VarA: egen NewVar=mean(VarB)
                    I'm also wondering why you don't skip the whole -egen- line and just go straight to -collapse-? You could accomplish your task (including the drop line) by using
                    Code:
                    collapse VarB if refyear==1998, by(VarA)
                    rename VarB NewVar
                    Last edited by Roger Chu; 24 Feb 2017, 11:27.

                    Comment


                    • #11
                      This thread seems to contain several quite different questions, as the data in #1 has nothing to do with (e.g.) the question in #8. Nor does the code in #8 relate to the data in #8.

                      What a good question contains is ideally all of

                      1. A clear and concise statement of the goal.

                      2. A self-contained data example.

                      3. Code so far written that relates to the data example.

                      Just looking at this code

                      Code:
                      preserve
                      drop if refyear != 1998
                      bys VarA : egen NewVar=mean((VarB)& !mi(VarB))
                      collapse NewVar, by( VarA)
                      sort NewVar
                      save s98, replace
                      restore  
                      
                      preserve
                      drop if refyear != 2013
                      bys VarA: egen NewVar=mean((VarB)& !mi(VarB))
                      collapse NewVar, by( VarA)
                      sort NewVar
                      save s13, replace
                      restore  
                      
                      use "s98.dta", clear
                      merge 1:1 VarA using s13
                      spearman NewVar_98 NewVar_13
                      the intent seems to be to achieve what this would do (building on Roger's key suggestion that collapse can be applied directly):

                      Code:
                      keep if inlist(refyear, 1998, 2013)
                      collapse varB, by(varA refyear)
                      reshape wide varB, i(varA) j(refyear)
                      spearman varB1998 varB2013
                      but I can't be totally confident of that as 1, 2 and 3 are all missing.

                      Note. Roger is not quite right that

                      Code:
                       bys VarA: egen NewVar=mean(VarB) if !mi(VarB)  
                       bys VarA: egen NewVar=mean(VarB)
                      are equivalent as in the first case there will be missings whenever VarB is missing but not in the second.
                      Last edited by Nick Cox; 24 Feb 2017, 11:38.

                      Comment


                      • #12
                        Yes, Nick is correct that I was incorrect. I often forget about egen returning missing with if conditions...

                        Comment

                        Working...
                        X