  • Dummy Variables - New to Stata - Gender pay-gap policy

    Hello,

    I have recently started learning Stata and have been playing around with data sets. The observations in the data I am using come from the UK Labour Force Survey. Here are the key variables I am working with:

    GRSSWK – Gross weekly pay in the respondent’s main job, in pounds Sterling
    SEX – The respondent’s reported gender, with 1 = male, 2 = female, −8 = no answer and −9 = does not apply
    CONMON – The month the respondent started her current job
    CONMPY – The year the respondent started her current job

    I am trying to figure out whether the gender wage gap shrank in Q2 2018 relative to before the regulation’s implementation, using Q1 2018 as a comparison group.

    How do I create a dummy variable which will categorise each quarter of the year? I tried this:

    generate FirstQuarter = CONMON==January,Feburary,March,April
    Error - January not found

    I have also tried:

    generate FirstQuarter = 0
    replace FirstQuarter = 1 if CONMON="Janurary"

    Error - invalid syntax

    January is not a variable but a result. What would be the correct code?

    I have found that using numbers works, such as:
    generate YEAR2017 = CONMPY==2017

    Apologies for the basic question, and thank you for your time.




  • #2
    Orthodox English spellings are included in Stata:

    Code:
    . di "`c(Months)'"
    January February March April May June July August September October November December
    If CONMON is a string variable, then the first quarter is presumably January to March, so

    Code:
    gen FirstQuarter = inlist(CONMON, "January", "February", "March")
    would be neater. If you used different spellings, then amend as needed. What was missing in your code was (minimally) that something more like

    Code:
    generate FirstQuarter =  CONMON == "January"
    is needed. A literal string must be enclosed in double quotes, and == is the operator that tests for equality, as your last line shows too.

    EDIT: But that is all taking your questions rather literally. You will get better answers if you show an example of your data rather than letting us guess what it is like.
    Last edited by Nick Cox; 15 Jan 2019, 08:26.



    • #3
      To Nick's answer let me add the following advice.

      I'm sympathetic to you as a new user of Stata - it's a lot to absorb.

      When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

      All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

      Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.





      • #4
        Here is a section of my data set.

        CONMON / CONMPY / GRSSWK /SEX /log_wage/ male/ female/
        /February/ 2018/ 60/ Female/ 4.094345/ 0/ 1/
        /January/ 2018/ 83/ Male/ 4.41884/ 1 /0
        /June/ 2017 /117 /Female/ 4.762174/ 0/ 1
        /March /2018 /140 /Female /4.941642 /0 /1
        /January /2017 /166 /Female/ 5.111988/ 0/ 1

        Last edited by James Perowne; 15 Jan 2019, 10:29.



        • #5
          Thank you Will and Nick for the quick replies. Your help and advice have been very beneficial. Hopefully I will improve with experience.



          • #6
            I am going to take #4 literally because I have no reason to do otherwise. If slashes are separators, then your data are something like this.

            Note that the extra spaces in SEX could be troublesome except that you have the indicators too.
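If the stray spaces do turn out to matter, they can be trimmed before any comparison. What follows is only a minimal sketch on two toy rows, assuming CONMON and SEX really are string variables as the listing in #4 suggests; the variable name is_female is made up for illustration.

```stata
* Sketch only: two toy rows with stray leading/trailing spaces
clear
input str8 CONMON str7 SEX
"March "  "Female "
"January" " Male"
end

* strtrim() removes leading and trailing blanks (older name: trim())
replace CONMON = strtrim(CONMON)
replace SEX    = strtrim(SEX)

* After trimming, an exact comparison with == behaves as expected
* is_female is a hypothetical name used only for this sketch
gen byte is_female = SEX == "Female"
list
```

Without the trimming step, "Female " with a trailing space would not be equal to "Female", so the indicator would be 0 for that row.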

            You can convert your date information to a monthly date, after which you can extract the quarter.

            Code:
            clear
            input str8 CONMON int(CONMPY GRSSWK) str7 SEX double log_wage byte(male female)
            "February" 2018  60 " Female" 4.094345 0 1
            "January"  2018  83 " Male"    4.41884 1 0
            "June"     2017 117 "Female"  4.762174 0 1
            "March "   2018 140 "Female " 4.941642 0 1
            "January " 2017 166 "Female"  5.111988 0 1
            end
            
            . gen mdate = monthly(string(CONMPY) + " " + CONMON, "YM")
            
            . format mdate %tm
            
            . gen quarter = quarter(dofm(mdate))
            
            . list
            
                 +------------------------------------------------------------------------------------+
                 |   CONMON   CONMPY   GRSSWK       SEX   log_wage   male   female    mdate   quarter |
                 |------------------------------------------------------------------------------------|
              1. | February     2018       60    Female   4.094345      0        1   2018m2         1 |
              2. |  January     2018       83      Male    4.41884      1        0   2018m1         1 |
              3. |     June     2017      117    Female   4.762174      0        1   2017m6         2 |
              4. |   March      2018      140   Female    4.941642      0        1   2018m3         1 |
              5. | January      2017      166    Female   5.111988      0        1   2017m1         1 |
                 +------------------------------------------------------------------------------------+



            • #7
              Actually, my subscription to this topic let me see what post #4 looked like when it went up, before it was edited to include the slashes. Starting from that I have
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str8 CONMON int(CONMPY GRSSWK) str6 SEX float log_wage byte(male female)
              "February" 2018  60 "Female" 4.094345 0 1
              "January"  2018  83 "Male"    4.41884 1 0
              "June"     2017 117 "Female" 4.762174 0 1
              "March"    2018 140 "Female" 4.941642 0 1
              "January"  2017 166 "Female" 5.111988 0 1
              end
              created with dataex. There remains the problem that we don't know whether CONMON and SEX, shown as character strings, were indeed string variables or numeric variables with value labels assigned. If CONMON is in fact numeric, the code for mdate in post #6 will fail with a "type mismatch" error.
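One way to cover both possibilities is to test the storage type first and, if the variable is numeric, copy its value labels into a string variable with decode before building the monthly date. This is a hedged sketch on toy data, not the poster's actual data: the month codes 1 to 12, the label name monthlbl and the new variable CONMON_str are all assumptions made for illustration.

```stata
* Toy data: CONMON stored as a number with a value label attached
* (codes 1-12 and the label name monthlbl are assumptions)
clear
input byte CONMON int CONMPY
2 2018
1 2018
6 2017
end
label define monthlbl 1 "January" 2 "February" 3 "March" 4 "April" ///
    5 "May" 6 "June" 7 "July" 8 "August" 9 "September" ///
    10 "October" 11 "November" 12 "December"
label values CONMON monthlbl

* describe reports the storage type: byte/int/float = numeric, str# = string
describe CONMON

* confirm sets a nonzero return code when CONMON is not a string variable
capture confirm string variable CONMON
if _rc {
    * numeric with value labels: copy the label text into a new string variable
    decode CONMON, generate(CONMON_str)
}
else {
    gen CONMON_str = CONMON
}

* the code from post #6 now receives a string either way
gen mdate = monthly(string(CONMPY) + " " + CONMON_str, "YM")
format mdate %tm
list
```

The design choice here is simply to keep the code from post #6 unchanged and adapt the input to it; working directly with the numeric codes is another option.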

              To improve your presentation of your problems on Statalist, take a moment to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. The dataex command includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays. It also makes it possible for those, like Nick, who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

              The more you help others understand your problem, the more likely others are to be able to help you solve your problem.



              • #8
                Hi guys,
                I have taken your advice on board, but when I attempt to create the new variable a type mismatch error occurs. Could you elaborate on this section further:

                Code:
                  
                 gen mdate = monthly(string(CONMPY) + " " + CONMON, "YM")
                 format mdate %tm
                 gen quarter = quarter(dofm(mdate))
                To simplify my question, could you explain how to convert the CONMON results into numerical values, such as 01 for "January"? In the Data Browser, the values for CONMON are shown in blue, whereas the rest of the results are in black. Here is a small section of the data. I am, however, dealing with a large data set containing 2000 different individuals, so I am not sure whether there is a command for this.
                Code:
                 CONMON    CONMPY
                February    2018
                January    2018
                June    2017
                March    2018
                January    2017
                February    2017
                Thank you guys for the help. I understand it must be difficult trying to comprehend my question, but I am not familiar with all of the jargon you use.




                • #9
                  Run the following command.
                  Code:
                  codebook CONMON
                  and read the output of help codebook to understand what the output is telling you about CONMON. Then run
                  Code:
                  tabulate CONMON
                  tabulate CONMON, nolabel
                  and read the output of help label to learn about value labels.

                  As I suggested in post #7, CONMON is not a string variable; it is a numeric variable with value labels assigned so that Stata can display the name of the month corresponding to the number. Here is an example using one of Stata's example datasets.
                  Code:
                  . sysuse auto, clear
                  (1978 Automobile Data)
                  
                  . codebook foreign
                  
                  ------------------------------------------------------------------------------------------------
                  foreign                                                                                 Car type
                  ------------------------------------------------------------------------------------------------
                  
                                    type:  numeric (byte)
                                   label:  origin
                  
                                   range:  [0,1]                        units:  1
                           unique values:  2                        missing .:  0/74
                  
                              tabulation:  Freq.   Numeric  Label
                                              52         0  Domestic
                                              22         1  Foreign
                  
                  . tabulate foreign
                  
                     Car type |      Freq.     Percent        Cum.
                  ------------+-----------------------------------
                     Domestic |         52       70.27       70.27
                      Foreign |         22       29.73      100.00
                  ------------+-----------------------------------
                        Total |         74      100.00
                  
                  . tabulate foreign, nolabel
                  
                     Car type |      Freq.     Percent        Cum.
                  ------------+-----------------------------------
                            0 |         52       70.27       70.27
                            1 |         22       29.73      100.00
                  ------------+-----------------------------------
                        Total |         74      100.00
                  
                  . label list origin
                  origin:
                             0 Domestic
                             1 Foreign
                  This would have been readily apparent had you prepared your example data in post #8 using the dataex command as recommended by the Statalist FAQ you were referred to in post #7.
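Once the numeric codes behind the labels are known, the quarter can be computed from the numbers directly. The following is a minimal sketch under a loud assumption: that tabulate CONMON, nolabel shows codes 1 to 12 for January to December. If the codes differ in the real data, the expressions below need to be adjusted. The toy rows are not the poster's data.

```stata
* Toy data; assumes CONMON holds 1-12 for January-December
* (check with: tabulate CONMON, nolabel)
clear
input byte CONMON int CONMPY
2 2018
1 2018
6 2017
3 2018
end

* ym() builds a monthly date from a numeric year and a numeric month
gen mdate = ym(CONMPY, CONMON)
format mdate %tm

* convert the monthly date to a daily date, then extract the quarter (1-4)
gen byte quarter = quarter(dofm(mdate))

* indicator: 1 if the month is January-March, 0 otherwise
gen byte FirstQuarter = inrange(CONMON, 1, 3)
list
```

Note that ym() returns missing when the month is outside 1 to 12, so any special codes in CONMON would yield a missing mdate and quarter rather than a wrong one.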

                  "I am not familiar with all of the jargon you use"
                  It is an error to think of Stata terms as "jargon" in the English language. Stata's language is a programming language, albeit based on English, used to control Stata's operation. Expecting to understand what Stata terms mean from a knowledge of the English language will not in general get you far.

                  It is your task to learn Stata's language, and one way to do this is to follow the advice I gave in post #3. Familiarize yourself with the basics and learn how to use the online help facilities.



                  • #10
                    Thank you.
