Dummy Variables - New to Stata - Gender pay-gap policy

James Perowne

Join Date: Jan 2019

Posts: 5
#1


15 Jan 2019, 07:42

Hello,

I have recently started learning Stata and have been playing around with data sets. The observations in the data I am using come from the UK Labour Force Survey. Here are the key variables I am working with:

GRSSWK – Gross weekly pay in the respondent’s main job, in pounds sterling
SEX – The respondent’s reported gender, with 1 = male, 2 = female, −8 = no answer and −9 = does not apply
CONMON – The month the respondent started her current job
CONMPY – The year the respondent started her current job

I am trying to find out whether the gender wage gap shrank in Q2 2018 relative to before the regulation’s implementation, using Q1 2018 as the comparison group.

How do I create a dummy variable that categorises each quarter of the year? I tried this:

generate FirstQuarter = CONMON==January,Feburary,March,April
Error - January not found

I have also tried:

generate FirstQuarter = 0
replace FirstQuarter = 1 if CONMON="Janurary"

Error - invalid syntax

January is not a variable but a value of CONMON. What would be the correct code?

I have found that using numbers works, for example:
generate YEAR2017 = CONMPY==2017

Apologies for the basic question, and thank you for your time.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

15 Jan 2019, 07:53

Orthodox English spellings are included in Stata:

Code:

. di "`c(Months)'"
January February March April May June July August September October November December

If CONMON is a string variable, then the first quarter is presumably January to March, so

Code:

gen FirstQuarter = inlist(CONMON, "January", "February", "March")

would be neater. If you used different spellings, then amend as needed. What was missing in your code was (minimally) that something more like

Code:

generate FirstQuarter = CONMON == "January"

is needed: == (not =) tests for equality, as your last line shows too, and a string value such as "January" must be enclosed in double quotes.
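For reference, here are both of the attempts in #1 with those fixes applied, as a minimal sketch. It assumes CONMON is a string variable holding full month names spelled exactly as in c(Months), with no stray spaces; the four rows are made-up toy values, not Labour Force Survey data, and the names Q1 to Q4 are hypothetical.

```stata
* Toy data for illustration only (not LFS values)
clear
input str9 CONMON int CONMPY
"January"  2018
"March"    2018
"June"     2017
"November" 2017
end

* Two-step version: == tests equality and string values need quotes
generate FirstQuarter = 0
replace FirstQuarter = 1 if CONMON == "January" | CONMON == "February" | CONMON == "March"

* One-line versions: a 0/1 indicator for each calendar quarter
generate Q1 = inlist(CONMON, "January", "February", "March")
generate Q2 = inlist(CONMON, "April", "May", "June")
generate Q3 = inlist(CONMON, "July", "August", "September")
generate Q4 = inlist(CONMON, "October", "November", "December")
list
```

FirstQuarter and Q1 end up identical; inlist() simply saves writing out the chain of | conditions.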

EDIT But that is all taking your questions rather literally. You will get better answers if you show an example of your data rather than letting us guess what it is like.
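The c(Months) list can also turn month names into month numbers, if that is wanted later. A sketch assuming CONMON is a string variable whose names match those spellings apart from leading or trailing blanks; the toy rows and the new variable month are hypothetical.

```stata
* Toy data for illustration only
clear
input str9 CONMON int CONMPY
"February" 2018
"January " 2017
"June"     2017
end

* Look each month name up in the list held in c(Months)
generate byte month = .
forvalues m = 1/12 {
    local name : word `m' of `c(Months)'
    replace month = `m' if strtrim(CONMON) == "`name'"
}

* Names that failed to match are left missing; check before going on
count if missing(month)
list
```

A nonzero count flags spellings that do not match c(Months) and need fixing by hand.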

Last edited by Nick Cox; 15 Jan 2019, 08:26.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

15 Jan 2019, 09:41

To Nick's answer let me add the following advice.

I'm sympathetic to you as a new user of Stata - it's a lot to absorb.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.
James Perowne

Join Date: Jan 2019

Posts: 5
#4

15 Jan 2019, 10:20

Here is a section of my data set.

CONMON / CONMPY / GRSSWK /SEX /log_wage/ male/ female/
/February/ 2018/ 60/ Female/ 4.094345/ 0/ 1/
/January/ 2018/ 83/ Male/ 4.41884/ 1 /0
/June/ 2017 /117 /Female/ 4.762174/ 0/ 1
/March /2018 /140 /Female /4.941642 /0 /1
/January /2017 /166 /Female/ 5.111988/ 0/ 1

Last edited by James Perowne; 15 Jan 2019, 10:29.
James Perowne

Join Date: Jan 2019

Posts: 5
#5

15 Jan 2019, 10:31

Thank you, Will and Nick, for the quick replies. Your help and advice have been very beneficial. Hopefully I will improve with experience.

Nick Cox

Join Date: Mar 2014
Posts: 35724
#6

15 Jan 2019, 11:06

I am going to take #4 literally because I have no reason to do otherwise. If slashes are separators, then your data are something like this.

Note that the extra spaces in SEX could be troublesome except that you have the indicators too.
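If the spaces really are in the data, one way to neutralise them is to strip leading and trailing blanks before any comparison. A sketch using strtrim(), assuming CONMON and SEX are string variables; the two rows are toy values imitating #4, and female is recreated here only to show that an exact comparison then works.

```stata
* Toy data imitating the stray spaces in #4
clear
input str8 CONMON str7 SEX
"March "   " Female"
"January " " Male"
end

* strtrim() removes leading and trailing blanks
replace CONMON = strtrim(CONMON)
replace SEX = strtrim(SEX)

* Comparisons with exact strings now work as intended
generate byte female = SEX == "Female"
list
```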

You can convert your date information to a monthly date, after which you can extract the quarter.

Code:

clear
input str8 CONMON int(CONMPY GRSSWK) str7 SEX double log_wage byte(male female)
"February" 2018  60 " Female" 4.094345 0 1
"January"  2018  83 " Male"    4.41884 1 0
"June"     2017 117 "Female"  4.762174 0 1
"March "   2018 140 "Female " 4.941642 0 1
"January " 2017 166 "Female"  5.111988 0 1
end

. gen mdate = monthly(string(CONMPY) + " " + CONMON, "YM")

. format mdate %tm

. gen quarter = quarter(dofm(mdate))

. list

     +------------------------------------------------------------------------------------+
     |   CONMON   CONMPY   GRSSWK       SEX   log_wage   male   female    mdate   quarter |
     |------------------------------------------------------------------------------------|
  1. | February     2018       60    Female   4.094345      0        1   2018m2         1 |
  2. |  January     2018       83      Male    4.41884      1        0   2018m1         1 |
  3. |     June     2017      117    Female   4.762174      0        1   2017m6         2 |
  4. |   March      2018      140   Female    4.941642      0        1   2018m3         1 |
  5. | January      2017      166    Female   5.111988      0        1   2017m1         1 |
     +------------------------------------------------------------------------------------+
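If the aim is to pick out 2018q1 and 2018q2 as in #1, the monthly date can be taken one step further to a quarterly date. A sketch on a cut-down copy of the example above, with one made-up April 2018 row added so that a 2018q2 observation exists; the names qdate and Q2_2018 are hypothetical.

```stata
clear
input str8 CONMON int CONMPY
"February" 2018
"January"  2018
"June"     2017
"March"    2018
"April"    2018
end

* Monthly date, then quarterly date from it
generate mdate = monthly(string(CONMPY) + " " + CONMON, "YM")
format mdate %tm
generate qdate = qofd(dofm(mdate))
format qdate %tq

* 1 for 2018q2, 0 for 2018q1, missing for all other quarters
generate Q2_2018 = qdate == tq(2018q2) if inlist(qdate, tq(2018q1), tq(2018q2))
list
```

Q2_2018 is deliberately missing outside those two quarters, so estimation commands that include it will by default drop the other quarters.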


William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

15 Jan 2019, 12:16

Actually, my subscription to this topic let me see what post #4 looked like when it went up, before it was edited to include the slashes. Starting from that, I have

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 CONMON int(CONMPY GRSSWK) str6 SEX float log_wage byte(male female)
"February" 2018  60 "Female" 4.094345 0 1
"January"  2018  83 "Male"    4.41884 1 0
"June"     2017 117 "Female" 4.762174 0 1
"March"    2018 140 "Female" 4.941642 0 1
"January"  2017 166 "Female" 5.111988 0 1
end

created with dataex. There remains the problem that we don't know whether CONMON and SEX, shown as character strings, were indeed string variables, or whether they were numeric variables with value labels assigned. If CONMON is in fact numeric, the code for mdate in post #6 will fail with a "type mismatch" error.
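Whether CONMON is string or numeric can be checked directly rather than guessed. A sketch using confirm and the value label extended macro function; the two toy rows and the label name monthlbl are hypothetical, set up to imitate a numeric month with a value label.

```stata
* Toy data: a numeric month with a value label (illustration only)
clear
input byte CONMON int CONMPY
2 2018
1 2018
end
label define monthlbl 1 "January" 2 "February"
label values CONMON monthlbl

* Branch on whether CONMON is stored as a string
capture confirm string variable CONMON
if _rc == 0 {
    display "CONMON is a string variable"
}
else {
    local lbl : value label CONMON
    display "CONMON is numeric with value label: `lbl'"
}
```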

To improve your presentation of your problems on Statalist, take a moment to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. The dataex command includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays. It also makes it possible for those, like Nick, who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
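As an illustration of what dataex is used for, here is a sketch on one of Stata's example datasets. Recent versions of Stata include dataex; on older versions it can be installed with ssc install dataex, as the comment line in the example above says. The variable choice and the in 1/5 range are arbitrary.

```stata
sysuse auto, clear

* Print the first five observations of three variables as code that
* rebuilds them, including storage types and the value label on foreign
dataex make price foreign in 1/5

* For the data in this thread, the equivalent would be something like
* dataex CONMON CONMPY GRSSWK SEX in 1/20
```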

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
James Perowne

Join Date: Jan 2019

Posts: 5
#8

16 Jan 2019, 06:29

Hi guys,
I have taken your advice on board, but when I try to create the new variable, I get a "type mismatch" error. Could you explain this section further:

Code:

gen mdate = monthly(string(CONMPY) + " " + CONMON, "YM")
format mdate %tm
gen quarter = quarter(dofm(mdate))

To simplify my question, could you explain how to convert the CONMON values into numbers, such as 01 for "January"? In the Data Browser, the values of CONMON are shown in blue, whereas the rest of the values are in black. Here is a small section of the data. However, I am dealing with a large dataset containing 2,000 individuals, so I am not sure whether there is a command for this.

Code:

CONMON    CONMPY
February  2018
January   2018
June      2017
March     2018
January   2017
February  2017

Thank you guys for the help. I understand it must be difficult to make sense of my question, but I am not familiar with all of the jargon you use.

William Lisowski

Join Date: Dec 2014
Posts: 10150
#9

16 Jan 2019, 11:12

Run the following command.

Code:

codebook CONMON

and read the output of help codebook to understand what the output is telling you about CONMON. Then run

Code:

tabulate CONMON
tabulate CONMON, nolabel

and read the output of help label to learn about value labels.

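As I suggested in post #7, CONMON is not a string variable; it is a numeric variable with value labels assigned so that Stata can display the name of the month corresponding to each number. That is also why its values appear in blue in the Data Browser: Stata displays value-labeled numeric variables in blue and string variables in red. Here is an example using one of Stata's example datasets.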

Code:

. sysuse auto, clear
(1978 Automobile Data)

. codebook foreign

------------------------------------------------------------------------------------------------
foreign                                                                                 Car type
------------------------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  origin

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/74

            tabulation:  Freq.   Numeric  Label
                            52         0  Domestic
                            22         1  Foreign

. tabulate foreign

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. tabulate foreign, nolabel

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         52       70.27       70.27
          1 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. label list origin
origin:
           0 Domestic
           1 Foreign

This would have been readily apparent had you prepared your example data in post #8 using the dataex command as recommended by the Statalist FAQ you were referred to in post #7.
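With CONMON numeric, no string handling is needed: the month codes can go straight into ym(). A sketch assuming the codes run 1 = January to 12 = December (check with label list before relying on this) and treating any code outside 1 to 12, such as a -9 "does not apply", as missing; the toy rows and the label name conmon_lbl are hypothetical.

```stata
* Toy data: numeric month codes with a value label (hypothetical codes)
clear
input byte CONMON int CONMPY
 2 2018
 1 2018
 6 2017
-9 2018
end
label define conmon_lbl 1 "January" 2 "February" 6 "June" -9 "Does not apply"
label values CONMON conmon_lbl

* Treat codes outside 1-12 as missing (assumption about the survey coding)
generate mdate = ym(CONMPY, CONMON) if inrange(CONMON, 1, 12)
format mdate %tm
generate quarter = quarter(dofm(mdate))
list
```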

I am not familiar with all of the jargon you use

It is an error to think of Stata terms as "jargon" in the English language. Stata's language is a programming language, albeit based on English, used to control Stata's operation. Expecting to understand what Stata terms mean from a knowledge of the English language will not in general get you far.

It is your task to learn Stata's language, and one way to do this is to follow the advice I gave in post #3. Familiarize yourself with the basics and learn how to use the online help facilities.


James Perowne

Join Date: Jan 2019

Posts: 5
#10

17 Jan 2019, 06:43

Thank you.