
  • Generating a dummy variable with value from other variable to a new row (cell)

    Hi, I need to generate a new variable (a dummy based on months unemployed) whose value is recorded in December. In my analysis, if a person has been unemployed for 6 months, I put 1 for that person in December. In this example, the person is unemployed for 6 months (from January to June), so the December value is 1.
    Can anyone tell me how to do this in Stata?
    I appreciate your help.
    hhid  month      employment status  months unemployed  dummy for months unemployed
    1     January    1                  1                  0
    1     February   1                  2                  0
    1     March      1                  3                  0
    1     April      1                  4                  0
    1     May        1                  5                  0
    1     June       1                  6                  0
    1     July       2                  0                  0
    1     August     2                  0                  0
    1     September  2                  0                  0
    1     October    2                  0                  0
    1     November   2                  0                  0
    1     December   2                  0                  1
    employment status:
    1=unemployed
    2=employed
    Last edited by Tapas Paul; 01 Oct 2019, 21:53.

  • #2
    The above example data cannot have come from a Stata data set, because some of the column headers cannot be legal variable names in Stata. If you have not yet brought your data into Stata, it is premature to ask for help with code. If you have, then you should be using the -dataex- command to show your examples so that those who want to help you have an easily usable way to replicate your data. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
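
    As a minimal illustration of the -dataex- workflow described above, the sketch below uses the auto dataset that ships with Stata. That dataset and the chosen variables are assumptions made only to keep the example self-contained; substitute your own data and variable names.

    ```stata
    * Load a dataset that ships with Stata (a stand-in for your own data)
    sysuse auto, clear

    * Write a copy-and-paste -input- block for three variables, first five rows
    dataex make price mpg in 1/5
    ```

    The output in the Results window can then be pasted directly into a forum post.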

    The code below will do what you ask.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte hhid str9 month byte(employmentstatus monthsunemployed)
    1 "January"   1 1
    1 "February"  1 2
    1 "March"     1 3
    1 "April"     1 4
    1 "May"       1 5
    1 "June"      1 6
    1 "July"      2 0
    1 "August"    2 0
    1 "September" 2 0
    1 "October"   2 0
    1 "November"  2 0
    1 "December"  2 0
    end
    
    by hhid, sort: egen wanted = max(monthsunemployed >= 6)
    replace wanted = 0 if month != "December"
    That said, it is based on an assumption I have made from the way your example data are presented: the code will only work correctly if you have exactly one year's worth of data for each household.
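
    One way to check that assumption before relying on the code is sketched below. The data are a hypothetical two-household panel built on the fly, and the numeric month variable mnum is an assumption that is not in the original example, where month is a string.

    ```stata
    * Build a hypothetical panel: two households, 12 monthly rows each
    clear
    set obs 24
    gen byte hhid = ceil(_n/12)
    * mnum is a hypothetical month number, 1 to 12 within each household
    by hhid, sort: gen byte mnum = _n

    * Stop with an error unless every household has exactly 12 observations
    by hhid: assert _N == 12

    * Stop with an error if any household-month combination appears twice
    isid hhid mnum
    ```

    If either check fails, the one-year-per-household logic should not be applied as written.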

    I also wonder what you plan to do with this variable whose value is always 0 in all but the last month of the year. It is hard for me to envision how you will work with it and not get into trouble. Whatever you do with the variable the way you have created it, you will always be having to "find" the December value to use. Doing that will make coding all your commands more complicated; and if at some point you forget to do it, which is likely, your results will simply be wrong. When creating a variable that characterizes some overall attribute of a group of observations as a whole (e.g. a household observed monthly for a year) it is usually more convenient if that variable takes on the same value in each observation in the group (e.g. every month for the household). So consider omitting the last line of the code shown above.
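
    To illustrate why a household-level variable that is constant within the household tends to be easier to use, here is a rough sketch. It assumes a hypothetical two-household dataset with a numeric month mnum, and the helper variable one_per_hh is a name introduced only for this illustration.

    ```stata
    * Hypothetical data: household 1 is unemployed Jan-Jun, household 2 Jan-Mar
    clear
    set obs 24
    gen byte hhid = ceil(_n/12)
    by hhid, sort: gen byte mnum = _n
    gen byte employmentstatus = cond((hhid == 1 & mnum <= 6) | (hhid == 2 & mnum <= 3), 1, 2)
    gen byte monthsunemployed = cond(employmentstatus == 1, mnum, 0)

    * Household-level indicator, same value in all 12 rows of a household
    by hhid, sort: egen wanted = max(monthsunemployed >= 6)

    * Tag one row per household, so household-level summaries need no month lookup
    egen byte one_per_hh = tag(hhid)
    tabulate wanted if one_per_hh
    count if wanted == 1 & one_per_hh
    ```

    Because wanted is the same in every row of a household, any single row can stand in for the household, and no command has to locate the December observation first.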




    • #3
      Thanks, Clyde Schechter, for your suggestion. Let me take the opportunity to reframe my query here.

      The following is a snapshot from a large panel: five years of monthly data, arranged by person number. I want to create two dummy variables based on a person's unemployment status. First, "unemp_6months1" indicates unemployment throughout the first 6 months of the year (January to June). For example, in the dataset below, the person is unemployed for the first 6 months of 2009, so the dummy is 1, and I record it in December of 2009. Second, "unemp_6months2" indicates unemployment throughout the second half of the year (July to December). If the person is unemployed for the second half of the year, I put 1 in December for "unemp_6months2". In the example below, the person is unemployed for the second half of 2008 and gets 1 in December of 2008.
      I need to keep the dummy variable values in December of each year.
      Among other possibilities, the person may be unemployed for fewer than six months. In that case, the dummy variables get 0 in December of the respective year.
      I appreciate your help.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte hhid int year str9 month byte(emp_status unemp_6months1 unemp_6months2)
      1 2008 "January"   1 0 0
      1 2008 "February"  1 0 0
      1 2008 "March"     1 0 0
      1 2008 "April"     1 0 0
      1 2008 "May"       1 0 0
      1 2008 "June"      1 0 0
      1 2008 "July"      2 0 0
      1 2008 "August"    2 0 0
      1 2008 "September" 2 0 0
      1 2008 "October"   2 0 0
      1 2008 "November"  2 0 0
      1 2008 "December"  2 0 1
      1 2009 "January"   2 0 0
      1 2009 "February"  2 0 0
      1 2009 "March"     2 0 0
      1 2009 "April"     2 0 0
      1 2009 "May"       2 0 0
      1 2009 "June"      2 0 0
      1 2009 "July"      1 0 0
      1 2009 "August"    1 0 0
      1 2009 "September" 1 0 0
      1 2009 "October"   1 0 0
      1 2009 "November"  1 0 0
      1 2009 "December"  1 1 0
      end
      emp_status (employment status):
      1=unemployed
      2=employed



      • #4
        To do this you need some real Stata date variables: the string variable for month and separate variable for year are not useful. So the first part of the code is just to create usable variables for the time structure. The rest is a matter of applying standard -egen- functions to identify when somebody is unemployed for the entire half-year.
        Code:
        //  CREATE A HALF-YEAR VARIABLE
        gen my = month + string(year)
        gen mdate = monthly(my, "MY")
        format mdate %tm
        gen hy = halfyear(dofm(mdate))
        
        //  FOR EACH HALF-YEAR, MARK WHETHER THE PERSON HAS BEEN UNEMPLOYED
        //  FOR THE ENTIRE HALF-YEAR (1) OR NOT (0)
        by hhid year hy, sort: egen unemployed = min(emp_status == 1)
        
        //  CREATE FROM THAT THE VARIABLES UNEMP_6MONTHS1 AND UNEMP_6MONTHS2
        //  WHICH PROVIDE VALID INFORMATION ONLY IN THE DECEMBER ENTRIES
        forvalues i = 1/2 {
            by hhid year, sort: egen unemp_6months`i' = max(cond(hy == `i', unemployed, .))
            replace unemp_6months`i' = 0 if month != "December"
        }
        sort hhid mdate
        Please do re-read my final paragraph in #2. I think you will find the variables you requested difficult to work with in Stata. The intermediate variable unemployed that is generated in the code will be much easier to manage. Anyway, I've given you what you asked for. You know what they say about being careful what you wish for. You've gotten it.
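
        As a rough sketch of how the intermediate variable unemployed might be used directly, the example below builds a hypothetical one-person panel that mirrors the employment pattern in #3 and summarizes it at the half-year level. The names one_per_hy and n_unemp_hy are assumptions introduced only for this illustration.

        ```stata
        * Hypothetical panel: one person, January 2008 through December 2009
        clear
        set obs 24
        gen byte hhid = 1
        gen mdate = ym(2008, 1) + _n - 1
        format mdate %tm
        gen int year = year(dofm(mdate))
        gen byte hy = halfyear(dofm(mdate))

        * Unemployed (1) in Jan-Jun 2008 and Jul-Dec 2009, employed (2) otherwise
        gen byte emp_status = cond((year == 2008 & hy == 1) | (year == 2009 & hy == 2), 1, 2)

        * 1 if unemployed in every month of the half-year, 0 otherwise
        by hhid year hy, sort: egen unemployed = min(emp_status == 1)

        * Tag one row per person-half-year and list the half-year results
        egen byte one_per_hy = tag(hhid year hy)
        list hhid year hy unemployed if one_per_hy, noobs sepby(year)

        * Number of fully unemployed half-years per person
        by hhid, sort: egen n_unemp_hy = total(unemployed * one_per_hy)
        ```

        Since unemployed is constant within each person-half-year, summaries like these do not depend on finding the December row.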



        • #5
          Hi Clyde Schechter, this works perfectly for my analysis with some changes from your code. Again, I do appreciate your support.
