
  • Combining separate month and year numeric variables

    I am working with a dataset on prisoners in California, where prison entry and exit dates are recorded in four separate variables as in the data example:


    ```
    input byte prison1_start_month int prison1_start_yr byte prison1_end_month int prison1_end_yr
    11 2002 5 2005
    9 1990 2 2008
    3 2014 3 2015
    1 1985 12 1998
    2 1982 3 2005
    1 2015 5 2016
    1 2000 1 2010
    5 1960 9 1989
    8 1965 4 1988
    8 1981 5 2001
    4 1982 2 1989
    7 1976 11 1998
    7 1997 6 2007
    5 1960 6 1995
    2 1970 12 1983
    2 1985 4 1993
    8 1989 5 2016
    6 1970 9 1995
    2 1993 5 1996
    2 1963 11 1968
    1 1983 6 1986
    5 1972 11 1983
    1 1981 1 1991
    5 1958 3 1981
    ```

    I am trying to combine the separately recorded month and year variables into a single variable for an individual's entry date (i.e. combining the values of prison1_start_month and prison1_start_yr). I need to do the same for a prisoner's exit date (i.e. combining the values of prison1_end_month and prison1_end_yr).


    I am essentially looking to create a variable for duration in prison, so I need the date variables to support a calculation like the following:

    ```
    gen prison_duration = prison1_end - prison1_start
    ```

    I have tried the code from this thread: https://www.statalist.org/forums/for...rate-variables

    I have also followed the instructions in help datetime, and wrote:
    ```
    gen prison1_start = ym(prison1_start_yr, prison1_start_month)
    format prison1_start %td
    ```

    While the code is working, I am not getting the output that I am looking for:
    ```
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float prison1_start
    514
    368
    650
    300
    265
    660
    480
    4
    67
    ```

  • #2
    "While the code is working, I am not getting the output that I am looking for:"
    Yes, you are getting the output you need, sort of. Those numbers, 514, 368, 650, etc., are precisely the Stata numeric values corresponding to November 2002, September 1990, March 2014, etc. Your mistake is in the command -format prison1_start %td-. You created prison1_start as a monthly Stata date variable, so it is not compatible with the %td format, which is for daily dates. It should be formatted as %tm. Then when you -list- or -browse- your data it will look right to your eyes. (That is just an esthetic issue, however, as what matters for calculations is the numbers themselves, which do not change when you change the display format.)
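
    To make that concrete, here is a minimal sketch of the corrected approach, using only the first two rows of the example data from the original post. The variable prison1_end is an assumption on my part: it is built the same way as prison1_start, since the original post only shows the start variable. The resulting duration is a count of months.

    ```
    * Minimal sketch: first two rows of the example data from post #1
    clear
    input byte prison1_start_month int prison1_start_yr byte prison1_end_month int prison1_end_yr
    11 2002 5 2005
    9 1990 2 2008
    end

    * ym() returns a monthly date: months elapsed since January 1960
    gen prison1_start = ym(prison1_start_yr, prison1_start_month)
    * prison1_end is assumed to be built the same way (not shown in the original post)
    gen prison1_end = ym(prison1_end_yr, prison1_end_month)

    * Monthly dates take the %tm display format, not %td
    format prison1_start prison1_end %tm

    * Difference of two monthly dates = duration in months
    gen prison_duration = prison1_end - prison1_start

    list prison1_start prison1_end prison_duration
    ```

    For these two rows the durations come out as 30 and 209 months (November 2002 to May 2005, and September 1990 to February 2008). If you would rather have years, dividing prison_duration by 12 is one option.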

    Note: When you run -dataex-, all numeric variables, including dates, are listed out as raw numbers, and the format commands are then given at the end of the -dataex- output. So no matter what you do, in -dataex- you will still see 514, 368, 650 etc. But in -list- or -browse- or -tab- you will see things that look like years and months, although the underlying data in Stata is still just 514, 368, 650, etc.
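
    If you want to convince yourself that the display format never touches the stored value, a small check along these lines may help. It is only a sketch on a one-observation toy dataset, with the November 2002 value from the first row of the example.

    ```
    * Toy dataset with a single monthly date (November 2002)
    clear
    set obs 1
    gen prison1_start = ym(2002, 11)

    * With a plain numeric format, -list- shows the raw number 514
    format prison1_start %9.0g
    list

    * With %tm, -list- shows the same value as 2002m11
    format prison1_start %tm
    list

    * The stored value is unchanged by either format command
    assert prison1_start == 514

    * It equals the number of months since January 1960
    assert prison1_start == (2002 - 1960)*12 + (11 - 1)
    ```

    Both -assert- commands pass silently, which is the point: formats only change how the number is shown.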
