Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Redesigning dataset with categorical variable

    Dear Statalist,

    I need a help with redesigning a dataset. It contains information on validity of coefficient types (variable coef_type, values 1-6) during different periods (year) in different towns (id). For instance, in a particular town, there could be different coefficient types (e.g. 1, 3, 5 - not necessarily all) valid during different years.

    Data excerpt for 2 towns:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year str6 id float coef_type
    2011 "500011" 1
    2012 "500011" 1
    2013 "500011" 1
    2014 "500011" 1
    2015 "500011" 1
    2016 "500011" 1
    2017 "500011" 1
    2018 "500011" 1
    2011 "500011" 2
    2012 "500011" 2
    2013 "500011" 2
    2014 "500011" 2
    2015 "500011" 2
    2016 "500011" 2
    2017 "500011" 2
    2018 "500011" 2
    2011 "500011" 6
    2012 "500011" 6
    2013 "500011" 6
    2014 "500011" 6
    2015 "500011" 6
    2016 "500011" 6
    2017 "500011" 6
    2018 "500011" 6
    1994 "500054" 1
    1995 "500054" 1
    1996 "500054" 1
    1997 "500054" 1
    1998 "500054" 1
    1999 "500054" 1
    2000 "500054" 1
    2001 "500054" 1
    2002 "500054" 1
    2003 "500054" 1
    2004 "500054" 1
    2005 "500054" 1
    2006 "500054" 1
    2007 "500054" 1
    2008 "500054" 1
    2009 "500054" 1
    2010 "500054" 1
    2011 "500054" 1
    2012 "500054" 1
    2013 "500054" 1
    2014 "500054" 1
    2015 "500054" 1
    2016 "500054" 1
    2017 "500054" 1
    2018 "500054" 1
    1994 "500054" 3
    1995 "500054" 3
    1996 "500054" 3
    1997 "500054" 3
    1998 "500054" 3
    1999 "500054" 3
    2000 "500054" 3
    2001 "500054" 3
    2002 "500054" 3
    2003 "500054" 3
    end
    Instead of having the variable coef_type I want to create dummies (coef_type_x) indicating if a coefficient type was valid in a given town in a given year. The following example shall serve as an illustration (values do not correspond to the previous excerpt):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year str6 id str1 float(coef_type_0 coef_type _1 coef_type_2 coef_type_ coef_type_4 coef_type_5 coef_type_6)
    2011 "500011" 1 1 0 1 1 0
    2012 "500011" 1 1 0 1 1 0
    2013 "500011" 1 1 0 0 1 0
    2014 "500011" 1 1 1 0 1 0
    2015 "500011" 1 1 1 0 1 0
    2016 "500011" 1 1 1 0 1 0
    2017 "500011" 1 1 1 0 1 0
    2009 "500054" 1 1 1 1 1 1
    2010 "500054" 1 1 1 1 1 1
    2011 "500054" 1 1 1 1 1 1
    2012 "500054" 1 1 1 1 1 1
    2013 "500054" 1 1 1 1 1 1
    2014 "500054" 1 1 1 1 1 1
    2015 "500054" 1 1 1 1 1 0
    2016 "500054" 1 1 1 1 1 0
    2017 "500054" 1 1 1 1 1 0
    2018 "500054" 1 1 1 1 1 0
    end
    Thank you for any advice.

  • #2
    I think the simplest way to get what you seek is:
    Code:
    levelsof coef_type, local(coef_types)
    foreach c of local coef_types {
        gen coef_type_`c' = `c'.coef_type
    }
    That said, why do you want to do this? The commonest reason people create indicator variables from a multi-level category variable like this is to use the indicators as predictors in regressions ("dummy" variables). But in modern Stata you don't need to do that. You just code:

    Code:
    regression_command outcome_var i.coef_type perhaps_other_variables
    Then Stata creates the appropriate indicators "on the fly" without actually taking up space in the memory allocated to your data set. And, best of all, you can then use the -margins- command to get lots of useful post-regression statistics.

    Comment


    • #3
      Thank you for reply! However, it is not exactly what I need. As a result, I want to have only one observation for a town in a year, with dummies indicating which coefficient types were valid during that year in the town. Using your suggested way I would still have many observations of a town in a single year, with dummies=1 being listed only in the observations with the respective coefficient type (not in a unique town-year observation). I need to do this to be able to merge the dataset with another town-year panel data.

      Comment


      • #4
        99 times out of 100 on Statalist, requests like yours are well-suited to the advice Clyde gave, and as a consequence he overlooked the fact that for your data each id/year combination can have multiple coefficient types.

        In Stata terminology, what you have is a long layout of your data, and you need to reshape it into a wide layout. The following example run on your example data should start you in a useful direction.
        Code:
        generate coef_type_ = 1
        reshape wide coef_type_, i(id year) j(coef_type)
        mvencode coef_type_*, mv(0)
        list, noobs sepby(id)) abbreviate(12)
        Code:
        . list, noobs sepby(id) abbreviate(12)
        
          +-----------------------------------------------------------------------+
          | year       id   coef_type_1   coef_type_2   coef_type_3   coef_type_6 |
          |-----------------------------------------------------------------------|
          | 2011   500011             1             1             0             1 |
          | 2012   500011             1             1             0             1 |
          | 2013   500011             1             1             0             1 |
          | 2014   500011             1             1             0             1 |
          | 2015   500011             1             1             0             1 |
          | 2016   500011             1             1             0             1 |
          | 2017   500011             1             1             0             1 |
          | 2018   500011             1             1             0             1 |
          |-----------------------------------------------------------------------|
          | 1994   500054             1             0             1             0 |
          | 1995   500054             1             0             1             0 |
          | 1996   500054             1             0             1             0 |
          | 1997   500054             1             0             1             0 |
          | 1998   500054             1             0             1             0 |
          | 1999   500054             1             0             1             0 |
          | 2000   500054             1             0             1             0 |
          | 2001   500054             1             0             1             0 |
          | 2002   500054             1             0             1             0 |
          | 2003   500054             1             0             1             0 |
          | 2004   500054             1             0             0             0 |
          | 2005   500054             1             0             0             0 |
          | 2006   500054             1             0             0             0 |
          | 2007   500054             1             0             0             0 |
          | 2008   500054             1             0             0             0 |
          | 2009   500054             1             0             0             0 |
          | 2010   500054             1             0             0             0 |
          | 2011   500054             1             0             0             0 |
          | 2012   500054             1             0             0             0 |
          | 2013   500054             1             0             0             0 |
          | 2014   500054             1             0             0             0 |
          | 2015   500054             1             0             0             0 |
          | 2016   500054             1             0             0             0 |
          | 2017   500054             1             0             0             0 |
          | 2018   500054             1             0             0             0 |
          +-----------------------------------------------------------------------+

        Comment


        • #5
          Thank you, this works perfectly!

          Comment

          Working...
          X