  • stata programming combining 2 variables

    Let me explain what I want to do here. I have four variables: household ID (ssuid), year, month, and poverty. I have 28 months of data (May 2008 to August 2010). I want to see the distribution of poverty in each month. Instead of doing this manually for each month, I want to use Stata's foreach and/or forvalues commands.
    Finally, I want to see what percent of household-months were in poverty in each month.

    Thanks in advance!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double ssuid int year str9 month byte poverty
    19128000276 2008 "June"      1
    19128000276 2008 "July"      1
    19128000276 2008 "August"    1
    19128000276 2008 "September" 1
    19128000276 2008 "October"   1
    19128000276 2008 "November"  1
    19128000276 2008 "December"  1
    19128000276 2009 "January"   1
    19128000276 2009 "February"  1
    19128000276 2009 "March"     1
    19128000276 2009 "April"     1
    19128000276 2009 "May"       1
    19128000276 2009 "June"      1
    19128000276 2009 "July"      1
    19128000276 2009 "August"    1
    19128000276 2009 "September" 1
    19128000276 2009 "October"   1
    19128000276 2009 "November"  1
    19128000276 2009 "December"  1
    19128000276 2010 "January"   1
    19128000276 2010 "February"  1
    19128000276 2010 "March"     1
    19128000276 2010 "April"     1
    19128000276 2010 "May"       1
    19128000276 2010 "June"      1
    19128000276 2010 "July"      1
    19128000276 2010 "August"    1
    19128000334 2010 "June"      0
    19128000334 2010 "July"      0
    19128000334 2010 "August"    0
    19128000932 2008 "June"      0
    19128000932 2008 "July"      0
    19128000932 2008 "August"    0
    19128000932 2008 "September" 0
    19128000932 2008 "October"   0
    19128000932 2008 "November"  0
    19128000932 2008 "December"  0
    19128000932 2009 "January"   0
    19128000932 2009 "February"  0
    19128000932 2009 "March"     0
    19128000932 2009 "April"     0
    19128000932 2009 "May"       0
    19128000932 2009 "June"      0
    19128000932 2009 "July"      0
    19128000932 2009 "August"    0
    19128000932 2009 "September" 0
    19128000932 2009 "October"   0
    19128000932 2009 "November"  0
    19128000932 2009 "December"  0
    19128000932 2010 "January"   0
    19128000932 2010 "February"  0
    19128000932 2010 "March"     0
    19128000932 2010 "April"     0
    19128000932 2010 "May"       0
    19128038099 2010 "June"      0
    19128038099 2010 "July"      0
    19128038099 2010 "August"    0
    19128038334 2008 "June"      0
    19128038334 2008 "July"      0
    19128038334 2008 "August"    0
    19128038334 2008 "September" 0
    19128038334 2009 "February"  0
    19128038334 2009 "March"     0
    19128038334 2009 "April"     0
    19128038334 2009 "May"       0
    19128038334 2009 "June"      0
    19128038334 2009 "July"      0
    19128038334 2009 "August"    0
    19128038334 2009 "September" 0
    19128038334 2009 "October"   0
    19128038334 2009 "November"  0
    19128038334 2009 "December"  0
    19128038334 2010 "January"   0
    19128038334 2010 "February"  0
    19128038334 2010 "March"     0
    19128038334 2010 "April"     0
    19128038334 2010 "May"       0
    19133451319 2008 "June"      0
    19133451319 2008 "July"      0
    19133451319 2008 "August"    0
    19133451319 2008 "September" 0
    19133451319 2008 "October"   0
    19133451319 2008 "November"  0
    19133451319 2008 "December"  0
    19133451319 2009 "January"   0
    19133451319 2009 "February"  0
    19133451319 2009 "March"     0
    19133451319 2009 "April"     0
    19133451319 2009 "May"       0
    19133451319 2009 "June"      0
    19133451319 2009 "July"      0
    19133451319 2009 "August"    0
    19133451319 2009 "September" 0
    19133451389 2009 "October"   0
    19133451389 2009 "November"  0
    19133451389 2009 "December"  0
    19133451389 2010 "January"   0
    19133451389 2010 "February"  0
    19133451389 2010 "March"     0
    19133451389 2010 "April"     0
    end

  • #2
    The only real obstacle here is that you do not have a real Stata date variable. Once you create one, it is straightforward to calculate everything you ask for with simple Stata commands. There is no need for loops.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double ssuid int year str9 month byte poverty
    19128000276 2008 "June"      1
    19128000276 2008 "July"      1
    19128000276 2008 "August"    1
    19128000276 2008 "September" 1
    19128000276 2008 "October"   1
    19128000276 2008 "November"  1
    19128000276 2008 "December"  1
    19128000276 2009 "January"   1
    19128000276 2009 "February"  1
    19128000276 2009 "March"     1
    19128000276 2009 "April"     1
    19128000276 2009 "May"       1
    19128000276 2009 "June"      1
    19128000276 2009 "July"      1
    19128000276 2009 "August"    1
    19128000276 2009 "September" 1
    19128000276 2009 "October"   1
    19128000276 2009 "November"  1
    19128000276 2009 "December"  1
    19128000276 2010 "January"   1
    19128000276 2010 "February"  1
    19128000276 2010 "March"     1
    19128000276 2010 "April"     1
    19128000276 2010 "May"       1
    19128000276 2010 "June"      1
    19128000276 2010 "July"      1
    19128000276 2010 "August"    1
    19128000334 2010 "June"      0
    19128000334 2010 "July"      0
    19128000334 2010 "August"    0
    19128000932 2008 "June"      0
    19128000932 2008 "July"      0
    19128000932 2008 "August"    0
    19128000932 2008 "September" 0
    19128000932 2008 "October"   0
    19128000932 2008 "November"  0
    19128000932 2008 "December"  0
    19128000932 2009 "January"   0
    19128000932 2009 "February"  0
    19128000932 2009 "March"     0
    19128000932 2009 "April"     0
    19128000932 2009 "May"       0
    19128000932 2009 "June"      0
    19128000932 2009 "July"      0
    19128000932 2009 "August"    0
    19128000932 2009 "September" 0
    19128000932 2009 "October"   0
    19128000932 2009 "November"  0
    19128000932 2009 "December"  0
    19128000932 2010 "January"   0
    19128000932 2010 "February"  0
    19128000932 2010 "March"     0
    19128000932 2010 "April"     0
    19128000932 2010 "May"       0
    19128038099 2010 "June"      0
    19128038099 2010 "July"      0
    19128038099 2010 "August"    0
    19128038334 2008 "June"      0
    19128038334 2008 "July"      0
    19128038334 2008 "August"    0
    19128038334 2008 "September" 0
    19128038334 2009 "February"  0
    19128038334 2009 "March"     0
    19128038334 2009 "April"     0
    19128038334 2009 "May"       0
    19128038334 2009 "June"      0
    19128038334 2009 "July"      0
    19128038334 2009 "August"    0
    19128038334 2009 "September" 0
    19128038334 2009 "October"   0
    19128038334 2009 "November"  0
    19128038334 2009 "December"  0
    19128038334 2010 "January"   0
    19128038334 2010 "February"  0
    19128038334 2010 "March"     0
    19128038334 2010 "April"     0
    19128038334 2010 "May"       0
    19133451319 2008 "June"      0
    19133451319 2008 "July"      0
    19133451319 2008 "August"    0
    19133451319 2008 "September" 0
    19133451319 2008 "October"   0
    19133451319 2008 "November"  0
    19133451319 2008 "December"  0
    19133451319 2009 "January"   0
    19133451319 2009 "February"  0
    19133451319 2009 "March"     0
    19133451319 2009 "April"     0
    19133451319 2009 "May"       0
    19133451319 2009 "June"      0
    19133451319 2009 "July"      0
    19133451319 2009 "August"    0
    19133451319 2009 "September" 0
    19133451389 2009 "October"   0
    19133451389 2009 "November"  0
    19133451389 2009 "December"  0
    19133451389 2010 "January"   0
    19133451389 2010 "February"  0
    19133451389 2010 "March"     0
    19133451389 2010 "April"     0
    end
    
    format ssuid %12.0f
    
    //  CREATE A STATA INTERNAL FORMAT MONTHLY DATE
    gen temp = string(year) + " " + month
    gen mdate = monthly(temp, "YM")
    assert missing(mdate) == missing(month, year)
    format mdate %tmMon_CCYY
    drop temp
    isid ssuid mdate
    
    //  DISTRIBUTION OF POVERTY BY MONTH
    tab mdate poverty, row
    
    //  CALCULATE PERCENT OF HOUSEHOLD MONTHS IN POVERTY EACH MONTH
    by mdate, sort: gen hh_months = _N
    by mdate: egen hh_months_in_poverty = total(poverty)
    gen pct_hh_months_in_poverty = 100*hh_months_in_poverty/hh_months
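
    The three new variables repeat the same value on every row of a given month, so a compact way to read them may be to list one row per month. This is a minimal sketch, assuming the code above has just been run; first_in_month is a hypothetical variable name, not part of the original answer.

    ```stata
    * Sketch only: assumes mdate and the hh_months variables created above exist.
    * first_in_month is a hypothetical name; tag() marks one row per month with 1.
    egen byte first_in_month = tag(mdate)
    sort mdate

    * hh_months* expands to hh_months and hh_months_in_poverty
    list mdate hh_months* pct_hh_months_in_poverty if first_in_month, noobs
    ```

    The full household-month dataset stays in memory, so later commands can still use ssuid.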


    • #3
      I think if the end goal is to plot the data this might give you what you want:

      Code:
      gen month_date = mofd(date(month + "-" + string(year), "MY"))
      format month_date %tmMon-YY
      
      collapse (sum) poverty (count) ssuid, by(month_date)
      gen sh_poverty = 100*(poverty/ssuid)
      Note that this method doesn't preserve the dataset.
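
      If the original data are still needed afterwards, one option is to wrap the steps in preserve and restore. This is a minimal sketch, assuming the example data from #1 are in memory with a numeric ssuid; the twoway line call and its axis title are illustrative additions, not part of the code above.

      ```stata
      * Sketch only: assumes the data from #1 are in memory and ssuid is numeric.
      preserve                      // keep a copy of the household-month data

      gen month_date = mofd(date(month + "-" + string(year), "MY"))
      format month_date %tmMon-YY

      collapse (sum) poverty (count) ssuid, by(month_date)
      gen sh_poverty = 100*(poverty/ssuid)

      * Illustrative plot of the monthly share; the axis title is an assumption
      twoway line sh_poverty month_date, ytitle("Percent in poverty")

      restore                       // bring back the household-month data
      ```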

      Edit: Crossed with Clyde's post.


      • #4
        @Clyde Schechter Many thanks for your support. This works well. In my data, ssuid is a string variable, so I did not use
        Code:
         format ssuid %12.0f
        when I ran your code.
        My understanding was that a Stata loop might run faster than other methods, because the analysis covers 28 different months.

        @Justin Blasongame I tried your code, wrapped in preserve and restore. However, I get an error message when Stata executes the collapse command. This is the message:
        Code:
        type mismatch
        r(109);
        Please note that I made sure that all the variables are in string format here.


        • #5
          You told us in #1 that ssuid is double and now in #4 that it has the format %12.0f. Fine, but then it is not a string variable.

          There is no obvious reason why the collapse command should fail as all the variables concerned are numeric.

          My guess has to be that you are moving back and forth between different versions of your dataset.


          • #6
            Correction: in #4 you said you didn't use the %12.0f format. Even so, holding these variables as strings, as reported in #4, makes no sense.
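
            If ssuid really is a string in the working dataset, the error in #4 would be consistent with collapse being asked to (count) a string variable, since collapse statistics need numeric variables. This is a minimal sketch of one way around it, assuming year and poverty are numeric and month is a string, as in #1; n_poor and n_obs are hypothetical names. If poverty is also held as a string, it would need destring first.

            ```stata
            * Sketch only: assumes ssuid is a string (as reported in #4), year and
            * poverty are numeric, and month is a string (as in #1).
            describe ssuid year month poverty    // check the storage types first

            gen month_date = mofd(date(month + "-" + string(year), "MY"))
            format month_date %tmMon-YY

            * Count a numeric variable rather than the string ssuid.
            * n_poor and n_obs are hypothetical names.
            collapse (sum) n_poor = poverty (count) n_obs = poverty, by(month_date)
            gen sh_poverty = 100 * n_poor / n_obs
            ```

            Counting poverty rather than ssuid also means the denominator only includes household-months where poverty is not missing.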


            • #7
              "My understanding was using the stata loop may work faster than other methods due to 28 different months of analysis."
              On the contrary, in Stata, solutions with -by- are always much faster than using loops. You should only resort to loops for problems that cannot be done with -by-.
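
              To make the comparison concrete, here is a minimal sketch of the same monthly percentage computed both ways. It assumes the mdate variable from #2 exists; pct_by and pct_loop are hypothetical names, and the loop is shown only for contrast.

              ```stata
              * Sketch only: assumes mdate from #2 exists and poverty is a 0/1 numeric variable.

              * by-group version: one statement, no loop
              by mdate, sort: egen pct_by = mean(100*poverty)

              * Loop version of the same calculation, for comparison only
              gen pct_loop = .
              levelsof mdate, local(months)
              foreach m of local months {
                  summarize poverty if mdate == `m', meanonly
                  replace pct_loop = 100*r(mean) if mdate == `m'
              }

              * The two versions should agree up to rounding
              assert reldif(pct_by, pct_loop) < 1e-6
              ```

              The loop makes one pass through the data per month for summarize and another for replace, whereas the -by- version handles every month in a single statement.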
