Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • re-generate time varying variable when collapsing from daily panel data to monthly panel data

    Dear Statalist Members,

    Hi, I'm trying to convert my daily panel data to monthly panel data using collapse function.
    I'm sorry if I'm asking a similar question that have been posted somewhere.
    Before I post this question, I searched for some posts and answers but couldn't find exact one that I wanted.
    If someone knows similar Q&A, I would appreciate if you could provide me with a link.

    First, here's the example data below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(hotel_id time_daily rating num_bedroom) str8 cancel_policy float cancel_numeric
    100 18263 5 2 "flexible" 0
    100 18264 4 2 "flexible" 0
    100 18265 5 2 "flexible" 0
    100 18266 1 2 "flexible" 0
    100 18267 1 2 "flexible" 0
    100 18294 1 2 "flexible" 0
    100 18295 2 2 "flexible" 0
    100 18298 1 2 "flexible" 0
    100 18299 2 2 "flexible" 0
    102 18264 4 3 "flexible" 0
    102 18265 5 3 "flexible" 0
    102 18266 3 3 "flexible" 0
    102 18267 4 3 "flexible" 0
    102 18268 2 3 "flexible" 0
    102 18269 3 3 "flexible" 0
    102 18294 4 3 "flexible" 0
    102 18295 4 3 "flexible" 0
    102 18297 3 3 "flexible" 0
    102 18300 2 3 "flexible" 0
    102 18322 5 3 "flexible" 0
    102 18323 4 3 "strict"   1
    102 18324 5 3 "strict"   1
    end
    format %tdMon_DD,_CCYY time_daily


    I don't know why time_daily and time_monthly shows in that way but it will show up properly (e.g. Jan 01, 2010 or Jan 2010) when you copy and paste the code in Stata.

    I created a monthly time variable based on daily time variable:
    Code:
    gen time_monthly=mofd(time_daily)
    format %tmMon_CCYY time_monthly
    In this example data set, there are 3 kinds of variables.
    1.time-invariant variable (number of bedrooms): since this never changes over the time, I thought I could just use first value of id when collapsing the data to monthly.
    2.time varying continuous variable (rating): since this changes over the time, I would calculate the average value of each variable by id and month.
    3.time varying binary variable (cancellation policy;My question): cancellation policy variable corresponds to this type.
    Let's say there are only two cases, whether it is flexible or strict. In addition, it may change from flexible to strict over time but it will never become flexible again once it becomes from flexible to strict.
    How do I convert this one-time changing variable when collapsing from daily to monthly panel?
    My plan is to have a new monthly cancellation policy variable which is equal to 1 if it has been changed within that month or 0 for the past months.
    For example, for hotel_id 102, cancellation policy had been changed from flexible to strict on Mar 02, 2010. So, it will be 1 after Mar 2010 (Apr 2010, May 2010, ...) but 0 before Feb 2010 (Jan 2010, Dec 2009,...).

    It looks like if I use "if condition", I can implement it but not clear...Can anyone help me out how to do this when using collapse command?

    Code:
    collapse (mean) avg_rating=rating (firstnm) num_bedroom_monthly=num_bedroom , by(hotel_id time_monthly)
    The above code excludes cancellation policy variable.

    The desired output will look like below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(hotel_id time_monthly avg_rating num_bedroom_monthly month_cancel_policy)
    100 600       3.2 2 0
    100 601       1.5 2 0
    102 600       3.5 3 0
    102 601      3.25 3 0
    102 602 4.6666665 3 1
    end
    format %tmMon_CCYY time_monthly
    Thank you in advance.



    Last edited by Ryan Kim; 06 Aug 2019, 19:31. Reason: Added a tag

  • #2
    My plan is to have a new monthly cancellation policy variable which is equal to 1 if it has been changed within that month or 0 for the past months.
    I don't think that's a good idea, because 0 will be used for both always flexible and always strict: it just encodes the absence of change, and doesn't distinguish always flexible from always strict.

    I would instead create a three level variable with values 0 for flexible, 1 for changed from flexible to strict during that month, and 2 for strict. Like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(hotel_id time_daily rating num_bedroom) str8 cancel_policy float cancel_numeric
    100 18263 5 2 "flexible" 0
    100 18264 4 2 "flexible" 0
    100 18265 5 2 "flexible" 0
    100 18266 1 2 "flexible" 0
    100 18267 1 2 "flexible" 0
    100 18294 1 2 "flexible" 0
    100 18295 2 2 "flexible" 0
    100 18298 1 2 "flexible" 0
    100 18299 2 2 "flexible" 0
    102 18264 4 3 "flexible" 0
    102 18265 5 3 "flexible" 0
    102 18266 3 3 "flexible" 0
    102 18267 4 3 "flexible" 0
    102 18268 2 3 "flexible" 0
    102 18269 3 3 "flexible" 0
    102 18294 4 3 "flexible" 0
    102 18295 4 3 "flexible" 0
    102 18297 3 3 "flexible" 0
    102 18300 2 3 "flexible" 0
    102 18322 5 3 "flexible" 0
    102 18323 4 3 "strict"   1
    102 18324 5 3 "strict"   1
    end
    format %tdMon_DD,_CCYY time_daily
    
    gen time_monthly=mofd(time_daily)
    format %tmMon_CCYY time_monthly
    
    by hotel_id time_monthly (time_daily), sort: ///
        assert cancel_policy >= cancel_policy[_n-1]
    
    by hotel_id time_monthly (time_daily): gen byte monthly_policy = 0 /// 
        if cancel_policy[_N] == "flexible"
    by hotel_id time_monthly (time_daily): replace monthly_policy = 1 ///
        if cancel_policy[1] != cancel_policy[_N]
    by hotel_id time_monthly (time_daily): replace monthly_policy = 2 ///
        if cancel_policy[1] == "strict"
        
    collapse (mean) rating (first) num_bedroom cancellation_policy = monthly_policy, ///
        by(hotel_id time_monthly)
    label define cancellation_policy    0   "Always Flexible" ///
                                        1   "Flexible -> Strict" ///
                                        2   "Always Strict"
    label values cancellation_policy cancellation_policy
    Concerning "I don't know why time_daily and time_monthly shows in that way but it will show up properly (e.g. Jan 01, 2010 or Jan 2010) when you copy and paste the code in Stata," this is exactly how -dataex- works. -dataex- output is not meant for human eyes to read. It is meant to be run as Stata code. What the -dataex- output shows you for the date variables is the actual numeric values that Stata uses to represent those dates. The command near the bottom of the -dataex- output that applies a %td... format to that variable then tells Stata that when -list-ing or -browse-ing the data, it should display them as human readable dates like Jan 01, 2010. But that doesn't change the internal representation of those dates as numbers. It is the internal representation as numbers that makes it possible for Stata to properly sort dates in chronological order and calculate intervals between dates. The display formats make them comprehensible for human readers when Stata is asked to show them to human eyes, but play no other role.

    Comment


    • #3
      Dear Clyde,

      I appreciate your reply. I tried your code step by step and it worked perfect.
      My apology that my question wasn't clear enough but I don't need to encode the transition stage (flexible->strict) separately but want it to be included in 'Strict' stage.
      In other words, the cancellation policy is a binary variable which equals to 0 if it is "Always Flexible" or 1 if it is "Flexible->Strict" and "Always Strict".

      In this case, do I just slightly change the last line (from 2 to 1), just like below?
      It seems so but just wanted to confirm it.
      Thank you.

      Code:
      by hotel_id time_monthly (time_daily): gen byte monthly_policy = 0 ///    if cancel_policy[_N] == "flexible"  
      by hotel_id time_monthly (time_daily): replace monthly_policy = 1 ///     if cancel_policy[1] != cancel_policy[_N]  
      by hotel_id time_monthly (time_daily): replace monthly_policy = 1 ///     if cancel_policy[1] == "strict"
      Last edited by Ryan Kim; 07 Aug 2019, 17:39.

      Comment


      • #4
        Yes, and change the label definition accordingly as well.

        Comment


        • #5
          Thank you, Clyde.
          I always learn a lot from here.

          Comment

          Working...
          X