
  • expand data, count consecutive values and create a time variable

    Hi there,

    Here is an example for a dataset I am working with, which contains firm id, sector code, year and month only. Here I consider only 2 years and 3 months.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(firm_id1 sector) int year byte month
    1 1 2010 1
    1 1 2010 2
    1 1 2010 3
    1 1 2011 1
    1 1 2011 2
    1 1 2011 3
    2 1 2010 1
    2 1 2010 2
    2 1 2010 3
    2 1 2011 1
    2 1 2011 2
    2 1 2011 3
    3 1 2010 1
    3 1 2010 2
    3 1 2010 3
    end

    I want to:
    1. expand this dataset
    2. Create two variables treat1 and treat2 such that
    treat1 = 1 if the firm took an action in a given month (that is, it is observed in that month), 0 otherwise
    treat2 = 1 if the firm took action in at least two consecutive months within a year, 0 otherwise

    Here is what I am looking for regarding the variables created


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(firm_id1 sector) int year byte(month treat treat1 treat2)
    1 1 2010 1 1 1 1
    1 1 2010 2 1 1 1
    1 1 2010 3 1 1 1
    1 1 2011 1 1 1 1
    1 1 2011 2 1 1 1
    1 1 2011 3 1 1 1
    2 1 2010 1 1 1 0
    2 1 2010 2 1 1 0
    2 1 2010 3 . 0 0
    2 1 2011 1 1 1 0
    2 1 2011 2 1 1 0
    2 1 2011 3 . 0 0
    3 1 2010 1 1 1 0
    3 1 2010 2 . 0 0
    3 1 2010 3 . 0 0
    end
    3. Next, I would like to create a variable time that runs from January 2010 to March 2011 such that
    March 2010 = 0
    Feb 2010 = -1
    Jan 2010 = -2
    Jan 2011 = 1
    Feb 2011 = 2
    March 2011 = 3


    Does anyone know how to handle these tasks in Stata?

  • #2
    Code:
    isid firm_id1 sector year month
    
    
    //  CHANGE 1/. CODING TO 1/0 FOR TREAT, TO SIMPLIFY CODE
    replace treat = 0 if missing(treat)
    rename treat treat1
    
    by firm_id1 year (month), sort: gen run = sum(treat1 != treat1[_n-1])
    by firm_id1 year run (month), sort: gen byte treat2 = (_N >= 2 & treat1[1])
    by firm_id1 year (treat2), sort: replace treat2 = treat2[_N]
    The above code creates the first two new variables according to your words. But it does not agree with the results you show. Please recheck your shown results: I believe they are incorrect, because it appears that firm 2 in both 2010 and 2011 does take action consecutively in months 1 and 2.

    I do not understand your final request. It seems you want to track the progress of time month by month, with zero corresponding to March 2010. But then why do the months in 2011 restart at 1 in January? It doesn't make any sense to me. Is there some reason for, in effect, identifying January 2011 with April 2010, etc.?



    • #3
      Hi Clyde,

      Thank you for this reply. You're right. The result I showed refers to the case when firms take action or not in three consecutive months instead of two.
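If the intended rule is three consecutive months rather than two, one possible adjustment to the code in #2 is to raise the run-length threshold from 2 to 3. The sketch below assumes treat1 is already coded 0/1 as in the table in #1; the variable name treat3 is made up for illustration and is not part of the original code.

```stata
* Example data: treat1 taken from the table in #1 (0/1 coding)
clear
input byte(firm_id1 sector) int year byte(month treat1)
1 1 2010 1 1
1 1 2010 2 1
1 1 2010 3 1
1 1 2011 1 1
1 1 2011 2 1
1 1 2011 3 1
2 1 2010 1 1
2 1 2010 2 1
2 1 2010 3 0
2 1 2011 1 1
2 1 2011 2 1
2 1 2011 3 0
3 1 2010 1 1
3 1 2010 2 0
3 1 2010 3 0
end

* Number the runs of identical treat1 values within each firm-year
by firm_id1 year (month), sort: gen run = sum(treat1 != treat1[_n-1])

* Flag runs of at least 3 months in which action was taken
* (treat3 is a hypothetical name; the threshold 3 is the only change from #2)
by firm_id1 year run (month), sort: gen byte treat3 = (_N >= 3 & treat1[1])

* Spread the flag to every month of the firm-year
by firm_id1 year (treat3), sort: replace treat3 = treat3[_N]

sort firm_id1 year month
list firm_id1 year month treat1 treat3, sepby(firm_id1 year)
```

With this threshold only firm 1 is flagged, which agrees with the treat2 column shown in #1.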

      I forgot to specify that the variable "treat" is not in the data. I created it to show the type of result I want to get. Sorry for the confusion. So, what treat shows is whether a given firm takes an action or not in a given month (meaning the firm is visible in the dataset or not).

      Regarding the last question (#3), I am trying to track the progress of time after the implementation of a communication policy that happens in March 2010 (t = 1,2, ...) and also before its implementation (t=-1,-2, ...). This is to be able to run a dynamic Diff-in-Diff afterwards.
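The two readings of this time variable can be compared side by side. The sketch below is only an illustration under stated assumptions: it uses one firm from the example data, and the names rel_cal, period and rel_obs are hypothetical. rel_cal counts calendar months from March 2010, while rel_obs counts positions among the months that actually appear in the data, which is what the numbering requested in #1 amounts to.

```stata
clear
input byte(firm_id1 sector) int year byte month
1 1 2010 1
1 1 2010 2
1 1 2010 3
1 1 2011 1
1 1 2011 2
1 1 2011 3
end

* Stata monthly date built from year and month
gen mdate = ym(year, month)
format mdate %tm

* Reading 1: calendar months elapsed since March 2010
gen rel_cal = mdate - tm(2010m3)

* Reading 2: position among the distinct months present in the data,
* shifted so that March 2010 is 0
egen period = group(mdate)
summarize period if mdate == tm(2010m3), meanonly
gen rel_obs = period - r(min)

list year month mdate rel_cal rel_obs, sepby(year)
```

Under the calendar reading January 2011 is 10 months after March 2010; under the observed-months reading it is 1. Which one is appropriate depends on whether the months missing from the example are truly absent from the study period.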



      • #4
        I forgot to specify that the variable "treat" is not in the data. I created it to show the type of result I want to get. Sorry for the confusion. So, what treat shows is whether a given firm takes an action or not in a given month (meaning the firm is visible in the dataset or not).
        That data arrangement is unsafe. It is impossible to discern whether the next month after the final month the firm is visible is a month in which the firm did not take action or is a month in which the end of the firm's observation period was reached. Nevertheless, since that's what you have, I will give you code to use with it. The code will only produce correct results if the absence of an observation for any month after the last month a firm_id is visible is due to reaching the end of the firm's period of observation. (If that isn't true, there really isn't any way to know what is going on with respect to taking action.)

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(firm_id1 sector) int year byte month
        1 1 2010 1
        1 1 2010 2
        1 1 2010 3
        1 1 2011 1
        1 1 2011 2
        1 1 2011 3
        2 1 2010 1
        2 1 2010 2
        2 1 2011 1
        2 1 2011 2
        3 1 2010 1
        end
        
        assert !missing(year, month)
        gen mdate = ym(year, month)
        format mdate %tm
        assert !missing(mdate)
        
        //  FILL GAPS IN THE DATA
        xtset firm_id1 mdate
        tsfill
        gen byte treat1 = !missing(year, month)
        
        
        //  FILL IN MISSING VALUES
        replace year = year(dofm(mdate)) if missing(year)
        replace month = month(dofm(mdate)) if missing(month)
        by firm_id1 (mdate): replace sector = sector[_n-1] if missing(sector)
        
        //  CALCULATE TREAT2
        by firm_id1 year (month), sort: gen run = sum(treat1 != treat1[_n-1])
        by firm_id1 year run (month), sort: gen byte treat2 = (_N >= 2 & treat1[1])
        by firm_id1 year (treat2), sort: replace treat2 = treat2[_N]
        
        //  CREATE SEQUENTIAL TIME VARIABLE ZEROED AT MARCH 2010
        gen seq = mdate - tm(2010m3)
        In the future, bear in mind that the main point of providing example data is to give those who want to help you a data set that they can use to develop and test code. If what you show is different from what you have, there is a good chance that you will get unusable or incorrect code in return. So, don't post an example with variables that don't actually exist. Or if you do, make it clear that those variables are things you want to get created, not things you are starting out with. Similarly, if you do show an example of what you want, make stringent efforts to make sure it is actually correct. Not doing these things just wastes other people's time, and, ultimately, wastes yours too, as you end up having to post back with explanations and requests for changes and then wait for somebody to reply to those.



        • #5
          Thank you very much, Clyde. I will bear this advice in mind for future posts. I added the option full to tsfill to account for all 12 months, and the code gives me what I was looking for.
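For readers following along, here is a rough sketch of what that change might look like when applied to the gap-filling step from #4, using the same example data. With the full option, tsfill pads every firm to the overall first and last month in the data, so months after a firm's last visible month are also created and coded treat1 = 0. The sector fill differs from #4: it assumes sector never changes within a firm, which is an assumption made here and not something stated in the thread.

```stata
clear
input byte(firm_id1 sector) int year byte month
1 1 2010 1
1 1 2010 2
1 1 2010 3
1 1 2011 1
1 1 2011 2
1 1 2011 3
2 1 2010 1
2 1 2010 2
2 1 2011 1
2 1 2011 2
3 1 2010 1
end

gen mdate = ym(year, month)
format mdate %tm

* Pad every firm to the full range of months found in the data
xtset firm_id1 mdate
tsfill, full

* Rows created by tsfill have missing year and month: no action taken
gen byte treat1 = !missing(year, month)

* Rebuild year and month from the monthly date
replace year = year(dofm(mdate)) if missing(year)
replace month = month(dofm(mdate)) if missing(month)

* ASSUMPTION: sector is constant within firm, so copy the one known value
* (missing values sort last, so sector[1] is nonmissing)
by firm_id1 (sector), sort: replace sector = sector[1]

sort firm_id1 mdate
list firm_id1 mdate treat1, sepby(firm_id1)
```

In this small example the padding runs from 2010m1 to 2011m3, so months 4 to 12 of 2010 are created as well; whether those belong in the analysis depends on the real observation window.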
