  • Generating status variable for firm

    Dear Statalist users,

    I have panel data for firms involved in international trade. My variable t1 is a dummy (1 if the firm participates in global markets, 0 otherwise). I want to generate the following:

    a) starter firms - firms that do not trade in periods t-1 and t-2 but begin trading at t and continue to do so until the end of the sample period
    b) stop firms - firms that were trading but stopped during the sample period and did not trade again afterwards
    c) switchers - firms that either started trading for the first time and then stopped, or vice versa (in the data example, firm 365 starts trading in 2002 and then stops in 2004)

    I am not able to construct these variables. It would be really helpful if someone could help me with it.

    clear
    input double id int year float t1
    365 2001 0
    365 2002 1
    365 2003 1
    365 2004 0
    365 2005 1
    365 2006 1
    365 2007 0
    365 2008 0
    365 2009 0
    365 2010 0
    365 2011 0
    365 2012 0
    365 2013 0
    381 2001 1
    381 2002 1
    381 2003 1
    381 2004 1
    381 2005 1
    381 2006 1
    381 2007 1
    381 2008 1
    381 2009 1
    381 2010 1
    381 2011 .
    381 2012 .
    381 2013 .
    569 2001 0
    569 2002 0
    569 2003 0
    569 2004 0
    569 2005 0
    569 2006 0
    569 2007 0
    569 2008 0
    569 2009 0
    569 2010 0
    569 2011 .
    569 2012 .
    569 2013 .
    595 2001 .
    595 2002 .
    595 2003 .
    595 2004 .
    595 2005 .
    595 2006 0
    595 2007 0
    595 2008 0
    595 2009 0
    595 2010 0
    595 2011 0
    595 2012 0
    595 2013 0
    600 2001 .
    600 2002 .
    600 2003 .
    600 2004 .
    600 2005 .
    600 2006 .
    600 2007 .
    600 2008 .
    600 2009 0
    600 2010 0
    600 2011 0
    600 2012 0
    600 2013 .
    783 2001 0
    783 2002 0
    783 2003 0
    783 2004 0
    783 2005 0
    783 2006 0
    783 2007 0
    783 2008 0
    783 2009 0
    783 2010 0
    783 2011 0
    783 2012 0
    783 2013 0
    870 2010 0
    934 2001 .
    934 2002 .
    934 2003 .
    934 2004 0
    934 2005 0
    934 2006 0
    934 2007 0
    934 2008 0
    934 2009 0
    934 2010 .
    934 2011 .
    934 2012 .
    934 2013 .
    1120 2001 1
    1120 2002 1
    1120 2003 1
    1120 2004 1
    1120 2005 1
    1120 2006 1
    1120 2007 1
    1120 2008 1
    end


  • #2
    Do you want a result with one line per firm, and then dummies to tell which is a switcher, which is a starter...?

    What if there are multiple switches? (I guess it's a switcher anyway)

    How do you consider missings? We could pretend they imply as few switches as possible: "0 . 0" means "0 0 0", "1 . 1" means "1 1 1", "0 0 . ." means "0 0 0 0", etc. Is that right? Or we could state we don't know the answer, hence missing result as well.



    • #3
      Yes, a dummy to tell which firm is a starter and which is a stopper.

      Yes, multiple switches would be a switcher firm.

      For missing values, we cannot know whether a switch took place at the missing point, so we can't provide an answer in those cases.



      • #4
        Here is a possibility, tell me if it fits your needs. I still have a doubt because you consider t-1 and t-2, while I consider the whole time span.

        The idea: this is really a pattern matching problem, so convert this to something on which we can use patterns, i.e. strings.
        I also consider that if a firm trades for consecutive years, the repeats add no information about the switching pattern, so I first remove consecutive duplicates. The exception is missings, which I keep because each one marks a point where we lack information (i.e., we can't conclude in some cases). I translate missings to "2" to simplify the patterns.
        That's the first part.

        That leaves a string of 0s, 1s and 2s, with no consecutive 0s or 1s but possibly consecutive 2s. Here are the rules:

        * If there is a 010 or 101 pattern, possibly with something in between or around it (even missings), then it's necessarily a switcher and we are done.
        * Otherwise the series is either constant, or a run of 0s followed by a run of 1s, or a run of 1s followed by a run of 0s (possibly with 2s interspersed, and still with no consecutive 1s or 0s). Three cases remain:
        - if there are at least two missings, or a missing at the beginning or at the end, or a 020 or 121 pattern, we can't conclude.
        - otherwise, if there is a missing, there is exactly one "2" and it necessarily sits at the boundary between the 0s and the 1s, so the string is either 021 or 120, with nothing else in between, and we can safely conclude it's either a starter or a stop (whatever the missing may be).
        - otherwise there is no missing, and the string is either a constant series of 0s or 1s (that is, a single character, as we removed dupes), or it's 01 or 10. In both cases we can conclude.

        Simplification: the last two cases yield only six possible strings: 0, 1, 01, 10, 021, 120. So, without even checking for a missing, the string length tells us whether the series is constant, and the first character of the string is enough to finish.
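
        To see that shortcut at work, here is a minimal sketch that applies it to the six strings listed above. The small dataset is typed in by hand purely for illustration, and the resulting codes follow the value labels used in the code below (0 = don't trade, 1 = always trade, 2 = starter, 3 = stop).

        ```stata
        * Illustration only: the six strings that can reach the final step
        clear
        input str3 s
        "0"
        "1"
        "01"
        "10"
        "021"
        "120"
        end
        * The first character gives 0 or 1; add 2 when the string has more than one character
        gen cl = real(substr(s,1,1)) + 2*(strlen(s)>1)
        list, noobs
        ```

        Both 01 and 021 come out as 2, and both 10 and 120 come out as 3, which is why the single missing at the boundary does not need a rule of its own.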


        Code:
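        * Recode missing values as 2 so that every year maps to one character
        replace t1=2 if mi(t1)
        * Within each firm, drop consecutive repeats of 0 or 1 (missings are kept)
        bysort id (year): drop if _n>1 & t1!=2 & t1[_n-1]==t1
        * Build one string per firm by concatenating the remaining yearly values
        gen t=string(t1)
        drop t1
        reshape wide t, i(id) j(year)
        egen s=concat(t*)
        drop t*

        * q1: a 0-1-0 or 1-0-1 pattern, i.e. certainly a switcher
        gen q1=regexm(s,"1.*0.*1|0.*1.*0")
        * q2: missings placed so that the status cannot be determined
        gen q2=regexm(s,"^2|2$|2.*2|121|020")
        * Otherwise: first character (0 or 1), plus 2 if the series is not constant
        gen cl=cond(q1,4,cond(q2,5,real(substr(s,1,1))+2*(strlen(s)>1)))
        lab def cl 0 "Don't trade" 1 "Always trade" 2 "Starter" ///
        3 "Stop" 4 "Switcher" 5 "Unsure"
        lab val cl cl
        drop q1 q2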
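
        The code above replaces the panel with one row per firm. If you would rather keep the firm-year rows and attach the dummies asked for in #3, the same string can be built cumulatively within each firm instead of using drop and reshape. This is only a sketch of that variant, not tested on the full dataset. The names v, pat, status, starter, stopper and switcher are made up for the illustration, and it assumes the original t1 (with . for missing, before any recoding) is still in memory.

        ```stata
        * Assumes id, year and the original t1 (with . for missing) are in memory
        gen byte v = cond(mi(t1), 2, t1)
        * Build the pattern cumulatively: append a character when it is a
        * missing (2) or differs from the previous year's value
        bysort id (year): gen pat = string(v) if _n == 1
        by id: replace pat = pat[_n-1] + cond(v == 2 | v != v[_n-1], string(v), "") if _n > 1
        * Copy the complete pattern (last row of each firm) to all of its rows
        by id: replace pat = pat[_N]

        * Same rules as above: 4 = switcher, 5 = unsure,
        * otherwise first character (0/1) plus 2 if the series is not constant
        gen byte status = cond(regexm(pat, "1.*0.*1|0.*1.*0"), 4, ///
            cond(regexm(pat, "^2|2$|2.*2|121|020"), 5, ///
            real(substr(pat,1,1)) + 2*(strlen(pat) > 1)))

        * Dummies are left missing for firms whose status is unsure
        gen byte starter  = (status == 2) if status != 5
        gen byte stopper  = (status == 3) if status != 5
        gen byte switcher = (status == 4) if status != 5
        ```

        For firm 365 in the example data this gives the pattern 01010 and therefore status 4 (switcher). Leaving the dummies missing for unsure firms follows the answer in #3 that no conclusion can be drawn there; set them to 0 instead if that suits the analysis better.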
