  • Estimating policy effect with Logit model

    So I am testing a policy which was introduced in a country trying to incentivise people to stay employed at older ages (beyond the retirement age of 65). As such, they introduced a bonus where people who work a year longer beyond 65 receive a higher pension a year later. I want to know if this policy incentivised more people to work beyond 65 after the policy introduction in 2004.

    I am using a logit model with the binary variable showing status as (Employed/Retired) as my dependent variable. I have health and financial characteristics of the individuals as independent variables.

    How can I test the policy's effect? My plan is to include one dummy variable showing whether an individual is over 65 or not, one dummy variable equal to 1 for the years the policy was active, and finally an interaction of these two (let's call it PA), whose coefficient and significance would show the actual policy effect, plus a bunch of control variables.

    And then, to check whether the effect of the policy differs between men and women, should I split the sample, running the regression first on men only and then on women only? Alternatively, should I keep the model as specified above and just add a dummy variable (female = 1, male = 0) and then another one that is the interaction of the gender dummy and the policy dummy?

  • #2
    You need to describe your data in somewhat greater detail. Is this a longitudinal data set where the same people are followed starting at some age before 65 and continuing until after age 65? Or do you have a bunch of people, some of whom are below 65 and others over 65, but they are not the same people? This is key to how you would structure almost every aspect of this analysis.

    • #3
      Yes, it's a longitudinal dataset where I follow the same people over the timespan of 6 years

      • #4
        So you need to -xtset- your data with the unique person identifier. Then you can run an -xtlogit, fe- model with retired status as the dependent variable, your interaction between age over 65 and policy in effect, person-level fixed effects, and year effects. You can also include your various health characteristics, etc. Think about whether you also want cluster robust standard errors. And don't fill your model with too many predictors unless you have enough observations to support them: you need to keep your observations to predictors ratio high: ideally in the 50 to 100 range, and certainly above 25.
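
        A minimal sketch of the setup described above, assuming hypothetical variable names (id, year, retired, over65, policy, health, wealth) that do not come from the thread:

        ```stata
        * All variable names here are hypothetical placeholders.
        * Declare the panel structure: person identifier and time variable.
        xtset id year

        * PA = 1 when the person is over 65 AND the policy is in effect.
        generate byte PA = over65 * policy

        * Conditional (fixed-effects) logit with year effects.
        * Person fixed effects come from the fe option; i.year adds year effects.
        * policy depends only on the year, so its main effect is absorbed by i.year.
        * over65 varies within person and could be added as a further control;
        * that is a modelling choice the post does not specify.
        xtlogit retired i.PA i.year health wealth, fe
        ```

        The coefficient on 1.PA is then the estimate of interest. Only people whose retired status changes at some point during the panel contribute to a fixed-effects logit, so the estimation sample may be smaller than the full data set.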

        As for your question about modification of the policy effect by sex, either approach you mention is viable. If it is plausible that the effects of all the various health and financial characteristics are the same in both sexes, then adding a sex variable and an interaction of that with the previous interaction is the simplest way to go. But if you think those other effects also vary by sex, then doing separate male and female analyses is probably better.
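
        The two options could look roughly like this. It is a sketch only, reusing the hypothetical names id, year, retired, PA, health and wealth, plus an assumed indicator female (1 = female, 0 = male):

        ```stata
        * Option 1: one model, with the policy effect allowed to differ by sex.
        * female is constant within person, so its main effect is absorbed by the
        * person fixed effects; only its interaction with PA is identified.
        generate byte PA_female = PA * female
        xtlogit retired i.PA i.PA_female i.year health wealth, fe

        * Option 2: separate models, so every coefficient may differ by sex.
        xtlogit retired i.PA i.year health wealth if female == 0, fe
        xtlogit retired i.PA i.year health wealth if female == 1, fe
        ```

        In option 1 the coefficient on 1.PA_female is the difference in the policy effect between women and men, on the log-odds scale.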

        If you want more concrete guidance in coding this, you need to post back with example data. Use the -dataex- command to do that. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
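
        For illustration, a typical -dataex- call might look like the following; the variable names and the observation range are placeholders, not taken from the thread:

        ```stata
        * Only needed if -dataex- is not already installed (see above).
        ssc install dataex

        * Read the instructions.
        help dataex

        * Export a small extract of the relevant variables for posting.
        dataex id year retired over65 policy in 1/30
        ```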

        • #5
          Great! Thank you so much for the confirmation on how to deal with the gender difference problem. I will try them out!

          And one more clarification to make sure I understand: this method is not effectively a DID, even though I have an interaction term between 'relevant age group' and 'policy is active'.
          Why?

          1) dataset contains individuals aged 50-80. There is a dummy for whether an individual is between 65-75 (therefore the policy is relevant to him/her), but it's not necessary that the individuals who effectively get a value of 0 for the dummy constitute a control group (they do not need to fulfil the parallel trend assumption?)

          2) In a DID framework, TREAT and TIME also need to appear on their own; here the goal is just to see whether, in addition to the effects of health, work and wellbeing characteristics, individuals in the 65-75 age group were more likely to be employed in the years when a policy promising them a bonus payment for remaining employed at those ages was active.

          Or is it effectively just a regular DID, so that I NEED to fulfil the parallel trend assumption? And if I can't fulfil it because I don't have a clear control group, is it better to use approach 2: restrict the dataset to individuals aged 65-75 ONLY, with employment status (=1 for employed and =0 for retired) as the dependent variable, and then add ONLY a time dummy showing the years the policy was active, along with health, work and wellbeing characteristics, to see if the odds of employment indeed increased after the policy introduction?

          (I was hoping to use DID as just a robustness check, so use approach 2 as the base case model and then a DID as a robustness check.)

          • #6
            And one more clarification to make sure I understand: this method is not effectively a DID, even though I have an interaction term between 'relevant age group' and 'policy is active'.
            Why?
            It is not a classical DID, because that would require that the policy take effect for all people simultaneously. But in your data, the policy only becomes effective for a person when they turn 65, and for different people that happens in different years. The structure of your analysis is, instead, called a generalized DID. https://www.annualreviews.org/doi/pd...-040617-013507 is a good reference about both kinds of DID if you want to read more about this.

            1) dataset contains individuals aged 50-80. There is a dummy for whether an individual is between 65-75 (therefore the policy is relevant to him/her), but it's not necessary that the individuals who effectively get a value of 0 for the dummy constitute a control group (they do not need to fulfill the parallel trend assumption?)
            The parallel trends assumption is just as important here as it would be in any DID analysis. It's just a little more complicated to figure out. In your situation, the "control" group is those people who never reach age 65 in your data set. You do want to show that their employment trends are parallel to the employment trends before age 65 among those who do ultimately reach 65 in your data set. The key thing here is that the "x-axis" for a parallel trends graph would be age rather than year.
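
            One rough way to draw such a graph is sketched below. It assumes hypothetical variables id, age and employed (1 = employed, 0 = retired) and is a descriptive check only, not a formal test:

            ```stata
            * Flag people who reach age 65 at some point in the data (hypothetical names).
            bysort id: egen byte ever65 = max(age >= 65)

            preserve
            * Compare the groups only at ages below 65.
            keep if age < 65
            * Mean employment at each age, separately for the two groups.
            collapse (mean) employed, by(age ever65)
            * Age, not calendar year, goes on the x-axis.
            twoway (line employed age if ever65 == 0) ///
                   (line employed age if ever65 == 1), ///
                   legend(order(1 "Never reach 65" 2 "Reach 65 later")) ///
                   xtitle("Age") ytitle("Share employed")
            restore
            ```

            Roughly parallel lines over the ages where both groups are observed would be consistent with the assumption.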

            2) In a DID framework, TREAT and TIME also need to appear on their own;
            That's true in a classical DID framework. In the generalized DID framework we instead use fixed effects for person and time. You will not have a generalized DID estimator if you omit the person and time fixed effects. They fulfill the same purpose here that TREAT and TIME fulfill in classical DID analysis.
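
            Written out on the logit scale, the contrast between the two frameworks is roughly as follows. The notation is introduced here for illustration and is not from the thread: $y_{it}$ is the outcome for person $i$ in year $t$, $x_{it}$ the health and financial controls, and $PA_{it}$ the product of the over-65 and policy-active indicators.

            ```latex
            % Classical DID: TREAT and TIME enter on their own
            \operatorname{logit}\Pr(y_{it}=1)
              = \beta_0 + \beta_1\,\mathrm{TREAT}_i + \beta_2\,\mathrm{TIME}_t
              + \delta\,(\mathrm{TREAT}_i \times \mathrm{TIME}_t) + x_{it}'\theta

            % Generalized DID: person effects \alpha_i and year effects \gamma_t
            % take the place of TREAT and TIME
            \operatorname{logit}\Pr(y_{it}=1)
              = \alpha_i + \gamma_t + \delta\,PA_{it} + x_{it}'\theta
            ```

            In both cases $\delta$ is the coefficient read as the policy effect.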

            • #7
              Right, I see.

              And if my dataset is a combination of cross-sections in which some individuals recur and others do not (some participate in all 3 cross-sections, others in only one), can I use a simple logit without needing a panel regression setup?

              • #8
                Strictly speaking, if any of the individuals are the same in different time periods, then you should treat it as panel data and use -xtlogit-. As a practical matter, if the number of individuals who recur in different time periods is small, it might make sense to ignore it and just stick with -logit-. But I emphasize that the number of such individuals needs to be very small for this to be OK. And if you are determined to stick with -logit-, you could eliminate the recurring observations on individuals by including them only in the first cross-section in which they appear, or by randomly selecting only one cross-section to retain them in.
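
                Either way of thinning the data to one observation per person could be sketched as follows, assuming hypothetical variables id and year. The two variants are alternatives, so run only one of them:

                ```stata
                * Variant A: keep each person only in the first cross-section
                * in which they appear.
                bysort id (year): keep if _n == 1

                * Variant B: keep each person in one randomly chosen cross-section.
                * The seed value is arbitrary; setting it makes the draw reproducible.
                set seed 12345
                generate double u = runiform()
                bysort id (u): keep if _n == 1
                drop u
                ```

                After either step every person appears exactly once, so the observations passed to -logit- no longer include repeated measurements on the same individual.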
