
  • Analyze an increase in minimum wages in Stata

    Hi all, I'm very new to Stata and I'm doing my own research for a class paper. I'm trying to figure out how an increase in the minimum wage influences employment. Can anyone help and point me in the right direction? I am using 4 states as a control group and another 4 as a treatment group. This is my dataset; it's not complete, because I will add other variables.
    I tried the difference-in-differences method, but I am not sure that what I am obtaining is correct. Can you help me out, please?

    I created the dummy variables treat and post and their interaction did, and I used this code. Is everything correct?
    regress Employment treat post did
    Last edited by Zakaria tun; 27 Apr 2021, 07:25.

  • #2
    I've converted the said .xls file using -dataex- for those who would like to consider answering this question:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20 Country float(Year Houly_wage Employment Men_empl Women_empl Median_income_householhousehold)
    "Virginia"      2017 7.25 3134000 1720000 1414000  71293
    "Virginia"      2018 7.25 3276000 1812000 1464000  77151
    "Pennsylvania"  2017 7.25 4448000 2463000 1985000  63173
    "Pennsylvania"  2018 7.25 4465000 2474000 1991000  64524
    "Wisconsin"     2017 7.25 2159000 1233000  926000  59305
    "Wisconsin"     2018 7.25 2101000 1168000  933000  60773
    "Arizona"       2017 7.25 2283000 1312000  970000  56581
    "Arizona"       2018 10.5 2404000 1342000 1062000  59246
    "Washington"    2017 7.25 2538000 1472000 1066000  70979
    "Washington"    2018 11.5 2685000 1571000 1114000  74073
    "Massachusetts" 2017 7.25 2597000 1440000 1157000 77.385
    "Massachusetts" 2018   11 2704000 1473000 1231000 86.345
    end
    Zakaria, please check out the FAQ (https://www.statalist.org/forums/help) on using -dataex- so that next time you can post the code that directly describes the data.


    • #3
      Hi Zakaria,

      As per the FAQ, it's better to post data using dataex rather than an Excel spreadsheet.

      It might also be helpful to post some of the output you receive?

      In terms of the code, I would adjust it to account for the dummy variables (I am assuming did is an interaction term between treat and post?) and use robust standard errors:

      Code:
      reg Employment i.treat i.post i.did, vce(robust)
      Best,
      Rhys
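
For readers following along, here is a minimal sketch of how the three dummies could be built and the regression run. It assumes a subset of the data posted in #2 (two unchanged and two wage-raising states), and `max_wage` is a hypothetical helper name that does not appear in the thread.

```stata
* Sketch only: subset of the data from #2; max_wage is a hypothetical helper
clear
input str14 Country int Year double Houly_wage long Employment
"Virginia"     2017  7.25 3134000
"Virginia"     2018  7.25 3276000
"Pennsylvania" 2017  7.25 4448000
"Pennsylvania" 2018  7.25 4465000
"Arizona"      2017  7.25 2283000
"Arizona"      2018 10.5  2404000
"Washington"   2017  7.25 2538000
"Washington"   2018 11.5  2685000
end

* post = 1 in the year after the wage change
generate byte post = (Year == 2018)

* treat = 1 for states whose hourly wage ever rises above 7.25
bysort Country: egen double max_wage = max(Houly_wage)
generate byte treat = (max_wage > 7.25)

* did = 1 only for treated states in the post period
generate byte did = treat * post

* Same specification as the regression in this post
regress Employment i.treat i.post i.did, vce(robust)

* Equivalent: let factor-variable notation build the interaction
regress Employment i.treat##i.post, vce(robust)
```

In the second form the DiD estimate is the coefficient on `1.treat#1.post`, so a hand-built `did` variable is not strictly needed.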


      • #4
        Thanks Ken Chui, based on this I would consider a two-way fixed effects model to include region and time dummies. Something like:

        Code:
        xtreg employment i.treat i.year i.country, fe cluster(country)
        You will need to xtset the data first, since I suggest using fixed effects (making the most of your panel data). Then cluster your standard errors by state.

        Best,
        Rhys
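
A minimal sketch of the xtset step, assuming a subset of the data from #2. Because Country is a string, it has to be converted to a numeric identifier first; `state_id` is a hypothetical name chosen for illustration.

```stata
* Sketch only: subset of the data from #2; state_id is a hypothetical name
clear
input str14 Country int Year long Employment
"Virginia"   2017 3134000
"Virginia"   2018 3276000
"Arizona"    2017 2283000
"Arizona"    2018 2404000
"Washington" 2017 2538000
"Washington" 2018 2685000
end

* xtset needs a numeric panel identifier, and Country is a string
encode Country, generate(state_id)

* declare state_id as the panel variable and Year as the time variable
xtset state_id Year

* summarize the panel structure: states and years observed per state
xtdescribe
```

Once the panel is declared, xtreg with the fe option can be run, and `vce(cluster state_id)` would request standard errors clustered by state.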


        • #5
          @Rhys Williams, yes, did is the interaction of post and treat.


          • #6
            Thanks a lot, guys. So after I set the data up as a panel, will everything be OK?


            • #7
              I mean, I don't know the full intricacies of your dataset, exactly what you are trying to estimate, the code you are using, what the output is, etc.

              I think playing around with the TWFE model should give you some nice estimates, but I am not sure we can (yet) conclude that everything will be OK!

              You might also want to consider whether to log your dependent variable (if so, take care when interpreting your estimates) and whether to omit certain outliers.

              Best,
              Rhys
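
On interpreting a logged dependent variable: a difference of b in logs corresponds to an exact change of 100*(exp(b)-1) percent, which is close to 100*b only when b is small. A minimal sketch, assuming Arizona's two observations from #2 (`log_diff` is a hypothetical scalar name):

```stata
* Sketch only: one state's two observations from #2
clear
input str14 Country int Year long Employment
"Arizona" 2017 2283000
"Arizona" 2018 2404000
end

* natural log of the dependent variable
generate double log_Employment = ln(Employment)

* change in log employment between 2017 and 2018
scalar log_diff = log_Employment[2] - log_Employment[1]

* approximate and exact percentage changes implied by that log difference
display "log difference x 100: " %6.2f 100*scalar(log_diff)
display "exact % change:       " %6.2f 100*(exp(scalar(log_diff))-1)
```

The same conversion applies to a regression coefficient when the outcome is in logs and the regressor is a dummy.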


              • #8
                Thanks a lot! Tomorrow I will try to post the output.
                Thanks again.
                Last edited by Zakaria tun; 27 Apr 2021, 07:30.


                • #9
                  I should clarify my code in #4: xtreg with the fe option already absorbs the state fixed effects, so we don't need to include "i.country" (it is superfluous, and will be automatically omitted by Stata).


                  • #10
                    Hi, you were so helpful and I want to show my final results. Do you think they are reliable? Can they be used?
                    This is my dataset:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte ID str14 Country int Year double Houly_wage long(Employment Men_empl Women_empl) double Medianincomehousehold float(post treat log_Employment did _diff)
                    1 "Virginia" 2017 7.25 3134000 1720000 1414000 71293 0 0 14.95782 0 0
                    1 "Virginia" 2018 7.25 3276000 1812000 1464000 77151 1 0 15.002133 0 0
                    2 "Pennsylvania" 2017 7.25 4448000 2463000 1985000 63173 0 0 15.307965 0 0
                    2 "Pennsylvania" 2018 7.25 4465000 2474000 1991000 64524 1 0 15.31178 0 0
                    3 "North Carolina" 2017 7.25 3574000 1919000 1655000 49547 0 0 15.089196 0 0
                    3 "North Carolina" 2018 7.25 3679000 1979000 1700000 53369 1 0 15.118152 0 0
                    4 "Wisconsin" 2017 7.25 2159000 1233000 926000 59305 0 0 14.585155 0 0
                    4 "Wisconsin" 2018 7.25 2101000 1168000 933000 60773 1 0 14.557924 0 0
                    5 "New Jersey" 2017 7.25 3241000 1755000 1486000 72997 0 1 14.991392 0 0
                    5 "New Jersey" 2018 8.6 3321000 1834000 1487000 74176 1 1 15.015777 1 1
                    6 "Arizona" 2017 7.25 2283000 1312000 970000 56581 0 1 14.641 0 0
                    6 "Arizona" 2018 10.5 2404000 1342000 1062000 59246 1 1 14.692644 1 1
                    7 "Washington" 2017 7.25 2538000 1472000 1066000 70979 0 1 14.746887 0 0
                    7 "Washington" 2018 11.5 2685000 1571000 1114000 74073 1 1 14.80319 1 1
                    8 "Massachusetts" 2017 7.25 2597000 1440000 1157000 77.385 0 1 14.769868 0 0
                    8 "Massachusetts" 2018 11 2704000 1473000 1231000 86.345 1 1 14.810243 1 1
                    end


                    This is my output using the diff-in-diff method:

                    . reg log_Employment i.treat i.post i.did, vce(robust)

                    Linear regression                               Number of obs     =         16
                                                                    F(3, 12)          =       0.81
                                                                    Prob > F          =     0.5143
                                                                    R-squared         =     0.1634
                                                                    Root MSE          =     .24184

                    ------------------------------------------------------------------------------
                                 |               Robust
                    log_Employ~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         1.treat |  -.1977475    .168516    -1.17   0.263    -.5649124    .1694175
                          1.post |   .0124629   .2202943     0.06   0.956    -.4675171    .4924428
                           1.did |   .0307138   .2418396     0.13   0.901    -.4962094     .557637
                           _cons |   14.98503   .1515964    98.85   0.000     14.65473    15.31533
                    ------------------------------------------------------------------------------

                    And this one uses the panel structure and xtreg, i.e. the two-way fixed effects model.

                    As you can see, the treatment variable was omitted because of collinearity:

                    xtreg log_Employment i.treat i.Year, fe cluster( ID )
                    note: 1.treat omitted because of collinearity

                    Fixed-effects (within) regression               Number of obs     =         16
                    Group variable: ID                              Number of groups  =          8

                    R-sq:                                           Obs per group:
                         within  = 0.5328                                         min =          2
                         between =      .                                         avg =        2.0
                         overall = 0.0037                                         max =          2

                                                                    F(1,7)            =       7.45
                    corr(u_i, Xb)  = 0.0000                         Prob > F          =     0.0293

                                                         (Std. Err. adjusted for 8 clusters in ID)
                    ------------------------------------------------------------------------------
                                 |               Robust
                    log_Employ~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         1.treat |          0  (omitted)
                                 |
                            Year |
                           2018  |   .0278198   .0101912     2.73   0.029     .0037213    .0519182
                                 |
                           _cons |   14.88616   .0050956  2921.36   0.000     14.87411    14.89821
                    -------------+----------------------------------------------------------------
                         sigma_u |  .24394541
                         sigma_e |  .01969136
                             rho |   .9935264   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------

                    Sorry to bother you, but can you help me out and tell me whether they are statistically significant? I think the second model fits my data better: it has a higher R-squared and its p-value is below 0.05. Maybe it's because I don't have a lot of data, or maybe the results are just not statistically significant.
                    Last edited by Zakaria tun; 28 Apr 2021, 03:01.


                    • #11
                      Hi Zakaria,

                      As you can see from the first output, none of your coefficients is statistically significant (to be significant, the p-value would need to be below 0.10 or 0.05, depending on your convention). The coefficient of interest is the one on 1.did.

                      Similarly, in the second output, we care about the coefficient on treat, so you need to investigate why it is being omitted. Your output isn't formatted very clearly, but it looks like you have two years for each state and I can't see any variation in treatment within a state (I could just be missing it...). For DiD/TWFE to work, you need some states that change treatment status (i.e. in year 1 they don't have treatment and in year 2 they do). Is this the case?

                      Also, just to be clear: finding insignificant coefficients might in itself tell you something. It could be that the policy has no effect.
                      However, it could also be that your data isn't good enough to detect an effect. It is sometimes difficult to decide which of the two is truly the case, although here, with only 16 observations, your data doesn't look great.

                      Best,
                      Rhys
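
One way to check whether a regressor changes within a state, and so survives the fixed effects, is to look at its within-state standard deviation. A minimal sketch, assuming one control and one treated state taken from the dataset in #10 (`sd_treat` and `sd_did` are hypothetical names):

```stata
* Sketch only: two states taken from the dataset in #10
clear
input byte ID int Year byte(post treat did)
1 2017 0 0 0
1 2018 1 0 0
5 2017 0 1 0
5 2018 1 1 1
end
xtset ID Year

* xtsum splits the variation into between-state and within-state parts
xtsum treat did

* within-state standard deviation: 0 in every state means the state
* fixed effects absorb the variable and xtreg, fe will omit it
bysort ID: egen double sd_treat = sd(treat)
bysort ID: egen double sd_did   = sd(did)
summarize sd_treat sd_did
```

In this extract treat never changes within a state while did does, which is consistent with treat being the variable that was omitted.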


                      • #12


                        Yes, in my case it is the column called post. Maybe I have to use that dummy variable in my regression. Sorry for the image, but I am just learning how to use dataex.
                        Last edited by Zakaria tun; 28 Apr 2021, 03:29.


                        • #13
                          If I am not misunderstanding your variables, the first 4 states are never treated (either pre- or post-), and the other 4 states are always treated (both pre- and post-). It doesn't look like you have any states that are untreated in the pre-period and treated in the post-period.
                          Is this true?

                          If so, then you have no variation and I don't see how you can estimate DiD. DiD looks at the difference between states which see different treatment assignments over time.


                          • #14
                            I took the first 4 states as the control group because they weren't treated: as you can see, their hourly wage didn't change from 2017 to 2018. In the final 4 states there was a treatment: the hourly wage was increased, by a different amount in each state, from 2017 to 2018.
                            Last edited by Zakaria tun; 28 Apr 2021, 04:19.


                            • #15
                              Sorry, I see. In that case I would redefine your treatment variable as equal to 1 when the state is treated (i.e. in the second period for those 4 states). Your did variable will still be the same (=post*treat) but when you run the TWFE model, you won't have the omitted variable:

                              Code:
                              xtreg log_Employment i.treat i.Year, fe cluster(ID)
                              Best,
                              Rhys
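
A minimal sketch of the redefinition described above, assuming the ID, Year, Employment, post and treat columns from the dataset in #10. log_Employment is recomputed here with ln(), and `treat_group` is a hypothetical name used to keep the original group dummy.

```stata
* Sketch only: columns taken from the dataset in #10
clear
input byte ID int Year long Employment byte(post treat)
1 2017 3134000 0 0
1 2018 3276000 1 0
2 2017 4448000 0 0
2 2018 4465000 1 0
3 2017 3574000 0 0
3 2018 3679000 1 0
4 2017 2159000 0 0
4 2018 2101000 1 0
5 2017 3241000 0 1
5 2018 3321000 1 1
6 2017 2283000 0 1
6 2018 2404000 1 1
7 2017 2538000 0 1
7 2018 2685000 1 1
8 2017 2597000 0 1
8 2018 2704000 1 1
end
generate double log_Employment = ln(Employment)

* keep the original group dummy under a new (hypothetical) name
generate byte treat_group = treat

* treat now equals 1 only in the period when the state is actually treated
replace treat = treat_group * post

* two-way fixed effects: state effects via fe, year effects via i.Year
xtset ID Year
xtreg log_Employment i.treat i.Year, fe vce(cluster ID)
```

In a balanced two-period panel like this one, the coefficient on 1.treat should reproduce the DiD estimate on 1.did from the regress output in #10 (about .0307), although the clustered standard errors will differ from the robust ones.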
