
  • Fixed effects probit model

    Hi everyone:

    I would like to ask you the following.
    I have a dataset that is the outcome of a field experiment. It is formed of cross-sectional data: two interviews took place, in January 2017 and August 2017, with the same individuals participating in both. The January 2017 survey included socio-demographic variables, but apart from that both surveys had more or less the same questions regarding adoption habits.

    During these months, an intervention took place, and the outcome variable (Y) is adoption and equals 1 if the individual adopted by August 2017 and 0 otherwise.

    Five regions(R) were part of this intervention, adding up to 32 districts(D). [The treatment was randomized at the district level.] The names of these regions and districts are already in byte format.

    Since it is cross-sectional data, I would like to have "region" as the fixed effects level. Also, I need to cluster the error terms at the district level, since individuals are likely to be more similar within a district than between districts.

    The regression is as follows: Y_idr = α + β0 T_idr + γ1 X_idr + γ2 W_dr + R_r + e_idr

    Y_idr is the dependent variable (1 or 0), T_idr is the treatment variable (0, 1 or 2), X_idr is a vector of individual-level variables, W_dr controls for commune-level variations and R_r is the region strata fixed effects.

    Here is an example of my dataset (the data is confidential, so I cannot provide more details; I hope these variables are enough to explain my question):

    Region  District  Adoption (Y)  Treatment (T)  Sex  Education  Children  Risk aversion (Jan 17)  Risk aversion (Aug 17)  Income (Jan 17)  Income (Aug 17)
    1       1         0             0              1    4          3         1                       1                       1500             1500
    1       3         1             1              0    1          0         0                       1                       1500             1750
    1       4         1             1              0    2          1         0                       0                       2000             1500
    2       5         0             1              0    3          2         1                       0                       1000             1200
    2       6         1             0              1    0          1         1                       0                       700              750
    2       8         1             1              0    1          1         0                       1                       1000             1000
    3       10        1             2              1    6          2         1                       1                       1500             3000
    3       11        0             0              0    0          0         0                       0                       1000             0
    3       12        1             2              1    2          2         1                       1                       1500             1500
    4       14        0             1              1    3          3         0                       0                       0                1500
    4       15        0             1              0    4          4         1                       1                       4000             4000
    4       16        1             0              1    5          1         1                       1                       500              500
    5       20        0             2              0    6          1         0                       1                       1000             1200
    5       22        1             2              0    2          2         1                       0                       1000             1000

    My questions are basically:
    (1) Which command should I use in Stata 16 to run the regression above, accounting for Y being 1 or 0 (probit) and also including region fixed effects?
    (2) How can I create and include the vector of individual-level variables (X) (e.g. sex, education, children), and how can I add commune-level variations (W)?

    I know it is a very long post, but I would be very grateful if someone helps me. I have been struggling a while but I am stuck.

    Thanks in advance,
    Maria


  • #2
    Maria:

    You actually have two periods of panel data, but you've put it in wide rather than long format. It is easy to switch from wide to long, and there are a number of threads here about it. That is not my comparative advantage.
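    A minimal sketch of the wide-to-long switch, assuming the wave-specific variables share a stub and differ only in a suffix. All variable names below (id, adopt, riskJan, incomeAug and so on) are hypothetical stand-ins, not the names in Maria's file:

```stata
* Toy wide-format data; every variable name here is hypothetical
clear
input id region district adopt treat riskJan riskAug incomeJan incomeAug
1 1 1 0 0 1 1 1500 1500
2 1 3 1 1 0 1 1500 1750
3 1 4 1 1 0 0 2000 1500
end

* Stack the wave-specific columns: one row per individual and wave.
* The string option lets j() hold the suffixes "Jan" and "Aug".
reshape long risk income, i(id) j(wave) string

* Numeric period indicator (0 = January, 1 = August) for later use
gen byte post = (wave == "Aug")
list id wave post risk income, sepby(id)
```

    Variables without a wave suffix (region, district, treat) are simply repeated on both rows of each individual.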

    I assume that the treatment occurred between Jan and Aug -- hopefully. I would actually recommend starting off by ignoring that Y is binary and using a two-period fixed effects analysis. Because this is the same as differencing, you can actually do the analysis without making the data set long. Simply define

    gen cincome = Augincome - Janincome
    gen crisk = Augrisk - Janrisk
    reg cincome i.T crisk, vce(robust)

    If you want, you can include the time constant variables, but these would drop out of the first differencing.

    The differencing removes fixed effects at the district level, so also at the region level.

    You should not use region dummies (fixed effects) with probit when you only have a few observations per region. This creates the incidental parameters problem. I could make some suggestions for probit, but you seem to be a beginner. Thus, I would start with differencing followed by OLS.

    JW



    • #3
      Dear Jeff Wooldridge, thanks a lot for your reply.

      I will try to switch from wide to long.
      Yes, you are right. The treatment took place between Jan and Aug 17.

      My dataset is rather large (2000 individuals and 620 variables). On average, around 200 observations per region. Should I apply the probit instead of OLS then?
      Also, can I include variables that only exist for the second survey (not possible for differencing)?
      And finally, for clustering errors at the district level, is the code < vce(cluster district) >?


      Thanks in advance,
      Maria



      • #4
        Maria:
        I can reply to your last question only:
        Yes, your code for clustering standard errors on district is correct.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Maria: I discussed the problems with the phrase "fixed effects" when you are simply adding dummies not at the unit of observation in a recent thread:

          https://www.statalist.org/forums/for...tal-parameters

          To summarize, adding regional dummies when you have 200 observations per region is not a problem. Make the data long format, run pooled probit with the region dummies, and cluster at the district level -- provided the treatment was assigned at that level, as appears to be the case.
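          As a rough sketch of what that could look like, the block below simulates stand-in data and runs a pooled probit with region dummies and district-clustered standard errors. The variable names, sample sizes and coefficients are all assumptions made for illustration. The individual-level controls (the X vector) are simply listed as regressors; no special vector needs to be created.

```stata
* Simulated stand-in data; names, sizes and coefficients are all hypothetical
clear
set seed 12345
set obs 32
gen district = _n
gen region = 1 + mod(district, 5)
gen treat = mod(district, 3)
expand 60
gen age = runiformint(18, 70)
gen byte sex = runiform() < 0.5
gen byte adopt = (-0.5 + 0.3*treat + 0.01*(age - 40) + 0.2*sex + rnormal()) > 0

* Region dummies enter through i.region; standard errors are clustered by district
probit adopt i.treat age i.sex i.region, vce(cluster district)

* Average marginal effects of each treatment arm on the adoption probability
margins, dydx(treat)
```

          District-level controls (W) can be added to the same variable list, as long as they are not perfectly collinear with the region dummies.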

          Hope this helps.

          JW



          • #6
            Dear Jeff Wooldridge, I would like to kindly ask you for some help. I have made the data long format, and I also created new variables for differencing for the original dataset, to compare results.
            However, when I run a probit with the long format dataset, some coefficients are 0 and the standard errors empty or omitted.

            My long data looks similar to (see picture next post):


            producerID  month  region_code   commune_code  age  sex  farmsize  agricplotsMay  agricplotsDec  agricplots  adoptionMay  adoptionDec  adoption  incomeDec
            3131        Dec    Centre-Ouest  Bakata        30   1    7.5       3              3              3           5            3            3         15000
            3131        May    Centre-Ouest  Bakata        30   1    7.5       3              3              3           5            3            5         15000
            3132        Dec    Centre-Ouest  Bakata        45   0    4         1              1              1           4            1            1         24000
            3132        May    Centre-Ouest  Bakata        45   0    4         1              1              1           4            1            4         24000
            3133        Dec    Centre-Ouest  Bakata        18   0    2         4              4              4           0            3            3         3000
            3133        May    Centre-Ouest  Bakata        18   0    2         4              4              4           0            3            0         3000
            3134        Dec    Centre-Ouest  Dassa         25   1    11.5      1              1              1           5            0            0         4000
            3134        May    Centre-Ouest  Dassa         25   1    11.5      1              1              1           5            0            5         4000
            3135        Dec    Centre-Ouest  Dassa         31   1    12        2              3              3           7            8            8         12000
            3135        May    Centre-Ouest  Dassa         31   1    12        2              3              2           7            8            7         12000
            5241        Dec    Sud-Ouest     Batie         53   0    5.5       3              0              0           2            2            2         20000
            5241        May    Sud-Ouest     Batie         53   0    5.5       3              0              3           2            2            2         20000
            5242        Dec    Sud-Ouest     Batie         50   0    3         4              4              4           0            2            2         24000
            5242        May    Sud-Ouest     Batie         50   0    3         4              4              4           0            2            0         24000
            5243        Dec    Sud-Ouest     Batie         23   0    1.5       5              7              7           4            1            1         6000
            5243        May    Sud-Ouest     Batie         23   0    1.5       5              7              5           4            1            4         6000




            The dataset in which I create the differences is basically the same, but with only one observation per producer and with variables like: agricplotsDiff = agricplotsDec - agricplotsMay

            In both datasets, "treat" (treatment) equals 0, 1 or 2; and I created a dummy variable "adoptionDummy"= 1 if adoption >=0, as my dependent variable.

            My dependent variables, therefore, are "adoption", "adoptionDec" and adoptionDummy.

            I was trying to run this code:
            Code:
            xtset producerID May
            xtreg adoption i.treat age sex farmsize agricplots  incomeDec i.region_code, vce(cluster commune_code)
            But I am not sure if effects are fixed at the region level. Also, if I add "fe" at the end of this code, then the coefficients become 0.


            If I try
            Code:
            xtset region_code
            xtreg SLMPDummyDiff i.treat age sex farmsize agricplots incomeDec, vce(cluster commune_code) fe

            I get "panels are not nested within clusters", which makes sense because I have 5 regions and 32 communes, but then I don't know what I should do to have fixed effects at the region level.



            If I try then this probit, the constant coefficients become 0 with omitted standard errors. I don't understand what's wrong.
            Code:
             probit adoptionDummy i.treat age sex farmsize agricplots  incomeDec i.region_code, vce(cluster commune_code)

            Regarding the dataset with differencing, is it correct to run this code?
            Code:
            probit adoptionDummy i.treat age sex farmsize agricplotsDiff  incomeDec i.region_code, vce(cluster commune_code)

            And also, when I take the adoption in December as my dependent variable, am I right if I include the constant variables (e.g., sex, age) and the variables that I have for December (e.g. agricplotsDec, incomeDec) and not the variables that change over the months?


            As you can see, I am quite lost. I have never run a fixed effects model myself, so I would greatly appreciate some help or guidance.
            I hope I have explained myself.
            Thanks in advance,
            Maria
            Last edited by Maria Domingo; 26 Apr 2020, 04:46.



            • #7

              The data looks better here sorry:
              Last edited by Maria Domingo; 26 Apr 2020, 04:44.



              • #8
                You should

                xtset producerID month

                in which case the fixed effects are at the producer level. But then you forgot the fe option, so you did random effects.
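                A minimal sketch on simulated data, with hypothetical names and numbers throughout. Two assumptions go beyond what is stated in this thread: xtset needs a numeric time variable, so a 0/1 period indicator stands in for the month; and treatment is interacted with that indicator, because with the fe option any regressor that is constant within a producer (age, sex, or the treatment arm itself) is omitted from the estimates.

```stata
* Simulated two-wave panel; names and numbers are hypothetical
clear
set seed 2020
set obs 200
gen producerID = _n
gen commune = ceil(_n / 10)
gen treat = mod(commune, 3)
gen u = rnormal()
expand 2
bysort producerID: gen byte post = (_n == 2)
gen agricplots = rpoisson(3)
gen adoption = 2 + u + 0.5*post + 0.8*post*(treat > 0) + 0.1*agricplots + rnormal()

* The time variable must be numeric, so post (0 = May, 1 = Dec) is used
xtset producerID post

* fe requests producer fixed effects; time-constant regressors are omitted
xtreg adoption i.treat##i.post agricplots, fe vce(cluster commune)
```

                Because each producer belongs to exactly one commune, the panels are nested within the clusters and the "panels are not nested within clusters" error does not arise here.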



                • #9
                  Hello everyone,

                  Is it possible to run a fixed effects ordered logit model on a pseudo panel? What would be the Stata command for a model like this?

                  My data consists of 10 countries, with 1500 individuals per country over 4 years. Different individuals are surveyed year by year for the same set of countries.

                  All my variables (dependent and independent) are ordinal on a scale of 1-5.




                  • #10
                    Here is how my data is set up:
                    Individual  country  year  Openness to FDI  Safety  Trust  Education
                    1           1        2012
                    2           1        2012
                    3           1        2012
                    4,5,…,1500  1        2012
                    1           1        2014
                    2           1        2014
                    3           1        2014
                    4,5,…,1500  1        2014
                    1           1        2016
                    2           1        2016
                    3           1        2016
                    4,5,…,1500  1        2016
                    1           1        2018
                    2           1        2018
                    3           1        2018
                    4,5,…,1500  1        2018
                    1           2        2012
                    2           2        2012
                    3           2        2012
                    4,5,…,1500  2        2012



                    • #11
                      Hello everyone,

                      I am new to Stata. I am trying to merge a Stata file and an Excel file, and then run a probit with cluster fixed effects on the sample.
                      The Stata file contains the survey responses, while the Excel file contains climate records by cluster.
                      For instance, the Excel file is:
                      cluster tempreture precipitation
                      1 25.6 1.89
                      2 27 2.4
                      3 24.44 1.56
                      4 24.89 2.32
                      while the Stata file is:
                      region cluster adoption age sex
                      urban 1 1 10 f
                      rural 1 0 15 m
                      urban 1 1 14 f
                      rural 1 0 5 f
                      urban 2 1 13 m
                      rural 2 1 6 m
                      urban 2 0 7 f
                      rural 3 1 13 f
                      urban 3 1 4 f
                      rural 3 0 7 m
                      urban 3 1 4 m
                      I imported the Excel file into Stata and saved it as a Stata file, then I tried merging the two files using the commands:
                      use "C:\survey\A.dta", clear
                      sort cluster
                      joinby cluster using "C:\temp\B.dta", unmatched(both)
                      sortby cluster: probit adoption age i.sex temperature precipitation
                      I am getting error messages. Kindly guide me please. Thank you.
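                      One possible approach, sketched here on a subset of the rows shown above, is a many-to-one merge on cluster. This is an illustration under assumptions, not a diagnosis of the actual error messages, which were not posted. The climate variable is named temperature in the sketch; the Excel header above is spelled "tempreture", and the name used in the probit has to match whatever the imported variable is actually called. Note also that a by cluster: prefix repeats the command separately within each cluster, which is different from clustering the standard errors.

```stata
* Climate records, one row per cluster (values from the post above)
clear
input cluster temperature precipitation
1 25.6  1.89
2 27    2.4
3 24.44 1.56
4 24.89 2.32
end
tempfile climate
save `climate'

* Survey responses, many rows per cluster (a subset of the rows above)
clear
input str5 region cluster adoption age str1 sex
urban 1 1 10 f
rural 1 0 15 m
urban 2 1 13 m
rural 2 1 6  m
urban 3 1 4  f
rural 3 0 7  m
end

* Many survey rows match one climate row, hence m:1
merge m:1 cluster using `climate'
keep if _merge == 3
drop _merge

* i.sex needs a numeric variable, so encode the string first
encode sex, gen(sex_n)

* On the full survey, the model could then look like:
* probit adoption age i.sex_n temperature precipitation, vce(cluster cluster)
```

                      Cluster 4 has climate data but no survey rows in this toy subset, so it is dropped by the keep if _merge == 3 line.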

