cross sectional, world bank enterprise survey, logistic regression with fixed effects

Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#1

22 Apr 2021, 07:49

Hello, I need your help. I want to estimate a simple logit on cross-sectional data from the World Bank Enterprise Survey. My sample consists of firms (i = 1 to 12540) from several countries (k = 1 to 27). Each firm is observed only once, so there are no repeated observations. For example, the survey was carried out in Ghana in 2013, Kenya in 2010, Central African Republic in 2011, Ethiopia in 2011, Cameroon in 2016, Chad in 2018, Gambia in 2018, etc. My dependent variable is binary. I also want to control for industry, country and year fixed effects and to cluster the standard errors by country. I would like to know which command to use, since I can't use xtlogit.

my equation is:
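\[ \text{Credit access}_{i,k} = \beta_0 + \beta_1 \, \text{Corruption}_{i,k} + \beta_2 \, \text{Demographic charact.}_{i,k} + \beta_3 \, \text{Firm charact.}_{i,k} + \text{FE}(\text{country}, \text{industry}, \text{year}) + \varepsilon_{i,k} \]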

Your suggestions are very much appreciated.

Thanks
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#2

22 Apr 2021, 08:16

I can't use xtlogit.

I do not see why you cannot use xtlogit. As your observations are at the firm level, you can condition out the country or industry fixed effects (whichever has more levels) and add year dummies and dummies for either country or industry (with 12540 observations, there will be enough observations within each year and either each country or industry).

Code:

xtset country
xtlogit credit_access ... i.industry i.year, fe

The equivalent of the above using clogit is

Code:

clogit credit_access ... i.industry i.year, group(country)
Comment
Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#3

22 Apr 2021, 09:33

Thank you, Andrew Musau.
Comment
Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#4

22 Apr 2021, 09:36

I thought I couldn't use xtlogit because the observations are not repeated over time, so the data are not a panel.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#5

22 Apr 2021, 10:08

You do not have panel data, but you can use xtlogit or clogit as firms are nested in countries/industries. Did you try what was suggested in #2, and did it fail? In the xtset command line, you just declare a panel identifier without a time variable. This should work fine.
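As a minimal sketch of the point above, the following uses Stata's bundled Grunfeld data with a made-up country identifier and a random binary outcome (both are assumptions for illustration only, not the WBES variables). It declares only a group identifier in xtset and then fits the same conditional fixed-effects logit two ways.

```stata
* Toy data: country and outcome are simulated, purely for illustration
webuse grunfeld, clear
set seed 04252021
gen country = runiformint(1, 5)
bysort company (year): replace country = country[1]
gen outcome = runiformint(0, 1)

* Declare the grouping variable only; no time variable is needed
xtset country

* Conditional fixed-effects logit, with country as the group
xtlogit outcome mvalue kstock, fe

* The same model through clogit
clogit outcome mvalue kstock, group(country)
```

The two commands should report the same coefficients, since both condition the country effects out of the likelihood.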
Comment
Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#6

22 Apr 2021, 11:22

When I run the command, here is what I get:

note: multiple positive outcomes within groups encountered.
note: 2018.a14ya omitted because of no within-group variance.

Iteration 0: log likelihood = -3284.7132 (not concave)
Iteration 1: log likelihood = -3277.6572 (not concave)
Iteration 2: log likelihood = -3276.9745 (not concave)
Iteration 3: log likelihood = -3276.8441 (not concave)
Iteration 4: log likelihood = -3276.8424 (not concave)
Iteration 5: log likelihood = -3276.8417 (not concave)
Iteration 6: log likelihood = -3276.8417 (not concave)
Iteration 7: log likelihood = -3276.8417 (not concave)
Iteration 8: log likelihood = -3276.8417 (not concave)
Iteration 9: log likelihood = -3276.8417 (not concave)
Iteration 10: log likelihood = -3276.8417 (not concave)
Iteration 11: log likelihood = -3276.8417 (not concave)
Iteration 12: log likelihood = -3276.8417 (not concave)
Iteration 13: log likelihood = -3276.8417 (not concave)
Iteration 14: log likelihood = -3276.8417 (not concave)
Iteration 15: log likelihood = -3276.8417 (not concave)
Iteration 16: log likelihood = -3276.8417 (not concave)
Iteration 17: log likelihood = -3276.8417 (not concave)
Iteration 18: log likelihood = -3276.8417 (not concave)
Iteration 19: log likelihood = -3276.8417 (not concave)
Iteration 20: log likelihood = -3276.8417 (not concave)
Iteration 21: log likelihood = -3276.8417 (not concave)
Iteration 22: log likelihood = -3276.8417 (not concave)
Iteration 23: log likelihood = -3276.8417 (not concave)
Iteration 24: log likelihood = -3276.8417 (not concave)
Iteration 25: log likelihood = -3276.8417 (not concave)
Iteration 26: log likelihood = -3276.8417 (not concave)
Iteration 27: log likelihood = -3276.8417 (not concave)
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#7

22 Apr 2021, 12:02

Convergence problems are common in maximum likelihood estimations. #2 of the following thread provides some pointers on what you can do:
https://www.statalist.org/forums/for...elogit-command
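Independently of what the linked thread recommends (its contents are not reproduced here), two generic things that may be worth trying are checking which dummies actually vary within groups and refitting with Stata's standard maximize options. The sketch below runs on simulated stand-in data; every variable name in it is hypothetical.

```stata
* Simulated stand-in data; all variable names are hypothetical
clear
set seed 12345
set obs 2000
gen country  = runiformint(1, 20)
gen industry = runiformint(1, 6)
* One survey year per country, as in a single-wave survey
gen year     = 2010 + mod(country, 5)
gen x        = rnormal()
gen y        = runiform() < invlogit(0.5*x)

* Flag countries in which year actually varies across firms;
* year dummies can only be estimated from within-group variation
bysort country (year): gen byte year_varies = year[1] != year[_N]
tabulate year_varies

* Refit with a cap on iterations and the "difficult" stepping algorithm
clogit y x i.industry, group(country) difficult iterate(50)
```

If year never varies within the grouping variable, year dummies are absorbed by the conditional fixed effects and can simply be left out of the model.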
Comment
Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#8

23 Apr 2021, 10:58

Thank you.
Comment

Abdramane Cherif

Join Date: Apr 2021
Posts: 23
#9

23 Apr 2021, 11:01

I would like to know how to produce a results table with rows like these:

Country FE        YES     YES     YES     YES
Industry FE       YES     YES     YES     YES
Year FE           YES     YES     YES     YES
Cluster country   YES     YES     YES     YES
Method            LOGIT   LOGIT   LOGIT   LOGIT

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10254
#10

23 Apr 2021, 12:15

The following will do it:

Code:

clogit credit_access ... i.industry i.year, group(country) cluster(country)

where you have conditional country fixed effects and unconditional industry and year fixed effects.
Comment
Abdramane Cherif

Join Date: Apr 2021

Posts: 23
#11

25 Apr 2021, 08:17

Hello, I tried the code and got a result. My question may seem basic, but I will ask it anyway: how do I make "Yes" appear in the table?
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10254

#12

25 Apr 2021, 11:04

Code:

ssc install estout, replace

Example:

Code:

webuse grunfeld, clear
gen industry=cond(inlist(company, 1,2,3), 1, cond(inlist(company, 4,5,6), 2, 3))
set seed 04252021
gen country=runiformint(1,5)
bys company (year): replace country=country[1]
gen outcome=runiformint(0,1)
*country indicators (i.country) will be dropped. Just needed for the output to indicate country FE
clogit outcome mvalue kstock i.industry i.year i.country, group(country)
esttab, indicate("Country FE=*.country" "Year FE=*.year" "Industry FE=*.industry")

Res.:

Code:

 
. esttab, indicate("Country FE=*.country" "Year FE=*.year" "Industry FE=*.industry")

----------------------------
                      (1)   
                  outcome   
----------------------------
outcome                     
mvalue          -0.000346   
                  (-0.85)   

kstock           0.000116   
                   (0.14)   

Country FE            Yes   

Year FE               Yes   

Industry FE           Yes   
----------------------------
N                     200   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
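The table requested in #9 also has "Cluster country" and "Method" rows, which indicate() does not produce. One possible way to add them, sketched here on the same toy data, is to store text with estadd local and print it through esttab's stats() option. The macro names clustse and estmethod are arbitrary choices for this sketch.

```stata
* Same toy setup as above; country and outcome are simulated
webuse grunfeld, clear
set seed 04252021
gen country = runiformint(1, 5)
bysort company (year): replace country = country[1]
gen outcome = runiformint(0, 1)

* Conditional logit with standard errors clustered by country
clogit outcome mvalue kstock, group(country) vce(cluster country)

* Attach descriptive text to the stored results
estadd local clustse "Yes"
estadd local estmethod "LOGIT"

* Print the extra rows below the coefficients
esttab, stats(clustse estmethod N, labels("Cluster country" "Method" "N"))
```

This can be combined with the indicate() option shown above when the model includes factor-variable dummies.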

Comment

Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#13

24 Sep 2021, 04:47

Hello Andrew Musau. I have a very similar problem.

I have the same database (from the WBES). I want to run a logit regression because my dependent variable is binary and measured at the enterprise level. This variable is "fin11" and indicates whether a company has collateral or not. My main explanatory variable is a country-level variable, "n_outcome", which is binary and takes the value 1 if a country has a certain institution and 0 otherwise. I added controls at the firm level (like firm age or size) and other controls at the country level too (like GDP normalized by the US value). I would like to add FE at the country level, so my regression would look like this:

\[ \text{fin11}_{i,k} = \beta_0 + X_{i,k} + Z_k + \text{FE}(\text{country level}) + e_{i,k} \]

Where X is a vector of enterprise variables and Z a vector of country variables.

This is an extract from my database

* Example generated by -dataex-. To install: ssc install dataex
clear
input str26 country_01 double firm_id float(fin11 age k11_sales GDP_GDPUSA) long n_outcome
"Argentina2017" 622481 1 20 .002667291 .4556047 0
"Argentina2017" 622318 1 20 .08750443 .4556047 0
"Argentina2017" 623036 . 8 . .4556047 0
"Argentina2017" 623248 . 20 . .4556047 0
"Argentina2017" 622448 1 24 .05040256 .4556047 0
"Argentina2017" 622727 . 17 . .4556047 0
"Armenia2020" 708831 . 10 . .2943979 0
"Armenia2020" 708699 . 21 . .2943979 0
"Armenia2020" 709055 . 5 . .2943979 0
"Armenia2020" 708729 . 21 . .2943979 0
"Armenia2020" 708889 1 21 . .2943979 0
"Armenia2020" 708856 . 16 . .2943979 0
"Armenia2020" 708909 . 3 . .2943979 0
"Armenia2020" 708827 . 6 . .2943979 0
"Armenia2020" 709153 . 6 . .2943979 0
"Armenia2020" 708958 . 2 . .2943979 0
"Armenia2020" 708676 . 21 . .2943979 0
end

However, if I use the code that you posted in #2, I get the following message:

1,617 (group size) take 1,369 (# positives) combinations results in numeric overflow; computations cannot proceed.

Furthermore, I do not know whether it makes sense to use FE at the country level when I am also using control variables at the country level.

Any advice?

I hope you can help me,

Regards,
Ibai

Last edited by Ibai Ostolozaga Falcon; 24 Sep 2021, 04:51.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#14

24 Sep 2021, 06:51

If you have 30+ observations per country, just use logit and include country dummies (i.country). The numerical overflow problem is a limitation of clogit.
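A minimal sketch of that suggestion on simulated data follows. The names country_id and z_country are hypothetical, and fin11 and age only mimic the variables in #13. It also illustrates the question raised there about country-level controls: a regressor that is constant within country is perfectly collinear with the country dummies, so Stata would be expected to omit it (or one of the dummies) with a collinearity note.

```stata
* Simulated stand-in data; country_id and z_country are hypothetical names
clear
set seed 2021
set obs 3000
gen country_id = runiformint(1, 30)
* Country-level regressor: constant within each country
gen z_country  = mod(country_id, 2)
gen age        = runiformint(1, 40)
gen fin11      = runiform() < invlogit(-0.5 + 0.02*age)

* Unconditional logit with country dummies, clustered by country
logit fin11 age z_country i.country_id, vce(cluster country_id)
```

With the real data, the string identifier would first need a numeric version, for example encode country_01, generate(country_id), before i.country_id can be used.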
Comment
