Insufficient observations after regression

Fabio Bordinon

Join Date: Mar 2023

Posts: 7
#1

12 Jun 2023, 01:19

Good Morning,

I have two datasets: one shows (aggregated) immigration flows in Italy and the other contains firm-level data.
I merged the two datasets with m:m because it is the only form of merge Stata accepts; I would prefer 1:m, but that fails with the error "the variables annoril and areag4 do not uniquely identify observations in the master data". After the m:m merge, when I run the regression (clustering only on regional area) I get another error: "panels are not nested within the cluster". To get around that I added the option "nonest" at the end of the regression command, which makes the regression run, but the results are insignificant (I believe because of the preceding steps). I also tried reporting duplicates and dropping them, which eliminates the whole dataset. I also tried the options vce(jackknife) and vce(bootstrap), which let the regression run but produce even less significant results. Finally, I tried collapsing the firm-level dataset by region and year, which allows a 1:m merge, but then I need to drop some duplicates, and when I run the regression I do not have enough data (error "Insufficient observations").

Do you have any idea on how it should work?
Attached Files

dofile prova dopo meeting.do (2.5 KB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17736
#2

12 Jun 2023, 07:44

Fabio:
as you know, without an example of your dataset that you can easily provide via -dataex- it is difficult to reply positively.

Kind regards,
Carlo
(Stata 19.0)
Fabio Bordinon

Join Date: Mar 2023

Posts: 7
#3

12 Jun 2023, 12:23

Dear Carlo,
I do not know why, but Stata produces a dataex for only one year and one specific region.
However, you can find the two datasets in the attachment.

This is the code I used for the merge:
merge 1:m annoril areag4 using "/Users/fabiobordignon/Desktop/UNI/thesis/data thesis/dataset/data immigration clean.dta", keep (match) nogen

It results in the error: "the variables annoril and areag4 do not uniquely identify observations in the master data".

Hope it can help.
Thank you in advance.
Attached Files

test_bankitaly_clean.dta (1.13 MB, 1 view)

data immigration clean.dta (12.4 KB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17736
#4

12 Jun 2023, 23:34

Fabio:
what if you -destring- (or -encode-; please see the cautionary tale about this command in its entry in Stata .pdf manual) the following variables: wstatus; citizen; sex; age?

Kind regards,
Carlo
(Stata 19.0)
Fabio Bordinon

Join Date: Mar 2023

Posts: 7
#5

12 Jun 2023, 23:51

Dear Carlo, I tried to destring the variables you indicated, but the result does not change:

. destring wstatus, gen(wstat)
wstatus: all characters numeric; wstat generated as byte

. drop wstatus

. destring citizen, gen(citiz)
citizen: contains nonnumeric characters; no generate

. destring sex, gen(gender)
sex: contains nonnumeric characters; no generate

. destring age, gen (age_)
age: contains nonnumeric characters; no generate

. save "/Users/fabiobordignon/Desktop/UNI/thesis/data thesis/dataset/data immigration clean.dta", replace
file /Users/fabiobordignon/Desktop/UNI/thesis/data thesis/dataset/data immigration clean.dta saved

. merge 1:m annoril areag4 using "/Users/fabiobordignon/Desktop/UNI/thesis/data thesis/dataset/test_bankitaly_clean.dta", keep (match) nogen
variables annoril areag4 do not uniquely identify observations in the master data
r(459);

I also tried to merge starting with the other dataset:
merge 1:m annoril areag4 using "/Users/fabiobordignon/Desktop/UNI/thesis/data thesis/dataset/data immigration clean.dta", keep (match) nogen
variables annoril areag4 do not uniquely identify observations in the master data

The only way I have found is to merge m:m by collapsing the dataset "test_bankitaly_test" by region and year, but that then creates problems with nesting the clustered variables.

Regards,
Fabio
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#6

12 Jun 2023, 23:57

Fabio Bordinon, it might help to post the original CSV files used in your do-file (just the relevant columns, preferably) and tell us some basics about the structure of the data: what each observation in each dataset represents, and which variables should uniquely identify an observation.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17736
#7

13 Jun 2023, 01:07

Fabio:
1) -string- variables that cannot be -destring- should be converted to numeric format via -encode-, paying the necessary attention;
2) -m:m- is something that should be read and forgotten;
3) you may want to create more useful key variables via the -group- function available from -egen-.
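
A minimal sketch of what this could look like, untested on the attached files. It assumes that the immigration file holds several rows per year and region (one per wstatus/citizen/sex/age combination), that the count variable is called observations_value in the saved .dta, and that summing it over the remaining rows gives a meaningful regional total. The file names are the ones attached above; everything else is illustrative.

```stata
* Sketch under the assumptions stated above, not a tested solution.
use "data immigration clean.dta", clear

* Show how many rows share the same year-region key
duplicates report annoril areag4

* Reduce to one row per year and region
* (observations_value is an assumed variable name; check for
*  "total" rows first, otherwise the sum double counts)
collapse (sum) observations_value, by(annoril areag4)

* Stop with an error if the key is still not unique
isid annoril areag4

tempfile immig
save `immig'

* Firm-level data: many firms per year-region, so it is the "m" side
use "test_bankitaly_clean.dta", clear
merge m:1 annoril areag4 using `immig', keep(match) nogenerate
```

With the firm file in memory as master, each firm-year keeps its own row and simply picks up the regional immigration figure, so no m:m merge is needed.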

Kind regards,
Carlo
(Stata 19.0)
Fabio Bordinon

Join Date: Mar 2023

Posts: 7
#8

13 Jun 2023, 12:28

Carlo Lazzaro, even after encoding the variables, the merge unfortunately produces the same error.
Fabio Bordinon

Join Date: Mar 2023

Posts: 7
#9

13 Jun 2023, 12:35

Hemanshu Kumar, Statalist cannot upload the file for some reason; however, you can find the different datasets at this link:

https://www.bancaditalia.it/statisti...e-industriali/

The dataset "testinvid.csv" is provided by the Bank of Italy and contains panel survey data on Italian firms. The variables I would like to keep are "areag4", which is the geographic area; "annoril", which is the year of the observation; "ident", which is the identification code of the firm; and "c105", which is each firm's revenue in year t; plus other control variables.

https://data.europa.eu/data/datasets...kraq?locale=en

The dataset "eurostat total .csv" shows the population of each European country by NUTS 2 classification. The variables of interest here are "wstatus", which I consider only when it equals employed or unemployed, and "citizen", for which I consider only foreign people ("EU27" and "nonEU27"). I also want to use the variable "geo", which is the geographic area, and "year". The main explanatory variable should be "observations_value", which indicates the number of people for each type of citizenship.

Hope it can help and thank you!