Increase the number of observations

Steve Bereznai

Join Date: Jun 2022

Posts: 16
#1

30 Jun 2022, 03:37

Good morning!

I have a complicated question. I've been trying to solve it for two weeks.

I have a file with a lot of observations (43 543).

For an assignment I need to run regressions. These are the ones I run:

//regress employed
regress employed GDP_10K life_expectancy unemployed_rate age_55_60 age_60_65 age_65_70 not_alone Tenure civil_servant ph003_ mh002_

est store reg1

//regress unemployed
regress unemployed GDP_10K life_expectancy unemployed_rate age_55_60 age_60_65 age_65_70 not_alone Tenure civil_servant ph003_ mh002_

est store reg2

//regress retired
regress retired GDP_10K life_expectancy unemployed_rate age_55_60 age_60_65 age_65_70 not_alone Tenure civil_servant ph003_ mh002_

est store reg3

esttab reg1 reg2 reg3, b se wide

The important sections are bold. The rest is only information about my dataset and the regress command. After running this, I get a table with the results and the number of observations. The number of observations is too low, so my results are bad, and I have to increase it.

What do you usually do in this situation?
What should I do?
What should I check or change?

I have tried to describe my situation in detail, but if you need more information, just ask me.

Thank you
Tags: enhance, increase, number, observation, raise
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

30 Jun 2022, 03:47

My guess is that you have missing values on at least some of the variables in your regression. What you should do about that depends on why the values are missing and what type of missingness you have. (Please read and follow the advice in the FAQ, which has lots of advice both on how to ask questions and on how to include information that will help people give you an answer.)
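
One way to check this guess is sketched below. It is only an illustration: the four rows are made-up toy data using a few of the variable names from post #1, and nmiss is a hypothetical helper variable, not something from the original do-file.

```stata
* Toy data standing in for the real file (values are made up)
clear
input employed GDP_10K Tenure civil_servant
1 3.15  . 0
0 2.00 35 1
0 1.10 41 0
0 2.10  . 1
end

* Report missing-value counts; by default only variables
* that actually contain missing values are listed
misstable summarize employed GDP_10K Tenure civil_servant

* nmiss (hypothetical name): number of missing values per observation
egen nmiss = rowmiss(employed GDP_10K Tenure civil_servant)

* Observations with a missing value in at least one variable
count if nmiss > 0
```

On the real dataset, the same two commands would be run with the full variable list from the regress command.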
Steve Bereznai

Join Date: Jun 2022

Posts: 16
#3

30 Jun 2022, 08:59

Thank you for your reply.

I've checked for missing values, and almost none of my variables has any. Only the variable Tenure has missing values, but if I change those missing values to 0 I get too many observations (all 43 543).

Do you have any idea?

Thank you, you are kind!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

30 Jun 2022, 09:06

Steve:
Stata applies listwise deletion by default to observations with missing value(s) in one or more variables.
Hence, all the observations that have -Tenure-==. will be ruled out from -regress-.
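
A minimal sketch of this behaviour, assuming Stata's shipped auto example dataset rather than the data from this thread (rep78 plays the role of Tenure here, since it is missing for 5 of the 74 cars):

```stata
* Stata's built-in example data, not the poster's file
sysuse auto, clear

* No missing values among these variables: all 74 observations are used
regress price mpg
display e(N)

* rep78 is missing for 5 cars, so those observations are dropped
regress price mpg rep78
display e(N)

* e(sample) marks the observations used by the last estimation command
count if !e(sample)
```

The same e(N) and e(sample) checks can be run after each of the three regressions in post #1 to see exactly which observations are excluded.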

Kind regards,
Carlo
(Stata 19.0)
Steve Bereznai

Join Date: Jun 2022

Posts: 16
#5

30 Jun 2022, 09:22

That was helpful, thank you.

The task includes these 12 variables (employed GDP_10K life_expectancy unemployed_rate age_55_60 age_60_65 age_65_70 not_alone Tenure civil_servant ph003_ mh002_). 11 of them do not contain any missing values; only Tenure does. But if I replace the missing values in Tenure with 0, I have too many observations (I know this because I was told in advance how many observations I should have).

What can you suggest to me?

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

30 Jun 2022, 09:31

Steve:
I fail to get what you mean by "too many observations".
If you actually replace missing values with zero, you should have a complete (although probably unreliable) dataset.
As per FAQ, an example would help enormously (see -dataex-). Thanks.
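
For reference, a minimal sketch of how -dataex- is typically called. It uses Stata's shipped auto dataset as a stand-in; on the real data the variable list would be the one from the regress command.

```stata
* Stata's built-in example data, used only for illustration
sysuse auto, clear

* Produce a copy-and-paste data example for the first 5 observations
dataex price mpg rep78 in 1/5
```

The output is a block of input commands that can be pasted into a forum post, so that others can recreate the example data exactly, including missing values.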

Kind regards,
Carlo
(Stata 19.0)

Steve Bereznai

Join Date: Jun 2022

Posts: 16
#7

30 Jun 2022, 10:07

My job is to reproduce an existing study. That's why I know I need 16,000 observations (or at least about 16,000).

My example:

employed	unemployed	retired	GDP_10K	life_expectancy	unemployed_rate	age_55_60	age_60_65	age_65_70	not_alone	Tenure	civil_servant	ph003_	mh002_
1	0	0	3.15	70	3.00	1	0	0	0	.	0	1	1
0	1	0	2.00	75	3.50	0	1	0	1	35	1	0	1
0	1	0	1.10	60	5.50	1	0	0	0	41	0	0	0
0	0	1	2.10	88	2.20	0	0	1	1	.	1	1	0

0 means FALSE and 1 means TRUE. For example:
The observation in row 1 is employed, is between 55 and 60 years old, lives alone (not_alone is 0), has no information about tenure (missing value), was not a civil servant, and so on.

Only the variable Tenure has missing values. If I leave them as missing, the number of observations is 8 350. If I replace the missing values with 0, the number of observations is 43 543.
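
The two counts can be compared directly, as in the sketch below. It uses the four example rows above, restricted to a few variables to keep it short; Tenure_zero is a hypothetical name for a zero-filled copy, so the original Tenure is left untouched.

```stata
* The four example rows, restricted to a few variables
clear
input employed age_55_60 Tenure civil_servant
1 1  . 0
0 0 35 1
0 1 41 0
0 0  . 1
end

* Observations with no missing value in any of these variables,
* i.e. the ones a regression on them could use
count if !missing(employed, age_55_60, Tenure, civil_servant)

* Tenure_zero (hypothetical name): copy of Tenure with missing set to 0
generate Tenure_zero = cond(missing(Tenure), 0, Tenure)

* With the zero-filled copy, every observation counts as complete
count if !missing(employed, age_55_60, Tenure_zero, civil_servant)
```

In this toy example the first count is 2 and the second is 4, which mirrors the jump from 8 350 to 43 543 in the real data.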

I cannot add more variables (I am reproducing an existing study, so I cannot change them).

So my question:
What can you suggest to me? How can I increase the number of observations?

Thank you


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#8

30 Jun 2022, 12:05

Steve:
are you dealing with a panel or a cross-sectional dataset?

Kind regards,
Carlo
(Stata 19.0)
Steve Bereznai

Join Date: Jun 2022

Posts: 16
#9

01 Jul 2022, 01:23

Originally posted by Carlo Lazzaro

Steve:
are you dealing with a panel or a cross-sectional dataset?

I have panel data.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#10

01 Jul 2022, 02:35

Steve:
that's the reason why the number of observations increases when you replace missing with zero.
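
With panel data it can help to look at how the missing values of Tenure are spread across persons and waves. The sketch below is only an illustration under assumptions: id and wave are hypothetical names for the panel identifiers (the thread does not name them), n_tenure is a hypothetical helper variable, and the rows are made up.

```stata
* Made-up panel: 3 persons observed in 2 waves each
clear
input id wave Tenure
1 1  .
1 2 35
2 1 41
2 2  .
3 1  .
3 2  .
end

* Declare the panel structure (id and wave are assumed names)
xtset id wave

* In which waves is Tenure missing?
tabulate wave if missing(Tenure)

* n_tenure (hypothetical name): non-missing Tenure values per person
bysort id: egen n_tenure = count(Tenure)

* Observations belonging to persons who never report Tenure
count if n_tenure == 0
```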

Kind regards,
Carlo
(Stata 19.0)