Categorize a survey data

Hossam Ali

Join Date: Dec 2019

Posts: 13
#1


13 Dec 2019, 07:41

Hello everyone,
I'm working on a large survey dataset and I need to separate the data into two categories.
1st category (within ppl in private sector paid or self employed or in family company and working in an unregistered firm or have no contract)

The challenge is that I have many variables to select from:
V1 ➡ ppl working in ( Government /private sector/ Ngo / Self / family / other )
V2 ➡ contract ( written/ oral / no contract )
V3 ➡ registration ( yes / under / no / doesn't apply / idk )
And so on ...

What commands can I use to select these characteristics and combine them into another variable in Stata? (I'll be very grateful for any help.)

I do apologize that I had to write out the whole problem. I'm a beginner and I can't put into words what I need to do.

Last edited by Hossam Ali; 13 Dec 2019, 07:56.
Tags: categorical, panel data, survey
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

13 Dec 2019, 12:14

It is impossible to give you concrete help when you do not provide an example of your data. Please post back with that, using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

If you do that, I am confident you will get a timely and helpful reply. The solution involves the use of Stata's logical operators (-help operator-) but the details of the syntax depend on details of the data.
Hossam Ali

Join Date: Dec 2019

Posts: 13
#3

15 Dec 2019, 03:45

Thank you so much for your reply. I have taken your advice and downloaded dataex because I'm running Stata 14.
I don't know which part I should share to make it easier for people to understand me and help. I'm an absolute beginner who taught myself through YouTube, and I'm in the middle of a project.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

15 Dec 2019, 09:38

I think if you just run

Code:

dataex V1 V2 V3 in 1/10

and post the output from that, it will show the information needed to solve this particular problem. What is needed here is to see some representative values of the three variables V1, V2, and V3 that play a role in your problem along with their metadata. This will do that.
JeongHoon Min

Join Date: Jun 2019

Posts: 38
#5

15 Dec 2019, 11:49

I think just the -if- qualifier and the -replace- command will do what you need. (Note: the -if- command and the -if- qualifier are not the same. Search Google and you will find an official FAQ document.)

I assume that you listed the response categories in ascending order of their values, e.g. in the case of V1, Government==1, private==2, self==4 and so on. In that case, your description of the 1st category might be translated as follows:

Code:

gen cat=0
replace cat=1 if (( V1==2 | V1==4 | V1==5 ) & ( V2==3 | V3==3 )) // select people who answered 2 or 4 or 5 in V1 and chose 3 in either V2 or V3

(I'm not sure how you classify firms that are under registration; you might want to add V3==2 to the second part of the condition. & is the AND operator and | is the OR operator.)

I haven't tested the code, so you should check the result.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#6

15 Dec 2019, 13:48

JeongHoon Min Your solution may or may not work. You don't know if V1, V2, and V3 are numeric variables, and if they are, you don't know what numbers correspond to the categories Hossam Ali is interested in. Moreover, expressions like (V1 == 2 | V1 == 4 | V1 == 5) can usually be simplified to (inlist(V1, 2, 4, 5)), leading to more readable code.

Finally, there is no need to -gen cat = 0- and then -replace cat = 1 if whatever-. It is simpler just to -gen cat = whatever-.
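
To make the two points above concrete, here is a minimal sketch on toy data. The variable names V1, V2, V3 and the numeric codes are the hypothetical ones assumed in #5, not the real coding of the survey, so treat this as an illustration of the syntax only.

```stata
* Toy data; the numeric codes are hypothetical, as assumed in #5
clear
input byte(V1 V2 V3)
2 3 1
4 1 3
1 3 3
5 1 1
end

* Two-step version from #5
gen cat = 0
replace cat = 1 if (V1 == 2 | V1 == 4 | V1 == 5) & (V2 == 3 | V3 == 3)

* One-step version: inlist() replaces the chain of == comparisons,
* and the logical expression itself evaluates to 1 (true) or 0 (false)
gen byte cat2 = inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3)

* Both versions should agree on every observation
assert cat == cat2
list
```

The -assert- line stops with an error if the two variables ever differ, which is a cheap way to check a rewrite like this on your own data.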
JeongHoon Min

Join Date: Jun 2019

Posts: 38
#7

15 Dec 2019, 14:38

Clyde Schechter Well, I wrote the <I assume> part for exactly the reason you mentioned, i.e., I do not know how V1, V2, V3 were coded. And I generated cat with value 0 because usually no one wants missing values. In this case Hossam Ali wants a dummy variable, so it is clear that all observations not satisfying the condition should be assigned to one category/value other than 1 (the category/value for observations satisfying the condition). Thank you for letting me know about inlist(); I have never used that function before.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#8

15 Dec 2019, 14:57

And I generated cat with value 0 because usually no one wants missing values.

You are absolutely correct that nobody wants missing values. The code -gen cat = whatever- does not generate missing values. It generates 1 when whatever is true and 0 when it is false.
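
A small sketch of that behavior, again on toy data with hypothetical codes. It also shows one optional variation that is my assumption rather than something asked for in this thread: adding an -if- qualifier with missing() when you would rather leave the indicator missing for respondents who did not answer.

```stata
* Toy data with hypothetical codes; the last observation is all missing
clear
input byte(V1 V2 V3)
2 3 1
1 1 1
. . .
end

* A logical expression yields 1 (true) or 0 (false), never missing,
* so the all-missing observation gets 0 here
gen byte cat = inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3)

* Optional variation: leave the indicator missing when any input is missing
gen byte cat_m = inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3) if !missing(V1, V2, V3)

list
```

Which of the two is appropriate depends on whether non-respondents belong in the comparison group for your analysis.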
JeongHoon Min

Join Date: Jun 2019

Posts: 38
#9

15 Dec 2019, 15:10

Clyde Schechter Oh, I misread the command in your comment (I thought that it was -gen cat=1 if whatever-). Thank you for informing me.

Hossam Ali

Join Date: Dec 2019
Posts: 13

#10

21 Dec 2019, 11:57

I have applied it @Clyde Schechter
dataex a806 E09 register_actv in 1/10

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(a806 E09 register_actv)
 . .  .
99 3  3
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 2 2 98
 . .  .
end
label values a806 employmentype
label def employmentype 2 "Private Sector", modify
label def employmentype 99 "OTHER", modify
label values E09 contract
label def contract 2 "oral", modify
label def contract 3 "noAgreement", modify
label values register_actv contractR
label def contractR 3 "Not Registered", modify
label def contractR 98 "IDK", modify

------------------ copy up to and including the previous line ------------------


Hossam Ali

Join Date: Dec 2019

Posts: 13
#11

21 Dec 2019, 11:59

@JeongHoon Min thank you. I have applied similar code using the same kind of qualifier, but the results don't match the published paper I'm following.
Hossam Ali

Join Date: Dec 2019

Posts: 13
#12

21 Dec 2019, 12:06

dataex a806 E09 register_actv in 1/10

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(a806 E09 register_actv)
 . .  .
99 3  3
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 2 2 98
 . .  .
end

------------------ copy up to and including the previous line ------------------
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#13

21 Dec 2019, 12:14

The example data you posted does not contain any observations meeting the criterion "within ppl in private sector paid or self employed or in family company and working in an unregistered firm or have no contract" you specified, which means that the example does not provide the corresponding numeric values of the variables. Please post back with example data that includes some observations that you want to identify, as well as some that you do not.

In the end, the code you want will look something like

Code:

gen byte wanted = (V1 == ???? & V2 == 3 & V3 == ????)

but the numbers needed to replace the ????s above are not shown in your data example.
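
In the meantime, here is a hedged sketch of how the missing numbers could be looked up and plugged in, using the ten observations from #10. Only three codes are confirmed by the posted labels (2 = "Private Sector", 3 = "noAgreement", 3 = "Not Registered"); the local macro names are hypothetical, and the codes for self-employed and family work still have to be added once they are visible. Joining contract and registration with | follows the wording "unregistered firm or have no contract" in #1, which is an assumption; use & there instead if the paper's definition requires both.

```stata
* Data copied from #10 (first 10 observations only)
clear
input byte(a806 E09 register_actv)
 . .  .
99 3  3
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 . .  .
 2 2 98
 . .  .
end
label values a806 employmentype
label def employmentype 2 "Private Sector", modify
label def employmentype 99 "OTHER", modify
label values E09 contract
label def contract 2 "oral", modify
label def contract 3 "noAgreement", modify
label values register_actv contractR
label def contractR 3 "Not Registered", modify
label def contractR 98 "IDK", modify

* Look up the numeric code behind each label before writing conditions
label list employmentype contract contractR

* Hypothetical code lists: extend `sectors' with the self-employed and
* family codes (comma-separated) once -label list- shows them on the full data
local sectors 2
local nocontr 3
local unreg   3

gen byte wanted = inlist(a806, `sectors') & (E09 == `nocontr' | register_actv == `unreg')

* How many observations are flagged?
count if wanted
```

On these ten observations nothing is flagged, which is consistent with the point that the example contains no qualifying cases. Run from a do-file so the local macros stay defined for the -gen- line.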
