
  • Categorizing survey data

    Hello everyone,
    I'm working on a large survey dataset and I need to separate the observations into two categories.
    1st category: people working in the private sector (paid), self-employed, or in a family company, who work in an unregistered firm or have no contract.

    The challenge is that I have many variables to select from:
    V1 ➡ sector of employment (government / private sector / NGO / self / family / other)
    V2 ➡ contract (written / oral / no contract)
    V3 ➡ registration (yes / under / no / doesn't apply / don't know)
    And so on ...

    Which commands can I use to select these characteristics and combine them into another variable in Stata? I'd be very grateful for any help.

    I apologize for writing out the whole problem. I'm a beginner and can't yet put what I need to do into precise terms.
    Last edited by Hossam Ali; 13 Dec 2019, 07:56.

  • #2
    It is impossible to give you concrete help when you do not provide an example of your data. Please post back with that, using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    If you do that, I am confident you will get a timely and helpful reply. The solution involves the use of Stata's logical operators (-help operator-), but the details of the syntax depend on details of the data.



    • #3
      Thank you so much for your reply. I have taken your advice and installed dataex, because I'm running Stata 14.
      I don't know which part I should share to make it easier for people to understand and help me. I'm an absolute beginner who taught myself through YouTube, and I'm in the middle of a project.



      • #4
        I think if you just run
        Code:
        dataex V1 V2 V3 in 1/10
        and post the output from that, it will show the information needed to solve this particular problem. What is needed here is to see some representative values of the three variables V1, V2, and V3 that play a role in your problem, along with their metadata. This will do that.



        • #5
          I think the -if- qualifier and the -replace- command will do what you need. (Note: the -if- command and the -if- qualifier are not the same thing. Searching online will turn up an official FAQ on the difference.)

          I assume that the response categories are coded with numeric values in the order you listed them, e.g. for V1, Government==1, private==2, self==4, and so on. In that case, your description of the 1st category might be translated as follows:
          Code:
          gen cat=0
          replace cat=1 if (( V1==2 | V1==4 | V1==5 ) & ( V2==3 | V3==3 )) // select people who answered 2 or 4 or 5 in V1 and chose 3 in either V2 or V3
          (I'm not sure how you classify firms whose registration is still under way; you might want to add V3==2 to the second part of the condition. & is the AND operator and | is the OR operator.)

          I haven't tested the code, so you should check the result.



          • #6
            JeongHoon Min Your solution may or may not work. You don't know whether V1, V2, and V3 are numeric variables, and if they are, you don't know which numbers correspond to the categories Hossam Ali is interested in. Moreover, expressions like (V1 == 2 | V1 == 4 | V1 == 5) can usually be simplified to (inlist(V1, 2, 4, 5)), leading to more readable code.

            Finally, there is no need to -gen cat = 0- and then -replace cat = 1 if whatever-. It is simpler just to -gen cat = whatever-.
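
            A minimal sketch of both simplifications, assuming V1, V2, and V3 are numeric and coded as guessed in #5 (2 = private, 4 = self, 5 = family; 3 = no contract; 3 = not registered). The toy data and those codes are assumptions, not values from the real survey:
            Code:
            clear
            input byte(V1 V2 V3)
            2 3 1
            4 1 3
            5 1 1
            1 3 3
            end
            * chained comparisons, as in #5
            gen byte cat_long = (V1 == 2 | V1 == 4 | V1 == 5) & (V2 == 3 | V3 == 3)
            * same condition written with inlist() and created in one step
            gen byte cat_short = inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3)
            * stops with an error if the two versions ever disagree
            assert cat_long == cat_short
            list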



            • #7
              Clyde Schechter Well, I wrote the <I assume> part for exactly the reason you mention, i.e., I do not know how V1, V2, and V3 were coded. And I generated cat with value 0 because usually no one wants missing values. In this case Hossam Ali wants a dummy variable, so all observations that do not satisfy the condition should be assigned a single value other than 1 (the value for observations that do satisfy it). Thank you for telling me about inlist(); I have never used that function before.



              • #8
                And I generated cat with value 0 because usually no one wants missing values.
                You are absolutely correct that nobody wants missing values. The code -gen cat = whatever- does not generate missing values. It generates 1 when whatever is true and 0 when it is false.
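
                To illustrate the difference, here is a small sketch on made-up data (the values, and the codes 2, 4, 5, and 3, follow the assumed coding from #5 and are not taken from the real survey):
                Code:
                clear
                input byte(V1 V2 V3)
                2 3 1
                1 1 1
                . . .
                end
                * -if- version: observations that fail the condition are left missing
                gen byte cat_if = 1 if inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3)
                * expression version: every observation gets 1 (true) or 0 (false)
                gen byte cat_expr = inlist(V1, 2, 4, 5) & (V2 == 3 | V3 == 3)
                list
                Note that the expression version also assigns 0 to an observation whose inputs are all missing, because the comparisons evaluate to false. If such observations should stay missing, one option is to add a qualifier such as -if !missing(V1)- to the -gen- command.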



                • #9
                  Clyde Schechter Oh, I misread the command in your comment (I thought it was -gen cat=1 if whatever-). Thank you for letting me know.



                  • #10
                    I have applied it, @Clyde Schechter:
                    dataex a806 E09 register_actv in 1/10

                    ----------------------- copy starting from the next line -----------------------
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte(a806 E09 register_actv)
                     . .  .
                    99 3  3
                     . .  .
                     . .  .
                     . .  .
                     . .  .
                     . .  .
                     . .  .
                     2 2 98
                     . .  .
                    end
                    label values a806 employmentype
                    label def employmentype 2 "Private Sector", modify
                    label def employmentype 99 "OTHER", modify
                    label values E09 contract
                    label def contract 2 "oral", modify
                    label def contract 3 "noAgreement", modify
                    label values register_actv contractR
                    label def contractR 3 "Not Registered", modify
                    label def contractR 98 "IDK", modify
                    ------------------ copy up to and including the previous line ------------------



                    • #11
                      @JeongHoon Min Thank you. I have applied similar code using the same kind of -if- qualifier, but the results don't match the published paper I'm following.



                      • #12
                        dataex a806 E09 register_actv in 1/10

                        ----------------------- copy starting from the next line -----------------------
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input byte(a806 E09 register_actv)
                         . .  .
                        99 3  3
                         . .  .
                         . .  .
                         . .  .
                         . .  .
                         . .  .
                         . .  .
                         2 2 98
                         . .  .
                        end
                        ------------------ copy up to and including the previous line ------------------



                        • #13
                          The example data you posted does not contain any observations meeting the criterion you specified: "within ppl in private sector paid or self employed or in family company and working in an unregistered firm or have no contract". This means that the example does not show the numeric values of the variables that correspond to those categories. Please post back with example data that includes some observations that you want to identify, as well as some that you do not.

                          In the end, the code you want will look something like

                          Code:
                          gen byte wanted = (V1 == ???? & V2 == 3 & V3 == ????)
                          but the numbers needed to replace the ????s above are not shown in your data example.
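
                          One way to find those numbers, sketched under the assumption that the full survey data are in memory and that the variable and value-label names are the ones shown in #10:
                          Code:
                          * show the numeric code behind each value label
                          label list employmentype contract contractR
                          * tabulate the raw codes instead of the labels
                          tabulate a806, nolabel
                          tabulate E09, nolabel
                          tabulate register_actv, nolabel
                          * build an example from observations that have an employment type recorded
                          dataex a806 E09 register_actv if !missing(a806)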

