  • How do I clean my data?

    Hi there, I need help with my data. After downloading it from Google Forms, I imported it into Stata, but my variable labels are showing as the full survey questions. What can I do, please?

    [Attached screenshot: Screenshot 2023-04-13 191014.png]

  • #2
    Hey Omotola. Glad to see you've finally joined Statalist. Please give an example of your data using the dataex command. Specifically, show us the output of
    Code:
    dataex, varlabel
    so that we may see your dataset exactly as you see it. There's an easy solution to this, but you have to help us help you.


    • #3
      1) The variable label can be overwritten with another label. For example, assuming the desired label is "Respondent's gender":

      Code:
      label variable Whatgenderdoyouidentifyw "Respondent's gender"
      2) The very long variable name will be tedious to type during the analysis. It can be renamed to something shorter, like "gender":

      Code:
      rename Whatgenderdoyouidentifyw gender
      3) More concerning, it's a string variable up to 54 characters wide, which means some of the responses are much longer than "Female". As you can see, this variable has 13 different "Unique values". You'd have to run a tabulation using:

      Code:
      tabulate Whatgenderdoyouidentifyw
      and may then have to recode this variable into something that is easier to analyze. (If you have already renamed the variable as in step 2, tabulate gender instead.)

      To conclude, this is going to take a lot of time to investigate and clean. In future, questions with a limited set of options like this are better set up as multiple-choice questions in online questionnaires than as open-ended write-in questions.
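
A minimal sketch of the recoding described in step 3, assuming the variable has already been renamed to gender as in step 2. The response spellings ("female", "woman", "f", and so on) and the new names gender_clean, gender_num, and gender_lbl are hypothetical; the actual 13 values have to be read off the tabulate output first.

```stata
* Sketch only: the spellings matched below are assumptions, not the poster's data.
* Standardize case and surrounding blanks so "Female " and "female" match
generate gender_clean = strlower(strtrim(gender))
tabulate gender_clean, missing

* Collapse the free-text answers into a small numeric variable
generate byte gender_num = .
replace gender_num = 1 if inlist(gender_clean, "female", "woman", "f")
replace gender_num = 2 if inlist(gender_clean, "male", "man", "m")
* Any other non-empty answer goes to a catch-all category
replace gender_num = 3 if missing(gender_num) & gender_clean != ""

* Attach readable labels to the numeric codes
label define gender_lbl 1 "Female" 2 "Male" 3 "Other"
label values gender_num gender_lbl
tabulate gender_num, missing
```

Empty responses stay missing in gender_num, so the final tabulate shows how many answers still need a decision.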


      • #4
        Originally posted by Jared Greathouse (#2):
        Thank you so much for sharing this platform with me. I am excited that I can get answers to my question!


        • #5
          Originally posted by Ken Chui (#3):
          Thank you, Ken. I will try these steps now.


          • #6
            Originally posted by Jared Greathouse (#2):
            [Attached screenshot: Screenshot 2023-04-13 .png]


            • #7
              Do what I wrote above, but include only 4 variables in the dataex command. Do not use screenshots: we can barely read them, and they don't help us help you.


              • #8
                Originally posted by Ken Chui (#3):
                This solved it. I really appreciate it, Ken. The instructions were that the survey must include an open-ended question and a contingent question.


                • #9
                  Originally posted by Jared Greathouse (#7):
                  I ran the command again and got error r(1000): the input statement exceeded the linesize limit, with the suggestion to try specifying fewer variables.


                  • #10
                    I am running regress Craving gender, but I got the error below. I apologize for attaching a screenshot. What do I need to do?
                    [Attached screenshot: Screenshot 2023-04-13 230314m.png]


                    • #11
                      Show us the result of

                      Code:
                      describe
                      and

                      Code:
                      dataex Cravings gender


                      • #12
                        Originally posted by Omotola Oladapo (#10):
                        The error is probably because the gender variable is still in string format (that is, it's entered as characters rather than being numerically coded). Many Stata commands do not accept string variables, and regression (regress) is one of them. It will not accept a string dependent or a string independent variable. It will simply say "no observations."

                        My post in #3 was perhaps not explicit enough. To "clean" this data and make it ready for analysis, most of the variables will need to be converted to numeric form.
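
One way to check which variables are still strings before running a model, as a rough sketch. It assumes the two variables are named Cravings and gender, as in #11; adjust the names to match the dataset.

```stata
* List every string variable in the dataset
ds, has(type string)

* Show the storage type of the two regression variables:
* str# is a string; byte, int, long, float and double are numeric
describe Cravings gender

* confirm exits with an error when the variable is not numeric;
* capture suppresses that error and leaves the return code in _rc
capture confirm numeric variable gender
if _rc {
    display "gender is a string variable; regress will not use it"
}
```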


                        • #13
                          Originally posted by Ken Chui (#12):
                          Thank you, Ken. I am new to data cleaning. Can you please explain how I can change the characters to numbers?


                          • #14
                            Originally posted by Andrew Musau (#11):
                            Andrew, this is the result of dataex:

                            [Attached screenshot: Screenshot 2023-04-15 000817.png]


                            • #15
                              I found online that I can use the encode command (encode gender, generate(gender2)), and did this for all the variables. I was then able to run the regress command. Thank you all for your support!
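
For later readers, a short sketch of what encode does and where it can mislead. It assumes gender holds text categories and that Cravings holds digits stored as text; neither is confirmed in this thread.

```stata
* encode maps each distinct string to an integer code (1, 2, ...)
* in alphabetical order and stores the strings as a value label
encode gender, generate(gender2)

* Show which code stands for which response
label list gender2

* Assumption: Cravings is digits stored as text. destring keeps the
* actual numbers, whereas encode would replace them with codes.
destring Cravings, replace

* i. tells regress to treat gender2 as categorical, not continuous
regress Cravings i.gender2
```

Because encode numbers the distinct strings alphabetically, a variable containing "10", "2" and "3" would get the codes 1, 2 and 3 rather than the values 10, 2 and 3, so destring is the safer choice for variables that are really numbers.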
