  • No observations r(2000) with Regress and SEM

    Hello all. I am working with a dataset in which many categorical variables are still stored as text, and I am trying to convert them to numbers. Initially, when I typed "summarize", no observations registered for any of the variables, even though I could see the values when I scrolled through the dataset. PCA and regression also failed on these variables because the data were still strings and needed to be converted to numbers. I fixed that by running generate newvar=real(oldvar) for every variable I planned to include in my analyses. After running real(), I was able to obtain summary statistics and observations for all of the variables I needed for my regression.

    I assumed the "no observations" error would then disappear when I ran a regression on the new numeric variables, but it did not. I can run PCA successfully on several of the variables (Vo1-Vo10), but the regression model still returns "no observations" when I include the now-numeric Race, Age, Education, Ethnicity, and Income variables. The same error arises when I try to fit an SEM with some of the variables. Has anyone found a fix for a "no observations" error that persists even after string variables have been converted to numeric? I'll copy and paste my code below:

    *Create race dummy variable with 1 as white, 0 as non white
    gen race=""
    replace race=Q101
    replace race="1" if Q101=="White"
    replace race="0" if Q101=="Black or African American"
    replace race="0" if Q101=="American Indian or Alaska Native"
    replace race="0" if Q101=="Asian"
    replace race="0" if Q101=="Native Hawaiian or Pacific Islander"
    replace race="0" if Q101=="Other"

    *Create ethnicity categorical variable
    gen ethnicity="."
    replace ethnicity=Q102
    replace ethnicity="1" if Q102=="Yes"
    replace ethnicity="0" if Q102=="No"

    *create income categorical variable
    gen income="."
    replace income=Q100
    replace income="1" if Q100=="Less than $19,999"
    replace income="2" if Q100=="$20,000 - $39,999"
    replace income="3" if Q100=="$40,000 - $59,999"
    replace income="4" if Q100=="$60,000 - $79,999"
    replace income="5" if Q100=="$80,000 - $99,999"
    replace income="6" if Q100=="$100,000 - $149,999"
    replace income="7" if Q100=="More than $150,000"

    *Create education categorical variable
    gen education="."
    replace education=Q99
    replace education="1" if Q99=="Less than high school"
    replace education="2" if Q99=="High school graduate"
    replace education="3" if Q99=="Some college"
    replace education="4" if Q99=="2 year degree"
    replace education="5" if Q99=="4 year degree"
    replace education="6" if Q99=="Professional degree"
    replace education="7" if Q99=="Doctorate"

    *Create age categorical variable
    gen age="."
    replace age=Q98
    replace age="1" if Q98=="16 - 24"
    replace age="2" if Q98=="25 - 34"
    replace age="3" if Q98=="35 - 44"
    replace age="4" if Q98=="45 - 54"
    replace age="5" if Q98=="55 - 64"
    replace age="6" if Q98=="65 - or older"

    *Code value orientation questions to prep for PCA
    gen vo1="."
    replace vo1=DT
    replace vo1="1" if DT=="Strongly disagree"
    replace vo1="2" if DT=="Somewhat disagree"
    replace vo1="3" if DT=="Neither agree nor disagree"
    replace vo1="4" if DT=="Somewhat agree"
    replace vo1="5" if DT=="Strongly agree"

    gen vo2="."
    replace vo2=DU
    replace vo2="1" if DU=="Strongly disagree"
    replace vo2="2" if DU=="Somewhat disagree"
    replace vo2="3" if DU=="Neither agree nor disagree"
    replace vo2="4" if DU=="Somewhat agree"
    replace vo2="5" if DU=="Strongly agree"

    gen vo3="."
    replace vo3=DV
    replace vo3="1" if DV=="Strongly disagree"
    replace vo3="2" if DV=="Somewhat disagree"
    replace vo3="3" if DV=="Neither agree nor disagree"
    replace vo3="4" if DV=="Somewhat agree"
    replace vo3="5" if DV=="Strongly agree"

    gen vo4="."
    replace vo4=DW
    replace vo4="1" if DW=="Strongly disagree"
    replace vo4="2" if DW=="Somewhat disagree"
    replace vo4="3" if DW=="Neither agree nor disagree"
    replace vo4="4" if DW=="Somewhat agree"
    replace vo4="5" if DW=="Strongly agree"

    gen vo5="."
    replace vo5=DX
    replace vo5="1" if DX=="Strongly disagree"
    replace vo5="2" if DX=="Somewhat disagree"
    replace vo5="3" if DX=="Neither agree nor disagree"
    replace vo5="4" if DX=="Somewhat agree"
    replace vo5="5" if DX=="Strongly agree"

    gen vo6="."
    replace vo6=DY
    replace vo6="1" if DY=="Strongly disagree"
    replace vo6="2" if DY=="Somewhat disagree"
    replace vo6="3" if DY=="Neither agree nor disagree"
    replace vo6="4" if DY=="Somewhat agree"
    replace vo6="5" if DY=="Strongly agree"

    gen vo7="."
    replace vo7=DZ
    replace vo7="1" if DZ=="Strongly disagree"
    replace vo7="2" if DZ=="Somewhat disagree"
    replace vo7="3" if DZ=="Neither agree nor disagree"
    replace vo7="4" if DZ=="Somewhat agree"
    replace vo7="5" if DZ=="Strongly agree"

    gen vo8="."
    replace vo8=EA
    replace vo8="1" if EA=="Strongly disagree"
    replace vo8="2" if EA=="Somewhat disagree"
    replace vo8="3" if EA=="Neither agree nor disagree"
    replace vo8="4" if EA=="Somewhat agree"
    replace vo8="5" if EA=="Strongly agree"

    gen vo9="."
    replace vo9=EB
    replace vo9="1" if EB=="Strongly disagree"
    replace vo9="2" if EB=="Somewhat disagree"
    replace vo9="3" if EB=="Neither agree nor disagree"
    replace vo9="4" if EB=="Somewhat agree"
    replace vo9="5" if EB=="Strongly agree"

    gen vo10="."
    replace vo10=EC
    replace vo10="1" if EC=="Strongly disagree"
    replace vo10="2" if EC=="Somewhat disagree"
    replace vo10="3" if EC=="Neither agree nor disagree"
    replace vo10="4" if EC=="Somewhat agree"
    replace vo10="5" if EC=="Strongly agree"

    *Convert model variables from string to numeric
    generate Race=real(race)
    generate Age=real(age)
    generate Education=real(education)
    generate Ethnicity=real(ethnicity)
    generate Income=real(income)
    generate Vo1=real(vo1)
    generate Vo2=real(vo2)
    generate Vo3=real(vo3)
    generate Vo4=real(vo4)
    generate Vo5=real(vo5)
    generate Vo6=real(vo6)
    generate Vo7=real(vo7)
    generate Vo8=real(vo8)
    generate Vo9=real(vo9)
    generate Vo10=real(vo10)
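
    The ten value-orientation blocks above apply the same five-level mapping to different source variables, so the pattern could also be written once in a loop. A minimal sketch, assuming DT through EC are the string source variables exactly as used above; the names vonum1-vonum10 are hypothetical and chosen only to avoid clashing with the variables already created:

    ```stata
    * Sketch only: vonum1-vonum10 are hypothetical names, not from the original code.
    local i = 1
    foreach v of varlist DT DU DV DW DX DY DZ EA EB EC {
        * Start from numeric missing; unmatched strings stay missing.
        generate byte vonum`i' = .
        replace vonum`i' = 1 if `v' == "Strongly disagree"
        replace vonum`i' = 2 if `v' == "Somewhat disagree"
        replace vonum`i' = 3 if `v' == "Neither agree nor disagree"
        replace vonum`i' = 4 if `v' == "Somewhat agree"
        replace vonum`i' = 5 if `v' == "Strongly agree"
        local ++i
    }
    ```

    Building the numeric variable directly skips the intermediate string copy and the later real() step.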

  • #2
    Well, you need to look into missing values in your regression variables. Remember that any observation that contains a missing value for any variable mentioned in the regression command is omitted from the analysis. It doesn't take much missing data scattered around the data set to eliminate large numbers of observations in this way, or even all observations.

    We often see posts here with this problem: the user doesn't understand why a regression is throwing a "no observations" error. The solution is usually the above. Some additional possibilities suggest themselves here. If you accidentally used one of the string variables in the regression command (perhaps forgetting to capitalize the first character of the name), then that variable, being a string, would be treated as missing in every observation, and that alone could account for the error. Also, remember that -real(x)- returns a missing value if x cannot be converted into a numeric value. Consequently, if a variable has a value that is mistyped (or has extraneous leading, trailing, or interior spaces, or a letter in the wrong case) and so does not exactly match any of the strings on the right-hand side of your -replace- commands, then those observations will have missing values and may contribute to the problem outlined in the first paragraph.
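
    A minimal sketch of how one might check for these possibilities, assuming the variable names from #1 (Q101 as the raw text, race as the recoded string, Race as the numeric result):

    ```stata
    * Sketch only: variable names are taken from the code in #1.
    * List every raw category, so stray spaces or case variants show up.
    tabulate Q101, missing

    * Raw values with leading or trailing blanks.
    count if Q101 != trim(Q101)

    * Non-empty strings that real() could not turn into a number.
    count if missing(Race) & !missing(race)
    list Q101 race if missing(Race) & !missing(race)
    ```

    A nonzero count on the last check would point to categories that none of the -replace- commands matched.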

    In any case, you provide no example of your data, so a specific answer cannot be given; when the data are imaginary, the code fixes must be imaginary as well. If the above generalities do not enable you to resolve your problem, post back showing an excerpt of your data. Be sure to use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      It also wouldn't surprise me if, say, you accidentally used race instead of Race or something like that. If you are prohibited from sharing the data at least show the commands and output.

      If a command like

      reg y x1 x2 x3

      bombs, I suggest copying it and replacing reg with sum. Then, if you are making a mistake with the variables, you'll see that mistake reflected in the sum results.
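
      A short sketch of that check, using the placeholder names y, x1, x2, and x3 from the command above; adding describe is an extra step, included here only to show storage types:

      ```stata
      * y, x1, x2, x3 are the placeholder names from the example above.
      * capture noisily lets a do-file continue past the error.
      capture noisily regress y x1 x2 x3

      * Same variable list: shows the non-missing count for each variable.
      summarize y x1 x2 x3

      * Storage types: a str# type means a string variable slipped in.
      describe y x1 x2 x3
      ```

      A string variable appears in the summarize output with 0 observations.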
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        One other thing to consider is that when using GSEM a capital letter indicates a latent variable, while regular variables in your dataset need to be named in lower case (at least the first letter). So it may be that you intended to use the numerically coded Race and Vo* variables, but when you typed them as race and vo*, Stata became confused and used the string versions of those variables instead.
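
        If the capitalized names really are observed variables in the dataset, sem and gsem accept a nocapslatent option that, as I understand the documentation, turns off the capital-letter convention. A sketch, assuming the variable names from #1 and a purely illustrative path specification:

        ```stata
        * Sketch only: the path below is illustrative, not the poster's model.
        * nocapslatent tells sem not to treat capitalized names as latent.
        sem (Vo1 <- Age Race Income Education Ethnicity), nocapslatent
        ```

        This addresses only the naming convention; it does not change how observations with missing values are handled.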



        • #5
          You should
          destring, force replace
          to force it to read numbers instead of strings.



          • #6
            I strongly disagree with the advice in #5. The use of the -force- option with -destring- is dangerous. If -destring- objects to converting a string variable to numeric there is usually a good reason why. At a minimum, if -destring- refuses to run without the -force- option you should carefully scrutinize all the observations that -destring- finds problematic. You can easily identify them:

            Code:
            browse varname if real(varname) == . & !missing(varname)
            This will show you the observations that are giving -destring- reservations about proceeding. If all of them are things that properly could translate to a numeric missing value, for example, "N/A", then by all means try again with -force- specified. But you might very well turn up data errors, say something like "123.4.5" which might be a typo for either 123.45 or 1234.5. Such things should be investigated and fixed, not clobbered into oblivion with -force-.

            Never use -destring, force- until you have verified that everything it objects to is truly a synonym for missing value; otherwise you are mutilating your data and introducing error.
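
            When there are many offending observations, a tabulation of the distinct problem strings can be easier to scan than -browse-. A sketch using the same placeholder varname as above; varname_num is a hypothetical name for the converted variable:

            ```stata
            * varname is the same placeholder as in the command above.
            * Tabulate the distinct strings that real() cannot convert.
            tabulate varname if missing(real(varname)) & !missing(varname)

            * After fixing genuine errors, try the conversion without -force-.
            * varname_num is a hypothetical name for the new numeric variable.
            destring varname, generate(varname_num)
            ```

            If -destring- still refuses, the tabulation shows exactly which strings remain to be dealt with.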



            • #7
              Thank you all! Here is an excerpt from my data showing the variables before and after they were converted to numbers with the real(x) function:

              input str151 vo1 str21 age str41 race str42 income str40 education str24 ethnicity float(Vo1 Age Race Income Education Ethnicity)
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "3" "0" "4" "1" "0" . 3 0 4 1 0
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "5" "1" "4" "2" "0" . 5 1 4 2 0
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "2" "0" "4" "6" "0" . 2 0 4 6 0
              "" "3" "1" "3" "2" "1" . 3 1 3 2 1
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "4" "" "" "" "" "" 4 . . . . .
              "" "5" "1" "5" "6" "0" . 5 1 5 6 0
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "4" "" "" "" "" "" 4 . . . . .
              "" "2" "1" "3" "5" "0" . 2 1 3 5 0
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "5" "0" "2" "3" "0" . 5 0 2 3 0
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "5" "0" "7" "3" "0" . 5 0 7 3 0
              "" "2" "0" "6" "5" "0" . 2 0 6 5 0
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "2" "0" "1" "3" "0" . 2 0 1 3 0
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "4" "" "" "" "" "" 4 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "2" "0" "2" "3" "1" . 2 0 2 3 1
              "4" "" "" "" "" "" 4 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "4" "" "" "" "" "" 4 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "5" "1" "7" "5" "0" . 5 1 7 5 0
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "2" "1" "2" "3" "0" . 2 1 2 3 0
              "" "3" "0" "" "4" "0" . 3 0 . 4 0
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "1" "1" "7" "3" "0" . 1 1 7 3 0
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "4" "" "" "" "" "" 4 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "2" "0" "2" "6" "0" . 2 0 2 6 0
              "" "" "" "" "" "" . . . . . .
              "5" "" "" "" "" "" 5 . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              "" "" "" "" "" "" . . . . . .
              end



              • #8
                Here are my outputs for "reg" and "sum" using the numeric versions of the variables

                reg Vo1 Age Race Income Education Ethnicity
                no observations



                . sum Vo1 Age Race Income Education Ethnicity

                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                         Vo1 |        703     4.84495    .4304963          1          5
                         Age |        398    4.027638    1.280247          1          6
                        Race |        391    .8337596    .3727733          0          1
                      Income |        383    4.229765    1.837495          1          7
                   Education |        396    3.878788    1.522973          1          7
                -------------+---------------------------------------------------------
                   Ethnicity |        389     .066838    .2500629          0          1


                **the max for the age variable is 6 because we put age into 6 classes



                • #9
                  The solution is what I suggested in the first paragraph back in #2.

                  Well, you need to look into missing values in your regression variables. Remember that any observation that contains a missing value for any variable mentioned in the regression command is omitted from the analysis. It doesn't take much missing data scattered around the data set to eliminate large numbers of observations in this way, or even all observations.
                  If you run
                  Code:
                  count if !missing(Vo1, Age, Race, Income, Education, Ethnicity)
                  the response from Stata (at least in the excerpt shown in #7) is 0. So the scattering of missing values in your data happens to leave you with at least one of these variables having a missing value in every observation.
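
                  To see how the missing values are distributed, something along these lines may help. This is a sketch using the variable names from #8; nmiss is a hypothetical name:

                  ```stata
                  * Sketch only; nmiss is a hypothetical variable name.
                  * Number of missing values per observation across the model variables.
                  egen nmiss = rowmiss(Vo1 Age Race Income Education Ethnicity)
                  tabulate nmiss

                  * Which combinations of variables are missing together.
                  misstable patterns Vo1 Age Race Income Education Ethnicity

                  * Does Vo1 ever overlap with the demographic variables?
                  count if !missing(Vo1) & !missing(Age)
                  ```

                  If no observation has nmiss equal to 0, -regress- has nothing to work with for this set of variables.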



                  • #10
                    Thank you Clyde! Would a possible solution be imputing data for missing values?



                    • #11
                      Not the question but your response is an ordered scale coded by integers 1 to 5. Does regress make sense at all? Does it make sense to treat income, age and education as numeric scales?

                      This may well be your first time doing this, and support from teachers and others may be essential here.
                      Last edited by Nick Cox; 03 Sep 2019, 11:16.



                      • #12
                        Hi Nick,

                        Thank you. regress, with the variables "Vo1 Age Race Income Education Ethnicity" returns the same output of no observations, unfortunately. I've narrowed it down to the "Vo1" variable being the one with all of the observations returning as missing. When I run a regression using other variables that I've fixed with real(x), I get results that look okay:

                        regress Income Age Race Education Ethnicity

                              Source |       SS           df       MS      Number of obs   =       363
                        -------------+----------------------------------   F(4, 358)       =     17.05
                               Model |   194.99841         4  48.7496025   Prob > F        =    0.0000
                            Residual |  1023.68479       358  2.85945471   R-squared       =    0.1600
                        -------------+----------------------------------   Adj R-squared   =    0.1506
                               Total |   1218.6832       362  3.36652816   Root MSE        =     1.691

                        ------------------------------------------------------------------------------
                              Income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 Age |   .0521024   .0713505     0.73   0.466    -.0882163    .1924212
                                Race |   .6599524   .2438153     2.71   0.007     .1804622    1.139443
                           Education |   .4442883   .0592531     7.50   0.000     .3277603    .5608162
                           Ethnicity |  -.6331451    .360429    -1.76   0.080    -1.341969    .0756792
                               _cons |   1.815065   .4138526     4.39   0.000     1.001178    2.628953


                        I've successfully run ordinal logistic regression and done SEM on a different data set, but I'm working with someone else's data here, and it's organized very differently from how I would have organized my own data, including the construction of the numeric scales for income, age, and education.
                        Last edited by Matty Cleary; 03 Sep 2019, 11:43.



                        • #13
                          Most of my confusion comes from the fact that when I call up an excerpt of these problem variables, it shows that there are observations there:

                          input float(Vo1 Vo2 Vo3 Vo4 Vo5)
                          . . . . .
                          5 5 4 5 5
                          . . . . .
                          . . . . .
                          5 5 5 5 3
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 4 3 3 5
                          5 4 5 4 4
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 5 5 5
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          4 4 4 4 4
                          . . . . .
                          5 5 5 5 5
                          . . . . .
                          5 5 5 1 3
                          . . . . .
                          . . . . .
                          . . . . .
                          5 4 4 4 4
                          . . . . .
                          . . . . .
                          4 4 3 4 2
                          . . . . .
                          5 5 3 3 3
                          5 5 5 5 1
                          . . . . .
                          . . . . .
                          . . . . .
                          5 2 4 2 4
                          5 5 5 5 2
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 5 5 5
                          5 5 5 5 5
                          . . . . .
                          . . . . .
                          5 5 3 5 3
                          5 5 5 5 2
                          . . . . .
                          . . . . .
                          5 4 5 5 5
                          5 5 5 5 5
                          . . . . .
                          4 5 3 4 3
                          5 5 4 4 4
                          . . . . .
                          5 5 5 5 1
                          5 5 5 5 2
                          5 5 3 3 3
                          . . . . .
                          4 4 4 2 3
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 5 4 3
                          5 5 5 5 1
                          4 4 4 4 3
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 5 5 3
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 5 5 4
                          4 4 3 1 3
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          . . . . .
                          5 5 4 5 2
                          . . . . .
                          . . . . .
                          . . . . .
                          end


                          And I am able to run a regression using variables with a similar number of missing values. Unfortunately, I don't have a good enough handle on how Stata thinks to understand why these variables (Vo1 - Vo10) don't work in a model when the sociodemographic variables that I also fixed using real(x) do. Thank you all again for your help!



                          • #14
                            Would a possible solution be imputing data for missing values?
                            Missing data is a problem for which good solutions are seldom available. The ideal solution is to find the actual values, but if that were feasible it probably would have been done in the first place. Single imputation methodologies have fallen out of favor, for good reason. Multiple imputation may be a solution. But bear in mind that its ability to reduce bias relies on the assumption of missingness at random (MAR)--an assumption that is, by its very definition, untestable within the existing data. So you need to tell yourself a convincing story about the missingness mechanism and the relationships among the variables that supports the notion that MAR is credible. Can you do that in this situation?



                            • #15
                              Observations with missing values on all variables are no use to you here. Otherwise there is no mystery or deep idea about how Stata works here. Clyde Schechter explained in #9. Feed a bunch of variables to regress and only observations with non-missing values on all variables mentioned get included in the regression, and if there are none then no regression takes place.
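
                              One way to see this rule in action is with the model from #12, which does run. A sketch, assuming the variable names used there:

                              ```stata
                              * Complete cases for the model from #12.
                              count if !missing(Income, Age, Race, Education, Ethnicity)

                              * e(sample) marks the observations the last estimation command used.
                              regress Income Age Race Education Ethnicity
                              count if e(sample)
                              ```

                              Both counts should agree with the Number of obs in the regression header; running the first count with Vo1 added to the list shows how many observations remain for the model that fails.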
