  • Help Generating a Feeling Thermometer

    Hi all,

    I have some experience with STATA, but I am far from an expert. I have a set of eight variables (from a larger dataset that I did not code), five of which come from one survey question asking people to assess different aspects of the US government and its response to Hurricane Maria. The responses to these five variables fall into five categories, from excellent (coded 1) to poor (coded 5). The other three are formatted differently but still ask about the government's response: two have the response options Better, Worse, or About the same, and the other is dichotomous.

    What I was hoping to figure out on my own was how to create a single feeling thermometer from these responses, as one is not included in this dataset. Ideally the thermometer would descend from the top score, where an individual had responded positively to all eight questions. The second category would include the positive responses from the three differently formatted questions, the middle category would include only the ambivalent response option, and so on. Those who answered in varied ways would sit between their associated categories, so as not to lose them. I realize this would not be a feeling thermometer as traditionally done, but more like an assigned score that is being curved, where someone scoring 1's across the board has the most affinity for the government, working its way down.

    Now I know I could use egen to create a variable from the row totals or its other options, but unless I'm mistaken there is no way to create a thermometer from this. I was also unable to find a post about this on the forum. I could instead compare people who answered one way across all the questions, but that would require me to reduce the variability in the responses from the first five categories, which I don't want to do. I am trying to make a scaled set of responses in order to later use a linear regression model to predict the likelihood of a statehood vote in a future referendum (yes, I am aware of the bill proposed in the US Congress just the other day; times change, but submitted abstracts are forever).

    So using the dataex command for these variables I get the following output. Unfortunately I am unsure where to start, as I have realized that this is over my head and outside of what I have been taught. If this is far too difficult to answer, please let me know so I can adjust my plans accordingly.

    Variables q16a-e are the first five mentioned above, q17 is the dichotomous variable, and q18 and q19 are the three-option questions. The count here is only ten since the varname coding is so long, but I can always provide more observations. I apologize in advance if I haven't articulated my question very well.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(q16a q16b q16c q16d q16e q17 q18 q19)
    6 6 3 3 4 2 3 1
    5 5 5 5 5 1 3 3
    4 4 4 4 5 2 2 1
    4 5 5 4 5 2 2 1
    2 4 2 3 2 1 2 1
    4 4 4 3 5 2 2 1
    3 3 3 3 5 1 2 1
    1 1 1 1 4 2 3 1
    5 5 5 5 5 1 2 1
    1 4 3 4 4 2 2 1
    end
    label values q16a q16a
    label def q16a 1 "Excellent", modify
    label def q16a 2 "Very good", modify
    label def q16a 3 "Good", modify
    label def q16a 4 "Fair", modify
    label def q16a 5 "Poor", modify
    label def q16a 6 "Don't know", modify
    label values q16b q16b
    label def q16b 1 "Excellent", modify
    label def q16b 3 "Good", modify
    label def q16b 4 "Fair", modify
    label def q16b 5 "Poor", modify
    label def q16b 6 "Don't know", modify
    label values q16c q16c
    label def q16c 1 "Excellent", modify
    label def q16c 2 "Very good", modify
    label def q16c 3 "Good", modify
    label def q16c 4 "Fair", modify
    label def q16c 5 "Poor", modify
    label values q16d q16d
    label def q16d 1 "Excellent", modify
    label def q16d 3 "Good", modify
    label def q16d 4 "Fair", modify
    label def q16d 5 "Poor", modify
    label values q16e q16e
    label def q16e 2 "Very good", modify
    label def q16e 4 "Fair", modify
    label def q16e 5 "Poor", modify
    label values q17 q17
    label def q17 1 "Priority", modify
    label def q17 2 "Not a priority", modify
    label values q18 q18
    label def q18 2 "Worse", modify
    label def q18 3 "About the same", modify
    label values q19 q19
    label def q19 1 "Better", modify
    label def q19 3 "About the same", modify
    label var q16a "16a. How would you rate the job The federal government has done in responding to" 
    label var q16b "16b. How would you rate the job President Trump has done in responding to Hurric" 
    label var q16c "16c. How would you rate the job The Puerto Rican government has done in respondi" 
    label var q16d "16d. How would you rate the job Governor Rossello has done in responding to Hurr" 
    label var q16e "16e. How would you rate the job Your municipal government and mayor has done in " 
    label var q17 "17. Do you think the rebuilding of Puerto Rico is a priority for the U.S. federa" 
    label var q18 "18. In your opinion, was the federal government's response to Hurricane Maria in" 
    label var q19 "19. In your opinion, do you think the federal government's response to Hurricane"

  • #2
    Welcome to the Stata Forum / Statalist.

    Thanks for sharing data under code delimiters and for using - dataex - for that matter. This shows you have read the FAQ and acted accordingly.

    That being said, if I understood right, it seems you wish to create a "new" scale. This means creation plus validation, and you must have some expertise to delve into such issues.

    Also, if I understood right, you don't have the variable which reflects this scale, but you want it to be a reflection taken from the questions.

    That being so, you may think about using structural equation models (SEM), particularly the generalized approach (GSEM).

    If you want to learn more about this, please type - help gsem - in the Command window. There is also a whole (free!) Stata manual on this topic.

    Hopefully that helps.
    Best regards,

    Marcos


    • #3
      Hi Marcos,

      Thank you for the welcome. Yes you have understood correctly what I am looking to do.

      GSEM seems to be the approach I need. I have a (hopefully) quick question about the syntax for the GSEM model. From the help page/manual I have gathered that the generalized syntax is
      Code:
       gsem (x1<-X) (x2<-X) (x3<-X) (x4<-X)
      and there is some obvious variability depending on the model required.

      My question is: for my data, what would be X? If x1 is equivalent to my variable q16a and x2 is equivalent to q16b etc., then I am not sure what variable would constitute X. Or am I misunderstanding something, and in the syntax X is just X? I realize this is probably quite an elementary question, but the GSEM and SEM commands are new to me, so I want to be sure I am understanding correctly.

      Thank you for your response; I really appreciate it!

      Thank you,

      Harrison


      • #4
        X is the latent variable. Things can go fast, but up to a reasonable point. Please read the manual. It would be nice to take a look at some tutorials. GSEM is not a model to be grasped in a matter of minutes. Be brave and the reward will be great.
        Best regards,

        Marcos


        • #5
          Thank you for your response!

          Perhaps I should have been a touch more specific with my question. I know that I need to run gsem with the mlogit option if my latent variable is categorical and not continuous or binary. However, I do not have a latent variable to use. All the variables are categorical and observed in the data.

          So I guess to rephrase my question: since I do not have a variable that represents the scale I am trying to make, which would be the latent variable, and since this scale does not exist within the dataset (as befits a latent variable), how exactly do I proceed?

          I am assuming I need to create the latent variable, which means generating a variable that represents the scale relative to the questions, which is what I am already trying to do. Hence my confusion as to what the latent variable is for my data and my end goal.


          Thanks for your time,

          Harrison


          • #6
            As I understand it, latent variables are what application of your model is trying to uncover (discover?). They aren't provided or specified by you.

            As broad strategic advice from someone who has never used them, I'd not expect to implement SEMs myself without undertaking first about one week's reading of some texts and papers. Much depends in practice on how much statistics you know. Also, SEMs don't seem the right tool if theory is weak.

            I want to go back to #1. It strikes me that you want to mush together data that are kept separate. You have some 5-point, some 3-point, some 2-point variables. There aren't clean and clear ways to combine those without some specific rationale. If you are going to combine them, which I fear won't be helpful, then at least rescale each first to (value - minimum) / range.
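
            A minimal sketch of that rescaling, should you go that way. It assumes the variable names and codings from the dataex example in #1, treats "Don't know" (6) as missing, assumes q18 and q19 are coded 1 "Better", 2 "Worse", 3 "About the same" (inferred from the labels shown), and weights all items equally. The names o_q18, o_q19, r_* and score100 are hypothetical, and none of these choices is a recommendation.

            Code:
            * assumption: "Don't know" (6) carries no rating, so set it to missing
            mvdecode q16?, mv(6)
            * assumption: swap codes 2 and 3 so "About the same" sits in the middle
            recode q18 q19 (2=3) (3=2), generate(o_q18 o_q19)
            foreach v of varlist q16? q17 o_q18 o_q19 {
                quietly summarize `v'
                * (value - minimum) / range from the observed min and max, reversed so 1 = most favourable
                generate double r_`v' = 1 - (`v' - r(min)) / (r(max) - r(min))
            }
            * equal-weight mean of the nonmissing items, put on a 0-100 scale
            egen double score100 = rowmean(r_*)
            replace score100 = 100 * score100
            Whether such a score means anything still depends on the rationale for combining the items.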

            That's all pretty negative but I can't see in #1 a precise statistical goal: "a feeling thermometer" isn't something I have heard of. What would you do with it if you got it -- a simple but possibly also hard question?
            Last edited by Nick Cox; 01 Apr 2019, 06:54.


            • #7
              By my understanding, a feelings thermometer is usually one question on an issue. It's a visual analog scale where a respondent indicates their position on the scale.

              The question asks about developing a synthetic feelings thermometer or similar scale from 8 questions (5 ordinal, 3 binary). I assume the questions deal with the US government's response to the damage to Puerto Rico from Hurricane Maria.

              Elaborating on Marcos' recommendation, one logical thing to do might be to use item response theory, specifically irt hybrid. That command actually calls on gsem to fit the model. Responding to Harrison's question in post 3, you would be assuming that there's one latent variable that causes responses to the 8 questions. (NB: the gsem syntax as typed in post 3 would treat each question as if it were Gaussian, which is probably not what you would want to do.) I'd caution, as Nick did, that IRT requires some background reading.
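
              A rough sketch of what the irt hybrid call might look like, assuming the full dataset is in memory and the codings are as shown in #1. Post #1 describes q18 and q19 as three-option items, so this sketch treats them as ordinal alongside q16a-q16e and fits a 2PL model to q17 only. The recoding steps and the names ord_q18, ord_q19, b_q17 and theta are assumptions of the sketch.

              Code:
              * assumption: "Don't know" (6) is set to missing
              mvdecode q16?, mv(6)
              * assumption: q18/q19 coded 1 Better, 2 Worse, 3 About the same; swap 2 and 3 to order them
              recode q18 q19 (2=3) (3=2), generate(ord_q18 ord_q19)
              * 0/1 indicator for the dichotomous item (1 = "Priority")
              generate byte b_q17 = (q17 == 1) if !missing(q17)
              * graded response model for the ordinal items, 2PL for the binary item
              irt hybrid (grm q16? ord_q18 ord_q19) (2pl b_q17)
              * empirical Bayes prediction of the latent trait
              predict theta, latent
              Because 1 is "Excellent" and 5 is "Poor", check the signs of the discrimination parameters before interpreting the direction of theta.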

              Furthermore, if you want to use that latent variable in a further regression (i.e. as if you were fitting a linear regression to that latent variable), you would have to write the whole model in gsem syntax, and it can take a long time to maximize. Some sample syntax might look like:

              Code:
              gsem (Satisfaction -> q16?, ologit) (Satisfaction -> q17 q18 q19, logit) (Satisfaction <- x1 x2 x3)
              Alternatively, you could just estimate the latent variable with IRT, predict its value, then take the predicted values and throw them into a linear regression, e.g.

              Code:
              gsem (Satisfaction -> q16?, ologit) (Satisfaction -> q17 q18 q19, logit)
              predict satisfaction, latent
              regress satisfaction x1 x2 x3
              This, however, would ignore the fact that you're not certain about each person's value of satisfaction. It might not go too far wrong, but you might want to look up how IRT has been used in political science to get a sense of whether this is an acceptable practice. It's certainly not best practice.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


              • #8
                Hi Nick,

                Feeling thermometers are used in political science survey research to assess how 'warmly' or 'coolly' respondents feel toward a group or organization. They have been used by the American National Election Studies (ANES, previously the NES) since at least the 1960s, as well as by other survey houses like Pew. When asked in a survey, the question has respondents rate a group on a scale from 0 to 100, where 0 is cold and 100 is hot, like a thermometer.

                The goal of making one from the variables mentioned above is to create a dependent variable for regression analysis. I want to observe the independent effect that different variables from my dataset have on this constructed thermometer. Constructing this thermometer would reduce the amount of regression analysis needed to assess how respondents feel toward the federal government as a whole. Now I may be wrong about this, but I have assumed that if I use multiple variables to construct it I will end up with a more robust thermometer.

                I appreciate your point about doing more research on how SEMs work; it is something I plan on doing in the coming week. I suppose my understanding of latent variables is becoming confused in that regard because in the -help gsem_command- window (not the pdf, which I have also been reading) all the variables are given for the examples. From what I can tell there is no latent variable in the following example. The pdf, however, gives the syntax in terms of X's and Y's, which is where I found the general syntax given in #3; hence my confusion.

                Code:
                help gsem_command
                webuse gsem_lbw
                gsem (low <- age lwt i.race smoke ptl ht ui), logit
                The way I perhaps naively imagined doing this is to take the responses to all eight variables and set them to a 0-100 scale. At the top (100) of the scale would be an individual who has given all affirmative responses; for all three variable types this is coded as 1. At the bottom of the scale would be an individual who has answered negatively to all eight questions. The value of setting it to a 0-100 scale is the overlap: multiple individuals could score a 100 or a 73, etc.; likewise, some scores may not be taken.

                Thanks for your time,

                Harrison


                • #9
                  "I suppose my understanding of latent variables is becoming confused in that regard because in the -help gsem_command- window (not the pdf, which I have also been reading) all the variables are given for the examples. From what I can tell there is no latent variable in the following example. The pdf, however, gives the syntax in terms of X's and Y's, which is where I found the general syntax given in #3; hence my confusion."

                  Sorry, but after taking a (close) look at the - help gsem - command, and after carefully reading (not thumbing through) the Stata Manual, the concept of latent variable should have been grasped.

                  Indeed, I wish to corroborate the advice given in #4:

                  "GSEM is not a model to be grasped in a matter of minutes. Be brave and the reward will be great."
                  Best regards,

                  Marcos


                  • #10
                    Hello Everyone,

                    Weiwen: Thank you for your response, and I apologize that I did not see it before typing my own response explaining feeling thermometers; I must not have refreshed my browser before posting. I will start reading about the IRT hybrid approach; I had not made it that far in the SEM reference manual yet. I will also make sure that it is an acceptable way to analyze data for political science purposes.

                    Since Satisfaction here is capitalized, I'm assuming that this is the latent variable. Since it is not defined in my dataset, is it just something you can enter into the command, which then defines it in the process? I do know satisfaction levels for different aspects of the federal government as well as the federal government itself; these are the 5 ordinal variables. Variable q16a asks respondents to rate the federal government's response to Hurricane Maria, where 1 is excellent, 2 very good, 3 good, 4 fair, 5 poor. Perhaps I have wrongly assumed that adding more variables would make the thermometer more robust, at the expense of any and all ease.

                    Marcos: Contrary to your belief that I am just thumbing through everything, I am reading with the intent to understand this command. I have a far less advanced background in statistics than you and many of the others on this forum, and it is quite confusing. Like Nick said, it will take a week at least, and I do not expect to, nor have I claimed to, understand this concept at once. The reason I pointed out what I understand to be a discrepancy between the -help gsem_command- example and the manual is that all of the variables in the given example are observed. None of them are capitalized to indicate that they are latent. The manual points out that a latent variable:

                    "...is not observed. A variable is latent if it is not in your dataset but you wish it were. You wish you had a variable recording the propensity to commit violent crime, or socioeconomic status, or happiness, or true ability, or even income. Sometimes, latent variables are imagined variants of real variables, variables that are somehow better, such as being measured without error. At the other end of the spectrum are latent variables that are not even conceptually measurable." (p. 50) It goes on to point out that these variables are typed with a capital letter so STATA can distinguish latent from observed variables in this command language.

                    When I asked about the basic syntax, what I did not understand was this: since X is a part of the very generalized syntax I gave in #3, doesn't it need to exist in some way in my dataset to run the command, even as just a defined variable name? Or is the command calculating and thus defining the latent variable? Perhaps I have missed this in the first 60 or so pages of the manual, and if so I am sorry.

                    I really do appreciate everyone's responses thus far, and I apologize if I do not meet the standard intellectual level for posting to this forum.

                    Thanks for your time,

                    Harrison Angelini


                    • #11
                      Harrison,

                      Latent variables are things that we can't directly measure. For example, you can directly measure someone's blood pressure, or their height or weight. In contrast, in your situation, I'm assuming you are trying to get a sense of how people feel about the US government's response to Hurricane Maria. You asked them a number of questions to try to infer that attitude. That attitude is a latent variable.

                      When you run a Stata command (note: it's not STATA) using latent variables, the program assumes that anything whose first letter is capitalized is a latent variable (you can change this default if you need).
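
                      For example, a sketch using gsem's nocapslatent and latent() options, with a hypothetical lowercase name satis for the latent variable:

                      Code:
                      * satis is declared latent explicitly, so capitalization no longer matters
                      gsem (satis -> q16?, ologit), nocapslatent latent(satis)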

                      In the case of IRT, you normally assume that the latent variable is normally distributed, with a mean of 0 and a variance of 1. This means that a value of 0 represents the average feeling towards the government's response. Actually, my syntax should have been:

                      Code:
                      gsem (Satisfaction -> q16?, ologit) (Satisfaction -> q17 q18 q19, logit) (Satisfaction <- x1 x2 x3), variance(Satisfaction@1)
                      Mean of 0 is implied by default.

                      You don't need to meet any sort of "standard intellectual level" to post on the forum. However, this is a relatively complex sort of model, and the syntax for gsem is a bit complicated. I'd use the irt hybrid command if you go this route.