  • Predicting out-of-sample

    Dear Statalist

    I am somewhat new to Stata and in the midst of my bachelor's thesis in political science. I have a dataset with 23 countries (i.e. 23 observations) and only a few variables.

    I have an index score variable for 23 countries and wish to predict values for another index (it measures the same thing; the goal here is comparison), using the regression equation between my index and the other index. The other index only has values for 11 countries.

    So what I wish to do is use the relationship between the two indexes in the 11 countries that have scores on both to predict values of the other index for the remaining 12 countries, which have no score on it.

    Am I making myself understandable and can someone explain to me what to do? :-)

    I am using the predict command after running a regression "reg my_index other_index" and am told by Stata that it has predicted 12 values.

    But when I look at the new variable with the predicted values, it has only predicted for the 11 countries which I already have values for?

    I hope someone can help this Stata newbie!

    Best regards from Copenhagen, Denmark

  • #2
    Dear Astrid
    I think it would be a good idea to show us some of the code you have written along with some of the data you have, so we can try to reproduce what you have done. It will help to identify the problem.
    Best
    Christophe


    • #3
      Dear Christophe
      Thank you for your reply!
      I have uploaded a screenshot of my data. Taxrate is self-explanatory, cntry is country, decom is my index of decommodification, welcha is welfare chauvinism (the idea that immigrants shouldn't have access to social benefits) and ea_decom and sa_decom are the indexes I wish to predict values for. As you can see, they have missing values for 12 countries.

      I have also uploaded a picture of the new variable with predicted values, ea_decom1.

      I have really only written:
      "reg decom ea_decom if ea_decom!=.
      predict ea_decom1"

      I specify "if" in the regression so that it is based only on countries with values for both indexes. My index serves as the dependent variable here and the ea-index serves as the independent variable.

      I hope this was more useful; let me know otherwise. The solution may be really simple, but I have only worked with Stata in my 2nd year for the methods courses, and not for true independent analysis before. So I would much appreciate any help.


      • #4
        If decom is the index for which you have complete data, and ea_decom the index with missing values, then you have run the wrong regression. You want to predict values of ea_decom using values of decom, so your code should have been
        Code:
        reg ea_decom decom
        predict ea_decom1
        Note that I left off the if clause because reg will automatically drop from its calculations any observations for which any of the dependent or independent variables are missing.
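
A minimal sketch of why this works, using made-up toy data rather than the poster's dataset (the country codes, the values, and the names insample and ea_decom_filled are all hypothetical). After regress, predict computes fitted values for every observation where the right-hand-side variable is nonmissing, whether or not that observation was used in estimation. In the original regression the right-hand-side variable was ea_decom, so the prediction was missing exactly where ea_decom was missing.

```stata
* Toy data (hypothetical values): decom is complete,
* ea_decom is missing for the last two countries
clear
input str2 cntry decom ea_decom
"AA" 20 22
"BB" 25 27
"CC" 30 31
"DD" 35 38
"EE" 40 .
"FF" 28 .
end

* regress silently uses only the four complete observations
regress ea_decom decom

* predict (default: linear prediction) fills in every row where
* decom is nonmissing, including the two out-of-sample rows
predict ea_decom1

* Flag the rows that were actually used in estimation
generate byte insample = e(sample)

* Keep the observed value where it exists, otherwise use the prediction
generate ea_decom_filled = cond(missing(ea_decom), ea_decom1, ea_decom)

list cntry decom ea_decom ea_decom1 insample ea_decom_filled
```

Keeping a flag such as insample makes it easy to tell observed scores from predicted ones later, for example when plotting or reporting the comparison.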

        For future reference, please read the FAQ linked to at the top of this page, especially the section 12 instructions on posting data, code, and results. Screenshots are not recommended.


        • #5
          Wow, it really was as simple a solution as I feared. Thank you so much. I feel a bit silly now, but at least my analysis can move forward.

          And I will definitely read the FAQ before I post again. Thank you :-)
