
  • How can I put together a triple loop (foreach), varying (I) the model, (II) the independent variable, and (III) the dependent variable?

    The way I imagined this working best is running the models in the outer loop, the IVs in the middle, and the DVs in the inner loop. To be specific:

    Outer Loop: The three models I want to test are Pooled OLS, Fixed Effects GLS, and Random Effects GLS.
    Middle Loop: I want to test three independent variables (named IUI, IUI2, and PUI) in combination with a list of control variables.
    Inner Loop: A set of six different dependent variables.

    I tried to code the three loops, but I get various errors. I think I have figured out the outer and middle loops:

    foreach reg in "reg" "xtreg" "xtreg,fe" {
        foreach regressor in "IUI" "IUI2" "PUI" {
            ?
        }
    }

    So here's the question: how exactly do I write the commands? I know I could also put IUI, IUI2, and PUI together into a local macro, but that wouldn't change the underlying loops; a sketch of what I mean is below.
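    What I have in mind for the macro version is roughly this (just a sketch, not tested):

    local ivs IUI IUI2 PUI
    foreach iv of local ivs {
        display "Regressor: `iv'"
    }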
    By the way, I hope to find a correlation between Internet Use Intensity and Happiness in China.

    Thanks!

  • #2
    I think the following will do this:

    Code:
    foreach dv of varlist dv1-dv6 {    // LOOP OVER OUTCOMES
        foreach iv of varlist IUI IUI2 PUI { // LOOP OVER PREDICTORS
            foreach cmd in xtreg reg { // LOOP OVER ANALYSES
                if "`cmd'" == "xtreg" {
                    local options re fe
                    foreach o of local options {
                        `cmd' `dv' `iv', `o'
                    }
                }
                else {
                    `cmd' `dv' `iv'
                }
            }
        }
    }
    Note: as no sample data was shown, this code is not tested. Beware of typos, unbalanced braces, quotes, etc. Also note that the commands as written do not include any variables other than those mentioned in the loops; it should be clear how to add the others in.

    The alternation between -reg-, -xtreg-, and -xtreg, fe- is not straightforward. The code is actually simplified by being explicit about -re-, as opposed to just letting -xtreg- default to random effects. It isn't possible to make the code for -reg- completely parallel to that of -xtreg- because there is no option analogous to -re- or -fe- that can be tacked on at the end. And setting local options to the null string won't do it, because then -reg- would be skipped altogether.
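    To illustrate why the null-string trick fails: if the local is empty, the inner loop iterates zero times, so nothing inside it ever executes.

    Code:
    local options            // deliberately empty
    foreach o of local options {
        reg `dv' `iv', `o'   // never reached: the loop body runs zero times
    }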

    All of that said, this doesn't seem like a good analysis plan. If your data is panel data, you shouldn't be using -regress- unless you have already run -xtreg- and found that the variance component for the fixed or random effects is, for practical purposes, zero. That is a pretty uncommon condition. So it would seem more sensible to me to just do the -xtreg- analyses and then, if any of them show sigma_u near enough to zero, go back and run those as -regress- if you wish. Similarly, the choice between -xtreg, re- and -xtreg, fe- should either be made a priori from modeling considerations, or decided by a Hausman test. If you wish to do the latter, it isn't hard to build the Hausman test into the loop; see the sketch below. But just running a lot of different analyses with a lot of different variables sounds a lot like an exercise in p-hacking.
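    If you do go the Hausman route, here is a minimal, untested sketch of building it into the loop (using the same placeholder variable names as the code above):

    Code:
    foreach dv of varlist dv1-dv6 {
        foreach iv of varlist IUI IUI2 PUI {
            quietly xtreg `dv' `iv', fe
            estimates store fe_est
            quietly xtreg `dv' `iv', re
            estimates store re_est
            hausman fe_est re_est    // a small p-value favors -fe- over -re-
            estimates drop fe_est re_est
        }
    }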

    Comment


    • #3
      Thanks a lot for the code; I will test it as soon as possible (i.e., test whether I described my problem accurately enough).

      The truth is, I had no understanding of econometrics, Stata, and the like until six weeks ago. I am teaching myself with books and YouTube, so I hope you will forgive the naivety.

      I was told to use those three models, compare the results, and choose the most significant one. In my opinion, I shouldn't be doing it this way either, but there we are.
      The core issue is that my data is not a true panel. The survey covers 2011, 2012, 2013, and 2015, but they asked completely different people every time. Unfortunately, some of the ID numbers were reused across the years, even though the individuals are not the same. If I assign new ID numbers (with the command generate seqnum=_n), the FE and RE models obviously no longer work. When I told this to my professor, he asked me to run the models with the old ID numbers anyway and use those results.
      Since he ignored me pointing this out, and he is the one grading this eventually, I gave up. But I would love to find a correct solution for this.
      So, Clyde, Mr. Schechter, if you're still there, your help would be much appreciated!

      Kai

      Comment


      • #4
        Well, I don't want to take sides in a dispute without hearing both sides. (Actually, it's not my business to intervene in other people's affairs even if I do hear both sides.) So I'm just giving you advice about what I perceive to be better statistical practice, and I base that advice on what you tell me about the problem, taking it as a correct description (unless it is obviously not possible or seems very strange).

        If I understand your data correctly, you do not actually have panel data. You have four waves of a survey, and there are different people in each wave. (I would imagine that, by chance alone, a small number of people may appear in more than one wave, but that this would be a negligible amount.) If this is what has happened, then there is no reason to use -xt- analyses at all. The data should be analyzed with ordinary least squares regression. You probably should include i.wave (or i.year or whatever variable indicates which wave a response comes from) among the regressors if it is likely that the outcomes are, in part, time-dependent.
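        In code, that would be something along these lines (untested; -happiness- and -year- are placeholder names, and your control variables go on the end):

        Code:
        regress happiness IUI i.year    // add control variables as needed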

        If you re-assign ID numbers within waves, so that there is now a "person 1" in each wave, a "person 2" in each wave, etc., but those really aren't the same people from wave to wave, then you are basically assigning people to haphazard groups. If the ordering of the participants within each wave is random, then you are grouping people together at random. In that case, we would expect that in an -xtreg- analysis, whether FE or RE, (mis-)applied to this data, sigma_u and rho would be close to zero, and the results would be similar to what you get from -regress-. If that is not the case, it suggests that there may be something informative about the ordering of the participants within waves in your data set, and you might want to carefully read the documentation of the original data set to understand what that might be. It could also just happen by chance, but that would be uncommon, a true Type I error as it were.
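        A minimal sketch of that re-assignment and check, assuming a variable -year- identifies the wave (names are placeholders, untested):

        Code:
        bysort year: generate newid = _n    // haphazard "person 1", "person 2", ... within each wave
        xtset newid year
        xtreg happiness IUI, re             // expect sigma_u and rho to be near zero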

        As you so rightly note, to run multiple models and then select the one with the smallest p-value, or to present only those with p < 0.05 as "the results", is, to put it most charitably, poor statistical practice, and in the view of some it is scientific misconduct and research fraud.

        I take it you are a student and probably not in a position to push back strongly at what you are told to do, other than perhaps through persistent questioning. It is good, however, that you are also seeking independent sources of knowledge and are able to recognize when you are being given bad advice. (No hubris here--I make mistakes, too. But because this is a public forum, others can jump in to point out the errors and then we all learn from them.) The important thing is to not adopt bad statistical habits just because they are prevalent in your environment and being pushed on you by those in authority. Your status as a student is temporary, and you will eventually function autonomously and in collaboration with others who respect your judgments. When that time comes, remember what is right and what isn't, and do what is right. And if you become responsible for teaching others, don't push them to do things that are widely done but known to be wrong.



        Comment


        • #5
          Thanks a lot once again! So, first of all: yes, I don't have panel data (I should have said so from the start) but rather pooled cross-sectional data. And yes, they are different people every year, with the off chance of individuals being asked more than once over the years (the survey claims to use a multistage, stratified, probability-proportional-to-size (PPS) sampling design).

          For the second part: I did not re-assign the IDs when I ran the FE and RE models. In 2011 and 2012 the native ID numbers start at 1 and go up to 6k+ and 9k+, respectively, with some large gaps (for example, ID numbers 400 to 600 were not used). In 2013 and 2015 the same numbering was used, but starting at 5000. All in all, when I ran the FE/RE models, Stata told me that out of the 31,686 observations it somehow made 20,812 groups. In other words, a little more than 20,000 interviewees share an ID number with at least one interviewee from a different year. This seemed such a blatantly incorrect way of comparing variation across years that I assumed I had misunderstood how FE/RE are supposed to be used.
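          I suppose the overlap could be checked with something like this (assuming the variables are called ID and year; not tested):

          bysort ID year: generate byte tag = (_n == 1)    // first observation in each ID-year pair
          bysort ID: egen n_years = total(tag)             // distinct years in which each ID appears
          tabulate n_years                                 // n_years > 1 means the ID recurs across waves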

          So, I obviously don't want to use fraudulent or incorrect methods, and I completely agree that assigning people IDs that serve only as an arbitrary grouping will lead to useless results, to say the least. I guess I will follow my first intuition and your advice, adopt an OLS model only, and try to convince my professor that this is fine.

          Thank you for sacrificing your time! I am indeed a student, and my university is not overly helpful; setting high standards without offering basic methodology courses is just one of the issues. Then again, I didn't have to choose econometric models and thereby make things more difficult than necessary. However, I think I'm onto something, and I am really enjoying broadening my understanding. As long as there are people like you willing to help, I am confident this will work out, regardless of what university I am at.

          Comment
