  • Stata freezes in a multivariate command - what may be the problem?

    This is something weird. I was working with a dataset of 40,000 obs without any problems. My do-file worked fine. But then I used a bigger version of this dataset with 10,000 more obs, so the final dataset was approx. 50,000. Now when I try to run my do-file, it works, but only up to a certain point. When I get to the final (multivariate) regression of my do-file, Stata freezes (it keeps "thinking" even after many hours). This is my regression: logistic EverMammo RaceL1 RaceL2 AgeYears i.Educationrc2 Incomerc i.WhereLiverc Symptoms FirstDegree HadBreastCancerrc HealthInsurance5. This regression worked perfectly using the first dataset. I even ran this do-file, command by command, on other computers, thinking that the problem was my computer (memory, RAM), but it freezes on all of them. There must be a bug somewhere. I even imported the Excel file into Stata again and redid everything, but it still freezes at the same command.

    Does anyone know what may be causing this? I need to update my numbers using the full dataset.

    Thank you,
    Marvin


  • #2
    What is the difference between the two datasets, other than the file size? Can you run the regression with the second dataset after dropping 10,000, 20,000 or 30,000 observations? You can also try to find out where exactly Stata stops with the help of trace:
    Code:
    set trace on
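
    A minimal sketch of both suggestions, assuming the larger dataset is already in memory. The seed value, the 50 percent fraction, and the trace depth are arbitrary choices for illustration; the variable list is the one from the first post:

    ```stata
    * Keep a random 50% of the observations (the seed only makes the draw reproducible)
    set seed 12345
    sample 50

    * Limit how deep trace descends into nested programs so the output stays readable
    set tracedepth 2
    set trace on
    logistic EverMammo RaceL1 RaceL2 AgeYears i.Educationrc2 Incomerc i.WhereLiverc Symptoms FirstDegree HadBreastCancerrc HealthInsurance5
    set trace off
    ```

    If the command still hangs, the last lines printed by trace show where Stata got stuck. Because sample drops observations from the data in memory, reload the dataset before the next attempt.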



    • #3
      Well... they are supposed to be the same database. That is, the first one has data through Quarter 3, and then they sent me the same dataset including Q4. Same variables, values, etc. However, it seems that the Q4 dataset has an issue. When I use the Q4 dataset and delete, let's say, 50 percent of the sample, it still freezes. So I believe the number of observations is not the problem. The whole do-file works OK using the Q4 dataset; it only stops at the command I mentioned. I don't know what else to do.



      • #4
        Marvin:
        did you rule out any sort of convergence issue?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Have you tried cf?
          Code:
          help cf



          • #6
            Can you specify how I can do that, Friedrich?

            What do you mean by convergence? The two datasets should have the same variable names, values, etc., and in fact they do, because my do-file works perfectly for the new (Q4) dataset. It just stops at that long regression.

            Is there a way that I can send any of you guys my two datasets to see if my do-file works on your computer?

            Thank you very much!



            • #7
              Marvin:
              by lack of convergence I mean something like the logistic regression backing up indefinitely. However, when this occurs, Stata reports -(backed up)- next to the iteration; hence you should have noticed it.
              As Friedrich pointed out, the -cf- entry in the Stata 13.1 .pdf manual gives some useful examples of dataset comparison.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                I tried the cf command and I only got this:


                . use clean2014data_for_Analayis.dta

                . cf _all using 2014_Analysis_partial.dta, verbose
                master has 45987 obs., using 38584
                r(9);


                I guess the two datasets are identical, only with different sample sizes?



                • #9
                  Any ideas what the problem can be or how I can solve it?



                  • #10
                    In the absence of other information the diagnosis has to be

                    1. Your larger dataset is not just larger but also somehow more difficult to fit with the model you propose.

                    2. The way forward is to reconsider the model or whether the dataset is somehow problematic for your model.

                    Your variables were said to be

                    Code:
                    EverMammo RaceL1 RaceL2 AgeYears i.Educationrc2 Incomerc i.WhereLiverc Symptoms FirstDegree HadBreastCancerrc HealthInsurance5
                    So, simple questions:

                    How many categories are implied by i.Educationrc2 i.WhereLiverc?

                    Have you inspected the distributions of these variables? Are any category frequencies noticeably small?

                    Have you looked at scatter plots of these variables?

                    Have you tried simpler models? Which predictors cause the model fit to go slow?

                    Are you working with more statistically-minded colleagues to get advice?
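
                    These checks can be sketched as follows, assuming the larger dataset is in memory. The order in which predictors are added and the iterate(50) cap are arbitrary choices for illustration, not a recommendation:

                    ```stata
                    * How many categories are there, and are any of them sparse?
                    tabulate Educationrc2, missing
                    tabulate WhereLiverc, missing

                    * Empty or near-empty cells against the outcome can make the fit difficult
                    tabulate Educationrc2 EverMammo, missing
                    tabulate WhereLiverc EverMammo, missing

                    * Ranges and nonmissing counts of the remaining predictors
                    summarize RaceL1 RaceL2 AgeYears Incomerc Symptoms FirstDegree HadBreastCancerrc HealthInsurance5

                    * Build the model up step by step; iterate() caps the number of iterations
                    logistic EverMammo RaceL1 RaceL2 AgeYears, iterate(50)
                    logistic EverMammo RaceL1 RaceL2 AgeYears i.Educationrc2 Incomerc, iterate(50)
                    logistic EverMammo RaceL1 RaceL2 AgeYears i.Educationrc2 Incomerc i.WhereLiverc, iterate(50)
                    ```

                    The first step at which the iteration log fills with repeated "not concave" or "backed up" messages points to the predictor that deserves a closer look.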



                    • #11
                      Marvin - At this point it would be helpful for you to show us, from your Stata log, the logistic command and the output it produces prior to your terminating the process. See the FAQ linked to at the top of this page, specifically section 12, for instructions on using a CODE block when copying and pasting the output from the log to make it easily readable (like Friedrich did in posts #2 and #5 above).

                      Writing "it freezes" means different things to different readers in different circumstances. We need more detail on what the logistic command was told to do, and what it reported it had done, while it continued running.
