  • codebook and continuous variables

    Hi there,

    A seemingly simple thing that I can't find an answer on!

    Among the variables in my dataset, I have no_children (number of children), age (continuous) and time (lived in Sweden, continuous). When I run codebook for all three variables, age and time are presented as continuous (with mean and percentiles given) but no_children is presented as a frequency table. I would like Stata to recognize this as a continuous variable. All these variables appear to be similarly constructed. I went back to SPSS and made sure that they were all scale variables. I worry that when I build the model, no_children will be used as a categorical variable instead of a continuous variable.

    Can anyone help?

    Thanks so much!

    M

  • #2
    Whether the codebook command treats a variable as categorical or continuous depends only on how its number of distinct values compares with the value specified for the tabulate() option, as discussed in the output of
    Code:
    help codebook
    and this has no bearing at all on how the variable is treated in your analysis.

    For an understanding of how Stata differentiates between treating a variable as continuous and categorical in modeling commands, see the output of
    Code:
    help factor variables


    • #3
      The help file for -codebook- will introduce you to the -tabulate(#)- option, which enables you to control this. If, as I assume, you did not specify anything for it, the default is 9, meaning that any variable with 9 or fewer distinct values is treated by -codebook- as a discrete variable. Apparently this is true of the no_children variable--which does not surprise me. Setting this option to a value smaller than the number of distinct values of no_children will resolve your problem.
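
      A minimal sketch of the -tabulate(#)- behavior described above. It uses the auto dataset shipped with Stata as a stand-in, since the original data are not available; rep78 plays the role of no_children here only because it also has few distinct values (5).

      ```stata
      sysuse auto, clear

      * rep78 has 5 distinct nonmissing values, which is at or below the
      * default threshold of 9, so -codebook- shows a frequency table
      codebook rep78

      * With the threshold lowered below 5, -codebook- reports the mean,
      * standard deviation and percentiles instead
      codebook rep78, tabulate(3)
      ```

      With your own data the equivalent would be something like -codebook no_children, tabulate(#)-, with # chosen below the number of distinct values the variable takes.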

      As for your worry, this behavior of -codebook- is almost unique to that command.* For running regressions, you need to read -help fvvarlist- to know the rules for discrete vs continuous. All variables in the regression are assumed continuous unless prefixed with i., except for variables in interaction terms, which are assumed discrete unless prefixed with c. It is a good idea to prefix all right-hand side variables in a regression with c. or i. accordingly, to avoid errors and make the command explicit.
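
      The prefix rules above can be illustrated with a short sketch. It again assumes the auto dataset as a stand-in; the variables price, mpg, rep78 and weight come from that dataset, not from the original question.

      ```stata
      sysuse auto, clear

      * No prefix, no interaction: rep78 enters as a single continuous term
      regress price mpg rep78

      * Explicit prefixes: mpg is continuous, rep78 becomes a set of indicators
      regress price c.mpg i.rep78

      * Inside an interaction, unprefixed variables are taken as categorical,
      * so c. is needed to keep mpg and weight continuous
      regress price c.mpg##c.weight
      ```

      Comparing the first two outputs shows one coefficient for rep78 in the first model and one per non-base level in the second, which is the distinction that matters for the analysis, independent of what -codebook- displays.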

      * I believe there is some other command somewhere in Stata that is used to create lists of variables that has some default criterion for distinguishing discrete and continuous and relies on the number of distinct values to do that. But at the moment I can't remember what that command is.

      Added: Crossed with #2.

      Added later: The command I couldn't remember before is -vl-, which is actually a suite of commands, designed for use with -lasso-. And it, too, has an option that lets you specify a threshold for the number of values at or below which a variable is considered categorical.
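
      For completeness, a hedged sketch of the -vl- threshold mentioned above. It assumes Stata 16 or later (where -vl- is available) and uses the auto dataset as a stand-in; the cutoffs 4 and 19 are arbitrary illustration values, and -help vl set- gives the exact classification rules.

      ```stata
      sysuse auto, clear

      * Classify the variables into system-defined lists; categorical() and
      * uncertain() set the cutoffs on the number of distinct values
      vl set, categorical(4) uncertain(19)

      * Show which list each variable was assigned to
      vl list
      ```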
      Last edited by Clyde Schechter; 29 Jul 2022, 18:52.
