  • Transformation data into log

    Hi,
    I am running xtreg with a fixed-effects model on unbalanced panel data.
    1) I want to log-transform my variables, but they include dummy variables, an interaction variable, and continuous variables. Should I transform all of the variables or only the non-normal ones? If the answer is all of them, what should I do with the dummy variables: leave them untransformed?
    2) I have 3 independent variables: persistence, overfirm, and the interaction persistence * overfirm. Should I standardize persistence and overfirm before taking the interaction between them, to avoid multicollinearity?
    3) I have 3 independent variables and 13 control variables. My independent variable has too many values, and my dependent variable contains both small and large values. Is this a problem or not?

    Thank you

  • #2
    Welcome to the Stata Forum / Statalist.

    I fear I didn't understand your message. The text was not clear enough to me.

    That said, you seem to wish to log-transform "all" the variables, although no clear reason for doing so was given. I mean, the fact the distribution of the predictor is not normal won't necessarily prompt you to log-transform it. That being said, one thing is for sure: do not change anything with regard to the binary variables. After all, what would be the reason for that?

    To end, please take some time to read the FAQ. There you will find instructions on how to share data/command/output. You may wish to use -dataex- for that matter.

    P.S.: again with regards to the dummies, I assume you understand what happens if you try to log transform variables with 1 and 0 values. But if you have any doubt about that, please type:

    Code:
    display ln(1)
    display ln(0)
    Last edited by Marcos Almeida; 14 Jan 2018, 15:08.
    Best regards,

    Marcos



    • #3
      I want to log-transform my variables, but some of them are dummies. Can I log-transform my variables without transforming the dummy variables? I read that when you log-transform variables, you must apply the transformation to all the variables in a linear regression.



      • #4
        When I run xtreg and want to use a log transformation to overcome some problems related to the assumptions of linear regression, can I transform some variables but not all of them?



        • #5
          "I read that when you log-transform variables, you must apply the transformation to all the variables in a linear regression."
          I don't know where you read that, but that is wrong.

          You certainly cannot log-transform indicator ("dummy") variables, because they take on a zero value, and log(0) is undefined.

          But even with regards to the continuous variables, each one should be considered separately and a decision made to transform or not. You should also have a clear reason for doing any transformations.
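
          A minimal sketch of transforming selectively, under stated assumptions: the names y, x1, d1, firm_id and year are hypothetical, and x1 is assumed to be a continuous, strictly positive variable that you have a clear reason to log.

          ```stata
          * Hypothetical names: y (outcome), x1 (continuous, strictly positive),
          * d1 (0/1 indicator), firm_id and year (panel and time identifiers).
          xtset firm_id year

          * ln() returns missing for zero or negative arguments, so check first.
          count if x1 <= 0
          generate ln_x1 = ln(x1)

          * The indicator enters untransformed; ln(d1) would be missing
          * wherever d1 == 0.
          xtreg y ln_x1 i.d1, fe
          ```

          If the count is not zero, the logged variable will be missing for those observations and they will drop out of the estimation sample.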

          "2) I have 3 independent variables: persistence, overfirm, and the interaction persistence * overfirm. Should I standardize persistence and overfirm before taking the interaction between them, to avoid multicollinearity?"
          No, this is generally not necessary. When you calculate an interaction term, you often get some degree of multicollinearity among them, but usually it is not large enough to be a problem. If it is a problem, I would recommend you center but not standardize the variables involved. Centering will largely eliminate the multicollinearity issue and it preserves their units of measurement, so that marginal effects calculated will still be intelligible. If you standardize the variables, you convert them to having arbitrary and meaningless units, and when you then talk about an "effect" nobody can tell what it means because nobody knows what the units of the variables are.
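
          Centering could be sketched roughly as follows. This assumes both variables are continuous, that the outcome is called y (a hypothetical name), and that the data have already been -xtset-. The means here are taken over all non-missing observations, which may differ slightly from the estimation sample.

          ```stata
          * Centre each predictor at its sample mean; the units are unchanged.
          summarize persistence, meanonly
          generate c_persistence = persistence - r(mean)

          summarize overfirm, meanonly
          generate c_overfirm = overfirm - r(mean)

          * ## includes both main effects and their product.
          xtreg y c.c_persistence##c.c_overfirm, fe

          * Marginal effect of persistence at the mean of overfirm
          * (c_overfirm = 0 corresponds to the mean after centering).
          margins, dydx(c_persistence) at(c_overfirm = 0)
          ```

          Because only a constant is subtracted, a one-unit change in c_persistence is still a one-unit change in persistence.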

          "3) I have 3 independent variables and 13 control variables. My independent variable has too many values, and my dependent variable contains both small and large values. Is this a problem or not?"
          What do you mean by "too many values"? There is no limit on the number of values a variable can take on. I think perhaps it would help if you showed some examples of the large and small values of these variables. Or perhaps you could show the output of the -summarize- command applied to the variables you are worried about.
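
          For instance, something along these lines, where y is only a placeholder for the name of your dependent variable:

          ```stata
          * y is a hypothetical name for the dependent variable.
          * The detail option adds percentiles, the smallest and largest values,
          * skewness and kurtosis to the default output.
          summarize y persistence overfirm, detail
          ```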



          • #6
            Thank you, Clyde, for the reply.
            I have 3 predictors that should be logged and 2 indicator variables that are (1,0). The latter (a) cannot be logged (b) should not be logged. (Indeed any transformation of an indicator variable to any two distinct values has no important effect.)

            So, can I log-transform some predictors and leave the rest as they are when using xtreg, fe, or must I transform all the variables (dependent and independent, although my independent variables include 2 indicator variables)? I read that a transformation must be applied to all the variables (dependent and independent), but my independent variables include 2 indicator variables.



            • #7
              "I read that a transformation must be applied to all the variables (dependent and independent), but my independent variables include 2 indicator variables."
              As I said in #5, I don't know where you read this, but it is wrong, wrong, wrong. I don't know how to say it any more clearly.



              • #8
                Hend,

                I think that every knowledgeable person on this board will agree that you do not necessarily need to transform all your variables. I certainly agree with Marcos and Clyde.

                You may already know this. But it sounded to me like you were going to manually calculate an interaction term, then include that in your model. Instead, you should use the factor variable syntax. For example:

                Code:
                regress y i.persistence i.overfirm i.persistence#i.overfirm
                i. tells Stata this is a categorical variable. # means an interaction term. If one of the variables were continuous, you can use c. as a prefix. For example, say persistence was continuous:

                Code:
                regress y persistence i.overfirm c.persistence#i.overfirm
                Please ignore if you already knew this.
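
                Since the original question used xtreg with fixed effects, the same syntax might look like this in that setting. This is only a sketch: y, firm_id and year are hypothetical names, and it assumes persistence is continuous and overfirm is a 0/1 indicator.

                ```stata
                * Hypothetical names: y, firm_id, year.
                xtset firm_id year

                * c.persistence##i.overfirm expands to both main effects
                * plus their interaction.
                xtreg y c.persistence##i.overfirm, fe

                * Slope of persistence at each level of overfirm.
                margins overfirm, dydx(persistence)
                ```

                If overfirm never changes within a panel, the fixed-effects estimator will omit its main effect, although the interaction can still be estimated.
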
                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                • #9

                  Hend:
                  As an aside to the previous helpful and commendable comments, the following textbook offers three examples of linear regression models with and without log transformation of the dependent and/or independent variables:
                  https://www.pearson.com/us/higher-ed...321278876.html pages 267-273
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    "I want to use a log transformation to overcome some problems related to the assumptions of linear regression"
                    As also remarked in #2, "the fact the distribution of the predictor is not normal won't necessarily prompt you to log-transform it".

                    Let me abridge what has been commented so far:


                    1) A dummy variable should be kept "as is". Had you typed the pair of short commands I shared with you in #2, you'd have understood the reason immediately.

                    2) There shall never be an indication to log-transform all variables (dummies + continuous ones) at once.

                    3) The main "problem" you mentioned (non-normal distribution of a given predictor) is not a great argument to log-transform it.

                    That being said, I gather you wish to delve into a longitudinal study. This being so, please take some time to grasp the core knowledge related to OLS regression, then the core knowledge related to panel data models, before plunging into the vast ocean of panel data models.
                    Last edited by Marcos Almeida; 15 Jan 2018, 03:41.
                    Best regards,

                    Marcos



                    • #11
                      Hi Stata community, thanks to all.

                      What type of data transformation is suitable for high-kurtosis data?



                      • #12
                        You should not be transforming variables based on their distributions. Data transformations for regression variables should be done to, and only to, linearize the relationship between the predictors and the outcomes. The distributions of the variables are irrelevant. You can have variables with very unusual distributions, but they might still be linearly related to each other, in which case no transformation would be appropriate. If you want to figure out what to do with your variables, I suggest you start with the -graph twoway scatter- command to see what linearities and non-linearities there are. The shapes of those relationships determine whether you need any transformations, and, if you do, will suggest what they might be.
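
                        One way to do that, as a rough sketch with hypothetical variable names y (outcome) and x (predictor), is to overlay a lowess smoother on the scatter plot so that any curvature is easier to see:

                        ```stata
                        * y and x are hypothetical names for the outcome and one predictor.
                        * The lowess curve traces the local average of y across x;
                        * a roughly straight curve suggests no transformation is needed.
                        twoway (scatter y x) (lowess y x)
                        ```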



                        • #13
                          Hi Clyde, this is the graph twoway scatter between new



                          • #14
                            Hi Clyde, this is the graph twoway scatter between newinvestment and smoothness. Can I make a transformation or not?
                            Attached Files



                            • #15
                              Naturally, you can do it. The test is whether it works. There might be an approximate linear relationship hidden in there.

                              But it isn't easy to see what transformation makes sense. Your investment variable can be negative, so plain logarithms don't apply.

                              It isn't easy to see whether smoothness has zero or negative values.

                              A generalised linear model with log link isn't fazed by some zero or negative values in the response, as the idea is just that the mean response is positive.
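
                              A minimal sketch of such a model, assuming newinvestment is the response and smoothness a predictor (that direction is a guess from the thread), and ignoring the panel structure for simplicity:

                              ```stata
                              * Assumption: newinvestment is the response, smoothness a predictor.
                              * link(log) models the log of the mean response as linear in the
                              * predictors, so individual responses may be zero or negative
                              * as long as the fitted mean stays positive.
                              glm newinvestment smoothness, family(gaussian) link(log) vce(robust)
                              ```

                              Whether this converges and fits well depends on the data; the test, as above, is whether it works.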

