
  • Questions about using log on variables

    Dear Statalist community


    I am doing a regression in which the dependent variable is log of general government spending as a percentage of GDP. My independent variable is log of total immigration.

    My questions are:
    1. When I change the dependent variable to the log of total general government spending, the relationship between the dependent and independent variables changes tremendously from figure 1 to figure 2. Am I right that I can use either of them, as long as I interpret them correctly?

    2. Is it wrong to use different units for the dependent and independent variables, given that one is a percentage of GDP and the other is a total count?

    3. I am trying to divide immigration into skilled and unskilled, but I am struggling with how to do so. I am thinking of following the paper by René Böheim and Karin Mayr (http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf ), who state:

    'The definition of immigrants for our empirical analysis is the foreign-born of working age, or, where data on the foreign-born were not available, foreigners.12 We collected data on immigrants by skill, where we use two categories of skill (low and high) derived from the International Standard Classification of Education (ISCED) 1997.13 We define as the low skilled immigrants those whose highest educational level is secondary education or less (ISCED level 4 or less). The high-skilled have attained the first or second stage of tertiary education (ISCED level 5 or 6). Low- and high-skilled immigrants are expressed in percent of the total population. High-skilled and low-skilled immigrants are expressed as shares in the total population.'

    However, I do not understand how they combine two sets of data to get low-skilled immigrants.






    FIGURE 1 (dependent variable: log of government spending as a percentage of GDP)


    FIGURE 2 (dependent variable: log of total government spending)





    Thank you very much
    Guest
    Last edited by sladmin; 02 May 2018, 08:03. Reason: anonymize poster

  • #2
    1. Figure 2 does not show in #1, at least not on my computer. But total government expenditure and government expenditure as % of GDP are completely different variables. Why would you expect there to be any similarity in how those variables relate to log immigration (or anything else)? You certainly need to properly interpret anything you do. From figure 1 it seems that a linear regression of log government expenditure as % of GDP on log immigration would be reasonable: perhaps not interesting, but reasonable. Without seeing figure 2, I can't say whether a linear relationship looks like a feasible specification between log immigration and log total government expenditure.

    2. For question 2 the answer is: no, this is not a problem. Not at all. Not even worth thinking about.

    3. It doesn't sound like they do combine anything together to get low skilled immigrants. It says they just use the figures for ISCED level 4 or less. Perhaps that entails adding up some numbers, assuming that they have separate counts for ISCED levels 1, 2, 3, and 4 or something like that. But it doesn't sound like anything more complicated than that. Have you seen the data source they used? Do you know how it is organized?
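
    If the source does hold separate counts by ISCED level, the two groups are just sums divided by population. A minimal sketch, assuming hypothetical variables isced0 through isced6 (immigrant counts at each ISCED 1997 level) and population; none of these names come from the paper or its data source:

    Code:
    * hypothetical names: isced0-isced6 are immigrant counts by ISCED 1997 level
    * note that rowtotal() treats missing counts as 0
    egen low_skilled = rowtotal(isced0 isced1 isced2 isced3 isced4)
    egen high_skilled = rowtotal(isced5 isced6)
    * express each group in percent of the total population, as the quoted passage describes
    generate low_share = 100 * low_skilled / population
    generate high_share = 100 * high_skilled / population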



    • #3
      To Clyde's excellent points I add a footnote. Your logarithms of percents are evidently natural logs, as (for example) a base-10 logarithm of 3 would correspond to 1000%. So we can reverse engineer your range in terms of percent. Here's the command and the graph.

      Code:
      * plot ln(x) over the percent range implied by ln values from 2.4 to 3.3 (about 11% to 27%)
      twoway function log(x), range(`=exp(2.4)' `=exp(3.3)') ytitle(ln of percent) xtitle(percent)
      [Figure: ln_is_linear.png, ln of percent plotted against percent]

      The principle is generic (and applies to logarithms in any base -- and indeed yet more generally). When the ratio of maximum to minimum is not very large, logarithms are close to linear, and there is little obvious gain in a transformation.
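
      One rough way to check this on a given range is to correlate x with ln(x) over a grid of values. A sketch only, with the number of grid points and the endpoints chosen purely for illustration:

      Code:
      clear
      set obs 101
      * evenly spaced grid of percents spanning the range implied by ln values 2.4 to 3.3
      generate x = exp(2.4) + (exp(3.3) - exp(2.4)) * (_n - 1) / 100
      generate lnx = ln(x)
      * a correlation very close to 1 means ln(x) is nearly a linear function of x on this range
      correlate x lnx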

      Conversely, when percents vary over a range from nearly 0 to nearly 100, then I'd tend to consider logit as first port of call if nonlinearity were evident.
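
      A sketch of that transformation, assuming a hypothetical variable pct that is measured in percent and lies strictly between 0 and 100:

      Code:
      * pct is a hypothetical variable name; logit() expects a proportion strictly between 0 and 1
      generate logit_pct = logit(pct / 100)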
      Last edited by Nick Cox; 15 Mar 2018, 02:23.



      • #4
        Dear Clyde and Nick

        Thank you very much! I have already taken all of your suggestions into consideration.

        Thank you again
        Guest
        Last edited by sladmin; 02 May 2018, 08:03. Reason: anonymize poster
