  • Normalizing panel data by (population * √area)

    Hello,

    I am working on my dissertation on the effectiveness of private participation on electricity sector outcomes. For context, I am working with an unbalanced panel compiled from multiple data sources. My supervisor has advised that, to make better comparisons, I should normalize my data, since I am working with big countries like Nigeria (large area and population) as well as much smaller countries like Guinea. He advised that I normalize the data by (population * √area), but I am not sure which commands to use or how to go about it in general, and I have searched for the past couple of days without much luck.
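Written out, the suggested normalization divides each variable by population times the square root of area. The symbols here are chosen for illustration: $y_{it}$ is the variable for country $i$ in year $t$, $P_{it}$ its population and $A_i$ its area (written without a time index on the assumption that area is fixed for each country).

```latex
\tilde{y}_{it} = \frac{y_{it}}{P_{it}\,\sqrt{A_i}}
```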

    Any assistance will be greatly appreciated.

    Amina

  • #2
    It's actually straightforward once you have the variables population and land area in your dataset. Suppose you start with the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str12 country int GDP float pop
    2012 "China"        8532 1354.2
    2012 "India"        1828 1265.8
    2012 "Indonesia"     918  248.5
    2012 "Korea, Rep."  1278   50.2
    2012 "Saudi Arabia"  736   29.2
    end
    lab var GDP "GDP in billions of $"
    lab var pop "Population in millions"
    You need to get the area corresponding to these countries, e.g., from https://en.wikipedia.org/wiki/List_o...encies_by_area.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 country float area
    "China"        9596.961  
    "India"        3287.263
    "Indonesia"    1904.569
    "Korea, Rep."  100.210  
    "Saudi Arabia" 2149.690
    end
    lab var area "Area in thousands of km2"
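Area here is in thousands of km² and population is in millions. The choice of units only rescales the result. For a variable $y$, population $P$ and area $A$, if area were entered in units $c$ times smaller, so that its numeric value becomes $cA$ (e.g. $c = 1000$ when switching to km²), then:

```latex
\frac{y}{P\sqrt{cA}} = \frac{1}{\sqrt{c}} \cdot \frac{y}{P\sqrt{A}}
```

Every observation is divided by the same constant $\sqrt{c}$, so the ranking of countries is unchanged, and in a linear regression the coefficients simply rescale while t-statistics stay the same. The same argument applies to the units of population.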
    Then you merge this variable with your data and create the normalized variables:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 country float area
    "China"        9596.961  
    "India"        3287.263
    "Indonesia"    1904.569
    "Korea, Rep."  100.210  
    "Saudi Arabia" 2149.690
    end
    lab var area "Area in thousands of km2"
    tempfile area
    save `area'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str12 country int GDP float pop
    2012 "China"        8532 1354.2
    2012 "India"        1828 1265.8
    2012 "Indonesia"     918  248.5
    2012 "Korea, Rep."  1278   50.2
    2012 "Saudi Arabia"  736   29.2
    end
    lab var GDP "GDP in billions of $"
    lab var pop "Population in millions"
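    * m:1 because a country appears once per year in the panel but has a single area value
    * nogen omits the _merge indicator, so countries whose names differ across sources go unflagged
    merge m:1 country using `area', nogen
    
    * CREATE NORMALIZED GDP: GDP / (population * square root of area)
    gen GDPN = GDP/(pop*sqrt(area))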

    Result:

    Code:
    . l
    
         +-----------------------------------------------------------+
         | year        country    GDP      pop       area       GDPN |
         |-----------------------------------------------------------|
      1. | 2012          China   8532   1354.2   9596.961   .0643134 |
      2. | 2012          India   1828   1265.8   3287.263    .025188 |
      3. | 2012      Indonesia    918    248.5   1904.569   .0846482 |
      4. | 2012    Korea, Rep.   1278     50.2     100.21   2.543148 |
      5. | 2012   Saudi Arabia    736     29.2    2149.69   .5436345 |
         +-----------------------------------------------------------+
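    So in this example, South Korea has the highest normalized GDP, followed by Saudi Arabia. That seems about right: the measure is GDP per capita divided further by the square root of area, both countries have a high GDP per capita, and Korea's much smaller area pushes its value up further.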
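The answer normalizes a single variable. With several outcomes in a panel, a loop avoids repeating the gen line. The sketch below reuses the example data from the answer; the local tonorm, the temporary variable denom, and the merge options assert(match using) and keep(match) are additions for illustration, not part of the original answer. The assert() option makes merge stop with an error if any country in the panel finds no area (for instance when one source writes "Korea, Rep." and another "South Korea"), and keep(match) drops countries that appear only in the area file.

```stata
* Sketch only: same example data as in the answer above
clear
input str12 country float area
"China"        9596.961
"India"        3287.263
"Indonesia"    1904.569
"Korea, Rep."  100.210
"Saudi Arabia" 2149.690
end
lab var area "Area in thousands of km2"
tempfile area
save `area'

clear
input int year str12 country int GDP float pop
2012 "China"        8532 1354.2
2012 "India"        1828 1265.8
2012 "Indonesia"     918  248.5
2012 "Korea, Rep."  1278   50.2
2012 "Saudi Arabia"  736   29.2
end
lab var GDP "GDP in billions of $"
lab var pop "Population in millions"

* error out if any panel country has no area; keep only matched rows
merge m:1 country using `area', assert(match using) keep(match) nogen

* build the normalizing denominator once, in double precision
gen double denom = pop*sqrt(area)

* list the variables to normalize here (append your own outcome variables)
local tonorm GDP
foreach v of local tonorm {
    gen double `v'N = `v'/denom
    lab var `v'N "`v' / (pop*sqrt(area))"
}
drop denom
list country GDPN
```

To normalize more outcomes, add their names to the local tonorm line. The GDPN values should agree with the listing above up to rounding, since the sketch computes them in double precision.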

    • #3
      Originally posted by Andrew Musau
      Thank you so much, Andrew. I appreciate this more than you know. I just implemented it and my regression outputs make much more sense now. Thanks once again, sir.
