  • Sum of *_var

    Good day everybody,

    with my data set I would like to check whether countries' tax rates (*_PIT) changed depending on how they participated in the world wars: 1) neutral, 2) joined later, 3) participant from the beginning.
    Now I have assigned a war status (variable _WAR) to each country/year (0=not at war, 1=participant in war). For example, for the USA this would mean:
    Code:
    1914 0
    1915 0
    1916 0
    1917 1
    1918 1
    The USA would therefore be considered as category 2 "joined later".

    In the first step, I would like to assign the countries to the categories. I think the code could look like this:
    Code:
    *Neutral, late entry, belligerent
    unab prefix : *_WAR 
    local prefix : subinstr local prefix "_WAR" "", all 
    
    foreach p of local prefix { 
          gen `p'_NEU = 0 if sum(`p'_WAR) == 0 & `p'_WAR !=. & year > 1914 & year < 1918
          gen `p'_LAT = 1 if sum(`p'_WAR) >= 1 & sum(`p'_WAR) <= 4 & `p'_WAR !=. & year > 1914 & year < 1918
          gen `p'_BEL = 2 if sum(`p'_WAR) == 5 & `p'_WAR !=. & year > 1914 & year < 1918
    }
    I would then like to use an average variable to examine, by category (0,1,2), how taxes have developed in each country, including beyond the period after the First World War till 1933
    Code:
    *Average Variables
    egen AVG_NEU = rowmean(*_PIT) if *_NEU == 0 & year > 1910 & year < 1933
    I have separated the First and Second World Wars from each other because I can't think of anything at the moment to represent this in code. So this becomes two steps.

    The code does not work, though.
    Have I made a mistake somewhere in my thinking, or could this be designed better?

  • #2
    There are a number of syntax problems with the code you have crafted, but the larger problem is that, as best I can tell, you are using a wide data structure that is just not well suited to working this kind of problem. I assume your data looks something like this at the start:
    Code:
        year   us_war   uk_war   switze~r  
        1913        0        0          0  
        1914        0        1          0  
        1915        0        1          0  
        1916        0        1          0  
        1917        1        1          0  
        1918        1        1          0  
        1919        0        0          0
    Moreover your approach seeks to create three separate indicator variables for neutral, late entry, and belligerent status for each country--which actually just makes subsequent work even harder and is unnecessary when a single 3-level variable serves the purpose.

    It can be done, but it is complicated and quite error prone to do so. Life is much easier if the data is reorganized to a long layout:
    Code:
        year       country   war  
        1913   switzerland     0  
        1914   switzerland     0  
        1915   switzerland     0  
        1916   switzerland     0  
        1917   switzerland     0  
        1918   switzerland     0  
        1919   switzerland     0  
        1913            uk     0  
        1914            uk     1  
        1915            uk     1  
        1916            uk     1  
        1917            uk     1  
        1918            uk     1  
        1919            uk     0  
        1913            us     0  
        1914            us     0  
        1915            us     0  
        1916            us     0  
        1917            us     1  
        1918            us     1  
        1919            us     0
    So the following code begins by doing the wide to long transformation. Once that's done, calculating the classification variable becomes easy.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(year us_war uk_war switzerland_war)
    1913 0 0 0
    1914 0 1 0
    1915 0 1 0
    1916 0 1 0
    1917 1 1 0
    1918 1 1 0
    1919 0 0 0
    end
    
    
    reshape long @_war, i(year)  j(country) string
    rename _war war
    
    gen byte war_year = inrange(year, 1914, 1918)
    
    label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
    by country (year), sort: gen years_participation = sum(cond(war_year, war, .))
    by country (years_participation), sort: gen byte classification:classification = 0 ///
        if years_participation[_N] == 0
    by country (years_participation): replace classification = ///
        cond(years_participation[_N] < 5, 1, 2) if missing(classification)
    sort country year
    Note that I have ignored the whole issue of tax variables here, as you don't give enough of a description of them for me to confidently guess what they might be like. I suspect that the -reshape- command will probably also need a @_pit term before the comma, though it's not entirely clear to me.
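A minimal sketch of that -reshape- suggestion, assuming the tax variables follow the same naming pattern as the war variables (us_pit, uk_pit, ...). The *_pit names and all tax values below are invented for illustration and are not taken from the original data:

```stata
* Hypothetical wide data: the *_pit names and values are made up
clear
input float(year us_war uk_war us_pit uk_pit)
1916 0 1 15 25
1917 1 1 67 30
1918 1 1 77 30
end

* One @ stub per variable family; the country code takes the place of @
reshape long @_war @_pit, i(year) j(country) string
rename (_war _pit) (war pit)

sort country year
list, noobs clean
```

Each country/year then sits on one row with its own war and pit values, which is the layout the classification code expects.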

    And as you didn't post example data that represents the whole problem, my code may prove inadequate in your full data sample due to complications I have not foreseen. Anyway, this should point you in the right direction. And if my code needs fixes that you need help with, when you post back, please use the -dataex- command (as I have done above) to show an example of your data set that reflects whatever problems you are encountering.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
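As a small illustration of how -dataex- can be pointed at specific variables and observations, here is a sketch using the auto data set that ships with Stata; the variables and conditions are arbitrary and only stand in for your own:

```stata
* Load a data set that ships with Stata
sysuse auto, clear

* Selected variables, first five observations only
dataex make price mpg in 1/5

* Selected variables, only observations meeting a condition
dataex make price if foreign & price > 10000
```

The output of each -dataex- call can be pasted into a post as is.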



    • #3
      Thank you very much for your help! It helps me enormously.
      Sorry for the delayed reply; I was working on other parts of the thesis because I was having trouble at this point.

      I have revised the data structure according to your suggestion. At the moment I have not yet transferred all countries with data correctly, because I want to wait until the code works. Therefore only Australia (AU) and Austria (AT) are completely included. It now looks like this:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year str2 country byte war double pit
      1914 "AU" 1      .
      1915 "AU" 1      .
      1916 "AU" 1     25
      1917 "AU" 1  31.25
      1918 "AU" 1  31.25
      1919 "AU" .  40.63
      1920 "AU" .  40.63
      1921 "AU" .  42.65
      1922 "AU" .  42.65
      1923 "AU" .  38.38
      1924 "AU" .  38.38
      1925 "AU" .   34.5
      1926 "AU" .     30
      1927 "AU" .     30
      1928 "AU" .     27
      1929 "AU" .     27
      1930 "AU" .  30.35
      1931 "AU" .  33.78
      1932 "AU" .  35.48
      1933 "AU" .  35.48
      1934 "AU" .  32.83
      1935 "AU" .  32.83
      1936 "AU" .  31.88
      1937 "AU" .  28.69
      1938 "AU" .  28.69
      1939 "AU" .  32.99
      1940 "AU" .  36.29
      1941 "AU" .     50
      1942 "AU" .  82.25
      1943 "AU" .     90
      1944 "AU" .     92
      1945 "AU" .     92
      1914 "AT" 1   6.68
      1915 "AT" 1   6.68
      1916 "AT" 1 14.695
      1917 "AT" 1 14.695
      1918 "AT" 1 14.695
      1919 "AT" . 14.965
      1920 "AT" .     60
      1921 "AT" .     60
      1922 "AT" .      .
      1923 "AT" .      .
      1924 "AT" .     45
      1925 "AT" .     45
      1926 "AT" .     45
      1927 "AT" .     45
      1928 "AT" .     45
      1929 "AT" .     45
      1930 "AT" .     45
      1931 "AT" .     45
      1932 "AT" .  53.45
      1933 "AT" .  53.45
      1934 "AT" .   55.9
      1935 "AT" .   55.9
      1936 "AT" .   55.9
      1937 "AT" .   55.9
      1938 "AT" .     50
      1939 "AT" 1     50
      1940 "AT" 1     50
      1941 "AT" 1     50
      1942 "AT" 1     50
      1943 "AT" 1     50
      1944 "AT" 1     50
      1945 "AT" 1     50
      1914 "AL" 1      .
      end

      You are right, I did not write enough about the tax variable. It is a percentage rate from 0 to 100%, as indicated at the top of the structure. To relate the tax variable to war participation, I have considered the following approach:
      To examine the effect of war participation on tax, I want to observe how taxes developed in the period afterwards. The first step is to divide the states into the 3 categories (participant, late entry, neutral) for the period 1914 to 1918 (already done thanks to your code). The second step is to bring in the tax variable by plotting the tax development of the 3 categories from 1914 to 1933.
      Code:
      twoway (line classification year if classification==0, lpattern(dash) legend(label(1 "Neutral States"))) (line classification year if classification==1, legend(label(2 "Late War Entry"))) (line classification year if classification==2, legend(label(3 "Belligerent"))) if year>1914 & year<1933, xscale(log) ytitle("PIT in %") xtitle("Tax development of the state classification in %") xlabel(1914(2)1933) ylabel(0(10)80)
      Interestingly, after running the code, the tax/PIT variable becomes an empty ".". Unfortunately, this means that I can no longer use it. Would you kindly tell me how I can prevent this?
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year str2 country byte war double pit byte war_year float years_participation byte classification
      1915 "" 0 . 1  0 2
      1915 "" 0 . 1  0 2
      1915 "" . . 1  0 2
      1915 "" . . 1  0 2
      1915 "" . . 1  0 2
      1915 "" 1 . 1  1 2
      1915 "" 0 . 1  1 2
      1915 "" 0 . 1  1 2
      1915 "" 0 . 1  1 2
      1915 "" . . 1  1 2
      1915 "" . . 1  1 2
      1915 "" . . 1  1 2
      1915 "" . . 1  1 2
      1915 "" 0 . 1  1 2
      1915 "" 1 . 1  2 2
      1915 "" 1 . 1  3 2
      1915 "" . . 1  3 2
      1915 "" 1 . 1  4 2
      1915 "" . . 1  4 2
      1915 "" 1 . 1  5 2
      1915 "" . . 1  5 2
      1915 "" 1 . 1  6 2
      1915 "" 1 . 1  7 2
      1915 "" 1 . 1  8 2
      1915 "" 1 . 1  9 2
      1915 "" 0 . 1  9 2
      1915 "" . . 1  9 2
      1915 "" 1 . 1 10 2
      1915 "" 1 . 1 11 2
      1915 "" 0 . 1 11 2
      1915 "" . . 1 11 2
      1915 "" 0 . 1 11 2
      1915 "" 1 . 1 12 2
      1915 "" 1 . 1 13 2
      1915 "" 0 . 1 13 2
      1915 "" 1 . 1 14 2
      1915 "" 1 . 1 15 2
      1915 "" 1 . 1 16 2
      1915 "" 1 . 1 17 2
      1915 "" . . 1 17 2
      1916 "" . . 1 17 2
      1916 "" 1 . 1 18 2
      1916 "" 1 . 1 19 2
      1916 "" . . 1 19 2
      1916 "" 0 . 1 19 2
      1916 "" 1 . 1 20 2
      1916 "" 1 . 1 21 2
      1916 "" 0 . 1 21 2
      1916 "" 0 . 1 21 2
      1916 "" . . 1 21 2
      1916 "" 0 . 1 21 2
      1916 "" 1 . 1 22 2
      1916 "" . . 1 22 2
      1916 "" . . 1 22 2
      1916 "" 0 . 1 22 2
      1916 "" 0 . 1 22 2
      1916 "" . . 1 22 2
      1916 "" 1 . 1 23 2
      1916 "" 1 . 1 24 2
      1916 "" 1 . 1 25 2
      1916 "" 0 . 1 25 2
      1916 "" 1 . 1 26 2
      1916 "" 1 . 1 27 2
      1916 "" . . 1 27 2
      1916 "" 1 . 1 28 2
      1916 "" . . 1 28 2
      1916 "" 1 . 1 29 2
      1916 "" 1 . 1 30 2
      1916 "" . . 1 30 2
      1916 "" 1 . 1 31 2
      1916 "" . . 1 31 2
      1916 "" 1 . 1 32 2
      1916 "" 1 . 1 33 2
      1916 "" 1 . 1 34 2
      1916 "" . . 1 34 2
      1916 "" 0 . 1 34 2
      1916 "" 0 . 1 34 2
      1916 "" . . 1 34 2
      1916 "" 0 . 1 34 2
      1916 "" 1 . 1 35 2
      1917 "" 1 . 1 36 2
      1917 "" 1 . 1 37 2
      1917 "" 1 . 1 38 2
      1917 "" . . 1 38 2
      1917 "" . . 1 38 2
      1917 "" 1 . 1 39 2
      1917 "" 1 . 1 40 2
      1917 "" . . 1 40 2
      1917 "" 1 . 1 41 2
      1917 "" 1 . 1 42 2
      1917 "" 1 . 1 43 2
      1917 "" . . 1 43 2
      1917 "" 1 . 1 44 2
      1917 "" 1 . 1 45 2
      1917 "" 1 . 1 46 2
      1917 "" 1 . 1 47 2
      1917 "" 0 . 1 47 2
      1917 "" . . 1 47 2
      1917 "" 0 . 1 47 2
      1917 "" 0 . 1 47 2
      end
      label values classification classification
      label def classification 2 "Belligerent", modify

      I use Stata/BE 17.0



      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year str2 country double pit
        1914 "AU"      .
        1915 "AU"      .
        1916 "AU"     25
        1917 "AU"  31.25
        1918 "AU"  31.25
        1919 "AU"  40.63
        1920 "AU"  40.63
        1921 "AU"  42.65
        1922 "AU"  42.65
        1923 "AU"  38.38
        1924 "AU"  38.38
        1925 "AU"   34.5
        1926 "AU"     30
        1927 "AU"     30
        1928 "AU"     27
        1929 "AU"     27
        1930 "AU"  30.35
        1931 "AU"  33.78
        1932 "AU"  35.48
        1933 "AU"  35.48
        1934 "AU"  32.83
        1935 "AU"  32.83
        1936 "AU"  31.88
        1937 "AU"  28.69
        1938 "AU"  28.69
        1939 "AU"  32.99
        1940 "AU"  36.29
        1941 "AU"     50
        1942 "AU"  82.25
        1943 "AU"     90
        1944 "AU"     92
        1945 "AU"     92
        1914 "AT"   6.68
        1915 "AT"   6.68
        1916 "AT" 14.695
        1917 "AT" 14.695
        1918 "AT" 14.695
        1919 "AT" 14.965
        1920 "AT"     60
        1921 "AT"     60
        1922 "AT"      .
        1923 "AT"      .
        1924 "AT"     45
        1925 "AT"     45
        1926 "AT"     45
        1927 "AT"     45
        1928 "AT"     45
        1929 "AT"     45
        1930 "AT"     45
        1931 "AT"     45
        1932 "AT"  53.45
        1933 "AT"  53.45
        1934 "AT"   55.9
        1935 "AT"   55.9
        1936 "AT"   55.9
        1937 "AT"   55.9
        1938 "AT"     50
        1939 "AT"     50
        1940 "AT"     50
        1941 "AT"     50
        1942 "AT"     50
        1943 "AT"     50
        1944 "AT"     50
        1945 "AT"     50
        end
        
        gen war = 0
        replace war = 1 if inrange(year, 1914, 1918)
        replace war = 2 if inrange(year, 1939, 1945) & country != "US"
        replace war = 2 if inrange(year, 1943, 1945) & country == "US"
        
        label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
        by country war (year), sort: gen years_participation = sum(war != 0)
        by country war (years_participation), sort: gen byte classification:classification = 0 ///
            if years_participation[_N] == 0
        by country war (years_participation): replace classification = ///
            cond(years_participation[_N] < 5, 1, 2) if missing(classification)
        sort country year



        • #5
          First of all, thank you very much for your help!

          I have a few questions about the code;
          1) According to the output, there are 10 states that joined the wars later. However, a few are missing. For example, Argentina only joined the war in 1944, yet according to the Stata output it only has the classes "Neutral" and "Belligerent". The same goes for Bulgaria, which only entered the war in 1915 (also only "Neutral" or "Belligerent" according to Stata). Do you know why that might be?

          Code:
           
          1939 AR 0
          1940 AR 0
          1941 AR 0
          1942 AR 0
          1943 AR 0
          1944 AR 1
          1945 AR 1
          2) In order to link the development of taxes after the world wars with the situation during the war, I would like to create a new average variable.
          The new variable should only use the tax data from 1914 to 1933 if the states were neutral (classification==0) during the years 1914 to 1918.
          I have thought about the following code, but it does not lead to the right result:
          Code:
          egen AVG_PIT_NEU = mean(pit) if inrange(year, 1914, 1918) && classification==0

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int year str2 country byte participant double pit
          1910 "AU" .      .
          1911 "AU" .      .
          1912 "AU" .      .
          1913 "AU" .      .
          1914 "AU" 1      .
          1915 "AU" 1      .
          1916 "AU" 1     25
          1917 "AU" 1  31.25
          1918 "AU" 1  31.25
          1919 "AU" .  40.63
          1920 "AU" .  40.63
          1921 "AU" .  42.65
          1922 "AU" .  42.65
          1923 "AU" .  38.38
          1924 "AU" .  38.38
          1925 "AU" .   34.5
          1926 "AU" .     30
          1927 "AU" .     30
          1928 "AU" .     27
          1929 "AU" .     27
          1930 "AU" .  30.35
          1931 "AU" .  33.78
          1932 "AU" .  35.48
          1933 "AU" .  35.48
          1934 "AU" .  32.83
          1935 "AU" .  32.83
          1936 "AU" .  31.88
          1937 "AU" .  28.69
          1938 "AU" .  28.69
          1939 "AU" .  32.99
          1940 "AU" .  36.29
          1941 "AU" .     50
          1942 "AU" .  82.25
          1943 "AU" .     90
          1944 "AU" .     92
          1945 "AU" .     92
          1946 "AU" .     92
          1947 "AU" .     72
          1948 "AU" .     67
          1949 "AU" .     67
          1950 "AU" .     67
          1951 "AU" .     75
          1952 "AU" .     75
          1953 "AU" .     75
          1954 "AU" .     70
          1955 "AU" .     67
          1910 "AT" .  4.988
          1911 "AT" .  4.988
          1912 "AT" .  4.988
          1913 "AT" .  4.988
          1914 "AT" 1   6.68
          1915 "AT" 1   6.68
          1916 "AT" 1 14.695
          1917 "AT" 1 14.695
          1918 "AT" 1 14.695
          1919 "AT" . 14.965
          1920 "AT" .     60
          1921 "AT" .     60
          1922 "AT" .      .
          1923 "AT" .      .
          1924 "AT" .     45
          1925 "AT" .     45
          1926 "AT" .     45
          1927 "AT" .     45
          1928 "AT" .     45
          1929 "AT" .     45
          1930 "AT" .     45
          1931 "AT" .     45
          1932 "AT" .  53.45
          1933 "AT" .  53.45
          1934 "AT" .   55.9
          1935 "AT" .   55.9
          1936 "AT" .   55.9
          1937 "AT" .   55.9
          1938 "AT" .     50
          1946 "AT" .     50
          1947 "AT" .     50
          1948 "AT" .     50
          1949 "AT" .     50
          1950 "AT" .     50
          1951 "AT" .     50
          1952 "AT" .     50
          1953 "AT" .     50
          1954 "AT" .     60
          1955 "AT" .     47
          1910 "AL" .      .
          1911 "AL" .      .
          1912 "AL" .      .
          1913 "AL" .      .
          1914 "AL" 1      .
          1915 "AL" 1      .
          1916 "AL" 1      .
          1917 "AL" 1      .
          1918 "AL" 1      .
          1919 "AL" .      .
          1920 "AL" .      .
          1921 "AL" .      .
          1922 "AL" .      .
          1923 "AL" .      .
          1924 "AL" .      .
          end
          Thank you for your help!



          • #6
            I cannot replicate the problem you are pointing out in 1). When I run the data you show with the code that I have been using, it correctly identifies AR as Late Entry in 1944 and 1945:
            Code:
            . clear
            
            . input int year str2 country byte war
            
                     year    country       war
              1. 1939 "AR"       0
              2. 1940 "AR"       0
              3. 1941 "AR"       0
              4. 1942 "AR"       0
              5. 1943 "AR"       0
              6. 1944 "AR"       1
              7. 1945 "AR"       1
              8. end
            
            .
            . label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
            
            . by country war (year), sort: gen years_participation = sum(war != 0)
            
            . by country war (years_participation), sort: gen byte classification:classification = 0 ///
            >     if years_participation[_N] == 0
            (2 missing values generated)
            
            . by country war (years_participation): replace classification = ///
            >     cond(years_participation[_N] < 5, 1, 2) if missing(classification)
            (2 real changes made)
            
            . sort country year
            
            .
            . list, noobs clean
            
                year   country   war   years_~n   classi~n  
                1939        AR     0          0    Neutral  
                1940        AR     0          0    Neutral  
                1941        AR     0          0    Neutral  
                1942        AR     0          0    Neutral  
                1943        AR     0          0    Neutral  
                1944        AR     1          1   Late Ent  
                1945        AR     1          2   Late Ent  
            
            .
            I infer that you have somehow modified the code from what I originally wrote. Or the data: coding the war participation variable as 1 vs . is WRONG for this code. It MUST be 1 vs 0. (More generally, avoid 1/. coding of dichotomous variables in Stata--it usually leads to trouble.)
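A minimal sketch of why the 1 vs . coding matters here: in Stata a missing value is not equal to 0, so the expression war != 0 is true for missing values too, and the running count of participation years goes up in years that were meant as "not at war". The four rows below mimic the AU coding shown earlier in the thread:

```stata
clear
input int year str2 country byte war
1917 "AU" 1
1918 "AU" 1
1919 "AU" .
1920 "AU" .
end

* Missing is not equal to 0, so all four rows satisfy war != 0
count if war != 0
assert r(N) == 4

* Recode missing to 0 before counting years of participation
replace war = 0 if missing(war)
by country (year), sort: gen years_participation = sum(war != 0)

* Now only 1917 and 1918 are counted
assert years_participation[_N] == 2
```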

            2) I'm not certain I understand what you want to do. And you've confused some things by introducing this new variable named participant that, I guess, is supposed to be what we have been, up to now, calling war. Moreover, in your example data there are no countries that remained neutral from 1914 through 1918. But perhaps what you need is this:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int year str2 country byte participant double pit
            1910 "AU" .      .
            1911 "AU" .      .
            1912 "AU" .      .
            1913 "AU" .      .
            1914 "AU" 1      .
            1915 "AU" 1      .
            1916 "AU" 1     25
            1917 "AU" 1  31.25
            1918 "AU" 1  31.25
            1919 "AU" .  40.63
            1920 "AU" .  40.63
            1921 "AU" .  42.65
            1922 "AU" .  42.65
            1923 "AU" .  38.38
            1924 "AU" .  38.38
            1925 "AU" .   34.5
            1926 "AU" .     30
            1927 "AU" .     30
            1928 "AU" .     27
            1929 "AU" .     27
            1930 "AU" .  30.35
            1931 "AU" .  33.78
            1932 "AU" .  35.48
            1933 "AU" .  35.48
            1934 "AU" .  32.83
            1935 "AU" .  32.83
            1936 "AU" .  31.88
            1937 "AU" .  28.69
            1938 "AU" .  28.69
            1939 "AU" .  32.99
            1940 "AU" .  36.29
            1941 "AU" .     50
            1942 "AU" .  82.25
            1943 "AU" .     90
            1944 "AU" .     92
            1945 "AU" .     92
            1946 "AU" .     92
            1947 "AU" .     72
            1948 "AU" .     67
            1949 "AU" .     67
            1950 "AU" .     67
            1951 "AU" .     75
            1952 "AU" .     75
            1953 "AU" .     75
            1954 "AU" .     70
            1955 "AU" .     67
            1910 "AT" .  4.988
            1911 "AT" .  4.988
            1912 "AT" .  4.988
            1913 "AT" .  4.988
            1914 "AT" 1   6.68
            1915 "AT" 1   6.68
            1916 "AT" 1 14.695
            1917 "AT" 1 14.695
            1918 "AT" 1 14.695
            1919 "AT" . 14.965
            1920 "AT" .     60
            1921 "AT" .     60
            1922 "AT" .      .
            1923 "AT" .      .
            1924 "AT" .     45
            1925 "AT" .     45
            1926 "AT" .     45
            1927 "AT" .     45
            1928 "AT" .     45
            1929 "AT" .     45
            1930 "AT" .     45
            1931 "AT" .     45
            1932 "AT" .  53.45
            1933 "AT" .  53.45
            1934 "AT" .   55.9
            1935 "AT" .   55.9
            1936 "AT" .   55.9
            1937 "AT" .   55.9
            1938 "AT" .     50
            1946 "AT" .     50
            1947 "AT" .     50
            1948 "AT" .     50
            1949 "AT" .     50
            1950 "AT" .     50
            1951 "AT" .     50
            1952 "AT" .     50
            1953 "AT" .     50
            1954 "AT" .     60
            1955 "AT" .     47
            1910 "AL" .      .
            1911 "AL" .      .
            1912 "AL" .      .
            1913 "AL" .      .
            1914 "AL" 1      .
            1915 "AL" 1      .
            1916 "AL" 1      .
            1917 "AL" 1      .
            1918 "AL" 1      .
            1919 "AL" .      .
            1920 "AL" .      .
            1921 "AL" .      .
            1922 "AL" .      .
            1923 "AL" .      .
            1924 "AL" .      .
            end
            
            replace participant = 0 if missing(participant)
            rename participant war
            
            label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
            by country war (year), sort: gen years_participation = sum(war != 0)
            by country war (years_participation), sort: gen byte classification:classification = 0 ///
                if years_participation[_N] == 0
            by country war (years_participation): replace classification = ///
                cond(years_participation[_N] < 5, 1, 2) if missing(classification)
            sort country year
            
            by country (year), sort: egen neutral_ww1 ///
                = min(cond(inrange(year, 1914, 1918), classification == 0, .))
            by country (year): egen avg_pit_neu ///
                = mean(cond(inrange(year, 1914, 1933), pit, .)) if neutral_ww1
            As I said, as this example data contains no countries that were neutral 1914 through 1918, the end result here is exclusively missing values for avg_pit_neu. But perhaps in your full data set, there are some neutral countries, and then you will see results for them.
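To see what the two -egen- lines return when a neutral country is present, here is a minimal sketch. The country "XX" and its pit values are invented for illustration, and classification is typed in by hand rather than computed, so only the averaging step is being shown:

```stata
* "XX" is a made-up neutral country; the AT rows follow the thread's data
clear
input int year str2 country byte classification double pit
1914 "XX" 0      2
1915 "XX" 0      2
1916 "XX" 0      4
1917 "XX" 0      4
1918 "XX" 0      8
1919 "XX" 0     10
1914 "AT" 2   6.68
1915 "AT" 2   6.68
1916 "AT" 2 14.695
1917 "AT" 2 14.695
1918 "AT" 2 14.695
1919 "AT" 0 14.965
end

* 1 only if the country is classified Neutral in every year 1914-1918
by country (year), sort: egen neutral_ww1 ///
    = min(cond(inrange(year, 1914, 1918), classification == 0, .))

* Country-level mean of pit over 1914-1933, for neutral countries only
by country (year): egen avg_pit_neu ///
    = mean(cond(inrange(year, 1914, 1933), pit, .)) if neutral_ww1

list, noobs sepby(country)
```

Here avg_pit_neu is 5 on every XX row and missing on every AT row. Note that a country with no observations in 1914 through 1918 would get a missing neutral_ww1, and a missing value counts as true in an -if- condition, so such countries may need to be excluded explicitly.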



            • #7
              First of all, thank you very much for your continued assistance. It helps me tremendously.

              Thanks for the advice. I have checked my database, as far as I can see there are no "." in it.

              1) Indeed, I have repeated it with the code from post #4 and #6. Depending on what I use, AR is categorized differently. Here are the results:
              Code:
              clear all
              capture log close
              set more off
              
              cd "U:\Dokumente\Thesis"
              use Taxes_Masterfile.dta
              
              label variable year "Year"
              label variable pit "Personal income tax rate"
              label variable participant "Participant status"
              
              
              #4
              gen war = 0
              replace war = 1 if inrange(year, 1914, 1918)
              replace war = 2 if inrange(year, 1939, 1945) & country != "US"
              replace war = 2 if inrange(year, 1943, 1945) & country == "US"
              
              label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
              by country war (year), sort: gen years_participation = sum(war != 0)
              by country war (years_participation), sort: gen byte classification:classification = 0 if years_participation[_N] == 0
              by country war (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
              sort country year
              list if country=="AR"
              
                      year   country   partic~t   pit   war   years_~n   classi~n
                76. | 1939        AR          0     .     2          1   Belliger |
                    |-------------------------------------------------------------|
                77. | 1940        AR          0     .     2          2   Belliger |
                78. | 1941        AR          0     .     2          3   Belliger |
                79. | 1942        AR          0     .     2          4   Belliger |
                80. | 1943        AR          0     .     2          5   Belliger |
                81. | 1944        AR          1     .     2          6   Belliger |
                    |-------------------------------------------------------------|
                82. | 1945        AR          1     .     2          7   Belliger |
              
              ///
              
              
              #6
              replace participant = 0 if missing(participant)
              rename participant war
              
              label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
              by country war (year), sort: gen years_participation = sum(war != 0)
              by country war (years_participation), sort: gen byte classification:classification = 0 if years_participation[_N] == 0
              by country war (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
              sort country year
              
              by country (year), sort: egen neutral_ww1 = min(cond(inrange(year, 1914, 1918), classification == 0, .))
              by country (year): egen avg_pit_neu = mean(cond(inrange(year, 1914, 1933), pit, .)) if neutral_ww1
              list if country=="AR"
              
                      year   country   war   pit   years_~n   classi~n   neutra~1   avg_pi~u
                76. | 1939        AR     0     .          0    Neutral          1          . |
                    |------------------------------------------------------------------------|
                77. | 1940        AR     0     .          0    Neutral          1          . |
                78. | 1941        AR     0     .          0    Neutral          1          . |
                79. | 1942        AR     0     .          0    Neutral          1          . |
                80. | 1943        AR     0     .          0    Neutral          1          . |
                81. | 1944        AR     1     .          1   Late Ent          1          .
              However, something seems to be wrong with Bulgaria. When I run it with the code from post #4 or #6, I get the following results:

              Code:
              #4
              228. | 1914 BG 0 . 1 1 Belliger |
              |-------------------------------------------------------------|
              229. | 1915 BG 1 . 1 2 Belliger |
              230. | 1916 BG 1 . 1 3 Belliger |
              231. | 1917 BG 1 . 1 4 Belliger |
              232. | 1918 BG 1 . 1 5 Belliger
              
              #6
              228. | 1914 BG 0 . 0 Neutral 0 . |
              |------------------------------------------------------------------------|
              229. | 1915 BG 1 . 1 Belliger 0 . |
              230. | 1916 BG 1 . 2 Belliger 0 . |
              231. | 1917 BG 1 . 3 Belliger 0 . |
              232. | 1918 BG 1 . 4 Belliger 0 . |
              However, Bulgaria should be marked as "Late Entry". Or am I missing something?


              2) Correct, I renamed "war" to "participant". If I understand your post #4 correctly, "war" there indicates whether it is WWI or WWII, which makes sense with regard to my question. Thanks for that. But this made me think that the original allocation was missing, so I created "participant" as a variable. I guess it is redundant due to "label define classification". Sorry, I'm not that deeply familiar with Stata.
              I guess you removed the categorization of "war" for the world wars from the code from post #4 to post #6 because I assume it's redundant?

              I apologize if I have not clearly described what I want to do. I would like to generate new average tax variables for "neutral", "late entry" and "belligerent" states.
              For example, for neutral states; Only if a country was neutral in WWI, then all available tax data (1914-1933) should be used. A new average variable is to be created from all these tax data of the neutral countries, with which I would like to trace the development of the tax rate.
              The idea is to show whether the status during the war (neutral, late entry, belligerent) had an influence on the tax policy of these country classes.

              Unfortunately -dataex- does not show any countries that are neutral from 1914 to 1918. ("Listed 100 out of 1764 observations")

              When I try to create a graph from avg_pit_neu, it is a bit messed up:
              Code:
              twoway (line avg_pit_neu year, legend(label(1 "Neutral States")))
              [Attached image: Tax_Changes.jpg]

              I'm a bit confused. If an average variable is formed from the tax rates (1914-1933) of all countries that were neutral during the First World War, then the graph should not jump back and forth like this.
              Would rowmean lead to the desired result? Unfortunately, Stata tells me that this is not possible in this combination ("factor-variable and time-series operators not allowed")
              Last edited by Tobias Kampf; 02 Mar 2024, 17:08.



              • #8
                We need a new -dataex- example. What you showed above doesn't help me because I can't tell which numbers are which variables. (OK, it's obvious what's year and what's country, but the rest are not.) -dataex- will show whatever you tell it to show. If you specify nothing, you get the first 100 observations in the data set. But -dataex- allows you to specify -if- and -in- conditions. So -browse- in your data set to find countries that meet the conditions in which we are getting incorrect results and note down the countries. If the countries were "XX", "ZW", and "RT" (obviously made up--use the actual countries that have problems) you can run -dataex if inlist(country, "XX", "ZW", "RT")- and you will get an example consisting precisely of those countries. You can specify up to 9 countries this way. (If there are more than 9 problem countries, pick one representative of each kind of problem. I don't have the sense that there are more than 9 different kinds of problem here.) In addition to the problem country exemplars, it would be good to include one country where everything is working fine, just so I can check that whatever I change doesn't break what's already working.

                You are correct that I erred in changing participant to war. I forgot that war was meant to distinguish the two world wars and the non war periods. We need that variable. But we also need a variable that says whether a country is participating in the war that year. So make sure both are included in your -dataex- example.

                I have checked my database, as far as I can see there are no "." in it.
                But look at the example data you showed in #5. The variable you called participant is coded 1/.--it should be 1/0. If the .'s in the -dataex- output of #5 don't come from your data set, I can't imagine how they got there. -dataex- doesn't make things up.

                As for your graph, while I understand that the graph you got is wrong, I don't know what would be right. You are trying to plot avg_pit_neu against year. But avg_pit_neu takes on a different value in each country in each year. So any way you try to plot this data with a line graph you will have the graph going crazy threading its way, in some order, through multiple points (one for each country) in each year and across years. You can control the order in which it does that, but no matter what you do, with this data you will be getting a zig-zag plot of some kind. If what you intended to do is further average the variable avg_pit_neu over all of the neutral countries, well I don't know if that makes sense, but it's easy enough to do:
                Code:
                preserve
                keep if neutral_ww1
                collapse (mean) avg_pit_neu, by(year)
                graph twoway line avg_pit_neu year, sort
                restore
                Would rowmean lead to the desired result? Unfortunately, Stata tells me that this is not possible in this combination ("factor-variable and time-series operators not allowed")
                The -egen, rowmean()- function creates a variable with the average of the values of several different variables. But the values of pit that you need to average are all part of a single variable. The values of pit look like a column, not a row. -egen, rowmean()- would work here if we had, say, a separate variable for each year: pit1914, pit1915, pit1916, etc. But we don't, and there would be nothing gained from creating such variables.
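
                To make the distinction concrete, here is a minimal sketch with made-up numbers. The variable names pit1914-pit1916, row_avg and country_avg are hypothetical and not from your data set. The first part is the wide layout where -egen, rowmean()- applies; the second part is the long layout you actually have, where -egen, mean()- under a -by- prefix does the job:
                Code:
                * Wide layout: one variable per year, rowmean() averages across variables
                clear
                input str2 country pit1914 pit1915 pit1916
                "XX"  2 4  6
                "ZW" 10 . 20
                end
                egen row_avg = rowmean(pit1914 pit1915 pit1916)
                list
                
                * Long layout: a single pit variable, so average over observations within country
                clear
                input str2 country int year pit
                "XX" 1914  2
                "XX" 1915  4
                "XX" 1916  6
                "ZW" 1914 10
                "ZW" 1915  .
                "ZW" 1916 20
                end
                by country (year), sort: egen country_avg = mean(pit)
                list, sepby(country)
                In both layouts the average is 4 for XX and 15 for ZW, because both functions skip missing values. Only the second form fits data organized as one observation per country and year.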

                Note: The Stata terminology is variables and observations, not columns and rows. While it's just a matter of nomenclature, I think it's important to try to avoid the column/row terminology, because Stata is not a spreadsheet, and speaking of rows and columns encourages you to think of it as if it were one. Many things that work well in spreadsheets, like laying out the data with a separate row for each country and each year's data in a separate column, work poorly in Stata in most situations. Similarly, 1/blank coding for yes/no variables is commonly used in spreadsheets, and given that spreadsheets are there to make visually grasping the data easy, that is really optimal. But for Stata's calculations, it is an obstacle. Most new Stata users have previous experience working with data in spreadsheets. But the sooner they set aside spreadsheet practices, the sooner they start getting proficient with Stata.
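
                A small illustration of why 1/blank coding is an obstacle, using made-up data (flag, n_blank_coded and n_zero_coded are hypothetical names): Stata treats a missing value as different from 0, so the expression flag != 0 is true for the blanks, and a running count of "yes" years ends up counting the blanks as well.
                Code:
                clear
                input int year flag
                1914 .
                1915 .
                1916 1
                1917 1
                end
                * With 1/missing coding the blanks are counted too: 1, 2, 3, 4
                gen n_blank_coded = sum(flag != 0)
                * After recoding to 1/0 only the real 1s are counted: 0, 0, 1, 2
                replace flag = 0 if missing(flag)
                gen n_zero_coded = sum(flag != 0)
                list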




                • #9
                  There should be a total of 9 countries with a late entry. Depending on whether they joined later in WWI or WWII, there should be up to 18 entries in total. Stata returns 6 entries.

                  I have used the following code:
                  Code:
                  **Start
                  clear all
                  capture log close
                  set more off
                  
                  cd "U:\Dokumente\Thesis"
                  use Taxes_Masterfile.dta
                  
                  //
                  label variable year "Year"
                  label variable pit "Personal income tax rate"
                  label variable participant "Participant status"
                  
                  
                  //
                  replace participant = 0 if missing(participant)
                  
                  label define classification    0    "Neutral"    1    "Late Entry"    2    "Belligerent"
                  by country participant (year), sort: gen years_participation = sum(participant != 0)
                  by country participant (years_participation), sort: gen byte classification:classification = 0 if years_participation[_N] == 0
                  by country participant (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
                  sort country year
                  
                  by country (year), sort: egen neutral_ww1 = min(cond(inrange(year, 1914, 1918), classification == 0, .))
                  by country (year): egen avg_pit_neu = mean(cond(inrange(year, 1914, 1933), pit, .)) if neutral_ww1
                  
                  tab country year if classification == 1
                  
                            |                          Year
                     country |      1916       1917       1918       1944       1945 |     Total
                  -----------+-------------------------------------------------------+----------
                          AR |         0          0          0          1          1 |         2
                          EG |         0          0          0          0          1 |         1
                          PT |         1          1          1          0          0 |         3
                  -----------+-------------------------------------------------------+----------
                       Total |         1          1          1          1          2 |         6
                  
                  
                  tab classification participant
                  
                  classificat |  Participant status
                          ion |         0          1 |     Total
                  ------------+----------------------+----------
                      Neutral |     1,452          0 |     1,452
                   Late Entry |         0          6 |         6
                  Belligerent |         0        306 |       306
                  ------------+----------------------+----------
                        Total |     1,452        312 |     1,764
                  France (FR) is the control country, where everything works. We have problems for the USA, Belgium, Bulgaria, Chile (CL), the Czech Republic, Greece, Italy and Peru.
                  Code:
                  dataex if inlist(country, "FR", "US", "BG", "BE", "CL", "CZ", "GR", "IT", "PE")
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input int year str2 country byte participant double pit float years_participation byte classification float(neutral_ww1 avg_pit_neu)
                  1910 "BE" 0    .  0 0 0 .
                  1911 "BE" 0    .  0 0 0 .
                  1912 "BE" 0    .  0 0 0 .
                  1913 "BE" 0    .  0 0 0 .
                  1914 "BE" 1    .  1 2 0 .
                  1915 "BE" 1    .  2 2 0 .
                  1916 "BE" 1    .  3 2 0 .
                  1917 "BE" 1    .  4 2 0 .
                  1918 "BE" 1    .  5 2 0 .
                  1919 "BE" 0    .  0 0 0 .
                  1920 "BE" 0   20  0 0 0 .
                  1921 "BE" 0   20  0 0 0 .
                  1922 "BE" 0   20  0 0 0 .
                  1923 "BE" 0   20  0 0 0 .
                  1924 "BE" 0   40  0 0 0 .
                  1925 "BE" 0   40  0 0 0 .
                  1926 "BE" 0   40  0 0 0 .
                  1927 "BE" 0   40  0 0 0 .
                  1928 "BE" 0   40  0 0 0 .
                  1929 "BE" 0   40  0 0 0 .
                  1930 "BE" 0   40  0 0 0 .
                  1931 "BE" 0   24  0 0 0 .
                  1932 "BE" 0   24  0 0 0 .
                  1933 "BE" 0   24  0 0 0 .
                  1934 "BE" 0 32.5  0 0 0 .
                  1935 "BE" 0 32.5  0 0 0 .
                  1936 "BE" 0 32.5  0 0 0 .
                  1937 "BE" 0 32.5  0 0 0 .
                  1938 "BE" 0 32.5  0 0 0 .
                  1939 "BE" 0   35  0 0 0 .
                  1940 "BE" 1   35  6 2 0 .
                  1941 "BE" 1   35  7 2 0 .
                  1942 "BE" 1   35  8 2 0 .
                  1943 "BE" 1   35  9 2 0 .
                  1944 "BE" 1   35 10 2 0 .
                  1945 "BE" 1   35 11 2 0 .
                  1946 "BE" 0   35  0 0 0 .
                  1947 "BE" 0   35  0 0 0 .
                  1948 "BE" 0   54  0 0 0 .
                  1949 "BE" 0   54  0 0 0 .
                  1950 "BE" 0   54  0 0 0 .
                  1951 "BE" 0   80  0 0 0 .
                  1952 "BE" 0   80  0 0 0 .
                  1953 "BE" 0   80  0 0 0 .
                  1954 "BE" 0   80  0 0 0 .
                  1955 "BE" 0   80  0 0 0 .
                  1910 "BG" 0    .  0 0 0 .
                  1911 "BG" 0    .  0 0 0 .
                  1912 "BG" 0    .  0 0 0 .
                  1913 "BG" 0    .  0 0 0 .
                  1914 "BG" 0    .  0 0 0 .
                  1915 "BG" 1    .  1 2 0 .
                  1916 "BG" 1    .  2 2 0 .
                  1917 "BG" 1    .  3 2 0 .
                  1918 "BG" 1    .  4 2 0 .
                  1919 "BG" 0    .  0 0 0 .
                  1920 "BG" 0    .  0 0 0 .
                  1921 "BG" 0    .  0 0 0 .
                  1922 "BG" 0    .  0 0 0 .
                  1923 "BG" 0    .  0 0 0 .
                  1924 "BG" 0    .  0 0 0 .
                  1925 "BG" 0    .  0 0 0 .
                  1926 "BG" 0    .  0 0 0 .
                  1927 "BG" 0    .  0 0 0 .
                  1928 "BG" 0    .  0 0 0 .
                  1929 "BG" 0    .  0 0 0 .
                  1930 "BG" 0    .  0 0 0 .
                  1931 "BG" 0    .  0 0 0 .
                  1932 "BG" 0    .  0 0 0 .
                  1933 "BG" 0    .  0 0 0 .
                  1934 "BG" 0    .  0 0 0 .
                  1935 "BG" 0    .  0 0 0 .
                  1936 "BG" 0    .  0 0 0 .
                  1937 "BG" 0    .  0 0 0 .
                  1938 "BG" 0    .  0 0 0 .
                  1939 "BG" 0    .  0 0 0 .
                  1940 "BG" 0    .  0 0 0 .
                  1941 "BG" 1    .  5 2 0 .
                  1942 "BG" 1    .  6 2 0 .
                  1943 "BG" 1    .  7 2 0 .
                  1944 "BG" 1    .  8 2 0 .
                  1945 "BG" 1    .  9 2 0 .
                  1946 "BG" 0    .  0 0 0 .
                  1947 "BG" 0    .  0 0 0 .
                  1948 "BG" 0    .  0 0 0 .
                  1949 "BG" 0    .  0 0 0 .
                  1950 "BG" 0    .  0 0 0 .
                  1951 "BG" 0    .  0 0 0 .
                  1952 "BG" 0    .  0 0 0 .
                  1953 "BG" 0    .  0 0 0 .
                  1954 "BG" 0    .  0 0 0 .
                  1955 "BG" 0    .  0 0 0 .
                  1910 "FR" 0    .  0 0 0 .
                  1911 "FR" 0    .  0 0 0 .
                  1912 "FR" 0    .  0 0 0 .
                  1913 "FR" 0    .  0 0 0 .
                  1914 "FR" 1    .  1 2 0 .
                  1915 "FR" 1    2  2 2 0 .
                  1916 "FR" 1    2  3 2 0 .
                  1917 "FR" 1   10  4 2 0 .
                  end
                  label values classification classification
                  label def classification 0 "Neutral", modify
                  label def classification 2 "Belligerent", modify
                  I hope this is correct and that it helps.


                  Ah, I see what my problem was. The data set is an imported Excel table. There are no "." displayed in it, but empty fields. I only noticed the dot when I did a "browse" in Stata. As you recommended, this should be avoided.
                  I have corrected everything in the variable "participant". The "." are replaced by 0. I apologize for my mistake.
                  Furthermore, shouldn't this also lead to errors in the tax rate? On the one hand, for some years there are no values for a country, and giving this non-existent data a "0" would mess up the calculation. On the other hand, the values are then undefined, and therefore ".", which has already led to errors in the variable "participant".


                  Exactly, I want to determine the variable avg_pit_neu for all neutral countries.
                  The average tax rate of all neutral countries is to be determined for each year. If I do this manually, the result is a continuous series over time that fluctuates from year to year. I would like to achieve the same result with Stata, along the lines of your suggested code.

                  However, I get the following graphic:
                  Code:
                   preserve
                  keep if neutral_ww1
                  collapse (mean) avg_pit_new, by(year)
                  graph twoway line avg_pit_new year, sort
                  restore
                    
                  [Attached image: avg_pit_neu.jpg]
                  If I roughly calculate a general average tax rate of all neutral countries, the graph should fluctuate more.

                  Thanks for the explanation. So far, I have mainly worked with Excel. For this research work, however, I need better evaluations and have therefore switched to Stata. I hope to become more familiar with the terminology and application.



                  • #10
                    OK, things went awry because I confused the variables participant (which was in the original data) and war (which I created but then forgot about). The calculation of classification has to be done separately for each war, and the code failed to do that because the -by- prefix included participant instead of war. Here is corrected code:

                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input int year str2 country byte participant double pit
                    1910 "BE" 0    .
                    1911 "BE" 0    .
                    1912 "BE" 0    .
                    1913 "BE" 0    .
                    1914 "BE" 1    .
                    1915 "BE" 1    .
                    1916 "BE" 1    .
                    1917 "BE" 1    .
                    1918 "BE" 1    .
                    1919 "BE" 0    .
                    1920 "BE" 0   20
                    1921 "BE" 0   20
                    1922 "BE" 0   20
                    1923 "BE" 0   20
                    1924 "BE" 0   40
                    1925 "BE" 0   40
                    1926 "BE" 0   40
                    1927 "BE" 0   40
                    1928 "BE" 0   40
                    1929 "BE" 0   40
                    1930 "BE" 0   40
                    1931 "BE" 0   24
                    1932 "BE" 0   24
                    1933 "BE" 0   24
                    1934 "BE" 0 32.5
                    1935 "BE" 0 32.5
                    1936 "BE" 0 32.5
                    1937 "BE" 0 32.5
                    1938 "BE" 0 32.5
                    1939 "BE" 0   35
                    1940 "BE" 1   35
                    1941 "BE" 1   35
                    1942 "BE" 1   35
                    1943 "BE" 1   35
                    1944 "BE" 1   35
                    1945 "BE" 1   35
                    1946 "BE" 0   35
                    1947 "BE" 0   35
                    1948 "BE" 0   54
                    1949 "BE" 0   54
                    1950 "BE" 0   54
                    1951 "BE" 0   80
                    1952 "BE" 0   80
                    1953 "BE" 0   80
                    1954 "BE" 0   80
                    1955 "BE" 0   80
                    1910 "BG" 0    .
                    1911 "BG" 0    .
                    1912 "BG" 0    .
                    1913 "BG" 0    .
                    1914 "BG" 0    .
                    1915 "BG" 1    .
                    1916 "BG" 1    .
                    1917 "BG" 1    .
                    1918 "BG" 1    .
                    1919 "BG" 0    .
                    1920 "BG" 0    .
                    1921 "BG" 0    .
                    1922 "BG" 0    .
                    1923 "BG" 0    .
                    1924 "BG" 0    .
                    1925 "BG" 0    .
                    1926 "BG" 0    .
                    1927 "BG" 0    .
                    1928 "BG" 0    .
                    1929 "BG" 0    .
                    1930 "BG" 0    .
                    1931 "BG" 0    .
                    1932 "BG" 0    .
                    1933 "BG" 0    .
                    1934 "BG" 0    .
                    1935 "BG" 0    .
                    1936 "BG" 0    .
                    1937 "BG" 0    .
                    1938 "BG" 0    .
                    1939 "BG" 0    .
                    1940 "BG" 0    .
                    1941 "BG" 1    .
                    1942 "BG" 1    .
                    1943 "BG" 1    .
                    1944 "BG" 1    .
                    1945 "BG" 1    .
                    1946 "BG" 0    .
                    1947 "BG" 0    .
                    1948 "BG" 0    .
                    1949 "BG" 0    .
                    1950 "BG" 0    .
                    1951 "BG" 0    .
                    1952 "BG" 0    .
                    1953 "BG" 0    .
                    1954 "BG" 0    .
                    1955 "BG" 0    .
                    1910 "FR" 0    .
                    1911 "FR" 0    .
                    1912 "FR" 0    .
                    1913 "FR" 0    .
                    1914 "FR" 1    .
                    1915 "FR" 1    2
                    1916 "FR" 1    2
                    1917 "FR" 1   10
                    end
                    
                    
                    //
                    label variable year "Year"
                    label variable pit "Personal income tax rate"
                    label variable participant "Participant status"
                    
                    
                    //
                    replace participant = 0 if missing(participant)
                    
                    gen byte war = 1 if inrange(year, 1914, 1918)
                    replace war = 2 if inrange(year, 1939, 1945)
                    
                    by country war (year), sort: gen years_participation = sum(participant != 0)
                    by country war (years_participation), sort: gen byte classification:classification = 0 if years_participation[_N] == 0
                    by country war (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
                    label values classification classification
                    label def classification 0 "Neutral", modify
                    label def classification 1 "Late Entry", modify
                    label def classification 2 "Belligerent", modify
                    This gets the classifications right.

                    Concerning replacing missing values with 0, let me clarify. The point I was trying to make is that with yes/no variables, it is problematic to code those as 1/missing value. They should be coded as 1/0, with missing values reserved for those observations where it really isn't known whether the correct value should be yes or no. Missing values should always be used to denote unknown/unavailable/not applicable information, in any variable. They should not be used as a code for a known value. So, you are right to replace the missings in participant with 0, because we know the participation status of every country in every year. But you are also right not to replace missing values of pit in situations where we don't know them or where they might not be applicable.
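
                    As a sketch of why the unknown pit values are better left missing (made-up numbers; avg_missing_kept, pit_zero and avg_zero_filled are hypothetical names): -egen, mean()- skips missing values, whereas a 0 standing in for an unknown rate is averaged in like any other number and pulls the mean down.
                    Code:
                    clear
                    input int year pit
                    1919  .
                    1920 20
                    1921 20
                    1922 40
                    end
                    * Missing pit is skipped: (20 + 20 + 40) / 3
                    egen avg_missing_kept = mean(pit)
                    * A 0 in place of the unknown year is averaged in: (0 + 20 + 20 + 40) / 4 = 20
                    gen pit_zero = cond(missing(pit), 0, pit)
                    egen avg_zero_filled = mean(pit_zero)
                    list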

                    I'm not 100% certain I understand what you are looking for in the average tax value. It seems you want a value for each year that represents the average across all of the neutral countries. In that case, you would do:
                    Code:
                    by country (year), sort: egen neutral_ww1 = min(cond(inrange(year, 1914, 1918), classification == 0, .))
                    by year (country), sort: egen avg_pit_neu = mean(cond(inrange(year, 1914, 1933) & neutral_ww1, pit, .))
                    I can't test that in the example data provided because all of the countries in it are either belligerent or late entry for WW1. But I'm pretty sure this code does what I understand you to want. Of course, if I still don't understand what you want properly, we'll have to try again!
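
                    For what it is worth, a toy check along the following lines could be used to confirm the behavior. Everything in it is invented for illustration: the country codes XX and ZW (neutral), RT (a participant), and all of the numbers. The value label on classification is left out to keep the sketch short.
                    Code:
                    clear
                    input int year str2 country byte participant double pit
                    1914 "XX" 0 10
                    1915 "XX" 0 20
                    1919 "XX" 0 30
                    1914 "ZW" 0 30
                    1915 "ZW" 0  .
                    1919 "ZW" 0 50
                    1914 "RT" 1 80
                    1915 "RT" 1 80
                    1919 "RT" 0 80
                    end
                    gen byte war = 1 if inrange(year, 1914, 1918)
                    replace war = 2 if inrange(year, 1939, 1945)
                    by country war (year), sort: gen years_participation = sum(participant != 0)
                    by country war (years_participation), sort: gen byte classification = 0 if years_participation[_N] == 0
                    by country war (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
                    by country (year), sort: egen neutral_ww1 = min(cond(inrange(year, 1914, 1918), classification == 0, .))
                    by year (country), sort: egen avg_pit_neu = mean(cond(inrange(year, 1914, 1933) & neutral_ww1, pit, .))
                    list year country pit neutral_ww1 avg_pit_neu, sepby(year)
                    With this toy data avg_pit_neu is 20 in 1914, 20 in 1915 (the missing ZW value is skipped) and 40 in 1919, and RT's rate of 80 never enters the average. Note that the value is repeated on every observation of a given year, including those of the non-neutral country.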



                    • #11
                      Once again, many thanks for the code and help.
                      This makes World War I look correct. But when I transferred the same code to World War II, I realized that something can't be right. Allegedly WW2 has no late participants. Here is the graphic.
                      [Attached image: Taxes-Changes_WWI_States.jpg]

                      [Attached image: Taxes-Changes_WWII_States.jpg]



                      Generated with the following code:
                      Code:
                      **Start
                      clear all
                      capture log close
                      set more off
                      
                      cd "U:\Dokumente\Thesis"
                      use Taxes_Masterfile.dta
                      
                      label variable year "Year"
                      label variable pit "Personal income tax rate"
                      label variable participant "Participant status"
                      
                      
                      //Changes during WWI and WWII
                      replace participant = 0 if missing(participant)
                      
                      gen byte war = 1 if inrange(year, 1914, 1918)
                      replace war = 2 if inrange(year, 1939, 1945)
                      
                      by country war (year), sort: gen years_participation = sum(participant != 0)
                      by country war (years_participation), sort: gen byte classification:classification = 0 if years_participation[_N] == 0
                      by country war (years_participation): replace classification = cond(years_participation[_N] < 5, 1, 2) if missing(classification)
                      label values classification classification
                      label def classification 0 "Neutral", modify
                      label def classification 1 "Late Entry", modify
                      label def classification 2 "Belligerent", modify
                      
                      
                      *WWI
                      by country (year), sort: egen neutral_ww1 = min(cond(inrange(year, 1914, 1918), classification == 0, .))
                      by year (country), sort: egen avg_ww1_pit_neu = mean(cond(inrange(year, 1912, 1933) & neutral_ww1, pit, .))
                      
                      by country (year), sort: egen late_ww1 = min(cond(inrange(year, 1914, 1918), classification == 1, .))
                      by year (country), sort: egen avg_ww1_pit_late = mean(cond(inrange(year, 1912, 1933) & late_ww1, pit, .))
                      
                      by country (year), sort: egen belligerent_ww1 = min(cond(inrange(year, 1914, 1918), classification == 2, .))
                      by year (country), sort: egen avg_ww1_pit_belligerent = mean(cond(inrange(year, 1912, 1933) & belligerent_ww1, pit, .))
                      
                      by country (year), sort: egen all_ww1 = min(cond(inrange(year, 1914, 1918), classification, .))
                      by year (country), sort: egen avg_ww1_pit_all = mean(cond(inrange(year, 1912, 1933) & all_ww1, pit, .))
                      
                      
                      *WWII
                      by country (year), sort: egen neutral_ww2 = min(cond(inrange(year, 1939, 1945), classification == 0, .))
                      by year (country), sort: egen avg_ww2_pit_neu = mean(cond(inrange(year, 1937, 1960) & neutral_ww2, pit, .))
                      
                      by country (year), sort: egen late_ww2 = min(cond(inrange(year, 1939, 1945), classification == 1, .))
                      by year (country), sort: egen avg_ww2_pit_late = mean(cond(inrange(year, 1937, 1960) & late_ww2, pit, .))
                      
                      by country (year), sort: egen belligerent_ww2 = min(cond(inrange(year, 1939, 1945), classification == 2, .))
                      by year (country), sort: egen avg_ww2_pit_belligerent = mean(cond(inrange(year, 1937, 1960) & belligerent_ww2, pit, .))
                      
                      by country (year), sort: egen all_ww2 = min(cond(inrange(year, 1939, 1945), classification, .))
                      by year (country), sort: egen avg_ww2_pit_all = mean(cond(inrange(year, 1937, 1960) & all_ww2, pit, .))
                      
                      
                      *Label
                      label variable years_participation "Years in war"
                      label variable war "Which World War"
                      label variable avg_ww1_pit_neu "Average PIT WW1-Neutral States"
                      label variable avg_ww1_pit_late "Average PIT WW1-Late Entry States"
                      label variable avg_ww1_pit_belligerent "Average PIT WW1-Belligerent States"
                      label variable avg_ww1_pit_all "Average PIT WW1-All States"
                      
                      label variable avg_ww2_pit_neu "Average PIT WW2-Neutral States"
                      label variable avg_ww2_pit_late "Average PIT WW2-Late Entry States"
                      label variable avg_ww2_pit_belligerent "Average PIT WW2-Belligerent States"
                      label variable avg_ww2_pit_all "Average PIT WW2-All States"
                      
                      tab classification participant
                      
                      tab classification participant if year == 1945
                      
                      *PIT of the classes from 1937 to 1960
                      twoway (line avg_ww2_pit_neu year, legend(label(1 "Neutral States"))) (line avg_ww2_pit_late year, legend(label(2 "Late War Entry"))) (line avg_ww2_pit_belligerent year, legend(label(3 "Belligerent"))) (line avg_ww2_pit_all year, lpattern(dash) legend(label(4 "Average of all States"))) if year>1935 & year<1960, xscale(log) ytitle("Personal income tax in %") xtitle("Tax development in WW2") xlabel(1937(2)1960) ylabel(0(10)70) xline(1939) xline(1945)
                      If we ask Stata who actually joined later during WW2, we get the following message:
                      Code:
                       tab year country if classification==1 & year > 1939 & year < 1945
                      
                                 |        country
                            Year |        AR         EG |     Total
                      -----------+----------------------+----------
                            1940 |         1          1 |         2
                            1941 |         1          1 |         2
                            1942 |         1          1 |         2
                            1943 |         1          1 |         2
                            1944 |         1          1 |         2
                      -----------+----------------------+----------
                           Total |         5          5 |        10
                      Peru, USA and Greece should actually be included.

                      I tried to create some information again via -dataex-. Unfortunately, Stata refuses to give me the output
                      Code:
                      . dataex if inlist(country, "FR" "US", "GR", "PE")
                      input statement exceeds linesize limit. Try specifying fewer variables
                      Therefore I uploaded my .dta to my university's server, maybe that will help more in locating the problem.



                      Originally posted by Clyde Schechter
                      Concerning replacing missing values with 0, let me clarify. The point I was trying to make is that with yes/no variables, it is problematic to code those as 1/missing value. They should be coded as 1/0, with missing values reserved for those observations where it really isn't known whether the correct value should be yes or no. Missing values should always be used to denote unknown/unavailable/not applicable information, in any variable. They should not be used as a code for a known value. So, you are right to replace the missings in participant with 0, because we know the participation status of every country in every year. But you are also right not to replace missing values of pit in situations where we don't know them or where they might not be applicable.
                      All right, thanks for the explanation!
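The coding rule quoted above can be sketched on a few made-up rows. This is only an illustration: the three observations are hypothetical, and it assumes, as stated in the quote, that participation status is known for every country-year while pit may genuinely be unknown.

```stata
* Hypothetical toy rows: participant was left missing where it should be 0,
* and pit is truly unknown in 1918.
clear
input int year str2 country byte participant double pit
1916 "US" . 15
1917 "US" 1 67
1918 "US" 1  .
end

* Participation is a known yes/no fact, so missing here really means "no".
replace participant = 0 if missing(participant)

* pit is left alone: its missing value stands for "not known".
list, clean
```

After this, participant is 0/1 everywhere, while the missing pit value is kept as missing rather than recoded.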


                      Originally posted by Clyde Schechter View Post
                      I'm not 100% certain I understand what you are looking for in the average tax value. It seems you want a value for each year that represents the average across all of the neutral countries. In that case, you would do:
                      Using the average tax value for each class of country (neutral, late entry, belligerent), I want to see whether class status has an effect on tax rates.


                      Originally posted by Clyde Schechter View Post
                      I can't test that in the example data provided because all of the countries in it are either belligerent or late entry for WW1. But I'm pretty sure this code does what I understand you to want. Of course, if I still don't understand what you want properly, we'll have to try again!
                      Looking at the output graphic for WWI, I'd say it's exactly what I need. Thank you!

                      Last edited by Tobias Kampf; 11 Mar 2024, 09:36.



                      • #12
                        The message you are getting from -dataex- arises because you are trying to show all the variables in your data set, and it happens that there are more of them than can be accommodated by -dataex-. But we don't need all of them. Please do this:
                        Code:
                        dataex year country participant pit if inlist(country, "FR", "US", "GR", "PE")
                        That will give all of the variables that we need to work with for the present purpose. I appreciate your effort in posting your full data set to a server, but it is my practice to not download anything from people I do not know.

                        Allegedly WW2 has no late participants. Here is the graphic.
                        I don't understand: the graphic for WW2 clearly shows a curve of results for the late entry group. What appears to be missing from the graph is the average of all countries curve.

                        Nevertheless, if Greece and Peru are not being recognized as late entrants, then something is wrong with either the code or the data. If you respond with the -dataex- requested, I'll try to figure it out.



                        • #13
                          I see. All right, here's the output:
                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input int year str2 country byte participant double pit
                          1910 "FR" 0     .
                          1910 "GR" 0     .
                          1910 "PE" 0     .
                          1911 "FR" 0     .
                          1911 "GR" 0     .
                          1911 "PE" 0     .
                          1912 "FR" 0     .
                          1912 "GR" 0     .
                          1912 "PE" 0     .
                          1913 "FR" 0     .
                          1913 "GR" 0     .
                          1913 "PE" 0     .
                          1914 "FR" 1     .
                          1914 "GR" 0     .
                          1914 "PE" 0     .
                          1915 "FR" 1     2
                          1915 "GR" 0     .
                          1915 "PE" 0     .
                          1916 "FR" 1     2
                          1916 "GR" 0     .
                          1916 "PE" 0     .
                          1917 "FR" 1    10
                          1917 "GR" 1     .
                          1917 "PE" 0     .
                          1918 "FR" 1    20
                          1918 "GR" 1     .
                          1918 "PE" 0     .
                          1919 "FR" 0    20
                          1919 "GR" 0     .
                          1919 "PE" 0     .
                          1920 "FR" 0    50
                          1920 "GR" 0     .
                          1920 "PE" 0     .
                          1921 "FR" 0    50
                          1921 "GR" 0     .
                          1921 "PE" 0     .
                          1922 "FR" 0    50
                          1922 "GR" 0     .
                          1922 "PE" 0     .
                          1923 "FR" 0    60
                          1923 "GR" 0     .
                          1923 "PE" 0     .
                          1924 "FR" 0    72
                          1924 "GR" 0     .
                          1924 "PE" 0     .
                          1925 "FR" 0    60
                          1925 "GR" 0     .
                          1925 "PE" 0     .
                          1926 "FR" 0    60
                          1926 "GR" 0     .
                          1926 "PE" 0     .
                          1927 "FR" 0    30
                          1927 "GR" 0     .
                          1927 "PE" 0     .
                          1928 "FR" 0    30
                          1928 "GR" 0     .
                          1928 "PE" 0     .
                          1929 "FR" 0 33.33
                          1929 "GR" 0     .
                          1929 "PE" 0     .
                          1930 "FR" 0 33.33
                          1930 "GR" 0     .
                          1930 "PE" 0     .
                          1931 "FR" 0 33.33
                          1931 "GR" 0     .
                          1931 "PE" 0     .
                          1932 "FR" 0 36.67
                          1932 "GR" 0     .
                          1932 "PE" 0     .
                          1933 "FR" 0 36.67
                          1933 "GR" 0     .
                          1933 "PE" 0     .
                          1934 "FR" 0    24
                          1934 "GR" 0     .
                          1934 "PE" 0     .
                          1935 "FR" 0    24
                          1935 "GR" 0     .
                          1935 "PE" 0     .
                          1936 "FR" 0    24
                          1936 "GR" 0     .
                          1936 "PE" 0     .
                          1937 "FR" 0    40
                          1937 "GR" 0     .
                          1937 "PE" 0     .
                          1938 "FR" 0    40
                          1938 "GR" 0     .
                          1938 "PE" 0     .
                          1939 "FR" 1    40
                          1939 "GR" 0     .
                          1939 "PE" 0     .
                          1940 "FR" 1    40
                          1940 "GR" 1     .
                          1940 "PE" 0     .
                          1941 "FR" 1    40
                          1941 "GR" 1     .
                          1941 "PE" 1     .
                          1942 "FR" 1    40
                          1942 "GR" 1     .
                          1942 "PE" 1     .
                          1943 "FR" 1    70
                          end
                          Originally posted by Clyde Schechter View Post
                          I don't understand: the graphic for WW2 clearly shows a curve of results for the late entry group. What appears to be missing from the graph is the average of all countries curve.

                          Nevertheless, if Greece and Peru are not being recognized as late entrants, then something is wrong with either the code or the data. If you respond with the -dataex- requested, I'll try to figure it out.
                          What confuses me is that, according to the legend, Belligerent should be green and Average of all States should be dashed, yet one curve seems to be missing.



                          • #14
                            In the new example data, the countries are properly classified as to their participation in WW2:
                            Code:
                            . dtable i.classification if war == 2, by(country, nototals)
                            
                            -----------------------------------------------
                                                        country            
                                               FR         GR         PE    
                            -----------------------------------------------
                            N               5 (38.5%)  4 (30.8%)  4 (30.8%)
                            classification                                 
                              Late Entry     0 (0.0%) 4 (100.0%) 4 (100.0%)
                              Belligerent  5 (100.0%)   0 (0.0%)   0 (0.0%)
                            -----------------------------------------------
                            However, the command
                            Code:
                            by country (year), sort: egen all_ww2 = min(cond(inrange(year, 1939, 1945), classification, .))
                            may do what you want. If a country was neutral in WW2, its classification in the years 1939 through 1945 will be 0 (because that is the code for neutral). Consequently all_ww2 will be set to zero for these countries, and they will be excluded from the subsequent calculation of avg_ww2_pit_all. If that is what you want (including only the belligerents and late entries in the average), then the code looks right. But if you intend to include the neutrals in the average of all (which sounds like what "all" might mean), then it needs to be changed. As I don't know your intention here, I'll leave it to you to process this. (The same, by the way, will be true for WW1.)
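To make the effect on neutrals concrete, here is a minimal sketch on made-up data. The three country codes and all pit values are hypothetical, and the averaging step that produces avg_ww2_pit_all is not shown in this post, so the two `egen` averages below are only an assumption about how such a step might filter on all_ww2.

```stata
* Toy data: one hypothetical country per class
* (0 = Neutral, 1 = Late Entry, 2 = Belligerent); pit values are invented.
clear
input int year str2 country byte classification double pit
1939 "AA" 0 10
1940 "AA" 0 10
1939 "BB" 1 20
1940 "BB" 1 30
1939 "CC" 2 40
1940 "CC" 2 50
end

* Same command as above: minimum classification over the WW2 years, by country.
by country (year), sort: egen all_ww2 = min(cond(inrange(year, 1939, 1945), classification, .))

* Assumed filter 1: require a nonzero all_ww2, which drops the neutral country.
egen avg_excl_neutral = mean(cond(all_ww2 > 0 & !missing(all_ww2), pit, .)), by(year)

* Assumed filter 2: require only a non-missing all_ww2, which keeps all classes.
egen avg_incl_neutral = mean(cond(!missing(all_ww2), pit, .)), by(year)

list year country classification all_ww2 avg_excl_neutral avg_incl_neutral, sepby(year)
```

If the real averaging step uses a condition that is false when all_ww2 is 0, switching it to a test for non-missing values, as in the second average, would bring the neutrals back in.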

                            It is hard for me to troubleshoot the graph problem you are having because the current data example includes no neutrals (and so it shouldn't have a line for neutrals). I do notice, however, that for the late entering countries, GR, and PE, the variable pit is always missing! Consequently there is no line for late entrants either. The graph consists only of the line for belligerents and the average of all line. But since there are no neutrals in the data, and the late entrants have no pit values, the average of all line is the same as the belligerents line and overlies it. (As one is dashed and the other solid, they both look dashed in the graph.) Anyway, I think what you have here is a data problem, not a code problem.



                            • #15
                              Originally posted by Clyde Schechter View Post
                              In the new example data, the countries are properly classified as to their participation in WW2:
                              Code:
                              . dtable i.classification if war == 2, by(country, nototals)
                              
                              -----------------------------------------------
                                                          country            
                                                 FR         GR         PE    
                              -----------------------------------------------
                              N               5 (38.5%)  4 (30.8%)  4 (30.8%)
                              classification                                 
                                Late Entry     0 (0.0%) 4 (100.0%) 4 (100.0%)
                                Belligerent  5 (100.0%)   0 (0.0%)   0 (0.0%)
                              -----------------------------------------------
                              However, the command
                              Code:
                              by country (year), sort: egen all_ww2 = min(cond(inrange(year, 1939, 1945), classification, .))
                              may do what you want. If a country was neutral in WW2, its classification in the years 1939 through 1945 will be 0 (because that is the code for neutral). Consequently all_ww2 will be set to zero for these countries, and they will be excluded from the subsequent calculation of avg_ww2_pit_all. If that is what you want (including only the belligerents and late entries in the average), then the code looks right. But if you intend to include the neutrals in the average of all (which sounds like what "all" might mean), then it needs to be changed. As I don't know your intention here, I'll leave it to you to process this. (The same, by the way, will be true for WW1.)
                              I see. However, I do want to include all three classifications in the calculation of avg_ww2_pit_all (i.e. belligerent countries, neutrals and late entries should all be included in the average). I assumed that the neutrals would be included in the average of all because of "classification" in "by country (year), [...]". Do I understand correctly that neutral countries are excluded because they are coded as 0?



                              Originally posted by Clyde Schechter View Post
                              It is hard for me to troubleshoot the graph problem you are having because the current data example includes no neutrals (and so it shouldn't have a line for neutrals). I do notice, however, that for the late entering countries, GR, and PE, the variable pit is always missing! Consequently there is no line for late entrants either. The graph consists only of the line for belligerents and the average of all line. But since there are no neutrals in the data, and the late entrants have no pit values, the average of all line is the same as the belligerents line and overlies it. (As one is dashed and the other solid, they both look dashed in the graph.) Anyway, I think what you have here is a data problem, not a code problem.

                              That's right, I hadn't thought of that. Unfortunately, the values for PE & GR are not available. Sorry, that was my mistake.
                              I also checked whether the lines are superimposed. You are absolutely right. That explains it.

                              I'm still stumbling a bit over the classification because I think something is not quite right. For example, the USA:
                              Code:
                              . tab participant  year if country=="US" & year > 1938 & year < 1946
                              
                              Participan |                                     Year
                                t status |      1939       1940       1941       1942       1943       1944       1945 |     Total
                              -----------+-----------------------------------------------------------------------------+----------
                                       0 |         1          1          0          0          0          0          0 |         2
                                       1 |         0          0          1          1          1          1          1 |         5
                              -----------+-----------------------------------------------------------------------------+----------
                                   Total |         1          1          1          1          1          1          1 |         7
                              
                              
                              
                              . tab classification year if country=="US" & year > 1938 & year < 1946
                              
                              classificat |                                     Year
                                      ion |      1939       1940       1941       1942       1943       1944       1945 |     Total
                              ------------+-----------------------------------------------------------------------------+----------
                              Belligerent |         1          1          1          1          1          1          1 |         7
                              ------------+-----------------------------------------------------------------------------+----------
                                    Total |         1          1          1          1          1          1          1 |         7

                              I may be making a mistake or reading the output incorrectly. But doesn't it tell me that the USA counts as a belligerent even though it only joined later?
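One way to cross-check the classification is to derive it directly from participant, using the definitions from the start of the thread (neutral = never at war, late entry = joined after the first war year, belligerent = at war from the first year). The sketch below is not the code from the earlier posts. The data are a made-up extract, "ZZ" is a hypothetical neutral country, and the helper names ever_in and first_in are invented for the illustration.

```stata
* Toy extract: FR at war from 1939, US joining in 1941, ZZ never at war.
clear
input int year str2 country byte participant
1939 "FR" 1
1940 "FR" 1
1941 "FR" 1
1939 "US" 0
1940 "US" 0
1941 "US" 1
1939 "ZZ" 0
1940 "ZZ" 0
1941 "ZZ" 0
end

* Mark the WW2 window for every country (no country-specific start year).
gen byte war = 2 if inrange(year, 1939, 1945)

* Did the country ever participate, and did it participate in the first war year?
bysort country war (year): egen byte ever_in = max(participant)
bysort country war (year): gen byte first_in = participant[1]

* 0 = Neutral, 1 = Late Entry, 2 = Belligerent
gen byte classification = cond(ever_in == 0, 0, cond(first_in == 1, 2, 1)) if !missing(war)
label define classification 0 "Neutral" 1 "Late Entry" 2 "Belligerent"
label values classification classification

tabulate country classification
```

In this sketch the war window is the same for every country, so a country whose first participating year falls after 1939 comes out as Late Entry. If the window is instead narrowed to the years a country was actually at war, that country participates in every year of its window and would be classified as Belligerent.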

                              I have included your code addition for the US from post #4 and the graphical results vary more. Of course, this is because the US has annual PIT values and therefore has a greater impact on the result given the limited data available.
                              Code:
                              gen war = 0
                              replace war = 1 if inrange(year, 1914, 1918)
                              replace war = 2 if inrange(year, 1939, 1945) & country != "US"
                              replace war = 2 if inrange(year, 1943, 1945) & country == "US"

                              Here is the -dataex- output when I do not make the adjustments for the US in the code:
                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input int year str2 country byte(classification participant) double pit
                              1910 "FR" 0 0     .
                              1910 "US" 0 0     .
                              1911 "FR" 0 0     .
                              1911 "US" 0 0     .
                              1912 "FR" 0 0     .
                              1912 "US" 0 0     .
                              1913 "FR" 0 0     .
                              1913 "US" 0 0     7
                              1914 "FR" 2 1     .
                              1914 "US" 1 0     7
                              1915 "FR" 2 1     2
                              1915 "US" 1 0     7
                              1916 "FR" 2 1     2
                              1916 "US" 1 0    15
                              1917 "FR" 2 1    10
                              1917 "US" 1 1    67
                              1918 "FR" 2 1    20
                              1918 "US" 1 1    77
                              1919 "FR" 0 0    20
                              1919 "US" 0 0    73
                              1920 "FR" 0 0    50
                              1920 "US" 0 0    73
                              1921 "FR" 0 0    50
                              1921 "US" 0 0    73
                              1922 "FR" 0 0    50
                              1922 "US" 0 0    56
                              1923 "FR" 0 0    60
                              1923 "US" 0 0    56
                              1924 "FR" 0 0    72
                              1924 "US" 0 0    46
                              1925 "FR" 0 0    60
                              1925 "US" 0 0    25
                              1926 "FR" 0 0    60
                              1926 "US" 0 0    25
                              1927 "FR" 0 0    30
                              1927 "US" 0 0    25
                              1928 "FR" 0 0    30
                              1928 "US" 0 0    25
                              1929 "FR" 0 0 33.33
                              1929 "US" 0 0    24
                              1930 "FR" 0 0 33.33
                              1930 "US" 0 0    25
                              1931 "FR" 0 0 33.33
                              1931 "US" 0 0    25
                              1932 "FR" 0 0 36.67
                              1932 "US" 0 0    63
                              1933 "FR" 0 0 36.67
                              1933 "US" 0 0    63
                              1934 "FR" 0 0    24
                              1934 "US" 0 0    63
                              1935 "FR" 0 0    24
                              1935 "US" 0 0    63
                              1936 "FR" 0 0    24
                              1936 "US" 0 0    79
                              1937 "FR" 0 0    40
                              1937 "US" 0 0    79
                              1938 "FR" 0 0    40
                              1938 "US" 0 0    79
                              1939 "FR" 2 1    40
                              1939 "US" 2 0    79
                              1940 "FR" 2 1    40
                              1940 "US" 2 0  81.1
                              1941 "FR" 2 1    40
                              1941 "US" 2 1    81
                              1942 "FR" 2 1    40
                              1942 "US" 2 1    88
                              1943 "FR" 2 1    70
                              1943 "US" 2 1    88
                              1944 "FR" 2 1    70
                              1944 "US" 2 1    94
                              1945 "FR" 2 1    70
                              1945 "US" 2 1    94
                              1946 "FR" 0 0    60
                              1946 "US" 0 0 86.45
                              1947 "FR" 0 0    60
                              1947 "US" 0 0 86.45
                              1948 "FR" 0 0    60
                              1948 "US" 0 0 82.13
                              1949 "FR" 0 0    60
                              1949 "US" 0 0 82.13
                              1950 "FR" 0 0    60
                              1950 "US" 0 0    91
                              1951 "FR" 0 0    60
                              1951 "US" 0 0    91
                              1952 "FR" 0 0    60
                              1952 "US" 0 0    92
                              1953 "FR" 0 0    60
                              1953 "US" 0 0    92
                              1954 "FR" 0 0    60
                              1954 "US" 0 0    91
                              1955 "FR" 0 0    60
                              1955 "US" 0 0    91
                              end
                              label values classification classification
                              label def classification 0 "Neutral", modify
                              label def classification 1 "Late Entry", modify
                              label def classification 2 "Belligerent", modify
                              Last edited by Tobias Kampf; 13 Mar 2024, 06:36.
