
  • Calculate Average Values for Current Country and Match them to the Origin Country of the Respondent

    I am working with World Values Survey time-series data, appended across waves from 1980 to 2022. I am interested in measuring beliefs about tax evasion among immigrants. I have the following data:

    TaxEvasion  CurrentCountry  ImmigrantDummy  AverageTaxEvasion  OriginCountry
    0           Israel          0               1.87               .
    0           US              0               1.94               .
    1           UK              1               .                  Bangladesh
    2           India           1               .                  Sri Lanka
    5           Pakistan        1               .                  Afghanistan
    8           Pakistan        1               .                  US
    10          Mexico          1               .                  US
    3           Israel          1               .                  Russia
    8           China           0               3.98               Israel

    AverageTaxEvasion (named AvgTaxEvasion31 in my code) is calculated by:

    egen AvgTaxEvasion31 = mean(TaxEvasion) if ImmigrantDummy == 0, by(CurrentCountry)


    The average tax evasion score is therefore computed from the non-immigrant population only, and it is attached to CurrentCountry: the AvgTaxEvasion score is 1.87 where the current country is Israel and 1.94 where it is the US.


    I want this score to be matched to OriginCountry instead, so that whenever US appears as the OriginCountry, the row shows the AvgTaxEvasion score of 1.94. Similarly for Israel and the other countries.

    To achieve this, I tried a 1:m merge, but the result is wrong. Can you please help? The data should look like this:


    TaxEvasion  CurrentCountry  ImmigrantDummy  AverageTaxEvasion  OriginCountry
    4           Israel          0               .
    8           US              0               .
    1           UK              1               .                  Bangladesh
    2           India           1               .                  Sri Lanka
    5           Pakistan        1               .                  Afghanistan
    8           Pakistan        1               1.94               US
    10          Mexico          1               1.94               US
    3           Israel          1               .                  Russia
    8           China           0               1.87               Israel
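
    One way the layout above could be produced is sketched below, under stated assumptions rather than as a confirmed solution. It assumes CurrentCountry and OriginCountry store country names with identical spellings (or share the same numeric coding), and it pools all survey waves, as the egen command does. The variable name OriginAvgTaxEvasion and the tempfile name lookup are hypothetical. The idea is to collapse the non-immigrant respondents to one row per country, rename the key to OriginCountry, and then merge m:1 so that many respondents can share one country-level row.

    ```stata
    * Build a one-row-per-country lookup of the non-immigrant mean.
    preserve
    keep if ImmigrantDummy == 0 & !missing(CurrentCountry)
    collapse (mean) OriginAvgTaxEvasion = TaxEvasion, by(CurrentCountry)

    * The lookup key must carry the name of the variable it is matched on.
    rename CurrentCountry OriginCountry
    tempfile lookup
    save `lookup'
    restore

    * Many respondents per origin country, one lookup row per country: m:1.
    * keep(master match) retains respondents whose origin has no lookup row;
    * for those rows OriginAvgTaxEvasion is left missing.
    merge m:1 OriginCountry using `lookup', keep(master match) nogenerate
    ```

    Respondents whose OriginCountry never appears as a CurrentCountry in the data (Bangladesh, for example, if it was not surveyed) end up with a missing OriginAvgTaxEvasion.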


    Additionally, I want to do this so that I can run the regression TaxEvasion = ImmigrantDummy##AvgTaxEvasion + controls, to see whether coming from an origin country where average tax evasion is high leads an immigrant to be more accepting of tax evasion than an immigrant from an origin country where average tax evasion is low. So the hypothesis is that the interaction term will have a large, statistically significant coefficient.
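
    For reference, a minimal sketch of how that specification might be written in Stata's factor-variable notation. It is a syntax illustration, not a claim that the model is correctly specified. OriginAvgTaxEvasion is a hypothetical name for the origin-country average once it has been attached to each respondent, and the controls local is a placeholder to be filled in. A continuous variable inside ## needs the c. prefix, because otherwise Stata treats it as categorical.

    ```stata
    * Placeholder: replace with the actual control variables.
    local controls ""

    * i. marks the dummy as categorical, c. marks the average as continuous.
    * Rows where OriginAvgTaxEvasion is missing are dropped from the
    * estimation sample, which may remove most non-immigrants.
    regress TaxEvasion i.ImmigrantDummy##c.OriginAvgTaxEvasion `controls'
    ```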

    Is this the right way to do this, or am I thinking about the regression itself incorrectly?

    Any help will go a long way. THANK YOU SO MUCH.

