Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need help on reshaping data for more than one "i"

    Hi all,

    I am doing a research on measuring banking competition across EU28 countries. I was trying to reshape the dataset from wide to long but it was not successful. I have tried the following command:


    Code:
    reshape long roaa tr, i(id) j(year)
    OR

    Code:
    reshape long roaa  tr, i(bankname country) j(year)
    Basically, an error "variable id does not uniquely identify the observations " came up. I have tried to look at other posts in the forum but it does not seem to apply to my cases. May anyone gives me some advice? I will be very appreciated.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id str11 country str159 bankname double(roaa2016 roaa2015 roaa2014 roaa2013 roaa2012 roaa2011) long(tr2016 tr2015 tr2014 tr2013 tr2012 tr2011)
     1 "Austria" "3-Banken Wohnbaubank AG"                               .0009  .0012   .001      .      .     .    271     280     353      .       .       .
     2 "Austria" "A1 Bank AG"                                            .0627 -.0139 -.0684 -.0545  .0027 .0217  11703    8709   10246  10233    8994    9174
     3 "Austria" "ABS Factoring Bank AG"                                 .0054  .0319  .0134  .0154      .     .   3856    6064    6055   7242       .       .
     4 "Austria" "AirPlus Air Travel Card Vertriebsgesellschaft m.b.H."  .0696  .0633  .0611  .0524      .     .   8442    8504    9757  11023       .       .
     5 "Austria" "Allgemeine Sparkasse Ober”sterreich Bank AG"          .0017  .0053  .0017  .0037  .0039 .0026 262682  290356  248891 314435  277866  240277
     6 "Austria" "Alpenlaendische Garantie -Gesellschaft M.B.H."             .      .      .      .      . .0001     66       .       .    124      65      89
     7 "Austria" "Austrian Anadi Bank AG"                                .0034  .0012  .0002  .0013    .01 .0013  67522   68994   64074  82783   87550   97515
     8 "Austria" "Autobank AG"                                           .0003  .0001      0 -.0029      0 .0017  10036    9922   11136   8439    8135    9981
     9 "Austria" "Banco do Brasil AG"                                   -.0134 -.0089   .007  .0052      .     .  10126   19625   52707  31918       .       .
    10 "Austria" "Bank Austria Creditanstalt AG"                             .   .002  -.016 -.0134 -.0025 -.004      . 3194147 1343306 878307 2023218 1832307
    end
    I want the dataset to look like the following:

    Code:
     
    id country bankname year roaa tr
    1 Austria 3-Banken Wohnbaubank AG 2011
    1 Austria 3-Banken Wohnbaubank AG 2012
    1 Austria 3-Banken Wohnbaubank AG 2013
    1 Austria 3-Banken Wohnbaubank AG 2014
    1 Austria 3-Banken Wohnbaubank AG 2015
    1 Austria 3-Banken Wohnbaubank AG 2016
    AND SO ON



    Best regards and many thanks,
    Cedric
    Last edited by Cedric Hung; 26 Jul 2018, 13:52.

  • #2
    The commands work fine with the example dataset you gave.

    Have you tried to look if there are actual duplicates in your id? You can use:
    Code:
    duplicates list
    to check this.

    Comment


    • #3
      Originally posted by Jesse Tielens View Post
      The commands work fine with the example dataset you gave.

      Have you tried to look if there are actual duplicates in your id? You can use:
      Code:
      duplicates list
      to check this.
      Hi Jesse,

      Thanks for your reply. I have checked the id variable. All the id assigned to each bank is different.

      Regards,
      Cedric

      Comment


      • #4
        Adding to Jesse's advice, since reshape long reported that id did not uniquely identify observations (in the first example) and presumably (since you do not tell us) bankname and country do not uniquely identify observations (in the second example), you will want to use something like
        Code:
        duplicates report id
        or
        Code:
        duplicates report bankname country
        as a quick check on the extent of your problems with duplicates. Then
        Code:
        duplicates list id
        or
        Code:
        duplicates list bankname country
        will show you complete observations.

        It's import to include the variable names on the duplicates commands, since it's possible that you have two observations with the same id but different values for your roaa* and tr* variables.

        To improve the presentation of your future posts, and increase the likelihood of receiving a helpful answer, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

        Comment


        • #5
          My post #4 crossed with post #3.

          While it may be true that

          All the id assigned to each bank is different
          that doesn't mean that you don't have the same bank, with the same id, twice in your dataset.

          Stata has told you that id is not distinct - that some id's appear on more than one observation in your dataset. I trust Stata, you should too. Try the commands recommended in post #4.

          Comment


          • #6
            Originally posted by William Lisowski View Post
            Adding to Jesse's advice, since reshape long reported that id did not uniquely identify observations (in the first example) and presumably (since you do not tell us) bankname and country do not uniquely identify observations (in the second example), you will want to use something like
            Code:
            duplicates report id
            or
            Code:
            duplicates report bankname country
            as a quick check on the extent of your problems with duplicates. Then
            Code:
            duplicates list id
            or
            Code:
            duplicates list bankname country
            will show you complete observations.

            It's import to include the variable names on the duplicates commands, since it's possible that you have two observations with the same id but different values for your roaa* and tr* variables.

            To improve the presentation of your future posts, and increase the likelihood of receiving a helpful answer, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

            The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
            Hi William,

            Thanks so much for the specification. I have identified the problem now and I think the reshape function is working now.


            I have one more question. Since I have 28 countries in the dataset, should I make 28 dummy variable to identify the countries so that it will be easier to conduct regression in each country separately?

            Many thanks,
            Cedric
            Last edited by Cedric Hung; 26 Jul 2018, 14:49.

            Comment


            • #7
              Originally posted by Cedric Hung View Post
              Since I have 28 countries in the dataset, should I make 28 dummy variable to identify the countries so that it will be easier to conduct regression in each country separately?
              No, there's a better way.

              You should read the output of help encode to create a categorical variable that will take the values 1-28, and which will have value labels associated with it so that the numeric value will display as the country name wherever value labels are supported. Then you can easily loop over the countries to run the 28 separate regressions.
              Code:
              forvalues c=1/28 {
                  regress y x if countrycode==`c'
                  }
              Then, should you ever want to run a single regression with all 28 countries, you can read the output of help factor variables to learn how to have Stata use the categorical variable to create "virtual" dummy variables (more correctly, indicator variables) for the individual countries.
              Code:
              regress y x i.countrycode

              Comment

              Working...
              X