  • Combining observations for text analysis

    Dear Stata users

    I am analysing speeches and have imported txt files into Stata. However, each txt file has been imported as multiple rows, and I need one row per speech (txt file) in order to carry out text analysis. I have spent today and yesterday reading the forum and trying out code from answers to similar threads, but have got nowhere. Would someone be able to help me put all the text from the multiple rows for each speech into one observation/cell, please?

    Here is an example of my data. I have 26 speeches (id)

    [Image: Screenshot 2021-06-17 142117.png]

    NB: I tried to use dataex to show my problem, but there were too many characters even after specifying fewer observations.

    Many thanks

  • #2
    You can do the following to provide a data example:

    Code:
    foreach var in text id{
        gen `var'2= substr(`var', 1, 200)
    }
    dataex text2 id2 in 1/30



    • #3
      If the observations are in the right order after the import, the following may work. Otherwise, provide an example following #2.

      Code:
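      * Number the speeches in their order of appearance
      gen oid= sum(id!=id[_n-1])
      * Record the original row order, then number the rows within each speech
      gen which=_n
      bys id (which): replace which=_n
      * One observation per speech: text1, text2, ... hold its rows in order
      reshape wide text id, i(oid) j(which)
      gen wanted_id= id1
      * Join the pieces, separated by a space
      egen wanted_text= concat(text*), punct(" ")
      * Wildcard * rather than ? so that id10, text10, ... are dropped too
      drop id* text*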
      Last edited by Andrew Musau; 17 Jun 2021, 07:56.
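
      The ordering assumption can be checked before running the code above. This is only a sketch: it assumes the variables are named id and text, and chk_row and chk_run are placeholder names that are dropped again at the end. It confirms that each id sits in one contiguous block of rows; it cannot tell whether the rows within a speech are in the right order.

      Code:
      * Remember the current row order
      gen long chk_row = _n
      * Count runs of identical id values in that order
      gen long chk_run = sum(id != id[_n-1])
      * Every row of a given id should belong to the same run
      bysort id (chk_row): assert chk_run[1] == chk_run[_N]
      * Restore the original order and tidy up
      sort chk_row
      drop chk_row chk_run

      If the assert fails, some id appears in more than one block and the rows need sorting before the speeches are combined.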



      • #4
        Thank you for your response and help. Here's the dataex:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str200 text2 str34 id2
        "MARCH 31, 2021 - ONLY THE DELIVERY IS AUTHENTIC ADDRESS BY THE PRESIDENT OF THE REPUBLIC My dear compatriots, from metropolitan France, overseas and abroad. The health crisis we are going through, thi" "2021.03.31 ftv macron address"
        "am speaking to you, it is to call for the mobilization of everyone for this month of April where a lot is being played out. Where are we today ? After a hard confinement during the first wave in sprin" "2021.03.31 ftv macron address"
        "each one to have the good behaviors vis-a-vis the virus, to limit its contacts, to be tested at the first symptom and to isolate oneself when one is positive. Safety, balance, responsibility: these ar" "2021.03.31 ftv macron address"
        "additional restrictions for nearly twenty departments to the national curfew which was already in force. Two weeks after the implementation of these measures, the figures are clear.. Yes, this strateg" "2021.03.31 ftv macron address"
        "is to say our children, we must, for the months to come, each provide an extra effort. This is what I am asking of us collectively this evening. An effort by caregivers first of all, to increase our r" "2021.03.31 ftv macron address"
        "deprogramming, they are and will be supported in the coming days by additional reinforcements. The number of intensive care beds has already been increased to 7,000. And here I want to thank the medic" "2021.03.31 ftv macron address"
        "entire metropolitan area, it is because no metropolitan area is spared today: everywhere the virus is circulating quickly, more and more quickly, and hospitalizations are increasing everywhere. And th" "2021.03.31 ftv macron address"
        "cross-border workers. So some were militating, I know, for the generalized return of the certificate as in March 2020. We did not choose this option. Because the irresponsibility of a few must not rui" "2021.03.31 ftv macron address"
        "it comes to schools - and this is sort of the third effort that we will be making in the coming weeks - we all need to be aware of our duties towards our youth. We can congratulate ourselves, in our c" "2021.03.31 ftv macron address"
        "necessary with suitable gauges. I know, believe me, what this reorganization involves profound changes for parents of students and for families. But it is the most suitable solution to curb the virus " "2021.03.31 ftv macron address"
        "need it the most, that is to say the oldest, the most fragile, those who are most at risk of developing serious forms. And I fully assume this priority, this order that we have given because it is the" "2021.03.31 ftv macron address"
        "hypertension or overweight, you can make an appointment today with your doctor, nurse or pharmacist who will vaccinate you directly with the vaccine. Astra Zeneca. It started last week and we will spe" "2021.03.31 ftv macron address"
        "April 16, the first dates will be granted to people between 60 and 70 years old. From May 15, the first meetings will be open for our fellow citizens who are between 50 and 60 years old. And from mid-" "2021.03.31 ftv macron address"
        "Find meeting places, shops. To rediscover this French art of living that are the restaurants and cafes that we love so much. I will get back to you soon to specify a reopening agenda, and so that ever" "2021.03.31 ftv macron address"
        "today, for the coming month, we must mobilize. To mobilize for our elders and the most vulnerable and to mobilize for our children, to protect them and allow them to continue to learn and prepare for " "2021.03.31 ftv macron address"
        "Speech by Jean Castex: press conference on measures against Covid-19 Published on: 03/19/2021 (Only the pronouncement is authentic) My dear fellow citizens, ladies and gentlemen, Every week, every Thu" "2021.03.18 fpress conference jc"
        "wave, we know the cause: the arrival of the British variant which now represents nearly three quarters of contaminations. This British variant, we knew - and I told you - it was more contagious. We ar" "2021.03.18 fpress conference jc"
        "contain the level of epidemic progression thanks to the measures adopted, measures which were not light and easy measures. Namely, since January 16, the curfew from 6 p.m. across the country, combined" "2021.03.18 fpress conference jc"
        "reached, it should be very quickly given the rates of progression that we are recording . It is true that the epidemic situation is not equivalent between the different departments, but the critical s" "2021.03.18 fpress conference jc"
        "Pas-de-Calais, the situation has stabilized but is struggling to improve. In the Alpes-Maritimes, a real decrease observed at first seems to be easing in recent days. I would like to salute the mobili" "2021.03.18 fpress conference jc"
        "approach, not automatically apply the same measures as yesterday, take into account what this epidemic taught us. In the 16 departments affected by these new measures, we will maintain the bias of lea" "2021.03.18 fpress conference jc"
        "only those subject to enhanced measures. This choice to restrict the possibilities of leaving home less however must be accompanied byThis choice to restrict the possibilities of leaving home less, ho" "2021.03.18 fpress conference jc"
        "responsibility, I would even say common sense and certainly not infantilization: let us freely take advantage of outdoor spaces but be very rigorous in banning private gatherings or in public spaces. " "2021.03.18 fpress conference jc"
        "mask is much better respected in subways and trains than in personal vehicles. It is not a question of banning carpooling - that would be totally excessive - but I want to remind all those who practic" "2021.03.18 fpress conference jc"
        "vaccine. It is a prudent step, of course, but essential in terms of trust and transparency. A few minutes ago, the European Medicines Agency gave the opinion we were expecting. He confirms that the As" "2021.03.18 fpress conference jc"
        "the national territory and to use them as quickly as possible. We sounded the general mobilization so that we go even faster, to increase the vaccination points, to vaccinate relentlessly every day of" "2021.03.18 fpress conference jc"
        "now these variants, continue its journey without doing everything possible to hamper it, as we have always tried to do so far by taking pragmatic, proportionate and territorialized measures and by ens" "2021.03.18 fpress conference jc"
        "Published on: 03/04/2021 (Only the pronouncement is authentic) My dear fellow citizens, ladies and gentlemen, As every week, we meet together to take stock of the evolution of the Covid epidemic and t" "2021.03.04 fpress conference jc"
        "this because it is still an increase but it is clear that we are not confronted, at least not at this stage, with an exponential increase in the epidemic, as some models show it. predicted and as we h" "2021.03.04 fpress conference jc"
        "shows that the pressure is not easing and that it weighs heavily on our caregivers, especially since it has been going on for weeks or even months. The second worrying element is that these national f" "2021.03.04 fpress conference jc"
        end


        Rather than reshaping, I think what I want is the whole of each speech in a single cell (one observation per speech). Sorry, I should have been more specific.

        Many thanks



        • #5
          I believe that's what the code in #3 does. At the end, you need to keep only the wanted variables.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str200 text2 str34 id2
          "MARCH 31, 2021 - ONLY THE DELIVERY IS AUTHENTIC ADDRESS BY THE PRESIDENT OF THE REPUBLIC My dear compatriots, from metropolitan France, overseas and abroad. The health crisis we are going through, thi" "2021.03.31 ftv macron address"
          "am speaking to you, it is to call for the mobilization of everyone for this month of April where a lot is being played out. Where are we today ? After a hard confinement during the first wave in sprin" "2021.03.31 ftv macron address"
          "each one to have the good behaviors vis-a-vis the virus, to limit its contacts, to be tested at the first symptom and to isolate oneself when one is positive. Safety, balance, responsibility: these ar" "2021.03.31 ftv macron address"
          "additional restrictions for nearly twenty departments to the national curfew which was already in force. Two weeks after the implementation of these measures, the figures are clear.. Yes, this strateg" "2021.03.31 ftv macron address"
          "is to say our children, we must, for the months to come, each provide an extra effort. This is what I am asking of us collectively this evening. An effort by caregivers first of all, to increase our r" "2021.03.31 ftv macron address"
          "deprogramming, they are and will be supported in the coming days by additional reinforcements. The number of intensive care beds has already been increased to 7,000. And here I want to thank the medic" "2021.03.31 ftv macron address"
          "entire metropolitan area, it is because no metropolitan area is spared today: everywhere the virus is circulating quickly, more and more quickly, and hospitalizations are increasing everywhere. And th" "2021.03.31 ftv macron address"
          "cross-border workers. So some were militating, I know, for the generalized return of the certificate as in March 2020. We did not choose this option. Because the irresponsibility of a few must not rui" "2021.03.31 ftv macron address"
          "it comes to schools - and this is sort of the third effort that we will be making in the coming weeks - we all need to be aware of our duties towards our youth. We can congratulate ourselves, in our c" "2021.03.31 ftv macron address"
          "necessary with suitable gauges. I know, believe me, what this reorganization involves profound changes for parents of students and for families. But it is the most suitable solution to curb the virus " "2021.03.31 ftv macron address"
          "need it the most, that is to say the oldest, the most fragile, those who are most at risk of developing serious forms. And I fully assume this priority, this order that we have given because it is the" "2021.03.31 ftv macron address"
          "hypertension or overweight, you can make an appointment today with your doctor, nurse or pharmacist who will vaccinate you directly with the vaccine. Astra Zeneca. It started last week and we will spe" "2021.03.31 ftv macron address"
          "April 16, the first dates will be granted to people between 60 and 70 years old. From May 15, the first meetings will be open for our fellow citizens who are between 50 and 60 years old. And from mid-" "2021.03.31 ftv macron address"
          "Find meeting places, shops. To rediscover this French art of living that are the restaurants and cafes that we love so much. I will get back to you soon to specify a reopening agenda, and so that ever" "2021.03.31 ftv macron address"
          "today, for the coming month, we must mobilize. To mobilize for our elders and the most vulnerable and to mobilize for our children, to protect them and allow them to continue to learn and prepare for " "2021.03.31 ftv macron address"
          "Speech by Jean Castex: press conference on measures against Covid-19 Published on: 03/19/2021 (Only the pronouncement is authentic) My dear fellow citizens, ladies and gentlemen, Every week, every Thu" "2021.03.18 fpress conference jc"
          "wave, we know the cause: the arrival of the British variant which now represents nearly three quarters of contaminations. This British variant, we knew - and I told you - it was more contagious. We ar" "2021.03.18 fpress conference jc"
          "contain the level of epidemic progression thanks to the measures adopted, measures which were not light and easy measures. Namely, since January 16, the curfew from 6 p.m. across the country, combined" "2021.03.18 fpress conference jc"
          "reached, it should be very quickly given the rates of progression that we are recording . It is true that the epidemic situation is not equivalent between the different departments, but the critical s" "2021.03.18 fpress conference jc"
          "Pas-de-Calais, the situation has stabilized but is struggling to improve. In the Alpes-Maritimes, a real decrease observed at first seems to be easing in recent days. I would like to salute the mobili" "2021.03.18 fpress conference jc"
          "approach, not automatically apply the same measures as yesterday, take into account what this epidemic taught us. In the 16 departments affected by these new measures, we will maintain the bias of lea" "2021.03.18 fpress conference jc"
          "only those subject to enhanced measures. This choice to restrict the possibilities of leaving home less however must be accompanied byThis choice to restrict the possibilities of leaving home less, ho" "2021.03.18 fpress conference jc"
          "responsibility, I would even say common sense and certainly not infantilization: let us freely take advantage of outdoor spaces but be very rigorous in banning private gatherings or in public spaces. " "2021.03.18 fpress conference jc"
          "mask is much better respected in subways and trains than in personal vehicles. It is not a question of banning carpooling - that would be totally excessive - but I want to remind all those who practic" "2021.03.18 fpress conference jc"
          "vaccine. It is a prudent step, of course, but essential in terms of trust and transparency. A few minutes ago, the European Medicines Agency gave the opinion we were expecting. He confirms that the As" "2021.03.18 fpress conference jc"
          "the national territory and to use them as quickly as possible. We sounded the general mobilization so that we go even faster, to increase the vaccination points, to vaccinate relentlessly every day of" "2021.03.18 fpress conference jc"
          "now these variants, continue its journey without doing everything possible to hamper it, as we have always tried to do so far by taking pragmatic, proportionate and territorialized measures and by ens" "2021.03.18 fpress conference jc"
          "Published on: 03/04/2021 (Only the pronouncement is authentic) My dear fellow citizens, ladies and gentlemen, As every week, we meet together to take stock of the evolution of the Covid epidemic and t" "2021.03.04 fpress conference jc"
          "this because it is still an increase but it is clear that we are not confronted, at least not at this stage, with an exponential increase in the epidemic, as some models show it. predicted and as we h" "2021.03.04 fpress conference jc"
          "shows that the pressure is not easing and that it weighs heavily on our caregivers, especially since it has been going on for weeks or even months. The second worrying element is that these national f" "2021.03.04 fpress conference jc"
          end
          
          rename (*2) (*)
          gen oid= sum(id!=id[_n-1])
          gen which=_n
          bys id (which): replace which=_n
          reshape wide text id, i(oid) j(which)
          gen wanted_id= id1
          egen wanted_text= concat(text*), punct(" ")
          keep oid wanted_*
          Res.:

          Code:
          . l wanted_text, sepby(oid) notrim
          
                                                                                                                                                                
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                    wanted_text  
            1.   MARCH 31, 2021 - ONLY THE DELIVERY IS AUTHENTIC ADDRESS BY THE PRESIDENT OF THE REPUBLIC My dear compatriots, from metropolitan France, oversea
          > s and abroad. The health crisis we are going through, thi am speaking to you, it is to call for the mobilization of everyone for this month of April
          >  where a lot is being played out. Where are we today ? After a hard confinement during the first wave in sprin each one to have the good behaviors v
          > is-a-vis the virus, to limit its contacts, to be tested at the first symptom and to isolate oneself when one is positive. Safety, balance, responsib
          > ility: these ar additional restrictions for nearly twenty departments to the national curfew which was already in force. Two weeks after the impleme
          > ntation of these measures, the figures are clear.. Yes, this strateg is to say our children, we must, for the months to come, each provide an extra 
          > effort. This is what I am asking of us collectively this evening. An effort by caregivers first of all, to increase our r deprogramming, they are an
          > d will be supported in the coming days by additional reinforcements. The number of intensive care beds has already been increased to 7,000. And here
          >  I want to thank the medic entire metropolitan area, it is because no metropolitan area is spared today: everywhere the virus is circulating quickly
          > , more and more quickly, and hospitalizations are increasing everywhere. And th cross-border workers. So some were militating, I know, for the gener
          > alized return of the certificate as in March 2020. We did not choose this option. Because the irresponsibility of a few must not rui it comes to sch
          > ools - and this is sort of the third effort that we will be making in the coming weeks - we all need to be aware of our duties towards our youth. We
          >  can congratulate ourselves, in our c necessary with suitable gauges. I know, believe me, what this reorganization involves profound changes for par
          > ents of students and for families. But it is the most suitable solution to curb the virus  need it the most, that is to say th  
            2.   Speech by Jean Castex: press conference on measures against Covid-19 Published on: 03/19/2021 (Only the pronouncement is authentic) My dear fel
          > low citizens, ladies and gentlemen, Every week, every Thu wave, we know the cause: the arrival of the British variant which now represents nearly th
          > ree quarters of contaminations. This British variant, we knew - and I told you - it was more contagious. We ar contain the level of epidemic progres
          > sion thanks to the measures adopted, measures which were not light and easy measures. Namely, since January 16, the curfew from 6 p.m. across the co
          > untry, combined reached, it should be very quickly given the rates of progression that we are recording . It is true that the epidemic situation is 
          > not equivalent between the different departments, but the critical s Pas-de-Calais, the situation has stabilized but is struggling to improve. In th
          > e Alpes-Maritimes, a real decrease observed at first seems to be easing in recent days. I would like to salute the mobili approach, not automaticall
          > y apply the same measures as yesterday, take into account what this epidemic taught us. In the 16 departments affected by these new measures, we wil
          > l maintain the bias of lea only those subject to enhanced measures. This choice to restrict the possibilities of leaving home less however must be a
          > ccompanied byThis choice to restrict the possibilities of leaving home less, ho responsibility, I would even say common sense and certainly not infa
          > ntilization: let us freely take advantage of outdoor spaces but be very rigorous in banning private gatherings or in public spaces.  mask is much be
          > tter respected in subways and trains than in personal vehicles. It is not a question of banning carpooling - that would be totally excessive - but I
          >  want to remind all those who practic vaccine. It is a prudent step, of course, but essential in terms of trust and transparency. A few minutes ago,
          >  the European Medicines Agency gave the opinion we were expecting. He confirms that the As the national territory and to use t  
            3.                                                                                                                                                  
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                                                     
          >                                                                                                                     Published on: 03/04/2021 (Only t
          > he pronouncement is authentic) My dear fellow citizens, ladies and gentlemen, As every week, we meet together to take stock of the evolution of the 
          > Covid epidemic and t this because it is still an increase but it is clear that we are not confronted, at least not at this stage, with an exponentia
          > l increase in the epidemic, as some models show it. predicted and as we h shows that the pressure is not easing and that it weighs heavily on our ca
          > regivers, especially since it has been going on for weeks or even months. The second worrying element is that these national f  
          
          .
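
          One thing to check in the listing above: the text for speech 1 stops mid-word after 2,045 characters, although its 15 pieces of 200 characters plus separators come to about 3,000. That would be consistent with the limit of a fixed-length str# variable, though it may only be a display limit, so it is worth comparing strlen(wanted_text) with the length of the source text. If the text is indeed cut, one possible alternative is to accumulate it in a strL variable without reshaping. The following is a sketch under stated assumptions: Stata 13 or later, variables named text and id, rows already in speech order, and the code run on the long data (in place of the reshape step, not after it). The name nchars is a placeholder.

          Code:
          gen long which = _n
          gen long oid = sum(id != id[_n-1])
          * Start each speech with its first row ...
          bysort oid (which): gen strL wanted_text = text if _n == 1
          * ... then append each later row to the running text of the row before
          by oid: replace wanted_text = wanted_text[_n-1] + " " + text if _n > 1
          * The last row of each speech now holds the complete text
          by oid: keep if _n == _N
          keep oid id wanted_text
          * Length in bytes, to compare against the source files
          gen long nchars = strlen(wanted_text)

          This relies on replace working through the observations in order, so each row sees the already-extended text of the previous row in the same speech.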



          • #6
            Thank you so much!! I really appreciate your help
