  • Trouble with storing/importing very long strings (strL)

    Hi all,

    I am trying to import very long text strings into Stata (pages of text in a single variable). According to -help limits-, a strL can hold 2,000,000,000 bytes, which is plenty of space for my data. However, when I create a strL variable and import my data, Stata still cuts the data short: the text in an observation is cut off at exactly 2,500 characters/bytes. I have tried importing from csv, xls, and xlsx files, and copying and pasting the text directly. Nothing works. I feel I have tried everything, and I cannot find a solution online.

    Thanks for your help!
    Roger

  • #2
    You don't show us the code you are using, so I cannot comment on what you may have done wrong, and you don't describe the layout of your file, so it's difficult to suggest code that may work for you.

    Here is a technique that will read in a text file one line at a time, but the line size is limited to the length of a macro (see help limits for details).
    Code:
    // create sample data
    clear
    set obs 2
    generate strL lengthy = "--------- " * 10000
    replace       lengthy = "123456789 " * 10000 in 2
    outfile lengthy using lengthy.txt, noquote replace
    
    // read sample data lines into a strL variable
    clear
    display c(macrolen)
    generate strL lengthy = ""
    file open longish using lengthy.txt, read text
    local n 0
    file read longish line
    while r(eof)==0 {
        quietly set obs `++n'
        // "in l" is the letter l (the last observation), not the digit 1
        quietly replace lengthy = `"`macval(line)'"' in l
        file read longish line
        }
    file close longish
    describe
    list
    Code:
    . // read sample data lines into a strL variable
    . clear
    
    . display c(macrolen)
    4227143
    
    . generate strL lengthy = ""
    
    . file open longish using lengthy.txt, read text
    
    . local n 0
    
    . file read longish line
    
    . while r(eof)==0 {
      2.     quietly set obs `++n'
      3.     quietly replace lengthy = `"`macval(line)'"' in l
      4.     file read longish line
      5.     }
    
    . file close longish
    
    . describe
    
    Contains data
      obs:             2                          
     vars:             1                          
     size:       200,162                          
    ------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ------------------------------------------------------------------------------------------------
    lengthy         strL    %9s                  
    ------------------------------------------------------------------------------------------------
    Sorted by:
         Note: Dataset has changed since last saved.
    
    . list
    
         +-----------------------------------------------------------------------------------------+
         | lengthy                                                                                 |
         |-----------------------------------------------------------------------------------------|
      1. | --------- --------- --------- --------- --------- --------- --------- --------- -----.. |
      2. | 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 12345.. |
         +-----------------------------------------------------------------------------------------+
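
    When each document lives in its own file, an alternative to the line-by-line loop above is the fileread() function, which returns the whole contents of a file as one string. The following is only a minimal sketch, assuming the lengthy.txt file written by the example above is in the working directory; the variable names contents and nbytes are made up for illustration. Unlike the loop, it puts the entire file, line breaks included, into a single observation.
    Code:
    // sketch: read an entire text file into one strL observation
    // assumes lengthy.txt exists in the current working directory
    clear
    set obs 1
    assert fileexists("lengthy.txt")
    generate strL contents = fileread("lengthy.txt")
    // check how many bytes were actually stored
    generate long nbytes = strlen(contents)
    display nbytes[1]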
    Last edited by William Lisowski; 08 May 2019, 09:58.


    • #3
      Thanks for the help. I will try to be clearer here. Say this is my code:

      Code:
      clear
      set obs 2
      generate strL lengthy = "--------- " * 10000
      replace lengthy = "Cyclic loading experiments revealed that permanent deformations were introduced in the samples even when they were loaded within the limits of the ‘elastic’ regime (Fig. 2c). Both the load and unload portions of each subsequent cycle displayed an increase in modulus of elasticity with a total increase of about 20% after five cycles. Such self-reinforcing behaviour is well known for aligned polymer chains and other fibrous materials23, where tensile loading can lead to a macromolecular/fibril alignment along the load direction and a mechanically stiffer sample. Similarly, stretching graphene oxide paper should lead to a better alignment of the two-dimensional lamellae and thus also the individual graphene oxide sheets, increase their contact and interactions, and result in a stiffer material. This behaviour of graphene oxide paper is in stark contrast to that of flexible graphite foil for which the elasticity modulus decreases upon stress cycling5. Interestingly, the stress–strain curves for graphene oxide paper samples often displayed ‘washboard’ patterns and sometimes even sharp upturns (Supplementary Information 3), manifested as a sequence of the peaks in the derivative (δσ/δε) of the stress–strain curve (Fig. 2d). Similar local-reinforcing behaviour was observed during basal plane shear in single-crystal graphite24 and in the material produced by layer-by-layer assembly of montmorillonite clay platelets and polyelectrolytes25. However, if the sample was loaded into the plastic regime (Fig. 2b) and failed, then the stiffness of the reloaded segments at low strain was similar to that of the original sample just before its failure. These results indicate that the loss of material stiffness is not a local effect, but rather a homogeneous softening of the paper upon loading in this manner. In exceptional situations, the stress–strain response had several consecutive steps each with a large change in elongation (Fig. 
2e), suggesting a slide-and-lock mechanism whereby the individual ‘nanoplates’ that make up the macroscopic sample slide and then ‘click’ into place when progressively stressed. Given that water molecules are present between graphene oxide sheets (see above) one would expect the mechanical properties of graphene oxide paper to depend strongly on its water content. Indeed, as the moisture content of graphene oxide paper decreases with increasing temperature (see thermogravimetric analysis curve, Supplementary Information 6), the modulus increases (from 17 to 25 GPa for the same sample shown in Fig. 2f–h). As expected, the loss of water is also accompanied by slow contraction of the graphene oxide paper (Fig. 2i). Simultaneously, the magnitude of permanent deformation decreases for each loading cycle conducted at 40, 90 and 120 °C, respectively (Fig. 2g, h). This water-related behaviour is similar in cellulose-based paper: a wet sheet has lower strength and stiffness than does a dry one26. In addition to tensile tests, we performed bending experiments (see Supplementary Information 5) for several samples of graphene oxide paper with varying thicknesses t. A strip of a graphene oxide paper was bent so that a simple curve was formed (Fig. 4b), and then compressed between two parallel plates until a kink (or more than one) was formed (Fig. 4b, c). We measured the radius of curvature R for such a strip just before the loss of structural stability (that is, kink formation). According to the solution for pure uniform bending of a bar comprised of an isotropically homogeneous material27, the positive (or negative) normal strain εx at the outer (or inner) bar surface is . The linear fitting of experimental points (red line in Fig. 4a) gives the average normal strain value εx ≈ 1.1 ± 0.1%. As the ultimate tensile strain of graphene oxide paper is only 0.6% (see above), it can sustain more deformation during bending than during uniaxial tension."
      When I then look at the Data Editor, the cell holding the text begins correctly but ends partway through, at the phrase "the modulus increases". In other words, it maxes out at 2,500 characters. Do you know why the entire text is not copied into the cell? I have also noticed that when the text is under 2,500 characters the cell in the Data Editor is red, but when it is over 2,500 characters the cell turns grey and the text ends early.


      • #4
        Your problem is explained in the documentation for the edit command, which is what runs the Data Editor. That documentation is in the Stata Data Management Reference Manual PDF included in your Stata installation, and is most easily reached by clicking the link at the top of the output of help edit.

        Technical note

        Stata can store long strings in the strL storage type. Although the strL type can hold very long strings, these strings may only be edited if they are 2045 characters or less. Similarly, strLs that hold binary data may not be edited. For more information on storage types, see [D] data types
        So, as the output below shows, in this case your data was correctly loaded into the strL variable in its entirety.
        Code:
        . clear
        
        . set obs 1
        number of observations (_N) was 0, now 1
        
        . generate strL lengthy = "Cyclic loading experiments revealed that permanent deformations were
        > introduced in the samples even when they were loaded within the limits of the ‘elastic’ regime
        ... intervening lines deleted ...
        >  value εx ≈ 1.1 ± 0.1%. As the ultimate tensile strain of graphene oxide paper is only 0.6% (s
        > ee above), it can sustain more deformation during bending than during uniaxial tension."
        
        . generate len = length(lengthy)
        
        . display len[1]
        3962
        
        . display lengthy[1]
        Cyclic loading experiments revealed that permanent deformations were introduced in the samples e
        > ven when they were loaded within the limits of the ‘elastic’ regime (Fig. 2c). Both the load a
        ... intervening lines deleted ...
        > imental points (red line in Fig. 4a) gives the average normal strain value εx ≈ 1.1 ± 0.1%. As
        >  the ultimate tensile strain of graphene oxide paper is only 0.6% (see above), it can sustain
        > more deformation during bending than during uniaxial tension.
        Worth noting: the list command also truncates the display of the variable. I didn't look into whether the truncation point can be changed.
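
        To confirm that nothing was lost without relying on the Data Editor or list, the stored length and the end of the string can be inspected directly. This is a sketch only, assuming the strL variable lengthy from the output above is still in memory; nbytes and nchars are illustrative names, and ustrlen() and usubstr() require Stata 14 or later. Because the text contains non-ASCII characters, the byte count and the character count will differ.
        Code:
        // sketch: inspect a strL without the Data Editor
        // assumes the variable lengthy from the example above is in memory
        generate long nbytes = strlen(lengthy)     // length in bytes
        generate long nchars = ustrlen(lengthy)    // length in Unicode characters
        display nbytes[1] "  " nchars[1]
        // show the last 90 characters to confirm the end of the text is present
        display usubstr(lengthy[1], -90, .)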


        • #5
          You're right. This fixed the problem. Thank you William!
