  • Storing compiled AssociativeArray instance as member variable of class in a Mata library?

    Greetings,
    I am cautiously wading my way into the ocean of Mata class programming, so my apologies if I’m off on a completely wrong approach to the problem I describe below, and/or conveying too much detail on my process.

    As part of a larger project, I'm creating a set of helper functions that convert among different country identifiers: names, 2-character ISO codes, 3-character ISO codes, numeric ISO codes (all based on ISO3166), and "safenames," which are no-whitespace, ASCII-only versions of the country names.

    I begin with a csv file containing the codes for each country (iso_name_translation_witha3.csv):

    Code:
    name,alpha2,iso,safename,alpha3
    Afghanistan,AF,4,Afghanistan,AFG
    Albania,AL,8,Albania,ALB
    Antarctica,AQ,10,Antarctica,ATA
    Algeria,DZ,12,Algeria,DZA
      .
      .
    Côte d’Ivoire,CI,384,CotedIvoire,CIV
      .
      .

    I create a set of AssociativeArray instances from the csv and save them to a file (country_iso_map.mox) by running make_country_iso_AA:

    Code:
    program make_country_iso_AA
          clear
          // import, forcing iso to string
          import delimited using iso_name_translation_witha3.csv, stringcols(_all)
          capture erase country_iso_map.mox
          mata:doit()
    
          // demonstrate the AssociativeArrays work as expected:
          mata: Alpha3.get("CIV")
    end
    
    version 18.0
    mata:
    mata set matastrict on
    
    void doit()
    {
          class AssociativeArray scalar Country, Alpha2, Alpha3, Safename, ISO
          string matrix X
          // matastrict requires every local variable to be declared
          real scalar i, fh
    
          X = st_sdata( . , ("name", "alpha2", "alpha3", "safename", "iso") )
    
          for (i=1; i<=rows(X); i++) {
                Country.put(  X[i,1] , (X[i,.]) )
                Alpha2.put(   X[i,2] , (X[i,.]) )
                Alpha3.put(   X[i,3] , (X[i,.]) )
                Safename.put( X[i,4] , (X[i,.]) )
                ISO.put(      X[i,5] , (X[i,.]) )
          }
    
          fh = fopen("country_iso_map.mox", "w")
          fputmatrix(fh, Country)
          fputmatrix(fh, Alpha2)
          fputmatrix(fh, Alpha3)
          fputmatrix(fh, Safename)
          fputmatrix(fh, ISO)
          fclose(fh)
    }
    end
    I can then load those AssociativeArrays:

    Code:
    clear *
    mata
    fh = fopen("country_iso_map.mox", "r")
    Country  =fgetmatrix(fh)
    Alpha2   =fgetmatrix(fh)
    Alpha3   =fgetmatrix(fh)
    Safename =fgetmatrix(fh)
    ISO      =fgetmatrix(fh)
    fclose(fh)
    And they work as I expect:

    Code:
    : Alpha3.get("CIV")
                       1               2               3               4               5
        +---------------------------------------------------------------------------------+
      1 |  Côte d’Ivoire              CI             CIV     CotedIvoire             384  |
        +---------------------------------------------------------------------------------+
    
    : ISO.get("384")
                       1               2               3               4               5
        +---------------------------------------------------------------------------------+
      1 |  Côte d’Ivoire              CI             CIV     CotedIvoire             384  |
        +---------------------------------------------------------------------------------+
    Then I include those AssociativeArray instances (Country, Alpha2, Alpha3, Safename, and ISO) as static members of a class that also contains member functions for translation among the identifiers. I can achieve this by reading in the country_iso_map.mox file when the class is instantiated, though this feels inefficient:

    Code:
    *! version 1.0.0 scale_set_utility_functions.mata
    version 18
    mata:
    
    class countryConverter
    {
          static class  AssociativeArray scalar Country, Alpha2, Alpha3, Safename, ISO
          void          new()
    
          real   scalar isoofname()
          string scalar a2ofname(), a3ofname(), safenameofname()
    }
    
    void countryConverter::new() {
          // only need to load AAs once b/c static
          if (!Country.N() ) {
                display("Loading country_iso_map.mox")
                // todo: put country_iso_map.mox on adopath so we can find it
                fh = fopen("country_iso_map.mox", "r")
                Country  =fgetmatrix(fh)
                Alpha2   =fgetmatrix(fh)
                Alpha3   =fgetmatrix(fh)
                Safename =fgetmatrix(fh)
                ISO      =fgetmatrix(fh)
                fclose(fh)
          }
    }
    
    string scalar countryConverter::a2ofname(string scalar countryname)
    {
          return (this.Country.get(countryname)[1,2])
    }
    string scalar countryConverter::a3ofname(string scalar countryname)
    {
          return (this.Country.get(countryname)[1,3])
    }
    string scalar countryConverter::safenameofname(string scalar countryname)
    {
          return (this.Country.get(countryname)[1,4])
    }
    real scalar countryConverter::isoofname(string scalar countryname)
    {
          return ( strtoreal(this.Country.get(countryname)[1,5]) )
    }
    end
    Finally, I create a Mata library with the do-file make_lscaleset.do:

    Code:
    *! make_lscaleset.do
    // version number intentionally omitted
    clear all
    do "scale_set_utility_functions.mata"
    lmbuild lscaleset.mlib, replace
    And I can use my helper functions:

    Code:
    . mata
    ------------------------------------------------- mata (type end to exit) -------------------------------------------
    : cc=countryConverter()
    Loading country_iso_map.mox
    
    : cc2=countryConverter()
    
    : cc2.a3ofname("Côte d’Ivoire")
      CIV
    
    : cc2.isoofname("Côte d’Ivoire")
      384
    
    : eltype(cc2.isoofname("Côte d’Ivoire"))
      real

    This does work; however, two things bother me:

    1. Is there a way to store the compiled AssociativeArray instances in the Mata library, so they don't have to be read from a separate file?

    2. This is inefficient because I am creating five AssociativeArrays with identical contents--only the keys vary. Surely there is a better way--maybe involving pointers, which are only barely beginning to make sense for me.

    Anyway, thank you for reading this far!

  • #2
    Originally posted by Nicholas Winter
    1. Is there a way to store the compiled AssociativeArray instances in the Mata library, so they don't have to be read from a separate file?
    I don't know for certain, but I suspect that the answer's no. As far as I am aware, Mata libraries store only function definitions, and so for classes, it won't store the instantiated objects. For example, it would store your countryConverter as its automatically generated corresponding function, countryConverter(), which contains the class definition, but no state.

    2. This is inefficient because I am creating five AssociativeArrays with identical contents--only the keys vary. Surely there is a better way--maybe involving pointers, which are only barely beginning to make sense for me.
    You don't need pointers for this. Mata's associative arrays can have keys with more than one dimension, so you can use a two-dimensional key whose first element is the country name (that is, the contents of the variable name) and whose second element is the name of the target variable (that is, "alpha2", "iso", "safename" or "alpha3"). Thus, you can instantiate a single associative array whose values are the contents of those variables for the country named in the first element of the key. In addition, you can store anything as the value, so you don't need to convert the ISO country identifier number back and forth to string.

    In the code below, I illustrate the single associative array with a two-dimensional key as described above, along with the ability to store the state of an instantiation of the converter class (I've named mine "ISONation"), which contains the public utility methods, in a Mata matrix storage file (following you, I use a .mox file name suffix).

    The code has three sections delimited by comments, first for storing the class definition in a local .mlib file (which, because it contains a member variable containing an instance of Mata's associative array class, also redundantly stores the class definition, AssociativeArray()), then for instantiating and storing the object in an .mox file, and finally for clearing memory and reading the object and its data back from the .mox file.
    Code:
    version 18
    
    clear *
    
    *
    * Store conversion class definition in .mlib file
    *
    
    mata:
    mata set matastrict on
    
    class ISONation {
        private:
            class AssociativeArray scalar a
            void new()
        public:
            real scalar isoofname()
            string scalar a2ofname(), a3ofname(), safenameofname()
            void test()
    }
    void function ISONation::new() {
    
        string rowvector SVarnames
        SVarnames = "name", "alpha2", "safename", "alpha3"
    
        transmorphic matrix SData, RData
        st_sview(SData=(.), ., SVarnames)
        st_view(RData=(.), ., "iso")
    
        a.reinit("string", 2)
    
        string scalar cou
        real scalar row, col
        for (row=1; row<=st_nobs(); row++) {
            cou = SData[row, 1]
            a.put((cou, "iso"), RData[row, 1])
            for (col=2; col<=cols(SData); col++) {
                a.put((cou, SVarnames[col]), SData[row, col])
            }
        }
    }
    string scalar function ISONation::a2ofname(string scalar countryname)
        return(a.get( (countryname, "alpha2") ))
    string scalar function ISONation::a3ofname(string scalar countryname)
        return(a.get( (countryname, "alpha3") ))
    string scalar function ISONation::safenameofname(string scalar countryname)
        return(a.get( (countryname, "safename") ))
    real scalar function ISONation::isoofname(string scalar countryname)
        return(a.get( (countryname, "iso") ))
    void function ISONation::test() {
    
        string colvector Keys
        Keys = uniqrows(a.keys()[., 1])
    
        real scalar row
        for (row=1; row<=rows(Keys); row++) {
            Keys[row]
            a2ofname(Keys[row])
            a3ofname(Keys[row])
            safenameofname(Keys[row])
            isoofname(Keys[row])
            printf("\n")
        }
    }
    
    end
    
    lmbuild lisonation, dir(.)
    
    *
    * Store instantiated object in .mox file
    *
    
    mata:
    mata set matastrict on
    
    void function storEm() {
    
        class ISONation scalar b
    
        real scalar file_handle
        file_handle = fopen("isonation.mox", "w")
        fputmatrix(file_handle, b)
        fclose(file_handle)
    }
    
    end
    
    import delimited "F:\iso_name_translation_witha3.csv"
    
    mata: storEm()
    
    *
    * Test
    *
    
    clear *
    
    mata: 
    
    file_handle = fopen("isonation.mox", "r")
    b = fgetmatrix(file_handle)
    fclose(file_handle)
    
    b.test()
    
    unlink("lisonation.mlib")
    unlink("isonation.mox")
    
    end
    
    exit
    Complete do-file and associated log file are attached if you're interested further.


    • #3
      Thank you!

      For my first question, that makes sense, though disappointing.

      The two-dimensional AssociativeArray is useful, but doesn't really get at what I was asking with my second question. I think I was not clear -- I should have made explicit in my question that I plan a set of converters not just from name, but from the other identifiers: nameofa2(), nameofa3(), nameofiso(), a2ofiso(), a3ofiso(), isoofsafename(), etc., etc. That is why I created the five different AA's (Country, Alpha2, Alpha3, ISO, Safename), so, e.g., nameofa3() would use the Alpha3 array the way that a3ofname() uses Country. So I have five AssociativeArrays, each with the same set of values but a different set of keys; I'm wondering if there is a more efficient way to handle that.

      Underlying my speculation on pointers was this idea: there's one regular Mata matrix with one row per country, and columns for name, a2, a3, safename, and iso. Then there's a set of AssociativeArrays. The Name AssociativeArray has keys with country names ("Afghanistan", "Algeria", etc.), and values that are a pointer to the relevant row in the Mata matrix. That would mean storing the matrix once, rather than essentially duplicating it five times.

      With only 250-odd countries in the world, that bit of optimization probably doesn't really matter in practice, but as a learning exercise I'm trying to figure out if I can do better.
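
      To make the idea concrete, here is a minimal sketch in which each AssociativeArray stores a row number into one shared string matrix, so the row number plays the role of the pointer. The class name rowLookup, its member names, and the load() method are hypothetical, and the column order (name, alpha2, alpha3, safename, iso) is assumed to match the st_sdata() call in my first post. A key that is not found is not handled here.

      Code:
      mata:
      mata set matastrict on

      class rowLookup
      {
            string matrix T                               // one row per country, stored once
            class AssociativeArray scalar ByName, ByA3    // key -> row number in T
            void          load()
            string scalar a3ofname(), nameofa3()
      }

      void rowLookup::load(string matrix X)
      {
            real scalar i

            T = X
            for (i=1; i<=rows(T); i++) {
                  ByName.put(T[i,1], i)      // country name -> row number
                  ByA3.put(  T[i,3], i)      // alpha3 code  -> row number
            }
      }

      string scalar rowLookup::a3ofname(string scalar name)
      {
            return(T[ByName.get(name), 3])
      }

      string scalar rowLookup::nameofa3(string scalar a3)
      {
            return(T[ByA3.get(a3), 1])
      }
      end

      Adding another identifier would mean one more AssociativeArray of row numbers rather than another copy of the table.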



      • #4
        Originally posted by Nicholas Winter
        I think I was not clear -- I should have made explicit in my question that I plan a set of converters not just from name, but from the other identifiers. . .
        Yeah, I didn't get that.

        Because the number of rows is so small and because of the 1:1 cardinality, you could probably get away with a string array (Mata string matrix) and functions that wrap select() operating on it.
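
        A rough sketch of what such wrappers might look like, assuming a string matrix T with columns in the order name, alpha2, alpha3, safename, iso. The function names xwalk(), nameofa3() and a2ofname() and their argument lists are only illustrative, and returning an empty string when nothing matches is one arbitrary choice.

        Code:
        mata:
        mata set matastrict on

        // Return column `to' from the first row of T whose column `from' equals key
        string scalar xwalk(string matrix T, real scalar from, real scalar to, string scalar key)
        {
              string colvector hit

              hit = select(T[., to], T[., from] :== key)
              if (rows(hit) == 0) return("")      // no matching row
              return(hit[1])
        }

        string scalar nameofa3(string matrix T, string scalar a3)
        {
              return(xwalk(T, 3, 1, a3))
        }

        string scalar a2ofname(string matrix T, string scalar name)
        {
              return(xwalk(T, 1, 2, name))
        }
        end

        Each lookup scans the whole column, which should be acceptable for a table of a few hundred rows.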

        But my first choice would have been to approach the problem by means of a relational database with stored procedures.



        • #5
          Makes sense. Thank you!



          • #6
            Fyi - there is a package on SSC called -isocodes- that provides some of this data by literally hardcoding it in.

            As I recall, a related issue popped up on Statalist a number of years ago. I am drawing a blank as to what it was in relation to, but I believe the author handled it by appending (or smuggling in) the data at the end of their ado-file. Since Stata stops executing a file once -exit- is reached, you can place any text after it, which is where some authors include a changelog. If memory serves, the ado-file bootstrapped itself by reading the same ado-file as input using -import- or similar. This could be one approach to consider.
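
            For the hard-coding route, a minimal Mata sketch might look like the following. It is not taken from -isocodes-; the function name iso_table(), the library name lisotable, and the two sample rows are only for illustration. Because the table is built by a function, it can be stored in a Mata library with no separate data file.

            Code:
            mata:
            mata set matastrict on

            // Hypothetical function: returns the table as a string matrix,
            // with columns name, alpha2, alpha3, safename, iso
            string matrix iso_table()
            {
                  string matrix T

                  T =     ("Afghanistan", "AF", "AFG", "Afghanistan", "4")
                  T = T \ ("Albania",     "AL", "ALB", "Albania",     "8")
                  // one such line per country, generated from the csv
                  return(T)
            }
            end

            lmbuild lisotable, dir(.)

            A class constructor could then fill its lookup structures from iso_table() instead of reading a .mox file.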



            • #7
              Are the tricks that Ben Jann uses in his -crosswalk- (SSC) relevant/useful for Nicholas's task?
