  • Bundling DLLs in .pkgs

    Greetings, Statalist,

    I am wrapping libsvm for Stata, and I have hit a snag. I would like to bundle libsvm with my wrapper on Windows and OS X to make a one-step installation. I found the .pkg "g PLATFORM path install_name" command, which should let me distribute the bundled files as appropriate per platform, but it turns out that the only files Stata considers part of the package (and installs when you do "net install") are ones with these extensions:
    Code:
    .ado
    .class
    .dlg
    .hlp
    .idlg
    .ihlp
    .jar
    .key
    .mata
    .mlib
    .maint
    .mnu
    .pdf
    .png
    .resource
    .scheme
    .plugin
    .stbcal
    .sthlp
    .style
    Also, even if I did convince Stata to download the DLL, it would be put into the adopath, which is not on the system library path. So I am obviously going about this the wrong way.

    I can think of three options, don't like any of them, and I'm looking for suggestions:
    1. statically link libsvm
    2. distribute libsvm as, say, libsvm.resource, rename it on first run to libsvm.dll or libsvm.dylib as appropriate, using Stata's command, and either
      1. load it with dlopen()
      2. somehow edit the system library path (PATH on Windows, DYLD_FALLBACK_LIBRARY_PATH on OS X) to include the adopath
    I would rather not statically link because it means forking libsvm; dlopen() is a pain; and it looks like .ados can only read, not write, their environ (using the "environment" command). Is there a fourth way?
    Last edited by Nick Guenther; 22 May 2015, 22:34.

  • #2
    I've also wondered about this, but sadly I don't have much beyond what you suggest. Hopefully the (few?) plugin writers who read Statalist can give a better answer; in any case, StataCorp technical support may be able to give good advice based on past experience with users.

    Best,
    S


    • #3
      Since it seems you have access to the source, why not include compile instructions for users to compile the source on their own systems? Maybe looking at the source code for other plugin based user written programs could be helpful, but there also seems to be a Java binary that might be a bit easier to use with javacall.


      • #4
        My target user base is working sociologists and econometrics people, people who are almost certainly using Windows and have better things to do than learn what compilation is. I have compilation instructions written, but I've segmented them off for contributors, not users. That is why one-click install is so important to me. Actually, I could distribute binaries with special manual install instructions, which is slightly easier, but history teaches me that anything requiring manual user intervention is a source of bugs.

        Just knowing that this is shaky territory is useful, if disappointing, information. Hopefully the Stata devs will notice this thread and have a better idea.


        • #5
          In the worst case, I'd assume you could use copy to get the library from the web and copy it to the appropriate directory, but then you may also run into file system permission issues. While the number of people using Windows in private industry is certainly massive, I think you may find many users who work on some type of *nix-based platform and could easily call gcc from the shell (or you could shell out in the ado to compile the library as well).


          • #6
            Nick, some thoughts:
            1. Don't link it statically; you won't have the time to re-release your package every time the DLL changes, and dynamic linking lets me link against a different version of the DLL by just changing the .dll file (not that many people will do it, though). Also, what build of the DLL are you using? Are you compiling it yourself? If so, maybe just copying other builds such as http://www.lfd.uci.edu/~gohlke/pythonlibs/#libsvm can save you time.
            2. I don't see any reason to add it to ENVPATH. First, because in many systems you *can't* change it (been there), and second because it's usually already quite polluted. Now, I have no experience with how dlopen works with Stata plugins, so I don't know if having to use dlopen will be a showstopper for you.
            3. Now, a problem (besides dlopen) still remains: renaming the file. Do we *really* need to rename it, or can you dlopen a file with another extension (e.g. "libsvm.dll.resource")? If the latter, that would be preferred, because the package can be updated, so a renamed copy means that every time the package is run it needs to check whether there is a new version that needs to be renamed.
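
The per-run check that renaming would require can be sketched as follows. This is only an illustration in Python, not Stata or plugin code; the function name `sync_renamed_copy` and the file names are hypothetical, and it assumes the installed file and the renamed copy live in a directory the user can write to.

```python
import filecmp
import shutil
from pathlib import Path


def sync_renamed_copy(source, target):
    """Copy source to target unless target already has identical contents.

    Returns True when a copy was made, False when target was up to date.
    """
    source, target = Path(source), Path(target)
    # shallow=False compares file contents rather than trusting
    # size and modification time alone.
    if target.exists() and filecmp.cmp(source, target, shallow=False):
        return False
    shutil.copyfile(source, target)
    return True
```

Called as `sync_renamed_copy("libsvm.resource", "libsvm.dll")` on every run, this keeps the renamed copy current after a package update, at the cost of one comparison per run. Loading the file under its installed name avoids the check entirely.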
            Best,
            Sergio


            • #7
              I surely would like to link dynamically and against the system copy for all the above reasons. libsvm is in all major Linux distributions and it's in both macports and brew. It's really just Windows that is the pain, with no reliable package manager. Plus, I am wary of telling people on OS X they have to first install brew just to use this plugin. By the way, the libsvm developers distribute libsvm.dll for Windows in their releases, but I was planning to compile it myself since they only provide the 32-bit version. I think the worst case is more like this: I package the .dll for Windows, which Stata will interpret as an ancillary file, and instruct people to "net get" every time they want to work on an svm problem, or I give instructions about where to drop system DLLs in Windows.

              Sergio, so far it seems I have full access to the entire platform C API, so I have no doubt dlopen()/LoadLibrary() will work. It will just be tedious and I am really not looking forward to grinding it out. Both Windows and *nix can take full paths and should be able to load arbitrary names. After all, this is just what Stata does when it loads a shared library named ".plugin". I didn't think about updating: you're right, leaving the filenames as alone as possible will upset Stata the least; if I did rename it, Stata would not know to uninstall it.
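
As a minimal sketch of loading by full path under an arbitrary name, here is the idea in Python's ctypes, which wraps dlopen() on Unix-likes and LoadLibrary() on Windows. The function name `load_bundled_library` and any file name such as `libsvm.dll.resource` are hypothetical; a C plugin would make the equivalent dlopen() or LoadLibrary() call directly.

```python
import ctypes
from pathlib import Path


def load_bundled_library(path):
    """Load a shared library from an explicit path, whatever its extension.

    The path is made absolute before it is handed to the loader, so the
    file is not looked up by bare name on the system library path.
    """
    full_path = Path(path).resolve()
    if not full_path.is_file():
        # Fail early with a clear message instead of a loader error.
        raise FileNotFoundError(f"bundled library not found: {full_path}")
    return ctypes.CDLL(str(full_path))
```

Note that this only controls how the named file itself is found; any further libraries it depends on are still resolved by the operating system's usual rules.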

              Whatever I decide on will be up on Github, and maybe it will be able to repay your help in the future!


              • #8
                I'm primarily a Windows user (dabble in Linux), and have helped people with Linux-specific stuff (even set up dual-boot just to use a specific command efficiently). No stranger to compiling things.

                However, for security reasons, I *can't* drop random .dlls on my work computer anymore, nor re-configure to dual-boot. In many, many environments, users don't have admin rights, or they're running Stata off a virtual machine (without admin rights).

                If you want broad usage, you need to figure out a way around it.


                • #9
                  Ben,

                  When you talk about dropping dlls do you mean installing or copying them into the system folders? If the dll is named as dll.something and installed into the ADO path then that would be ok?


                  Nick,

                  I share your pain. I use Python on Windows a lot and if it were not for the precompiled binaries that I linked before it would be reaaaally hard for Python users to install packages. Do note that the Python guys distribute x64 versions, so you can use those.

                  Also, maybe you can just use the system versions if available, and only use the DLL if c(os)=="Windows"?
                  Thanks again for the effort; I would love to start coding/porting more stuff from C but the lack of shoulders on which to stand has prevented me from doing so until now, so hopefully your effort will change that!




                  • #10
                    Nick, see amap package here:
                    http://go.worldbank.org/ODS4NKQBB0
                    Best, Sergiy Radyakin


                    • #11
                      As I suspected, running without admin rights or write permissions on C:\Program Files (x86)\Stata14, the installation of amap failed. I can put things into ado/plus no problem; I have added plenty of packages. But complicated ones requiring .dlls and such fail. I'm not sure what needs to go into C:\Program Files (x86)\Stata14, and what needs to go into the c:\windows\ folder, but in any case, failure.

                      I could try putting .dlls into /ado/plus, but that's not the way it's supposed to work, and I'm not sure if it can work.


                      • #12
                        That's not what I meant.
                        Keep your plugin as a bridge between Stata and the full package, which is installed as a whole separately.
                        Admin rights are required for registering the OCX component inside. Of course there is no other way!
                        This is similar to packages e.g. exporting to LaTeX, etc. The whole LaTeX is not to go to the ado directory. Only your plugin that consumes it.
                        IMHO nothing should go to Stata14 folder. Your files should go to ADO\PLUS to be uninstallable with ado commands, or be uninstallable using OS infrastructure.
                        Best, Sergiy


                        • #13
                          Thanks for the tip, Sergiy. I considered running an installer on first run as you do, but I am worried about people doing automated Stata deployments with 'net install', disconnecting, and then discovering that their package is not fully installed. If I have to, I will go that route, but I am hoping that just wrapping a library doesn't need to be so complicated. libsvm is a library: it is to be loaded into RAM with Stata, not shell-invoked like you would with LaTeX or your shapefile program. A similar wrapper for R does just that: it first writes out the entire dataset to disk then invokes svmlight on it, which sounds like all kinds of slow for the large datasets people would tend to use SVM on.

                          How does amap handle uninstallation? Does Stata provide a hook at 'net uninstall' time so that you can clean up the .exe's? You souls have convinced me that sticking to only files listed in the .pkg is the ideal. Now I just need to figure out how to manage that.


                          • #14
                            Sadly Stata does not provide a hook for that. Or for adding things to the menu. Or for a bunch of other useful tasks. It does shuffle the files around, but never executes anything for installing/uninstalling.

                            For amap one has to uninstall using Control Panel first, then remove the package with ado uninstall.

                            For automated Stata deployments, make all the packages part of the deployment. See "net from" pointed at a folder.

                            Finally you can install packages via standalone installers, like an older version of usespss did, see screenshots here:
                            http://www.adeptanalytics.org/radyak...in_usespss.htm
                            Of course multiple packages can also be combined into a single installation, as I did in slide 19 here:
                            www.stata.com/meeting/dcconf09/dc09_lokshin.ppt

                            Of course you have platform restrictions then, etc.

                            Consult Stata tech support. They will explain how they had that planned.

                            If anyone is listening, this important functionality is also missing:
                            1. create a temporary folder;
                            2. create a temporary file with a prescribed extension;
                            3. create a bunch of temporary files with same name and desired extensions;
                            4. add/remove items to the menu upon installation of packages;
                            5. hook for uninstall;
                            6. declare dependency;
                            7. hook for uninstall of dependency;
                            8. etc
                            Best, Sergiy



                            • #15
                              That prepackaged libsvm binary is meant for Python. You cannot even install it without having Python up and some knowledge of pip, and the dll gets installed to C:\Python*\Lib\site-packages\libsvm.dll. This is alright for their use case, which is supporting the scipy stack. It is a good thing to look at, but I don't want to ask my users to learn pip just to install one package.

                              Here is their loader code, from svm.py:
                              Code:
                              try:
                                  dirname = path.dirname(path.abspath(__file__))
                                  if sys.platform == 'win32':
                                      libsvm = CDLL(path.join(dirname, 'libsvm.dll'))
                                  else:
                                      libsvm = CDLL(path.join(dirname, '../libsvm.so.2'))
                              except:
                              # For unix the prefix 'lib' is not considered.
                                  if find_library('svm'):
                                      libsvm = CDLL(find_library('svm'))
                                  elif find_library('libsvm'):
                                      libsvm = CDLL(find_library('libsvm'))
                                  else:
                                      raise Exception('LIBSVM library not found.')
                              The critical pieces are that Python provides __file__ to tell you where your source file is in the filesystem, that the .whl's are set up to install libsvm to the same folder as svm.py, and that Python's only interface to DLLs is dlopen(), although its ctypes module makes doing this very stable and flexible. Stata provides adopath, adosubdir and findfile, which can probably be bent to achieve what __file__ gives you, but .pkg does not allow you to choose where files go, which complicates things. I guess this is why you fell back on writing your own installer.
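
To make the contrast with __file__ concrete, here is a rough Python model of a findfile-style lookup along a search path. It is a sketch under stated assumptions, not Stata's actual algorithm: the function name `find_on_adopath` is hypothetical, and the one-letter subdirectory rule is a simplification of what adosubdir returns.

```python
from pathlib import Path


def find_on_adopath(filename, adopath):
    """Return the first match for filename along adopath, or None.

    Each directory is checked directly and then in a one-letter
    subdirectory named after the file's first character (l/ for
    libsvm.dll). This is a simplified stand-in for Stata's rule.
    """
    subdir = filename[0].lower()
    for directory in adopath:
        base = Path(directory)
        for candidate in (base / filename, base / subdir / filename):
            if candidate.is_file():
                return candidate
    return None
```

The directory of the returned path plays the role that `path.dirname(path.abspath(__file__))` plays in svm.py: it tells the loader where the bundled library ended up, even though the package author could not choose that location.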

                              In the meantime, I have explored what more I can do with plugins. Stata's internals are obscured from C plugins, but the rest of the OS is available. I have written a setenv implementation for Stata which complements Stata's built-in "local VAR : env VAR". Needs 1 through 3 could be solved with a similarly short wrapper of mktemp(3). To demonstrate the power of this approach, I have successfully experimented with modifying the Windows PATH in order to point it at libsvm.dll, like this:

                              Code:
                              webuse auto
                              
                              program _getenv, plugin
                              program _setenv, plugin
                              
                              plugin call _getenv, PATH
                              local H = "`_getenv';C:\Users\nguenthe\statasvm\deps\libsvm-3.20\windows;"
                              plugin call _setenv, PATH "`H'" //if this line is commented out, the next call fails as expected at "program _svm, plugin"
                              
                              svm foreign price-gear_ratio if !missing(rep78)
                              do tests/helpers/inspect_model.do
                              (ironically, in exploring this, I stumbled across an old post of yours, Sergiy: http://www.stata.com/statalist/archi...msg00537.html; they still haven't fixed that bug)

                              Finally, I clued in to what you're doing in amap.pkg: capitals in .pkgs override Stata's decision of which files are worthless and ancillary. I skimmed the docs too quickly; they do not explicitly say that using a capital F or G overrides the list of file extensions above, just "For instance, xyz.ado would be installed in the system directories, whereas xyz.dta would be installed in the current directory." Combining this with the PATH hackery above (which will have to be tweaked for OS X, if I choose to bundle on OS X as well), I should be able to arrange for the plugin to get loaded with something roughly like
                              Code:
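                              findfile libsvm.dll
                              // --> r(fn) = c:\ado\plus/l/libsvm.dll, if your installation settings are at default
                              // split r(fn) into directory and file name, and hand the directory back as a local
                              mata: pathsplit(st_global("r(fn)"), libpath = "", dll = "")
                              mata: st_local("libpath", libpath)
                              // fetch the current PATH via the _getenv plugin shown earlier
                              plugin call _getenv, PATH
                              local PATH `"`_getenv'"'
                              // crude substring test; append the directory only if it is not already there
                              if strpos(`"`PATH'"', `"`libpath'"') == 0 {
                                 plugin call _setenv, PATH "`PATH';`libpath'"
                              }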
                              findfile also looks in the current directory, so this loader code can achieve something close to "python setup.py develop" and be usable both in installed and in source repository contexts.
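
The "append only if missing" step is easy to get subtly wrong with a plain substring test, so here is a minimal Python sketch of the intended logic. The function name `with_directory_on_path` is hypothetical, and the result would still have to be handed to something like the _setenv plugin above to take effect.

```python
import os


def with_directory_on_path(path_value, directory, sep=os.pathsep):
    """Return path_value with directory appended, unless already present.

    Entries are compared after normalisation, so a trailing slash (and,
    on Windows, differences in case or slash direction) does not cause
    a duplicate entry. Empty entries are dropped when appending.
    """
    def normalize(entry):
        return os.path.normcase(os.path.normpath(entry))

    entries = [e for e in path_value.split(sep) if e]
    if normalize(directory) in {normalize(e) for e in entries}:
        return path_value
    return sep.join(entries + [directory])
```

Comparing whole entries rather than substrings avoids treating a directory as present just because a longer path starts with it.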

                              Tweaking system library paths, even just for a subprocess, is generally looked down on in my experience, but it's better than all the other options:
                              1. give instructions to users for manual installation
                              2. build an installer (actually 3: Win32, Win64, and OS X) for libsvm myself such that it installs itself to the OS library directory, or installs somewhere else but adds itself to the system library path
                                1. as in amap, the installer can come along with all the needed files and even get spawned automatically from the .ado, so my concern about providing a partially installed package, like those web-installers that are popular now, is controlled.
                              Option 1 is prone to disasters and option 2 is a lot of effort, both initially and for maintenance. By installing to the adopath and then gluing over the unusualness with setenv(), net uninstall works, adoupdate works, and you won't need admin rights to install the package unless your sysadmin has locked down Stata on purpose. The downsides are that users waste disk space if anything else also bundles your dependencies, and that your users are tied to your release cycle; but that's par for the course on Windows and with OS X .apps, so I think this is as close to the best of all worlds as it gets.

                              I think 80% of the answer I was originally looking for, what will enable this, was the clue that "G" is different than "g".
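
For illustration, a package file using the capitalised directive might look roughly like the sketch below. The package name, description, directory layout and file names are all hypothetical, and the platform names (WIN64A, MACINTEL64) and the "v 3" header are written from memory of Stata's package-file documentation, so they should be checked against "help usersite" before use.

```text
v 3
d svm: hypothetical wrapper package for libsvm
f svm.ado
f svm.sthlp
g WIN64A bin/WIN64A/_svm.plugin _svm.plugin
G WIN64A bin/WIN64A/libsvm.dll libsvm.dll
g MACINTEL64 bin/MACINTEL64/_svm.plugin _svm.plugin
G MACINTEL64 bin/MACINTEL64/libsvm.dylib libsvm.dylib
```

The lowercase g lines are enough for the .plugin files because their extension is already on Stata's list; the capital G lines are what ask Stata to install the .dll and .dylib alongside them instead of treating them as ancillary files.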

                              Since it seems there's interest, I will keep this thread open with my progress. By the way, my advisor agreed to letting me MIT-license our code today, so all these support routines will be available to you. They're not ready yet, but I expect the very rough pseudo-Stata above to be cleaned up into a cross-platform multi-dependency shim by the end of the week.
                              Last edited by Nick Guenther; 27 May 2015, 18:14.
