BaseX: Enabling Search and Link Management

The BaseX system is an XQuery-based XML database that provides powerful search features for XML.

The DITA for Small Teams (DFST) model uses an XQuery database to provide link management and DITA-aware search features. Any XQuery database can be used, but the DFST project provides materials for using the BaseX database. Note that in the DFST model, the XQuery database is used only as a read-only database to enable search and link management; it is not used to manage the source documents. Source document management is done in a separate source code repository (e.g., git) that provides the required version control and distributed access features.

BaseX is a lightweight, pure-Java XQuery database that is easy to install and use, making it appropriate for use on individual authors' workstations. It provides a built-in Web server as well as a general database server that can be accessed using special BaseX clients.

DITA content is loaded to BaseX through the use of git commit hooks (provided by the DITA for Small Teams project) that load the DITA content to the BaseX repository whenever it is committed to the git repository. These commit hooks keep the BaseX repository in sync with the main content repository.
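
To make the idea concrete, here is a minimal sketch of how a hook could translate the changes in a commit into BaseX commands. It is not the DFST hook code: the function name changes_to_basex_commands, the file-suffix filter, and the choice of the REPLACE and DELETE commands are assumptions made for illustration, so check the command reference for your BaseX release before relying on them.

```shell
#!/bin/sh
# Hypothetical helper (not part of DFST).
# Reads "status<TAB>path" lines, as printed by `git diff-tree --name-status`,
# on stdin and prints one BaseX command per changed XML document.
# $1: absolute path of the git working tree root.
changes_to_basex_commands() {
  repo_root=$1
  # Split each line on the tab only, so paths that contain spaces survive.
  while IFS="$(printf '\t')" read -r status path; do
    # Ignore files that are not XML documents (suffix list is an assumption).
    case $path in
      *.xml|*.dita|*.ditamap) ;;
      *) continue ;;
    esac
    case $status in
      # Added or modified: load the file, replacing any existing copy.
      A|M) printf 'REPLACE %s %s/%s\n' "$path" "$repo_root" "$path" ;;
      # Deleted: remove the document from the database.
      D)   printf 'DELETE %s\n' "$path" ;;
    esac
  done
}
```

A hook might feed it with something like git diff-tree --no-commit-id --name-status --no-renames -r HEAD, where --no-renames makes git report a rename as a delete plus an add, which is all this sketch handles.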

The DFST uses the BaseX HTTP server, which provides both a standard BaseX server and an HTTP server that enables Web access to the database.

The basic process for setting up BaseX is:
  1. Install the BaseX HTTP server package
  2. Set the BaseX configuration file so DITA documents are parsed correctly against the DITA DTDs managed by a DITA Open Toolkit.
  3. Start the BaseX HTTP server
  4. Set up the BaseX git hooks to automatically copy documents to the BaseX database
  5. Test your setup to make sure everything is hooked up correctly

Once you have BaseX running you can test the configuration using a temporary database. It's easy to create a new database and add documents to it using the BaseX command-line client or the BaseX Web administration client.

Install BaseX

Installers are available for Windows (.exe download) and Mac (via Homebrew). You can also download the Zip package and simply unzip it somewhere and add the bin/ directory to your PATH environment variable.

The built-in BaseX user is "admin" with a password of "admin". If security is a concern, you should at least change the administrator password, if not create a separate user account for use by the git commit hooks.

One-Time BaseX Setup

In order to manage DITA documents properly, BaseX must be configured to use an XML catalog and to turn on DTD parsing. For DITA use you will normally use the catalog-dita.xml file maintained by the DITA Open Toolkit.

To configure BaseX to parse DITA documents, update the BaseX configuration file .basex in the BaseX installation directory:

  1. Find the location of the DITA Open Toolkit you will use to manage the master XML catalog file for your DITA documents.

    If this is the Open Toolkit integrated with oXygenXML, the toolkit will be in the frameworks/dita/DITA-OT directory under the oXygenXML installation directory.

    You will need the absolute path of the Open Toolkit directory, e.g., /Applications/oxygen/frameworks/dita/DITA-OT or C:\Program Files\Oxygen XML Editor 16\frameworks\dita\DITA-OT.

  2. Find the BaseX installation directory.

    The exact location will depend on your operating system and how you installed it. The default location on Windows is C:\Program Files (x86)\BaseX. For OS X and Linux the command which basexhttp should show the location (the value returned will include the bin/ directory; you want the parent of the bin/ directory).

  3. Edit the file .basex in a text editor and add the following lines at the end of the file:
    CATFILE = OT directory/catalog-dita.xml
    DTD = true
    CHOP = false

    Where OT directory is the Open Toolkit directory you got in Step 1.

    On OS X, my .basex file looks like this:
    # General Options
    DEBUG = false
    DBPATH = /Users/ekimber/apps/basex/data
    REPOPATH = /Users/ekimber/apps/basex/repo
    LANG = English
    LANGKEYS = false
    GLOBALLOCK = false
    
    # Client/Server Architecture
    HOST = localhost
    PORT = 1984
    SERVERPORT = 1984
    EVENTPORT = 1985
    USER = 
    PASSWORD = 
    SERVERHOST = 
    PROXYHOST = 
    PROXYPORT = 0
    NONPROXYHOSTS = 
    TIMEOUT = 30
    KEEPALIVE = 600
    PARALLEL = 8
    LOG = true
    LOGMSGMAXLEN = 1000
    
    # HTTP Services
    WEBPATH = /Users/ekimber/apps/basex/webapp
    RESTXQPATH = 
    HTTPLOCAL = false
    STOPPORT = 8985
    AUTHMETHOD = Basic
    
    # Local Options
    CATFILE = /Applications/oxygen/frameworks/dita/DITA-OT/catalog-dita.xml
    DTD = true
    CHOP = false
    On Windows, the CATFILE option looks like this:
    CATFILE = C:\Program Files\Oxygen XML Editor 16\frameworks\dita\DITA-OT\catalog-dita.xml
  4. Save the file.
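
On OS X or Linux, the edit in step 3 can also be scripted. The following is a minimal sketch, assuming the option names shown above; the function name add_dita_options is hypothetical, and the function refuses to touch a file that already sets CATFILE so that it never writes a duplicate entry.

```shell
#!/bin/sh
# Hypothetical helper: appends the DITA parsing options to a .basex file.
# $1: path to the .basex file
# $2: absolute path of the Open Toolkit directory (from step 1)
add_dita_options() {
  basex_conf=$1
  ot_dir=$2
  # Leave the file alone if a catalog is already configured.
  if grep -q '^CATFILE *=' "$basex_conf"; then
    echo "CATFILE is already set in $basex_conf; edit it by hand" >&2
    return 1
  fi
  # Append the three options at the end of the file, as in step 3.
  {
    printf 'CATFILE = %s/catalog-dita.xml\n' "$ot_dir"
    printf 'DTD = true\n'
    printf 'CHOP = false\n'
  } >> "$basex_conf"
}
```

For example, add_dita_options /Users/ekimber/apps/basex/.basex /Applications/oxygen/frameworks/dita/DITA-OT would produce the "Local Options" lines shown in the OS X listing above, assuming the file lives in that directory.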

To test this configuration you'll need to have a database, add a DITA document to it, and verify that all the default attributes were expanded on load. One indication that the configuration is correct is that loading DITA documents takes noticeable time: BaseX has to resolve the DTDs through the catalog and parse the documents against them, which is much slower than loading the XML without processing the DTDs.

There is one remaining setup task for which you need a running BaseX server: installing the DFST XQuery modules.

Start the BaseX HTTP Server

The DFST setup uses BaseX in two ways: through git commit hooks that update XML documents in the BaseX database and through a Web application that enables search and provides DITA link management services.

To support this dual use of BaseX you must run the BaseX HTTP server package, which provides both the standard BaseX database server (accessed through BaseX clients, such as the BaseX command-line client) and the HTTP server used by the Web application. You can run the server as a background service.

See the BaseX command-line options documentation for details.

To start the BaseX HTTP server:
  • OS X or Linux: Run the command basexhttp -S to start the server as a background service.
  • Windows: Use the "Start BaseX server" item in the BaseX start menu or run the command basexhttp -S
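
Because basexhttp -S leaves the server running in the background, a script that starts the server may want to wait until it accepts connections before doing anything else. The following is a minimal sketch of a generic retry helper; the name wait_until and the one-second delay are assumptions, not part of BaseX or DFST.

```shell
#!/bin/sh
# Hypothetical helper: retries a probe command until it succeeds.
# $1: maximum number of attempts; remaining arguments: the probe command.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_until() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    # Run the probe quietly; only its exit status matters.
    "$@" >/dev/null 2>&1 && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

A possible use, assuming the default admin credentials and that the INFO command is acceptable as a probe: basexhttp -S && wait_until 10 basexclient -Uadmin -Padmin -c "INFO".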

Once you have the HTTP server running you can connect to it in several ways, including using WebDAV, either from OxygenXML or from another WebDAV client. You should be able to set up the database as a WebDAV shared drive under all operating systems. See the BaseX documentation for details.

The DFST git commit hooks use the BaseX command-line client to connect to the BaseX server. The DFST DITA link management Web application is accessed via normal HTTP through a Web browser.

Create a BaseX Database and Load Some DITA Content

The BaseX server can manage any number of databases. A database is simply a named collection of XML documents.

To test the BaseX setup you can try loading the "garage" samples that come with the Open Toolkit:
  1. Open a command window and run the command basexclient.

    You should be prompted to enter the BaseX user ID and password (e.g., "admin", "admin").

    If the command does not work, check your PATH environment variable or change directories to the BaseX bin/ directory.

    Once you have logged in you should see the BaseX command prompt:
    c:\>basexclient
    Username: admin
    Password:
    BaseX 8.0.3 [Client]
    Try help to get more information.
    
    >
  2. Create a new database to hold the sample documents:
    > create database samples
    Database 'samples' created in 23.95 ms.
    >
  3. Use the list command to list the databases available:
    > list
    Name     Resources  Size  Input Path
    ------------------------------------
    samples  0          4532
    
    1 database(s).
    >
  4. Open the "samples" database to make it the current database:
    > open samples
    Database 'samples' was opened in 0.02 ms.
    >
  5. Add the samples documents to the database:
    > add to samples c:\Program Files\Oxygen XML Editor 16\frameworks\dita\DITA-OT\samples
    Resource(s) added in 2092.7 ms.
    >

    Set the directory to wherever your Open Toolkit actually is.

    The "to samples" part of the command adds the documents in the samples folder to a directory named "samples" in the database.

    It should take a few seconds to load the documents. If the load was instantaneous, then the catalog and DTD parsing are not set up correctly.

  6. List the files in the database:
    > list samples
    Input Path                                    Type  Content-Type     Size
    -------------------------------------------------------------------------
    samples/ant_sample/sample_all.xml             xml   application/xml  57
    samples/ant_sample/sample_docbook.xml         xml   application/xml  42
    samples/ant_sample/sample_eclipsehelp.xml     xml   application/xml  42
    samples/ant_sample/sample_htmlhelp.xml        xml   application/xml  42
    samples/ant_sample/sample_javahelp.xml        xml   application/xml  42
    samples/ant_sample/sample_odt.xml             xml   application/xml  42
    samples/ant_sample/sample_pdf.xml             xml   application/xml  42
    samples/ant_sample/sample_tocjs.xml           xml   application/xml  42
    samples/ant_sample/sample_troff.xml           xml   application/xml  42
    samples/ant_sample/sample_wordrtf.xml         xml   application/xml  42
    samples/ant_sample/sample_xhtml.xml           xml   application/xml  54
    samples/ant_sample/sample_xhtml_plus_css.xml  xml   application/xml  73
    samples/ant_sample/template_docbook.xml       xml   application/xml  38
    samples/ant_sample/template_eclipsehelp.xml   xml   application/xml  38
    samples/ant_sample/template_htmlhelp.xml      xml   application/xml  38
    samples/ant_sample/template_javahelp.xml      xml   application/xml  38
    samples/ant_sample/template_odt.xml           xml   application/xml  38
    samples/ant_sample/template_pdf.xml           xml   application/xml  38
    samples/ant_sample/template_wordrtf.xml       xml   application/xml  38
    samples/ant_sample/template_xhtml.xml         xml   application/xml  37
    samples/concepts/garageconceptsoverview.xml   xml   application/xml  17
    samples/concepts/lawnmower.xml                xml   application/xml  20
    samples/concepts/oil.xml                      xml   application/xml  27
    samples/concepts/paint.xml                    xml   application/xml  27
    samples/concepts/shelving.xml                 xml   application/xml  27
    samples/concepts/snowshovel.xml               xml   application/xml  27
    samples/concepts/toolbox.xml                  xml   application/xml  32
    samples/concepts/tools.xml                    xml   application/xml  67
    samples/concepts/waterhose.xml                xml   application/xml  27
    samples/concepts/wheelbarrow.xml              xml   application/xml  17
    samples/concepts/workbench.xml                xml   application/xml  24
    samples/concepts/wwfluid.xml                  xml   application/xml  20
    samples/tasks/changingtheoil.xml              xml   application/xml  69
    samples/tasks/garagetaskoverview.xml          xml   application/xml  17
    samples/tasks/organizing.xml                  xml   application/xml  17
    samples/tasks/shovellingsnow.xml              xml   application/xml  47
    samples/tasks/spraypainting.xml               xml   application/xml  56
    samples/tasks/takinggarbage.xml               xml   application/xml  39
    samples/tasks/washingthecar.xml               xml   application/xml  68
    
    Resources.
    >
  7. Verify that the @class attributes were correctly expanded:
    > xquery /*/@class
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic concept/concept "
    class="- topic/topic task/task "
    class="- topic/topic concept/concept "
    class="- topic/topic task/task "
    class="- topic/topic task/task "
    class="- topic/topic task/task "
    class="- topic/topic task/task "
    class="- topic/topic task/task "
    Query executed in 2.57 ms.
    >

    This XQuery simply returns all the @class attributes of all the root elements in the database. If those elements have @class attributes then all the elements will.

You have now verified that your BaseX server is correctly configured to manage DITA documents and is ready to get updates from your git repository.

If you want, you can remove the samples database:
> drop database samples
Database 'samples' was dropped.
> list
Name  Resources  Size  Input Path
---------------------------------

0 database(s).
>
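
The interactive session above can also be captured in a BaseX command script so the check is repeatable. This is a sketch under assumptions: the function name write_sample_test_script is hypothetical, the script uses the short command form CREATE DB and DROP DB, and it counts the @class attributes instead of listing them.

```shell
#!/bin/sh
# Hypothetical helper: writes a BaseX command script (one command per line)
# that repeats the manual test: create, load, query, drop.
# $1: path of the script to write (a .bxs suffix is conventional)
# $2: directory holding the Open Toolkit samples
# $3: database name (defaults to "samples")
write_sample_test_script() {
  script=$1
  samples_dir=$2
  db=${3:-samples}
  cat > "$script" <<EOF
CREATE DB $db
OPEN $db
ADD TO samples $samples_dir
XQUERY count(/*/@class)
DROP DB $db
EOF
}
```

You could then run the generated file with, for example, basexclient -Uadmin -Padmin -c test-samples.bxs. A count greater than zero suggests that the DTDs were parsed and the default attributes expanded; a count of zero points to a catalog or DTD configuration problem.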

Install DFST XQuery Modules

The DFST XQuery modules provide the DITA-specific link management and DITA-aware searching features you need.

To install the modules, you run the script install-modules.sh or install-modules.bat from the basex/scripts directory:
  1. Open a command window and navigate to the basex/scripts/ directory:
    c:\>cd c:\projects\dita-for-small-teams\basex\scripts
    

    Your prompt should then show the scripts directory:

    c:\projects\dita-for-small-teams\basex\scripts>
  2. Run the install-modules script:
    c:\projects\dita-for-small-teams\basex\scripts>install-modules.bat admin admin
    Installing DFST XQuery packages:
    Name                                                   Version  Type      Path
    -----------------------------------------------------------------------------------------------------------------------------------
    org.dita-for-small-teams.xquery.modules.dita-utils     -        Internal  org/dita-for-small-teams/xquery/modules/dita-utils.xqm
    org.dita-for-small-teams.xquery.modules.relpath-utils  -        Internal  org/dita-for-small-teams/xquery/modules/relpath-utils.xqm
    
    2 package(s).
    
    c:\projects\dita-for-small-teams\basex\scripts>

Set Up Git Commit Hooks for BaseX Update

The DFST git commit hooks for BaseX keep BaseX in sync with your git repository.

Using the commit hooks requires the following:
  • The BaseX bin/ directory must be in your PATH or Path environment variable. You should be able to type "basex" or "basexclient" on the command line. This allows the git hook scripts to run the required BaseX commands. You should have set this up when you installed BaseX.
  • The BaseX server connection properties must be set in the .basex file. This file can be in the main BaseX installation directory or in your user home directory. This allows the basexclient command to connect and authenticate to the BaseX server without the need for a separate configuration file.
  • The DFST git hooks must be copied to or linked from the .git/hooks/ directory in the git repositories you want to manage. These hooks keep the BaseX databases in sync with your git repositories.
To set up the git hooks, do the following:
  1. Edit the .basex configuration file, either in the main BaseX installation directory or in your home directory, and add the following settings:
    USER = admin
    PASSWORD = admin
    HOST = localhost
    PORT = 1984

    These values are the BaseX defaults. Your configuration must reflect any changes you made to the defaults after you installed BaseX.

    For OS X: If you want to put the .basex configuration file in your home directory, you must delete the .basexhome file from the BaseX installation directory; otherwise BaseX will not read the .basex file in your home directory.

  2. Copy the files from the DFST project commit-hooks/git/ directory to the .git/hooks directory under the root directory of your git repository.
    For OS X and Linux: Instead of copying the files you can link to the files in the DFST project. This makes it easier to keep the hooks updated. To do this, use the ln -s command like so:
    ekimber:project-01$ cd .git/hooks
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/basexLoadOrUpdateBranch
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/post-checkout
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/post-commit
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/post-merge
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/recordGitStateDetails
    ekimber:hooks$ ln -s ~/dita-for-small-teams/commit-hooks/git/updateBaseXForCOmmitOrMerge
    For OS X and Linux: Make sure all the commit hook files are executable. They should be as they come from the DFST git repository, but they may not be for whatever reason. To make them executable, apply this command to the directory that contains the scripts:
    ekimber:dita-for-small-teams$ chmod a+x commit-hooks/git/*
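
If you manage several repositories, the linking and chmod steps can be wrapped in a small function. This is a minimal sketch for OS X and Linux, not something shipped with DFST; the function name link_dfst_hooks is hypothetical, and it links every regular file it finds in the hooks directory instead of naming each script.

```shell
#!/bin/sh
# Hypothetical helper: links the DFST hook scripts into a repository.
# $1: absolute path of the DFST commit-hooks/git directory
#     (absolute, so the symbolic links resolve from .git/hooks)
# $2: root directory of the git repository to manage
link_dfst_hooks() {
  hooks_src=$1
  repo_root=$2
  if [ ! -d "$repo_root/.git/hooks" ]; then
    echo "no .git/hooks directory under $repo_root" >&2
    return 1
  fi
  for hook in "$hooks_src"/*; do
    # Skip subdirectories and an unmatched glob.
    [ -f "$hook" ] || continue
    # Make the script executable, then link it, replacing any old link.
    chmod a+x "$hook"
    ln -sf "$hook" "$repo_root/.git/hooks/$(basename "$hook")"
  done
}
```

For example: link_dfst_hooks ~/dita-for-small-teams/commit-hooks/git ~/project-01. Because the hooks are linked and not copied, pulling a newer DFST checkout updates them in every linked repository.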