Xapian-lite is a minimal Emacs dynamic module for Xapian. It provides a simple interface that allow you to index files and search for phrases. I initially wrote it for my note app Xeft.
;; Querying my ~40MB worth of notes.
(benchmark-run 100 (xeft-query-term "common lisp" xeft-database 0 10))
;;=> (0.031512 0 0.0)
Because it’s so basic, the dynamic module is very easy to use and also very flexible. To index files, use
(dolist (file (directory-files "my-note-dir"))
(xapian-lite-reindex-file file dbpath))
This indexes each file in my-note-dir
, saving them to the database
at dbpath
. If the database doesn’t exist yet, it is created.
To search for a term, use
(xapian-lite-query-term "search term" dbpath 0 10)
This returns a list of paths of the files that contains search term
,
ranked by relevance. The 0
and 10
means “return 10 results
starting from the 0th place”, it is essentially used for paging. If
you want all the result, use 0
and 999999
.
When a file is modified, call xeft-reindex-file
again on that file.
If a file is removed, you don’t need to remove it from the database,
it will be automatically removed. If the file has been indexed and
haven’t been modified, xeft-reindex-file
is (kind of) a no-op (i.e.
fast).
Both file path and database path must be absolute path.
Since v2.1, there’s an additional function, xapian-lite-close-database
. It closes the connection to the database. This has two effect, one, by closing the connection, another Emacs session can connect to the database; two, any uncommitted change to the database that’s still in memory are committed to disk. For example, Xeft periodically commites DB changes to disk with this function.
Since v2.1, xapian-lite can detect base64 encodings in the file, and skip them when indexing. This prevents the DB size ballooning from indexing base64 strings in the files. If a stretch of text contains only characters used in base64 encoding, and is longer than 70 characters, it it considered base64 text and is skipped when indexing. This will cause xapian-lite to skip some longer urls too, since `/` is used in base64. The length threshold of 70 is hard-coded.
To build the module, you need to have Xapian installed. On Mac, it can be installed with macports by
sudo port install xapian-core
Then, build the module by
make PREFIX=/opt/local
Here /opt/local
is the default prefix of macports, which is what I
used to install Xapian. Homebrew and Linux users probably can leave it
empty.
You can also build an standalone module, which doesn’t require xapian dynamic library when running. For that you need to build xapian as a static library. First get the source from https://xapian.org/docs/install.html.
tar -xf <xapian tarball>
./configure --disable-shared
make CPPFLAGS=-fPIC
Now find libxapian.a
in .libs
directory, copy it out into xapian-lite’s project root, and run
make standalone
I put pre-built binary for GNU/Linux and macOS in GitHub Releases. If you want to use xapian-lite in your package, you can do something like this:
(defvar xeft--linux-module-url "https://github.com/casouri/xapian-lite/releases/download/v1.0/xapian-lite-amd64-linux.so"
"URL for pre-built dynamic module for Linux.")
(defvar xeft--mac-module-url "https://github.com/casouri/xapian-lite/releases/download/v1.0/xapian-lite-amd64-mac.dylib"
"URL for pre-built dynamic module for Mac.")
(defun xeft--download-module ()
"Download pre-built module from GitHub. Return non-nil if success."
(require 'url)
(let ((module-path (expand-file-name
"xapian-lite.so"
(file-name-directory
(locate-library "xeft.el" t)))))
(cond
((eq system-type 'gnu/linux)
(url-copy-file xeft--linux-module-url module-path)
t)
((eq system-type 'darwin)
(url-copy-file xeft--mac-module-url module-path)
t)
(t (message "No pre-built module for this operating system. We only have them for GNU/Linux and macOS")
nil))))