A general-purpose, extensible search utility, written in Python.
search can match regular expressions against the contents of text files, path
names, or symbol names in object files. Its flexible module system allows
additional types of search to be authored.
Copyright © 2017-2018 Jonathan Simmonds
search requires:
- Python 2.6+
Additionally the provided modules require:
- files: grep (currently GNU and BSD variants are supported)
- symbols: objdump (currently GNU and LLVM variants are supported)
All files are licensed under the MIT license.
usage: search [-h] [--version] [dirs | files | symbols [-u]] [-i] [-v]
              [path [path ...]] regex

A module-based, recursive file searching utility.

positional arguments:
  path                  Optional path(s) to perform the search in or on. If
                        omitted the current working directory is used.
  regex                 Perl-style regular expression to search for. It is
                        recommended to pass this in single quotes to prevent
                        shell expansion or interpretation of the regex
                        characters.

global arguments:
  -h, --help            Show this help message and exit.
  --version             Show the version number of the program and its
                        installed modules and exit.

search modules:
  {dirs,files,symbols}  Select which search module to use. Defaults to files.
    dirs                Search recursively on the file names of any files in
                        the given paths.
    files               Search recursively on the contents of any files in the
                        given paths.
    symbols             Search recursively in any object files or archives for
                        symbols with names matching regex.

common arguments:
  -i, --ignore-case     Enable case-insensitive searching.
  -v, --verbose         Enable verbose, full replication of the result column,
                        even if it means taking multiple lines per match (by
                        default the result will be condensed to keep one line
                        per match if possible).

dirs module:
  no additional arguments

files module:
  no additional arguments

symbols module:
  optional arguments:
    -u, --undefined     Also print undefined symbols (i.e. in objects which
                        reference but don't define the symbol).
$ search search_modules 'def search\('
search_modules/files.py:65 def search(regex, paths, args, ignore_case=False, verbose=False):
search_modules/dirs.py:42 def search(regex, paths, args, ignore_case=False, verbose=False):
search_modules/symbols.py:485 def search(regex, paths, args, ignore_case=False, verbose=False):
$ search -i symbols -u test/symbols/*.o DIV
test/symbols/math_mul.o
Function symbol:
Name: _div
Section: __TEXT,__text
Value: 0x60
Size: 0x0
test/symbols/util_number.o
Undefined symbol:
Name: _div
$ search dirs '.\.md'
./README.md
search has been designed from the ground up to be extensible and has a module
system allowing the contribution of custom search modules to enable new ways to
search.
By itself the search utility does nothing - it is a CLI driver: loading and
initializing all available modules, parsing the command line and directing the
search request to the appropriate module.
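The driver flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual driver: FakeModule, build_parser and run are hypothetical names, the paths and regex are collected into a single hypothetical targets argument for simplicity, and the sketch does not fall back to the files module when no module is named.

```python
import argparse


class FakeModule(object):
    """Hypothetical stand-in for a loaded search module."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def create_subparser(self, subparsers):
        # The module contributes its own subcommand; help options are left
        # to the driver, hence add_help=False.
        return subparsers.add_parser(self.name, add_help=False)

    def search(self, regex, paths, args, ignore_case=False, verbose=False):
        # Record the request instead of searching anything.
        self.calls.append((regex, paths, ignore_case, verbose))


def build_parser(modules):
    """Builds a parser with one subcommand per module (sketch only)."""
    parser = argparse.ArgumentParser(prog='search')
    subparsers = parser.add_subparsers(dest='module')
    subparsers.required = True  # Simplification: no default module here.
    for module in modules.values():
        sub = module.create_subparser(subparsers)
        # Common arguments are added by the driver, not by the module.
        sub.add_argument('-i', '--ignore-case', action='store_true')
        sub.add_argument('-v', '--verbose', action='store_true')
        # Simplification: collect paths and regex together, split later.
        sub.add_argument('targets', nargs='+')
    return parser


def run(argv, modules):
    """Parses argv and directs the request to the selected module."""
    args = build_parser(modules).parse_args(argv)
    regex = args.targets[-1]
    paths = args.targets[:-1] or ['.']
    modules[args.module].search(
        regex, paths, args, args.ignore_case, args.verbose)
```

With two stand-in modules registered as dirs and files, calling run(['dirs', '-i', 'src', 'TODO'], modules) would invoke only the dirs stand-in, passing 'TODO' as the regex, ['src'] as the paths and ignore_case=True.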
search comes with three provided modules:
- dirs: Search recursively on the file names of any files in the given paths, similar to the Unix find command.
- files: Search recursively on the contents of any files in the given paths, similar to the Unix grep command. This is the default module which will be used if no module is specified.
- symbols: Search recursively in any object files or archives for symbols with names matching a regex - similar to an objdump | grep pipeline.
If you have been given an additional module, you may install it simply by
placing it in the search_modules directory alongside the search executable.
The driver will load all modules in the search_modules directory alongside the
search executable. With each of these it will bind the following methods:
- create_subparser(subparsers)
  This method will be called by the driver during module initialization to allow the module to add a subparser to the main parser. This will then automatically contribute help text to the driver and allow selecting of the module in a query. Additional, module-specific arguments can be added to the subparser if necessary. NB: Any added subparser must use the add_help=False keyword argument to prevent automatically adding help options. Failure to do so will result in an exception when loading the module - help options are added and handled by the driver.
  The arguments are as follows:
  - subparsers: Special handle object (argparse._SubParsersAction) which can be used to add subparsers to a parser.
  The return is as follows:
  - Object representing the created subparser.
- search(regex, paths, args, ignore_case, verbose)
  This method will be called to process a search query.
  The arguments are as follows:
  - regex: String regular expression to search with.
  - paths: List of strings representing the paths to search in/on.
  - args: Namespace containing all parsed arguments. If the subparser added additional arguments these will be present.
  - ignore_case: Boolean, True if the search should be case-insensitive, False if it should be case-sensitive.
  - verbose: Boolean, True for verbose output, False otherwise.
  The return is as follows:
  - Not expected to return anything. Any output must be printed by the method itself.
The module loading will fail if these methods cannot be bound.
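As an illustration of the loading and binding step, a driver could be sketched along the following lines. This is an assumption-laden sketch, not the project's loader: load_modules and REQUIRED_METHODS are hypothetical names, and it uses the Python 3 importlib API, whereas the real driver supports Python 2.6+ and so must load modules differently.

```python
import importlib.util
import os

# Hypothetical constant: the two methods every module must provide.
REQUIRED_METHODS = ('create_subparser', 'search')


def load_modules(directory):
    """Loads every .py file in directory and checks the required methods."""
    modules = {}
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext != '.py' or name.startswith('_'):
            continue
        spec = importlib.util.spec_from_file_location(
            name, os.path.join(directory, filename))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Loading fails if either method cannot be bound.
        for method in REQUIRED_METHODS:
            if not callable(getattr(module, method, None)):
                raise ImportError(
                    '%s does not define %s()' % (filename, method))
        modules[name] = module
    return modules
```

Checking for both methods at load time means a malformed module is reported once, up front, rather than failing later in the middle of a query.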
Putting all this together, if we wanted to add a new dummy module, all we
would have to do is place a new file dummy.py in the search_modules
directory. The most basic, functional contents would look like the following:
def search(regex, paths, args, ignore_case=False, verbose=False):
    """Perform the requested search.

    Args:
        regex: String regular expression to search with.
        paths: List of strings representing the paths to search in/on.
        args: Namespace containing all parsed arguments. If the
            subparser added additional arguments these will be present.
        ignore_case: Boolean, True if the search should be case-insensitive,
            False if it should be case-sensitive.
        verbose: Boolean, True for verbose output, False otherwise.
    """
    for path in paths:
        pass  # Do some kind of searching here...


def create_subparser(subparsers):
    """Creates this module's subparser.

    Args:
        subparsers: Special handle object (argparse._SubParsersAction) which can
            be used to add subparsers to a parser.

    Returns:
        Object representing the created subparser.
    """
    parser = subparsers.add_parser(
        'dummy',
        add_help=False,
        help='Do nothing at all.')
    return parser

Any __version__ member in the module will be picked up as the module's
version information. All modules should have a 1.x version number: major
version numbers greater than this are reserved for future use.
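For illustration, reading a module's version and checking that it is a 1.x number might look like the sketch below. Both helper names are hypothetical, and whether the driver actually rejects other major versions is an assumption; the text above only states that they are reserved.

```python
def module_version(module, default='unknown'):
    """Returns the module's __version__ as a string, or default if absent."""
    return str(getattr(module, '__version__', default))


def is_supported_version(version):
    """Returns True if version is a 1.x version string."""
    # Only the major component (before the first dot) is inspected.
    major = version.split('.', 1)[0]
    return major == '1'
```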
There are a number of helper objects provided for describing search results, formatting output and printing it to the console. These are briefly outlined below and described in much more detail in the Python docstrings:
- result - Provides types necessary to build SearchResult objects.
  - SearchResult: container for describing a single result to a search query.
  - Match: abstract part of a SearchResult describing the component which matched the query. The StringMatch implementation is provided for a match found in a string (this will probably cover most use cases). Modules may subclass if necessary to provide bespoke match types.
  - Location: abstract, optional part of a SearchResult describing where the match has been found. The TextFileLocation implementation is provided for a match which has been located in a text file. Modules may subclass if necessary to provide bespoke location types.
- printer - Provides printers for printing streamed SearchResult objects.
  - AbstractPrinter: abstract printer to print streamed SearchResults. Once created, the print_results method may be called on it with a SearchResult iterable to print the output. It is assumed the search query may be long running and that it is desirable to print output as it is found (i.e. before termination), so it makes most sense to call this method with a generator function. A number of implementations of printers are provided. Modules may subclass if necessary to provide bespoke printers.
- console - Provides very basic console utility functions. Mostly used for writing custom printers.
- ansi - Provides utilities for adding ANSI formatting to a string (i.e. coloring it) for console output. Mostly used for writing custom match or location types.
- process - Provides wrappers to support streaming output from subprocesses. Mostly used for writing search modules which call out to separate tools or command line utilities.
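The reason for passing a generator to a printer can be shown with a small self-contained sketch. LinePrinter and find_matches are hypothetical stand-ins written for this example; they are not the search_utils classes, whose exact interfaces are described in their docstrings.

```python
import sys


class LinePrinter(object):
    """Hypothetical stand-in for a printer; not the search_utils class."""

    def __init__(self, stream=None):
        self.stream = sys.stdout if stream is None else stream

    def print_results(self, results):
        # Consume the iterable lazily so each result is written as soon as
        # the search produces it.
        for result in results:
            self.stream.write('%s\n' % (result,))
            self.stream.flush()


def find_matches(lines, needle, log):
    """Generator yielding the lines which contain needle, one at a time."""
    for line in lines:
        log.append('scanned ' + line)
        if needle in line:
            yield line
```

Because find_matches is a generator, the printer writes each match before the next line is scanned. Building a complete list of results first would delay all output until the search had finished.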
Bringing everything together: a skeleton, functional module might look like the following:
from search_utils.printer import MultiLinePrinter, SingleLinePrinter
from search_utils.process import StreamingProcess
from search_utils.result import SearchResult, StringMatch, TextFileLocation

# Module version.
__version__ = '1.0'


def parse_result(line, regex=None):
    """Creates a SearchResult object from the output of a grep command.

    Args:
        line: String single line of grep output to process.
        regex: String regex this result is derived from, or None if unknown.
            Defaults to None.

    Returns:
        The initialized SearchResult.
    """
    # grep -n output has the form 'path:line_number:matched text'.
    path_split = line.split(':', 1)
    line_split = path_split[1].split(':', 1)
    return SearchResult(StringMatch(line_split[1].strip(), regex),
                        TextFileLocation(path_split[0], int(line_split[0])))


def search(regex, paths, args, ignore_case=False, verbose=False):
    """Perform the requested search.

    Args:
        regex: String regular expression to search with.
        paths: List of strings representing the paths to search in/on.
        args: Namespace containing all parsed arguments. If the
            subparser added additional arguments these will be present.
        ignore_case: Boolean, True if the search should be case-insensitive,
            False if it should be case-sensitive.
        verbose: Boolean, True for verbose output, False otherwise.
    """
    # -r: recurse into directories, -n: prefix matches with line numbers,
    # -P: interpret the pattern as a Perl-style regex (GNU grep).
    with StreamingProcess(['grep', '-rnP', regex] + paths) as proc:
        printer = MultiLinePrinter() if verbose else SingleLinePrinter()
        printer.print_results(parse_result(line, regex) for line in proc)


def create_subparser(subparsers):
    """Creates this module's subparser.

    Args:
        subparsers: Special handle object (argparse._SubParsersAction) which can
            be used to add subparsers to a parser.

    Returns:
        Object representing the created subparser.
    """
    parser = subparsers.add_parser(
        'dummy',
        add_help=False,
        help='Do nothing very much.')
    return parser

This does roughly what the files module does, although simplified and
considerably less robust. The brevity of this module (most of it is docstrings)
illustrates the power of the provided utility functions.
Module authors are encouraged to review the provided modules and the docstrings
for further inspiration. The dirs module is by far the simplest (and a
pure-Python implementation), whereas the symbols module is by far the most
complex (with custom match and location types).
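To give a feel for how small a pure-Python module can be, here is a minimal sketch of a file name search that follows the module interface. It is not the actual dirs module: find_paths is a hypothetical helper, results are printed with plain print rather than the provided printers, and Python's re syntax is close to, but not identical to, Perl-style regular expressions.

```python
import os
import re


def find_paths(regex, paths, ignore_case=False):
    """Yields the paths of files under paths whose file name matches regex."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    for root in paths:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in sorted(filenames):
                # Match against the file name only, not the full path.
                if pattern.search(filename):
                    yield os.path.join(dirpath, filename)


def search(regex, paths, args, ignore_case=False, verbose=False):
    """Module entry point: prints each matching path as it is found."""
    for path in find_paths(regex, paths, ignore_case):
        print(path)
```

Keeping the walk in a generator separates finding results from presenting them, which is the same split the provided printers rely on.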