Excel Metadata Filter

Jörn Franke edited this page Dec 29, 2016 · 6 revisions

Excel documents can be filtered on the following metadata attributes. The value of each filter can be any regular expression supported by Java. Note that the attribute names differ between the .xls and the .xlsx format!

OOXML (.xlsx) supported metadata attributes

Matching

If you set "hadoopoffice.read.filter.metadata.matchAll" to true, all filter properties must match. If you set it to false, at least one filter property must match.
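The matching rule can be illustrated with a small sketch. It is not HadoopOffice's implementation: the class `MetadataMatchSketch`, the method `accept`, and the use of plain maps are hypothetical, and it assumes that a regular expression must match the whole attribute value and that a missing attribute never matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the documented matchAll semantics with plain maps.
class MetadataMatchSketch {

    /**
     * @param filters  attribute name -> regular expression (hypothetical representation)
     * @param metadata attribute name -> value found in the document, as String
     * @param matchAll true: every filter must match; false: at least one must match
     */
    static boolean accept(Map<String, String> filters, Map<String, String> metadata, boolean matchAll) {
        if (filters.isEmpty()) {
            return true; // assumption: without filters nothing is excluded
        }
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            String value = metadata.get(filter.getKey());
            // Assumption: the regex has to match the whole value (String.matches).
            boolean hit = value != null && value.matches(filter.getValue());
            if (matchAll && !hit) {
                return false; // one miss is enough to reject
            }
            if (!matchAll && hit) {
                return true; // one hit is enough to accept
            }
        }
        // matchAll: no filter missed; otherwise: no filter hit.
        return matchAll;
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("creator", "J.*");
        filters.put("title", "Report.*");

        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("creator", "Jane Doe");
        metadata.put("title", "Budget 2016");

        System.out.println(accept(filters, metadata, true));  // false: title does not match
        System.out.println(accept(filters, metadata, false)); // true: creator matches
    }
}
```

With both filters set, the document above is rejected when matchAll is true and accepted when it is false.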

Core properties

The following core properties are supported.

  • hadoopoffice.read.filter.metadata.category
  • hadoopoffice.read.filter.metadata.contentstatus
  • hadoopoffice.read.filter.metadata.contenttype
  • hadoopoffice.read.filter.metadata.created (date)
  • hadoopoffice.read.filter.metadata.creator
  • hadoopoffice.read.filter.metadata.description
  • hadoopoffice.read.filter.metadata.identifier
  • hadoopoffice.read.filter.metadata.keywords
  • hadoopoffice.read.filter.metadata.lastmodifiedbyuser
  • hadoopoffice.read.filter.metadata.lastprinted (date)
  • hadoopoffice.read.filter.metadata.modified (date)
  • hadoopoffice.read.filter.metadata.revision (int)
  • hadoopoffice.read.filter.metadata.subject
  • hadoopoffice.read.filter.metadata.title

Please note that all values are treated as String, even if they were converted from another type such as a date. Dates are converted to the format "hh:mm:ss dd.MM.yyyy", for example "12:00:00 01.01.2016".
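To write a filter on a date attribute, it helps to see what such a string looks like. The sketch below formats a date with the pattern quoted above and checks a regular expression against it. The class `DateFilterSketch` and the method `toFilterString` are hypothetical helpers, and the use of `SimpleDateFormat` with `Locale.ROOT` is an assumption about how such a string could be produced.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.regex.Pattern;

class DateFilterSketch {
    // Pattern quoted in the text above. "hh" is the 12-hour clock (01-12).
    static final String DATE_FORMAT = "hh:mm:ss dd.MM.yyyy";

    // Hypothetical helper: renders a date the way the text describes.
    static String toFilterString(Date date) {
        return new SimpleDateFormat(DATE_FORMAT, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        // Noon on 1 January 2016 in the default time zone.
        Date created = new GregorianCalendar(2016, Calendar.JANUARY, 1, 12, 0, 0).getTime();
        String asString = toFilterString(created);
        System.out.println(asString); // 12:00:00 01.01.2016

        // Regex that accepts any time on that day; the dots are escaped
        // because "." would otherwise match any character.
        String anyTimeOnNewYear = ".* 01\\.01\\.2016";
        System.out.println(Pattern.matches(anyTimeOnNewYear, asString)); // true
    }
}
```

Because "hh" carries no AM/PM marker, a filter on the time part cannot tell morning from afternoon; filtering on the date part, as in the regex above, avoids that ambiguity.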

Custom properties

You can define a filter on any custom property. For example, hadoopoffice.read.filter.metadata.custom.mycustomproperty represents a filter on the custom property "mycustomproperty" in the Excel document. Each value of such a property is interpreted as String.
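As a minimal sketch, the filter key for a custom property can be derived from the property name. `java.util.Properties` stands in for the job configuration here, and the class `CustomPropertyFilterSketch`, the method `customFilterKey`, and the filter value are hypothetical.

```java
import java.util.Properties;

class CustomPropertyFilterSketch {
    // Prefix taken from the example key in the text above.
    static final String CUSTOM_PREFIX = "hadoopoffice.read.filter.metadata.custom.";

    // Hypothetical helper: builds the filter key for a custom property name.
    static String customFilterKey(String propertyName) {
        return CUSTOM_PREFIX + propertyName;
    }

    public static void main(String[] args) {
        // Stand-in for the real job configuration object.
        Properties conf = new Properties();
        // Made-up filter value: accept documents whose custom property is "approved" or "final".
        conf.setProperty(customFilterKey("mycustomproperty"), "approved|final");
        conf.setProperty("hadoopoffice.read.filter.metadata.matchAll", "true");
        conf.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```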

Old Excel format (.xls)

Matching

If you set "hadoopoffice.read.filter.metadata.matchAll" to true, all filter properties must match. If you set it to false, at least one filter property must match.

Properties

The following summary information properties are supported:

  • hadoopoffice.read.filter.metadata.applicationname
  • hadoopoffice.read.filter.metadata.author
  • hadoopoffice.read.filter.metadata.charcount (int)
  • hadoopoffice.read.filter.metadata.comments
  • hadoopoffice.read.filter.metadata.createdatetime (date)
  • hadoopoffice.read.filter.metadata.edittime (long)
  • hadoopoffice.read.filter.metadata.keywords
  • hadoopoffice.read.filter.metadata.lastauthor
  • hadoopoffice.read.filter.metadata.lastprinted (date)
  • hadoopoffice.read.filter.metadata.lastsavedatetime (date)
  • hadoopoffice.read.filter.metadata.pagecount (int)
  • hadoopoffice.read.filter.metadata.revnumber
  • hadoopoffice.read.filter.metadata.security (int)
  • hadoopoffice.read.filter.metadata.subject
  • hadoopoffice.read.filter.metadata.template
  • hadoopoffice.read.filter.metadata.title
  • hadoopoffice.read.filter.metadata.wordcount (int)

Please note that all values are treated as String, even if they were converted from another type such as a date. Dates are converted to the format "hh:mm:ss dd.MM.yyyy", for example "12:00:00 01.01.2016".
