Add search index #4306
Draft
terror wants to merge 27 commits into ordinals:master from terror:search
Commits (27, all by terror):
9049e9c Add basic index
b10273a Add template
022e617 Fix tests
929c571 Do this
d5d6adb Don't add index to diff
88b59d6 Remove index
9f03cdc Add search index option
1004e0d Fix tests
15dfcdd Add in missing values
abad18d Rename index
35b6b06 Add id to search fields
24bd99d Add existing inscriptions to index
706f86b Only add after commit
6bb592e Don't hook into event system
2908de0 Remove charm
86431b6 Use count to check for existing inscriptions
03bc004 Properly shut down search index thread
0d233e2 Add timestamp field
122d64d Sort by timestamp
e448cbb Rename `document` -> `schema`
cd0e992 Rename + fmt
1d5bc04 Merge remote-tracking branch 'upstream' into search
c626d5c Tweak
0c9f16b Format
21d74db Put `query_parser` method on `SearchIndex`
9792753 Don't sort search results by timestamp
6f0de61 Query ranges of inscriptions when updating search index
@@ -73,6 +73,8 @@ pub struct Options {
    help = "Do not index inscriptions."
  )]
  pub(crate) no_index_inscriptions: bool,
  #[arg(long, help = "Use search index at <SEARCH_INDEX>.")]
  pub(crate) search_index: Option<PathBuf>,
[Review comment on lines +76 to +77] I think this doesn't even need to be configurable initially. We could just put it into the ord data dir.
  #[arg(
    long,
    help = "Require basic HTTP authentication with <SERVER_PASSWORD>. Credentials are sent in cleartext. Consider using authentication in conjunction with HTTPS."
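To illustrate the review comment on lines +76 to +77, here is a minimal sketch of how the index location could fall back to a directory inside the ord data dir when no explicit path is given. The helper name `search_index_path` and the directory name `search-index` are assumptions made for illustration; neither appears in this PR, and the PR's actual `Settings::search_index` may resolve the path differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper, not part of the PR: pick the search index location.
///
/// If the user configured a path, use it as-is. Otherwise place the index in
/// a fixed subdirectory of the data dir (the name "search-index" is assumed).
fn search_index_path(data_dir: &Path, configured: Option<&Path>) -> PathBuf {
  match configured {
    Some(path) => path.to_path_buf(),
    None => data_dir.join("search-index"),
  }
}
```

With a fallback like this, the `--search-index` flag could be dropped entirely at first (always passing `None`) and reintroduced later without changing callers.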
@@ -0,0 +1,206 @@
use {
  super::*,
  crate::subcommand::server::query,
  tantivy::{
    collector::{Count, TopDocs},
    directory::MmapDirectory,
    query::QueryParser,
    schema::{
      document::OwnedValue, DateOptions, DateTimePrecision, Field, Schema as TantivySchema,
      INDEXED, STORED, STRING,
    },
    DateTime, Index as TantivyIndex, IndexReader, IndexWriter, ReloadPolicy, TantivyDocument,
  },
};

#[derive(Clone)]
struct Schema {
  inscription_id: Field,
  charm: Field,
  sat_name: Field,
  timestamp: Field,
}

impl Schema {
  fn default_search_fields(&self) -> Vec<Field> {
    vec![
      self.inscription_id,
      self.charm,
      self.sat_name,
      self.timestamp,
    ]
  }

  fn search_result(&self, document: &TantivyDocument) -> Option<SearchResult> {
    let inscription_id = document.get_first(self.inscription_id).and_then(|value| {
      if let OwnedValue::Str(id_str) = value {
        Some(id_str)
      } else {
        None
      }
    })?;

    Some(SearchResult {
      inscription_id: inscription_id.parse().ok()?,
    })
  }
}

#[derive(Clone)]
pub struct SearchIndex {
  ord_index: Arc<Index>,
  reader: IndexReader,
  schema: Schema,
  search_index: TantivyIndex,
  writer: Arc<Mutex<IndexWriter>>,
}

#[derive(Eq, Hash, PartialEq)]
pub struct SearchResult {
  pub inscription_id: InscriptionId,
}

impl SearchIndex {
  pub fn open(index: Arc<Index>, settings: &Settings) -> Result<Self> {
    let mut schema_builder = TantivySchema::builder();

    let schema = Schema {
      inscription_id: schema_builder.add_text_field("inscription_id", STRING | STORED),
      charm: schema_builder.add_text_field("charm", STRING),
      sat_name: schema_builder.add_text_field("sat_name", STRING),
      timestamp: schema_builder.add_date_field(
        "timestamp",
        DateOptions::from(INDEXED)
          .set_fast()
          .set_precision(DateTimePrecision::Seconds),
      ),
    };

    let path = settings.search_index().to_owned();

    fs::create_dir_all(&path).snafu_context(error::Io { path: path.clone() })?;

    let search_index =
      TantivyIndex::open_or_create(MmapDirectory::open(path)?, schema_builder.build())?;

    let reader = search_index
      .reader_builder()
      .reload_policy(ReloadPolicy::OnCommitWithDelay)
      .try_into()?;

    let writer = search_index.writer(50_000_000)?;

    Ok(Self {
      ord_index: index,
      reader,
      schema,
      search_index,
      writer: Arc::new(Mutex::new(writer)),
    })
  }

  pub fn search(&self, query: &str) -> Result<Vec<SearchResult>> {
    let searcher = self.reader.searcher();

    Ok(
      searcher
        .search(
          &self.query_parser().parse_query(query)?,
          &TopDocs::with_limit(100),
        )?
        .iter()
        .filter_map(|(_score, doc_address)| {
          self
            .schema
            .search_result(&searcher.doc::<TantivyDocument>(*doc_address).ok()?)
        })
        .collect(),
    )
  }

  pub fn update(&self) -> Result {
    let batch_size = 100;

    let mut starting_sequence_number = 0;

    let mut writer = self.writer.lock().unwrap();

    loop {
      let batch = self.ord_index.get_inscriptions_by_sequence_range(
        starting_sequence_number,
        starting_sequence_number + batch_size,
      )?;

      if batch.is_empty() {
        return Ok(());
      }

      for inscription_id in batch {
        self.add_inscription(inscription_id, &mut writer)?;

        if SHUTTING_DOWN.load(atomic::Ordering::Relaxed) {
          writer.commit()?;
          return Ok(());
        }
      }

      writer.commit()?;

      starting_sequence_number += batch_size;
    }
  }

  fn add_inscription(&self, inscription_id: InscriptionId, writer: &mut IndexWriter) -> Result {
    let searcher = self.reader.searcher();

    let query = self
      .query_parser()
      .parse_query(&format!("inscription_id:{inscription_id}"))?;

    if searcher.search(&query, &Count)? > 0 {
      return Ok(());
    }

    let (inscription, _, _) = self
      .ord_index
      .inscription_info(query::Inscription::Id(inscription_id), None)?
      .ok_or(anyhow!(format!(
        "failed to get info for inscription with id `{inscription_id}`"
      )))?;

    let mut document = TantivyDocument::default();

    document.add_text(self.schema.inscription_id, inscription.id.to_string());

    for charm in inscription.charms {
      document.add_text(self.schema.charm, charm);
    }

    if let Some(sat) = inscription.sat {
      document.add_text(self.schema.sat_name, sat.name());
    }

    document.add_date(
      self.schema.timestamp,
      DateTime::from_timestamp_secs(inscription.timestamp),
    );

    writer.add_document(document)?;

    log::info!(
      "Added inscription with id `{}` to search index",
      inscription_id
    );

    Ok(())
  }

  fn query_parser(&self) -> QueryParser {
    let mut query_parser =
      QueryParser::for_index(&self.search_index, self.schema.default_search_fields());

    query_parser.set_conjunction_by_default();

    query_parser
  }
}
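The control flow of `update` above (fetch a fixed-size range, add each item, commit once per batch, and commit early on shutdown) can be sketched in isolation with only the standard library. This is an illustrative stand-in, not the PR's code: `fetch_range` is a hypothetical in-memory replacement for `get_inscriptions_by_sequence_range`, it assumes a half-open range `[start, end)` that the diff does not confirm, and "commits" are simply counted rather than written to an index.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for `get_inscriptions_by_sequence_range`.
/// Assumes a half-open range [start, end), clamped to the data available.
fn fetch_range(items: &[u32], start: usize, end: usize) -> &[u32] {
  let start = start.min(items.len());
  let end = end.min(items.len());
  &items[start..end]
}

/// Walk `items` in batches of `batch_size`, counting one commit per finished
/// batch. If `shutdown` is set, commit what has been added so far and stop.
/// Returns `(items_processed, commits)`.
fn index_in_batches(items: &[u32], batch_size: usize, shutdown: &AtomicBool) -> (usize, usize) {
  let mut start = 0;
  let mut processed = 0;
  let mut commits = 0;

  loop {
    let batch = fetch_range(items, start, start + batch_size);

    // An empty batch means the source is exhausted.
    if batch.is_empty() {
      return (processed, commits);
    }

    for _item in batch {
      processed += 1;

      // Checked after every item so shutdown is not delayed by a full batch.
      if shutdown.load(Ordering::Relaxed) {
        commits += 1;
        return (processed, commits);
      }
    }

    commits += 1;
    start += batch_size;
  }
}
```

Under these assumptions, 250 items with a batch size of 100 are processed in three commits, and a shutdown flag that is already set stops the loop after the first item with a single commit.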
Just good for debuggability: