Description
Is your feature request related to a problem? Please describe.
Changes to configuration in the static node.yml file aren't picked up by running instances of Quickwit. Changing anything means that we most often have to restart the entire cluster to pick up the change. In a busy cluster, this can be rather disruptive from both an ingestion and a search perspective. It is also quite a manual and laborious process.
Describe the solution you'd like
The node.yaml file should be watched, and changes to it should trigger live reconfiguration wherever possible, reducing the number of restarts and the potential downtime of a running cluster.
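To make the request concrete, here is a minimal sketch of the "watch and reload" idea using only the Rust standard library. It is not Quickwit's implementation: `ConfigWatcher` and `poll` are hypothetical names, and polling file contents is just one possible approach (a real implementation might use filesystem notifications instead). The sketch only detects that the file changed and hands back the new contents. Deciding which settings can safely be applied live is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical watcher (not part of Quickwit): remembers a digest of the
/// config file and reports when the file's contents differ from the last poll.
pub struct ConfigWatcher {
    path: PathBuf,
    last_digest: Option<u64>,
}

/// Hash the raw bytes so two polls can be compared cheaply.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl ConfigWatcher {
    pub fn new(path: impl AsRef<Path>) -> Self {
        ConfigWatcher {
            path: path.as_ref().to_path_buf(),
            last_digest: None,
        }
    }

    /// Reads the file once. Returns `Ok(Some(contents))` on the first poll and
    /// whenever the contents changed since the previous poll, `Ok(None)` when
    /// nothing changed, and an error if the file cannot be read.
    pub fn poll(&mut self) -> io::Result<Option<String>> {
        let contents = fs::read_to_string(&self.path)?;
        let current = digest(contents.as_bytes());
        if self.last_digest == Some(current) {
            return Ok(None);
        }
        self.last_digest = Some(current);
        Ok(Some(contents))
    }
}
```

A node could call `poll` on a timer and, when it returns `Some(contents)`, re-parse the file and apply only the settings that are safe to change at runtime. Anything else would still require a restart.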
Describe alternatives you've considered
In an effort to make this more dynamic, we have templated environment variables so that changing a setting alters the pod specifications, which automatically triggers the pods to restart in Kubernetes. While less work, it is still rather disruptive.
Additional context
Letting Kubernetes restart our indexer pool can take upwards of an hour, and our ingestion is rather heavily degraded while that is happening, due to a combination of a reduced pool of running indexers, shard rebalancing, backpressure, etc.
Search performance drops as well: restarts cause a loss of the hot in-memory cache, which takes time to repopulate. The searcher pool size is also reduced, which contributes to the drop in performance.
Additionally, nodes dropping in and out of the cluster put a bit of stress on the control plane and metastore, and increase the size of the chitchat state, which has led to stability problems in the past.
see: #5446