dilithjay
Contributor

When parsing in AUTO-routing mode with autoselect_llm=True (which can be passed as a kwarg to the parse function), the router selects the LLM automatically.

A ranked list of models is obtained based on the similarity of the input doc to the documents in the benchmark: the model ranking of the most similar benchmark doc is used for the input doc. The router then uses the highest-scoring model in that ranking for which an API key has been provided.
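
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the benchmark entries, model names, environment variable names, and the bag-of-words cosine similarity are all assumptions made for the example, since the description does not say which similarity measure is used.

```python
import math
import os
from collections import Counter

# Hypothetical benchmark: each entry pairs a document's text with the models
# ranked best-first for that document. Names and rankings are illustrative.
BENCHMARK = {
    "invoice": ("invoice total amount due tax table", ["model-a", "model-b"]),
    "paper": ("abstract introduction method results references", ["model-b", "model-a"]),
}

# Hypothetical mapping from model name to the env var holding its API key.
MODEL_KEY_ENV = {"model-a": "PROVIDER_A_API_KEY", "model-b": "PROVIDER_B_API_KEY"}


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (an assumed stand-in for the real measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_model(doc_text, env=None):
    """Return the best-ranked model whose API key is set, or None if none is."""
    env = os.environ if env is None else env
    # Take the model ranking of the most similar benchmark document.
    _, ranking = max(
        BENCHMARK.values(), key=lambda entry: cosine_similarity(doc_text, entry[0])
    )
    # Walk the ranking best-first and stop at the first model with a key.
    for model in ranking:
        if env.get(MODEL_KEY_ENV[model]):
            return model
    return None
```

In this sketch, a document resembling the "invoice" entry falls back to model-b when only PROVIDER_B_API_KEY is set, even though model-a ranks higher for that entry.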

This PR also:

  • Updates benchmark with autoselect_llm=True
  • Refactors utility functions: conversion utils are moved to a separate file

…e a page based on page content

- Update benchmark with option
- Refactor utility functions - create separate file for conversion utils
@pramitchoudhary pramitchoudhary added the enhancement New feature or request label Jul 29, 2025
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds an automatic LLM selection feature for the parsing system. When autoselect_llm=True is set in AUTO-routing mode, the system selects the best-performing LLM based on document similarity to benchmark data.

  • Introduces a new DocumentRankedLLMSelector that ranks models based on document similarity
  • Refactors conversion utilities into a separate module for better organization
  • Updates benchmark testing to support the new autoselect feature with comparative results

Reviewed Changes

Copilot reviewed 10 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lexoid/core/llm_selector.py | New module implementing document similarity-based LLM selection |
| lexoid/core/conversion_utils.py | Refactored conversion utilities moved from utils.py |
| lexoid/core/utils.py | Updated router function to support auto LLM selection and removed conversion functions |
| lexoid/core/parse_type/llm_parser.py | Updated imports and removed OpenAI parameter restrictions |
| lexoid/api.py | Modified parse function to handle auto-selected models |
| tests/benchmark.py | Added autoselect_llm parameter support and model name handling |
| tests/results.csv | Updated benchmark results including auto-selected model performance |
| tests/api_cost_mapping.json | Added cost mappings for GPT-5 models |
| docs/benchmark.rst | Updated documentation with new benchmark results |
| README.md | Updated benchmark table with latest performance data |

dilithjay and others added 4 commits August 25, 2025 14:05
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- remove together ai llama models as they are no longer supported
@dilithjay dilithjay merged commit 693181c into main Sep 2, 2025
@dilithjay dilithjay deleted the dj/router-model branch September 2, 2025 17:31