feat - tests + evals #6
base: main
Walkthrough: The changes introduce a new evaluation module with multiple financial and market data analysis functions, aggregate them into a configuration, and export them for use. Documentation updates explain how to run these evaluations with the newly added `mcp-evals` package.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Evals as EvalsModule (evals.ts)
    participant GPT as OpenAI GPT-4
    User->>Evals: Invoke evaluation function via run()
    Evals->>GPT: Submit grading prompt with financial/market question
    GPT-->>Evals: Return grading result (JSON)
    Evals-->>User: Parse and return result
```

```mermaid
sequenceDiagram
    participant User
    participant CLI as npx mcp-eval
    participant Evals as EvalsModule (evals.ts)
    participant GPT as OpenAI GPT-4
    User->>CLI: Run eval with environment variable (e.g., OPENAI_API_KEY)
    CLI->>Evals: Load and execute selected eval function
    Evals->>GPT: Send evaluation prompt
    GPT-->>Evals: Return result
    Evals-->>CLI: Return parsed result
    CLI-->>User: Output evaluation result
```
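Concretely, each eval in the diagrams above follows the same pattern: a `run()` function submits a grading prompt to the model and parses the JSON verdict. The sketch below is self-contained for illustration only — `grade` here is a local stand-in for the mcp-evals grader (the real module calls `grade(openai("gpt-4"), …)`), and the `EvalResult` shape is an assumption, not the library's actual type.

```typescript
// Minimal, self-contained sketch of the eval flow shown above.
// NOTE: `grade` is a local stand-in for the mcp-evals grader, and
// `EvalResult` is an assumed shape for the parsed JSON verdict.
type EvalResult = { score: number; reasoning: string };

// Stand-in grader: a real run would call grade(openai("gpt-4"), query).
async function grade(model: string, query: string): Promise<string> {
  return JSON.stringify({
    score: 5,
    reasoning: `Graded "${query}" using ${model}`,
  });
}

// Mirrors the run() step: submit the prompt, then parse the JSON result.
async function runEval(query: string): Promise<EvalResult> {
  const raw = await grade("gpt-4", query);
  return JSON.parse(raw) as EvalResult;
}

runEval("What was Apple's revenue last quarter?").then((r) =>
  console.log(r.score, r.reasoning)
);
```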
Actionable comments posted: 1
🧹 Nitpick comments (5)
README.md (2)
Lines 78-84: Fix grammar in "an mcp client" to "an MCP client"

The documentation correctly explains the evals functionality, but there's a small grammar issue.

```diff
-The evals package loads an mcp client that then runs the index.ts file, so there is no need to rebuild between tests. You can load environment variables by prefixing the npx command. Full documentation can be found [here](https://www.mcpevals.io/docs).
+The evals package loads an MCP client that then runs the index.ts file, so there is no need to rebuild between tests. You can load environment variables by prefixing the npx command. Full documentation can be found [here](https://www.mcpevals.io/docs).
```

🧰 Tools
🪛 LanguageTool
[misspelling] ~80-~80: Use “a” instead of ‘an’ if the following word doesn’t start with a vowel sound, e.g. ‘a sentence’, ‘a university’. (EN_A_VS_AN)
Context: ... Running evals The evals package loads an mcp client that then runs the index.ts ...
Line 83: Consider providing a default OpenAI model in the example command

The example is good, but users might need to specify the OpenAI model for the evals to run properly.

```diff
-OPENAI_API_KEY=your-key npx mcp-eval src/evals/evals.ts src/index.ts
+OPENAI_API_KEY=your-key OPENAI_MODEL=gpt-4 npx mcp-eval src/evals/evals.ts src/index.ts
```

src/evals/evals.ts (3)
Line 36: Inconsistent naming convention in agent name

Other agent names use lowercase with hyphens, but this one uses title case.

```diff
-  name: "Octagon Stock Data Agent Evaluation",
+  name: "octagon-stock-data-agent Evaluation",
```
Lines 57-59: Export pattern could be more concise

The code exports the same array twice (once in the config and once directly).

```diff
-export default config;
-
-export const evals = [octagonSecAgentEval, octagonTranscriptsAgentEval, octagonFinancialsAgentEval, octagonStockDataAgentEval, octagonCompaniesAgentEval];
+const evals = [octagonSecAgentEval, octagonTranscriptsAgentEval, octagonFinancialsAgentEval, octagonStockDataAgentEval, octagonCompaniesAgentEval];
+
+export default config;
+export { evals };
```
Lines 1-59: Consider adding configuration for timeout or retry logic

For robustness in production environments, evaluations might benefit from timeout or retry logic when API calls fail.
Would you like me to provide an implementation example with retry logic for the evaluation functions?
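One way to sketch that retry logic is a generic async wrapper with exponential backoff. This is an illustrative sketch only — the helper name `withRetry`, the attempt count, and the delay values are assumptions, not part of the mcp-evals API.

```typescript
// Hedged sketch: retry an async operation with exponential backoff.
// Each eval's run() body could be wrapped as withRetry(() => grade(...)).
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt: 500ms, 1000ms, 2000ms, ...
      if (i < attempts - 1) {
        await new Promise((res) => setTimeout(res, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

A timeout could be layered on top in a similar way, e.g. by racing the wrapped call against a timer with `Promise.race`.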
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- README.md (1 hunks)
- package.json (2 hunks)
- src/evals/evals.ts (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~80-~80: Use “a” instead of ‘an’ if the following word doesn’t start with a vowel sound, e.g. ‘a sentence’, ‘a university’.
Context: ... Running evals The evals package loads an mcp client that then runs the index.ts ...
(EN_A_VS_AN)
🔇 Additional comments (8)
package.json (1)

Lines 41-42: Dependencies added correctly for the evals integration

The `mcp-evals` package has been added correctly with version `^1.0.18`, which aligns with the evaluation functionality introduced in the PR.

src/evals/evals.ts (7)
Lines 3-5: Imports look good for the evaluation functionality

The imports correctly bring in required dependencies from the mcp-evals package and OpenAI SDK.

Lines 7-14: Good implementation of SEC agent evaluation

The evaluation function is well-structured with clear name, description, and run method.

Lines 16-23: Transcript agent evaluation looks good

The evaluation function is well-structured with appropriate name, description, and query.

Lines 25-32: Well-structured financials agent evaluation

Comprehensive query to properly test the agent's capabilities with multiple companies and metrics.

Lines 34-41: Stock data agent evaluation properly defined

The name, description, and evaluation query are well-crafted.

Lines 43-50: Companies agent evaluation looks good

The evaluation function is well-structured with clear name, description, and query.

Lines 52-55: Configuration object properly defined

Good structure with model and evals array containing all evaluation functions.
```typescript
const result = await grade(openai("gpt-4"), "What was Apple's R&D expense as a percentage of revenue in their latest fiscal year?");
return JSON.parse(result);
```
🛠️ Refactor suggestion
Add error handling for grade function and JSON parsing
The grade function could fail if there are API issues, and JSON.parse might fail if the result isn't properly formatted JSON.
```diff
   run: async () => {
-    const result = await grade(openai("gpt-4"), "What was Apple's R&D expense as a percentage of revenue in their latest fiscal year?");
-    return JSON.parse(result);
+    try {
+      const result = await grade(openai("gpt-4"), "What was Apple's R&D expense as a percentage of revenue in their latest fiscal year?");
+      return JSON.parse(result);
+    } catch (error) {
+      console.error("Error in octagonSecAgentEval:", error);
+      throw error;
+    }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
run: async () => {
  try {
    const result = await grade(openai("gpt-4"), "What was Apple's R&D expense as a percentage of revenue in their latest fiscal year?");
    return JSON.parse(result);
  } catch (error) {
    console.error("Error in octagonSecAgentEval:", error);
    throw error;
  }
}
```
🤖 Prompt for AI Agents
In src/evals/evals.ts around lines 11 to 12, add error handling for the grade
function call and the JSON.parse operation. Wrap the await grade call and
JSON.parse in a try-catch block to catch any exceptions from API failures or
invalid JSON. In the catch block, handle or log the error appropriately to
prevent the function from crashing unexpectedly.
Actionable comments posted: 0
🧹 Nitpick comments (1)
README.md (1)
Lines 278-285: Improve section hierarchy and consistency

- Change the heading from `## Running Evals` to `### Running Evals` to properly nest it under the Troubleshooting section.
- Capitalize MCP in "MCP client" to match the acronym usage elsewhere.
- Verify that the CLI command (`npx mcp-eval`) aligns with the actual published package name (`mcp-evals`) and update the command or documentation accordingly.

🧰 Tools
🪛 LanguageTool
[misspelling] ~280-~280: Use “a” instead of ‘an’ if the following word doesn’t start with a vowel sound, e.g. ‘a sentence’, ‘a university’. (EN_A_VS_AN)
Context: ... Running Evals The evals package loads an mcp client that then runs the index.ts ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~280-~280: Use “a” instead of ‘an’ if the following word doesn’t start with a vowel sound, e.g. ‘a sentence’, ‘a university’.
Context: ... Running Evals The evals package loads an mcp client that then runs the index.ts ...
(EN_A_VS_AN)
Adds a new e2e test that loads an MCP client, which in turn runs the server and processes the actual tool call. Afterwards, it grades the response for correctness.
note: I'm the package author