
Conversation

adrianeboyd
Contributor

@adrianeboyd adrianeboyd commented Jul 12, 2024

Add a framework that generates mock responses using polyfactory.

Related to #1.

Summary by Sourcery

This pull request adds a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. Configuration for this framework has been added to config.yaml, and the framework is imported in frameworks/__init__.py.

  • New Features:
    • Introduced PolyfactoryFramework to generate mock responses using the polyfactory library.
  • Enhancements:
    • Updated config.yaml to include configuration for PolyfactoryFramework.
    • Modified frameworks/__init__.py to import PolyfactoryFramework.

Add a framework that generates mock responses using `polyfactory`.

sourcery-ai bot commented Jul 12, 2024

Reviewer's Guide by Sourcery

This pull request introduces a new framework, PolyfactoryFramework, which generates mock responses using the polyfactory library. The changes include updates to the configuration file, the framework initialization file, and the addition of a new framework implementation file.

File-Level Changes

Files:
  • config.yaml
  • frameworks/__init__.py
  • frameworks/polyfactory_framework.py

Changes: Introduced PolyfactoryFramework to generate mock responses, updated configuration and initialization files, and added the new framework implementation.



@sourcery-ai sourcery-ai bot left a comment


Hey @adrianeboyd - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@adrianeboyd
Contributor Author

(This sourcery thing seems noisier and less capable than a linter.)

@stephenleo
Owner

Hey I ran your branch locally and have some comments:

  1. The framework will better suit the Synthetic Data Generation task instead of the Multi-label classification task. I'm planning it out and will submit some updates to your branch once that's ready.
  2. It seems that the framework generates only synthetic labels without corresponding text. I'm not sure if polyfactory can do that. So we may need to change the response_model used to something more suitable for synthetic data generation. I'm still thinking about this as part of the Synthetic Data Generation task too.

So for now, I'll keep this PR open and revisit it again once I have the Synthetic Data Generation task up and running.

@adrianeboyd
Contributor Author

I kind of disagree, because I think it's more reasonable to have it available as a comparison for classification tasks that have a sensible random baseline. I think it would be much less interesting for a synthetic generation task (I guess unless the task is boring enough that you should be using faker instead).

It doesn't refer to the input text because it's just generating a random list of labels from the provided schema. (In a few tests where I limited the number of possible labels to match the sampling setup and then sampled more data, it had something like 0.1-0.3% accuracy. I would also guess that a majority baseline might be better than a random baseline, but I didn't try that.)

And this was all a bit facetious, I didn't necessarily expect it to be merged, since adding any accuracy metrics to the table would make you immediately want to eliminate it. The point was just that you can easily generate the structure and a random baseline from the response model.

@stephenleo
Owner

Ah, got it. Good point! Yep, I'm definitely on the lookout for a suitable dataset to include an accuracy metric that will immediately flag a random label generator as inaccurate. I'm currently prioritizing getting the code up first so that datasets can be easily swapped in and out.

Use an equivalent six digit postal code field definition that is
supported by polyfactory rather than a separate validator method.