A zero-shot NLP toolkit (powered by Instructor) #1485

rmitsch · 2025-04-15T18:00:08Z


Hey all!

I'm working on https://github.com/mantisai/sieves, a tool that makes it easy to build pipelines of NLP tasks using only zero-shot, generative (decoder-only) models, i.e. no model training required.

My motivation is that most of my NLP projects can be done better and faster by breaking them down into a pipeline of tasks. Very often these tasks are the same (the classic NLP tasks, information extraction, question answering, summarization, ...), so the library comes with many of them already implemented.
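To make the "pipeline of tasks" idea concrete, here is a minimal sketch in plain Python. Doc, Pipeline and the toy tasks are hypothetical stand-ins written for illustration, not sieves' actual classes; in sieves the tasks would be zero-shot model calls rather than hand-written functions. The point is that each task writes its result onto the document, so later tasks can build on earlier ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Doc:
    """Hypothetical document: raw text plus results written by each task."""
    text: str
    results: dict[str, object] = field(default_factory=dict)


# A task reads a Doc (including earlier results) and returns its own result.
Task = Callable[[Doc], object]


class Pipeline:
    """Runs the tasks in order on each document, storing outputs by task name."""

    def __init__(self, tasks: dict[str, Task]) -> None:
        self.tasks = tasks

    def __call__(self, docs: Iterable[Doc]) -> Iterator[Doc]:
        for doc in docs:
            for name, task in self.tasks.items():
                doc.results[name] = task(doc)
            yield doc


# Toy tasks standing in for zero-shot model calls.
def is_question(doc: Doc) -> bool:
    return doc.text.rstrip().endswith("?")


def route(doc: Doc) -> str:
    # Depends on an earlier task's output, so task order matters.
    return "question_answering" if doc.results["is_question"] else "summarization"


pipe = Pipeline({"is_question": is_question, "route": route})
for doc in pipe([Doc("Does gravity bend light?"), Doc("Relativity is elegant.")]):
    print(doc.results)
```

This prints {'is_question': True, 'route': 'question_answering'} for the first document and {'is_question': False, 'route': 'summarization'} for the second.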

The idea is a library that jumpstarts modern NLP projects by providing a document- and pipeline-based tool (similar to spaCy) that gets a quick prototype off the ground without any model training. It guarantees correctly structured outputs from generative models by leveraging the structured output functionality of libraries like Instructor, Outlines, DSPy, LangChain, etc. It also ships with useful utilities for NLP pipelines, such as OCR and exporting model predictions to a Hugging Face dataset for fine-tuning.
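Structured output libraries reach that guarantee in different ways: Instructor validates replies against a Pydantic model and can re-ask the model on failure, while Outlines constrains generation itself. The sketch below illustrates only the validate-and-retry flavor. parse_label, classify, InvalidOutput and the fake model are hypothetical names for illustration; this is not Instructor's, Outlines' or sieves' API.

```python
import json
from typing import Callable


class InvalidOutput(ValueError):
    """Raised when the model's reply doesn't match the expected schema."""


def parse_label(raw: str, labels: list[str]) -> str:
    """Accept only JSON of the form {"label": <one of labels>}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or data.get("label") not in labels:
        raise InvalidOutput(f"label must be one of {labels}, got {raw!r}")
    return data["label"]


def classify(generate: Callable[[str], str], text: str, labels: list[str], max_retries: int = 3) -> str:
    """Ask the model, validate its reply, and re-ask with the error on failure."""
    prompt = f'Classify the text as one of {labels}. Reply as JSON {{"label": ...}}.\nText: {text}'
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            return parse_label(raw, labels)
        except InvalidOutput as err:
            # Feed the validation error back so the model can correct itself.
            prompt += f"\nYour previous reply was invalid ({err}). Try again."
    raise InvalidOutput(f"no valid reply after {max_retries} attempts")


# A fake model that needs two corrections before answering in the right format.
replies = iter(["Science!", '{"label": "biology"}', '{"label": "science"}'])
print(classify(lambda prompt: next(replies), "Special relativity applies ...", ["science", "politics"]))
# prints: science
```

The caller only ever sees a label from the allowed set or an explicit error, never a free-form string it has to parse by hand.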

I'd be excited if you checked it out, and I'd especially appreciate any feedback 🙂

If you're interested in what this looks like, here's a simple snippet that runs zero-shot classification (run pip install sieves first; the snippet also imports instructor and anthropic):

```python
import anthropic
import instructor

from sieves import Pipeline, Engine, tasks, Doc
from sieves.engines.instructor_ import Model

# 1. Define documents by text or URI.
docs = [Doc(text="Special relativity applies to all physical phenomena in the absence of gravity.")]

# 2. Create the engine responsible for generating structured output.
# anthropic.AsyncClient() reads the API key from the ANTHROPIC_API_KEY environment variable.
model = Model(
    name="claude-3-haiku-20240307",
    client=instructor.from_anthropic(anthropic.AsyncClient()),
)
engine = Engine(model=model)

# 3. Create a pipeline with tasks.
pipe = Pipeline(
    [
        # 4. Add a classification task to the pipeline.
        tasks.Classification(labels=["science", "politics"], engine=engine),
    ]
)

# 5. Run the pipeline and print the results.
for doc in pipe(docs):
    print(doc.results)
```
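The snippet hands sieves an asynchronous Anthropic client. How sieves schedules model calls internally isn't covered here, so the following is only a generic sketch of why an async client helps: requests for many documents can be in flight at once instead of running one after another. fake_classify stands in for a real API call, and max_concurrency is a hypothetical knob for staying under provider rate limits; neither is part of sieves.

```python
import asyncio


async def fake_classify(text: str) -> str:
    """Hypothetical stand-in for an async model call (not a real API)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return "science" if "relativity" in text.lower() else "politics"


async def classify_all(texts: list[str], max_concurrency: int = 5) -> list[str]:
    # Cap the number of requests in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(text: str) -> str:
        async with semaphore:
            return await fake_classify(text)

    # gather runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(one(t) for t in texts)))


texts = ["Special relativity applies ...", "The election is next week."] * 5
labels = asyncio.run(classify_all(texts))
print(labels)
```

With ten documents and at most five concurrent requests, the simulated work takes roughly two latency rounds instead of ten.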
