feat(transcript-cleaner): add transcript cleaning feature with custom vocabulary #6

pablof7z · 2025-04-15T10:43:09Z

No more "I have this idea for a noster app"

(or maybe I should just learn to speak eng... nah, this is easier)

dergigi · 2025-04-16T11:02:23Z

Ok, so this is basically the same "hack" I use to clean up the TODOs and thus go from messy action_items to clean TODOs:

https://github.com/dergigi/vibeline/blob/master/src/post_process.py#L55-L83

Conceptually, that should probably be either:

- Its own plugin
- Part of the general "post-processing" step
- (Or each plugin might have its own post-processing step, as introduced with command:)

I'm afraid that if we don't get this right (conceptually) now, it will get out of hand soon. Either each plugin has MCP tools and can just do stuff, or we have a dedicated post-processing for each plugin that can execute a script or binary.

In the best case, plugins have defined input and output folders and (almost) everything is a plugin, including transcription.

name: transcript
input: /VoiceMemos/*.m4a
output: /VoiceMemos/transcripts # no need to specify, because pluralization is the default

name: transcript-cleaner
input: /VoiceMemos/transcripts
output: /VoiceMemos/transcripts

All of the above is pseudo-code obviously, but you get the idea.

With the input/output stuff we could do post-processing in a plugin-native way, e.g.

name: TODOs
input: /VoiceMemos/action_items
output: /VoiceMemos/TODOs
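The pseudo-config above could be modelled roughly as follows. This is only a sketch under stated assumptions: `PluginSpec`, `output_dir`, and `downstream` are hypothetical names, not part of vibeline, and the "pluralization" rule is reduced to appending an `s`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional

BASE_DIR = PurePosixPath("/VoiceMemos")  # base folder taken from the pseudo-config


@dataclass
class PluginSpec:
    """Hypothetical plugin declaration: a name plus input and output locations."""

    name: str
    input: str
    output: Optional[str] = None  # None means "use the pluralized default"

    def output_dir(self) -> PurePosixPath:
        if self.output is not None:
            return PurePosixPath(self.output)
        # Naive pluralization (assumption): append "s" unless already present.
        plural = self.name if self.name.endswith("s") else self.name + "s"
        return BASE_DIR / plural


def downstream(plugins: List[PluginSpec], spec: PluginSpec) -> List[PluginSpec]:
    """Return the plugins whose input is the output folder of `spec`."""
    out = spec.output_dir()
    return [p for p in plugins if p is not spec and PurePosixPath(p.input) == out]


plugins = [
    PluginSpec("transcript", "/VoiceMemos/*.m4a"),
    PluginSpec("transcript-cleaner", "/VoiceMemos/transcripts", "/VoiceMemos/transcripts"),
    PluginSpec("TODOs", "/VoiceMemos/action_items"),
]

# The cleaner is chained after transcription purely through matching folders.
print([p.name for p in downstream(plugins, plugins[0])])
```

With this shape, chaining needs no extra configuration: a plugin becomes a post-processing step for another simply by reading from that plugin's output folder.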

I'll have to think about this some more & better explain what I have in mind.

There is a way to do this that is extensible and flexible, without going down the configuration path, and without making things terribly complicated.

Stuff should be simple. Hmm.

pablof7z · 2025-04-16T14:13:56Z

Yes. Agreed.

I think ideally you would be able to drop a file into any output directory and have it processed by whatever is listening for new inputs in that directory. If we have that, then yeah, the transcript cleaning feature can easily be a plugin, and plugins can choose whether to use input: /VoiceMemos/clean_transcript or the pre-cleaned version.

The reason I put it in the pipeline before the plugins are called is that very likely all plugins want the clean output, i.e. I don't want "Noster" in any plugin's input; if I said "nostr", they should all see it as such.
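To make the "clean once, before any plugin runs" idea concrete, here is a minimal sketch. It assumes the custom vocabulary is a simple mapping from misheard spellings to intended terms; the real vocabulary file format in this PR is not shown in this thread, and `clean_transcript` and `run_pipeline` are hypothetical names.

```python
import re
from typing import Callable, Dict

# Hypothetical vocabulary: misheard spelling -> intended term.
VOCABULARY = {"noster": "nostr"}


def clean_transcript(text: str, vocabulary: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of misheard terms."""
    if not vocabulary:
        return text
    lookup = {wrong.lower(): right for wrong, right in vocabulary.items()}
    # Longest alternatives first so longer terms win over their prefixes.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


def run_pipeline(raw: str, plugins: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Clean the transcript once, then give every plugin the same cleaned text."""
    cleaned = clean_transcript(raw, VOCABULARY)
    return {name: plugin(cleaned) for name, plugin in plugins.items()}


print(clean_transcript("I have this idea for a Noster app", VOCABULARY))
```

Because the cleaning happens in one place, no plugin has to know about the vocabulary at all.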

An architecture where plugins are listening to arbitrary input directories would be amazing; for example you could have a plugin that reacts to you marking an action_item as complete.
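A rough sketch of what "listening to arbitrary input directories" could look like, using plain polling from the standard library. `DirectoryDispatcher` is a hypothetical name, and this version only reacts to files that newly appear; reacting to edits (such as marking an action item as complete) would additionally need modification times or content hashes.

```python
from pathlib import Path
from typing import Callable, Dict, List, Set

Handler = Callable[[Path], None]


class DirectoryDispatcher:
    """Hypothetical dispatcher: plugins subscribe to a directory and are
    called once for every file that newly appears in it."""

    def __init__(self) -> None:
        self._listeners: Dict[Path, List[Handler]] = {}
        self._seen: Set[Path] = set()

    def listen(self, directory: Path, handler: Handler) -> None:
        """Register `handler` for new files in `directory`."""
        self._listeners.setdefault(Path(directory), []).append(handler)

    def poll(self) -> int:
        """Scan all watched directories once; return how many files were dispatched."""
        dispatched = 0
        for directory, handlers in list(self._listeners.items()):
            if not directory.is_dir():
                continue
            # sorted() materializes the listing, so handlers may write files safely.
            for path in sorted(directory.iterdir()):
                if not path.is_file() or path in self._seen:
                    continue
                self._seen.add(path)
                for handler in handlers:
                    handler(path)
                dispatched += 1
        return dispatched
```

Since files are tracked by path, a plugin whose input and output folders are the same can rewrite a file in place without being triggered again, as long as it does not create new file names there.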

pablof7z added 7 commits April 15, 2025 11:40

- 418d770 feat(transcript-cleaner): add transcript cleaning feature with customizable vocabulary
- e372d55 move test script to tests/
- 58386f3 fix(transcript-cleaner): enhance transcript extraction from LLM responses with improved handling of instructions and markers
- b6763e1 refactor(transcript-cleaner): remove LLM model dependency and streamline transcript cleaning process
- 59c93e0 fix(env): remove unused TRANSCRIPT_CLEANER_MODEL variable
- 9cbf217 refactor(transcript-cleaner): remove LLM model dependency and clean up related code
- ba1d23f refactor(extract): streamline transcript cleaning process and improve vocabulary file handling
