Replace dependency parser #2

Open
2 tasks
thjbdvlt opened this issue Jan 27, 2025 · 0 comments
Labels
enhancement New feature or request

@thjbdvlt
Owner

Improve dependency parsing quality by training a new parser to replace the current one (2025-01-27).
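Parsing quality is commonly measured with attachment scores: UAS is the share of tokens whose predicted head is correct, and LAS the share whose head and dependency label are both correct. Comparing the current and replacement parsers on the same held-out data with these scores would make "improve" concrete. The sketch below is a minimal, library-free illustration; the function name and the `(head, label)` token representation are hypothetical. spaCy's own scorer (used by `spacy evaluate`) reports the equivalent `dep_uas` and `dep_las`, though by default it leaves punctuation relations out, so its numbers may differ slightly.

```python
from typing import List, Tuple

# Hypothetical representation: a token is (head, deprel); heads are 1-based
# token indices and 0 marks the root, as in the HEAD column of CoNLL-U.
Token = Tuple[int, str]


def attachment_scores(
    gold: List[List[Token]], pred: List[List[Token]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens of aligned gold/predicted sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted tokenizations differ")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1  # right head: counts towards UAS
                if g_rel == p_rel:
                    correct_labeled += 1  # right head and label: counts towards LAS
    if total == 0:
        raise ValueError("nothing to score")
    return correct_heads / total, correct_labeled / total


# Toy example: the third token gets the right head but the wrong label.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "obl")]]
print(attachment_scores(gold, pred))  # (1.0, 0.6666666666666666)
```

Scoring both parsers on a held-out split of the new corpus would also give the tests in the task list a concrete pass/fail criterion.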

  • Find or build a suitable corpus that:
    • contains enough sentences.
    • covers all verb tenses and moods.
    • covers all personal pronouns.
    • is GPLv3-compatible.
  • Write tests to evaluate the new parser.
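The coverage criteria above (verb tenses and moods, personal pronouns) can be checked mechanically when a candidate corpus is in CoNLL-U format, as the UD treebanks are. This is a minimal sketch assuming UD-style features (`Mood`, `Tense`, `PronType=Prs`, `Person`, `Number`); the function names are hypothetical, and the set of required combinations is left to the caller because it depends on which tenses and moods the project considers essential. A check like this could also be one of the tests in the task list.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS value like 'Mood=Ind|Tense=Pres' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def feature_coverage(conllu: str) -> Tuple[Counter, Counter]:
    """Count (Mood, Tense) pairs on finite verbs/auxiliaries and
    (Person, Number) pairs on personal pronouns in a CoNLL-U string."""
    verb_forms: Counter = Counter()
    pronouns: Counter = Counter()
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        upos, feats = cols[3], parse_feats(cols[5])
        # Only forms carrying Mood are counted (infinitives/participles are skipped).
        if upos in ("VERB", "AUX") and "Mood" in feats:
            verb_forms[(feats["Mood"], feats.get("Tense", "_"))] += 1
        if upos == "PRON" and feats.get("PronType") == "Prs":
            pronouns[(feats.get("Person", "_"), feats.get("Number", "_"))] += 1
    return verb_forms, pronouns


def missing(
    observed: Counter, required: Iterable[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Return the required combinations that never occur in the corpus."""
    return {combo for combo in required if observed[combo] == 0}
```

For example, `missing(verb_forms, required)` with the (Mood, Tense) pairs the project wants covered returns those the corpus never contains; the same applies to the pronoun counts.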

The current parser is trained on several corpora (UD_French-Sequoia, UD_French-Rhapsodie and UD_French-ParisStories), whose annotation conventions may differ, which could lead to inconsistent annotations.
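One way to check whether mixing these treebanks actually introduces inconsistencies is to compare how often each relation label occurs in each of them. The sketch below (hypothetical function names, arbitrary default threshold) flags labels that are absent from at least one treebank or whose relative frequency differs by more than a given ratio. It is only a heuristic: a flagged label may reflect differences in text genre rather than in annotation conventions, so the flagged labels still need manual inspection.

```python
from collections import Counter
from typing import Dict, List, Tuple


def deprel_distribution(conllu: str) -> Dict[str, float]:
    """Relative frequency of each DEPREL value in a CoNLL-U string."""
    counts: Counter = Counter()
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        counts[cols[7]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def divergent_labels(
    corpora: Dict[str, str], min_ratio: float = 3.0
) -> List[Tuple[str, Dict[str, float]]]:
    """Flag labels absent from at least one corpus, or whose relative
    frequency varies by more than `min_ratio` between corpora."""
    dists = {name: deprel_distribution(text) for name, text in corpora.items()}
    labels = sorted(set().union(*dists.values()))
    flagged = []
    for label in labels:
        freqs = {name: dist.get(label, 0.0) for name, dist in dists.items()}
        lowest, highest = min(freqs.values()), max(freqs.values())
        if lowest == 0.0 or highest / lowest > min_ratio:
            flagged.append((label, freqs))
    return flagged
```

Given the text of each treebank's training file keyed by treebank name, `divergent_labels` returns each flagged label with its per-treebank frequencies; a relation subtype used in one treebank but never in another, for instance, would show up here.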

Current parser configuration:

```ini
[components.parser]
factory = "parser"

[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
use_upper = true
nO = null

[components.parser.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"

[components.parser.model.tok2vec.embed]
@architectures = "SolipcysmeMultiHash"
width = ${components.parser.model.tok2vec.encode.width}
features = ["NORM", "POS", "MORPH"]
u_features = []
rows = [4000, 2000, 4000]
include_static_vectors = true

[components.parser.model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 128
depth = 4
window_size = 1
maxout_pieces = 3
```
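Two quantities that any replacement configuration would change follow directly from these values. spaCy's documentation for its CNN tok2vec layers gives the receptive field of stacked window encoders as `depth * window_size * 2 + 1`, so `depth = 4` and `window_size = 1` let each token's vector depend on 9 tokens. For the embedding tables, the sketch assumes that the custom `SolipcysmeMultiHash` architecture, like spaCy's `MultiHashEmbed`, allocates one `rows x width` hash table per feature; the config alone does not confirm this, and the count excludes the static vectors and any mixing layer. Both helper names are hypothetical.

```python
from typing import List


def receptive_field(depth: int, window_size: int) -> int:
    """Tokens visible to each position after `depth` window layers, each
    concatenating `window_size` neighbours on either side."""
    return depth * window_size * 2 + 1


def hash_embedding_params(rows: List[int], width: int) -> int:
    """Weights in one rows x width table per feature (assumption: the custom
    SolipcysmeMultiHash allocates tables like spaCy's MultiHashEmbed)."""
    return sum(r * width for r in rows)


# Values from the [components.parser.model.tok2vec.*] sections above.
print(receptive_field(depth=4, window_size=1))               # 9
print(hash_embedding_params([4000, 2000, 4000], width=128))  # 1280000
```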
thjbdvlt added the enhancement label on Jan 27, 2025.