Releases · BlackKakapo/Romanian-Word-Embeddings
Romanian Word Embeddings – SG & FastText (with PCA)
🔍 Overview
This release contains pretrained word embeddings for the Romanian language, trained using:
- the Word2Vec Skip-Gram (SG) architecture and
- the FastText (FT) architecture,
with dimensionality reduction via PCA.
These embeddings are suitable for (see the usage sketch after this list):
- Word-level similarity
- Semantic analogy tasks
- Input for classic ML models (e.g., classifiers, clustering)
- Visualization & exploration
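A minimal loading sketch with gensim, assuming the vectors are shipped in the plain word2vec text format; the file name below is a placeholder for whichever release asset you download, not the actual asset name.

```python
# Minimal sketch: load the Romanian vectors with gensim and query them.
# NOTE: "romanian_sg.vec" is a placeholder file name and assumes the
# standard word2vec text format.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("romanian_sg.vec", binary=False)

# Word-level similarity (cosine similarity between two words)
print(vectors.similarity("pisică", "câine"))

# Semantic analogy: rege - bărbat + femeie ≈ regină
print(vectors.most_similar(positive=["rege", "femeie"], negative=["bărbat"], topn=5))
```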
PCA was applied to reduce the vector dimensionality from 300 ➜ 120 for better memory efficiency and speed.
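For reference, a reduction of this kind can be reproduced with scikit-learn; the snippet below is only an illustrative sketch of the technique, not the exact pipeline used to produce the released files.

```python
# Illustrative sketch: reduce 300-dimensional vectors to 120 dimensions with PCA.
# The random matrix stands in for the full embedding matrix; the actual
# preprocessing used for this release may differ.
import numpy as np
from sklearn.decomposition import PCA

embedding_matrix = np.random.rand(50000, 300)  # (vocab_size, 300) stand-in

pca = PCA(n_components=120)
reduced_matrix = pca.fit_transform(embedding_matrix)  # shape: (50000, 120)
print(reduced_matrix.shape)
```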
Happy embedding!
Release history:
- v1.3 – FastText
- v1.2 – SG (Skip-Gram)
- v1.1 – CBOW
- v1.0 – CBOW_300_25_5