
Turaco: The first LLM that speaks native Pidgin English fluently #52

@FotieMConstant

Introduction

A large language model fine-tuned to fluently speak and understand native Pidgin English for natural communication across Africa.

Description

Turaco is the first LLM developed specifically to handle conversational Pidgin English, focusing on natural interactions and everyday communication. (I haven't seen an AI that speaks good Pidgin English; OpenAI's ChatGPT sucks at this.) Built on Meta's Llama 3.1 base model, it aims to bridge the language gap by letting users engage with an AI that understands and responds in Pidgin English, which is widely spoken in countries like Cameroon and Nigeria. The project is meant to provide an accessible tool for education, communication, and cultural preservation, making Pidgin speakers feel represented in the AI space. The model was fine-tuned on curated datasets collected from various online sources so that it picks up the unique nuances of Pidgin.
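
The curation step mentioned above could look roughly like the following. This is a minimal sketch, not the code from the repository: the function names `curate_pairs` and `to_jsonl`, the (prompt, response) record shape, and the 3-character minimum are all hypothetical. It normalizes whitespace, drops near-empty pairs, and removes duplicates that differ only in spacing or case, then writes JSON Lines, a format the Hugging Face datasets library can load with `load_dataset("json", data_files=...)`.

```python
import json
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def curate_pairs(raw_pairs, min_chars=3):
    """Clean, filter, and deduplicate (prompt, response) pairs.

    Hypothetical helper: keeps the first occurrence of each pair, comparing
    case-insensitively after whitespace normalization. Returns a list of dicts.
    """
    seen = set()
    records = []
    for prompt, response in raw_pairs:
        prompt, response = normalize_text(prompt), normalize_text(response)
        # Skip pairs where either side is too short to be useful (assumed threshold).
        if len(prompt) < min_chars or len(response) < min_chars:
            continue
        key = (prompt.lower(), response.lower())
        if key in seen:
            continue  # duplicate of a pair we already kept
        seen.add(key)
        records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    raw = [
        ("How you dey?", "I dey fine, thank you."),
        ("How  you   dey? ", "I dey fine, thank you."),  # same pair, messy spacing
        ("Wetin be your name?", "My name na Turaco."),
        ("ok", ""),  # too short, dropped
    ]
    print(to_jsonl(curate_pairs(raw)))
```

Deduplicating after normalization matters for scraped text, where the same sentence often appears many times with only spacing differences; without it the model would over-weight those repeats.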

What has been done so far can be found here: https://github.com/FotieMConstant/turaco

Relevant Technology

Language: Python
Platform: Hugging Face Transformers, Google Colab (A100) for fine-tuning (it can also run on the free T4 GPU)
Model base: Llama 3.1 8B (Meta's large language model)
Libraries/frameworks:
  • Hugging Face transformers for the model implementation
  • Hugging Face datasets for data handling
  • PyTorch as the underlying deep learning framework
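
As a rough check on why the free T4 can also work, the arithmetic below estimates how much memory the Llama 3.1 8B weights alone take at different precisions. The parameter count (about 8.03 billion), the usable memory figures (about 15 GiB on a Colab T4, 40 GiB on a Colab A100), and the 80% headroom rule are assumptions for illustration, and the estimate ignores activations, gradients, and optimizer state.

```python
# Approximate storage cost of one parameter at each precision, in bytes.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "4bit": 0.5}
GIB = 1024 ** 3

# Assumed figures for illustration, not measured values.
LLAMA_31_8B_PARAMS = 8.03e9
T4_USABLE_GIB = 15
A100_GIB = 40


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the model weights only (no activations, gradients, or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits(num_params: float, dtype: str, gpu_gib: float, headroom: float = 0.8) -> bool:
    """True if the weights use at most `headroom` of the GPU's memory,
    leaving the rest for activations and training state (assumed rule of thumb)."""
    return weight_memory_gib(num_params, dtype) <= gpu_gib * headroom


if __name__ == "__main__":
    for dtype in ("bf16", "4bit"):
        gib = weight_memory_gib(LLAMA_31_8B_PARAMS, dtype)
        print(
            f"{dtype}: {gib:.1f} GiB of weights | "
            f"T4 ok: {fits(LLAMA_31_8B_PARAMS, dtype, T4_USABLE_GIB)} | "
            f"A100 ok: {fits(LLAMA_31_8B_PARAMS, dtype, A100_GIB)}"
        )
```

In 16-bit precision the weights alone come to roughly 15 GiB, which leaves no room for training on a T4, while 4-bit weights take under 4 GiB. That suggests T4 runs rely on a quantized base model plus a parameter-efficient method such as QLoRA; this is an assumption, so check the repository for the actual training setup.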

Complexity

  • Beginner - This project requires little or no prior knowledge of the specified technologies to contribute
  • Intermediate - The user should have some prior knowledge of the technologies, to the point where they know how to use them, but not necessarily all the nooks and crannies
  • Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time

  • Little work - A couple of days
  • Medium work - A week or two
  • Much work - The project will take more than a couple of weeks and serious planning is required

Categories

  • Mobile app
  • IoT
  • Web app
  • Frontend/UI
  • AI/ML
  • APIs/Backend
  • Voice Assistant
  • Developer Tooling
  • Extension/Plugin/Add-On
  • Design/UX
  • AR/VR
  • Bots
  • Security
  • Blockchain
  • Futuristic Tech/Something Unique
