Conversation

vkkhare (Contributor) commented Jul 23, 2025

Description

This PR adds support for the Qwen 3 1.7B model with tool-calling capabilities, switches from onnxruntime_genai to the native ONNX Runtime, and adds the export script for the model enhancements.

Key Features Added

  • Adds mlc/tokenizer-cpp as a third-party dependency, providing C++ bindings for the HuggingFace Rust tokenizer implementation
  • Adds Qwen 3 1.7B support with tool calling in nimblenet_py, with module support via zip archives
  • FP16 data type support, with proper uint16_t handling in binary operations
  • ONNX model export with simplified token_id predictions instead of logits (see the sketch after this list)
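
To make the token-id simplification concrete, here is a rough onnx sketch: the file names, the "logits" output name, and the shapes below are illustrative assumptions, not necessarily what the export script in this PR does.

import onnx
from onnx import helper, TensorProto

# Hypothetical paths and tensor names, for illustration only.
model = onnx.load("qwen3_decoder.onnx")
graph = model.graph

# Append an ArgMax over the vocab axis so the graph emits token ids directly
# instead of the full logits tensor.
argmax = helper.make_node(
    "ArgMax", inputs=["logits"], outputs=["token_id"], axis=-1, keepdims=0
)
graph.node.append(argmax)

# Replace the graph output: [batch, seq, vocab] logits -> [batch, seq] token ids.
token_id_out = helper.make_tensor_value_info(
    "token_id", TensorProto.INT64, ["batch", "seq"]
)
del graph.output[:]
graph.output.append(token_id_out)

onnx.save(model, "qwen3_decoder_token_id.onnx")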

C++ bindings

#include <tokenizers_cpp.h>

#include <fstream>
#include <string>
#include <vector>

using tokenizers::Tokenizer;

// Helper that reads an entire file into an in-memory string blob.
std::string LoadBytesFromFile(const std::string& path) {
  std::ifstream fs(path, std::ios::in | std::ios::binary);
  return std::string((std::istreambuf_iterator<char>(fs)),
                     std::istreambuf_iterator<char>());
}

// Expects a HuggingFace tokenizer blob at dist/tokenizer.json.
void HuggingFaceTokenizerExample() {
  // Read the blob from file. All current factory APIs take an in-memory blob
  // as input, which gives flexibility in how these blobs are read.
  auto blob = LoadBytesFromFile("dist/tokenizer.json");
  auto tok = Tokenizer::FromBlobJSON(blob);
  std::string prompt = "What is the capital of Canada?";
  // Encode turns the prompt into token ids.
  std::vector<int> ids = tok->Encode(prompt);
  // Decode turns the ids back into a string.
  std::string decoded_prompt = tok->Decode(ids);
}

// Expects a SentencePiece model blob at dist/tokenizer.model.
void SentencePieceTokenizerExample() {
  // Read the blob from file.
  auto blob = LoadBytesFromFile("dist/tokenizer.model");
  auto tok = Tokenizer::FromBlobSentencePiece(blob);
  std::string prompt = "What is the capital of Canada?";
  // Encode turns the prompt into token ids.
  std::vector<int> ids = tok->Encode(prompt);
  // Decode turns the ids back into a string.
  std::string decoded_prompt = tok->Decode(ids);
}

Delitepy Bindings

from delitepy import tokenizers

# Build a tokenizer from the contents of a tokenizer.json blob.
tokenizer = tokenizers.from_json(<tokenizer.json>)

# Encode text to token ids, wrap them as an int64 tensor, and decode back.
token_ids = tokenizer.encode(text)
input_ids = nm.tensor([token_ids], "int64")
response = tokenizer.decode(input_ids)

FP16 Support

Binary operations now support the FP16 data type, handled internally as uint16_t:

# FP16 tensors now supported in all binary operations
fp16_tensor = nm.tensor(data, "float16")  # Uses uint16_t internally
result = fp16_tensor + other_tensor  # Works with add, sub, mult, div, pow, mod
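
As a quick sketch of the operator coverage (assuming the nm.tensor constructor shown above and DelitePy's operator overloading), all six binary operators accept FP16 operands:

fp16_a = nm.tensor([1.5, 2.0], "float16")  # stored as uint16_t internally
fp16_b = nm.tensor([0.5, 4.0], "float16")

total = fp16_a + fp16_b   # add
diff  = fp16_a - fp16_b   # sub
prod  = fp16_a * fp16_b   # mult
quot  = fp16_a / fp16_b   # div
power = fp16_a ** fp16_b  # pow
rem   = fp16_a % fp16_b   # mod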

Kotlin Interface

Generated tokens are streamed back from Python through a foreign-function callback, which can be subscribed to via Kotlin flows.

    // Wraps a Kotlin lambda as a DelitePy foreign function so the Python script
    // can stream generated tokens back through the "token_stream" input.
    private fun createNimbleNetTensorFromForeignFunction(fn: (String?) -> Unit): NimbleNetTensor {
        val callbackDelitePy: DelitePyForeignFunction = fun(input: NimbleNetTensorMap?): NimbleNetTensorMap? {
            val outputStream = input?.get("token_stream")?.data as String?
            fn(outputStream)
            return hashMapOf("result" to NimbleNetTensor(data = true, datatype = DATATYPE.BOOL, shape = intArrayOf()))
        }
        return NimbleNetTensor(data = callbackDelitePy, datatype = DATATYPE.FUNCTION, shape = intArrayOf())
    }

    // Runs the tool-calling prompt method, streaming intermediate tokens through
    // the callback and returning the final response.
    suspend fun feedInput(input: String, isVoiceInitiated: Boolean, callback: (String?) -> Unit): String? {
        val res = NimbleNet.runMethod(
            "prompt_for_tool_calling",
            inputs = hashMapOf(
                "prompt" to NimbleNetTensor(input, DATATYPE.STRING, null),
                "output_stream_callback" to createNimbleNetTensorFromForeignFunction(callback)
            ),
        )
        assert(res.status) { "NimbleNet.runMethod('prompt_for_tool_calling') failed with status: ${res.status}" }
        return res.payload?.get("results")?.data as String?
    }

Qwen Demo Setup

The Qwen demo uses zip-based modules in DelitePy:

cd nimblenet_py/simulation_assets/qwen_demo
zip -j qwen_modules.zip qwen_modules/*.py
python run_demo.py

Tool Calling Features

  • Multi-step conversation support with automatic tool execution
  • JSON-based tool calling with <tool_call> XML tags
  • Built-in tools: weather, math calculator, time, location
  • Error handling and recovery for failed tool calls
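
For reference, a tool call emitted by the model is a JSON object wrapped in <tool_call> tags, roughly like the following (the tool name and arguments here are illustrative, not the exact schema of the built-in tools):

<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto"}}
</tool_call>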

Checklist:

  • I have added tests that prove my fix is effective or that my feature works
  • Has user-facing changes. This may include API or behavior changes, performance improvements, etc.

Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>
@vkkhare vkkhare self-assigned this Jul 23, 2025
add tokenizer-cpp

add jinja template for qwen and dict support for tokenizer:from_json

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>
Signed-off-by: Varun Khare <varunkhare1234@gmail.com>
f"{library_stubs_dir}/src_gen",
coreruntime_dir,
],
["cp", "-r", f"{library_stubs_dir}/src_template", f"{library_stubs_dir}/src_gen"],

Contributor:

  1. Is this accidental change?
  2. cp -R is the portable form, compared to cp -r.

Contributor Author:

I agree

Comment on lines 16 to 28
  * Compares two data types and returns the one with higher precedence
  * for automatic type promotion in operations. The precedence order is:
- * BOOLEAN (0) < INT32 (3) < INT64 (4) < FLOAT (5) < DOUBLE (6)
+ * BOOLEAN (0) < INT32 (3) < INT64 (4) < FLOAT16 (4.5) < FLOAT (5) < DOUBLE (6)
  *
  * @param dataType1 First data type to compare
  * @param dataType2 Second data type to compare
  * @return The data type with higher precedence
  */
  inline int get_max_dataType(int dataType1, int dataType2) {
    std::map<int, int> _typeScore = {
-       {DATATYPE::BOOLEAN, 0}, {DATATYPE::INT32, 3}, {DATATYPE::INT64, 4},
-       {DATATYPE::FLOAT, 5},   {DATATYPE::DOUBLE, 6},
+       {DATATYPE::BOOLEAN, 0}, {DATATYPE::INT32, 3},  {DATATYPE::INT64, 4},
+       {DATATYPE::FLOAT16, 45}, {DATATYPE::FLOAT, 5}, {DATATYPE::DOUBLE, 6},
    };

Contributor:

This doesn't look correct.

Contributor Author:

Yup, will update this.

vkkhare and others added 5 commits July 30, 2025 00:06
# This is the 1st commit message:

add support for dictionary indexing in onnx executor

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message NimbleEdge#2:

add dictionary input support to model.run() for kv_cache

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message NimbleEdge#3:

add fp16 support in delitepy

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message NimbleEdge#4:

Qwen with tool calling functional in delitePy

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

# This is the commit message NimbleEdge#5:

Implemented enumerate and next in DelitePy (NimbleEdge#162)

* Implemented enumerate and next in DelitePy
Signed-off-by: Atul Jain <atul.jain@nimbleedgehq.ai>

* Cosmetics

Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>

---------

Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
Co-authored-by: Atul Jain <atul.jain@nimbleedgehq.ai>
Co-authored-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
Signed-off-by: Puneet Jindal <puneet.jindal@nimbleedgehq.ai>
Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>

modular qwen demo structure

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

wip handle attention cache

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>

resume from last position for multi-step run

Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>
Signed-off-by: Varun Khare <varun.khare@nimbledgehq.ai>
@vkkhare vkkhare marked this pull request as ready for review July 29, 2025 20:29
@vkkhare vkkhare requested review from a team and nrjpoddar as code owners July 29, 2025 20:29
@vkkhare vkkhare changed the title Adding HF_Tokenizers support to delitepy Qwen 3 1.7B Offline tool Calling Android Jul 29, 2025
@vkkhare vkkhare changed the title Qwen 3 1.7B Offline tool Calling Android Qwen 3 1.7B Offline tool calling Android Jul 29, 2025
vkkhare added 2 commits July 30, 2025 02:46
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>
Signed-off-by: Varun Khare <varun.khare@nimbleedgehq.ai>