I can't run the benchmark code in `mk` mode when `batch_size` is greater than 1. The model I'm using is Llama-3.2-1B-Instruct with a batch size of 2; all other `ScriptConfig` parameters are left at their default values.

Take the following command as an example:

```
python megakernels/scripts/generate.py mode=mk prompt="tell me a funny joke about cookies" ntok=100 batch_size=2
```

The traceback is as follows:
```
Traceback (most recent call last):
  File "/root/Megakernels/megakernels/scripts/generate.py", line 211, in <module>
    pydra.run(main)
  File "/venv/lib/python3.12/site-packages/pydra/cli.py", line 146, in run
    return _apply_overrides_and_call(fn, first_arg_type, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/pydra/cli.py", line 118, in _apply_overrides_and_call
    return fn(config)
           ^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/Megakernels/megakernels/scripts/generate.py", line 174, in main
    gen.generate(output_tokens, prompt_len, config.ntok - 1)
  File "/root/Megakernels/megakernels/generators.py", line 165, in generate
    output_ids = self.run(input_ids, pos_id=pos_id)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Megakernels/megakernels/generators.py", line 132, in run
    self.schedule.globs.hidden_states[:] = hiddens.squeeze(1)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
RuntimeError: expand(CUDABFloat16Type{[2, 2048]}, size=[2048]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
```
### Potential cause

I think the problem lies in the shape of `BaseGlobals.hidden_states`. It is initialized in the `make_global()` function of `Megakernels/megakernels/demos/latency/scheduler.py`:

```python
hidden_states=make_buffer(config.hidden_size)
```

So `hidden_states` has only one dimension, since `config.hidden_size` is a model-specific constant; call it `hidden_size`. But if our batch size is greater than 1, say `n`, then in the `run` function of `MK_Generator` the `input_ids` should have shape `(n, 1)`, and `hiddens` should have shape `(n, 1, hidden_size)`, which cannot be squeezed into `self.schedule.globs.hidden_states` (whose shape is `(hidden_size,)`).
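To illustrate the mismatch in isolation, here is a minimal PyTorch reproduction; the sizes `n=2` and `hidden_size=2048` match my run, and the variable names simply mirror the buffers above:

```python
import torch

# 1-D buffer, as allocated via make_buffer(config.hidden_size)
hidden_states = torch.empty(2048, dtype=torch.bfloat16)

# (n, 1, hidden_size) activations produced with batch_size=2
hiddens = torch.empty(2, 1, 2048, dtype=torch.bfloat16)

# squeeze(1) yields shape (2, 2048), which cannot broadcast into the (2048,)
# buffer, raising the same expand(...) RuntimeError as in the traceback above
hidden_states[:] = hiddens.squeeze(1)
```

A possible fix (just a guess; I haven't checked whether `make_buffer` accepts multiple dimensions) would be to allocate the buffer with a leading batch dimension, e.g. something like `make_buffer(config.batch_size, config.hidden_size)`, so that the assignment copies matching `(n, hidden_size)` shapes.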
### Environment

- GPU: H800
- OS: Linux x86_64
- CUDA: 12.8
- Python: 3.12
Labels: bugs, help needed