Support intern-s1 #14875
Conversation
@CISC hi, could you tell me how to fix this error? It doesn't seem reasonable to me.

llama.cpp/convert_hf_to_gguf.py, lines 3002 to 3005 at 5eba3e3:
```python
try:
    self._set_vocab_sentencepiece()
except FileNotFoundError:
    self._set_vocab_gpt2()
```
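For context, `_set_vocab_sentencepiece` expects a serialized `tokenizer.model` file in the model directory; when only a BPE-style `tokenizer.json` is present it raises `FileNotFoundError`, which is what the fallback to `_set_vocab_gpt2` handles. A minimal sketch of that same selection logic (the function name and return values here are illustrative, not the converter's actual API):

```python
from pathlib import Path

def pick_vocab_loader(model_dir: str) -> str:
    """Illustrative stand-in for the converter's fallback logic:
    prefer a SentencePiece model file, fall back to a gpt2-style BPE vocab."""
    d = Path(model_dir)
    if (d / "tokenizer.model").is_file():   # SentencePiece serialized model
        return "sentencepiece"
    if (d / "tokenizer.json").is_file():    # HF fast-tokenizer (BPE) file
        return "gpt2"
    raise FileNotFoundError(f"no tokenizer files found in {model_dir}")
```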
Suggested change: replace the try/except with a call to the parent implementation, which already performs this fallback:

```python
super().set_vocab()
```
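The reviewer's point is to reuse the base class's existing fallback instead of duplicating it in the subclass. A toy sketch of that delegation pattern (the class names and return value are illustrative, not the converter's real classes):

```python
class BaseModel:
    def set_vocab(self):
        # The parent already implements the sentencepiece -> gpt2 fallback.
        return "fallback vocab"

class InternS1Model(BaseModel):
    def set_vocab(self):
        # Rather than re-implementing the try/except, delegate to the parent...
        vocab = super().set_vocab()
        # ...and apply any model-specific special-token overrides afterwards.
        return vocab
```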
```python
special_tokens_map_file = self.dir_model / 'special_tokens_map.json'
additional_special_tokens = []
if special_tokens_map_file.is_file():
    with open(special_tokens_map_file, encoding='utf-8') as f:
        additional_special_tokens = json.load(f).get('additional_special_tokens', [])

# added_tokens_decoder lives in tokenizer_config.json, not special_tokens_map.json
tokenizer_cfg_file = self.dir_model / 'tokenizer_config.json'
if tokenizer_cfg_file.is_file():
    with open(tokenizer_cfg_file, encoding='utf-8') as f:
        added_tokens_decoder = json.load(f).get('added_tokens_decoder', {})
    token2ids_map = {data['content']: int(token) for token, data in added_tokens_decoder.items() if data['special']}
    for token in additional_special_tokens:
        if token in token2ids_map:
            special_vocab._set_special_token(token, token2ids_map[token])
    special_vocab._set_special_token('eos', 151645)
```
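The hunk above resolves each name listed in `additional_special_tokens` (from `special_tokens_map.json`) to its integer id by inverting `added_tokens_decoder` (from `tokenizer_config.json`, which is keyed by id with the token text under `content`). A self-contained sketch of that inversion, using made-up config fragments and token ids:

```python
# Made-up fragments mirroring the two Hugging Face tokenizer config files.
special_tokens_map = {"additional_special_tokens": ["<think>", "</think>"]}
tokenizer_config = {
    "added_tokens_decoder": {
        "151667": {"content": "<think>", "special": True},
        "151668": {"content": "</think>", "special": True},
        "151669": {"content": "<draft>", "special": False},  # non-special: filtered out
    }
}

# Invert added_tokens_decoder: token text -> integer id, special tokens only.
added = tokenizer_config["added_tokens_decoder"]
token2ids_map = {d["content"]: int(i) for i, d in added.items() if d["special"]}

# Only tokens that are both "additional" and special survive the lookup.
resolved = {t: token2ids_map[t]
            for t in special_tokens_map["additional_special_tokens"]
            if t in token2ids_map}
```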
Can merge after changes and @ngxson approves.
Hmmm, after changes... :)
Whoops, sorry, I missed that part. I guess @RunningLeon you need to open a new PR then.
* support internvl
* support interns1
* resolve comments
* put interns1 in tensor mapping
* resolve comment
* move tokenizer changes to sub class
Support internlm/Intern-S1