GraniteMoeShared
This model was released on 2024-08-23 and added to Hugging Face Transformers on 2025-02-14.
Overview
The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.
In addition, the GraniteMoeSharedModel class adds shared experts on top of the MoE layers.
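Shared experts are expert MLPs applied to every token in addition to the routed ones. As a minimal sketch of how this surfaces in the configuration, the snippet below builds a tiny, randomly initialized model; the field names used here (shared_intermediate_size for the shared expert, num_local_experts and num_experts_per_tok for routing) are assumptions for illustration only, so check them against the GraniteMoeSharedConfig reference further down.

```python
from transformers import GraniteMoeSharedConfig, GraniteMoeSharedModel

# Minimal sketch: a tiny, randomly initialized model for inspecting the
# architecture only. The MoE-related field names below are assumptions;
# see the GraniteMoeSharedConfig reference in this document for the
# authoritative parameter list.
config = GraniteMoeSharedConfig(
    vocab_size=1024,
    hidden_size=256,
    intermediate_size=512,         # size of each routed expert MLP
    shared_intermediate_size=512,  # size of the shared expert seen by every token
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    num_local_experts=4,
    num_experts_per_tok=2,
)

model = GraniteMoeSharedModel(config)
print(model)  # inspect the per-layer module structure
```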
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-research/moe-7b-1b-active-shared-experts"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```

This HF implementation was contributed by Mayank Mishra, Shawn Tan and Sukriti Sharma.
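For quick experimentation, the generic text-generation pipeline is a lighter-weight alternative to the manual tokenize/generate/decode steps above. This is an untested sketch that assumes the checkpoint loads through the standard pipeline API.

```python
from transformers import pipeline

# minimal sketch: pipeline wraps tokenization, generation and decoding;
# drop device_map="auto" to run on CPU
generator = pipeline(
    "text-generation",
    model="ibm-research/moe-7b-1b-active-shared-experts",
    device_map="auto",
)

result = generator(
    "Write a code to find the maximum value in a list of numbers.",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```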
GraniteMoeSharedConfig
[[autodoc]] GraniteMoeSharedConfig
GraniteMoeSharedModel
[[autodoc]] GraniteMoeSharedModel
    - forward
GraniteMoeSharedForCausalLM
[[autodoc]] GraniteMoeSharedForCausalLM
    - forward