
Conversation

ralphbean

SUMMARY:

Previously, llm-compressor ignored HF_HUB_CACHE and other environment variables when loading models and datasets, making offline mode difficult to use with unified cache directories.

This change:

  • Removes hard-coded TRANSFORMERS_CACHE in model_load/helpers.py to respect HF_HOME, HF_HUB_CACHE environment variables
  • Propagates cache_dir from model_args to dataset_args to enable unified cache directory for both models and datasets
  • Updates dataset loading to use cache_dir parameter instead of hardcoded None

Now users can specify the cache_dir parameter or use the HF_HOME/HF_HUB_CACHE environment variables for true offline operation.
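
For illustration, here is a minimal sketch of the helpers.py change (the function name is assumed, not copied from the repo): dropping the hard-coded cache_dir lets huggingface_hub resolve the cache from HF_HUB_CACHE / HF_HOME instead of a pinned TRANSFORMERS_CACHE path.

# Hedged sketch only; not the exact upstream code. Passing cache_dir=None
# defers cache resolution to the standard Hugging Face environment variables.
from huggingface_hub import hf_hub_download

def resolve_config(cache_path: str) -> str:
    # Before: cache_dir=TRANSFORMERS_CACHE pinned lookups to one location.
    # After: cache_dir=None respects HF_HUB_CACHE / HF_HOME (and offline mode).
    return hf_hub_download(
        repo_id=cache_path,
        filename="config.json",
        cache_dir=None,
    )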

Offline mode is super helpful for supply-chain security use cases. It helps us generate trustworthy SBOMs for AI stuff. 🔐 🧠

TEST PLAN:

I started with the oneshot example from the README and saved it as example.py:

""" This is the example from the README """

from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor import oneshot

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
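
(Aside, not part of the test plan: per the summary above, you can alternatively pass an explicit cache_dir. The sketch below assumes oneshot forwards it to the model and dataset arguments; the path is illustrative.)

# Hedged variant of the call above; cache_dir is assumed to be forwarded to
# both the model and dataset loaders.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
    cache_dir="./hf-cache",  # hypothetical explicit cache location
)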

Next, remove your local Hugging Face cache to ensure your system has nothing available to it yet:

❯ rm -rf ~/.cache/huggingface

Then, run example.py with the HF_HUB_OFFLINE=1 env var. This should fail, proving that you have nothing cached.

❯ HF_HUB_OFFLINE=1 python example.py
Traceback (most recent call last):
  File "/home/rbean/code/llm-compressor/testtest/lib64/python3.13/site-packages/transformers/utils/hub.py", line 479, in cached_files
...
<snip>
...
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Good. Now, run it with HF_HOME=./hf-hub, which runs in online mode and populates the cache in a new, non-standard location (just to be sure things don't get mixed up during our test):

❯ HF_HOME=./hf-hub python example.py
<lots of downloading happens, but you can ctrl-C when it gets into the real compression work>

Now, finally, you can run with both HF_HOME and HF_HUB_OFFLINE=1 and prove to yourself that llm-compressor uses that freshly-populated cache for both the model and the dataset.

❯ HF_HOME=./hf-hub HF_HUB_OFFLINE=1 python example.py
<it works!>
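
As an optional sanity check (not part of the original test plan), you can list what the online run put into the non-standard cache; with HF_HOME=./hf-hub, hub downloads land under hf-hub/hub and datasets under hf-hub/datasets:

# Illustrative check only: confirm the model (hub/) and dataset (datasets/)
# caches exist before relying on them in offline mode.
import os

for sub in ("hub", "datasets"):
    path = os.path.join("hf-hub", sub)
    entries = sorted(os.listdir(path)) if os.path.isdir(path) else ["<missing>"]
    print(path, entries)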


github-actions bot commented Oct 7, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist bot commented

Summary of Changes

Hello @ralphbean, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the llm-compressor library's support for offline operations by correctly integrating with Hugging Face's caching mechanisms. It ensures that both models and datasets can utilize a single, user-specified or environment-variable-defined cache directory, which is vital for secure supply-chain practices and generating accurate Software Bill of Materials (SBOMs) for AI components.

Highlights

  • Environment Variable Handling: The pull request fixes an issue where llm-compressor previously ignored standard Hugging Face environment variables like HF_HUB_CACHE and HF_HOME, which made true offline mode difficult to achieve.
  • Unified Cache Directory: The changes enable a unified cache directory for both models and datasets by propagating the cache_dir parameter from model arguments to dataset arguments.
  • Removed Hard-coded Cache Path: A hard-coded TRANSFORMERS_CACHE path has been removed, allowing the system to respect environment variables for determining the cache location for models.
  • Improved Dataset Loading: Dataset loading has been updated to explicitly use the cache_dir parameter, ensuring datasets are also loaded from the correct, user-defined or environment-variable-specified cache.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


gemini-code-assist bot left a comment


Code Review

This pull request improves offline mode support by ensuring Hugging Face environment variables for caching are respected. The changes correctly propagate the cache_dir from model arguments to dataset arguments, and adjust hf_hub_download to use the default caching behavior. My main feedback is on how the cache_dir is added to dataset_args, suggesting a more explicit definition in the DatasetArguments dataclass for better code clarity and maintainability.
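
For illustration, the more explicit alternative suggested here might look roughly like the following (a sketch only; field and helper names are assumptions, not the actual llm-compressor definitions):

# Hedged sketch: declare cache_dir explicitly on DatasetArguments instead of
# attaching it dynamically when copying from the model arguments.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetArguments:
    cache_dir: Optional[str] = field(
        default=None,
        metadata={"help": "Directory in which to cache downloaded datasets."},
    )

def sync_cache_dir(model_args, dataset_args: DatasetArguments) -> None:
    # Reuse the model cache directory so models and datasets share one cache.
    if dataset_args.cache_dir is None:
        dataset_args.cache_dir = getattr(model_args, "cache_dir", None)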

@ralphbean
Author

For context, I got interested in fixing this after trying to make llm-compressor work in combination with hermeto -> hermetoproject/hermeto#1141


brian-dellabetta (Collaborator) left a comment


Thanks for the contribution! One question

  repo_id=cache_path,
  filename="config.json",
- cache_dir=TRANSFORMERS_CACHE,
+ cache_dir=None,

why isn't this model_args.cache_dir instead of None?


kylesayrs (Collaborator) left a comment


I think it would be better to remove model_args.cache_dir and dataset_args.cache_dir if their removal means that the user can use HF_HUB_CACHE for both.
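
For reference (illustrative only, not part of the PR): when no cache_dir argument is supplied at all, huggingface_hub already resolves its cache from the environment, so a single exported HF_HOME or HF_HUB_CACHE would cover both models and datasets.

# Illustrative only: the resolved hub cache location already honors
# HF_HUB_CACHE / HF_HOME when no explicit cache_dir is passed.
from huggingface_hub.constants import HF_HUB_CACHE

print(HF_HUB_CACHE)  # e.g. ./hf-hub/hub when HF_HOME=./hf-hub is exported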
