.github/workflows/linkcheck.yml (8 additions, 2 deletions)
@@ -13,10 +13,16 @@ jobs:
  markdown-link-check:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
-      - uses: umbrelladocs/action-linkspector@v1
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Run linkspector
+        uses: umbrelladocs/action-linkspector@v1
        with:
          github_token: ${{ secrets.github_token }}
          reporter: github-pr-review
          fail_on_error: true
          filter_mode: nofilter
+          config_file: '.github/workflows/linkspector/linkspector.yml'
+          show_stats: true
+          level: info
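
The new `config_file` input points the action at a dedicated linkspector configuration. That file is not part of this hunk; as a rough sketch only, using options from linkspector's documentation (`dirs`, `excludedDirs`, `aliveStatusCodes`, `useGitIgnore`), such a config could look like this — the values below are illustrative, not taken from this PR:

```yaml
# .github/workflows/linkspector/linkspector.yml -- illustrative sketch, not the file added in this PR
dirs:
  - .                 # check Markdown files across the whole repository
excludedDirs:
  - ./build           # hypothetical: skip generated output
aliveStatusCodes:     # HTTP responses treated as "link is alive"
  - 200
  - 206
  - 429               # rate-limited hosts should not fail the check
useGitIgnore: true    # skip files that git ignores
```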
README.md (1 addition, 1 deletion)
@@ -85,7 +85,7 @@ Applying quantization with `llmcompressor`:

### User Guides
Deep dives into advanced usage of `llmcompressor`:
-* [Quantizing with large models with the help of `accelerate`](examples/big_models_with_accelerate/README.md)
+* [Quantizing large models with sequential onloading](examples/big_models_with_sequential_onloading/README.md)


## Quick Tour
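The renamed user guide covers compressing models that are too large to fit on a single GPU by onloading one layer at a time during calibration. A rough sketch of that pattern with `llmcompressor`'s `oneshot` API is below; the model ID, dataset, and hyperparameters are illustrative rather than taken from the guide:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Illustrative model choice; the linked guide uses its own checkpoint.
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"

# Load onto CPU system memory; during calibration the sequential pipeline
# onloads one decoder layer at a time onto the GPU, so the full model never
# has to fit in device memory.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize all Linear layers to W4A16 with GPTQ, leaving the output head alone.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",        # small built-in calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("Llama-3.3-70B-Instruct-W4A16", save_compressed=True)
tokenizer.save_pretrained("Llama-3.3-70B-Instruct-W4A16")
```
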
docs/developer/contributing.md (1 addition, 1 deletion)
@@ -50,7 +50,7 @@ If not, please file a new issue, providing as much relevant information as possi

### Pull Requests & Code Reviews

-Please check the PR checklist in the [PR template](.github/PULL_REQUEST_TEMPLATE.md) for detailed guide for contribution.
+Please check the PR checklist in the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for detailed guide for contribution.

### Thank You

docs/index.md (4 additions, 4 deletions)
@@ -39,16 +39,16 @@ Review the [LLM Compressor v0.8.0 release notes](https://github.com/vllm-project
## Recent Updates

!!! info "QuIP and SpinQuant-style Transforms"
-    The newly added [`QuIPModifier`](examples/transform/quip_example.py) and [`SpinQuantModifier`](examples/transform/spinquant_example.py) allow you to quantize models after injecting hadamard weights into the computation graph, reducing quantization error and greatly improving accuracy recovery for low bit-weight and activation quantization.
+    The newly added [`QuIPModifier`](../examples/transform/quip_example.py) and [`SpinQuantModifier`](../examples/transform/spinquant_example.py) allow you to quantize models after injecting hadamard weights into the computation graph, reducing quantization error and greatly improving accuracy recovery for low bit-weight and activation quantization.

!!! info "DeepSeekV3-style Block Quantization Support"
-    Allows for more efficient compression of large language models without needing a calibration dataset. Quantize a Qwen3 model to [W8A8](examples/quantization_w8a8_fp8.md).
+    Allows for more efficient compression of large language models without needing a calibration dataset. Quantize a Qwen3 model to [W8A8](../examples/quantization_w8a8_fp8/fp8_block_example.py).

!!! info "FP4 Quantization - now with MoE and non-uniform support"
-    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the [NVFP4 configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See examples of [FP4 activation support](examples/quantization_w4a4_fp4/llama3_example.py), [MoE support](examples/quantization_w4a4_fp4/qwen_30b_a3b.py), and [Non-uniform quantization support](examples/quantization_non_uniform) where some layers are selectively quantized to FP8 for better recovery. You can also mix other quantization schemes, such as INT8 and INT4.
+    Quantize weights and activations to FP4 and seamlessly run the compressed model in vLLM. Model weights and activations are quantized following the [NVFP4 configuration](https://github.com/neuralmagic/compressed-tensors/blob/f5dbfc336b9c9c361b9fe7ae085d5cb0673e56eb/src/compressed_tensors/quantization/quant_scheme.py#L104). See examples of [FP4 activation support](../examples/quantization_w4a4_fp4/llama3_example.py), [MoE support](../examples/quantization_w4a4_fp4/qwen_30b_a3b.py), and [Non-uniform quantization support](../examples/quantization_non_uniform/README.md) where some layers are selectively quantized to FP8 for better recovery. You can also mix other quantization schemes, such as INT8 and INT4.

!!! info "Llama4 Quantization Support"
-    Quantize a Llama4 model to [W4A16](examples/quantization_w4a16.md) or [NVFP4](examples/quantization_w4a16.md). The checkpoint produced can seamlessly run in vLLM.
+    Quantize a Llama4 model to [W4A16](../examples/quantization_w4a16) or [NVFP4](../examples/quantization_w4a4_fp4/llama4_example.py). The checkpoint produced can seamlessly run in vLLM.

For more information, check out the [latest release on GitHub](https://github.com/vllm-project/llm-compressor/releases/latest).

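The block-quantization note above now points at `fp8_block_example.py`. As a minimal sketch of the data-free flow it describes, assuming the `FP8_BLOCK` preset scheme referenced by that example (check the linked file for the exact recipe and model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-8B"  # illustrative Qwen3 checkpoint

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# DeepSeekV3-style blockwise FP8: weights quantized per 128x128 block and
# activations quantized dynamically at runtime, so no calibration data is needed.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_BLOCK", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)

model.save_pretrained("Qwen3-8B-FP8-BLOCK", save_compressed=True)
tokenizer.save_pretrained("Qwen3-8B-FP8-BLOCK")
```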