
Implement pre-commit configurations, add security policy, and update project metadata #9


Merged
25 changes: 25 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,25 @@
default_language_version:
python: python3.10

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-toml
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace

- repo: https://github.com/pycqa/isort
rev: 6.0.1
hooks:
- id: isort

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.12
hooks:
- id: ruff
args:
- --fix
- id: ruff-format
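
A brief sketch of how this hook configuration is typically activated locally (assuming `pre-commit` is available in the dev environment, as declared in `pyproject.toml`):

```shell
# One-time setup: install the git hooks defined in .pre-commit-config.yaml
pre-commit install

# Optionally run every hook against the entire repository once
pre-commit run --all-files
```

After `pre-commit install`, the configured hooks run automatically on each `git commit`.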
18 changes: 9 additions & 9 deletions README.md
@@ -27,7 +27,7 @@ Version 0.1.2 of **LinearBoost Classifier** is released. Here are the changes:
- Both SEFR and LinearBoostClassifier classes are refactored to fully adhere to Scikit-learn's conventions and API. Now, they are standard Scikit-learn estimators that can be used in Scikit-learn pipelines, grid search, etc.
- Added unit tests (using pytest) to ensure the estimators adhere to Scikit-learn conventions.
- Added fit_intercept parameter to SEFR similar to other linear estimators in Scikit-learn (e.g., LogisticRegression, LinearRegression, etc.).
- Removed random_state parameter from LinearBoostClassifier as it doesn't affect the result, since SEFR doesn't expose a random_state argument. According to Scikit-learn documentation for this parameter in AdaBoostClassifier:
> it is only used when estimator exposes a random_state.
- Added docstring to both SEFR and LinearBoostClassifier classes.
- Used uv for project and package management.
Expand All @@ -45,20 +45,20 @@ The documentation is available at https://linearboost.readthedocs.io/.

The following parameters yielded optimal results during testing. All results are based on 10-fold Cross-Validation:

- **`n_estimators`**:
A range of 10 to 200 is suggested, with higher values potentially improving performance at the cost of longer training times.

- **`learning_rate`**:
Values between 0.01 and 1 typically perform well. Adjust based on the dataset's complexity and noise.

- **`algorithm`**:
Use either `SAMME` or `SAMME.R`. The choice depends on the specific problem:
- `SAMME`: May be better for datasets with clearer separations between classes.
- `SAMME.R`: Can handle more nuanced class probabilities.

**Note:** As of scikit-learn v1.6, the `algorithm` parameter is deprecated and will be removed in v1.8. LinearBoostClassifier will only implement the `SAMME` algorithm in newer versions.

- **`scaler`**:
The following scaling methods are recommended based on dataset characteristics:
- `minmax`: Best for datasets where features are on different scales but bounded.
- `robust`: Effective for datasets with outliers.
@@ -200,10 +200,10 @@ params = {
LinearBoost's combination of **runtime efficiency** and **high accuracy** makes it a powerful choice for real-world machine learning tasks, particularly in resource-constrained or real-time applications.
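
A minimal sketch of the 10-fold cross-validation setup behind the parameter guidance above. It is shown with scikit-learn's `AdaBoostClassifier` (which `LinearBoostClassifier` subclasses) so it runs without the package installed; with `linearboost` installed, `LinearBoostClassifier` can be substituted with the same `n_estimators` and `learning_rate` arguments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in dataset for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_estimators and learning_rate chosen from the suggested ranges above
clf = AdaBoostClassifier(n_estimators=50, learning_rate=0.5)

# All README results are based on 10-fold cross-validation
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")
```

Swapping in `LinearBoostClassifier` is a drop-in change because both follow the standard scikit-learn estimator API.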

### 📰 Featured in:
- [LightGBM Alternatives: A Comprehensive Comparison](https://nightwatcherai.com/blog/lightgbm-alternatives)
_by Jordan Cole, March 11, 2025_
*Discusses how LinearBoost outperforms traditional boosting frameworks in terms of speed while maintaining accuracy.*


Future Developments
-----------------------------
Expand All @@ -224,7 +224,7 @@ This project is licensed under the terms of the MIT license. See [LICENSE](https

Some portions of this code are adapted from the scikit-learn project
(https://scikit-learn.org), which is licensed under the BSD 3-Clause License.
See the `licenses/` folder for details. The modifications and additions made to the original code are licensed under the MIT License © 2025 Hamidreza Keshavarz, Reza Rawassizadeh.
The original code from scikit-learn is available at [scikit-learn GitHub repository](https://github.com/scikit-learn/scikit-learn)

Special Thanks to:
Expand Down
19 changes: 19 additions & 0 deletions SECURITY.md
@@ -0,0 +1,19 @@
# Security Policy

## Reporting a Vulnerability

If you believe you have found a vulnerability, even if you are not sure about it, please report it right away by sending an email to: `hamid9 at outlook dot com`. Please be as explicit as possible, describing all the steps and including example code needed to reproduce the security issue.

## Vulnerability Disclosures

Critical vulnerabilities will be disclosed via GitHub's [security advisory](https://github.com/LinearBoost/linearboost-classifier/security) system.

## Public Discussions

Please refrain from publicly discussing a potential security vulnerability.

It's better to discuss privately and try to find a solution first, to limit the potential impact as much as possible.

---

Thanks for your help!
7 changes: 5 additions & 2 deletions pyproject.toml
Expand Up @@ -38,13 +38,16 @@ dependencies = [
[dependency-groups]
dev = [
"isort",
"pre-commit>=3.5.0",
"pytest>=7.0.0",
"ruff>=0.9.2",
]

[project.urls]
Homepage = "https://github.com/LinearBoost/linearboost-classifier"
Source = "https://github.com/LinearBoost/linearboost-classifier"
Documentation = "https://linearboost.readthedocs.io"
Repository = "https://github.com/LinearBoost/linearboost-classifier"
Issues = "https://github.com/LinearBoost/linearboost-classifier/issues"

[tool.hatch.version]
path = "src/linearboost/__init__.py"
Expand All @@ -66,4 +69,4 @@ line-length = 120
atomic = true
profile = "black"
skip_gitignore = true
known_first_party = ["black", "blib2to3", "blackd", "_black_version"]
known_first_party = ["linearboost"]
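
Since the README notes the project uses uv, the `dev` dependency group above would typically be installed with (a sketch, assuming a uv version with PEP 735 dependency-group support):

```shell
# Create or refresh the project environment; uv installs the "dev" group by default
uv sync
```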
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,2 +1,2 @@
scikit-learn>=1.2.2
typing-extensions>=4.1.0; python_version < "3.11"
10 changes: 5 additions & 5 deletions src/linearboost/linear_boost.py
Expand Up @@ -67,7 +67,7 @@ class LinearBoostClassifier(AdaBoostClassifier):
"""A LinearBoost classifier.

A LinearBoost classifier is a meta-estimator based on AdaBoost and SEFR.
It is a fast and accurate classification algorithm built to enhance the
performance of the linear classifier SEFR.

Parameters
Expand Down Expand Up @@ -107,7 +107,7 @@ class LinearBoostClassifier(AdaBoostClassifier):
class_weight : {"balanced", "balanced_subsample"}, dict or list of dicts, \
default=None
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
Expand All @@ -122,9 +122,9 @@ class LinearBoostClassifier(AdaBoostClassifier):

loss_function : callable, default=None
Custom loss function for optimization. Must follow the signature:

``loss_function(y_true, y_pred, sample_weight) -> float``

where:
- y_true: Ground truth (correct) target values.
- y_pred: Estimated target values.
Expand Down Expand Up @@ -160,7 +160,7 @@ class LinearBoostClassifier(AdaBoostClassifier):
estimator_errors_ : ndarray of floats
Classification error for each estimator in the boosted
ensemble.

n_features_in_ : int
Number of features seen during :term:`fit`.

Expand Down
Loading
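
The `loss_function` contract documented above (`loss_function(y_true, y_pred, sample_weight) -> float`) can be illustrated with a hypothetical callable; `weighted_error_rate` is an assumption for illustration, not part of the package.

```python
import numpy as np

def weighted_error_rate(y_true, y_pred, sample_weight):
    """Fraction of total sample weight assigned to misclassified samples.

    Matches the documented signature:
    loss_function(y_true, y_pred, sample_weight) -> float
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    w = np.asarray(sample_weight, dtype=float)
    # Weight each misclassification, then normalize by total weight
    return float(np.sum(w * (y_true != y_pred)) / np.sum(w))
```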