Description
Hi,
If one runs sklearn's k-medoids algorithm on a dataset with many duplicate datapoints without removing those points (I'll give an example below), it runs inefficiently: an n**2 NumPy multiplication is performed to evaluate the distances between all points for every point being considered as a medoid (Code reference).
A way around this would be to eliminate duplicates beforehand but retain a frequency vector of their occurrences, which could then be used to weight in_cluster_distances before summing (sketched below).
For example, if one had many customers with non-differentiated profiles, one might want those duplicates to influence the choice of the center and the spread of the cluster containing them, but there is no reason to cycle through all identical customers to recompute the distance matrix at every step.
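A minimal sketch of what I mean, assuming precomputed pairwise distances on the unique points; the function and variable names here are just illustrative, not existing sklearn API:

```python
import numpy as np
from scipy.spatial.distance import cdist

def weighted_medoid_update(D, labels, counts, n_clusters):
    """Pick, per cluster, the point minimizing the count-weighted sum of
    distances to the other members; each unique point stands in for
    `counts` identical datapoints."""
    medoids = np.empty(n_clusters, dtype=int)
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        # weight each column j of the in-cluster distance block by counts[j]
        in_cluster = D[np.ix_(members, members)] * counts[members]
        medoids[k] = members[np.argmin(in_cluster.sum(axis=1))]
    return medoids

# Deduplicate once and keep the occurrence counts; the algorithm then runs on
# the (much smaller) unique set, e.g. a 3x3 distance matrix instead of 55x55.
X_full = np.vstack([np.zeros((50, 2)), np.ones((3, 2)), np.full((2, 2), 5.0)])
X_unique, counts = np.unique(X_full, axis=0, return_counts=True)
D = cdist(X_unique, X_unique)
labels = np.array([0, 0, 1])  # toy cluster assignment of the 3 unique points
print(weighted_medoid_update(D, labels, counts, n_clusters=2))
```

This is just the medoid-update step; the same count-weighting would need to apply wherever per-point distance sums are taken.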
I could not find a way to do this in sklearn. If there is one, then please let me know and add it to the docs, and I can post it on StackExchange and other places where I've seen people ask about it. If there isn't a way, then please add it to the codebase so people don't have to code up a bug-prone wrapper.
On StackExchange, a responder pointed to an R package that had such an option: https://rdrr.io/cran/WeightedCluster/man/kmedoids.html (I'm not an R coder, so I can't verify)
(I would love to help, but as I started a new, intense job, it will probably be a while until I can contribute to open-source again.)