SLEDgeH is a Python library for evaluating clustering results using a semantic-based approach. Unlike traditional distance-based metrics, this method leverages the semantic relationship between significant frequent patterns identified among cluster items. This internal validation technique is particularly effective for data organized in categorical form.
Install directly from the GitHub repository with pip (requires git):
pip install git+https://github.com/aquinordg/sledgehammer.git
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans  # needed for the K-Means step below
from sledgehammer import sledgehammer_score, sledgehammer_score_clusters, semantic_descriptors
# Generate a random binary dataset
X = np.random.randint(0, 2, (100, 5))
# Specify the number of clusters
num_clusters = 3
# Perform K-Means clustering
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
labels = kmeans.fit_predict(X)
# Calculate the SLEDgeH score
average_score = sledgehammer_score(X, labels, aggregation='median')
print(f"Average SLEDgeH Score: {average_score}\n")
# Generate semantic descriptors
report = semantic_descriptors(X, labels, particular_threshold=0.5, report_form=True)
# Print cluster descriptors
for i in range(num_clusters):
    print(f"Cluster {i}:\n{report[i]}\n")
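The descriptors in the example above are based on feature support in clusters. As a rough illustration of that idea, the sketch below computes, for each cluster, the fraction of its samples that have each binary feature set to 1, and then lists the well-supported features per cluster. This is not the library's implementation: the helper names `cluster_feature_support` and `support_report` and the `min_support` cutoff are hypothetical, and `min_support` is a plain filter, not the library's `particular_threshold`.

```python
import numpy as np


def cluster_feature_support(X, labels):
    """Return (cluster_ids, support), where support[k, j] is the fraction of
    samples in cluster k that have binary feature j set to 1.

    Hypothetical helper for illustration; not part of sledgehammer.
    """
    X = np.asarray(X)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    # The mean of a 0/1 column over a cluster's rows is that feature's support.
    support = np.vstack([X[labels == k].mean(axis=0) for k in cluster_ids])
    return cluster_ids, support


def support_report(cluster_ids, support, min_support=0.5):
    """For each cluster, map feature index -> support for features whose
    support is at least min_support, ordered from highest to lowest.

    min_support is an assumed cutoff chosen for this sketch only.
    """
    report = {}
    for k, row in zip(cluster_ids, support):
        kept = [(j, float(s)) for j, s in enumerate(row) if s >= min_support]
        kept.sort(key=lambda item: item[1], reverse=True)
        report[int(k)] = dict(kept)
    return report


# Tiny hand-made dataset: 4 samples, 3 binary features, 2 clusters.
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
])
labels = np.array([0, 0, 1, 1])

cluster_ids, support = cluster_feature_support(X, labels)
report = support_report(cluster_ids, support)

print(support)
print(report)
```

In this toy data, feature 0 appears in every sample of cluster 0 and in none of cluster 1, so it describes cluster 0 well. How SLEDgeH turns such supports into its four indicators and its final score is defined by the library itself.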
sledgehammer_score
Computes the average SLEDgeH score for all clusters.
Parameters:
- X: Binary feature matrix of shape (n_samples, n_features).
- labels: Cluster labels for each sample.
- W: Weighting factors for the SLED indicators (default [0.3, 0.1, 0.5, 0.1]).
- particular_threshold: Threshold for descriptor particularization (None for no particularization).
- aggregation: Aggregation method ('harmonic', 'geometric', or 'median').
Returns:
- score: Average SLEDgeH score.

sledgehammer_score_clusters
Computes the SLEDgeH score for individual clusters.
Parameters: same as sledgehammer_score, with the addition of:
- aggregation=None: If None, returns scores for each SLED indicator separately.
Returns:
- scores: Aggregated SLEDgeH scores for each cluster.
- score_matrix: Individual SLED indicator scores if aggregation=None.

semantic_descriptors
Computes semantic descriptors based on feature support in clusters.
Parameters:
- X: Binary feature matrix of shape (n_samples, n_features).
- labels: Cluster labels for each sample.
- particular_threshold: Threshold for descriptor particularization.
- report_form: If True, returns descriptors as a sorted dictionary for each cluster.
Returns:
- descriptors: Matrix with particularized feature support in clusters.
- report: Sorted dictionary of significant features in each cluster (if report_form=True).

particularize_descriptors
Particularizes descriptors based on support thresholds.
Parameters:
- descriptors: Feature support matrix of shape (n_clusters, n_features).
- particular_threshold: Threshold for particularization (default 1.0).
Returns:
- descriptors: Matrix with particularized support values.

This project is licensed under the MIT License. See the LICENSE file for details.
We welcome contributions to SLEDgeH! To contribute:
For questions, feedback, or feature requests, open an issue or reach out at aquinordga@gmail.com. Your input is highly appreciated!