Hugging Face: Text Clustering

Description

This summary was drafted with mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf

The Text Clustering repository by Hugging Face is a minimal codebase for clustering texts. Its pipeline builds on existing, standard tools: Sentence Transformers, UMAP, Faiss, Plotly, Matplotlib, and Scikit-learn. The pipeline is made up of several distinct blocks, each of which can be customized, and it runs in a few minutes on a consumer laptop. The repository also includes an example of clustering and topic labeling applied to the AutoMathText dataset, using Cosmopedia's web labeling approach.
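To make the idea of "distinct blocks that can be customized" concrete, here is a minimal sketch of a block-based clustering pipeline. It is not the repository's code: the function names (`embed_bag_of_words`, `cluster_by_similarity`, `run_pipeline`) are hypothetical, and the blocks are dependency-free stand-ins so the sketch runs with the Python standard library alone. In a real setup one would presumably swap in Sentence Transformers for the embedding block, UMAP for the projection block, and a proper clustering algorithm for the last block.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]


def embed_bag_of_words(texts: Sequence[str]) -> List[Vector]:
    """Stand-in embedding block: L2-normalised word-count vectors.

    Assumption: a real pipeline would use a sentence-embedding model here.
    """
    tokenised = [re.findall(r"[a-z]+", text.lower()) for text in texts]
    vocab = sorted({word for tokens in tokenised for word in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vectors


def identity_projection(vectors: Sequence[Vector]) -> Sequence[Vector]:
    """Stand-in projection block: returns the vectors unchanged.

    Assumption: a real pipeline would reduce dimensionality at this step.
    """
    return vectors


def cluster_by_similarity(vectors: Sequence[Vector], threshold: float = 0.5) -> List[int]:
    """Stand-in clustering block: greedy grouping by cosine similarity.

    Each vector joins the first cluster whose seed (first member) is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    labels: List[int] = []
    seeds: List[Vector] = []
    for vec in vectors:
        for cluster_id, seed in enumerate(seeds):
            # Vectors are unit length, so the dot product is the cosine similarity.
            if sum(a * b for a, b in zip(vec, seed)) >= threshold:
                labels.append(cluster_id)
                break
        else:  # no existing cluster was similar enough
            seeds.append(list(vec))
            labels.append(len(seeds) - 1)
    return labels


def run_pipeline(
    texts: Sequence[str],
    embed: Callable = embed_bag_of_words,
    project: Callable = identity_projection,
    cluster: Callable = cluster_by_similarity,
) -> List[int]:
    """Chain the three blocks; any block can be replaced by passing another callable."""
    return cluster(project(embed(texts)))


texts = [
    "cats chase mice",
    "kittens chase mice",
    "stocks fell sharply",
    "bond stocks fell",
]
print(run_pipeline(texts))  # [0, 0, 1, 1]
```

Because each block is just a callable, replacing one does not touch the others. For example, `run_pipeline(texts, cluster=my_clusterer)` would keep the same embedding and projection while trying a different clustering step (`my_clusterer` being any function that maps a list of vectors to a list of integer labels).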


GitHub repo here
Link