UMAP¶
UMAP is a general purpose manifold learning and dimension reduction algorithm.
This is a GlobalStep
that reduces the dimensionality of the embeddings using. Visit
the TextClustering
step for an example of use. The trained model is saved as an artifact
when creating a distiset and pushing it to the Hugging Face Hub.
Attributes¶
- n_components: The dimension of the space to embed into. This defaults to 2 to provide easy visualization (that's probably what you want), but can reasonably be set to any integer value in the range 2 to 100. - metric: The metric to use to compute distances in high dimensional space. Visit UMAP's documentation for more information. Defaults to
euclidean
. - n_jobs: The number of parallel jobs to run. Defaults to8
. - random_state: The random state to use for the UMAP algorithm.
Runtime Parameters¶
-
n_components: The dimension of the space to embed into. This defaults to 2 to provide easy visualization (that's probably what you want), but can reasonably be set to any integer value in the range 2 to 100.
-
metric: The metric to use to compute distances in high dimensional space. Visit UMAP's documentation for more information. Defaults to
euclidean
. -
n_jobs: The number of parallel jobs to run. Defaults to
8
. -
random_state: The random state to use for the UMAP algorithm.
Input & Output Columns¶
Inputs¶
- embedding (
List[float]
): The original embeddings we want to reduce the dimension.
Outputs¶
- projection (
List[float]
): Embedding reduced to the number of components specified, the size of the new embeddings will be determined by then_components
.