Package com.redis.vl.utils.vectorize
Class SentenceTransformersVectorizer
java.lang.Object
com.redis.vl.utils.vectorize.BaseVectorizer
com.redis.vl.utils.vectorize.SentenceTransformersVectorizer
Vectorizer that uses Sentence Transformers models downloaded from HuggingFace. Models are
downloaded and cached locally, then run using ONNX Runtime. This provides the same functionality
as Python's sentence-transformers library.
-
Nested Class Summary
Nested classes/interfaces inherited from class com.redis.vl.utils.vectorize.BaseVectorizer
BaseVectorizer.BatchCacheResult
-
Field Summary
Fields inherited from class com.redis.vl.utils.vectorize.BaseVectorizer
cache, dimensions, dtype, modelName
-
Constructor Summary
Constructors
SentenceTransformersVectorizer(String modelName)
    Create a vectorizer with default cache directory.
SentenceTransformersVectorizer(String modelName, String cacheDir)
    Create a vectorizer with custom cache directory.
-
Method Summary
Modifier and Type, Method, Description
void close()
    Close the vectorizer and clean up resources.
List<List<Float>> embedBatchAsLists(List<String> texts)
    Generate embeddings for a batch of texts with default batch size.
List<float[]> embedSentences(List<String> sentences)
    Embed multiple sentences for clustering/selection.
protected float[] generateEmbedding(String text)
    Generate embedding for a single text (to be implemented by subclasses).
protected List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize)
    Generate embeddings for multiple texts in batch (to be implemented by subclasses).
Methods inherited from class com.redis.vl.utils.vectorize.BaseVectorizer
embed, embed, embedBatch, embedBatch, getCache, getDataType, getDimensions, getModelName, getType, processEmbedding, setCache
-
Constructor Details
-
SentenceTransformersVectorizer
Create a vectorizer with default cache directory.
Parameters:
modelName - Name of the HuggingFace model to use
-
SentenceTransformersVectorizer
Create a vectorizer with custom cache directory.
Parameters:
modelName - Name of the HuggingFace model to use
cacheDir - Custom cache directory for model storage
-
-
Method Details
-
generateEmbedding
Description copied from class: BaseVectorizer
Generate embedding for a single text (to be implemented by subclasses).
Specified by:
generateEmbedding in class BaseVectorizer
Parameters:
text - The text to embed
Returns:
The embedding vector
-
generateEmbeddingsBatch
Description copied from class: BaseVectorizer
Generate embeddings for multiple texts in batch (to be implemented by subclasses).
Specified by:
generateEmbeddingsBatch in class BaseVectorizer
Parameters:
texts - The texts to embed
batchSize - Number of texts to process per batch
Returns:
List of embedding vectors
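To illustrate what the batchSize parameter implies, here is a minimal sketch of splitting a list of texts into consecutive batches. The class name BatchPartitioner and its partition method are hypothetical helpers written for this illustration; they are not part of the library, and the actual batching logic inside generateEmbeddingsBatch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.redis.vl: shows one common way a
// batchSize argument is used to split input texts into chunks.
final class BatchPartitioner {

    // Split texts into consecutive chunks of at most batchSize elements.
    static List<List<String>> partition(List<String> texts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < texts.size(); start += batchSize) {
            int end = Math.min(start + batchSize, texts.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<>(texts.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("a", "b", "c", "d", "e");
        // Five texts with batchSize 2 give three batches; the last holds one text.
        System.out.println(partition(texts, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

Each batch would then be passed to the model in one call, which usually costs less than embedding the texts one at a time.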
-
embedBatchAsLists
Generate embeddings for a batch of texts with default batch size. Returns List of List of Float for convenience.
Parameters:
texts - List of texts to embed
Returns:
List of embeddings as lists of floats
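The relationship between the float[] form and the List of List of Float form can be sketched as a plain boxing conversion. This is an assumption about the shape of the result only: the EmbeddingLists class below is a hypothetical helper, and the source does not confirm that the library performs the conversion this way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.redis.vl: converts primitive
// float[] embeddings into boxed List<Float> embeddings.
final class EmbeddingLists {

    // Box a single embedding vector, preserving element order.
    static List<Float> toList(float[] vector) {
        List<Float> out = new ArrayList<>(vector.length);
        for (float v : vector) {
            out.add(v);
        }
        return out;
    }

    // Box every embedding in a batch, preserving batch order.
    static List<List<Float>> toLists(List<float[]> vectors) {
        List<List<Float>> out = new ArrayList<>(vectors.size());
        for (float[] vector : vectors) {
            out.add(toList(vector));
        }
        return out;
    }

    public static void main(String[] args) {
        List<float[]> batch = new ArrayList<>();
        batch.add(new float[] {0.25f, 0.5f});
        batch.add(new float[] {1.0f, -1.0f});
        System.out.println(toLists(batch));
    }
}
```

The boxed form is easier to pass to APIs that expect collections, at the cost of extra memory per element compared with float[].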
-
embedSentences
Embed multiple sentences for clustering/selection. Useful for extractive summarization where we need to compare sentence similarities.
Parameters:
sentences - List of sentences to embed
Returns:
List of embedding vectors (float arrays)
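As an illustration of comparing sentence similarities, the following sketch computes cosine similarity over a List of float arrays, the shape returned by embedSentences, and picks the sentence most similar to all the others. The SentenceSimilarity class and the mostCentral selection rule are assumptions made for this example; the library does not document a specific similarity measure or selection strategy here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.redis.vl: operates on embeddings
// in the List<float[]> shape returned by embedSentences.
final class SentenceSimilarity {

    // Cosine similarity of two vectors; returns 0.0 if either has zero norm.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the embedding with the highest summed similarity to all
    // other embeddings, or -1 for an empty list.
    static int mostCentral(List<float[]> embeddings) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < embeddings.size(); i++) {
            double score = 0.0;
            for (int j = 0; j < embeddings.size(); j++) {
                if (i != j) {
                    score += cosine(embeddings.get(i), embeddings.get(j));
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors standing in for real sentence embeddings.
        List<float[]> embeddings = new ArrayList<>();
        embeddings.add(new float[] {1f, 0f});
        embeddings.add(new float[] {0f, 1f});
        embeddings.add(new float[] {1f, 1f});
        System.out.println(mostCentral(embeddings)); // prints 2
    }
}
```

In an extractive summarizer, the index returned by mostCentral would point at the sentence to keep as the most representative one.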
-
close
public void close()
Close the vectorizer and clean up resources.
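Because close() releases resources, callers will usually want it to run even when embedding fails. The sketch below shows the try/finally pattern with a hypothetical stand-in class, FakeVectorizer, so that it runs without the library. Whether SentenceTransformersVectorizer implements AutoCloseable is not stated on this page, so try-with-resources is deliberately not assumed.

```java
// Hypothetical stand-in used only for illustration; it mimics an object
// with a close() method and is not the real SentenceTransformersVectorizer.
final class CloseSketch {

    static final class FakeVectorizer {
        boolean closed = false;

        // Toy "embedding": a one-element vector holding the text length.
        float[] embedOne(String text) {
            if (closed) {
                throw new IllegalStateException("vectorizer is closed");
            }
            return new float[] {text.length()};
        }

        void close() {
            closed = true;
        }
    }

    // Embed one text and always close the vectorizer afterwards,
    // even if embedOne throws.
    static float[] embedAndClose(FakeVectorizer vectorizer, String text) {
        try {
            return vectorizer.embedOne(text);
        } finally {
            vectorizer.close();
        }
    }

    public static void main(String[] args) {
        FakeVectorizer vectorizer = new FakeVectorizer();
        float[] embedding = embedAndClose(vectorizer, "abc");
        System.out.println(embedding[0] + " closed=" + vectorizer.closed);
    }
}
```

With the real class, the same try/finally shape would wrap the embedding calls, with close() invoked in the finally block.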
-