Class SentenceTransformersVectorizer

java.lang.Object
com.redis.vl.utils.vectorize.BaseVectorizer
com.redis.vl.utils.vectorize.SentenceTransformersVectorizer

public class SentenceTransformersVectorizer extends BaseVectorizer
Vectorizer that uses Sentence Transformers models from HuggingFace. Models are downloaded, cached locally, and then run with ONNX Runtime, providing the embedding functionality of Python's sentence-transformers library.
  • Constructor Details

    • SentenceTransformersVectorizer

      public SentenceTransformersVectorizer(String modelName)
      Create a vectorizer with the default cache directory.
      Parameters:
      modelName - Name of the HuggingFace model to use
    • SentenceTransformersVectorizer

      public SentenceTransformersVectorizer(String modelName, String cacheDir)
      Create a vectorizer with a custom cache directory.
      Parameters:
      modelName - Name of the HuggingFace model to use
      cacheDir - Custom cache directory for model storage
  • Method Details

    • generateEmbedding

      protected float[] generateEmbedding(String text)
      Description copied from class: BaseVectorizer
      Generate embedding for a single text (to be implemented by subclasses).
      Specified by:
      generateEmbedding in class BaseVectorizer
      Parameters:
      text - The text to embed
      Returns:
      The embedding vector
    • generateEmbeddingsBatch

      protected List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize)
      Description copied from class: BaseVectorizer
      Generate embeddings for multiple texts in batch (to be implemented by subclasses).
      Specified by:
      generateEmbeddingsBatch in class BaseVectorizer
      Parameters:
      texts - The texts to embed
      batchSize - Number of texts to process per batch
      Returns:
      List of embedding vectors
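      The role of batchSize can be illustrated with a minimal sketch of how a list of texts might be split into batches before inference. This is not the library's implementation; the class BatchSketch and the method partition are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.redis.vl: shows one way to split
// texts into consecutive batches of at most batchSize elements.
class BatchSketch {

    static List<List<String>> partition(List<String> texts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < texts.size(); start += batchSize) {
            // The last batch may be shorter than batchSize.
            int end = Math.min(start + batchSize, texts.size());
            batches.add(new ArrayList<>(texts.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("a", "b", "c", "d", "e");
        // prints [[a, b], [c, d], [e]]
        System.out.println(partition(texts, 2));
    }
}
```

      Under this scheme each batch would be embedded in one model call, and the per-batch results concatenated in input order.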
    • embedBatchAsLists

      public List<List<Float>> embedBatchAsLists(List<String> texts)
      Generate embeddings for a batch of texts using the default batch size. Each embedding is returned as a List<Float> for convenience.
      Parameters:
      texts - List of texts to embed
      Returns:
      List of embeddings as lists of floats
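      The relationship between the float[] form and the List<Float> form can be sketched as a simple boxing conversion. This is an assumption about what such a conversion might look like, not the library's code; FloatListSketch and toLists are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.redis.vl: boxes each float[]
// embedding into a List<Float>.
class FloatListSketch {

    static List<List<Float>> toLists(List<float[]> embeddings) {
        List<List<Float>> result = new ArrayList<>(embeddings.size());
        for (float[] vector : embeddings) {
            List<Float> boxed = new ArrayList<>(vector.length);
            for (float value : vector) {
                boxed.add(value); // autoboxing float -> Float
            }
            result.add(boxed);
        }
        return result;
    }

    public static void main(String[] args) {
        List<float[]> embeddings =
                List.of(new float[] {0.5f, 1.0f}, new float[] {2.0f, 3.0f});
        // prints [[0.5, 1.0], [2.0, 3.0]]
        System.out.println(toLists(embeddings));
    }
}
```

      Boxed lists are easier to pass to collection-based APIs, at the cost of extra memory per element compared with primitive arrays.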
    • embedSentences

      public List<float[]> embedSentences(List<String> sentences)
      Embed multiple sentences for clustering or selection. Useful for extractive summarization, where sentences are compared by the similarity of their embeddings.
      Parameters:
      sentences - List of sentences to embed
      Returns:
      List of embedding vectors (float arrays)
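      Comparing sentences typically means computing a similarity score between their embedding vectors. A minimal sketch using cosine similarity follows; the choice of cosine similarity is an assumption (the source does not name a metric), and CosineSketch and cosineSimilarity are hypothetical names.

```java
// Hypothetical helper, not part of com.redis.vl: cosine similarity
// between two embedding vectors of equal length.
class CosineSketch {

    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] x = {1f, 0f};
        float[] y = {1f, 0f};
        float[] z = {0f, 1f};
        System.out.println(cosineSimilarity(x, y)); // prints 1.0
        System.out.println(cosineSimilarity(x, z)); // prints 0.0
    }
}
```

      In an extractive summarizer, the float[] values returned by embedSentences could be passed pairwise to a function like this to rank or cluster sentences.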
    • close

      public void close()
      Close the vectorizer and release its resources.
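      One way to make sure close() always runs is a try/finally block. The sketch below uses a hypothetical stand-in class (StandInVectorizer) rather than the real vectorizer so that it runs without the library; whether SentenceTransformersVectorizer also supports try-with-resources is not stated on this page.

```java
// Hypothetical stand-in, not part of com.redis.vl: demonstrates the
// try/finally pattern for an object that exposes close().
class CloseSketch {

    static class StandInVectorizer {
        private boolean closed = false;

        float[] embed(String text) {
            if (closed) {
                throw new IllegalStateException("vectorizer is closed");
            }
            // Placeholder "embedding": a single value derived from the text length.
            return new float[] {text.length()};
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    static boolean useAndClose() {
        StandInVectorizer vectorizer = new StandInVectorizer();
        try {
            vectorizer.embed("hello");
        } finally {
            // Runs even if embed throws, so resources are always released.
            vectorizer.close();
        }
        return vectorizer.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(useAndClose()); // prints true
    }
}
```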