Class BaseVectorizer

java.lang.Object
com.redis.vl.utils.vectorize.BaseVectorizer
Direct Known Subclasses:
LangChain4JVectorizer, MockVectorizer, SentenceTransformersVectorizer

public abstract class BaseVectorizer extends Object
Abstract base class for text vectorizers. Java port of redis-vl-python/redisvl/utils/vectorize/base.py.
  • Field Details

    • modelName

      protected final String modelName
      The name of the embedding model.
    • dtype

      protected final String dtype
      The data type for embeddings (e.g., "float32").
    • dimensions

      protected int dimensions
      The dimension of the embedding vectors.
    • cache

      protected Optional<EmbeddingsCache> cache
      Optional cache for storing embeddings.
  • Constructor Details

    • BaseVectorizer

      protected BaseVectorizer(String modelName, int dimensions)
      Creates a new BaseVectorizer.
      Parameters:
      modelName - The name of the embedding model
      dimensions - The dimension of the embedding vectors
    • BaseVectorizer

      protected BaseVectorizer(String modelName, int dimensions, String dtype)
      Creates a new BaseVectorizer with specified data type.
      Parameters:
      modelName - The name of the embedding model
      dimensions - The dimension of the embedding vectors (-1 for auto-detect)
      dtype - The data type for embeddings (default: "float32")
  • Method Details

    • getCache

      public Optional<EmbeddingsCache> getCache()
      Get the embeddings cache if present.
      Returns:
      Optional containing the cache, or empty if none set
    • setCache

      public void setCache(EmbeddingsCache cache)
      Set an embeddings cache for this vectorizer.
      Parameters:
      cache - The embeddings cache to use
    • getDataType

      public String getDataType()
      Get the vector data type.
      Returns:
      The data type (e.g. "float32")
    • getModelName

      public String getModelName()
      Get the model name.
      Returns:
      The model name
    • getDimensions

      public int getDimensions()
      Get the embedding dimensions.
      Returns:
      The number of dimensions
    • embed

      public float[] embed(String text)
      Embed a single text string.
      Parameters:
      text - The text to embed
      Returns:
      The embedding vector
    • embed

      public float[] embed(String text, Function<String,String> preprocess, boolean asBuffer, boolean skipCache)
      Embed a single text string with full options.
      Parameters:
      text - The text to embed
      preprocess - Optional preprocessing function
      asBuffer - Return as byte buffer (not implemented in Java version)
      skipCache - Skip cache lookup and storage
      Returns:
      The embedding vector
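The options above suggest a flow of preprocess, cache lookup, model call, cache store. The sketch below illustrates that flow under stated assumptions and is not the library's implementation: `CachingEmbedderSketch` and `CountingEmbedder` are hypothetical stand-ins, a plain `HashMap` replaces `EmbeddingsCache`, the cache key is assumed to be the preprocessed text, and `asBuffer` is left out because it is documented as not implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in mirroring the documented embed options; not the library class. */
abstract class CachingEmbedderSketch {
    // Assumption: a plain map keyed by the preprocessed text stands in for EmbeddingsCache.
    private final Map<String, float[]> cache = new HashMap<>();

    /** Model call supplied by subclasses (mirrors generateEmbedding). */
    protected abstract float[] generateEmbedding(String text);

    public float[] embed(String text, Function<String, String> preprocess, boolean skipCache) {
        // A null preprocess function means the text is used unchanged.
        String input = (preprocess == null) ? text : preprocess.apply(text);

        if (!skipCache) {
            float[] cached = cache.get(input);
            if (cached != null) {
                return cached.clone(); // defensive copy so callers cannot mutate the cache
            }
        }

        float[] embedding = generateEmbedding(input);

        if (!skipCache) {
            cache.put(input, embedding.clone());
        }
        return embedding;
    }
}

/** Toy subclass that counts model calls so cache hits are observable. */
final class CountingEmbedder extends CachingEmbedderSketch {
    int calls = 0;

    @Override
    protected float[] generateEmbedding(String text) {
        calls++;
        return new float[] {text.length()}; // placeholder "embedding": the text length
    }
}
```

With this sketch, embedding `" hi "` with `String::trim` and then `"hi"` with no preprocessing triggers only one model call, while passing `skipCache = true` always reaches `generateEmbedding`.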
    • processEmbedding

      protected Object processEmbedding(float[] embedding, boolean asBuffer)
      Convert the embedding to a byte array if requested.
      Parameters:
      embedding - The embedding vector
      asBuffer - Whether to return as bytes
      Returns:
      The embedding as float array or byte array
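As an illustration of the float-to-bytes conversion described above, here is a minimal sketch that packs a `float[]` into raw float32 bytes and back. `EmbeddingBytes` is a hypothetical helper, not part of the library, and the little-endian byte order is an assumption; this page does not state which order `processEmbedding` uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of the library: float[] <-> float32 bytes. */
final class EmbeddingBytes {
    private EmbeddingBytes() {}

    /** Packs each float into 4 bytes. Little-endian order is an assumption. */
    static byte[] toBytes(float[] embedding) {
        ByteBuffer buffer = ByteBuffer.allocate(embedding.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : embedding) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    /** Inverse of toBytes; rejects input whose length is not a multiple of 4. */
    static float[] fromBytes(byte[] bytes) {
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] embedding = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < embedding.length; i++) {
            embedding[i] = buffer.getFloat();
        }
        return embedding;
    }
}
```

A vector with n dimensions produces 4 * n bytes, so a mismatch between the byte length and the declared dimensions is a quick sanity check when reading vectors back.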
    • embedBatch

      public List<float[]> embedBatch(List<String> texts)
      Embed multiple text strings in batch.
      Parameters:
      texts - The texts to embed
      Returns:
      List of embedding vectors
    • embedBatch

      public List<float[]> embedBatch(List<String> texts, Function<String,String> preprocess, int batchSize, boolean asBuffer, boolean skipCache)
      Embed multiple text strings with full options.
      Parameters:
      texts - List of texts to embed
      preprocess - Optional preprocessing function
      batchSize - Number of texts to process per batch
      asBuffer - Return as byte buffers (not implemented in Java version)
      skipCache - Skip cache lookup and storage
      Returns:
      List of embedding vectors
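To make the `batchSize` parameter concrete, the sketch below splits a list into consecutive chunks of at most `batchSize` items. `Batching.partition` is a hypothetical helper written for illustration; how the library itself divides the input is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the library: splits a list into fixed-size batches. */
final class Batching {
    private Batching() {}

    /** Returns consecutive chunks of at most batchSize items, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Five texts with a batch size of 2 yield three batches of sizes 2, 2 and 1; the final batch is simply shorter rather than padded.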
    • generateEmbedding

      protected abstract float[] generateEmbedding(String text)
      Generate embedding for a single text (to be implemented by subclasses).
      Parameters:
      text - The text to embed
      Returns:
      The embedding vector
    • generateEmbeddingsBatch

      protected abstract List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize)
      Generate embeddings for multiple texts in batch (to be implemented by subclasses).
      Parameters:
      texts - The texts to embed
      batchSize - Number of texts to process per batch
      Returns:
      List of embedding vectors
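A subclass supplies the two abstract methods and calls one of the protected constructors. The sketch below shows that shape with a hypothetical stand-in base class (`VectorizerSketch`) so it compiles without the library; the two-argument constructor delegating with "float32" is an assumption drawn from the documented default, and `CharBucketVectorizer` is a toy, not a real embedding model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in with the documented constructor and abstract-method shapes. */
abstract class VectorizerSketch {
    protected final String modelName;
    protected final String dtype;
    protected int dimensions;

    protected VectorizerSketch(String modelName, int dimensions) {
        this(modelName, dimensions, "float32"); // assumption: documented default dtype
    }

    protected VectorizerSketch(String modelName, int dimensions, String dtype) {
        this.modelName = modelName;
        this.dimensions = dimensions;
        this.dtype = dtype;
    }

    public int getDimensions() { return dimensions; }

    public String getDataType() { return dtype; }

    protected abstract float[] generateEmbedding(String text);

    protected abstract List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize);
}

/** Toy vectorizer: counts characters into buckets by code point modulo dimensions. */
final class CharBucketVectorizer extends VectorizerSketch {
    CharBucketVectorizer(int dimensions) {
        super("char-bucket-demo", dimensions);
    }

    @Override
    protected float[] generateEmbedding(String text) {
        float[] vector = new float[dimensions];
        for (int i = 0; i < text.length(); i++) {
            vector[text.charAt(i) % dimensions] += 1.0f;
        }
        return vector;
    }

    @Override
    protected List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize) {
        // batchSize is ignored in this toy; a real model client would group requests by it.
        List<float[]> vectors = new ArrayList<>(texts.size());
        for (String text : texts) {
            vectors.add(generateEmbedding(text));
        }
        return vectors;
    }
}
```

Against the real class, the same subclass would extend `BaseVectorizer` instead of the stand-in, and callers would go through the public `embed` and `embedBatch` methods rather than the protected hooks.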
    • getType

      public String getType()
      Get the vectorizer type identifier.
      Returns:
      The type of vectorizer