Package com.redis.vl.utils.vectorize
Class BaseVectorizer
java.lang.Object
com.redis.vl.utils.vectorize.BaseVectorizer
- Direct Known Subclasses:
LangChain4JVectorizer, MockVectorizer, SentenceTransformersVectorizer
Abstract base class for text vectorizers. Port of redis-vl-python/redisvl/utils/vectorize/base.py
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected static class | | Helper class to hold batch cache results. |
-
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected Optional<EmbeddingsCache> | cache | Optional cache for storing embeddings. |
| protected int | dimensions | The dimension of the embedding vectors. |
| protected final String | dtype | The data type for embeddings (e.g., "float32"). |
| protected final String | modelName | The name of the embedding model. |
-
Constructor Summary
Constructors
| Modifier | Constructor | Description |
| --- | --- | --- |
| protected | BaseVectorizer(String modelName, int dimensions) | Creates a new BaseVectorizer. |
| protected | BaseVectorizer(String modelName, int dimensions, String dtype) | Creates a new BaseVectorizer with specified data type. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| float[] | embed(String text) | Embed a single text string. |
| float[] | embed(String text, Function<String, String> preprocess, boolean asBuffer, boolean skipCache) | Embed a single text string with full options. |
| List<float[]> | embedBatch(List<String> texts) | Embed multiple text strings in batch. |
| List<float[]> | embedBatch(List<String> texts, Function<String, String> preprocess, int batchSize, boolean asBuffer, boolean skipCache) | Embed multiple text strings with full options. |
| protected abstract float[] | generateEmbedding(String text) | Generate embedding for a single text (to be implemented by subclasses). |
| protected abstract List<float[]> | generateEmbeddingsBatch(List<String> texts, int batchSize) | Generate embeddings for multiple texts in batch (to be implemented by subclasses). |
| Optional<EmbeddingsCache> | getCache() | Get the embeddings cache if present. |
| String | getDataType() | Get the vector data type. |
| int | getDimensions() | Get the embedding dimensions. |
| String | getModelName() | Get the model name. |
| | getType() | Get the vector type identifier. |
| protected Object | processEmbedding(float[] embedding, boolean asBuffer) | Convert embedding to byte buffer if requested. |
| void | setCache(EmbeddingsCache cache) | Set an embeddings cache for this vectorizer. |
-
Field Details
-
modelName
protected final String modelName
The name of the embedding model.
-
dtype
protected final String dtype
The data type for embeddings (e.g., "float32").
-
dimensions
protected int dimensions
The dimension of the embedding vectors.
-
cache
protected Optional<EmbeddingsCache> cache
Optional cache for storing embeddings.
-
-
Constructor Details
-
BaseVectorizer
protected BaseVectorizer(String modelName, int dimensions)
Creates a new BaseVectorizer.
- Parameters:
modelName - The name of the embedding model
dimensions - The dimension of the embedding vectors
-
BaseVectorizer
protected BaseVectorizer(String modelName, int dimensions, String dtype)
Creates a new BaseVectorizer with specified data type.
- Parameters:
modelName - The name of the embedding model
dimensions - The dimension of the embedding vectors (-1 for auto-detect)
dtype - The data type for embeddings (default: "float32")
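This page does not say how the -1 auto-detect value for dimensions is resolved. One plausible reading, sketched below purely as an assumption, is that the size is taken from the first embedding the model produces. The class DimensionSketch, the constant AUTO_DETECT, the method resolve, and the mismatch check are all hypothetical names and behavior introduced for illustration; none of them is confirmed by the library.

```java
// Hypothetical helper for illustration; not part of com.redis.vl.utils.vectorize.
final class DimensionSketch {

  // Sentinel documented on the constructor: -1 requests auto-detection.
  static final int AUTO_DETECT = -1;

  // Returns the effective dimension count given a declared value and a probe embedding.
  // Assumption: auto-detect means "use the length of an embedding the model returned".
  static int resolve(int declared, float[] probe) {
    if (declared == AUTO_DETECT) {
      return probe.length;
    }
    // Assumption: a declared size that disagrees with the model output is an error.
    if (declared != probe.length) {
      throw new IllegalStateException(
          "declared " + declared + " dimensions but model returned " + probe.length);
    }
    return declared;
  }
}
```

Under these assumptions, a subclass could call such a helper once after its first model call and store the result in the dimensions field, which is not final.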
-
-
Method Details
-
getCache
Get the embeddings cache if present.
- Returns:
- Optional containing the cache, or empty if none set
-
setCache
Set an embeddings cache for this vectorizer.
- Parameters:
cache - The embeddings cache to use
-
getDataType
Get the vector data type.
- Returns:
- The data type (e.g. "float32")
-
getModelName
Get the model name.
- Returns:
- The model name
-
getDimensions
public int getDimensions()
Get the embedding dimensions.
- Returns:
- The number of dimensions
-
embed
Embed a single text string.
- Parameters:
text - The text to embed
- Returns:
- The embedding vector
-
embed
public float[] embed(String text, Function<String, String> preprocess, boolean asBuffer, boolean skipCache)
Embed a single text string with full options.
- Parameters:
text - The text to embed
preprocess - Optional preprocessing function
asBuffer - Return as byte buffer (not implemented in Java version)
skipCache - Skip cache lookup and storage
- Returns:
- The embedding vector
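The interaction between preprocess, skipCache, and the abstract generateEmbedding hook can be sketched with a self-contained stand-in. This is an illustration under stated assumptions, not the library's code: the class EmbedFlowSketch is hypothetical, a plain Map replaces EmbeddingsCache (whose API is not shown on this page), the cache is assumed to be keyed by the preprocessed text, and asBuffer is omitted because it is documented as not implemented in the Java version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not the library's BaseVectorizer.
abstract class EmbedFlowSketch {

  // A plain map stands in for EmbeddingsCache (assumption).
  private final Map<String, float[]> cache = new HashMap<>();

  // Subclasses supply the actual model call.
  protected abstract float[] generateEmbedding(String text);

  public float[] embed(String text, Function<String, String> preprocess, boolean skipCache) {
    // Apply the optional preprocessing function first.
    String input = (preprocess != null) ? preprocess.apply(text) : text;

    // Cache lookup is bypassed entirely when skipCache is true.
    if (!skipCache) {
      float[] cached = cache.get(input);
      if (cached != null) {
        return cached;
      }
    }

    float[] embedding = generateEmbedding(input);

    // Cache storage is bypassed as well, matching "skip cache lookup and storage".
    if (!skipCache) {
      cache.put(input, embedding);
    }
    return embedding;
  }
}
```

In this sketch a second call with the same preprocessed text returns the stored vector without invoking generateEmbedding again, unless skipCache is true.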
-
processEmbedding
Convert embedding to byte buffer if requested.
- Parameters:
embedding - The embedding vector
asBuffer - Whether to return as bytes
- Returns:
- The embedding as float array or byte array
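A float-array-to-bytes conversion of the kind described here can be sketched with java.nio.ByteBuffer. The class BufferSketch and its methods are hypothetical, and the little-endian float32 layout is an assumption; this page does not state the byte order the library uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of com.redis.vl.utils.vectorize.
final class BufferSketch {

  // Packs each float as 4 bytes. Little-endian order is an assumption.
  static byte[] toBytes(float[] embedding) {
    ByteBuffer buffer = ByteBuffer.allocate(embedding.length * Float.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    for (float value : embedding) {
      buffer.putFloat(value);
    }
    return buffer.array();
  }

  // Mirrors the documented contract: the float array itself when asBuffer is false,
  // a byte array otherwise.
  static Object process(float[] embedding, boolean asBuffer) {
    return asBuffer ? toBytes(embedding) : embedding;
  }
}
```

Because the return type is Object, a caller has to check or cast the result to float[] or byte[] depending on the flag it passed.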
-
embedBatch
Embed multiple text strings in batch.
- Parameters:
texts - The texts to embed
- Returns:
- List of embedding vectors
-
embedBatch
public List<float[]> embedBatch(List<String> texts, Function<String, String> preprocess, int batchSize, boolean asBuffer, boolean skipCache)
Embed multiple text strings with full options.
- Parameters:
texts - List of texts to embed
preprocess - Optional preprocessing function
batchSize - Number of texts to process per batch
asBuffer - Return as byte buffers (not implemented in Java)
skipCache - Skip cache lookup and storage
- Returns:
- List of embedding vectors
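The batchSize parameter implies that the input list is processed in fixed-size chunks. A minimal sketch of that chunking step follows. The class BatchSketch and the method partition are hypothetical, and rejecting a non-positive batchSize is an assumption; how the library itself splits the list is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of com.redis.vl.utils.vectorize.
final class BatchSketch {

  // Splits items into consecutive batches of at most batchSize elements,
  // preserving the original order.
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }
}
```

With five texts and a batchSize of 2, this yields batches of sizes 2, 2, and 1, so the last batch may be smaller than the others.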
-
generateEmbedding
Generate embedding for a single text (to be implemented by subclasses).
- Parameters:
text - The text to embed
- Returns:
- The embedding vector
-
generateEmbeddingsBatch
Generate embeddings for multiple texts in batch (to be implemented by subclasses).
- Parameters:
texts - The texts to embed
batchSize - Number of texts to process per batch
- Returns:
- List of embedding vectors
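The two abstract hooks are what a concrete vectorizer has to supply. The sketch below shows their shape using a stand-in base class so that it compiles on its own. VectorizerHooks and ToyVectorizer are hypothetical names, and the character-counting "embedding" is a placeholder for a real model call, not anything the library does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in exposing only the two abstract hooks documented above.
abstract class VectorizerHooks {
  protected abstract float[] generateEmbedding(String text);

  protected abstract List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize);
}

// Toy implementation: counts letters, digits, and other characters
// instead of calling an embedding model.
final class ToyVectorizer extends VectorizerHooks {

  @Override
  protected float[] generateEmbedding(String text) {
    float letters = 0, digits = 0, other = 0;
    for (char c : text.toCharArray()) {
      if (Character.isLetter(c)) {
        letters++;
      } else if (Character.isDigit(c)) {
        digits++;
      } else {
        other++;
      }
    }
    return new float[] {letters, digits, other};
  }

  @Override
  protected List<float[]> generateEmbeddingsBatch(List<String> texts, int batchSize) {
    // batchSize is ignored here; a real model client would use it to size requests.
    List<float[]> out = new ArrayList<>(texts.size());
    for (String text : texts) {
      out.add(generateEmbedding(text));
    }
    return out;
  }
}
```

Delegating the batch hook to the single-text hook keeps the toy simple. A subclass backed by a remote model would more likely do the reverse, sending several texts per request.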
-
getType
Get the vector type identifier.
- Returns:
- The type of vectorizer
-