Class RealtimeTruncation

    public final class RealtimeTruncation

    When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. For example, a 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
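    The oldest-first behavior described above can be pictured with a small self-contained sketch. This is not SDK code and not the server's actual algorithm: the class name OldestFirstTruncation, the method truncate, and the per-message token counts are all hypothetical, and the only figure taken from the text is the 28,224-token input limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; the real truncation happens server-side.
final class OldestFirstTruncation {

    /**
     * Drops messages from the front (oldest first) until the summed token
     * count fits within inputTokenLimit. Returns the retained per-message
     * token counts in their original order.
     */
    static List<Integer> truncate(List<Integer> messageTokens, int inputTokenLimit) {
        Deque<Integer> retained = new ArrayDeque<>(messageTokens);
        long total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        while (total > inputTokenLimit && !retained.isEmpty()) {
            total -= retained.removeFirst(); // discard the oldest message
        }
        return List.copyOf(retained);
    }

    public static void main(String[] args) {
        // Four messages totalling 32,000 tokens against the 28,224 limit
        // quoted above: only the oldest message has to be dropped.
        System.out.println(truncate(List.of(12_000, 9_000, 8_000, 3_000), 28_224));
        // prints [9000, 8000, 3000]
    }
}
```

    Because messages are removed from the front, the beginning of the context changes after every truncation, which matters for caching.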

    Clients can configure truncation to trigger at a lower maximum token limit, which is an effective way to control token usage and cost.

    Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
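    To see why retaining only a fraction of the context reduces the number of truncations, consider the following rough simulation. It is a sketch under simplifying assumptions: every turn adds the same number of tokens, truncation drops the context to exactly limit × ratio (real truncation removes whole messages), and the names RetentionRatioSketch and countTruncations are hypothetical.

```java
// Hypothetical simulation; not part of the SDK or the server's algorithm.
final class RetentionRatioSketch {

    /**
     * Simulates a conversation that grows by tokensPerTurn each turn.
     * Whenever the total exceeds limit, the context is cut down to
     * limit * retentionRatio. Returns how many truncations occurred,
     * each of which would invalidate the cached prefix.
     */
    static int countTruncations(int limit, int tokensPerTurn, int turns, double retentionRatio) {
        int target = (int) Math.floor(limit * retentionRatio);
        int total = 0;
        int truncations = 0;
        for (int turn = 0; turn < turns; turn++) {
            total += tokensPerTurn;
            if (total > limit) {
                total = target; // simplification: land exactly on the target
                truncations++;
            }
        }
        return truncations;
    }

    public static void main(String[] args) {
        // Truncating only down to the limit: every later turn truncates again.
        System.out.println(countTruncations(10_000, 1_000, 40, 1.0)); // prints 30
        // Retaining half of the context: truncation happens every sixth turn.
        System.out.println(countTruncations(10_000, 1_000, 40, 0.5)); // prints 5
    }
}
```

    With a ratio of 1.0 the cache is busted on every turn once the limit is reached, whereas a ratio of 0.5 leaves several turns between truncations during which the retained prefix stays stable.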

    Truncation can be disabled entirely, which means the server will never truncate and will instead return an error if the conversation exceeds the model's input token limit.

    • Constructor Detail

    • Method Detail

      • retentionRatio

         final Optional<RealtimeTruncationRetentionRatio> retentionRatio()

        Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.

      • asRetentionRatio

         final RealtimeTruncationRetentionRatio asRetentionRatio()

        Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
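        The difference between the Optional-returning accessor and the as-prefixed accessor can be illustrated with a self-contained stand-in. The class below is hypothetical and is not the SDK's RealtimeTruncation: it models the strategy as a String and the ratio as a Double, and it assumes that the as-style accessor throws when a different variant is held. The exception type used here (IllegalStateException) is a placeholder, since this section does not state what the SDK throws.

```java
import java.util.Optional;

// Hypothetical two-variant union; a stand-in, not the SDK class.
final class TruncationUnionSketch {
    private final String strategy;       // null when the ratio variant is set
    private final Double retentionRatio; // null when the strategy variant is set

    private TruncationUnionSketch(String strategy, Double retentionRatio) {
        this.strategy = strategy;
        this.retentionRatio = retentionRatio;
    }

    static TruncationUnionSketch ofStrategy(String strategy) {
        return new TruncationUnionSketch(strategy, null);
    }

    static TruncationUnionSketch ofRetentionRatio(double ratio) {
        return new TruncationUnionSketch(null, ratio);
    }

    /** Safe accessor: empty when this instance holds the other variant. */
    Optional<Double> retentionRatio() {
        return Optional.ofNullable(retentionRatio);
    }

    /** Direct accessor: assumed to fail when the other variant is held. */
    double asRetentionRatio() {
        if (retentionRatio == null) {
            throw new IllegalStateException("retentionRatio variant is not set");
        }
        return retentionRatio;
    }

    public static void main(String[] args) {
        TruncationUnionSketch ratio = ofRetentionRatio(0.8);
        TruncationUnionSketch auto = ofStrategy("auto");
        System.out.println(ratio.asRetentionRatio());           // prints 0.8
        System.out.println(auto.retentionRatio().isPresent());  // prints false
    }
}
```

        Under that assumption, the Optional accessor suits code that does not know which variant it holds, while the as-prefixed accessor suits code that has already checked.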

      • accept

         final <T extends Any> T accept(RealtimeTruncation.Visitor<T> visitor)

        Maps this instance's current variant to a value of type T using the given visitor.

        Note that this method is not forwards compatible with new variants from the API, unless visitor overrides Visitor.unknown. To handle variants not known to this version of the SDK gracefully, consider overriding Visitor.unknown:

        import com.openai.core.JsonValue;
        import java.util.Optional;
        
        Optional<String> result = realtimeTruncation.accept(new RealtimeTruncation.Visitor<Optional<String>>() {
            @Override
            public Optional<String> visitStrategy(RealtimeTruncationStrategy strategy) {
                return Optional.of(strategy.toString());
            }
        
            // ...
        
            @Override
            public Optional<String> unknown(JsonValue json) {
                // Or inspect the `json`.
                return Optional.empty();
            }
        });

      • validate

         final RealtimeTruncation validate()

        Validates that the types of all values in this object match their expected types recursively.

        This method is not forwards compatible with new types from the API for existing fields.