OpenAI's embedding models cannot embed text that exceeds a maximum length. The maximum length varies by model and is measured in tokens, not string length. If you are unfamiliar with tokenization, check out How to count tokens with tiktoken.
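As a quick illustration of the tokens-vs-characters distinction, here is a minimal sketch that counts tokens with `tiktoken` and compares the count against the model's limit. The 8191-token limit is the documented maximum for `text-embedding-ada-002`; treat it as an assumption if you are using a different model.

```python
import tiktoken

EMBEDDING_MODEL = "text-embedding-ada-002"
MAX_TOKENS = 8191  # documented limit for text-embedding-ada-002; adjust for other models

# tiktoken picks the tokenizer that matches the model (cl100k_base here)
encoding = tiktoken.encoding_for_model(EMBEDDING_MODEL)

text = "example text to embed " * 2000  # deliberately long
n_tokens = len(encoding.encode(text))

print(f"{n_tokens} tokens (limit: {MAX_TOKENS})")
if n_tokens > MAX_TOKENS:
    print("This text is too long to embed in a single request.")
```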
This notebook shows how to handle texts that are longer than a model's maximum context length. We'll demonstrate using embeddings from text-embedding-ada-002, but the same ideas can be applied to other models and tasks. To learn more about embeddings, check out the OpenAI Embeddings Guide.
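For reference, a basic embedding request looks like the sketch below. This uses the current openai Python SDK (v1+) and assumes `OPENAI_API_KEY` is set in the environment; the notebook's own code cells may use a different client version. Requests whose input exceeds the model's token limit raise an error, which is exactly the situation the rest of this notebook addresses.

```python
from openai import OpenAI  # openai Python package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_embedding(text: str, model: str = "text-embedding-ada-002") -> list[float]:
    """Return the embedding vector for a single text that fits within the model's limit."""
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding


# Works for short inputs; over-length inputs must be truncated or chunked first.
print(len(get_embedding("hello world")))
```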