How to stream completions

Ted Sanders
Sep 2, 2022

By default, when you request a completion from the OpenAI API, the entire completion is generated before being sent back in a single response.

If you're generating long completions, waiting for the response can take many seconds.

To get responses sooner, you can 'stream' the completion as it's being generated. This allows you to start printing or processing the beginning of the completion before the full completion is finished.

To stream completions, set stream=True when calling the chat completions or completions endpoints. This will return an object that streams back the response as data-only server-sent events. For chat completions, extract each chunk's content from the delta field rather than the message field.
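
If you call the HTTP API without the Python library, the data-only server-sent events can be decoded by hand. The sketch below is a minimal illustration, not the library's implementation. It assumes each event arrives as a line beginning with `data: ` followed by a JSON payload, and that the stream ends with a `data: [DONE]` sentinel. The function name `parse_sse_lines` and the sample lines are hypothetical.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON payloads from data-only server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        yield json.loads(payload)


# Hypothetical sample lines, shaped like the streamed chat chunks in this notebook
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}, "finish_reason": null, "index": 0}]}',
    '',
    'data: {"choices": [{"delta": {"content": "2"}, "finish_reason": null, "index": 0}]}',
    '',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop", "index": 0}]}',
    '',
    'data: [DONE]',
]

chunks = list(parse_sse_lines(sample_lines))
text = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
print(text)
```

When you use the Python library, this decoding is already done for you and you simply iterate over the returned object.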

Downsides

Note that using stream=True in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This may have implications for approved usage.

Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.
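
A minimal sketch of that calculation follows. It assumes the streamed deltas were collected into a list of dicts, and it takes the tokenizer as a parameter so the example runs without extra packages. The helper name `count_streamed_tokens` is hypothetical, and the whitespace splitter used in the demo is a stand-in, not a real tokenizer.

```python
def count_streamed_tokens(collected_messages, encode):
    """Join the streamed content pieces and count tokens with the given encoder."""
    text = "".join(m.get("content", "") for m in collected_messages)
    return len(encode(text))


# Deltas shaped like the ones a streaming chat completion returns
collected_messages = [
    {"role": "assistant"},
    {"content": "1"},
    {"content": ","},
    {"content": " "},
    {"content": "2"},
    {},
]

# Stand-in encoder for demonstration only: splits on whitespace.
# With tiktoken installed, pass a real encoder instead, e.g.
#   import tiktoken
#   encode = tiktoken.encoding_for_model("gpt-3.5-turbo").encode
approx_tokens = count_streamed_tokens(collected_messages, str.split)
print(approx_tokens)
```

The count covers only the completion text. Prompt tokens, and any per-message formatting overhead, would need to be counted separately, so treat the result as an estimate.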

Example code

Below, this notebook shows:

  1. What a typical chat completion response looks like
  2. What a streaming chat completion response looks like
  3. How much time is saved by streaming a chat completion
  4. How to stream non-chat completions (used by older models like text-davinci-003)
# imports
import openai  # for OpenAI API calls
import time  # for measuring time duration of API calls
# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/chat

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full response received:\n{response}")
Full response received 3.03 seconds after request
Full response received:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "\n\n1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677825456,
  "id": "chatcmpl-6ptKqrhgRoVchm58Bby0UvJzq2ZuQ",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 301,
    "prompt_tokens": 36,
    "total_tokens": 337
  }
}

The reply can be extracted with response['choices'][0]['message'].

The content of the reply can be extracted with response['choices'][0]['message']['content'].

reply = response['choices'][0]['message']
print(f"Extracted reply: \n{reply}")

reply_content = response['choices'][0]['message']['content']
print(f"Extracted content: \n{reply_content}")
Extracted reply: 
{
  "content": "\n\n1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.",
  "role": "assistant"
}
Extracted content: 


1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/guides/chat

# a ChatCompletion request
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

for chunk in response:
    print(chunk)
{
  "choices": [
    {
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {
        "content": "\n\n"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {
        "content": "2"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1677825464,
  "id": "chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}

As you can see above, streaming responses have a delta field rather than a message field. delta can hold things like:

  • a role token (e.g., {"role": "assistant"})
  • a content token (e.g., {"content": "\n\n"})
  • nothing (e.g., {}), when the stream is over
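
These three cases can be folded back into a single message. The sketch below is one possible approach, not part of the OpenAI library: `merge_deltas` is a hypothetical helper, and the sample chunks are trimmed copies of the output above.

```python
def merge_deltas(chunks):
    """Combine streamed chunks into one {'role': ..., 'content': ...} message."""
    message = {"role": None, "content": ""}
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice["delta"]
        if "role" in delta:  # role token, sent first in the output above
            message["role"] = delta["role"]
        if "content" in delta:  # content token, appended in arrival order
            message["content"] += delta["content"]
        if choice["finish_reason"] is not None:  # set on the final, empty delta
            finish_reason = choice["finish_reason"]
    return message, finish_reason


sample_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None, "index": 0}]},
    {"choices": [{"delta": {"content": "\n\n"}, "finish_reason": None, "index": 0}]},
    {"choices": [{"delta": {"content": "2"}, "finish_reason": None, "index": 0}]},
    {"choices": [{"delta": {}, "finish_reason": "stop", "index": 0}]},
]

message, finish_reason = merge_deltas(sample_chunks)
print(message, finish_reason)
```

The merged result has the same shape as the message field of a non-streaming response, so downstream code can treat both cases alike.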
# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/guides/chat

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True  # again, we set stream=True
)

# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk['choices'][0]['delta']  # extract the message
    collected_messages.append(chunk_message)  # save the message
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text

# print the time delay and text received
print(f"Full response received {chunk_time:.2f} seconds after request")
full_reply_content = ''.join([m.get('content', '') for m in collected_messages])
print(f"Full conversation received: {full_reply_content}")
Message received 0.10 seconds after request: {
  "role": "assistant"
}
Message received 0.10 seconds after request: {
  "content": "\n\n"
}
Message received 0.10 seconds after request: {
  "content": "1"
}
Message received 0.11 seconds after request: {
  "content": ","
}
Message received 0.12 seconds after request: {
  "content": " "
}
Message received 0.13 seconds after request: {
  "content": "2"
}
Message received 0.14 seconds after request: {
  "content": ","
}
Message received 0.15 seconds after request: {
  "content": " "
}
Message received 0.16 seconds after request: {
  "content": "3"
}
Message received 0.17 seconds after request: {
  "content": ","
}
Message received 0.18 seconds after request: {
  "content": " "
}
Message received 0.19 seconds after request: {
  "content": "4"
}
Message received 0.20 seconds after request: {
  "content": ","
}
Message received 0.21 seconds after request: {
  "content": " "
}
Message received 0.22 seconds after request: {
  "content": "5"
}
Message received 0.23 seconds after request: {
  "content": ","
}
Message received 0.24 seconds after request: {
  "content": " "
}
Message received 0.25 seconds after request: {
  "content": "6"
}
Message received 0.26 seconds after request: {
  "content": ","
}
Message received 0.27 seconds after request: {
  "content": " "
}
Message received 0.28 seconds after request: {
  "content": "7"
}
Message received 0.29 seconds after request: {
  "content": ","
}
Message received 0.30 seconds after request: {
  "content": " "
}
Message received 0.30 seconds after request: {
  "content": "8"
}
Message received 0.31 seconds after request: {
  "content": ","
}
Message received 0.32 seconds after request: {
  "content": " "
}
Message received 0.33 seconds after request: {
  "content": "9"
}
Message received 0.34 seconds after request: {
  "content": ","
}
Message received 0.35 seconds after request: {
  "content": " "
}
Message received 0.37 seconds after request: {
  "content": "10"
}
Message received 0.40 seconds after request: {
  "content": ","
}
Message received 0.43 seconds after request: {
  "content": " "
}
Message received 0.43 seconds after request: {
  "content": "11"
}
Message received 0.43 seconds after request: {
  "content": ","
}
Message received 0.43 seconds after request: {
  "content": " "
}
Message received 0.43 seconds after request: {
  "content": "12"
}
Message received 0.43 seconds after request: {
  "content": ","
}
Message received 0.44 seconds after request: {
  "content": " "
}
Message received 0.45 seconds after request: {
  "content": "13"
}
Message received 0.46 seconds after request: {
  "content": ","
}
Message received 0.47 seconds after request: {
  "content": " "
}
Message received 0.48 seconds after request: {
  "content": "14"
}
Message received 0.49 seconds after request: {
  "content": ","
}
Message received 0.50 seconds after request: {
  "content": " "
}
Message received 0.51 seconds after request: {
  "content": "15"
}
Message received 0.52 seconds after request: {
  "content": ","
}
Message received 0.53 seconds after request: {
  "content": " "
}
Message received 0.53 seconds after request: {
  "content": "16"
}
Message received 0.55 seconds after request: {
  "content": ","
}
Message received 0.55 seconds after request: {
  "content": " "
}
Message received 0.56 seconds after request: {
  "content": "17"
}
Message received 0.57 seconds after request: {
  "content": ","
}
Message received 0.58 seconds after request: {
  "content": " "
}
Message received 0.59 seconds after request: {
  "content": "18"
}
Message received 0.60 seconds after request: {
  "content": ","
}
Message received 0.61 seconds after request: {
  "content": " "
}
Message received 0.62 seconds after request: {
  "content": "19"
}
Message received 0.63 seconds after request: {
  "content": ","
}
Message received 0.64 seconds after request: {
  "content": " "
}
Message received 0.65 seconds after request: {
  "content": "20"
}
Message received 0.66 seconds after request: {
  "content": ","
}
Message received 0.67 seconds after request: {
  "content": " "
}
Message received 0.68 seconds after request: {
  "content": "21"
}
Message received 0.69 seconds after request: {
  "content": ","
}
Message received 0.70 seconds after request: {
  "content": " "
}
Message received 0.71 seconds after request: {
  "content": "22"
}
Message received 0.72 seconds after request: {
  "content": ","
}
Message received 0.73 seconds after request: {
  "content": " "
}
Message received 0.74 seconds after request: {
  "content": "23"
}
Message received 0.75 seconds after request: {
  "content": ","
}
Message received 0.75 seconds after request: {
  "content": " "
}
Message received 0.76 seconds after request: {
  "content": "24"
}
Message received 0.79 seconds after request: {
  "content": ","
}
Message received 0.79 seconds after request: {
  "content": " "
}
Message received 0.79 seconds after request: {
  "content": "25"
}
Message received 0.80 seconds after request: {
  "content": ","
}
Message received 0.81 seconds after request: {
  "content": " "
}
Message received 0.82 seconds after request: {
  "content": "26"
}
Message received 0.83 seconds after request: {
  "content": ","
}
Message received 0.84 seconds after request: {
  "content": " "
}
Message received 0.85 seconds after request: {
  "content": "27"
}
Message received 0.86 seconds after request: {
  "content": ","
}
Message received 0.87 seconds after request: {
  "content": " "
}
Message received 0.88 seconds after request: {
  "content": "28"
}
Message received 0.89 seconds after request: {
  "content": ","
}
Message received 0.90 seconds after request: {
  "content": " "
}
Message received 0.92 seconds after request: {
  "content": "29"
}
Message received 0.92 seconds after request: {
  "content": ","
}
Message received 0.93 seconds after request: {
  "content": " "
}
Message received 0.94 seconds after request: {
  "content": "30"
}
Message received 0.95 seconds after request: {
  "content": ","
}
Message received 0.96 seconds after request: {
  "content": " "
}
Message received 0.97 seconds after request: {
  "content": "31"
}
Message received 0.98 seconds after request: {
  "content": ","
}
Message received 0.99 seconds after request: {
  "content": " "
}
Message received 1.00 seconds after request: {
  "content": "32"
}
Message received 1.01 seconds after request: {
  "content": ","
}
Message received 1.02 seconds after request: {
  "content": " "
}
Message received 1.03 seconds after request: {
  "content": "33"
}
Message received 1.04 seconds after request: {
  "content": ","
}
Message received 1.05 seconds after request: {
  "content": " "
}
Message received 1.06 seconds after request: {
  "content": "34"
}
Message received 1.07 seconds after request: {
  "content": ","
}
Message received 1.08 seconds after request: {
  "content": " "
}
Message received 1.09 seconds after request: {
  "content": "35"
}
Message received 1.10 seconds after request: {
  "content": ","
}
Message received 1.11 seconds after request: {
  "content": " "
}
Message received 1.12 seconds after request: {
  "content": "36"
}
Message received 1.13 seconds after request: {
  "content": ","
}
Message received 1.13 seconds after request: {
  "content": " "
}
Message received 1.14 seconds after request: {
  "content": "37"
}
Message received 1.15 seconds after request: {
  "content": ","
}
Message received 1.17 seconds after request: {
  "content": " "
}
Message received 1.18 seconds after request: {
  "content": "38"
}
Message received 1.19 seconds after request: {
  "content": ","
}
Message received 1.19 seconds after request: {
  "content": " "
}
Message received 1.20 seconds after request: {
  "content": "39"
}
Message received 1.21 seconds after request: {
  "content": ","
}
Message received 1.22 seconds after request: {
  "content": " "
}
Message received 1.23 seconds after request: {
  "content": "40"
}
Message received 1.24 seconds after request: {
  "content": ","
}
Message received 1.25 seconds after request: {
  "content": " "
}
Message received 1.26 seconds after request: {
  "content": "41"
}
Message received 1.27 seconds after request: {
  "content": ","
}
Message received 1.28 seconds after request: {
  "content": " "
}
Message received 1.29 seconds after request: {
  "content": "42"
}
Message received 1.30 seconds after request: {
  "content": ","
}
Message received 1.31 seconds after request: {
  "content": " "
}
Message received 1.32 seconds after request: {
  "content": "43"
}
Message received 1.33 seconds after request: {
  "content": ","
}
Message received 1.34 seconds after request: {
  "content": " "
}
Message received 1.37 seconds after request: {
  "content": "44"
}
Message received 1.37 seconds after request: {
  "content": ","
}
Message received 1.37 seconds after request: {
  "content": " "
}
Message received 1.37 seconds after request: {
  "content": "45"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "46"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "47"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "48"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "49"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "50"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "51"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "52"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "53"
}
Message received 2.10 seconds after request: {
  "content": ","
}
Message received 2.10 seconds after request: {
  "content": " "
}
Message received 2.10 seconds after request: {
  "content": "54"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "55"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "56"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "57"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "58"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "59"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "60"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "61"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.24 seconds after request: {
  "content": " "
}
Message received 2.24 seconds after request: {
  "content": "62"
}
Message received 2.24 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "63"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "64"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "65"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "66"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "67"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "68"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "69"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "70"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "71"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "72"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "73"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.25 seconds after request: {
  "content": "74"
}
Message received 2.25 seconds after request: {
  "content": ","
}
Message received 2.25 seconds after request: {
  "content": " "
}
Message received 2.26 seconds after request: {
  "content": "75"
}
Message received 2.26 seconds after request: {
  "content": ","
}
Message received 2.26 seconds after request: {
  "content": " "
}
Message received 2.27 seconds after request: {
  "content": "76"
}
Message received 2.28 seconds after request: {
  "content": ","
}
Message received 2.29 seconds after request: {
  "content": " "
}
Message received 2.29 seconds after request: {
  "content": "77"
}
Message received 2.31 seconds after request: {
  "content": ","
}
Message received 2.32 seconds after request: {
  "content": " "
}
Message received 2.32 seconds after request: {
  "content": "78"
}
Message received 2.33 seconds after request: {
  "content": ","
}
Message received 2.35 seconds after request: {
  "content": " "
}
Message received 2.35 seconds after request: {
  "content": "79"
}
Message received 2.36 seconds after request: {
  "content": ","
}
Message received 2.37 seconds after request: {
  "content": " "
}
Message received 2.38 seconds after request: {
  "content": "80"
}
Message received 2.39 seconds after request: {
  "content": ","
}
Message received 2.40 seconds after request: {
  "content": " "
}
Message received 2.41 seconds after request: {
  "content": "81"
}
Message received 2.42 seconds after request: {
  "content": ","
}
Message received 2.43 seconds after request: {
  "content": " "
}
Message received 2.44 seconds after request: {
  "content": "82"
}
Message received 2.45 seconds after request: {
  "content": ","
}
Message received 2.46 seconds after request: {
  "content": " "
}
Message received 2.47 seconds after request: {
  "content": "83"
}
Message received 2.48 seconds after request: {
  "content": ","
}
Message received 2.49 seconds after request: {
  "content": " "
}
Message received 2.50 seconds after request: {
  "content": "84"
}
Message received 2.51 seconds after request: {
  "content": ","
}
Message received 2.52 seconds after request: {
  "content": " "
}
Message received 2.53 seconds after request: {
  "content": "85"
}
Message received 2.54 seconds after request: {
  "content": ","
}
Message received 2.55 seconds after request: {
  "content": " "
}
Message received 2.56 seconds after request: {
  "content": "86"
}
Message received 2.57 seconds after request: {
  "content": ","
}
Message received 2.58 seconds after request: {
  "content": " "
}
Message received 2.59 seconds after request: {
  "content": "87"
}
Message received 2.60 seconds after request: {
  "content": ","
}
Message received 2.60 seconds after request: {
  "content": " "
}
Message received 2.62 seconds after request: {
  "content": "88"
}
Message received 2.63 seconds after request: {
  "content": ","
}
Message received 2.63 seconds after request: {
  "content": " "
}
Message received 2.64 seconds after request: {
  "content": "89"
}
Message received 2.66 seconds after request: {
  "content": ","
}
Message received 2.66 seconds after request: {
  "content": " "
}
Message received 2.68 seconds after request: {
  "content": "90"
}
Message received 2.68 seconds after request: {
  "content": ","
}
Message received 2.69 seconds after request: {
  "content": " "
}
Message received 2.70 seconds after request: {
  "content": "91"
}
Message received 2.71 seconds after request: {
  "content": ","
}
Message received 2.72 seconds after request: {
  "content": " "
}
Message received 2.73 seconds after request: {
  "content": "92"
}
Message received 2.74 seconds after request: {
  "content": ","
}
Message received 2.75 seconds after request: {
  "content": " "
}
Message received 2.76 seconds after request: {
  "content": "93"
}
Message received 2.77 seconds after request: {
  "content": ","
}
Message received 2.78 seconds after request: {
  "content": " "
}
Message received 2.79 seconds after request: {
  "content": "94"
}
Message received 2.80 seconds after request: {
  "content": ","
}
Message received 2.81 seconds after request: {
  "content": " "
}
Message received 2.81 seconds after request: {
  "content": "95"
}
Message received 2.82 seconds after request: {
  "content": ","
}
Message received 2.83 seconds after request: {
  "content": " "
}
Message received 2.84 seconds after request: {
  "content": "96"
}
Message received 2.85 seconds after request: {
  "content": ","
}
Message received 2.86 seconds after request: {
  "content": " "
}
Message received 2.87 seconds after request: {
  "content": "97"
}
Message received 2.88 seconds after request: {
  "content": ","
}
Message received 2.88 seconds after request: {
  "content": " "
}
Message received 2.89 seconds after request: {
  "content": "98"
}
Message received 2.90 seconds after request: {
  "content": ","
}
Message received 2.91 seconds after request: {
  "content": " "
}
Message received 2.92 seconds after request: {
  "content": "99"
}
Message received 2.93 seconds after request: {
  "content": ","
}
Message received 2.93 seconds after request: {
  "content": " "
}
Message received 2.94 seconds after request: {
  "content": "100"
}
Message received 2.95 seconds after request: {
  "content": "."
}
Message received 2.97 seconds after request: {}
Full response received 2.97 seconds after request
Full conversation received: 

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.

Time comparison

In the example above, both requests took about 3 seconds to fully complete. Request times will vary depending on load and other stochastic factors.

However, with the streaming request, we received the first token after 0.1 seconds, and subsequent tokens every ~0.01-0.02 seconds.
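
To put numbers on this comparison, the per-chunk delays can be summarized as the time to the first chunk and the average gap between chunks. The sketch below is illustrative: `summarize_latency` is a hypothetical helper, and the sample delays are a handful of values copied from the streaming output above, not a new measurement.

```python
def summarize_latency(chunk_times):
    """Return (time to first chunk, mean gap between chunks, time of last chunk)."""
    if not chunk_times:
        raise ValueError("chunk_times must contain at least one delay")
    first = chunk_times[0]
    total = chunk_times[-1]
    # differences between consecutive arrival times
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return first, mean_gap, total


# First few delays (in seconds) from the streaming chat example
sample_delays = [0.10, 0.10, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
first, mean_gap, total = summarize_latency(sample_delays)
print(f"first chunk after {first:.2f}s, mean gap {mean_gap:.3f}s, last chunk after {total:.2f}s")
```

In the streaming loop, the same list could be built by appending `chunk_time` on each iteration.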

A typical completion request

# Example of an OpenAI Completion request
# https://beta.openai.com/docs/api-reference/completions/create

# record the time before the request is sent
start_time = time.time()

# send a Completion request to count to 100
response = openai.Completion.create(
    model='text-davinci-002',
    prompt='1,2,3,',
    max_tokens=193,
    temperature=0,
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# extract the text from the response
completion_text = response['choices'][0]['text']

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full text received: {completion_text}")
Full response received 3.43 seconds after request
Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100

A streaming completion request

With a streaming Completions API call, the text is sent back via a series of events. In Python, you can iterate over these events with a for loop.

# Example of an OpenAI Completion request, using the stream=True option
# https://beta.openai.com/docs/api-reference/completions/create

# record the time before the request is sent
start_time = time.time()

# send a Completion request to count to 100
response = openai.Completion.create(
    model='text-davinci-002',
    prompt='1,2,3,',
    max_tokens=193,
    temperature=0,
    stream=True,  # this time, we set stream=True
)

# create variables to collect the stream of events
collected_events = []
completion_text = ''
# iterate through the stream of events
for event in response:
    event_time = time.time() - start_time  # calculate the time delay of the event
    collected_events.append(event)  # save the event response
    event_text = event['choices'][0]['text']  # extract the text
    completion_text += event_text  # append the text
    print(f"Text received: {event_text} ({event_time:.2f} seconds after request)")  # print the delay and text

# print the time delay and text received
print(f"Full response received {event_time:.2f} seconds after request")
print(f"Full text received: {completion_text}")
Text received: 4 (0.18 seconds after request)
Text received: , (0.19 seconds after request)
Text received: 5 (0.21 seconds after request)
Text received: , (0.23 seconds after request)
Text received: 6 (0.25 seconds after request)
Text received: , (0.26 seconds after request)
Text received: 7 (0.28 seconds after request)
Text received: , (0.29 seconds after request)
Text received: 8 (0.31 seconds after request)
Text received: , (0.33 seconds after request)
Text received: 9 (0.35 seconds after request)
Text received: , (0.36 seconds after request)
Text received: 10 (0.38 seconds after request)
Text received: , (0.39 seconds after request)
Text received: 11 (0.41 seconds after request)
Text received: , (0.42 seconds after request)
Text received: 12 (0.44 seconds after request)
Text received: , (0.45 seconds after request)
Text received: 13 (0.47 seconds after request)
Text received: , (0.48 seconds after request)
Text received: 14 (0.50 seconds after request)
Text received: , (0.51 seconds after request)
Text received: 15 (0.53 seconds after request)
Text received: , (0.54 seconds after request)
Text received: 16 (0.56 seconds after request)
Text received: , (0.57 seconds after request)
Text received: 17 (0.59 seconds after request)
Text received: , (0.62 seconds after request)
Text received: 18 (0.62 seconds after request)
Text received: , (0.63 seconds after request)
Text received: 19 (0.64 seconds after request)
Text received: , (0.66 seconds after request)
Text received: 20 (0.67 seconds after request)
Text received: , (0.68 seconds after request)
Text received: 21 (0.70 seconds after request)
Text received: , (0.71 seconds after request)
Text received: 22 (0.73 seconds after request)
Text received: , (0.74 seconds after request)
Text received: 23 (0.76 seconds after request)
Text received: , (0.77 seconds after request)
Text received: 24 (0.78 seconds after request)
Text received: , (0.80 seconds after request)
Text received: 25 (0.81 seconds after request)
Text received: , (0.82 seconds after request)
Text received: 26 (0.84 seconds after request)
Text received: , (0.85 seconds after request)
Text received: 27 (0.89 seconds after request)
Text received: , (0.90 seconds after request)
Text received: 28 (0.90 seconds after request)
Text received: , (0.91 seconds after request)
Text received: 29 (0.92 seconds after request)
Text received: , (0.94 seconds after request)
Text received: 30 (0.95 seconds after request)
Text received: , (0.96 seconds after request)
Text received: 31 (0.97 seconds after request)
Text received: , (0.99 seconds after request)
Text received: 32 (1.00 seconds after request)
Text received: , (1.01 seconds after request)
Text received: 33 (1.03 seconds after request)
Text received: , (1.04 seconds after request)
Text received: 34 (1.05 seconds after request)
Text received: , (1.07 seconds after request)
Text received: 35 (1.08 seconds after request)
Text received: , (1.10 seconds after request)
Text received: 36 (1.11 seconds after request)
Text received: , (1.12 seconds after request)
Text received: 37 (1.13 seconds after request)
Text received: , (1.15 seconds after request)
Text received: 38 (1.16 seconds after request)
Text received: , (1.18 seconds after request)
Text received: 39 (1.19 seconds after request)
Text received: , (1.20 seconds after request)
Text received: 40 (1.22 seconds after request)
Text received: , (1.24 seconds after request)
Text received: 41 (1.25 seconds after request)
Text received: , (1.26 seconds after request)
Text received: 42 (1.27 seconds after request)
Text received: , (1.29 seconds after request)
Text received: 43 (1.30 seconds after request)
Text received: , (1.31 seconds after request)
Text received: 44 (1.32 seconds after request)
Text received: , (1.34 seconds after request)
Text received: 45 (1.35 seconds after request)
Text received: , (1.36 seconds after request)
Text received: 46 (1.38 seconds after request)
Text received: , (1.39 seconds after request)
Text received: 47 (1.40 seconds after request)
Text received: , (1.42 seconds after request)
Text received: 48 (1.43 seconds after request)
Text received: , (1.45 seconds after request)
Text received: 49 (1.47 seconds after request)
Text received: , (1.47 seconds after request)
Text received: 50 (1.49 seconds after request)
Text received: , (1.50 seconds after request)
Text received: 51 (1.51 seconds after request)
Text received: , (1.53 seconds after request)
Text received: 52 (1.54 seconds after request)
Text received: , (1.55 seconds after request)
Text received: 53 (1.57 seconds after request)
Text received: , (1.58 seconds after request)
Text received: 54 (1.59 seconds after request)
Text received: , (1.61 seconds after request)
Text received: 55 (1.62 seconds after request)
Text received: , (1.64 seconds after request)
Text received: 56 (1.65 seconds after request)
Text received: , (1.66 seconds after request)
Text received: 57 (1.69 seconds after request)
Text received: , (1.69 seconds after request)
Text received: 58 (1.70 seconds after request)
Text received: , (1.72 seconds after request)
Text received: 59 (1.73 seconds after request)
Text received: , (1.74 seconds after request)
Text received: 60 (1.76 seconds after request)
Text received: , (1.77 seconds after request)
Text received: 61 (1.78 seconds after request)
Text received: , (1.80 seconds after request)
Text received: 62 (1.81 seconds after request)
Text received: , (1.83 seconds after request)
Text received: 63 (1.84 seconds after request)
Text received: , (1.85 seconds after request)
Text received: 64 (1.86 seconds after request)
Text received: , (1.88 seconds after request)
Text received: 65 (1.89 seconds after request)
Text received: , (1.90 seconds after request)
Text received: 66 (1.92 seconds after request)
Text received: , (1.93 seconds after request)
Text received: 67 (1.95 seconds after request)
Text received: , (1.96 seconds after request)
Text received: 68 (1.99 seconds after request)
Text received: , (1.99 seconds after request)
Text received: 69 (2.00 seconds after request)
Text received: , (2.01 seconds after request)
Text received: 70 (2.03 seconds after request)
Text received: , (2.04 seconds after request)
Text received: 71 (2.05 seconds after request)
Text received: , (2.07 seconds after request)
Text received: 72 (2.08 seconds after request)
Text received: , (2.09 seconds after request)
Text received: 73 (2.11 seconds after request)
Text received: , (2.12 seconds after request)
Text received: 74 (2.13 seconds after request)
Text received: , (2.15 seconds after request)
Text received: 75 (2.16 seconds after request)
Text received: , (2.17 seconds after request)
Text received: 76 (2.18 seconds after request)
Text received: , (2.20 seconds after request)
Text received: 77 (2.22 seconds after request)
Text received: , (2.23 seconds after request)
Text received: 78 (2.24 seconds after request)
Text received: , (2.25 seconds after request)
Text received: 79 (2.26 seconds after request)
Text received: , (2.28 seconds after request)
Text received: 80 (2.28 seconds after request)
Text received: , (2.29 seconds after request)
Text received: 81 (2.30 seconds after request)
Text received: , (2.31 seconds after request)
Text received: 82 (2.33 seconds after request)
Text received: , (2.34 seconds after request)
Text received: 83 (2.35 seconds after request)
Text received: , (2.36 seconds after request)
Text received: 84 (2.37 seconds after request)
Text received: , (2.39 seconds after request)
Text received: 85 (2.39 seconds after request)
Text received: , (2.40 seconds after request)
Text received: 86 (2.43 seconds after request)
Text received: , (2.43 seconds after request)
Text received: 87 (2.44 seconds after request)
Text received: , (2.45 seconds after request)
Text received: 88 (2.46 seconds after request)
Text received: , (2.47 seconds after request)
Text received: 89 (2.48 seconds after request)
Text received: , (2.49 seconds after request)
Text received: 90 (2.50 seconds after request)
Text received: , (2.51 seconds after request)
Text received: 91 (2.52 seconds after request)
Text received: , (2.54 seconds after request)
Text received: 92 (2.55 seconds after request)
Text received: , (2.57 seconds after request)
Text received: 93 (2.57 seconds after request)
Text received: , (2.58 seconds after request)
Text received: 94 (2.59 seconds after request)
Text received: , (2.60 seconds after request)
Text received: 95 (2.62 seconds after request)
Text received: , (2.62 seconds after request)
Text received: 96 (2.64 seconds after request)
Text received: , (2.65 seconds after request)
Text received: 97 (2.66 seconds after request)
Text received: , (2.67 seconds after request)
Text received: 98 (2.68 seconds after request)
Text received: , (2.69 seconds after request)
Text received: 99 (2.71 seconds after request)
Text received: , (2.72 seconds after request)
Text received: 100 (2.73 seconds after request)
Full response received 2.73 seconds after request
Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100

Time comparison

In the example above, both requests took about 3 seconds to complete. Request times vary with load and other stochastic factors.

However, with the streaming request, we received the first token after 0.18 seconds, and subsequent tokens arrived roughly every 0.01 to 0.02 seconds.
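
To make this comparison reproducible rather than reading it off the printed log, the timing can be computed from the arrival time of each event. The sketch below is illustrative only: `fake_stream` and `summarize_latency` are hypothetical helpers that are not part of the openai library, and `fake_stream` merely imitates the shape of the completion events used above (`event['choices'][0]['text']`) so that the example runs without an API call. Note that the clock here starts when iteration begins, whereas the notebook starts it just before the request is sent.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def fake_stream(chunks: List[str], delay: float = 0.01) -> Iterator[dict]:
    """Hypothetical stand-in for a streaming response (not an openai API)."""
    for text in chunks:
        time.sleep(delay)  # simulate the wait for the next chunk
        yield {"choices": [{"text": text}]}


def summarize_latency(stream: Iterable[dict]) -> Tuple[str, float, float, float]:
    """Return (full text, time to first chunk, total time, mean gap between chunks)."""
    start = time.time()
    arrival_times = []  # seconds since start for each event
    parts = []  # text carried by each event
    for event in stream:
        arrival_times.append(time.time() - start)
        parts.append(event["choices"][0]["text"])
    if not arrival_times:
        raise ValueError("stream produced no events")
    # gaps between consecutive events; empty if only one event arrived
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return "".join(parts), arrival_times[0], arrival_times[-1], mean_gap


text, first, total, mean_gap = summarize_latency(fake_stream(["4", ",", "5", ",", "6"]))
print(f"First chunk after {first:.2f} seconds")
print(f"Full response after {total:.2f} seconds")
print(f"Mean gap between chunks: {mean_gap:.3f} seconds")
print(f"Full text received: {text}")
```

To apply this to a real request, pass the streaming response object in place of `fake_stream(...)`, and record the start time before sending the request if the request latency should be included.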