Home

Awesome

πŸ’ͺ reliableGPT: Stop failing customer requests for your LLM App πŸš€

⚠️ DEPRECATION WARNING: LiteLLM is our new home. Thank you for checking us out! ❀️

Use LiteLLM to 20x your throughput - load balance between Azure, OpenAI (litellm router docs)

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = await router.completion(model="gpt-3.5-turbo", 
                messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)

⚑️ Get 0 dropped requests for your LLM app in production ⚑️

When a request to your llm app fails, reliableGPT handles it by:

Community

Getting Started

Step 1. pip install package

pip install reliableGPT

Step 2. The core package is 1 line of code

Integrating with OpenAI, Azure OpenAI, Langchain, LlamaIndex

from reliablegpt import reliableGPT
openai.ChatCompletion.create = reliableGPT(openai.ChatCompletion.create, user_email='ishaan@berri.ai')

Troubleshooting

If you experience failure, try

pip install reliableGPT==0.2.976

πŸ‘‰ Code Examples

ezgif com-optimize

How does it handle failures?

Advanced Usage

Breakdown of params

Here's everything you can pass to reliableGPT

ParameterTypeRequired/OptionalDescription
openai.ChatCompletion.createOpenAI methodRequiredThis is a method from OpenAI, used for calling the OpenAI chat endpoints
user_emailstring/listRequiredUpdate you on spikes in errors. You can either set user_email to one email (as user_email = "ishaan@berri.ai") or multiple (as user_email = ["ishaan@berri.ai", "krrish@berri.ai"] if you want to send alerts to multiple emails
fallback_strategylistOptionalYou can define a custom fallback strategy of OpenAI models you want to try using. If you want to try one model several times, then just repeat that e.g. ['gpt-4', 'gpt-4', 'gpt-3.5-turbo'] will try gpt-4 twice before trying gpt-3.5-turbo
model_limits_dirdictOptionalNote: Required if using queue_requests = True, For models you want to handle rate limits for set model_limits_dir = {"gpt-3.5-turbo": {"max_token_capacity": 1000000, "max_request_capacity": 10000}} You can find your account rate limits here: https://platform.openai.com/account/rate-limits
user_tokenstringOptionalPass your user token if you want us to handle OpenAI Invalid Key Errors - we'll rotate through your stored keys (more on this below πŸ‘‡) till we get one that works
azure_fallback_strategyList[string]OptionalPass your backup azure deployment/engine id's. In case your requests start failing we'll switch to one of these (if you also pass in a backup openai key, we'll try the Azure endpoints before the raw OpenAI ones)
backup_openai_keystringOptionalPass your OpenAI API key if you're using Azure and want to switch to OpenAI in case your requests start failing
cachingboolOptionalCache your openai responses, Used as backup in case model fallback fails or overloaded queue (if you're servers are being overwhelmed with requests, it'll alert you and return cached responses, so that customer requests don't get dropped)
max_threadsintOptionalPass this in alongside caching=True, for it to handle the overloaded queue scenario

πŸ‘¨β€πŸ”¬ Use Cases

Use Caching around your Query Endpoint πŸ”₯

If you're seeing high-traffic and want to make sure all your users get a response, wrap your query endpoint with reliableCache. It monitors for high-thread utilization and responds with cached responses.

Step 1. Import reliableCache

from reliablegpt import reliableCache

Step 2. Initialize reliableCache

# max_threads: the maximum number of threads you've allocated for flask to run (by default this is 1).
# query_arg: the variable name you're using to pass the user query to your endpoint (Assuming this is in the params/args)
# customer_instance_arg: unique identifier for that customer's instance (we'll put all cached responses for that customer within this bucket)
# user_email: [REQUIRED] your user email - we will alert you when you're seeing high utilization 
cache = reliableCache(max_threads=20, query_arg="query", customer_instance_arg="instance_id", user_email="krrish@berri.ai")

e.g. The number of threads for this flask app is 50

if __name__ == "__main__":
  from waitress import serve
  serve(app, host="0.0.0.0", port=4000, threads=50)

Step 3. Decorate your endpoint πŸš€

## Decorate your endpoint with cache.cache_wrapper, this monitors for .. 
## .. high thread utilization and sends cached responses when that happens
@app.route("/test_func")
@cache.cache_wrapper 
def test_fn():
  # your endpoint logic 

Switch between Azure OpenAI and raw OpenAI

If you're using Azure OpenAI and facing issues like Read/Request Timeouts, Rate limits, etc. you can use reliableGPT πŸ’ͺ to fall back to the raw OpenAI endpoints if your Azure OpenAI endpoint fails

Step 1. Import reliableGPT

from reliablegpt import reliableGPT

Step 2. Set your backup openai token + [Optional] Set fallback strategy

Note: This is stored locally.

#Set the backup openai key
openai.ChatCompletion.create = reliableGPT(
  openai.ChatCompletion.create,
  user_email="krrish@berri.ai",
  backup_openai_key=os.getenv("OPENAI_API_KEY"),
  fallback_strategy=["gpt-4", "gpt-4-32k"],
  verbose=True)

Step 3. Test with a bad Azure Key!

#bad key
openai.api_key = "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"

for question in list_questions:
  response = openai.ChatCompletion.create(model="gpt-4", engine="chatgpt-test", messages=[{"role":"user", "content": "Hey! how's it going?"}])
  print(response)

Handle overloaded server w/ Caching

If all else fails, reliableGPT will respond with previously cached responses. We store this in a Supabase table and use cosine similarity for similarity based retrieval. Why not in-memory cache? Because when we autoscale / push new updates to our server, we didn't want the cache to be wiped out.

Step 1. Import reliableGPT

from reliablegpt import reliableGPT

Step 2. Turn on caching

#Set the backup openai key
openai.ChatCompletion.create = reliableGPT(
  openai.ChatCompletion.create,
  user_email="krrish@berri.ai",
  caching=True)

Optional: Pass your max threads

Tell reliableGPT what the maximum number of threads you have, handling your requests for you. e.g. The number of threads for this flask app is 50

if __name__ == "__main__":
  from waitress import serve
  serve(app, host="0.0.0.0", port=4000, threads=50)

Tell reliableGPT what the maximum number of threads is - max_threads=50

#Set the backup openai key
openai.ChatCompletion.create = reliableGPT(
  openai.ChatCompletion.create,
  user_email="krrish@berri.ai",
  caching=True,
  max_threads=50)

Step 3. Test it

Check out ./reliablegpt/tests/test_Caching

We spin up a flask server, and then run a test script to run a set of questions against the flask server.

Handle rotated keys

Step 1. Add your keys

from reliablegpt import add_keys, delete_keys, reliableGPT
# Storing your keys πŸ”’
user_email = "krrish@berri.ai" # πŸ‘ˆ Replace with your email
token = add_keys(user_email, ["openai_key_1", "openai_key_2", "openai_key_3"])

Pass in a list of your openai keys. We will store these and go through them in case any get keys get rotated by OpenAI. You will get a special token, give that to reliableGPT.

Step 2. Initialize reliableGPT

import openai 
openai.api_key = "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" ## Invalid OpenAI key

print("Initializing reliableGPT πŸ’ͺ")
openai.ChatCompletion.create = reliableGPT(openai.ChatCompletion.create, user_email= user_email, user_token = token)

reliableGPTπŸ’ͺ catches the Invalid API Key error thrown by OpenAI and rotates through the remaining keys to ensure you have zero downtime in production.

Step 3. Delete keys

#Deleting your keys from reliableGPT 🫑
delete_keys(user_email = user_email, user_token=token)

You own your keys, and can delete them whenever you want.

Support

Reach out to us on Discord or Email us at ishaan@berri.ai & krrish@berri.ai