Enterprise Azure OpenAI

Repository detailing the deployment of an Enterprise Azure OpenAI reference architecture.
Link: Azure Architecture Center - Monitor OpenAI Models

An advanced pattern is available for customers who use models with larger token sizes, want to perform advanced analytics on prompts and responses, or need to send these events to another type of data store.

Key Solution Advantages:

<div> <video controls src="https://user-images.githubusercontent.com/47987698/232847412-e0f5cdd1-a587-457f-9365-e3caa21c8ec9.mp4" muted="false"></video> </div>

Reference Architecture

(Reference architecture diagram)

1. Client applications access the Azure OpenAI text generation (completions) and model training (fine-tuning) endpoints to leverage the power of large language models.
2. Next-Gen Firewall Appliance (optional) provides deep packet-level inspection of network traffic to the OpenAI models.
3. API Management Gateway enables security controls, auditing, and monitoring of the Azure OpenAI models. Security access is granted via AAD groups with subscription-based access permissions in APIM. Auditing is enabled via Azure Monitor request logging for all interactions with the models. Monitoring enables detailed AOAI model usage KPIs/metrics.
4. API Management Gateway connects to all Azure resources via Private Link, ensuring all traffic is secured by private endpoints and contained to the private network.
5. Multiple Azure OpenAI instances enable scale-out of API usage to ensure high availability and disaster recovery for the service.

Features

This project framework provides the following features:

Getting Started

Prerequisites

Installation

To provision the solution, begin by deploying the artifacts listed below:

(Optional)

Managed Services

Configuration

Azure OpenAI

Azure Key Vault

Provision an Azure Key Vault resource: Deploy Key Vault

Once deployed, add Key1 from the Azure OpenAI instance as a secret: Add a Secret
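If you prefer to script these two steps, the sketch below stores the key with the Azure SDK for Python (azure-identity and azure-keyvault-secrets). The vault URL and secret name are placeholders, and the signed-in identity is assumed to have permission to set secrets:

```python
# Sketch: store the Azure OpenAI Key1 as a Key Vault secret using the Azure SDK for Python.
# Assumes `pip install azure-identity azure-keyvault-secrets`; vault and secret names are examples.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<your-key-vault-name>.vault.azure.net"  # replace with your vault URL
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Key1 is copied from the Azure OpenAI resource's "Keys and Endpoint" blade
client.set_secret("azure-openai-key1", "<key1-value>")
```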

API Management Config

(Optional) Subscription Access Control

API Management allows API providers to protect their APIs from abuse and to create value for different API product tiers. Throttling incoming requests is a key role of Azure API Management, either by controlling the rate of requests or by capping the total requests/data transferred.
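When a caller exceeds a configured rate limit, API Management rejects the request with HTTP 429, which the legacy openai SDK used in the example below surfaces as a RateLimitError. A minimal client-side retry sketch; the retry counts, delays, and engine/prompt values are illustrative placeholders:

```python
# Sketch: retry a completion call when APIM throttles the request (HTTP 429).
# Assumes the legacy openai<1.0 SDK is configured against the APIM endpoint as shown
# in the "Logging OpenAI completions" example; retry counts and delays are illustrative.
import time
import openai
from openai.error import RateLimitError

def create_completion_with_retry(prompt, retries=3, delay_seconds=5):
    for attempt in range(retries):
        try:
            return openai.Completion.create(engine="modelname", prompt=prompt, max_tokens=200)
        except RateLimitError:
            # APIM rate-limit policies surface as 429 Too Many Requests
            if attempt == retries - 1:
                raise
            time.sleep(delay_seconds * (attempt + 1))
```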

Logging OpenAI completions
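Point the openai SDK at the API Management endpoint and authenticate with an APIM subscription key, so every completion request is routed through (and logged by) the gateway: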

import openai

openai.api_type = "azure"
openai.api_base = "https://xxxxxxxxx.azure-api.net/"  # APIM endpoint, not the Azure OpenAI endpoint
openai.api_version = "2023-05-15"
openai.api_key = "APIM SUBSCRIPTION KEY"  # do NOT use the actual Azure OpenAI service key

response = openai.Completion.create(
    engine="modelname",    # Azure OpenAI deployment name
    prompt="prompt text",
    temperature=1,
    max_tokens=200,
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
)

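The gateway logs both completions_create and chatcompletions_create operations (see the queries below), so chat deployments can be exercised through the same APIM endpoint. A minimal sketch; the deployment name and message content are placeholders:

```python
# Sketch: chat completion through the same APIM endpoint, so the call appears
# under the chatcompletions_create operation in the gateway logs.
# The deployment name ("modelname") and message content are placeholders.
response = openai.ChatCompletion.create(
    engine="modelname",
    messages=[{"role": "user", "content": "prompt text"}],
    temperature=1,
    max_tokens=200,
)
```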

Demo
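The following KQL query against the ApiManagementGatewayLogs table summarizes prompt, completion, and total token usage by caller IP address and model: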

ApiManagementGatewayLogs
| where tolower(OperationId) in ('completions_create','chatcompletions_create')
| where ResponseCode  == '200'
| extend modelkey = substring(parse_json(BackendResponseBody)['model'], 0, indexof(parse_json(BackendResponseBody)['model'], '-', 0, -1, 2))
| extend model = tostring(parse_json(BackendResponseBody)['model'])
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend completiontokens = parse_json(parse_json(BackendResponseBody)['usage'])['completion_tokens']
| extend totaltokens = parse_json(parse_json(BackendResponseBody)['usage'])['total_tokens']
| extend ip = CallerIpAddress
| where model !=  ''
| summarize
    sum(todecimal(prompttokens)),
    sum(todecimal(completiontokens)),
    sum(todecimal(totaltokens)),
    avg(todecimal(totaltokens))
    by ip, model

(Screenshot: token usage summarized by caller IP and model)
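The next query extracts the model, the prompt token count, and the first 100 characters of the returned choice from each successful completion: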

ApiManagementGatewayLogs
| where tolower(OperationId) in ('completions_create','chatcompletions_create')
| where ResponseCode  == '200'
| extend model = tostring(parse_json(BackendResponseBody)['model'])
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend prompttext = substring(parse_json(parse_json(BackendResponseBody)['choices'])[0], 0, 100)

(Screenshot: sample query results)

Resources

Frequently Asked Questions