LangTrace - Trace Attributes

This repository hosts the JSON schema definitions and the generated model code for both Python and TypeScript. It's designed to streamline the development process across different programming languages, ensuring consistency in data structure and validation logic. The repository includes tools for automatically generating model code from JSON schema definitions, simplifying the task of keeping model implementations synchronized with schema changes.
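For orientation, a span-attribute schema entry in this style might look like the following. This is a simplified hypothetical fragment for illustration only, not copied from the repository's schema files, which define many more properties:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "LLMSpanAttributes",
  "type": "object",
  "properties": {
    "llm.model": { "type": "string" },
    "llm.stream": { "type": "boolean" },
    "llm.prompts": { "type": "string" }
  },
  "required": ["llm.model"]
}
```

The model generators read definitions like this and emit a matching Python class and TypeScript interface, so both languages enforce the same structure.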

Repository Structure

/
├── schemas/                      # JSON schema definitions
│   └── openai_span_attributes.json
├── scripts/                      # Shell scripts for model generation
│   └── generate_python.sh
├── generated/                    # Generated model code
│   ├── python/                   # Python models
│   └── typescript/               # TypeScript interfaces
├── package.json
├── requirements.txt
├── README.md
└── .gitignore

Prerequisites

Before you begin, make sure you have the following installed on your system:

  - Python 3 and pip, used by the Python model generation script (dependencies are listed in requirements.txt)
  - Node.js and npm, used for TypeScript interface generation (dependencies are listed in package.json)

Generating Models

Python Models

To generate Python models from a JSON schema, use the generate_python.sh script located in the scripts directory. This script takes the path to a JSON schema file as an argument and generates a Python model in the generated/python directory.

./scripts/generate_python.sh schemas/llm_span_attributes.json
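For a feel of what the generated code enforces, the sketch below imitates a generated Python model with a stdlib dataclass. The class name `LLMSpanAttributes` and the exact validation behavior are assumptions for illustration; the real generated code depends on the generator and the schema file used:

```python
import json
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a schema-generated model class.
# The real generated code has many more fields and stricter validation.
@dataclass
class LLMSpanAttributes:
    model: str          # llm.model
    api: str            # llm.api
    stream: bool = False  # llm.stream
    prompts: str = "[]"   # llm.prompts, stored as a serialized JSON string

    def __post_init__(self):
        # Mirror the kind of checks a schema-generated model performs.
        if not isinstance(self.model, str):
            raise TypeError("llm.model must be a string")
        json.loads(self.prompts)  # must be valid serialized JSON


attrs = LLMSpanAttributes(
    model="gpt-4-0613",
    api="/chat/completions",
    prompts=json.dumps([{"role": "user", "content": "Hello"}]),
)
```

Because the Python model and the TypeScript interface are generated from the same schema, a payload that validates in one language should validate in the other.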

TypeScript Interfaces

To generate TypeScript interfaces from a JSON schema, use the generate_typescript.sh script located in the scripts directory. It also takes the path to a JSON schema file as an argument and generates a TypeScript interface in the src/typescript/models directory.

(cd src/typescript && npm i)
./scripts/generate_typescript.sh schemas/llm_span_attributes.json

OpenTelemetry Semantic Attributes

| Service Type | Name | Type/Schema | Description |
| --- | --- | --- | --- |
| LLM | llm.prompts | [{role: string, content: string}] | Captures the input messages given to the LLM. It includes the prompt with role "system" and any "user" and "assistant" messages along with the history.<br>Notes:<br>1. Prompts are standardized for every LLM vendor.<br>2. The "system" role always represents the system prompt passed. Ex: the preamble parameter passed to the Cohere API is appended to the system prompt and captured within llm.prompts. |
| LLM | llm.responses | [{role: string, content: string}] | Captures the output messages produced by the LLM.<br>Notes:<br>1. For image generation, content is an object that has a 'url' property (the URL of the image) and any other properties attached by the LLM vendor.<br>2. For tool calling, the list includes role, content and additional properties like tool_id depending on the LLM vendor. |
| LLM | llm.token.counts | {input_tokens: number, output_tokens: number, total_tokens: number} | Captures the token counts for the request, including input, output and total tokens.<br>Notes:<br>1. In streaming mode, some LLM vendors such as OpenAI do not return token counts, so this metric is calculated for each stream chunk using the tiktoken library and may not be accurate.<br>2. For Cohere, this captures the billed units, and also captures search_units when search capabilities are used. |
| LLM | llm.api | string | The endpoint being invoked. Ex: /chat/completions |
| LLM | llm.model | string | The model used for the call. The model is captured from the response, not the request, because the response carries the accurate model name. Ex: passing "gpt-4" in the request can result in "gpt-4-0613" in the response depending on the version of gpt-4 being used, which is a more accurate description of the model used for the call. |
| LLM | llm.temprature | number | The temperature setting used. |
| LLM | llm.top_p | number | Top P setting. |
| LLM | llm.top_k | number | Top K setting.<br>Note:<br>1. For LLMs that support top_n, the argument is captured in this attribute, as top_k and top_n represent the same thing. |
| LLM | llm.user | string | An LLM request parameter for identifying the user originating the request. Not to be confused with the user.id attribute passed to the langtrace SDK using the with_additional_attributes option. |
| LLM | llm.system.fingerprint | string | The system fingerprint parameter passed to the API. |
| LLM | llm.stream | boolean | Whether or not streaming is used. |
| LLM | llm.encoding.formats | [string] | Mainly applies to embedding models. List of encoding formats used for embedding. |
| LLM | llm.dimensions | string | The number of dimensions the resulting output embeddings should have. |
| LLM | llm.generation_id | string | Captures the generation_id from a response, if any. |
| LLM | llm.response_id | string | Captures the response_id from a response, if any. |
| LLM | llm.citations | [object] | List of citations from Cohere's response. Serialized as is, without any mutation to apply standardization. See Cohere's documentation on Documents and Citations. |
| LLM | llm.documents | [object] | Serialized list of documents passed to Cohere's rerank API. Primarily applies to retrieval models; serialized as is, without any mutation to apply standardization. |
| LLM | llm.frequency_penalty | string | Frequency penalty, if passed. |
| LLM | llm.presence_penalty | string | Presence penalty, if passed. |
| LLM | llm.connectors | [object] | Applies mainly to Cohere. Serialized directly without mutation. |
| LLM | llm.tools | [object] | The list of tools or functions available for the LLM to decide on. No standardization is applied to the schema; serialized as is for different LLM vendors. |
| LLM | llm.tool_results | [object] | For LLM vendors that require tool_results to be passed as a separate request parameter, e.g. Cohere. For OpenAI, tool results are part of the messages parameter and are captured within llm.prompts. |
| LLM | llm.embedding_inputs | [string] | Captures the input strings provided to the embedding model. |
| LLM | llm.embedding_dataset_id | string | Applies only to Cohere. |
| LLM | llm.embedding_input_type | string | Applies only to Cohere. |
| LLM | llm.embedding_job_name | string | Applies only to Cohere's embed_job API. |
| LLM | llm.retrieval.query | string | Query passed to the retrieval model. Ex: Cohere Rerank |
| LLM | llm.retrieval.results | [string] | Serialized array of objects returned by a retrieval model, usually including the score and the index of the documents passed. |
| VectorDB | server.address | string | Captures the DB server address, if found. |
| VectorDB | db.operation | string | Operations of a vector DB: add, delete, query, peek etc. |
| VectorDB | db.system | string | Captures the DB: chromadb, pinecone etc. |
| VectorDB | db.namespace | string | Namespace of the database. |
| VectorDB | db.index | string | Index passed to the database, if any. |
| VectorDB | db.collection.name | string | Captures the name of the collection storing the vectors that the operation queries. |
| VectorDB | db.pinecone.top_k | string | Captures the top_k value for KNN search. |
| VectorDB | db.chromadb.embedding_model | string | Captures the embedding model used with chromadb. |
| Framework | langchain.task.name | string | Short term indicating what task the framework is performing. The names are framework specific. Currently one of: load_pdf, vector_store, split_text, retriever, prompt, runnable, runnablepassthrough, jsonoutputparser, stroutputparser, listoutputparser, xmloutputparser. |
| Framework | langchain.inputs | string | Serialized inputs to the function call. |
| Framework | langchain.outputs | string | Serialized outputs of the function call. |
| Framework | llamaindex.task.name | string | Short term indicating what task the framework is performing. Currently one of: query, retrieve, extract, aextract, load_data, chat, achat. |
| Framework | llamaindex.inputs | string | Serialized inputs to the function call. |
| Framework | llamaindex.outputs | string | Serialized outputs of the function call. |
| Langtrace | user.feedback.rating | number | Useful for capturing the feedback provided by the application's user for an LLM's response. Ex: a user hitting thumbs up or down on a chatbot's response. |
| Langtrace | user.id | string | Application specific; can optionally be passed using the with_additional_attributes option from the SDK to tie users to requests. More details: Langtrace Trace User Feedback. |
| Langtrace | langtrace.testId | string | Unique id of the test generated within langtrace for capturing requests to a specific test bucket. Useful for evaluating a set of requests against a specific test. Ex: a test measuring factual accuracy. |
| Langtrace | langtrace.service.name | string | Captures the service name. Ex: openai, llamaindex etc. |
| Langtrace | langtrace.service.type | string | Captures the service type. One of:<br>- LLM<br>- VectorDB<br>- Framework |
| Langtrace | langtrace.service.version | string | Version of the library being used. Ex: 3.0.0 represents version 3.0.0 of the openai Python library. |
| Langtrace | langtrace.sdk.name | string | The Langtrace SDK generating this span. Currently typescript or python. |
| Langtrace | langtrace.version | string | Langtrace SDK version. |
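As a concrete illustration of the conventions above, the attributes of a single chat-completion span could look like the following. This is a hand-written sketch with example values, not output captured from the SDK; note that structured values such as llm.prompts are stored as serialized JSON strings, as the table describes:

```python
import json

# Example attribute payload for one LLM span (hypothetical values).
span_attributes = {
    "langtrace.service.name": "openai",
    "langtrace.service.type": "LLM",
    "langtrace.sdk.name": "python",
    "llm.api": "/chat/completions",
    "llm.model": "gpt-4-0613",
    "llm.stream": False,
    # Structured values are serialized to JSON strings.
    "llm.prompts": json.dumps([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is OpenTelemetry?"},
    ]),
    "llm.token.counts": json.dumps(
        {"input_tokens": 25, "output_tokens": 150, "total_tokens": 175}
    ),
}

# Consumers deserialize the structured fields back into objects.
prompts = json.loads(span_attributes["llm.prompts"])
```

Keeping structured fields as JSON strings keeps every attribute a primitive OpenTelemetry value, at the cost of an explicit json.loads on the consumer side.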

Contributing

Contributions are welcome! If you'd like to add a new schema or improve the existing model generation process, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or fix.
  3. Make your changes.
  4. Test your changes to ensure the generated models are correct.
  5. Submit a pull request with a clear description of your changes.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.