Autonomous HR Chatbot built using ChatGPT, LangChain, Pinecone and Streamlit

Companion Reading: Creating a (mostly) Autonomous HR Assistant with ChatGPT and LangChain’s Agents and Tools


TL;DR/Description


This is a prototype enterprise application: an autonomous agent that answers HR queries using the tools it has on hand. It was built with LangChain's agents and tools modules, uses Pinecone as the vector database, and is powered by ChatGPT (gpt-3.5-turbo). The front end is Streamlit with the streamlit_chat component.

Tools:

  1. Timekeeping Policies - A ChatGPT-generated sample HR policy document. Embeddings were created for this document using OpenAI’s text-embedding-ada-002 model and stored in a Pinecone index.
  2. Employee Data - A CSV file containing dummy employee data (e.g. name, supervisor, number of leaves). It is loaded as a pandas dataframe and manipulated by the LLM using LangChain's PythonAstREPLTool.
  3. Calculator - LangChain's calculator chain module, LLMMathChain.
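
To make the tool idea concrete, here is a minimal, library-free sketch of how a request can be routed to one of several named tools. It does not use LangChain and is not the repo's implementation: the Tool dataclass, the run_tool helper, the sample rows and the column names are all hypothetical, and the standard csv module stands in for the pandas dataframe. In the real app the LLM picks the tool and its input; here both are passed in by hand.

```python
import ast
import csv
import io
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical dummy data; the repo ships its own employee CSV file.
EMPLOYEE_CSV = """name,supervisor,vacation_leave
Alice Santos,Bob Cruz,12
Bob Cruz,Carla Reyes,8
"""

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate basic arithmetic by walking the AST instead of calling eval()."""

    def _eval(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return str(_eval(ast.parse(expression, mode="eval").body))


def employee_lookup(name: str) -> str:
    """Return the row for one employee, matched case-insensitively by name."""
    for row in csv.DictReader(io.StringIO(EMPLOYEE_CSV)):
        if row["name"].lower() == name.strip().lower():
            return (
                f"{row['name']}: supervisor={row['supervisor']}, "
                f"vacation_leave={row['vacation_leave']}"
            )
    return f"no employee named {name!r}"


@dataclass
class Tool:
    name: str
    description: str  # in an agent, this text is what the LLM reads to choose a tool
    func: Callable[[str], str]


TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        Tool("Employee Data", "Look up an employee record by name.", employee_lookup),
        Tool("Calculator", "Evaluate an arithmetic expression.", calculator),
    )
}


def run_tool(tool_name: str, tool_input: str) -> str:
    """Dispatch the input to the named tool, failing loudly on unknown names."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name].func(tool_input)
```

For example, run_tool("Employee Data", "alice santos") returns Alice's row, and run_tool("Calculator", "12 * 8") returns the product as a string. The calculator walks the parsed expression rather than calling eval(), which is one way to limit what a model-supplied string can execute.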

Sample Chat

[Screenshot: sample chat]

Sample Tool Use

[Screenshot: sample tool use]


How to use this repo

  1. Install Python 3.10 (Windows or Mac).
  2. Clone the repo to a local directory.
  3. Navigate to the local directory and run this command in your terminal to install all prerequisite modules: pip install -r requirements.txt
  4. Enter your own API keys in the hr_agent_backend_local.py file (or in hr_agent_backend_azure.py if you want to use the Azure version; uncomment it in the hr_agent_frontend.py file).
  5. Run streamlit run hr_agent_frontend.py in your terminal.
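
Step 4 has you paste the keys directly into the backend file. As an optional alternative, the sketch below reads them from environment variables so they stay out of version control. This is not how the repo is written: the load_api_keys helper and the three variable names are assumptions chosen for illustration, and the backend file may use different names.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; adjust them to match the backend file you use.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise naming every one that is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling load_api_keys() at the top of the backend file would then fail fast with a readable message if a key has not been set in the shell that launches Streamlit.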

Storing Embeddings in Pinecone

  1. Create a Pinecone account at pinecone.io (there is a free tier). Take note of your Pinecone API key and environment values.
  2. Run the notebook 'store_embeddings_in_pinecone.ipynb', replacing the Pinecone and OpenAI API keys (the latter for the embedding model) with your own.
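
For readers new to vector databases, the following toy sketch shows the kind of lookup a vector index performs once the embeddings are stored: rank the stored vectors by cosine similarity to the query vector and return the closest. It is plain Python, not the Pinecone client, and the chunk labels and 3-dimensional vectors are made up for illustration (text-embedding-ada-002 actually produces 1536-dimensional vectors).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k stored chunks most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for embedded chunks of the policy document.
INDEX = {
    "late-arrival policy": [0.9, 0.1, 0.0],
    "overtime policy": [0.1, 0.9, 0.1],
    "leave filing policy": [0.0, 0.2, 0.9],
}

# A made-up query vector that points mostly along the first dimension.
best_match = top_k([0.8, 0.2, 0.1], INDEX, k=1)[0][0]
```

Here best_match is the late-arrival chunk, because its vector points in nearly the same direction as the query. A hosted index performs the same ranking at scale, typically with approximate search rather than the exhaustive scan used here.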

Tech Stack


Azure OpenAI Service - the OpenAI service offering for Azure customers.
LangChain - development framework for building apps around LLMs.
Pinecone - the vector database for storing the embeddings.
Streamlit - used for the front end. Lightweight framework for deploying Python web apps.
Azure Data Lake - for landing the employee data CSV files. Any other cloud storage should work just as well (blob, S3 etc).
Azure Data Factory - used to create the data pipeline.
SAP HCM - the source system for employee data.

Video Demo


YouTube link


Author


Stephen Bonifacio

Feel free to connect with me on:

LinkedIn: https://www.linkedin.com/in/stephenbonifacio/
Twitter: https://twitter.com/Stepanogil