Movie Search Engine
This is a demo of a movie search engine. It is inspired by Andrej Karpathy's weekend hack and forked from the older project weaviate/weaviate-examples/movies-search-engine.
This project supports three types of movie search: keyword-based (BM25), semantic, and hybrid. It can also retrieve movies similar to a selected one.
Read more on the related blog.
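To give a rough feel for how a hybrid search can blend keyword and semantic results, here is a toy sketch in plain Python. It is not Weaviate's implementation and not code from this project: the function names (`min_max`, `hybrid_rank`), the min-max normalization, and the sample scores are all assumptions made for illustration. The `alpha` weight mirrors the usual convention that 0 means keyword-only and 1 means vector-only.

```python
def min_max(scores):
    """Rescale a {title: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every result as equally relevant.
        return {title: 1.0 for title in scores}
    return {title: (s - lo) / (hi - lo) for title, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score sets; alpha=0 is keyword-only, alpha=1 is vector-only."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    titles = set(kw) | set(vec)
    # A title missing from one result set contributes 0 for that side.
    fused = {
        t: alpha * vec.get(t, 0.0) + (1 - alpha) * kw.get(t, 0.0)
        for t in titles
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for the query "alien first contact".
keyword_scores = {"Alien": 7.2, "Aliens": 5.1, "Arrival": 0.4}      # BM25-style
vector_scores = {"Arrival": 0.91, "Alien": 0.80, "Contact": 0.77}   # similarity

for title, score in hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    print(f"{title}: {score:.3f}")
```

Moving `alpha` toward 0 favors exact term matches, while moving it toward 1 favors movies whose descriptions are close in meaning to the query even when they share no words with it.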
Prerequisites
- Docker
- Python
- The environment variables $OPENAI_API_KEY, $WEAVIATE_API_KEY, and $WEAVIATE_URL must be set. If you are running Weaviate via Docker, the WEAVIATE_URL is "http://localhost:8080" and no WEAVIATE_API_KEY is needed.
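The environment variable rules above can be checked up front with a few lines of Python. This is a minimal sketch, not part of the project: the helper name `load_weaviate_settings` is hypothetical, and falling back to the local Docker URL when WEAVIATE_URL is unset is an assumption.

```python
import os
from urllib.parse import urlparse

LOCAL_URL = "http://localhost:8080"  # URL used when Weaviate runs via Docker


def load_weaviate_settings(env=None):
    """Read the three variables and enforce the rules from the prerequisites."""
    env = os.environ if env is None else env

    openai_key = env.get("OPENAI_API_KEY")
    if not openai_key:
        raise ValueError("OPENAI_API_KEY is not set")

    # Assumption: default to the local Docker instance if no URL is given.
    url = env.get("WEAVIATE_URL", LOCAL_URL)
    is_local = urlparse(url).hostname in ("localhost", "127.0.0.1")

    weaviate_key = env.get("WEAVIATE_API_KEY") or None
    if not is_local and weaviate_key is None:
        raise ValueError("WEAVIATE_API_KEY is required for a non-local WEAVIATE_URL")

    return {
        "openai_api_key": openai_key,
        "weaviate_url": url,
        "weaviate_api_key": weaviate_key,
    }


if __name__ == "__main__":
    try:
        settings = load_weaviate_settings()
        print("Weaviate URL:", settings["weaviate_url"])
    except ValueError as err:
        print("Configuration problem:", err)
```

Failing early with a clear message is usually easier to debug than a connection error that only appears partway through the data import.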
Setup instructions
Follow these steps to reproduce the example:
- Set up a virtual environment
python -m venv .venv
source .venv/bin/activate
- Set your OPENAI_API_KEY in the docker-compose.yml file, then run the following command to start Weaviate with Docker
docker compose up -d
- Run the following command in the project directory to install all required dependencies
pip install -r requirements.txt
- Run the following command to add all the data objects. You can change the dataset path at line 115 of the script if necessary, and decrease the number of data objects at line 119 so that the import takes less time.
python add_data.py
- After adding the data, run the following command to install all required Node modules.
npm install
- After adding the data and installing the modules, run the following command and navigate to http://localhost:3000/ to start searching
npm run start
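For the data import step, capping the number of objects is a quick way to shorten a first run. The sketch below shows one common way to read only the first rows of a CSV file. It is illustrative only and does not reproduce add_data.py: the helper name `read_first_rows` and the inline sample data are assumptions, and only the 'Id' and 'Name' column names come from the dataset description.

```python
import csv
import io
from itertools import islice


def read_first_rows(csv_file, limit):
    """Return at most `limit` rows from an open CSV file as dictionaries."""
    reader = csv.DictReader(csv_file)
    # islice stops reading after `limit` rows, so large files are not fully loaded.
    return list(islice(reader, limit))


# Stand-in for the real dataset file, kept in memory so the sketch is self-contained.
sample = io.StringIO("Id,Name\n1,Alien\n2,Arrival\n3,Contact\n")
rows = read_first_rows(sample, limit=2)
print([row["Name"] for row in rows])
```

Since embedding costs scale with the number of objects, a small limit is also a cheap way to test the whole pipeline before importing the full dataset.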
Large Language Model (LLM) Costs
This project utilizes OpenAI models. Be advised that the usage costs for these models will be billed to the API access key you provide. Primarily, costs are incurred during data embedding. The default vectorization engine for this project is Ada v2.
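Because embedding cost grows with the total number of tokens sent, a back-of-the-envelope estimate before importing can be useful. The sketch below is only arithmetic: the function name is hypothetical, and both the average token count and the per-token price are placeholder values to be replaced with your own measurements and OpenAI's current published pricing.

```python
def estimate_embedding_cost(num_objects, avg_tokens_per_object, usd_per_1k_tokens):
    """Estimate the one-time embedding cost in USD for a dataset import."""
    total_tokens = num_objects * avg_tokens_per_object
    return total_tokens / 1000 * usd_per_1k_tokens


# Placeholder inputs: roughly the dataset size from this README, an assumed
# average of 300 tokens per movie, and an assumed price per 1,000 tokens.
cost = estimate_embedding_cost(
    num_objects=48_000,
    avg_tokens_per_object=300,
    usd_per_1k_tokens=0.0001,
)
print(f"Estimated embedding cost: ${cost:.2f}")
```

Search queries are also embedded at query time, but each query is short, so the initial import typically dominates the bill.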
Project Architecture
This project is built on three primary components:
- Weaviate Database: You have the option to host on Weaviate Cloud Service (WCS) or run it locally.
- Frontend: HTML, CSS, JavaScript
- Backend: Node.js
Dataset
- 48,000+ movies dataset (License: CC0: Public Domain) for the columns: 'Id', 'Name', 'PosterLink', 'Genres', 'Actors', 'Director', 'Description', 'DatePublished', and 'Keywords'
- Wikipedia Movie Plots (License: CC BY-SA 4.0), for the column 'Plot'
Open Source Contributions
Your contributions are always welcome! Feel free to share ideas and feedback, or to open issues and bug reports if you find any problems. Visit our Weaviate Community Forum if you need any help!