About ♊️ GemiNews 📰

Self: palladius/gemini-news-crawler (public)

This is a news slurper that collects news in real time and, hopefully, feeds it to an LLM as RAG knowledge.

Apps are on Cloud Run:

Description

How can we keep an LLM up to date with today’s news? Gen AI is great at answering questions... about the past. Once an LLM has been trained, the most practical way to give it newer knowledge is RAG (retrieval-augmented generation). How about crawling the web for the latest news, using Gemini for multimodal extraction, and offering summaries by your favorite topic? It all gets more exciting thanks to Andrei’s langchainrb gem.
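To make the RAG idea concrete, here is a minimal sketch in plain Ruby of how freshly crawled news could be placed in front of a question. It is an illustration under assumptions, not the app's actual code: `NewsItem`, `select_context` and `build_rag_prompt` are hypothetical names, the sample headlines are placeholders, and the topic filter is a naive substring match rather than an embedding lookup.

```ruby
# Hypothetical stand-in for a crawled article; the real Article model may differ.
NewsItem = Struct.new(:title, :summary, :published_at, keyword_init: true)

# Keep only items that mention the topic, newest first, capped at `limit`.
def select_context(items, topic:, limit: 3)
  needle = topic.downcase
  items
    .select { |item| "#{item.title} #{item.summary}".downcase.include?(needle) }
    .sort_by(&:published_at)
    .reverse
    .first(limit)
end

# Assemble a prompt that places the retrieved news before the question.
def build_rag_prompt(question, context_items)
  context = context_items.map do |item|
    "- [#{item.published_at.strftime('%Y-%m-%d')}] #{item.title}: #{item.summary}"
  end.join("\n")

  <<~PROMPT
    Answer the question using only the news below.

    News:
    #{context}

    Question: #{question}
  PROMPT
end

# Placeholder data, for illustration only.
SAMPLE_ITEMS = [
  NewsItem.new(title: "Placeholder headline about Ruby", summary: "First sample summary.",
               published_at: Time.utc(2024, 1, 10)),
  NewsItem.new(title: "Placeholder headline about cooking", summary: "Second sample summary.",
               published_at: Time.utc(2024, 1, 12)),
  NewsItem.new(title: "Another placeholder", summary: "Mentions ruby in the summary.",
               published_at: Time.utc(2024, 1, 15))
].freeze

picked = select_context(SAMPLE_ITEMS, topic: "ruby", limit: 2)
puts build_rag_prompt("What is new in Ruby?", picked)
```

The resulting string would then be sent to the LLM of your choice; that call is deliberately left out here because it depends on the client library and its configuration.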

Features

App Architecture

Demos

4 juicy demos are available under webapp/docs/demo/:

https://github.com/palladius/gemini-news-crawler/blob/main/webapp/docs/demo/DEMO.md

Other Ideas

My idea is to bring slides and a demo, all done in Ruby, leveraging nokogiri, langchainrb, and possibly some capabilities that Andrei is now building into Langchainrb (*).

Slides: explain the overall idea, empathise with audience, show architecture diagram, why we’re here, and make people laugh.

My idea is to build a demo in two parts:

Possibly, retrieve similar pictures/articles based on the questions (embedding style).
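Embedding-style retrieval usually means ranking stored vectors by their similarity to the question's vector. The sketch below shows that ranking with cosine similarity in plain Ruby. It assumes embeddings have already been computed elsewhere; the toy 3-dimensional vectors, the titles, and the names `cosine_similarity` and `most_similar` are all made up for illustration.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Return the `top_k` [title, score] pairs closest to the query, best first.
def most_similar(query_embedding, embeddings_by_title, top_k: 2)
  embeddings_by_title
    .map { |title, embedding| [title, cosine_similarity(query_embedding, embedding)] }
    .sort_by { |_title, score| -score }
    .first(top_k)
end

# Toy vectors standing in for real embeddings (which have hundreds of dimensions).
TOY_EMBEDDINGS = {
  "Article about sports"   => [0.9, 0.1, 0.0],
  "Article about politics" => [0.1, 0.9, 0.1],
  "Article about football" => [0.8, 0.2, 0.1]
}.freeze

question_embedding = [1.0, 0.0, 0.0]
most_similar(question_embedding, TOY_EMBEDDINGS).each do |title, score|
  puts format("%.3f  %s", score, title)
end
```

In a real deployment the ranking would more likely be delegated to the database or a vector store rather than done in Ruby, but the scoring idea is the same.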

App info

TODOs

Autofeed now

  1. cd crawler/ ; make crawl-a-lot (or make crawl-continuously). This populates the XML every 15 minutes (any faster and I get kicked out by the robots :P ) and slurps articles from the XML. I check the XML into git, but not the articles: there are too many.
  2. cd webapp ; bundle exec make seed-forever (it won't work without bundle exec). This seeds the info from (1) into ActiveRecord, and hence the DB.
  3. Call an async routine to populate the embeddings, although since v0.1.5 this should happen automatically before an Article is saved.
  4. This used to work: cd webapp ; echo Article.compute_embeddings_for_all | rails c. Note: since I moved from Array to Vector, this script is now BROKEN.
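The 15-minute cadence from step 1 can be pictured as a simple polling loop. This is only a sketch of the idea in plain Ruby; the real make crawl-continuously target may work differently. The method name, `max_runs` and `sleeper` are hypothetical knobs added here so the loop can be exercised without actually waiting.

```ruby
# Run the given block repeatedly, pausing `interval` seconds between runs.
# A failing run is logged and does not stop the loop.
# Returns the number of runs attempted.
def crawl_continuously(interval: 15 * 60, max_runs: nil, sleeper: ->(seconds) { sleep(seconds) })
  runs = 0
  loop do
    begin
      yield runs
    rescue StandardError => e
      warn "crawl ##{runs} failed: #{e.message}"
    end
    runs += 1
    break if max_runs && runs >= max_runs

    sleeper.call(interval)
  end
  runs
end

# Usage: two quick runs with no pause, just to show the shape.
crawl_continuously(interval: 0, max_runs: 2) do |run|
  puts "crawl ##{run}: fetch feeds and save XML here"
end
```

Leaving `max_runs` as nil makes the loop run until the process is interrupted, which is the closest match to a "continuous" crawl.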

Bibliography