TLGS - Totally Legit Gemini Search

Overview

TLGS is a search engine for Gemini. It's slightly overengineered for what it currently is and uses weird tech, and I'm proud of that. The current code base is kinda messy; I promise to clean it up. The main features/characteristics are as follows:

As of now, indexing of news sites, RFCs, and documentation is mostly disabled. It will likely be enabled once I have the means and resources to scale the setup.

Using this project

Requirements

Building and running the project

To build the project, you'll need a fully C++20 capable compiler. The following compilers should work as of writing this README:

Install all dependencies, then run the commands:

mkdir build
cd build
cmake ..
make -j

Creating and maintaining the index

To create the initial index:

  1. Initialize the database: ./tlgs/tlgs_ctl/tlgs_ctl ../tlgs/config.json populate_schema
  2. Place the seed URLs into seeds.text
  3. In the build folder, run ./tlgs/crawler/tlgs_crawler -s seeds.text -c 4 ../tlgs/config.json

Now the crawler will start crawling Geminispace while also updating outdated indices (if any). To update an existing index, run:

./tlgs/crawler/tlgs_crawler -c 2 ../tlgs/config.json
# -c is the maximum concurrent connections the crawler will make

NOTE: TLGS's crawler is distributable, so you can run multiple instances in parallel. Some instances may drop out early towards the end of crawling, but this does not affect the result of the crawl.

Running the capsule

openssl req -new -subj "/CN=my.host.name.space" -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 -days 36500 -nodes -out cert.pem -keyout key.pem
cd tlgs/server
./tlgs_server ../../../tlgs/server_config.json

Via systemd

sudo systemctl start tlgs_server
sudo systemctl start tlgs_crawler

Server config

The custom_config.tlgs section in server_config.json (installed at /etc/tlgs/server_config.json) contains configuration for the TLGS server. Besides the usual Drogon config options, custom_config changes the properties of TLGS itself. Currently supported options are:

ranking_algo

The ranking algorithm TLGS uses to rank pages in search results. The ranking is then combined with the text match score to produce the final search rank. Currently supported values are hits and salsa, referring to the HITS and SALSA ranking algorithms. It defaults to salsa if no value is provided.

SALSA runs slightly faster than HITS for large search results. Both the literature and empirical experience suggest that SALSA provides better ranking, so we switched from HITS to SALSA.
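
For readers unfamiliar with link-based ranking, here is a minimal sketch of textbook HITS in C++. It is not TLGS's actual implementation: the HitsScores struct, the hits function, and the fixed iteration count are all hypothetical names and choices made for illustration. SALSA follows the same hub/authority structure but, roughly speaking, divides each contribution by the out-degree or in-degree of the page it comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type for illustration; not taken from the TLGS code base.
struct HitsScores {
    std::vector<double> hub;        // high when a page links to good authorities
    std::vector<double> authority;  // high when a page is linked from good hubs
};

// Scale a vector to unit Euclidean length; an all-zero vector is left as-is.
static void normalize(std::vector<double>& v) {
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    if (sum_sq == 0.0) return;
    const double norm = std::sqrt(sum_sq);
    for (double& x : v) x /= norm;
}

// Run a fixed number of HITS iterations over a directed link graph.
// `links` holds (source, target) page indices in the range [0, page_count).
HitsScores hits(std::size_t page_count,
                const std::vector<std::pair<std::size_t, std::size_t>>& links,
                int iterations = 50) {
    HitsScores s{std::vector<double>(page_count, 1.0),
                 std::vector<double>(page_count, 1.0)};
    for (int it = 0; it < iterations; ++it) {
        // Authority update: sum the hub scores of every page linking in.
        std::vector<double> authority(page_count, 0.0);
        for (const auto& [from, to] : links) {
            assert(from < page_count && to < page_count);
            authority[to] += s.hub[from];
        }
        normalize(authority);

        // Hub update: sum the fresh authority scores of every page linked to.
        std::vector<double> hub(page_count, 0.0);
        for (const auto& [from, to] : links)
            hub[from] += authority[to];
        normalize(hub);

        s.authority = std::move(authority);
        s.hub = std::move(hub);
    }
    return s;
}
```

As a quick check, calling hits(3, {{0, 2}, {1, 2}}) on a graph where pages 0 and 1 both link to page 2 gives page 2 all of the authority weight, while pages 0 and 1 share the hub weight equally.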

"ranking_algo": "salsa"

TODOs