Lemmy (Stats) Data Exporter

About This Project

This project aims to act as a bridge between Lemmy's PostgreSQL database and InfluxDB, primarily to track various metrics that are important to instance administrators. It is intended to be used with Grafana; however, you're of course more than welcome to use the data with your favorite Influx-consuming tool.

You can preview what the final result looks like here - since it's a "snapshot" version, some components might look a bit off (such as a "Page not found" title and, for some reason, the "email verification breakdown" panel - which is the same type of panel as "User Breakdown", with the same data from "Email Verification Overview").

Motivations

I've built this project because I want to be a responsible Lemmy instance administrator. The Fediverse (Wikipedia) relies on trust between instance administrators to keep their instances clean. At the time of writing, Lemmy doesn't have many built-in tools to audit what is occurring on your instance. The data is certainly there, though, in its PostgreSQL database! But no one really wants to be stuck at the psql command line (well, I certainly don't), so I figured that if I had a way to visualize the data, it would give me an easy way to keep tabs on my Lemmy instance.

Also, I've been wanting to get a better grasp on Rust (the codebase may reflect this...) - which also happens to be the backend language that Lemmy uses! At the same time, aside from a premade dashboard here and there, I've never made my own Grafana dashboard (or interacted with InfluxDB, for that matter), so this project gave me quite a few different topics to learn from.

The spirit of the Fediverse is all about being open, so of course I'm happy to share this project with anyone who's interested!

Sounds Great! Where do I get started?

Before we get to the good part, if you'll just allow me a quick disclaimer:

Please utilize this tool responsibly and carefully. I'm sure I do not need to give instance admins the usual "back up your stuff!" talk, but please - make sure you have backups of any important data. There are no plans for this tool to use anything other than SELECT queries with the database, but who knows, maybe a cosmic ray hits your system and flips a bit, causing that SELECT to turn into a DELETE... okay, that can't actually happen for numerous reasons - but still, please take backups. You should be doing that anyway!
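To that end, here's one quick way to snapshot a dockerized Lemmy database - a sketch that assumes the postgres service name and lemmy database/user names used later in this guide, so adjust as needed:

# -T disables the pseudo-TTY so terminal control characters don't end up in the dump
docker compose exec -T postgres pg_dump -U lemmy lemmy > lemmy-backup-$(date +%F).sql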

Requirements

To utilize this project as intended, you'll need the following:

- A Lemmy instance, along with access to its PostgreSQL database
- An InfluxDB instance for the exported metrics to land in
- Grafana (or another Influx-consuming tool) to visualize the data
- A recent version of the stable Rust toolchain, if you're compiling from source

Compiling from Source

Whenever you want to utilize a tool built by someone else to interact with a piece of software you're running, I find it a good idea to audit the project's source code (even if you don't know Rust, you can take a look at the SQL queries made in src/main.rs) and build it from source. As such, I'm going to place this section first - it's easy anyway!

As mentioned above, you'll need a recent version of the stable Rust toolchain. My favorite method for installing Rust is with Rustup, since if you install it from your distribution's package manager you may end up with an older version (I've personally been bitten by this in the past...). Speaking of which, you can try to compile this on Windows or macOS - I don't see why it wouldn't work, but I haven't personally tried it myself.

Then, clone this repository somewhere and run the cargo build --release command. You'll then have a handy binary over at target/release/lemmy-data-exporter!
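In full, that looks something like the following (the GitHub URL here is an assumption based on the DockerHub image name mentioned later - substitute wherever you're actually cloning this from):

# Repository URL assumed; replace with the actual location
git clone https://github.com/russjr08/lemmy-stats-exporter.git
cd lemmy-stats-exporter
cargo build --release
# The compiled binary:
ls -l target/release/lemmy-data-exporter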

Preparation

Whether you've compiled the project from source or downloaded a binary from the releases section, we're almost ready to go!

If you haven't already, you'll need to create an Influx database for your metrics/stats to go into. While providing instructions on installing InfluxDB is out of scope for this guide, that link should get you started! Once you have InfluxDB up and running, you should be able to connect to your instance with the influx command and run CREATE DATABASE YOUR_DB_NAME_HERE (the name of the database doesn't matter, but keep note of it somewhere).
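For example, with the InfluxDB 1.x influx CLI (metrics is just an example name here):

influx -execute 'CREATE DATABASE metrics'
# Double-check that it now exists
influx -execute 'SHOW DATABASES'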

Additionally, if you haven't already, you'll need to make sure that your Lemmy's PostgreSQL instance can be connected to from the system that you're running this utility on... Speaking of which, a couple more disclaimers before we proceed:

NOTE:

Do not continue, nor pass "go", until you've read the above two notes!

Continuing on, you'll need to make sure that you can connect to your Lemmy PostgreSQL container from the system that this tool is running on. Assuming that you're using a docker-compose.yml file for your Lemmy stack, and that this tool is going to be running on the same system as the rest of your Lemmy containers, open docker-compose.yml, look for the postgres section, and make the following modification:

postgres:
    image: postgres:15-alpine
    # this needs to match the database host in lemmy.hjson
    hostname: postgres
    ports:
      - "127.0.0.1:5432:5432" # <-- This is the line you need, along with the "ports:" declaration right above this line
    #    ^ Last warning, do NOT forget to explicitly bind it to localhost unless you're dead-sure of the potential ramifications from skipping doing so!
    # You'll probably have things below this line, do not touch those lines and keep them as they are
    # ...etc
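After saving that change, re-create the container with docker compose up -d so the new port binding takes effect. If you'd like to sanity-check it, pg_isready (part of the standard PostgreSQL client tools) can confirm the port is reachable from the host:

pg_isready -h 127.0.0.1 -p 5432
# You should see something along the lines of: "127.0.0.1:5432 - accepting connections"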

Now, you'll need to create a separate database user for this tool (you don't strictly have to, but I really, really recommend making a separate one as a general best practice). This user will only be granted SELECT permissions, so that even if you or someone else were to change all of the queries in this tool from SELECT to DELETE, it wouldn't have any effect - a good enough safeguard. To do so, you'll need access to the psql CLI tool, which comes with the postgres container if you do not already have it installed elsewhere:

You have backed up your database, right? Please do not proceed with running any SQL commands until you've done so!

Run the following commands in the same folder as your docker-compose.yml file:

docker compose exec -it postgres /bin/bash

That will get you into the container, now to enter the postgres CLI:

psql -U lemmy

Now to create a new user called metrics (it can be whatever you'd like, but keep note of it somewhere, along with the password):

CREATE USER metrics WITH PASSWORD 'INSERT_A_SECURE_PASSWORD_HERE';

Next, grant it SELECT access to your lemmy database:

GRANT SELECT ON ALL TABLES IN SCHEMA public TO metrics;
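One caveat worth knowing: GRANT ... ON ALL TABLES only covers tables that exist right now, so any tables created later (say, by a Lemmy schema migration) won't be visible to metrics until you re-run that grant. If you'd rather not have to remember that, PostgreSQL's ALTER DEFAULT PRIVILEGES can extend the grant to future tables as well - run it while still connected as the lemmy user, since that's the role that creates Lemmy's tables:

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO metrics;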

Excellent! Now press Ctrl+d twice (the first time will exit the psql shell, and the second time will exit the postgres container's bash shell). Now, we're finally ready to start exporting the metrics!
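By the way, if you'd like to verify the new user works, you can try a quick read-only query from the host over the port you exposed earlier (this assumes you have the psql client installed on the host; person is one of Lemmy's core tables, but any table will do):

psql -h 127.0.0.1 -U metrics -d lemmy -c 'SELECT count(*) FROM person;'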

Running

It took a minute to get here, apologies! I try to be incredibly thorough so that you know exactly what you're doing. Those steps only have to be done once, though!

lemmy-data-exporter uses environment variables to instruct it on the various details it needs to bridge the connection between your PostgreSQL database and Influx database.

One last disclaimer... I think. My instance is nowhere near the size of the major-league Lemmy instances. While I believe these queries should be fairly efficient, I'm not a professional DBA. What I'm saying is: the time this tool takes to run will most likely depend on the size of your instance. When I run it on the same machine my instance is running on, it takes less than a full second. But it's fine, because we have backups! Right...?

The environment variables are as follows:

| Variable | Description | Example |
| --- | --- | --- |
| PG_DB_HOST | The IP/hostname of your Lemmy postgres DB | 127.0.0.1 |
| PG_DB_USER | The name of the user you created for this tool | metrics |
| PG_DB_PASS | The password of the user you created for this tool | hunter2 |
| PG_DB_NAME | The name of your lemmy database | lemmy |
| INFLUX_HOST | The IP/hostname of your Influx database | 127.0.0.1 |
| INFLUX_PORT | The port number of the HTTP API for Influx (it's probably 8086) | 8086 |
| INFLUX_NAME | The name of the Influx database you created earlier | metrics |
| SLEEP_DURATION | (Optional) The amount of time in seconds to sleep between runs | 60 |

At the moment, all of these variables (except SLEEP_DURATION) are required - if you miss one, the application will scream at you and walk out the door, leaving an error message in its wake.

About "SLEEP_DURATION": This flag sets the amount of time to sleep between runs. It's not required, and setting it to zero or leaving it out will keep the default behavior - which is quitting upon completion.

To start the application, place all of your variables before the invocation of the command. For example:

PG_DB_HOST=127.0.0.1 PG_DB_USER=metrics PG_DB_PASS=hunter2 PG_DB_NAME=lemmy INFLUX_HOST=127.0.0.1 INFLUX_PORT=8086 INFLUX_NAME=metrics ./lemmy-data-exporter
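Or, to keep it running continuously with a 60-second pause between runs, the same invocation with SLEEP_DURATION added:

SLEEP_DURATION=60 PG_DB_HOST=127.0.0.1 PG_DB_USER=metrics PG_DB_PASS=hunter2 PG_DB_NAME=lemmy INFLUX_HOST=127.0.0.1 INFLUX_PORT=8086 INFLUX_NAME=metrics ./lemmy-data-exporter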

NOTE: If you have any special characters in the values you're passing to these variables (such as PG_DB_PASS), then wrap the value in single quotes (') to prevent bash from expanding it. For example:

PG_DB_HOST=127.0.0.1 PG_DB_USER=metrics PG_DB_PASS='My@Secure!12978Password' PG_DB_NAME=lemmy INFLUX_HOST=127.0.0.1 INFLUX_PORT=8086 INFLUX_NAME=metrics ./lemmy-data-exporter

(Permission denied? Make the binary executable first: chmod +x lemmy-data-exporter)

If it worked, you'll get some output from the program, and then it'll exit, as it only runs the operation once (unless you set SLEEP_DURATION). We'll fix that in a moment!

Example Output:

Connecting to Postgres...
Found stats: LemmyStats {
    time: 2023-07-02T15:16:05.607633219Z,
    registered_users: 12, # This should be self explanatory, but it's the total number of users registered on your instance
    verified_users: 12, # The amount of users who've verified their email address
    unverified_users: 0, # The amount of users who have NOT verified their email address
    approved_users: 12, # The amount of registered users who had their accounts approved
    unapproved_users: 0, # The amount of registered users who have NOT had their accounts approved
    num_of_apps: 0, # The current number of pending application/sign up requests (yes, "apps" was probably a bad name... oh well)
    denied_users: 0, # The amount of applicants who were denied
    known_communities: 92, # The amount of communities your instance knows about
    known_instances: 1533, # The amount of instances your instance knows about
    known_comments: 155645, # The amount of comments your instance knows about
    known_posts: 13166, # The amount of posts your instance knows about
    comments_from_local: 250, # The amount of comments users on YOUR instance have made (whether locally, or remotely)
    posts_from_local: 4, # The amount of posts users on YOUR instance have made (whether locally, or remotely)
    upvotes_from_local: 3950, # The amount of upvotes (posts+comments) users on YOUR instance have cast (whether locally, or remotely)
    downvotes_from_local: 40, # The amount of downvotes (posts+comments) users on YOUR instance have cast (whether locally, or remotely)
}
Wrote to Influx: 
Done pushing Influx stats
Pushed stats to influx

Running via Docker [Compose]

There is an included docker-compose.yml file for your convenience, which has a ton of comments on how to use lemmy-stats-exporter as a dockerized application if that is your preference. Please be sure to read the comments there in full before blindly deploying it.

Of course, you can run this without compose by either building the included Dockerfile locally, or pulling the russjr08/lemmy-stats-exporter image from DockerHub. If you do this, I assume you are familiar with running containers outside of docker-compose, but I'd still have a look at the comments in docker-compose.yml just to be aware of any potential caveats.
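As a rough sketch (not a drop-in command), a plain docker run invocation might look like the following. Adjust every variable to your own setup; --network=host is my assumption here for letting the container reach services bound to 127.0.0.1 on the host (Linux only) - check docker-compose.yml's comments for the intended networking approach:

# --network=host is an assumption; see docker-compose.yml for caveats
docker run --rm --network=host \
  -e PG_DB_HOST=127.0.0.1 -e PG_DB_USER=metrics -e PG_DB_PASS=hunter2 \
  -e PG_DB_NAME=lemmy -e INFLUX_HOST=127.0.0.1 -e INFLUX_PORT=8086 \
  -e INFLUX_NAME=metrics \
  russjr08/lemmy-stats-exporter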

Automation

I'd imagine that you're not going to want to sit there and continuously run that command, because I certainly don't! To make things easier and automatic, I've created a systemd service/timer that you can use to run this in the background. Grab lemmy-data-exporter.service and lemmy-data-exporter.timer from the systemd folder and copy them into /etc/systemd/system/, adjusting any paths (and the environment variables discussed above) inside the service file to match your setup.

Now, once those are copied over, run systemctl daemon-reload to make systemd aware of the new units. You can then give it a try by running systemctl start lemmy-data-exporter.service. If it returns without an error, it most likely worked, but you can verify by running systemctl status lemmy-data-exporter.service (it is normal for the "Active" state to say "inactive (dead)", since it's triggered by the timer and isn't a long-lived process). Assuming that went well, enable the timer (so that it runs automatically) with systemctl enable lemmy-data-exporter.timer.
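Collected in one place, that sequence looks like this (sudo assumed for the commands that modify system state):

sudo systemctl daemon-reload
sudo systemctl start lemmy-data-exporter.service
systemctl status lemmy-data-exporter.service
# "Active: inactive (dead)" is normal here - the timer is what triggers it
sudo systemctl enable lemmy-data-exporter.timer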

Both units have been set up so that if, for whatever reason, docker.service isn't running (such as if it crashed, or you explicitly stopped it), the timer/service should automatically stop as well. Congrats, the hard part is over!

Now what?

Now that we've gotten some metrics out of the Lemmy database and into an Influx database, we can visualize them using a tool such as Grafana! I've also taken the liberty of exporting the Grafana dashboard I built into the grafana folder. If you didn't see the preview earlier, you can find a preview version of the dashboard over here.

Installing Grafana is also out of scope for this README, but once you do have Grafana installed, go to the bottom left and click Settings -> Data Sources, then Add Data Source.

Make sure to select InfluxDB as your source type, and then fill in the details of the InfluxDB instance you set up earlier (I'm not sure if the data source name impacts whether the dashboard import works; I named mine InfluxLemmy). Note: if you're running Grafana on a different server from InfluxDB, it will need a connection to Influx in order to actually read the data - you could arrange this with a firewall exception, a Wireguard tunnel, etc. Once the details are entered, click Save & test at the bottom, and it should confirm that your data source was added and that Grafana could connect to it.

Then go to the left sidebar and click Dashboards (the four-squares, almost Windows-logo-looking icon) -> Import. From there, you can either save the JSON file from GitHub onto your computer, or just copy it from the file viewer and paste it into the Import via panel json textbox.

The end result should look something like this, if all went well!

Wrapping up

Thanks for taking a look at my project! Hopefully it can help others! If you have any questions, please don't hesitate to reach out to me either via Matrix or just opening an issue here.

How can I contribute?

There are a few ways you can contribute!