<!--- SPDX-FileCopyrightText: 2022 Brian Calhoun <brian@bemorehuman.org> SPDX-License-Identifier: MIT -->

What does bemorehuman do?

bemorehuman is a recommendation engine, also called a recommender system. It recommends things such as books, movies, beer, or bands to people. You feed it implicit or explicit ratings data, purchase data, or click data, and bemorehuman builds a model of that dataset. Once the model is built and loaded into RAM, you can query the system through a REST interface and receive recommendations in real time.

Why is it useful?

People don't want to waste time on poor-quality recommendations. Often, when the word "recommendation" is used, what's really meant is "popular thing for your demographic." If you're lucky, that is. If you're not lucky, a recommendation can easily mean "different models of the thing you just bought" or "whatever we're paid to promote this week." Our software takes a different view. We think it's most important to base recommendations purely on what the individual receiving them might like. That way, traditional marketer-focused mechanisms are bypassed, and the focus stays on what people would find interesting or useful.

How it works

Here is a simple example of how bemorehuman works:

What's special about bemorehuman?

We focus on making bemorehuman the best recommender it can be, guided by one core principle: center on the person receiving the rec and what they would find interesting. Here's a list of things we think make us special. Some of these things we have in common with other recommenders, of course.

Where can I find out more?

For help getting started with the code:

Keep reading this README file or browse the code itself. You can also ask questions on our GitHub Discussions.

For motivations behind the source code:

See the MANIFESTO file.

For custom support, development, or training:

Contact us at hi@bemorehuman.org or visit https://bemorehuman.org

Contributing

You can help by commenting on or addressing the current issues. Feature requests are welcome, too. Check out the discussions and ask away. See our CONTRIBUTING file for more details.

To help foster a welcoming community, we've adopted the Contributor Covenant Code of Conduct.

Licensing

bemorehuman is open source, released under the MIT license. See the COPYING file or https://opensource.org/licenses/MIT for details. Contributions to this project are accepted under the same or a similar license. Individual files contain SPDX tags identifying their copyright and license information, which makes the project's licensing machine-readable. bemorehuman is fully compliant with the REUSE specification.

Installation instructions for bemorehuman

Steps involved:

  1. Download bemorehuman.
  2. Build the binaries.
  3. Create the working directory.
  4. (optional) Integrate with your own webserver.
  5. (optional but recommended installation verification) Download and prep Grouplens movie rating dataset.
  6. (optional but recommended installation verification) Run the test-accuracy binary and compare the results against a known working system.

Detailed instructions:

STEP 1: Download bemorehuman.

You can get bemorehuman at https://github.com/BeMoreHumanOrg/bemorehuman

For these instructions, let's say the source you clone or download/unpack resides at ~/src/bemorehuman

STEP 2: Build the binaries.

Background info

Build Dependencies

Directory structure

To build bemorehuman

Use the standard CMake build steps:

 cd ~/src/bemorehuman    # or wherever the bemorehuman source is on your machine
 mkdir build; cd build
 cmake ..
 cmake --build .
 sudo make install       # root is needed to install the include files, library, and binaries, and to create the config file under /etc

 # if you are using your own HTTP server such as nginx, add "-DUSE_FCGI=ON" to the first cmake command above:
 cmake -DUSE_FCGI=ON ..
 # if you want to use protobuf instead of the default JSON, add "-DUSE_PROTOBUF=ON" to the first cmake command above:
 cmake -DUSE_PROTOBUF=ON ..

Binaries built: these are the binaries created automatically by the cmake process above:

Additional build notes

STEP 3: Create the working directory.

sudo mkdir /opt/bemorehuman
sudo chown <user who'll run bemorehuman> /opt/bemorehuman

STEP 4 (Optional): Integrate with your own webserver.

By default, bemorehuman uses its own HTTP server, called hum. Hum is built only for recgen's needs: it handles POST requests from HTTP clients, but not GET requests. Because of this, and because it does very little error processing, please don't use hum in production. The instructions in this step apply only if you want to use your own webserver. If you're OK with using hum, skip to Step 5.

If you want to use your own webserver, make sure it can integrate with FastCGI. I like nginx. To install nginx on Debian or Ubuntu, run "sudo apt install nginx".

The default socket used for communication between recgen and the webserver is

/tmp/bemorehuman/recgen.sock

Your external webserver needs to know about this socket. For nginx, you can add the following clause to the server section of /etc/nginx/nginx.conf or /etc/nginx/sites-enabled/default:

listen 8888;
location ^~ /bmh {
    include /etc/nginx/fastcgi_params;
    fastcgi_pass  unix:/tmp/bemorehuman/recgen.sock;
    fastcgi_keep_conn on;
}

After making the above changes, restart your webserver. On Debian, it's:

sudo service nginx restart
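After restarting, you may want to confirm that recgen is actually listening on its FastCGI socket before sending requests through nginx. The C sketch below opens and immediately closes a Unix-domain connection to the default socket path from this step. It assumes recgen was built with -DUSE_FCGI=ON and is running; the helper name socket_accepting is hypothetical and not part of bemorehuman.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Default socket path between recgen and the webserver, as given in this step. */
#define RECGEN_SOCKET "/tmp/bemorehuman/recgen.sock"

/* Hypothetical helper: returns 1 if some process is accepting connections on
 * the Unix-domain socket at path, 0 otherwise. It connects and closes at once. */
int socket_accepting(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path)
        return 0;  /* path too long to fit in a Unix socket address */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}
```

Calling socket_accepting(RECGEN_SOCKET) while recgen is running should return 1. If it returns 0, check that recgen is running, that your user has permission to access the socket, and that the path matches the fastcgi_pass line in your nginx config.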

At this point bemorehuman is ready to use. However, I recommend you complete the following steps to verify the installation and make sure recommendations are working correctly.

The idea is that once bemorehuman is installed, you can test it with a known dataset and check that your results are in line with what we got here at Be More Human HQ.

STEP 5 (Optional but recommended): Download and prepare Grouplens/Movielens movie rating dataset.

NOTE: All times below are from an Intel Core i7-8559U NUC development machine running Debian Linux, with 20 GB of RAM and an SSD.

NOTE: "Grouplens" is the name of a University of Minnesota research lab, and "Movielens" is the name of its movie ratings project. I use the terms interchangeably. They've been providing movie ratings data for research purposes for a long time. I have no affiliation with either Grouplens or the university.

The web page that describes the dataset: https://grouplens.org/datasets/movielens/

The rest of these instructions assume you download the 25-million-rating dataset (ml-25m). The actual file to download:

https://files.grouplens.org/datasets/movielens/ml-25m.zip
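Inside ml-25m.zip, ratings.csv starts with the header userId,movieId,rating,timestamp, and star ratings run from 0.5 to 5.0 in half-star steps. Since the Step 6 results are reported on a 10-point scale, one plausible preparation is doubling each star rating to get integers from 1 to 10. The C sketch below parses one line that way. The struct and function names are hypothetical, and this is an assumption for illustration, not bemorehuman's own preparation procedure or input format.

```c
#include <stdio.h>

/* Hypothetical record: one MovieLens rating converted to a 1-10 integer scale. */
struct rating {
    long user_id;
    long movie_id;
    int rating10;  /* stars * 2: 0.5 stars -> 1, 5.0 stars -> 10 */
};

/* Parse one data line of ml-25m's ratings.csv ("userId,movieId,rating,timestamp").
 * Returns 1 on success; returns 0 for the header line or a malformed line. */
int parse_movielens_line(const char *line, struct rating *out)
{
    long user, movie, timestamp;
    double stars;

    if (sscanf(line, "%ld,%ld,%lf,%ld", &user, &movie, &stars, &timestamp) != 4)
        return 0;  /* the header "userId,..." fails the first %ld conversion */
    if (stars < 0.5 || stars > 5.0)
        return 0;  /* outside MovieLens's half-star range */

    out->user_id = user;
    out->movie_id = movie;
    out->rating10 = (int)(stars * 2.0 + 0.5);  /* round to the nearest whole point */
    return 1;
}
```

Reading ratings.csv line by line with fgets and calling parse_movielens_line on each line skips the header automatically; a 3.5-star rating becomes 7.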

To prepare the Movielens data:

STEP 6 (Optional but recommended): Run the test-accuracy binary and compare the results against a known working system.

To run bemorehuman

Expected Results:

Please use the following numbers as a ballpark guide. All numbers below are from a desktop development machine.

"**Across all 10 users we're evaluating, Mean Absolute Error (MAE) is 1.895062, or 18.950617 percent for the ones we held back Across all 10 users we're evaluating, random recs have MAE of 3.197531."

So you should expect to see similar results. Why similar and not exact? Because of the testing method: the users evaluated are chosen at random. For each of them, we hold back ratings we already know, try to predict them, and measure how far off the predictions are. MAE is the average distance between predicted and actual ratings, and the percentage expresses that distance relative to the rating scale. In the above example, an MAE of 1.89 means our predictions were just under 2 points away on average, on a 10-point scale. That translates to just under 1 point away on a 5-point scale, like the one-to-five-star scale people often use. For comparison, random recs score an MAE of about 3.2 for the same users.
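The MAE arithmetic can be checked with a short C sketch. It assumes, as the sample output suggests, that the percentage is the MAE divided by the top of the 10-point scale (1.895062 / 10 is about 18.95 percent). The function names are hypothetical; this is not the test-accuracy source.

```c
#include <stddef.h>

/* Mean absolute error: the average of |predicted - actual| over held-back ratings. */
double mean_absolute_error(const double *predicted, const double *actual, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double diff = predicted[i] - actual[i];
        sum += diff < 0.0 ? -diff : diff;  /* absolute error without libm */
    }
    return sum / (double)n;
}

/* MAE as a percentage of the top of the rating scale (scale_max = 10 for a 10-point scale). */
double mae_percent_of_scale(double mae, double scale_max)
{
    return mae * 100.0 / scale_max;
}

/* Re-express an MAE on another scale, e.g. from a 10-point scale to a 5-star scale. */
double mae_rescale(double mae, double from_max, double to_max)
{
    return mae * to_max / from_max;
}
```

With hypothetical predictions {7, 8, 9, 5} against held-back ratings {8, 6, 9, 4}, the absolute errors are 1, 2, 0, and 1, so the MAE is 4 / 4 = 1.0. That is 10 percent of a 10-point scale, or 0.5 on a 5-point scale.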

In our testing, we see numbers like these on average no matter how many users the test-accuracy client picks at random. Your results with this dataset should be similar.

If your results are significantly different, such as an MAE of 2.5 or more when running "bemorehuman -s 10 -t 20" with this Movielens dataset, then something's not right. In that case, you may want to erase the contents of your working directory (by default, /opt/bemorehuman) and redo Step 5 from just after the download.

Enjoy!

If you have questions or comments, hit us up on GitHub at https://github.com/BeMoreHumanOrg or https://bemorehuman.org

bemorehuman recommendation engine concepts

Big picture

For a list of background ideas and principles behind the technology, please read the MANIFESTO file.

Definitions

ratgen: Ratings generator. If your input behaviour data consists only of implicit signals such as purchase data or listen data, ratgen can generate ratings from it, which can then be fed into valgen.

valence: A pairwise relationship between two things that are rated. Valences get loaded into RAM in order to generate recommendations.

valgen: Valence generator. Inputs are ratings. Outputs are valences in a valences.out flat file.

recgen: Recommendation generator. Inputs are valences. Outputs are runtime recommendations. See test-accuracy for a sample client implementation.
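To make the ratgen --> valgen --> recgen flow concrete, here is a toy C sketch. It deliberately swaps in simple, well-known techniques: linear scaling of interaction counts as a ratgen stand-in, and a weighted Slope One average rating difference as a stand-in for a valence. These are assumptions for illustration only; bemorehuman's actual valence math and file formats may differ, and every name below is hypothetical.

```c
#include <stddef.h>

/* Toy pipeline mirroring ratgen --> valgen --> recgen. All functions are
 * hypothetical stand-ins (Slope One-style), not bemorehuman's algorithm. */

/* ratgen stand-in: scale one user's implicit counts (purchases, listens)
 * linearly onto 1-10, relative to that user's largest count. */
void counts_to_ratings(const unsigned *counts, int *ratings, size_t n)
{
    unsigned max = 0;
    for (size_t i = 0; i < n; i++)
        if (counts[i] > max)
            max = counts[i];
    for (size_t i = 0; i < n; i++)
        ratings[i] = max == 0 ? 1 : 1 + (int)(9.0 * counts[i] / max + 0.5);
}

/* valence stand-in: a pairwise relationship between item a and item b. */
struct pair_relation {
    double avg_diff;  /* mean of (rating_a - rating_b) among co-raters */
    size_t support;   /* number of users who rated both items */
};

/* valgen stand-in: ratings_a[u] and ratings_b[u] are user u's ratings; 0 = not rated. */
struct pair_relation relate_items(const int *ratings_a, const int *ratings_b, size_t n_users)
{
    struct pair_relation rel = {0.0, 0};
    double sum = 0.0;
    for (size_t u = 0; u < n_users; u++) {
        if (ratings_a[u] == 0 || ratings_b[u] == 0)
            continue;  /* only users who rated both items contribute */
        sum += ratings_a[u] - ratings_b[u];
        rel.support++;
    }
    if (rel.support > 0)
        rel.avg_diff = sum / (double)rel.support;
    return rel;
}

/* recgen stand-in: predict a user's rating of a target item. known[i] is the
 * user's rating of item i; rels[i] relates the target to item i (target minus
 * item i). Each item votes known[i] + avg_diff, weighted by support.
 * Returns 0.0 when no relation has any support. */
double predict_rating(const int *known, const struct pair_relation *rels, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (rels[i].support == 0)
            continue;
        num += (known[i] + rels[i].avg_diff) * (double)rels[i].support;
        den += (double)rels[i].support;
    }
    return den > 0.0 ? num / den : 0.0;
}
```

For example, if a user rated two items 8 and 4, and the target item averages 1.0 point higher than the first (backed by 3 users) and 2.0 points higher than the second (backed by 1 user), predict_rating returns (9 × 3 + 6 × 1) / 4 = 8.25. Ranking a user's unrated items by predicted rating gives a simple recommendation list. Linear count scaling is the simplest choice; with heavy-tailed listen counts, a logarithmic scale is a common alternative.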

Valgen pipeline

ratings in flat file --> "valgen" --> valences.out flat file

Recgen pipeline