Home

Awesome

albums

A collection of CSV/SQL files containing popular/acclaimed albums, used to make a inordinate list of albums to listen to.

This started by me merging a bunch of lists of music to listen to:

None of these sources are particularly perfect or complete, but they make me to listen to some music I otherwise may have not given a chance. Only see it as slightly better than what I used to do in the past, i.e. scrolling through streaming services and picking whatever they happen to have featured.

This contains code to interact with my spreadsheet -- listing the next albums I should listen to, validating the data using the Discogs API, or creating a SQL schema with the data

If you just want the data, see csv_data and sql_data for the sources/data. spreadsheets.csv can be used to make your own spreadsheet, sql_data/score_statements.sql is similar to that for SQL. I update these files periodically, whenever I update my own spreadsheet

nextalbums

The command that is installed by following the instructions below:

Usage: nextalbums [OPTIONS] COMMAND [ARGS]...

  Interact with my albums spreadsheet!

Options:
  --help  Show this message and exit.

Commands:
  create-sql-statements  Creates MySQL compliant SQL statements to create a...
  discogs-update         Update rows on the spreadsheet which just have a...
  export                 Parse and print all of the information from the...
  favorites              List my favorites using the SQL Database
  generate-csv           Generate the spreadsheet.csv file in the root dir
  print-next             Print the next albums I should listen to
  update-csv-datafiles   Updates the CSV files in data directory

Four of those commands are related to updating the data files here:

This entire process is managed by me using ./update, which calls those in the required order to update all the data here

nextalbums favorites is a small script that queries my favorite albums using the live SQL instance

The part of this I use most often is nextalbums print-next, which prints the next albums from the spreadsheet I should listen to:

$ nextalbums print-next
+--------------------------------+---------------------------+------+
| Album                          | Artist                    | Year |
+--------------------------------+---------------------------+------+
| Aqua City                      | S. Kiyotaka & Omega Tribe | 1983 |
| F-1 Grand Prix World           | T-Square                  | 1992 |
| Serendipity 18                 | The Bob Florence Limited  | 1998 |
|                                | Edition                   |      |
| The Miseducation Of Lauryn     | Lauryn Hill               | 1998 |
| Hill                           |                           |      |
| This Is Hardcore               | Pulp                      | 1998 |
| This Is My Truth Tell Me Yours | Manic Street Preachers    | 1998 |
| Vol. 2... Hard Knock Life      | Jay-Z                     | 1998 |
| Vuelve                         | Ricky Martin              | 1998 |
| Wide Open Spaces               | Dixie Chicks              | 1998 |
| 13                             | Blur                      | 1999 |
+--------------------------------+---------------------------+------+

nextalbums export exports the entire active spreadsheet to JSON, extracting names out of the sql_data YAML cache files:

{
  "score": 7.5,
  "note": null,
  "listened_on": "2019-02-19",
  "album_name": "Chet Baker Sings",
  "album_artwork_url": "https://img.discogs.com/KwmScXknIZQ3E9oyQiDjWRBwhDg=/fit-in/600x600/filters:strip_icc():format(jpeg):mode_rgb():quality(90)/discogs-images/R-2930625-1437056636-5751.jpeg.jpg",
  "cover_artists": "Chet Baker",
  "discogs_url": "https://www.discogs.com/master/60289",
  "year": 1954,
  "reasons": [
    "NME's 500 Greatest Albums of All Time"
  ],
  "genres": [
    "Jazz"
  ],
  "styles": [
    "Cool Jazz"
  ],
  "main_artists": [
    {
      "artist_id": 31617,
      "artist_name": "Chet Baker"
    }
  ],
  "other_artists": [
    {
      "artist_id": 31617,
      "artist_name": "Chet Baker"
    },
    {
      "artist_id": 1515883,
      "artist_name": "Allan Emig"
    },
    {
      "artist_id": 1914573,
      "artist_name": "William Claxton"
    },

I use that as part of HPI, which lets me use its query interface:

$ hpi query 'my.albums.history' \
  | jq -r '.[] | select(.score>=9) | "\(.album_name) - \(.cover_artists)"' \
  | head -n5
Paranoid - Black Sabbath
Untitled - Led Zeppelin
The Stranger - Billy Joel
Solid State Survivor - Yellow Magic Orchestra
The Wall - Pink Floyd

Sources for spreadsheet.csv:

Note for '1001 Albums You Must Hear Before You Die' and 'Rolling Stone's 500 Greatest Albums of All Time', the number of albums is above 1001 and 500 respectively, as there have been multiple versions of the book, and I've included anything that was ever on the list.

Note: The 'Rolling Stone's 500 Greatest of All Time' is a combination of the 2012 and earlier versions.

csv_data also contains 3 files that list albums I added Manually, on a Recommendation, or because of a Relation (I liked an artist so I added more of their works). These albums are not listed in spreadsheet.csv

The format of all files in csv_data except for all.csv and valid_albums.csv is:

Album Name, Artists on Cover, Year, DiscogsURL, Genres, Styles

all.csv contains albums I added manually, by relation, or on a recommendation, while valid_albums.csv does not. These CSV files also have a column that lists the Reason(s) the album is on the spreadsheet.

Installation:

Configuration for this is handled by modifying the settings.py file in this directory. Since that is just a python file, you're free to modify that to pull items out of environment variables (os.environ["ENVIRONMENT_VAR"]) or read/files do anything else. You can run the file (python3 settings.py) to print the computed settings

Note: To my knowledge, no one has ever done this yet, so if you have issues, feel free to open an issue

  1. Create your own copy of the spreadsheet.
  1. Clone this repository git clone https://github.com/purarue/albums, and install it using pip install --editable ., installing it as an editable package. This won't work as normal pip install, it must be editable. If you plan to use the database (MySQL/MariaDB), run pip install --editable '.[sql]'
  2. Create a file named client_secret.json in the root directory which contains your credentials for a google sheets OAuth connection. Instructions for how to get your client_secret.json file here; download your created credentials from the Google credentials console
  3. Run python3 setup_credentials.py to authenticate this with the Google account you created the spreadsheet on
  4. Update the SPREADSHEET_ID variable in settings.py - the ID is after the /d/ in the URL when viewing it in Google Sheets
  5. (If you want to add albums and validate them with nextalbums discogs-update) Create a file discogs_token.yaml in the root directory (info can be found here, token here) with contents like:
user_agent: myPython3DiscogsClient/1.0
token: !!str FDJjksdfJkJFDNMoiweiIRWkj

SQL

nextalbums create-sql-statements creates a file statements.sql that when run would create the following schema:

<img src="https://raw.githubusercontent.com/seanbrecke/albums/master/.github/images/diagram.png" alt="" width=600>

It can be run with the flag --use-scores, which adds the "Score" and "Listened On" columns to the "Album" Table, and creates the file score_statements.sql

Running it without the --use-scores flag is close to what statements.csv in the root directory chooses as valid albums - only albums that have won at least 1 award, disregarding any albums I added to the spreadsheet manually, by relation, or on a recommendation.

This works on both MySQL/MariaDB.

Example Queries:

Anything that's won a grammy award:

use albums;
SELECT Album.Name, Album.CoverArtists, Album.Year, GROUP_CONCAT(Reason.Description) as `Awards`
FROM Album
JOIN AlbumReason
	ON Album.AlbumID = AlbumReason.AlbumID
JOIN Reason
	ON AlbumReason.ReasonID = Reason.ReasonID
WHERE Reason.Description LIKE "Grammy%" # Anything that starts with "Grammy"
GROUP BY Album.AlbumID
ORDER BY COUNT(Reason.ReasonID) DESC # order by number of grammy awards
;

People who have worked on the most albums:

USE albums;
SELECT  Name, artist.works
FROM Artist
JOIN
(
	SELECT Artist_ArtistID, count(Artist_ArtistID) as `works` FROM ArtistWorkedOnAlbum
	GROUP BY Artist_ArtistID
	ORDER BY `works` DESC
) AS artist
ON ArtistID = artist.Artist_ArtistID
WHERE artist.DiscogsArtistURL <> 'https://www.discogs.com/artist/194' -- 194 is various artists
ORDER BY works DESC
;

My Favorite Albums from the 80s:

USE scorealbums;
SELECT Album.Name, Album.CoverArtists, Album.Year, Album.Score, Album.ListenedOn
FROM Album
WHERE Year > 1979 AND Year < 1990 AND SCORE IS NOT NULL
ORDER BY Album.Score DESC
LIMIT 25
;

Favorite Genres:

USE scorealbums;
SELECT Genre.Description, AVG(Album.Score) as `Average Score`
FROM Album
JOIN AlbumGenre
	ON Album.AlbumID = AlbumGenre.AlbumID
JOIN Genre
	ON AlbumGenre.GenreID = Genre.GenreID
WHERE Album.Score IS NOT NULL
GROUP BY Genre.GenreID
ORDER BY `Average Score` DESC
;

Selecting for a specific genre:

SELECT Album.Name, Album.CoverArtists, Album.Year, Album.Score
FROM Album
JOIN AlbumGenre
  ON Album.AlbumID = AlbumGenre.AlbumID
JOIN Genre
  ON AlbumGenre.GenreID = Genre.GenreID
WHERE Album.Score IS NOT NULL AND Genre.Description = "Funk / Soul"
ORDER BY Album.Score DESC;

server

server includes a flask server which grabs current information from spreadsheet:

'/' endpoint
get scored albums based on a few filters:
GET args:
limit=int, default 50
orderby=score|listened_on, default score
sort=asc|desc, defeault desc
'/artist' endpoint GET arg:
ids=id1,id2,id3,id4
(discogs artist IDs, which are returned in the response of '/')

Thats cached periodically and used to pull recent albums I've listened onto my 'Media Feed' window on my website