Home

Awesome

Anitomy

Anitomy is a C++ library for parsing anime video filenames. It's accurate, fast, and simple to use.

Examples

The following filename...

[TaigaSubs]_Toradora!_(2008)_-_01v2_-_Tiger_and_Dragon_[1280x720_H.264_FLAC][1234ABCD].mkv

...is resolved into these elements:

Here's an example code snippet...

#include <iostream>
#include <anitomy/anitomy.h>

int main() {
  anitomy::Anitomy anitomy;
  anitomy.Parse(L"[Ouroboros]_Fullmetal_Alchemist_Brotherhood_-_01.mkv");

  const auto& elements = anitomy.elements();

  // Elements are iterable, where each element is a category-value pair
  for (const auto& element : elements) {
    std::wcout << element.first << '\t' << element.second << '\n';
  }
  std::wcout << '\n';

  // You can access values directly by using get() and get_all() methods
  std::wcout << elements.get(anitomy::kElementAnimeTitle) << L" #" <<
                elements.get(anitomy::kElementEpisodeNumber) << L" by " <<
                elements.get(anitomy::kElementReleaseGroup) << '\n';

  return 0;
}

...which will output:

12      mkv
13      [Ouroboros]_Fullmetal_Alchemist_Brotherhood_-_01
7       01
2       Fullmetal Alchemist Brotherhood
16      Ouroboros

Fullmetal Alchemist Brotherhood #01 by Ouroboros

How does it work?

Suppose that we're working on the following filename:

"Spice_and_Wolf_Ep01_[1080p,BluRay,x264]_-_THORA.mkv"

The filename is first stripped off of its extension and split into groups. Groups are determined by the position of brackets:

"Spice_and_Wolf_Ep01_", "1080p,BluRay,x264", "_-_THORA"

Each group is then split into tokens. In our current example, the delimiter for the enclosed group is ,, while the words in other groups are separated by _:

"Spice", "and", "Wolf", "Ep01", "1080p", "BluRay", "x264", "-", "THORA"

Note that brackets and delimiters are actually stored as tokens. Here, identified tokens are omitted for our convenience.

Once the tokenizer is done, the parser comes into effect. First, all tokens are compared against a set of known patterns and keywords. This process generally leaves us with nothing but the release group, anime title, episode number and episode title:

"Spice", "and", "Wolf", "Ep01", "-"

The next step is to look for the episode number. Each token that contains a number is analyzed. Here, Ep01 is identified because it begins with a known episode prefix:

"Spice", "and", "Wolf", "-"

Finally, remaining tokens are combined to form the anime title, which is Spice and Wolf. The complete list of elements identified by Anitomy is as follows:

Why should I use it?

Anime video files are commonly named in a format where the anime title is followed by the episode number, and all the technical details are enclosed within brackets. However, fansub groups tend to use their own naming conventions, and the problem is more complicated than it first appears:

There are so many cases to cover that it's simply not possible to parse all filenames solely with regular expressions. Anitomy tries a different approach, and it succeeds: It's able to parse tens of thousands of filenames per second, with great accuracy.

The following projects make use of Anitomy:

See other repositories for related projects (e.g. interfaces, ports, wrappers).

Are there any exceptions?

Yes, unfortunately. Anitomy fails to identify the anime title and episode number on rare occasions, mostly due to bad naming conventions. See the examples below.

Arigatou.Shuffle!.Ep08.[x264.AAC][D6E43829].mkv

Here, Anitomy would report that this file is the 8th episode of Arigatou Shuffle!, where Arigatou is actually the name of the fansub group.

Spice and Wolf 2

Is this the 2nd episode of Spice and Wolf, or a batch release of Spice and Wolf 2? Without a file extension, there's no way to know. It's up to you consider both cases.

Suggestions to fansub groups

Please consider abiding by these simple rules before deciding on your naming convention:

License

Anitomy is licensed under Mozilla Public License 2.0.