Awesome
A Growing Network of Characters In Marvel and DC Universes
Abstract
Marvel and DC Comics are the two most important comic books publishers in the world. Over the years, each of them told stories of more than 20 000 characters, developing their personalities and making them interact with each others to build two gigantic universes.
The relationships between the characters represent a huge potential for graph visualization: the characters are nodes, and the links between them show how close they are to each other. Studying these links over time will allow an overall visualization of the evolution of the two separate universes thoughout the years, which will show the emergence of some groups. The study will be made for three different periods of time: [beginning - 1950], [beginning - 1990], [beginning - today] ("beginning" means all the data since Marvel and DC started publishing comic books).
Dataset
The data for Marvel and DC were taken respectively from the following websites:
In each database, information on characters and on published comics was collected.
For the characters, the gathered attributes were: Name
, Current Alias
, Relatives
, Affiliation
(team, group or organization the character is affiliated to), URL
of the webpage (used as a unique ID).
For the comic books, the gathered attributes were: Publication date
, Subcomics names
(if a comic book is made of multiple comics) Appearing characters
with a URL (to make sure they're in the database).
The graphs are based on three attributes: the characters that are relatives, the ones that are linked by the same affiliation, and the ones appearing in the same comic book.
Repository
Notebooks
All the used notebooks for the data gathering and analysis.
parsing_data.ipynb
: scraping of the databasescleaning_data.ipynb
: cleaning the obtained datarelatives_analysis.ipynb
: generating the adjacency matrix based on the relativesaffiliation_analysis.ipynb
: generating the adjacency matrix based on the affiliationscomics_analysis.ipynb
: generating the adjacency matrix based on the comics appearancesadjacency_processing.ipynb
: plotting the degree distributions of the adjacency matrices and generating the corresponding CSV files for the graph visualisation (Gephi software)
Data
Folder containing all the saved data.
characters_marvel.txt
: Marvel characters and their attributescomics_marvel.txt
: Marvel comics and their attributesclean_marvel.txt
: Cleaned Marvel dataframe (characters + comics)good_dc.txt
: DC good characters and their attributesbad_dc.txt
: DC bad characters and their attributescharacters_dc.txt
: DC characters and their attributes (good + bad)comics_dc.txt
: DC comics and their attributesclean_dc.txt
: cleaned DC dataframe (characters + comics)- https://drive.google.com/drive/folders/1fZhjABSlYjy7Pkc2HMnRkctInPVhlnGA?usp=sharing: rest of the data that was too large for git
Adjacency matrices
: adjacency matrices, the file names are adj_attribute_date. Attribute: aff means affiliations, relat means relatives, comics means comic books. Date: 50 means until 1950, 90 means until 1990, no date means all the dataset.CSV
: made from the adjacency matrices to match the Gephi inputs. The name format is the same as the adjacency matrices, replacing adj with edge.
Report
team 28 REPORT.pdf
: final report