Six Degrees of Black Sabbath
My hack at last week's Music Hack Day San Francisco was Six Degrees of Black Sabbath - a web app that lets you find connections between artists based on a wide range of artist relations. It is like The Oracle of Bacon for music.
To make the connections between the artists I rely on the relation data from MusicBrainz. MusicBrainz has lots of deep data about how various artists are connected. For instance there are about 130,000 artist-to-artist connections - connections such as:
member of band
is person
personal relationship
  parent
  sibling
  married
  involved with
collaboration
supporting musician
  vocal supporting musician
  instrumental supporting musician
catalogued
So from this data we know that George Harrison and Paul McCartney are related because each has a 'member of band' relation to The Beatles. In addition to the artist-to-artist data, MusicBrainz has artist-track relations (Eric Clapton played on 'While My Guitar Gently Weeps'), artist-album relations (Brian Eno produced U2's Joshua Tree), and track-track relations (Girl Talk samples 'Rock You Like A Hurricane' by the Scorpions for the track 'Girl Talk Is Here'). All told there are about 130 different types of relations that can connect two artists.
Not all of these relationships are equally important. Two artists who are members of the same band have a much stronger relationship than an artist who covers another artist. To accommodate this I assign a weight to each type of relationship - this was perhaps the most tedious and subjective part of building this app.
Once I had all the different types of relations, I created a directed graph connecting all of the artists based upon these weighted relationships. The resulting graph has 220K artists connected by over a million edges. Finding a path between a pair of artists is then a simple matter of finding the shortest weighted path through the graph.
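To make the idea concrete, here is a minimal sketch of how such a weighted graph could be built and searched with networkx. It is not the app's actual code: the artist names, the weight values, and the helper names `build_graph` and `find_path` are all made up for illustration, and it assumes that a lower weight means a stronger relationship and that every relation can be walked in both directions.

```python
import networkx as nx

# Hypothetical costs per relation type: lower cost = stronger relationship.
RELATION_WEIGHTS = {
    "member of band": 1.0,
    "collaboration": 2.0,
    "cover": 10.0,
}

# Made-up (source, target, relation type) triples standing in for MusicBrainz data.
RELATIONS = [
    ("Guitarist A", "Band X", "member of band"),
    ("Singer B", "Band X", "member of band"),
    ("Singer C", "Singer B", "collaboration"),
    ("Singer C", "Guitarist A", "cover"),
]


def build_graph(relations, weights):
    """Build a directed graph whose edge weights reflect relation strength."""
    graph = nx.DiGraph()
    for source, target, relation in relations:
        cost = weights[relation]
        # Assumption: add the edge in both directions so paths can be walked either way.
        for a, b in ((source, target), (target, source)):
            # If two artists share several relations, keep only the cheapest one.
            if not graph.has_edge(a, b) or cost < graph[a][b]["weight"]:
                graph.add_edge(a, b, weight=cost, relation=relation)
    return graph


def find_path(graph, start, end):
    """Return the shortest weighted path between two artists as a list of names."""
    return nx.shortest_path(graph, start, end, weight="weight")


if __name__ == "__main__":
    graph = build_graph(RELATIONS, RELATION_WEIGHTS)
    print(" -> ".join(find_path(graph, "Singer C", "Guitarist A")))
```

In this toy data the direct 'cover' link between Singer C and Guitarist A costs 10, while the route through Singer B and Band X costs 2 + 1 + 1 = 4, so the longer but stronger chain wins. That is the effect the weights are meant to have.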
We can learn a little bit about music by looking at some of the properties of the graph. First of all, the average distance between any two artists chosen at random is 7. Here are some of the most connected artists, along with their number of connections:
5372 Various Artists
696 Linda Ronstadt
611 Diana Ross
560 [traditional]
538 Antonio Vivaldi
534 Jay-Z
494 Giuseppe Verdi
491 Johannes Brahms
490 Bob Dylan
465 The Beatles
442 Aaron Neville
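Statistics like these can be pulled out of a networkx graph with a few lines. The sketch below is only an illustration under some assumptions: it assumes networkx 2.0 or later, counts a "connection" as any edge touching the artist, measures distance in hops rather than weighted cost, and estimates the average from random pairs instead of all pairs (which would be slow on a 220K-node graph). The function names are my own.

```python
import random

import networkx as nx


def top_connected(graph, n=10):
    """Return the n best-connected nodes as (connection count, name) pairs."""
    ranked = sorted(graph.degree(), key=lambda pair: pair[1], reverse=True)
    return [(count, name) for name, count in ranked[:n]]


def sampled_average_distance(graph, samples=1000, seed=0):
    """Estimate the average hop distance between randomly chosen pairs of nodes."""
    rng = random.Random(seed)
    nodes = list(graph.nodes())
    if len(nodes) < 2:
        return None
    total = 0
    found = 0
    for _ in range(samples):
        source, target = rng.sample(nodes, 2)  # two distinct nodes
        try:
            total += nx.shortest_path_length(graph, source, target)
            found += 1
        except nx.NetworkXNoPath:
            # Unconnected pairs are skipped rather than counted as infinite.
            continue
    return total / found if found else None
```

Skipping unreachable pairs is a design choice: it keeps the estimate finite, at the cost of describing only the connected part of the graph.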
Here we see some of the anomalies in the connection data - any classical performer who performs a piece by Mozart is connected to Mozart - thus the high connectivity counts for classical composers. A more interesting metric is 'betweenness centrality' - artists that occur on many shortest paths between other artists have higher betweenness than those that do not. Artists with high betweenness centrality are the connecting fibers of the music space. Here are the top connecting artists:
565 Pigface
312 Various Artists
135 Mick Harris
122 Black Sabbath
120 The The
115 Youth
93 Bill Laswell
74 Painkiller
72 F.M. Einheit
71 Napalm Death
63 Flea
60 Material
56 Ginger Baker
56 Mike Patton
54 Johnny Marr
54 Paul Raven
53 Brian Eno
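For readers who want to try this metric themselves, networkx provides `betweenness_centrality`. The sketch below shows one way to rank nodes with it. The function name `top_connectors` is hypothetical, and the choices made here (weighted paths, unnormalized scores, optional sampling through the `k` parameter) are assumptions, so the scores it produces are not necessarily on the same scale as the numbers listed above.

```python
import networkx as nx


def top_connectors(graph, n=10, sample_size=None, seed=0):
    """Rank nodes by betweenness centrality, highest first.

    sample_size: if given, estimate betweenness from that many sampled
    source nodes instead of all nodes, which is much faster on large graphs.
    """
    scores = nx.betweenness_centrality(
        graph,
        k=sample_size,       # None means use every node (exact computation)
        normalized=False,    # raw shortest-path counts rather than fractions
        weight="weight",     # edges without a 'weight' attribute count as 1
        seed=seed,           # only matters when sample_size is set
    )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
```

On a three-node chain A - B - C, for example, B is the only node that sits on a shortest path between two others, so it ranks first.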
I had never heard of Pigface before I started this project - and was doubtful that they could really be such a connecting node in the world of music - but a look at their Wikipedia page makes it instantly clear why they are such a central node - they've had well over a hundred members in the band over their history. Black Sabbath, while not at the top of the list, is still extremely well connected.
I wrote the app in Python, relying on networkx for the graph building and path finding. The system performs well, even surviving an appearance on the front page of Reddit. It was a fun app to write - and I enjoy seeing all the interesting pathways people have found through the artist space.