Spotify Network

Your phone probably isn't powerful enough to handle the graph, but your computer will handle it easily :)

Introduction

Playlists on streaming platforms like Spotify play a crucial role in music discovery and dissemination. Songs often move between playlists over time, influenced by curators, trends, and listener engagement. In this project, we construct a directed graph to model how playlists influence each other based on the movement of songs.

By defining influence as a song appearing in one playlist over multiple weeks before eventually being added to another, we create a structure that reveals hidden relationships between playlists. This approach captures how songs "travel" through the ecosystem of curated playlists, forming clusters of related musical tastes and genres.

To enhance understanding of this network, we enriched the graph with metadata from the Spotify API, including playlist names, descriptions, and follower counts. The final visualization, built using D3.js, provides an interactive way to explore these relationships, emphasizing influential playlists and strong connections.

Defining connections between playlists

A directed edge from playlist $A$ to playlist $B$ represents that playlist $A$ has influenced playlist $B$.

This influence is defined as follows: a song appears in playlist $A$ over multiple weeks before eventually being added to playlist $B$.

This pattern suggests that the song may have been added to playlist $B$ because of its presence in playlist $A$. The assumption cannot be confirmed from the data alone, but it provides an intuitive way to model how songs spread across playlists and captures patterns of influence within the network.
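The definition above can be sketched in code as follows. This is only an illustration, not the script used for the project: it assumes the data is a list of weekly snapshots (oldest first), each mapping a playlist ID to the set of track IDs it held that week, and it treats "multiple weeks" as a hypothetical `min_weeks` parameter defaulting to 2.

```python
def count_transitions(snapshots, min_weeks=2):
    """Count, for each ordered pair (a, b), the songs that sat in playlist a
    for at least `min_weeks` weekly snapshots before first appearing in b.

    `snapshots` is an assumed format: a list of dicts
    {playlist_id: set_of_track_ids}, ordered from oldest to newest week.
    """
    weeks_seen = {}          # (playlist, track) -> weeks the track was present so far
    playlists_by_track = {}  # track -> playlists that have held it in earlier weeks
    transitions = {}         # (a, b) -> set of tracks that moved from a to b

    for snapshot in snapshots:
        # First pass: find tracks that are new to a playlist this week and
        # credit every playlist that already held them long enough.
        for b, tracks in snapshot.items():
            for track in tracks:
                if (b, track) in weeks_seen:
                    continue  # not a new addition to b
                for a in playlists_by_track.get(track, ()):
                    if weeks_seen[(a, track)] >= min_weeks:
                        transitions.setdefault((a, b), set()).add(track)

        # Second pass: record this week's presence for later weeks.
        for playlist, tracks in snapshot.items():
            for track in tracks:
                key = (playlist, track)
                weeks_seen[key] = weeks_seen.get(key, 0) + 1
                playlists_by_track.setdefault(track, set()).add(playlist)

    return {edge: len(tracks) for edge, tracks in transitions.items()}
```

Updating the bookkeeping only after the first pass means that two playlists adding the same song in the same week do not count as influencing each other.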

From data to visualization

Gathering data about the playlists

The initial dataset only included playlist IDs, which provided little context about the playlists in the graph. To make the visualization more informative, I enriched the data by fetching additional details from the Spotify API.

After setting up a Spotify developer account, I wrote a script that iterates through the playlists in the graph and retrieves each playlist's name, description, and follower count.

This additional information is displayed as tooltips in the visualization, making the graph more engaging and insightful for exploration.
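A minimal sketch of such a lookup is shown below. It is not the original script: it assumes an OAuth access token has already been obtained (the `access_token` argument), calls the Spotify Web API's "Get Playlist" endpoint, and reads the `name`, `description`, and `followers.total` fields from the response. The helper names are made up for illustration.

```python
import json
import urllib.request

API_URL = "https://api.spotify.com/v1/playlists/{playlist_id}"


def extract_metadata(payload):
    """Pick the tooltip fields out of a playlist object returned by the API."""
    return {
        "name": payload.get("name", ""),
        # The description can be null for some playlists.
        "description": payload.get("description") or "",
        "followers": (payload.get("followers") or {}).get("total", 0),
    }


def fetch_playlist_metadata(playlist_id, access_token):
    """Fetch one playlist and return its name, description and follower count.

    `access_token` is assumed to be a valid bearer token; obtaining it
    is outside the scope of this sketch.
    """
    request = urllib.request.Request(
        API_URL.format(playlist_id=playlist_id),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_metadata(json.load(response))
```

Keeping `extract_metadata` separate from the network call makes the parsing step easy to check against a saved response without hitting the API.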

Preprocessing

To refine the graph and ensure that connections between playlists are meaningful, I introduced a threshold based on the number of songs that moved from one playlist to another. Specifically, an edge from playlist $A$ to playlist $B$ is added only if at least 40 songs moved from $A$ to $B$ under the definition given above.

This threshold was chosen to reduce the overall number of nodes while preserving significant relationships, bringing the total number of playlists in the graph to under 1000.

To further simplify the network and focus on larger clusters of influence, I removed all connected components containing 3 or fewer playlists. After applying these filtering steps, the final graph consists of 909 playlists, capturing the most relevant interactions within the playlist ecosystem.
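The two filtering steps could look roughly like this. The sketch assumes the edges are stored as a dictionary mapping `(source, target)` pairs to song counts, and it interprets "connected component" as a weakly connected component (edge direction ignored); both are assumptions made for the example.

```python
from collections import deque


def filter_graph(edge_weights, min_songs=40, max_small_component=3):
    """Keep edges with at least `min_songs` songs, then drop every weakly
    connected component with `max_small_component` or fewer playlists."""
    strong = {edge: w for edge, w in edge_weights.items() if w >= min_songs}

    # Undirected adjacency, used only to find weakly connected components.
    neighbours = {}
    for a, b in strong:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    keep, visited = set(), set()
    for start in neighbours:
        if start in visited:
            continue
        # Breadth-first search collects the component containing `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        visited |= component
        if len(component) > max_small_component:
            keep |= component

    # Both endpoints of an edge share a component, so checking one suffices.
    return {(a, b): w for (a, b), w in strong.items() if a in keep}
```

The threshold is applied first because removing weak edges can split a large component into several small ones, which the second step then discards.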

Visualization

The playlist network is visualized using D3.js as a force-directed graph, allowing for an interactive exploration of playlist influence.

This visualization helps uncover patterns in how songs spread across playlists, making it easier to identify key influencers in the network.
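D3 force-directed layouts are commonly fed a JSON object with a `nodes` array and a `links` array whose entries have `source` and `target` fields. The sketch below shows one way the filtered graph and the playlist metadata could be exported in that shape. The actual file format used by this project is not shown here, so the `value` field name and the input structures are assumptions.

```python
import json


def to_d3_json(edge_weights, metadata):
    """Serialize the graph as {"nodes": [...], "links": [...]} for D3.

    `edge_weights` maps (source_id, target_id) -> number of songs;
    `metadata` maps playlist_id -> dict of tooltip fields (assumed formats).
    """
    node_ids = sorted({pid for edge in edge_weights for pid in edge})
    nodes = [{"id": pid, **metadata.get(pid, {})} for pid in node_ids]
    links = [
        {"source": a, "target": b, "value": weight}  # "value" is a made-up field name
        for (a, b), weight in sorted(edge_weights.items())
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)
```

On the JavaScript side, `d3.forceLink(links).id(d => d.id)` lets the links refer to nodes by their `id` string rather than by array index.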

Findings

The resulting graph, based on these connections, is disconnected, consisting of multiple weakly connected components. These components reveal distinct clusters of playlists that influence each other, reflecting how music spreads within specific genres and themes.

Some of the most notable clusters include:

These clusters offer valuable insights into the way playlists shape music discovery, showing how songs travel within specific communities and listener preferences.

Future work

Building a Data Pipeline for Continuous Updates

Currently, the graph is based on a static snapshot of playlist data. To enhance its usefulness and keep it up to date, a future step would be to develop a data pipeline that continuously integrates new data.

This pipeline could:

By implementing this pipeline, the graph would become a living model of playlist influence, making it even more valuable for studying music discovery and the spread of songs across different communities.

Playlists Rank

Google’s PageRank algorithm revolutionized web search by ranking web pages based on their importance within a network of hyperlinks. A similar approach could be applied to playlists, ranking them based on their influence within the playlist ecosystem.

In this context, a playlist’s influence score would be determined by the number and weight of its outgoing edges, where edges represent songs that have "traveled" from one playlist to another. A playlist that influences many other playlists, or influences highly influential playlists, would receive a higher rank. Because standard PageRank rewards incoming links, this amounts to running it on the graph with the edge directions reversed.
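As a rough sketch of how this ranking might be computed, the function below runs a weighted PageRank by power iteration. Since an edge from $A$ to $B$ means that $A$ influenced $B$, each edge is reversed first so that rank flows from the influenced playlist back to its influencer. The damping factor of 0.85, the fixed iteration count, and the use of song counts as edge weights are assumptions made for the example, not results from this project.

```python
def playlist_rank(edge_weights, damping=0.85, iterations=100):
    """Weighted PageRank on the reversed influence graph.

    `edge_weights` maps (influencer, influenced) -> number of songs.
    Returns {playlist_id: score}, with scores summing to 1.
    """
    nodes = sorted({p for edge in edge_weights for p in edge})
    if not nodes:
        return {}

    # Reverse every edge: the influenced playlist "votes" for its influencer.
    votes = {p: {} for p in nodes}
    for (influencer, influenced), weight in edge_weights.items():
        votes[influenced][influencer] = votes[influenced].get(influencer, 0) + weight

    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Playlists that vote for nobody spread their rank evenly.
        dangling = sum(rank[p] for p in nodes if not votes[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p in nodes:
            total = sum(votes[p].values())
            for target, weight in votes[p].items():
                new_rank[target] += damping * rank[p] * weight / total
        rank = new_rank
    return rank
```

For example, with edges `A -> B`, `A -> C`, and `B -> C`, playlist `A` ends up with the highest score because it influences both of the others, one of which is itself an influencer.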