Introduction
Playlists on streaming platforms like Spotify play a crucial role in music discovery and dissemination. Songs often move between playlists over time, influenced by curators, trends, and listener engagement. In this project, we construct a directed graph to model how playlists influence each other based on the movement of songs.
By defining influence as a song appearing in one playlist over multiple weeks before eventually being added to another, we create a structure that reveals hidden relationships between playlists. This approach captures how songs "travel" through the ecosystem of curated playlists, forming clusters of related musical tastes and genres.
To enhance understanding of this network, we enriched the graph with metadata from the Spotify API, including playlist names, descriptions, and follower counts. The final visualization, built using D3.js, provides an interactive way to explore these relationships, emphasizing influential playlists and strong connections.
Defining connections between playlists
A directed edge from playlist $A$ to playlist $B$ represents that playlist $A$ has influenced playlist $B$.
This influence is defined as follows:
- A song appears in playlist $A$ but not in playlist $B$ in week $x$.
- The song remains in playlist $A$ every week from $x$ to $x + n$ for some integer $n \geq 1$.
- The song then appears in playlist $B$ in week $x + n$.
This suggests that the song may have been added to playlist $B$ because of its presence in playlist $A$. The data cannot confirm that causal link, but the definition offers an intuitive way to model how songs spread across playlists and captures patterns of influence within the network.
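The definition above can be sketched in Python as a count over consecutive weekly snapshots. This is a minimal illustration, not the project's actual script: the function names and the input layout (one dictionary per week, mapping a playlist ID to its set of song IDs, with no missing weeks) are assumptions. Checking only adjacent weeks is enough when $n \geq 1$, because any longer span contains one week-to-week step in which the song is newly present in $B$ while still present in $A$.

```python
from collections import defaultdict


def songs_to_playlists(snapshot):
    """Invert one weekly snapshot: song id -> set of playlists containing it."""
    index = defaultdict(set)
    for playlist_id, songs in snapshot.items():
        for song in songs:
            index[song].add(playlist_id)
    return index


def count_song_transitions(weekly_snapshots):
    """Count distinct songs that moved from playlist A to playlist B.

    weekly_snapshots is assumed to be ordered by week, one dict per week,
    each mapping playlist id -> set of song ids.
    Returns {(A, B): number_of_distinct_songs}.
    """
    moved = defaultdict(set)  # (A, B) -> songs that travelled from A to B
    for prev, curr in zip(weekly_snapshots, weekly_snapshots[1:]):
        prev_index = songs_to_playlists(prev)
        curr_index = songs_to_playlists(curr)
        for song, now_in in curr_index.items():
            before_in = prev_index.get(song, set())
            sources = before_in & now_in   # playlists that kept the song
            targets = now_in - before_in   # playlists that just added it
            for a in sources:
                for b in targets:
                    moved[(a, b)].add(song)
    return {edge: len(songs) for edge, songs in moved.items()}
```

A song that leaves $A$ in the same week it arrives in $B$ is not counted, since the definition requires it to still be in $A$ in week $x + n$.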
From data to visualization
Gathering data about the playlists
The initial dataset only included playlist IDs, which provided little context about the playlists in the graph. To make the visualization more informative, I enriched the data by fetching additional details from the Spotify API.
After setting up a Spotify developer account, I wrote a script that iterates through the playlists in the graph and retrieves:
- Playlist name – to give each node a meaningful label.
- Description – to provide context about the playlist’s theme or purpose.
- Number of followers – to highlight the popularity and influence of a playlist.
This additional information is displayed as tooltips in the visualization, making the graph more engaging and insightful for exploration.
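A script along these lines might look like the sketch below. It is an assumption-laden illustration rather than the original script: it calls the Spotify Web API "Get Playlist" endpoint with the standard library, and it assumes an access token has already been obtained (for example through the client credentials flow). The helper names `fetch_playlist_json`, `extract_metadata`, and `enrich` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.spotify.com/v1/playlists/{playlist_id}"
# Ask only for the top-level fields the visualization needs.
FIELDS = "name,description,followers"


def fetch_playlist_json(playlist_id, access_token):
    """Request one playlist object from the Spotify Web API."""
    query = urllib.parse.urlencode({"fields": FIELDS})
    url = API_URL.format(playlist_id=urllib.parse.quote(playlist_id)) + "?" + query
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_metadata(payload):
    """Keep only name, description, and follower count, with safe defaults."""
    return {
        "name": payload.get("name", ""),
        "description": payload.get("description") or "",
        "followers": (payload.get("followers") or {}).get("total", 0),
    }


def enrich(playlist_ids, access_token, fetch=fetch_playlist_json):
    """Return {playlist_id: metadata} for every playlist in the graph.

    The fetch argument is injectable so the parsing can be tried offline.
    """
    return {
        pid: extract_metadata(fetch(pid, access_token)) for pid in playlist_ids
    }
```

A production version would also need rate-limit handling and a fallback for playlists that have been deleted or made private since the dataset was collected.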
Preprocessing
To refine the graph and ensure that connections between playlists are meaningful, I introduced a threshold based on the number of songs that moved from one playlist to another. Specifically, a directed edge from playlist $A$ to playlist $B$ is added only if at least 40 songs moved from $A$ to $B$ under the definition given above.
This threshold was chosen to reduce the overall number of nodes while preserving significant relationships, bringing the total number of playlists in the graph to under 1000.
To further simplify the network and focus on larger clusters of influence, I removed all connected components containing three or fewer playlists. After applying these filtering steps, the final graph consists of 909 playlists, capturing the most relevant interactions within the playlist ecosystem.
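The two filtering steps can be sketched as follows. This is one possible implementation under stated assumptions: the input is a dictionary of song counts keyed by `(source, target)`, components are computed with edge direction ignored (that is, as weakly connected components), and the function and parameter names are illustrative.

```python
from collections import defaultdict, deque


def filter_graph(edge_weights, min_songs=40, max_small_component=3):
    """Apply the edge threshold, then drop small connected components.

    edge_weights maps (source, target) -> number of songs that moved.
    Returns the surviving edges in the same format.
    """
    # Step 1: keep only edges carrying at least min_songs songs.
    strong = {e: w for e, w in edge_weights.items() if w >= min_songs}

    # Build an undirected adjacency map for the component search.
    neighbors = defaultdict(set)
    for a, b in strong:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Step 2: breadth-first search to collect each component once.
    keep, seen = set(), set()
    for start in list(neighbors):
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > max_small_component:
            keep |= component

    # Both endpoints of an edge share a component, so one check suffices.
    return {(a, b): w for (a, b), w in strong.items() if a in keep}
```

Playlists with no edge at or above the threshold never enter the adjacency map, so they drop out of the graph automatically.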
Visualization
The playlist network is visualized using D3.js as a force-directed graph, allowing for an interactive exploration of playlist influence.
- Node size represents a playlist’s influence: larger nodes indicate playlists that have influenced many others.
- Edge opacity corresponds to the strength of the connection: the more songs that traveled from one playlist to another, the more visible the link.
This visualization helps uncover patterns in how songs spread across playlists, making it easier to identify key influencers in the network.
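The visual encodings can be precomputed before the data reaches D3.js. The sketch below is an assumption about how that might be done, not a description of the actual implementation: it takes out-degree as the measure of influence, uses a square-root scale for node radius, scales opacity linearly by song count, and emits the `nodes`/`links` layout commonly fed to a D3 force simulation. All names and scale constants are illustrative.

```python
import math


def to_d3_graph(edge_weights, metadata=None, min_radius=3.0, min_opacity=0.2):
    """Build a {"nodes": [...], "links": [...]} structure for a force layout.

    edge_weights maps (source, target) -> number of songs that moved.
    metadata optionally maps playlist id -> dict of tooltip fields.
    """
    metadata = metadata or {}

    # Influence is approximated by how many playlists a node points to.
    out_degree = {}
    for source, target in edge_weights:
        out_degree[source] = out_degree.get(source, 0) + 1
        out_degree.setdefault(target, 0)

    max_weight = max(edge_weights.values(), default=1)

    nodes = [
        {
            "id": pid,
            # Square-root scale keeps very influential nodes from dominating.
            "radius": min_radius + 2.0 * math.sqrt(degree),
            **metadata.get(pid, {}),
        }
        for pid, degree in sorted(out_degree.items())
    ]
    links = [
        {
            "source": source,
            "target": target,
            "weight": weight,
            # The strongest link is fully opaque; weaker ones fade out.
            "opacity": min_opacity + (1 - min_opacity) * weight / max_weight,
        }
        for (source, target), weight in sorted(edge_weights.items())
    ]
    return {"nodes": nodes, "links": links}
```

The result can be written out with `json.dump`. On the D3 side, the link force would need its `id` accessor set to return each node's `id`, since links are otherwise resolved by array index.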
Findings
The resulting graph, based on these connections, is disconnected, consisting of multiple weakly connected components. These components reveal distinct clusters of playlists that influence each other, reflecting how music spreads within specific genres and themes.
Some of the most notable clusters include:
- EDM – Playlists featuring electronic dance music, often influencing each other as new tracks gain traction.
- Star Wars Soundtrack – A niche group of playlists dedicated to the music of Star Wars, showing how soundtrack fans curate and share music.
- K-pop – A tightly connected network of playlists centered around Korean pop music, highlighting how songs move within the global K-pop community.
- Gym & Workout Playlists – High-energy tracks frequently appearing in fitness-oriented playlists, emphasizing the role of music in exercise motivation.
- Country – A strong cluster of country music playlists, reflecting how tracks circulate within the genre’s fanbase.
These clusters offer valuable insights into the way playlists shape music discovery, showing how songs travel within specific communities and listener preferences.
Future work
Building a Data Pipeline for Continuous Updates
Currently, the graph is based on a static snapshot of playlist data. To enhance its usefulness and keep it up to date, a future step would be to develop a data pipeline that continuously integrates new data.
This pipeline could:
- Automate data collection by periodically fetching updated playlist information from the Spotify API.
- Track playlist changes over time, ensuring that new song movements and playlist influences are captured dynamically.
- Incrementally update the graph, avoiding the need for a full rebuild while keeping insights relevant.
- Store historical playlist transitions, allowing for trend analysis and better understanding of how influence evolves over time.
By implementing this pipeline, the graph would become a living model of playlist influence, making it even more valuable for studying music discovery and the spread of songs across different communities.
Playlists Rank
Google’s PageRank algorithm revolutionized web search by ranking web pages based on their importance within a network of hyperlinks. A similar approach could be applied to playlists, ranking them based on their influence within the playlist ecosystem.
In this context, a playlist’s influence score would be determined by the number and weight of its outgoing edges, where each edge represents songs that have "traveled" from one playlist to another. Since PageRank scores a node by its incoming links, it would be applied to the graph with edge directions reversed. A playlist that influences many other playlists, or influences highly influential playlists, would then receive a higher rank.
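As a rough sketch of this idea, the function below runs a weighted PageRank by power iteration. Because edges in this graph point from the influencing playlist to the influenced one, the sketch reverses each edge first so that score flows toward influencers. The function name, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def playlist_rank(edge_weights, damping=0.85, iterations=100):
    """Weighted PageRank on the reversed influence graph.

    edge_weights maps (influencer, influenced) -> number of songs moved.
    Returns {playlist_id: score}; scores sum to 1.
    """
    nodes = sorted({node for edge in edge_weights for node in edge})
    if not nodes:
        return {}

    # Reverse every edge: the influenced playlist "votes" for its influencer.
    votes = {node: {} for node in nodes}
    for (influencer, influenced), weight in edge_weights.items():
        votes[influenced][influencer] = votes[influenced].get(influencer, 0) + weight

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes with no outgoing votes spread their score uniformly.
        dangling = sum(rank[node] for node in nodes if not votes[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in votes.items():
            total = sum(targets.values())
            for target, weight in targets.items():
                # Each vote is proportional to the share of songs moved.
                new_rank[target] += damping * rank[node] * weight / total
        rank = new_rank
    return rank
```

Sorting the returned dictionary by score in descending order would give a ranked list of the most influential playlists, which could then drive node size in the visualization.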