Sunday, April 6, 2014

thoughts on a music feed aggregator

I've been toying with some ideas on music aggregation.

The problem: I don't listen to music I can't share with my children without fear of violating copyright. This means I listen to a lot of CC licensed music, a lot of music contest websites, and a lot of very independent one-person bands.

I've been thinking that the music aggregator should support distributed operation, so in addition to consuming media feeds that others produce, it must be able to consume its own output.

With that, a lot of normal product assumptions regarding music consumption are destroyed.

Use-Case: When downloading tracks from a contest website, some or all of the metadata embedded in the tracks may be less valid than either the metadata from the feed it is found in or the metadata available in the HTML page it is sourced from.
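One way to picture this is a merge that takes each field from the most trusted source that actually has a value, and remembers where it came from. This is only a sketch: the source names (`html_page`, `feed`, `embedded_tags`) and their ordering are assumptions for illustration, and a real site might need a different order per field.

```python
# Hypothetical source names, ordered from most to least trusted.
# The ordering is an assumption; it may need to differ per site or per field.
SOURCE_PRIORITY = ("html_page", "feed", "embedded_tags")


def merge_metadata(candidates, priority=SOURCE_PRIORITY):
    """Take each field from the most trusted source with a non-empty value.

    candidates maps a source name to a dict of field -> value.
    Returns (merged, provenance); provenance records which source won.
    """
    merged, provenance = {}, {}
    for source in priority:
        for field, value in candidates.get(source, {}).items():
            if value and field not in merged:
                merged[field] = value
                provenance[field] = source
    return merged, provenance


# Made-up example data for a single downloaded track.
candidates = {
    "embedded_tags": {"title": "track01", "artist": "", "genre": "Rock"},
    "feed": {"title": "Example Song", "artist": "Example Artist"},
    "html_page": {"artist": "Example Artist (solo)"},
}
merged, provenance = merge_metadata(candidates)
```

Keeping the provenance around matters for the distributed case: when the aggregator republishes this track, a downstream consumer can still see which fields were guesses from embedded tags.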

Use-Case: Artist uploads track for contest and uses same track for their album. The music data is the same, but the metadata is changed.
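A rough sketch of how "same music data, different metadata" could be detected for MP3 files: hash the bytes left after removing the ID3 tags. The function names here are hypothetical, and this only handles ID3v1/ID3v2 tags on a byte-identical audio stream. It is not acoustic fingerprinting, so a re-encoded copy of the same song would not match.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag or a trailing ID3v1 tag."""
    # ID3v2: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 useful bits per byte
        end = 10 + size
        if data[5] & 0x10:  # footer-present flag adds 10 more bytes
            end += 10
        data = data[end:]
    # ID3v1: a fixed 128-byte block at the end of the file, starting with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


def audio_payload_hash(data: bytes) -> str:
    """SHA-256 of the file contents with ID3 tags removed."""
    return hashlib.sha256(strip_id3(data)).hexdigest()


# Made-up bytes standing in for an audio stream.
audio = b"\xff\xfb" + bytes(range(256)) * 2
# "Contest" copy: ID3v2.3 header declaring a 20-byte tag body.
contest_copy = b"ID3\x03\x00\x00" + bytes([0, 0, 0, 20]) + b"x" * 20 + audio
# "Album" copy: same audio with an ID3v1 trailer.
album_copy = audio + b"TAG" + b"\x00" * 125
same_audio = audio_payload_hash(contest_copy) == audio_payload_hash(album_copy)
```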

Use-Case: Artist evolves their name over time without having any clear breaks. (Example: "ADD Music" becomes "ADD" later on.)
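A minimal sketch of what this implies for the data model: the artist gets an opaque local identifier, and names are just attributes that accumulate over time. `ArtistRegistry` and its methods are hypothetical names for illustration, and the IDs are locally generated UUIDs, not anything assigned by an outside service.

```python
import uuid


class ArtistRegistry:
    """Artists keyed by an opaque local ID; names are a history, not a key."""

    def __init__(self):
        self._names = {}  # artist_id -> list of names, oldest first

    def add_artist(self, name):
        artist_id = str(uuid.uuid4())
        self._names[artist_id] = [name]
        return artist_id

    def add_alias(self, artist_id, name):
        """Record a newer name for an existing artist."""
        if name not in self._names[artist_id]:
            self._names[artist_id].append(name)

    def current_name(self, artist_id):
        return self._names[artist_id][-1]

    def find(self, name):
        """Return every artist ID that has ever used this name."""
        return sorted(a for a, names in self._names.items() if name in names)


registry = ArtistRegistry()
add_id = registry.add_artist("ADD Music")
registry.add_alias(add_id, "ADD")
```

Since `find` returns a list, a name lookup can come back with more than one artist, which leaves the question of "which one?" to other evidence such as the feed or the site the track came from.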

Use-Case: Artist uploads the track and changes the name of the album before it is finished and officially released.

I've been looking at playlist formats as the basis for the information exchange format. A lot of them are light on metadata.

Something like the format used by MusicBrainz might seem ideal. It's a self-contained format dedicated to distributing music metadata... the problem, however, is that every entity (be it artist, album, track, etc) has a GUID assigned by MusicBrainz. Moreover, while MusicBrainz reliably gets finished releases (even electronic ones via BandCamp), they don't seem particularly designed for demos and ad hoc music contests... so it becomes a format that should be supported in the future, but not one suitable for the primary interchange format. (Though there do appear to be Python bindings.)

Use-Case: Artist picked a well-established name, not realizing it was already taken. The two bands/artists have the same name, but are wildly different.

Use-Case: There are a lot of duplicated album names with different Album Artists. (The simple "Compilation Flag" of iTunes breaks.) This can be done on purpose, as an homage to another artist, or by accident.

Use-Case: The fantastic Song Fight! contest uses the same song title for all contributed songs in the current contest. Anything expecting a song title alone to be meaningful is broken.
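To make the point concrete, here is a small sketch comparing a library indexed by title alone with one indexed by a composite key. `TrackKey` and its fields are hypothetical; the artist IDs are assumed to be opaque local identifiers and the titles and URLs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackKey:
    """Composite identity for a track; no single field is unique by itself."""
    artist_id: str  # opaque local identifier, not a display name
    title: str
    source: str     # where the track was found, e.g. a contest page URL


# Two different entries for the same contest share one title.
entries = [
    (TrackKey("artist-1", "Same Title", "http://example.com/contest/1"), "a.mp3"),
    (TrackKey("artist-2", "Same Title", "http://example.com/contest/1"), "b.mp3"),
]

by_title = {}
by_key = {}
for key, location in entries:
    by_title[key.title] = location  # the second entry overwrites the first
    by_key[key] = location          # both entries survive
```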

I had previously seen the page mentioning Playlists on Wikipedia. It was while revisiting it that I saw references to SMIL.

I also ran across a comparison of various playlist formats. The context of this data was specifically remote playlists, so it was the sort of thing I was looking for, but the data was old. It was missing XSPF. It covered the idea of using a metadata-heavy markup and hijacking it for playlist purposes. I'll get back to that.

It (and links related to it) indicated that SMIL was once considered an attractive format for remote playlists. While tracking down a lead from SMIL research, I found the Playr bookmarklets and references to the service previously supporting SMIL. At this point, I found that perhaps SMIL isn't really a favorite any longer. XSPF, however, is available directly, and is used by the (open-source) Flash player.
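For a sense of how little XSPF needs, here is a sketch that writes a version 1 playlist with Python's standard `xml.etree.ElementTree`. The `build_xspf` helper and the sample track are made up for illustration; the element names (`playlist`, `trackList`, `track`, `location`, `title`, `creator`, `album`) and the namespace come from the XSPF specification.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"


def build_xspf(title, tracks):
    """Serialize a list of track dicts as an XSPF version 1 playlist string."""
    ET.register_namespace("", XSPF_NS)  # emit XSPF as the default namespace
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    ET.SubElement(playlist, f"{{{XSPF_NS}}}title").text = title
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for info in tracks:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        # Only write the fields that are present; all are optional in XSPF.
        for field in ("location", "title", "creator", "album"):
            if info.get(field):
                ET.SubElement(track, f"{{{XSPF_NS}}}{field}").text = info[field]
    return ET.tostring(playlist, encoding="unicode")


# Made-up track for illustration.
xspf_text = build_xspf(
    "Example playlist",
    [{"location": "http://example.com/song.mp3",
      "title": "Example Song",
      "creator": "Example Artist"}],
)
```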

Regarding SMIL, the possibilities for doing karaoke with SMIL look really nice. In fact, integrating a video with a presentation sort of thing looks straightforward. I did a little investigation, and while complete support for SMIL is somewhat rare, there appear to be a ton of applications that support DAISY Talking Books, which is a subset of SMIL. To that end, I've installed Obi, an application to easily create them. (A good test-case for this is Josh Woodard's music, as he makes vocal, instrumental, and lyrics available for all his stuff.) Not a high priority, but it periodically comes up.

The idea of the metadata-heavy markup led to the Dublin Core Metadata Initiative, the Music Ontology (and Omras2), OWL, and LinkedDataTools. Bonus: Since I'm all about CC licensed music, the license can be encoded in the documents using CC REL.

Of course, I reminded myself why RSS stinks and why Atom 1.0 is the way to go, syndication-wise. I looked at django-planet as a source of ideas for how to get started.
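As a starting point for the consuming side, here is a sketch that pulls audio enclosures and license links out of an Atom 1.0 feed using only the standard library. The sample feed and the `read_entries` helper are invented for illustration; `rel="enclosure"` comes from the Atom spec and `rel="license"` from the Atom license extension. A real aggregator would more likely lean on an existing feed-parsing library.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed, standing in for what a contest site might publish.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example contest feed</title>
  <id>urn:example:feed</id>
  <updated>2014-04-06T00:00:00Z</updated>
  <entry>
    <title>Example Song</title>
    <id>urn:example:entry-1</id>
    <updated>2014-04-06T00:00:00Z</updated>
    <author><name>Example Artist</name></author>
    <link rel="enclosure" type="audio/mpeg" href="http://example.com/song.mp3"/>
    <link rel="license" href="http://creativecommons.org/licenses/by/4.0/"/>
  </entry>
</feed>
"""


def read_entries(feed_xml):
    """Return one dict per Atom entry with its enclosure and license links."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        links = entry.findall(ATOM + "link")
        entries.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "author": entry.findtext(f"{ATOM}author/{ATOM}name"),
            "enclosures": [l.get("href") for l in links
                           if l.get("rel") == "enclosure"],
            "licenses": [l.get("href") for l in links
                         if l.get("rel") == "license"],
        })
    return entries


feed_entries = read_entries(SAMPLE_FEED)
```

The entry `id` is the useful part for an aggregator that has to consume its own output: Atom requires it to be a permanent identifier, so a track that comes back around through another node can be recognized even if its title has changed.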

That's basically my brain-dump for now.