Scrobbling from live YouTube Content
Intro
I love keeping track of what I listen to or watch. That’s why I like using websites like LastFM and/or ListenBrainz for scrobbling (sending listening information at a particular timestamp). It’s a pretty simple concept: wherever you have the song information available, a simple web request to their servers can mark a song as listened at that particular time. Later on, you can view and analyze your own data to see your listening trends and create charts/plots specific to songs, artists, or genres.
I also like listening to radio-like curated livestreams on YouTube. One of the best known examples is LofiGirl (previously known as ChilledCow). The YouTube channel has been running for a couple of years now, and even though they have playlists on Spotify and other streaming platforms, I still open up the YouTube stream sometimes and leave it on, letting the curated list run as an infinite loop. Well, the problem is you cannot scrobble the songs from this content directly. Since it is a livestream, the song information is not in the title of the video, because the songs keep changing and YouTube titles are not that dynamic. However, the current song and artist information is hardcoded into the video as text at the top of the screen, like this:
So the question is: can we periodically extract frames from the video, OCR the wanted information into text, and scrobble it when desired?
Extract the song information from the livestream
Get the raw video link
If we take the YouTube livestream link as a given, there are various ways to get the real video link from it. The most well-known example would be the famous (or infamous, depending on the perspective) youtube-dl. We can retrieve the raw video footage at certain intervals and extract a single frame.
Since I like using Rust, I would like to try my best to have the whole system in Rust. So, I found one particular crate which gives me exactly this functionality for YouTube raw video capture. It’s rustube, and it is pretty straightforward to get the raw video link with it.
use rustube::VideoFetcher;

pub async fn get_raw_link(url: &Url) -> Result<String> {
    // Fetch the video metadata, descramble it, pick the best quality
    // video stream and return its raw (direct) URL.
    let raw_link = VideoFetcher::from_url(url)?
        .fetch()
        .await?
        .descramble()?
        .best_video()
        .ok_or(RustubeError::YoutubeLinkCaptureError)?
        .signature_cipher
        .url
        .to_string();
    Ok(raw_link)
}
Capture a frame and process it
To get a single frame from the raw video we can use ffmpeg, or, if we would like to do more with the image, we can try using something more capable and heavy like opencv.
I went for the second route since we may need to process the extracted images to aid the OCR process. I won’t be adding the opencv code snippets here, but the whole process can be seen in the project’s image processing module here.
Using the opencv bindings for Rust, we can use a VideoCapture to get a single frame on every read.
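As a minimal sketch (not the project’s exact code, and the opencv crate’s signatures shift a bit between versions), grabbing one frame from the raw stream URL could look like this:

use opencv::{core::Mat, prelude::*, videoio};

// Open the raw stream URL and grab a single frame from it.
fn capture_frame(raw_link: &str) -> opencv::Result<Mat> {
    let mut capture = videoio::VideoCapture::from_file(raw_link, videoio::CAP_ANY)?;
    let mut frame = Mat::default();
    capture.read(&mut frame)?; // reads the next frame into `frame`
    Ok(frame)
}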
Afterwards, we can crop the image with a specified region of interest so that only the top part remains.
Since we are using OpenCV, we might as well use some of its more advanced features, such as masking, to make the OCR’s job easier. By masking on color so that only the very bright white parts remain, we arrive at the final image before OCR.
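A rough sketch of the crop and mask steps could look like the following; the region and threshold value are illustrative placeholders rather than the project’s real numbers, and exact signatures may vary slightly between opencv crate versions:

use opencv::{core, core::Mat, imgproc, prelude::*};

fn crop_and_mask(frame: &Mat) -> opencv::Result<Mat> {
    // Region of interest: the text banner at the top of the frame.
    let roi = core::Rect::new(0, 0, frame.cols(), frame.rows() / 10);
    let top = Mat::roi(frame, roi)?.try_clone()?;

    // Keep only the very bright, near-white pixels where the text lives.
    let mut gray = Mat::default();
    imgproc::cvt_color(&top, &mut gray, imgproc::COLOR_BGR2GRAY, 0)?;
    let mut mask = Mat::default();
    imgproc::threshold(&gray, &mut mask, 240.0, 255.0, imgproc::THRESH_BINARY)?;
    Ok(mask)
}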
OCR
From this point, the image looks ready to be supplied to the OCR engine. I used the tesseract-ocr bindings for Rust. Then, we can split the song and the artist from the converted text.
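As a hedged sketch, assuming the tesseract crate’s builder-style API and a simple “Artist - Title” layout (the real overlay format may differ), this step could look like:

use tesseract::Tesseract;

fn read_track(image_path: &str) -> Result<(String, String), Box<dyn std::error::Error>> {
    // Run tesseract over the masked image and take the recognized text.
    let text = Tesseract::new(None, Some("eng"))?
        .set_image(image_path)?
        .get_text()?;
    // Placeholder assumption: the overlay reads "Artist - Title".
    let (artist, song) = text.trim().split_once(" - ").ok_or("unexpected OCR output")?;
    Ok((artist.to_string(), song.to_string()))
}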
Submitting to LastFM or ListenBrainz
Using the LastFM or ListenBrainz APIs is pretty straightforward thanks to the community crates that wrap them (things can be more complicated with LastFM’s more involved authentication, but we will get into that later). We can create a really simple scrobbling system from the song information we read.
As you can see, we can also support the “playing now” APIs which both scrobbling systems offer.
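For example, with the rustfm-scrobble crate (one of those community crates), a minimal scrobble could look like this sketch; the API key, secret and credentials are placeholders:

use rustfm_scrobble::{Scrobble, Scrobbler};

fn scrobble_track(artist: &str, song: &str) -> Result<(), Box<dyn std::error::Error>> {
    let mut scrobbler = Scrobbler::new("LASTFM_API_KEY", "LASTFM_API_SECRET");
    scrobbler.authenticate_with_password("username", "password")?;

    let track = Scrobble::new(artist, song, ""); // album is unknown for the stream
    scrobbler.now_playing(&track)?; // "playing now" notification
    scrobbler.scrobble(&track)?;    // the actual scrobble, timestamped now
    Ok(())
}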
We can stop here, but why should we?
Sending the song information to multiple users
The image processing is the most expensive part of this system. If we would like to run this application on very modest machines, we might think about improvements. For instance, we can separate the image processing and scrobbling concerns so that the song information is pushed live to anyone who is willing to retrieve it. This server/client system basically gives the heavy responsibility to the server side, while scrobbling can happen on the client side.
To do so, we can create a web API for our server module, and the client module can periodically talk to a specific endpoint.
Staying with Rust, I decided to use actix-web on the server side (specifically 4.0, which is in beta at the moment and compatible with the tokio 1.x runtime).
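A minimal sketch of such an endpoint could look like this; the /track path and the AppState type are illustrative, not the project’s actual API:

use std::sync::Mutex;

use actix_web::{get, web, App, HttpServer, Responder};
use serde::Serialize;

#[derive(Clone, Serialize)]
struct Track {
    artist: String,
    song: String,
}

struct AppState {
    current: Mutex<Track>,
}

#[get("/track")]
async fn current_track(state: web::Data<AppState>) -> impl Responder {
    // Hand out the most recently OCR'd track as JSON.
    web::Json(state.current.lock().unwrap().clone())
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let state = web::Data::new(AppState {
        current: Mutex::new(Track { artist: String::new(), song: String::new() }),
    });
    // The OCR loop would update `state.current` in the background here.
    HttpServer::new(move || App::new().app_data(state.clone()).service(current_track))
        .bind(("127.0.0.1", 8080))?
        .run()
        .await
}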
For the client-side application we can use reqwest.
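On the client, a periodic poll of that hypothetical endpoint might look like the following sketch (assuming reqwest’s json feature; the Track struct here is exactly the kind of type we will want to share, as described next):

use serde::Deserialize;

#[derive(Deserialize)]
struct Track {
    artist: String,
    song: String,
}

// Poll the server's (hypothetical) /track endpoint for the current song.
async fn fetch_track(server: &str) -> Result<Track, reqwest::Error> {
    reqwest::get(format!("{server}/track"))
        .await?
        .json::<Track>()
        .await
}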
Shared API structs
Since the whole system is in the same language ecosystem, we can extract the commonly used API types into a separate crate and reuse them across modules. Any change we apply to the common API automatically affects both the server backend and the client frontend without much extra work.
On a specific endpoint, we can use this payload in JSON-serialized form for a simple scrobble request, where track includes the artist and song, and action is just an enum of scrobbled or playing now.
#[derive(Debug, Serialize, Deserialize)]
pub struct ScrobbleRequest {
    pub token: String,
    pub action: Action,
    pub track: Track,
}
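The companion types could look something like this sketch; the exact field and variant names in the project may differ:

#[derive(Debug, Serialize, Deserialize)]
pub enum Action {
    Scrobbled,  // a finished listen, submit it as a scrobble
    PlayingNow, // a "playing now" notification
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Track {
    pub artist: String,
    pub song: String,
}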
Now our OCR-capable image processing server is separated out, and multiple clients can come and go whenever they want to retrieve the song information, as long as the server is running.
Containerize the system
The dependencies for opencv and tesseract can be quite cumbersome. Certain minor version changes or simple environment variable problems can also prevent an end user from easily using the system. The solution is to containerize the system. I’m not an expert on docker, so I used the archlinux base image since I tested the system on archlinux. I believe using an alpine base image could be more space efficient.
You can see the Dockerfile in the GitHub repo here.
We could stop here as well, but is there anything more we can do?
Bare-bones web frontend and sessions
How about we compile our client to WebAssembly and give all the responsibility, including scrobbling, to the server side?
To achieve this, we possibly need to store credentials and give sessions to the web clients.
Sessions
Security-wise, ListenBrainz is already well designed, we might say. It supplies its users with a UUID token which can be revoked at any time.
LastFM authentication is more delicate in that regard. There is a web-authentication option, but that removes the possibility of a headless bare-bones client, so we still want to use password authentication. However, we don’t want to store the client’s username/password on our server side, so we can use the session key LastFM provides after a login.
To store the ListenBrainz token and/or LastFM session key, we need a simple db on our server side. For this purpose, a simple SQLite db would suffice, since I’m not looking to scale this system to a large userbase.
There are several options for using SQLite from Rust, but I wanted something that works with the async runtime, and I had never actually tried SQLx before. SQLx offers compile-time checked queries without a DSL, and that looked like something pretty awesome to try and see with my own eyes.
SQLx works with the actix-web runtime with no hassle, and for the compile-time checked queries to work, setting the DATABASE_URL environment variable is enough. A simple migration SQL script run against an empty db is enough to check the validity of the SQL queries.
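For instance, a compile-time checked insert might look like this sketch; the users table and its columns are placeholders, not the project’s real schema:

use sqlx::SqlitePool;

pub async fn store_session_key(pool: &SqlitePool, username: &str, key: &str) -> sqlx::Result<()> {
    // Checked against the schema behind DATABASE_URL at compile time.
    sqlx::query!(
        "INSERT INTO users (username, lastfm_session_key) VALUES (?, ?)",
        username,
        key
    )
    .execute(pool)
    .await?;
    Ok(())
}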
Web frontend compiled to web-assembly
I still want to stay in the Rust ecosystem, and I know there are a couple of options for the frontend, such as yew or seed. I decided to give seed a try, and it’s pretty straightforward as well. I’m not too familiar with the Elm architecture, which seed apparently follows, but surprisingly I actually like the concept quite a lot as a first-time user. I assume this system may lead to some unnecessary cloning during message passing, but still, it creates a very good abstraction while restricting the possibilities.
You can see the details on the web-client here.
Also, I’m hosting the client here on this website as well.
The full source code of the project is on GitHub.
Final words
Now, I can easily scrobble the songs I’m listening to from the lofigirl livestream, using docker/podman on my cloud server and a browser.
This wasn’t a super complex project at its core, considering all the dependencies I’m using. However, it was nice to see that, staying within one language ecosystem, someone can create a toy system with interacting modules in almost a weekend. It was also nice to use all the interesting technologies I had been following but didn’t have the time or occasion to apply in my research.
rust sqlx actix-web tokio listenbrainz lastfm lofigirl chilledcow opencv tesseract ocr seed web-assembly docker
2021-08-29 11:38 +0100