LyCra - Lyrics Crawler

LyCra stands for Lyrics Crawler and provides an environment for crawlers to search for lyrics for songs in the database.

A crawler is a Python script in the crawler directory lib/crawler. It is a class derived from lib.crawlerapi.LycraCrawler, as shown in the class diagram below. The name of the crawler file must exactly match the name of the crawler class. If the crawler's filename is Example.py, the class definition must be class Example(LycraCrawler). The constructor must take one parameter, a LycraDatabase database object.

The class diagram for a crawler:

digraph hierarchy {
    size="5,5"
    node[shape=record,style=filled,fillcolor=gray95]
    edge[dir=back, arrowtail=empty]

    api [label = "{LycraCrawler|# name\l# version\l# db\l|+ Crawl\l# DoCrawl\l}"]
    crawler[label = "{ExampleCrawler||/# DoCrawl\l}"]

    api -> crawler
}

A minimal crawler implementation looks like this:

from lib.crawlerapi import LycraCrawler
from lib.db.lycradb import LycraDatabase

class Example(LycraCrawler):
    def __init__(self, db):
        LycraCrawler.__init__(self, db, "Example", "1.0.0")

    def DoCrawl(self, artistname, albumname, songname, songid):
        return False
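
The naming rule lets a loader derive the class name from the file name alone. The following is a minimal sketch of such a lookup, not MusicDB's actual loader: load_crawler_class and demo_load are hypothetical helper names, and the demo writes two throwaway files into a temporary directory instead of lib/crawler.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_crawler_class(path):
    """Load the class that shares its name with the given crawler file."""
    path = Path(path)
    name = path.stem  # "Example.py" -> "Example"
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    try:
        return getattr(module, name)
    except AttributeError:
        # The file does not define a class with the expected name.
        raise ImportError(f"{path.name} must define a class named {name}") from None


def demo_load():
    """Load one correctly named file and one mismatched file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "Example.py"
        good.write_text("class Example:\n    pass\n")
        bad = Path(tmp) / "Mismatch.py"
        bad.write_text("class SomethingElse:\n    pass\n")

        loaded = load_crawler_class(good).__name__
        try:
            load_crawler_class(bad)
            rejected = False
        except ImportError:
            rejected = True
    return loaded, rejected
```

In this sketch a file whose class name differs from its file name is rejected with an ImportError; how the real loader reacts to such a mismatch is not stated here.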

Lycra API Class

class mdbapi.lycra.Lycra(config)

This class does the main lyrics management.

Parameters

config – MusicDB Configuration object.

Raises

TypeError – when config is not of type MusicDBConfig

CrawlForLyrics(artistname, albumname, songname, songid)

Loads all crawlers from the crawler directory via LoadCrawlers() and runs them via RunCrawler().

Parameters
  • artistname (str) – The name of the artist as stored in the music database

  • albumname (str) – The name of the album as stored in the music database

  • songname (str) – The name of the song as stored in the music database

  • songid (int) – The ID of the song to associate the lyrics with the song

Returns

False if something went wrong, otherwise True. (True does not indicate that any lyrics were found!)

GetLyrics(songid)

This method returns the lyrics of a song. See lib.db.lycradb.LycraDatabase.GetLyricsFromCache()

LoadCrawlers()

This method loads all crawlers inside the crawler directory.

Warning

Changes to a crawler may not be recognized until the whole application is restarted. Only newly added crawlers get loaded. Crawlers that are already loaded remain in Python's module cache.

Returns

None
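
The warning about already loaded crawlers follows from how Python caches imported modules in sys.modules. The sketch below illustrates that behaviour with the standard importlib module. The module name democrawler is hypothetical and the file lives in a temporary directory; nothing here is taken from the LoadCrawlers() implementation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_module_cache():
    """Return the VERSION seen after import, after re-import, and after reload."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "democrawler.py"  # hypothetical crawler file
        source.write_text('VERSION = "1.0.0"\n')

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("democrawler")
            first = module.VERSION

            # Edit the file on disk. The new content has a different length,
            # so previously written bytecode is not reused by accident.
            source.write_text('VERSION = "2.0.0-changed"\n')

            # A second import returns the cached module from sys.modules.
            second = importlib.import_module("democrawler").VERSION

            # Only an explicit reload re-executes the changed source.
            reloaded = importlib.reload(module).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("democrawler", None)
    return first, second, reloaded
```

Calling demo_module_cache() yields the old version twice and the new version only after importlib.reload(), which is why a restart is the safe way to pick up crawler changes.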

RunCrawler(crawler, artistname, albumname, songname, songid)

This method runs a specific crawler. The crawler gets all available information needed to search for the lyrics of a specific song.

This method is for class-internal use. When using this class, call CrawlForLyrics() instead of calling this method directly. Before calling this method, LoadCrawlers() must be called.

The crawler base class lib.crawlerapi.LycraCrawler catches all exceptions, so crawlers do not need to be executed in a try-except environment.

Parameters
  • crawler (str) – Name of the crawler. If it addresses the file lib/crawler/example.py, the name is example

  • artistname (str) – The name of the artist as stored in the MusicDatabase

  • albumname (str) – The name of the album as stored in the MusicDatabase

  • songname (str) – The name of the song as stored in the MusicDatabase

  • songid (int) – The ID of the song to associate the lyrics with the song

Returns

None

Crawler Base Class

class lib.crawlerapi.LycraCrawler(db, name, version)

This is the base class for all crawlers.

Parameters
  • db – A LycraDatabase database object.

  • name (str) – Name of the crawler. It should be the same name the class and file have.

  • version (str) – A version number in format major.minor.patchlevel as string. For example "1.0.0"

Raises
  • TypeError – If the db argument is not of type LycraDatabase

  • TypeError – If name or version number are not of type str

Crawl(artistname, albumname, songname, songid)

This method gets called by the lyrics manager mdbapi.lycra.Lycra. It provides a small environment to fit the crawler into MusicDB's infrastructure. It catches exceptions and measures the time the crawler needs to run.

Parameters
  • artistname (str) – The name of the artist as stored in the MusicDatabase

  • albumname (str) – The name of the album as stored in the MusicDatabase

  • songname (str) – The name of the song as stored in the MusicDatabase

  • songid (int) – The ID of the song to associate the lyrics with the song

Returns

True if the crawler found lyrics, otherwise False
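
The relationship between Crawl() and DoCrawl() can be sketched as a small wrapper that times the call and turns any exception into a negative result. This is an illustration under assumptions, not the MusicDB source: SketchCrawler, its lastruntime attribute, and the two example subclasses are hypothetical, and the database argument is left out to keep the block self-contained.

```python
import time


class SketchCrawler:
    """Hypothetical stand-in for the crawler base class."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.lastruntime = None  # assumed attribute, for illustration only

    def Crawl(self, artistname, albumname, songname, songid):
        start = time.perf_counter()
        try:
            found = self.DoCrawl(artistname, albumname, songname, songid)
        except Exception:
            # A failing crawler is treated as "no lyrics found".
            found = False
        finally:
            self.lastruntime = time.perf_counter() - start
        return bool(found)

    def DoCrawl(self, artistname, albumname, songname, songid):
        raise NotImplementedError


class Broken(SketchCrawler):
    def DoCrawl(self, artistname, albumname, songname, songid):
        raise RuntimeError("network unreachable")


class Working(SketchCrawler):
    def DoCrawl(self, artistname, albumname, songname, songid):
        return True
```

With this shape, the caller never needs its own try-except block: Broken(...).Crawl(...) simply returns False, while Working(...).Crawl(...) returns True.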

DoCrawl(artistname, albumname, songname, songid)

This is the prototype the derived class has to implement for crawling.

Parameters
  • artistname (str) – The name of the artist as stored in the music database

  • albumname (str) – The name of the album as stored in the music database

  • songname (str) – The name of the song as stored in the music database

  • songid (int) – The ID of the song to associate the lyrics with the song

Returns

True if the crawler found lyrics, otherwise False

Lycra Database

Lyrics cache entry:

id | crawler | songid | updatetime | url | lyrics

crawler:

Name of the crawler

updatetime:

Unix timestamp of the last time this entry was updated

url:

URL from which the lyrics were loaded

lyrics:

The lyrics themselves. This entry is compressed using the lib.db.database.Database.Compress() method.

class lib.db.lycradb.LycraDatabase(path)

Derived from lib.db.database.Database.

Parameters

path (str) – Absolute path to the LyCra database file.

Raises

ValueError – When the version of the database does not match the expected version. (Updating MusicDB may have failed.)

GetLyricsFromCache(songid)

This method returns a list of all cache entries that match the songid.

Parameters

songid (int) – ID of a song

Returns

A list of entries with lyrics, or None if nothing found.

WriteLyricsToCache(crawler, songid, lyrics, url)

This method writes the lyrics a crawler found into the database. If there is already an entry for the combination of songid and crawler, this entry gets updated.

The lyrics will be compressed.

Parameters
  • crawler (str) – Name of the crawler that found the lyrics

  • songid (int) – ID of the song the lyrics belong to

  • lyrics (str) – The lyrics that shall be stored

  • url (str) – The source of the lyrics

Returns

None
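
The two cache methods can be sketched together to show the update-or-insert behaviour and the "list or None" return value. Everything specific in this block is an assumption made for illustration: the class name SketchLyricsCache, the in-memory SQLite backend, the table name lyricscache, zlib as the compression scheme (the real Compress() method may work differently), and dictionaries as the entry format.

```python
import sqlite3
import time
import zlib


class SketchLyricsCache:
    """Hypothetical stand-in for LycraDatabase using an in-memory SQLite database."""

    def __init__(self):
        self.con = sqlite3.connect(":memory:")
        self.con.execute(
            "CREATE TABLE lyricscache ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " crawler TEXT, songid INTEGER, updatetime INTEGER,"
            " url TEXT, lyrics BLOB)"
        )

    def WriteLyricsToCache(self, crawler, songid, lyrics, url):
        packed = zlib.compress(lyrics.encode("utf-8"))  # assumed compression scheme
        now = int(time.time())
        # Try to update the entry for this (crawler, songid) pair first.
        cur = self.con.execute(
            "UPDATE lyricscache SET updatetime=?, url=?, lyrics=?"
            " WHERE crawler=? AND songid=?",
            (now, url, packed, crawler, songid),
        )
        if cur.rowcount == 0:
            # No entry existed yet, so create one.
            self.con.execute(
                "INSERT INTO lyricscache (crawler, songid, updatetime, url, lyrics)"
                " VALUES (?, ?, ?, ?, ?)",
                (crawler, songid, now, url, packed),
            )
        self.con.commit()

    def GetLyricsFromCache(self, songid):
        rows = self.con.execute(
            "SELECT crawler, updatetime, url, lyrics FROM lyricscache"
            " WHERE songid=? ORDER BY id",
            (songid,),
        ).fetchall()
        if not rows:
            return None
        return [
            {"crawler": crawler, "updatetime": updatetime, "url": url,
             "lyrics": zlib.decompress(packed).decode("utf-8")}
            for crawler, updatetime, url, packed in rows
        ]
```

Writing twice with the same crawler and songid leaves a single entry holding the newer lyrics and URL, while a second crawler adds its own entry for the same song.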