Calibre Python Interface

Calibre provides a set of Python interfaces that make it possible to run batch operations on a Calibre database.

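¶Installation

Download the installer for your operating system from the Calibre download page and install it. After installation, the Calibre installation directory contains the corresponding shell commands and Python libraries. On Linux, Calibre can also be installed directly through the package manager.

Taking Ubuntu as an example, the Calibre files are located under /usr/lib/calibre after installation.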

root@localhost$ ls -l /usr/lib/calibre/
total 36
drwxr-xr-x 14 root root 4096 Jan 27  2020 calibre
drwxr-xr-x  2 root root 4096 Jan 27  2020 css_selectors
drwxr-xr-x  2 root root 4096 Jan 27  2020 duktape
drwxr-xr-x  8 root root 4096 Jan 27  2020 html5lib
drwxr-xr-x  2 root root 4096 Jan 27  2020 lzma
drwxr-xr-x  2 root root 4096 Jan 27  2020 odf
drwxr-xr-x  2 root root 4096 Jan 27  2020 regex
drwxr-xr-x  2 root root 4096 Jan 27  2020 templite
drwxr-xr-x  3 root root 4096 Jan 27  2020 tinycss

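All of the following notes use Ubuntu as the example.

¶Calibre shell commands

Calibre provides a set of shell commands, and these commands are in fact Python scripts. Reading the official scripts is a good way to learn calibre's Python interface. The cli-index.html page in the official documentation lists every shell command Calibre provides. I want to batch-modify book metadata with a script, so the implementation of the calibredb command is the one to look at.

¶Calibre Python interface

calibre provides documentation for its Python interface, but it seems incomplete: compared with the files in the installation directory, many things are not covered. So instead of relying on that documentation, I read the library source code directly.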

https://manual.calibre-ebook.com/py-modindex.html

¶calibredb

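The calibredb help page shows that calibredb implements a wide range of database operations through the subcommands listed below. Here I only care about getting and setting metadata.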

list
add
remove
add_format
remove_format
show_metadata
set_metadata
export
catalog
saved_searches
add_custom_column
custom_columns
remove_custom_column
set_custom
restore_database
check_library
list_categories
backup_metadata
clone
embed_metadata
search
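
Before going into the Python internals, the two subcommands I care about can also be driven from a script by building their command lines. This is only a sketch: the helper names are my own, and I am assuming the `--with-library` option and the `--field name:value` syntax of set_metadata behave as described in the calibredb help for your version, so check `calibredb set_metadata --help` first.

```python
import subprocess


def build_show_metadata_cmd(book_id, library_path=None):
    """Build the argv list for `calibredb show_metadata <id>` (hypothetical helper)."""
    cmd = ["calibredb", "show_metadata", str(book_id)]
    if library_path:
        # Assumption: --with-library selects the library to operate on.
        cmd += ["--with-library", library_path]
    return cmd


def build_set_metadata_cmd(book_id, fields, library_path=None):
    """Build the argv list for `calibredb set_metadata <id> --field name:value ...`."""
    cmd = ["calibredb", "set_metadata", str(book_id)]
    for name, value in sorted(fields.items()):
        # Assumption: each field is passed as --field name:value.
        cmd += ["--field", "%s:%s" % (name, value)]
    if library_path:
        cmd += ["--with-library", library_path]
    return cmd


def run(cmd):
    """Run a prepared command and return its raw output (requires calibredb on PATH)."""
    return subprocess.check_output(cmd)
```

Building the argument list separately from running it makes it easy to print the command and inspect it before touching a real library.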

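¶How the calibredb command is implemented

Use which to locate the calibredb command, then cat it. Reading it shows that calibredb is a further wrapper around calibre.library.cli. The script also shows that all of calibre's Python files live under /usr/lib/calibre (line 10 of the script, where path is set).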

#!/usr/bin/python2.7

"""
This is the standard runscript for all of calibre's tools.
Do not modify it unless you know what you are doing.
"""

import sys, os

path = os.environ.get('CALIBRE_PYTHON_PATH', '/usr/lib/calibre')
if path not in sys.path:
    sys.path.insert(0, path)

sys.resources_location = os.environ.get('CALIBRE_RESOURCES_PATH', '/usr/share/calibre')
sys.extensions_location = os.environ.get('CALIBRE_EXTENSIONS_PATH', '/usr/lib/calibre/calibre/plugins')
sys.executables_location = os.environ.get('CALIBRE_EXECUTABLES_PATH', '/usr/bin')


from calibre.library.cli import main
sys.exit(main())

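Below is the implementation of main() in cli.py. It builds the name of the corresponding function from the subcommand (the command variable) and then calls it. For example, the list subcommand calls the function command_list.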

def main(args=sys.argv):
    parser = option_parser()
    if len(args) < 2:
        parser.print_help()
        return 1
    if args[1] not in COMMANDS:
        if args[1] == '--version':
            parser.print_version()
            return 0
        parser.print_help()
        return 1

    command = eval('command_'+args[1])
    dbpath = prefs['library_path']

    return command(args[2:], dbpath)
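
To make the dispatch idea concrete, here is a small standalone sketch of the same pattern. It is not calibre's code: the two command functions are toy placeholders, and I look the function up with globals() instead of eval, which does the same job for a known list of names.

```python
COMMANDS = ("list", "show_metadata")


def command_list(args, dbpath):
    # Toy placeholder; the real command_list lives in calibre's cli.py.
    print("list called with %r on %s" % (args, dbpath))
    return 0


def command_show_metadata(args, dbpath):
    # Toy placeholder for the real show_metadata implementation.
    print("show_metadata called with %r on %s" % (args, dbpath))
    return 0


def dispatch(argv, dbpath):
    """Map argv[1] to a function named command_<subcommand> and call it."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        print("usage: prog (%s) [args]" % "|".join(COMMANDS))
        return 1
    # Build the function name from the subcommand, then look it up.
    command = globals()["command_" + argv[1]]
    return command(argv[2:], dbpath)
```

Calling dispatch(["prog", "list", "--verbose"], "/path/to/library") ends up in command_list with the remaining arguments, which mirrors what main() does with args[2:].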

¶Writing a script

¶Importing calibre

#!/usr/bin/python2.7
import sys, os

path = os.environ.get('CALIBRE_PYTHON_PATH', '/usr/lib/calibre')
if path not in sys.path:
    sys.path.insert(0, path)

sys.resources_location = os.environ.get('CALIBRE_RESOURCES_PATH', '/usr/share/calibre')
sys.extensions_location = os.environ.get('CALIBRE_EXTENSIONS_PATH', '/usr/lib/calibre/calibre/plugins')
sys.executables_location = os.environ.get('CALIBRE_EXECUTABLES_PATH', '/usr/bin')

from calibre.db.legacy import LibraryDatabase
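
The header above reads four environment variables and falls back to the Ubuntu defaults. As a rough illustration of that override logic, the sketch below collects the same lookups in one function so the resolved values can be printed or checked. The function name and the DEFAULTS table are my own; only the variable names and default paths come from the script.

```python
import os

# Defaults copied from the run script shown earlier (Ubuntu layout).
DEFAULTS = {
    "CALIBRE_PYTHON_PATH": "/usr/lib/calibre",
    "CALIBRE_RESOURCES_PATH": "/usr/share/calibre",
    "CALIBRE_EXTENSIONS_PATH": "/usr/lib/calibre/calibre/plugins",
    "CALIBRE_EXECUTABLES_PATH": "/usr/bin",
}


def resolve_calibre_paths(environ=None):
    """Return the path each variable resolves to, honouring overrides in environ."""
    environ = os.environ if environ is None else environ
    return dict((name, environ.get(name, default))
                for name, default in DEFAULTS.items())
```

For a non-default install, exporting CALIBRE_PYTHON_PATH before running the script should be enough to point it at a different location without editing the code.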

¶Opening the database

db = LibraryDatabase("/path/to/calibre-library")
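
Since a batch script will write to whatever folder it is given, it may be worth checking the path before opening it. The sketch below relies on the fact that a calibre library folder contains a metadata.db file. The two helper names are my own, and the calibre import is deferred so the check itself runs without calibre installed.

```python
import os


def looks_like_calibre_library(path):
    """Return True if path contains calibre's metadata.db file."""
    return os.path.isfile(os.path.join(path, "metadata.db"))


def open_library(path):
    """Open the library at path, refusing folders without metadata.db."""
    if not looks_like_calibre_library(path):
        raise IOError("no metadata.db found in %r" % path)
    # Imported here so the check above works even when calibre is not on sys.path.
    from calibre.db.legacy import LibraryDatabase
    return LibraryDatabase(path)
```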

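¶Common db APIs

For more APIs, see the file /usr/lib/calibre/calibre/db/legacy.py.

API	Description
db.get_data_as_dict()	Returns the information of every book in the database, in dictionary form. Very slow when the database is large.
db.all_ids()	Returns the ids of all books. Comparatively fast.
db.get_metadata(id, index_is_id=True)	Returns the metadata for the given id, as a calibre.ebooks.metadata.book.base.Metadata object, which supports set/get functions.
db.set_metadata(id, metadata, force_changes=True)	Updates the metadata. The second argument is the value returned by get_metadata().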
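
As a minimal sketch of how these calls combine, the function below walks every id and reads one field. It uses only all_ids(), get_metadata() and get() as described in the table. The FakeDb and FakeMetadata classes are stand-ins I made up so the example runs without calibre; with a real library you would pass the LibraryDatabase object instead.

```python
class FakeMetadata(object):
    """Stand-in for calibre's Metadata; only get() is modelled."""

    def __init__(self, **fields):
        self._fields = fields

    def get(self, key, default=None):
        return self._fields.get(key, default)


class FakeDb(object):
    """Stand-in for LibraryDatabase; only the two calls used below are modelled."""

    def __init__(self, books):
        self._books = books

    def all_ids(self):
        return sorted(self._books)

    def get_metadata(self, book_id, index_is_id=False):
        return self._books[book_id]


def collect_titles(db):
    """Return {book id: title} by reading each book's metadata one at a time."""
    titles = {}
    for book_id in db.all_ids():
        mi = db.get_metadata(book_id, index_is_id=True)
        titles[book_id] = mi.get('title')
    return titles


demo = FakeDb({1: FakeMetadata(title=u"Book A"), 2: FakeMetadata(title=u"Book B")})
print(collect_titles(demo))
```

Reading per id like this avoids the single large get_data_as_dict() call when only one or two fields are needed.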

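The dictionary returned by get_data_as_dict() contains the following keys:

rating, author_sort, isbn, pubdate, series, fmt_mobi, id, size, uuid, title, comments, languages, sort, tags, timestamp, last_modified, authors, publisher, series_index, identifiers, cover, formats.

¶Common metadata APIs

Each book's metadata is essentially a dictionary. The NULL_VALUES definition below shows some of its keys together with their default values.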

NULL_VALUES = {
                'user_metadata': {},
                'cover_data'   : (None, None),
                'tags'         : [],
                'identifiers'  : {},
                'languages'    : [],
                'device_collections': [],
                'author_sort_map': {},
                'authors'      : [_('Unknown')],
                'author_sort'  : _('Unknown'),
                'title'        : _('Unknown'),
                'user_categories' : {},
                'author_link_map' : {},
                'language'     : 'und'
}

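The commonly used APIs of calibre.ebooks.metadata.book.base.Metadata are listed below. For more details, read the file /usr/lib/calibre/calibre/ebooks/metadata/book/base.py.

API	Description
get('index')	Gets the value stored under the given key.
set('index', value)	Sets the value stored under the given key. Note that the type must match: for example, the value of authors is a list, while the value of author_sort is a string.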
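
The type-matching rule is easy to get wrong, so here is a small sketch of a wrapper that always stores authors as a list and author_sort as a string. The helper name is my own, FakeMetadata is a made-up stand-in so the example runs without calibre, and using the first author as author_sort is just the simple choice the example script in this post makes.

```python
TEXT_TYPES = (type(u""), type(b""))  # text and byte strings on Python 2 and 3


class FakeMetadata(object):
    """Stand-in for calibre's Metadata; only get() and set() are modelled."""

    def __init__(self):
        self._fields = {}

    def get(self, key, default=None):
        return self._fields.get(key, default)

    def set(self, key, value):
        self._fields[key] = value


def set_authors(mi, authors):
    """Store authors as a list and author_sort as a string on mi."""
    if isinstance(authors, TEXT_TYPES):
        # A single name was passed; wrap it so 'authors' stays a list.
        authors = [authors]
    authors = list(authors)
    if not authors:
        raise ValueError("authors must not be empty")
    mi.set('authors', authors)
    mi.set('author_sort', authors[0])
    return mi
```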

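¶Example script

The script applies some simple formatting to each book's author information. Before running it, change the path to your Calibre library.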

#!/usr/bin/python2.7
# -*- coding: utf-8 -*-

import sys, os, re, time

path = os.environ.get('CALIBRE_PYTHON_PATH', '/usr/lib/calibre')
if path not in sys.path:
    sys.path.insert(0, path)

sys.resources_location = os.environ.get('CALIBRE_RESOURCES_PATH', '/usr/share/calibre')
sys.extensions_location = os.environ.get('CALIBRE_EXTENSIONS_PATH', '/usr/lib/calibre/calibre/plugins')
sys.executables_location = os.environ.get('CALIBRE_EXECUTABLES_PATH', '/usr/bin')

from calibre.db.legacy import LibraryDatabase

db = LibraryDatabase("/path/to/calibre_path")

def format_authors(id):
    metadata = db.get_metadata(id, index_is_id=True)
    old_authors = metadata.get('authors')
    new_authors = []

    print("id = " + str(id) + ", title = " + metadata.get('title') + ", old authors:")
    for author in old_authors:
        print(author)

    # Remove anything inside brackets (ASCII and full-width)
    for author in old_authors:
        author = re.sub(u"\\(.*?\\)|\\[.*?]|\\（.*?）|\\【.*?】", "", author)
        author = author.replace(u'•', u'·')
        new_authors.append(author)

    # Split entries that hold several authors joined by a comma or 、
    old_authors = new_authors
    new_authors = []
    for author in old_authors:
        split_authors = re.split(u',|、|，', author)
        while '' in split_authors:
            split_authors.remove('')
        new_authors += split_authors

    # If everything was removed, leave the book unchanged
    if len(new_authors) == 0:
        return

    # If the authors did not change, leave the book unchanged
    old_authors = metadata.get('authors')
    if (old_authors == new_authors):
        return

    print("new authors:")
    for author in new_authors:
        print(author)
    print("")

    metadata.set('authors',     new_authors)
    metadata.set('author_sort', new_authors[0])
    db.set_metadata(id, metadata, force_changes=True)

def test_format_authors(id):
    authors = 'pk'
    metadata = db.get_metadata(id, index_is_id=True)
    authors_list = [authors, "pk2"]
    metadata.set('authors',     authors_list)
    metadata.set('author_sort', authors_list[0])
    print("set " + str(id) + " authors to " + str(authors_list))
    db.set_metadata(id, metadata, force_changes=True)

def main():
    all_ids = db.all_ids()
    for id in all_ids:
        format_authors(id)
        time.sleep(0.2)

if __name__ == '__main__':
    main()
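
Because the script writes straight to the database, it can help to try the cleanup rules on plain lists first. The sketch below pulls the same two steps (strip bracketed text, then split on separators) into a pure function with no calibre dependency. The function name is my own, and unlike the script it also strips surrounding whitespace from each name, which is an addition on my part.

```python
# -*- coding: utf-8 -*-
import re

# Bracketed notes such as (ed.), [美], （译） or 【著】.
BRACKETED = re.compile(u"\\(.*?\\)|\\[.*?\\]|（.*?）|【.*?】")
# Separators between several authors stored in one entry.
SEPARATORS = re.compile(u"[,、，]")


def normalize_authors(authors):
    """Return a cleaned list of author names; the input list is not modified."""
    result = []
    for author in authors:
        author = BRACKETED.sub(u"", author).replace(u"•", u"·")
        for part in SEPARATORS.split(author):
            part = part.strip()  # extra step, not in the original script
            if part:
                result.append(part)
    return result
```

Running it over a few real author strings from the library before calling db.set_metadata() gives a quick preview of what the batch job would change.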