Exporting O'Reilly Books as EPUB, and Fixing Incomplete Chapter Exports

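I bought an ACM membership, which comes with access to O'Reilly's online library. I'm fond of the Head First series. Access from mainland China is slow, though, so the reading experience isn't great. Add my own possessiveness to that, and I decided to download the O'Reilly books to read locally.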

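There is a repository on GitHub, safaribooks, that does exactly what I need. However, I log in with my ACM account through single sign-on (SSO) rather than with an O'Reilly account directly, so for the program to run properly I first have to grab the cookies from the O'Reilly site. The whole process is described below.

Installing safaribooks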

$ git clone https://github.com/lorenzodifuccia/safaribooks.git
$ cd safaribooks/
$ pip3 install -r requirements.txt

Getting the cookies

Log in to O'Reilly Learning in your browser as usual, press F12 to open the console, and enter the following code to get the cookies:

javascript:(function(){var output = {};document.cookie.split(/\s*;\s*/).forEach(function(pair) {pair = pair.split(/\s*=\s*/);output[pair[0]]=pair.splice(1).join('=');});console.log(JSON.stringify(output));})();

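Copy the output into a file named cookies.json and put that file in the same folder as safaribooks.py (that is, safaribooks/).

In principle, typing the following command in a terminal should now export the book (the number at the end is the book's ID, which you can read from the URL when you open the book's page in a browser):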

python3 safaribooks.py 9781491919521
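As an alternative to pasting the console output by hand, the cookie step can be sketched in Python. This is a minimal sketch, not part of safaribooks: `parse_cookie_string` and `save_cookies` are hypothetical helper names, and the sample string is made up. It assumes you have copied the raw `name=value; name=value` cookie string from the browser and want the same flat JSON object that the console snippet prints.

```python
import json


def parse_cookie_string(raw: str) -> dict:
    """Turn 'a=1; b=2' (the format of document.cookie) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, because values may themselves contain '='.
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


def save_cookies(cookies: dict, path: str = "cookies.json") -> None:
    """Write the cookies as one flat JSON object, the shape the console snippet prints."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


# Made-up sample for illustration; paste your own cookie string instead.
sample = "csrftoken=abc123; logged_in=y; token=a=b=c"
print(json.dumps(parse_cookie_string(sample)))
```

Calling `save_cookies(parse_cookie_string(raw))` from inside the safaribooks/ folder would then write cookies.json next to safaribooks.py.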

But when I ran that command, I got an error:

[#] Authentication issue: unable to access profile page.                        
[!] Aborting... 

The fix I found in the project's Issues is as follows:

# Edit safaribooks.py.
# Find this line: PROFILE_URL = SAFARI_BASE_URL + "/profile/"
# and change it to:
PROFILE_URL = SAFARI_BASE_URL + "/home/?next=%2Fprofile%2F"
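
If you would rather not edit the file by hand, the same one-line change can be sketched as a small script. This is only an illustration: `patch_profile_url` and `patch_file` are hypothetical helpers, and the sketch assumes the line in your copy of safaribooks.py is spelled exactly as quoted above, which may not hold for other versions of the project.

```python
OLD_LINE = 'PROFILE_URL = SAFARI_BASE_URL + "/profile/"'
NEW_LINE = 'PROFILE_URL = SAFARI_BASE_URL + "/home/?next=%2Fprofile%2F"'


def patch_profile_url(source: str) -> str:
    """Return the source text with the PROFILE_URL line swapped."""
    if NEW_LINE in source:
        return source  # already patched, nothing to do
    if OLD_LINE not in source:
        # The line is spelled differently in this copy; do not guess.
        raise ValueError("PROFILE_URL line not found; edit the file by hand")
    return source.replace(OLD_LINE, NEW_LINE, 1)


def patch_file(path: str = "safaribooks.py") -> None:
    """Apply the patch in place, rewriting the file only if something changed."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    patched = patch_profile_url(original)
    if patched != original:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(patched)
```

Running `patch_file()` from inside safaribooks/ applies the change once; running it again leaves the file untouched.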

I ran the command again, and this time the book was exported as an EPUB file. The terminal output was:

aoyu@Guanghaos-MacBook-Pro safaribooks % python3 safaribooks.py 9781491919521
                                                                                
       ____     ___         _     
      / __/__ _/ _/__ _____(_)    
     _\ \/ _ `/ _/ _ `/ __/ /     
    /___/\_,_/_/ \_,_/_/ /_/      
      / _ )___  ___  / /__ ___    
     / _  / _ \/ _ \/  '_/(_-<    
    /____/\___/\___/_/\_\/___/    

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[-] Successfully authenticated.                                                 
[*] Retrieving book info...                                                     
[-] Title: Head First Python, 2nd Edition                                       
[-] Authors: Paul Barry                                                         
[-] Identifier: 9781491919521                                                   
[-] ISBN: 9781491919538                                                         
[-] Publishers: O'Reilly Media, Inc.                                            
[-] Rights: Copyright © 2016 Paul Barry                                         
[-] Description: Want to learn the Python language without slogging your way through how-to manuals? With Head First Python, you’ll quickly grasp Python’s fundamentals, working with the built-in data structures and functions. Then you’ll move on to building your very own webapp, exploring database management, exception handling, and data wrangling. If you’re intrigued by what you can do with context managers, decorators, comprehensions, and generators, it’s all here. This second edition is a complete learning ex...
[-] Release Date: 2016-11-23                                                    
[-] URL: https://learning.oreilly.com/library/view/head-first-python/9781491919521/
[*] Retrieving book chapters...                                                 
[*] Output directory:                                                           
    /Users/aoyu/Desktop/Oreilly/safaribooks/Books/Head First Python 2nd Edition (9781491919521)
[-] Downloading book contents... (30 chapters)                                  
    [#####################################################################] 100%
[-] Downloading book CSSs... (4 files)                                          
    [#####################################################################] 100%
[-] Downloading book images... (216 files)                                      
    [#####################################################################] 100%
[-] Creating EPUB file...                                                       
[*] Done: /Users/aoyu/Desktop/Oreilly/safaribooks/Books/Head First Python 2nd Edition (9781491919521)/9781491919521.epub

    If you like it, please * this project on GitHub to make it known:
        https://github.com/lorenzodifuccia/safaribooks
    e don't forget to renew your Safari Books Online subscription:
        https://learning.oreilly.com

[!] Bye!!

end.


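Of course, I wrote this post for myself. If you stumbled in here and still can't work out how to use the program, I suggest visiting the project's GitHub page; the link is at the end of the post.

ACM membership for developing countries (China included) costs 8 US dollars, which is a little over 56 yuan (I used to convert at 6 yuan to the dollar; I guess it's 7 now?). If I manage to read even one English book all the way through, I think it's well worth the money.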


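Update (December 29, 2019): While reading, I noticed that from Chapter 2 onward every chapter contained only part of its content. I'll skip the story of how I tracked down the fix. The cause is that the cookies I obtained with the snippet above were incomplete: a few were missing, so authentication failed when exporting the chapters after the first one, and only a short opening portion (the preview content) of each was exported [4]. My fix was to add the missing cookies by hand. By my comparison, these three cookies were missing: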

kampyle_userid

orm-rt

groot_sessionid

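Add them, with their values, to the end of cookies.json and rerun the program, and the export works normally. As for why this works, or whether there is a more elegant solution, I'll leave that to future me. Present me doesn't care to find out.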

aoyu@Guanghaos-MacBook-Pro safaribooks % python3 safaribooks.py 9781491919521
                                                                                
 ██████╗     ██████╗ ██╗  ██╗   ██╗██████╗ 
██╔═══██╗    ██╔══██╗██║  ╚██╗ ██╔╝╚════██╗
██║   ██║    ██████╔╝██║   ╚████╔╝   ▄███╔╝
██║   ██║    ██╔══██╗██║    ╚██╔╝    ▀▀══╝ 
╚██████╔╝    ██║  ██║███████╗██║     ██╗   
 ╚═════╝     ╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝                                           

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[-] Successfully authenticated.                                                 
[*] Retrieving book info...                                                     
[-] Title: Head First Python, 2nd Edition                                       
[-] Authors: Paul Barry                                                         
[-] Identifier: 9781491919521                                                   
[-] ISBN: 9781491919538                                                         
[-] Publishers: O'Reilly Media, Inc.                                            
[-] Rights: Copyright © 2016 Paul Barry                                         
[-] Description: Want to learn the Python language without slogging your way through how-to manuals? With Head First Python, you’ll quickly grasp Python’s fundamentals, working with the built-in data structures and functions. Then you’ll move on to building your very own webapp, exploring database management, exception handling, and data wrangling. If you’re intrigued by what you can do with context managers, decorators, comprehensions, and generators, it’s all here. This second edition is a complete learning ex...
[-] Release Date: 2016-11-23                                                    
[-] URL: https://learning.oreilly.com/library/view/head-first-python/9781491919521/
[*] Retrieving book chapters...                                                 
[*] Output directory:                                                           
    /Users/aoyu/Desktop/Oreilly/safaribooks/Books/Head First Python 2nd Edition (9781491919521)
[-] Downloading book contents... (30 chapters)                                  
    [#####################################################################] 100%
[-] Downloading book CSSs... (4 files)                                          
    [#####################################################################] 100%
[-] Downloading book images... (1009 files)                                     
    [#####################################################################] 100%
[-] Creating EPUB file...                                                       
[*] Done: /Users/aoyu/Desktop/Oreilly/safaribooks/Books/Head First Python 2nd Edition (9781491919521)/9781491919521.epub

    If you like it, please * this project on GitHub to make it known:
        https://github.com/lorenzodifuccia/safaribooks
    e don't forget to renew your Safari Books Online subscription:
        https://learning.oreilly.com

[!] Bye!!
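
The manual cookie fix from the update above could be sketched in Python as follows. This is a minimal sketch under stated assumptions: `missing_cookies`, `merge_cookies`, and `update_cookie_file` are hypothetical helpers, the three names are the ones listed in this post, and the values must be copied from your own logged-in browser session (the post does not say where they were found, so the sketch leaves that to you).

```python
import json

# The three cookie names this post found missing from cookies.json.
REQUIRED = ("kampyle_userid", "orm-rt", "groot_sessionid")


def missing_cookies(cookies: dict, required=REQUIRED) -> list:
    """List the required cookie names that are absent or have an empty value."""
    return [name for name in required if not cookies.get(name)]


def merge_cookies(cookies: dict, extra: dict) -> dict:
    """Return a new dict with the extra cookies added; extra values win on conflict."""
    merged = dict(cookies)
    merged.update(extra)
    return merged


def update_cookie_file(extra: dict, path: str = "cookies.json") -> list:
    """Merge extra cookies into the file and report which required names are still missing."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    merged = merge_cookies(cookies, extra)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh)
    return missing_cookies(merged)
```

Calling `update_cookie_file({...})` with a dict of the three names and your own values rewrites cookies.json; an empty returned list means all three names are now present.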

References
[1] safaribooks project page https://github.com/lorenzodifuccia/safaribooks
[2] Issue#1 https://github.com/lorenzodifuccia/safaribooks/issues/160
[3] Issue#2 https://github.com/lorenzodifuccia/safaribooks/issues/2
[4] Issue#3 https://github.com/lorenzodifuccia/safaribooks/issues/150