Boston.com Big Picture Cataloguer
I’m a big fan of The Boston Globe’s photojournalism series, The Big Picture. So much so, in fact, that I decided to dedicate a few hours this week to building a program that would not just download the entire series, but also add caption metadata to each photo, since many of the captions are informative and display nicely in Picasa, for example.
Now, I’m happy that the application is stable enough to release to the world in the Code section of my website.
Since I don’t want people hammering The Boston Globe’s servers, I’ve made the script wait a fraction of a second between requests. And since I don’t want people to be able to disable this functionality, only binaries will be available for the time being, unfortunately. Windows binaries are available already; OS X and Linux binaries will follow in a few days.
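The delay between requests described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the cataloguer's actual code (which is not published here); the `Throttle` class, its parameters and any particular interval are assumptions made for the example.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper: the name and interface are assumptions,
    not part of the cataloguer itself.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: pause for the rest
                # of the interval before letting the caller continue.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` on a shared `Throttle` instance immediately before each download keeps requests at least `min_interval` seconds apart, without adding any delay when the previous download already took longer than that.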
Indeed, if those at The Boston Globe have a problem with how the program operates, they need simply contact me and we can come to an agreement, but I’ve worked hard to make sure that the program contacts their servers as little as possible.
Bug reports will be automatically submitted through this website too, but if you have any unforeseen problems (e.g. a crash or a hang), email me with as much information as possible (text describing the “Traceback” printed before the crash, what album/photo the program was working on, etc).
What can you do once you’ve got the entire 2GB collection of photos downloaded? Well, you can simply look through them at your own pace and comfort, or indeed choose to create a montage screensaver from them (although be warned – a screensaver that fades from a beautiful Antarctic landscape to a bloody photo of a victim of the war in Afghanistan might not be exactly what you had in mind.)
But in any event, hopefully it’ll be of some use. Enjoy!

BigPicture Cataloguer v0.1
Aengus Walton, ventolin.org
===========================
Please enter a path in which to store the downloaded pictures:
(e.g. C:\bigpictures)
C:\photos\bigpictures
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error getting picture caption… Logged and continuing…
Error extracting image url – probably due to inclusion of youtube video.
Traceback (most recent call last):
  File "bigpicture.py", line 325, in <module>
  File "bigpicture.py", line 254, in downloadAlbum
  File "os.pyc", line 157, in makedirs
WindowsError: [Error 267] The directory name is invalid: u"C:\\photos\\bigpictures\\2008/10/Nachtwey's Wish: Awareness of XDR-TB"
Hi Vikram,
I’ve fixed that (very silly) bug – download version 0.2 and it should work fine. Thanks for the feedback.
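The traceback above fails because the album title contains a colon, which Windows does not allow in directory names. One way to guard against this, sketched in Python 3 under the assumption that album titles are used directly as folder names, is to replace the forbidden characters before creating the directory. The helper name `safe_dirname` and the choice of `-` as the replacement are hypothetical; the actual fix in version 0.2 is not shown in this thread.

```python
import re

# Characters that Windows does not allow in file or directory names,
# plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dirname(title, replacement="-"):
    """Return `title` with characters that are invalid on Windows replaced.

    Hypothetical helper for illustration only.
    """
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"
```

With this, the title from the error message would become `Nachtwey's Wish- Awareness of XDR-TB`, which can then be passed to `os.path.join` and `os.makedirs` as usual.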
Hi,
The program crashes on Windows 7. No errors, simply exits and no photos are downloaded. I’ve downloaded the latest version. A quick response will be appreciated.
http://www.instantfundas.com/2010/05/boston-globe-big-picture-gallery.html
(Having had an email conversation with Kaushik while ventolin.org was down for a few hours, he told me that once he ran the program again, it worked fine. No need to worry!)
The most urgent update needed for this program is the ability to select which series one wants to download. I love the Big Picture and regularly save the whole webpage (which includes the captions) but I do it selectively as not everything posted there is of interest.
Hi Amanda,
I’ll see about doing that. I hadn’t really planned on doing anything so involved with the project. Since hard disk space is so cheap these days, I figured it wouldn’t be too much of a drag to have a few albums (each only a couple of megabytes) which you don’t want. If I were to make selections possible, I have the feeling I’d have to code a GUI to make it worthwhile, so that users could click a link and preview the album in their browser, and I’m not certain I’ll have the time in the coming weeks or months.
But we’ll see! In the meantime, I hope this suffices.
Hi Aengus, the task stops here and can’t proceed. What shall I do next?
d:\bigpicture
Traceback (most recent call last):
  File "bigpicture.py", line 328, in <module>
  File "bigpicture.py", line 172, in getUndoneMonths
  File "bigpicture.py", line 105, in getURLList
  File "bigpicture.py", line 83, in getPage
  File "mechanize\_opener.pyc", line 426, in urlopen
  File "mechanize\_opener.pyc", line 193, in open
  File "mechanize\_urllib2_fork.pyc", line 344, in _open
  File "mechanize\_urllib2_fork.pyc", line 332, in _call_chain
  File "mechanize\_urllib2_fork.pyc", line 1142, in http_open
  File "mechanize\_urllib2_fork.pyc", line 1118, in do_open
urllib2.URLError:
I got this error:
Processing and saving image: /home/fish/Pictures/BigPictures/2008/05/Cassini Nears Four-year Mark/02 – cassini2.jpg
Traceback (most recent call last):
  File "bigpicture.py", line 336, in <module>
    downloadAlbum(album)
  File "bigpicture.py", line 277, in downloadAlbum
    metadata = pyexiv2.ImageMetadata(path)
AttributeError: 'module' object has no attribute 'ImageMetadata'
Hi,
This occurs if you have the 0.1 branch of pyexiv2 installed. You’ll need to install version 0.2 instead.
Cheers,
Aengus
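For anyone running the script from source, a check like the following could report the pyexiv2 version mismatch up front instead of failing partway through an album. It is only a sketch in Python 3 syntax: the function name `require_image_metadata_api` is hypothetical, and the only fact it relies on is the one stated above, namely that `ImageMetadata` is missing from the 0.1 branch.

```python
def require_image_metadata_api(module):
    """Raise a readable error if `module` lacks the ImageMetadata class.

    Hypothetical helper: pass in the imported pyexiv2 module.
    """
    if not hasattr(module, "ImageMetadata"):
        raise ImportError(
            "%s has no ImageMetadata class; this looks like the pyexiv2 0.1 "
            "branch. Please install pyexiv2 0.2 instead." % module.__name__
        )
    return module
```

It would be called once at startup, right after `import pyexiv2`, as `require_image_metadata_api(pyexiv2)`.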
Using e:\Big Pictures as the root dir to store pictures in.
Quiet mode enabled. Beginning immediately.
Traceback (most recent call last):
  File "bigpicture.py", line 336, in <module>
  File "bigpicture.py", line 190, in downloadAlbum
  File "bigpicture.py", line 83, in getPage
  File "mechanize\_opener.pyc", line 426, in urlopen
  File "mechanize\_opener.pyc", line 193, in open
  File "mechanize\_urllib2_fork.pyc", line 344, in _open
  File "mechanize\_urllib2_fork.pyc", line 332, in _call_chain
  File "mechanize\_urllib2_fork.pyc", line 1142, in http_open
  File "mechanize\_urllib2_fork.pyc", line 1118, in do_open
urllib2.URLError:
Hi,
What version are you using (Windows binary or source)? If source, what version of mechanize do you have installed? Are you behind an HTTP proxy?
Aengus
windows binary
no proxy
Hi,
If you are in China, this is most likely caused by the “Great Firewall of China”. To get around this problem, you can download and install Tor from here – http://www.torproject.org/
Install, set it up as your Windows proxy (in ‘Internet Settings’ in the control panel), and then re-run bigpicture.exe.
If torproject.org is also blocked in China, I can host a copy here.
There is a very prominent issue in the metadata tagging.
Whenever a hyperlink or special character (e.g. brackets) appears in the image caption, the rest of the caption is truncated in the metadata. Please fix it, since it cuts off the full message of the caption.
e.g. in http://www.boston.com/bigpicture/2011/01/haiti_one_year_later.html
For images 9, 10, 18, 19, 20, 21, 22 and 23, the metadata contains the following caption: (
For image 24: “In this photo, a boy takes a picture of an old woman during”
Please fix it, I would be highly grateful.
Tx.
Thanks for reporting this bug. I’ll work on getting a fix out by this weekend.
Hi,
A fixed version (0.5) is now up.
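The thread does not say what caused the truncation or how version 0.5 fixed it. As an illustration of one approach that avoids this class of bug, the sketch below (Python 3, standard library only) collects all the text inside a caption, including text nested in links, rather than stopping at the first tag. The names `CaptionText` and `caption_to_text` are hypothetical, and the sample caption in the usage note is made up.

```python
from html.parser import HTMLParser


class CaptionText(HTMLParser):
    """Collect all text in a caption, including text inside nested tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._parts = []

    def handle_data(self, data):
        # Called for every run of text, whether or not it sits inside a tag.
        self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join("".join(self._parts).split())


def caption_to_text(caption_html):
    """Return the plain text of an HTML caption fragment."""
    parser = CaptionText()
    parser.feed(caption_html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

For example, `caption_to_text('A boy (<a href="http://example.com">link</a>) takes a picture.')` keeps the text before, inside and after the link, so the brackets and the rest of the sentence survive.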
Thanks a lot for fixing this. May god bless you!!
I like this program very much, could you please check it and let it work?
Cheers, Jerry
Hi Jerry, regarding the problem you’re having downloading the ‘Mercury and MESSENGER’ series, have you tried deleting your ‘Mercury and MESSENGER’ folder and re-running the program?
If that doesn’t work, can you provide me with more details – what OS are you using? Are you using the script or executable? What is your internet connection like?
Hey there, I get this error when trying to complete “The Turkey earthquake” from 2011-10:
Processing and saving image: c:\big pictures\2011/10/The Turkey earthquake\28 – bp28.jpg
Traceback (most recent call last):
  File "bigpicture.py", line 304, in <module>
  File "bigpicture.py", line 170, in downloadAlbum
  File "bigpicture.py", line 61, in getPage
  File "mechanize\_opener.pyc", line 426, in urlopen
  File "mechanize\_opener.pyc", line 204, in open
  File "mechanize\_urllib2_fork.pyc", line 457, in http_response
  File "mechanize\_opener.pyc", line 227, in error
  File "mechanize\_urllib2_fork.pyc", line 332, in _call_chain
  File "mechanize\_urllib2_fork.pyc", line 477, in http_error_default
urllib2.HTTPError: HTTP Error 404: Not Found
any ideas on what this is?
I have noticed that from time to time certain older photos in the Big Pictures become unavailable.
Right now, I am seeing the same problem but for a different album:
http://www.boston.com/bigpicture/2010/07/cleaning_dalian_harbor.html
Going there, the pictures are not visible in a Web Browser.
BigPicture Cataloguer v0.5
Aengus Walton, ventolin.org
===========================
Using C:\Big Picture as the root dir to store pictures in.
Do you want to continue? (Y/N): y
Processing and saving image: C:\Big Picture\2010/07/Cleaning Dalian harbor1 -d01_24419667.jpg
Internal Server Error downloading file. Waiting 2 seconds and retrying…
Traceback (most recent call last):
  File "bigpicture.py", line 304, in <module>
  File "bigpicture.py", line 246, in downloadAlbum
  File "bigpicture.py", line 76, in getImage
  File "urllib.pyc", line 237, in retrieve
  File "urllib.pyc", line 205, in open
  File "urllib.pyc", line 360, in open_http
  File "urllib.pyc", line 377, in http_error
  File "urllib.pyc", line 383, in http_error_default
IOError: ('http error', 404, 'Not Found', )
Hey Stephen,
yep, that’s unfortunately true. Those at the Boston Globe have the strange habit of making photo albums and then deleting them out of the blue, but not deleting links to them.
This is fixed in a new version of the big picture cataloguer which I hope to share soon, once I’ve cleaned up the code a little bit.
If the wait is a problem, I can send you a new build of it by email.
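Until that version is out, the general idea of tolerating deleted photos can be sketched as below. This is not the cataloguer's code: it is a Python 3 illustration (the tracebacks above come from Python 2's `urllib` and `mechanize`), and the function name `fetch_or_skip`, its parameters and the retry count are assumptions. The 2-second pause mirrors the "Waiting 2 seconds and retrying" message in the output above.

```python
import time
import urllib.error
import urllib.request


def fetch_or_skip(url, opener=urllib.request.urlopen, retries=3,
                  delay=2.0, sleep=time.sleep):
    """Return the response body, or None if the server reports 404.

    Hypothetical helper: 404 means the file was deleted, so the caller
    can skip it; 5xx errors are retried after a pause.
    """
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # deleted on the server; skip this file
            if err.code >= 500 and attempt < retries - 1:
                sleep(delay)  # possibly transient; wait and try again
                continue
            raise  # any other error, or retries exhausted
```

A caller would log the URL and move on to the next photo whenever `fetch_or_skip` returns `None`, so one missing image no longer aborts the whole run.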
Hello Stephan,
The program BigPictures.exe that I downloaded from your website has suddenly stopped working. It was working until a few days ago, but it did not download all the images from BG. For example, in one folder there should have been 25 images, but only one was downloaded. I have tried it several times, but no more downloads are possible. The version I have is 0.5.
Can you let me know when the new version, the one you are cleaning up the code for, will be released? I hope this version will allow me to download the missing images.
Thanks,
v.