Sredzkistraße

12 May 2010

Boston.com Big Picture Cataloguer

I’m a big fan of The Boston Globe’s photojournalism series, The Big Picture. So much so, in fact, that I decided to dedicate a few hours this week to building a program that would not just download the entire series, but also add the caption to each photo as metadata, since many of the captions are informative and display nicely in photo managers such as Picasa.

Now, I’m happy that the application is stable enough to release to the world in the Code section of my website.

Since I don’t want people hammering The Boston Globe’s servers, I’ve made the script wait a fraction of a second between requests. And since I don’t want people to be able to disable this delay, unfortunately only binaries will be available for the time being. Windows binaries are available already; OS X and Linux binaries will follow in a few days.
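
As a rough illustration of the kind of pause described above, here is a minimal Python sketch. It is not the Cataloguer’s actual code: the `PoliteFetcher` class, the 0.5-second default delay, and the injectable `fetch`, `sleep` and `clock` parameters are all assumptions made for this example, and it uses the Python 3 standard library rather than whatever the program itself relies on.

```python
import time
import urllib.request


class PoliteFetcher:
    """Fetch URLs while enforcing a minimum pause between requests.

    Hypothetical helper for illustration; not taken from bigpicture.py.
    """

    def __init__(self, delay=0.5, fetch=None, sleep=time.sleep,
                 clock=time.monotonic):
        self.delay = delay                      # assumed pause, in seconds
        self._fetch = fetch or self._http_get   # injectable for testing
        self._sleep = sleep
        self._clock = clock
        self._last_request = None               # time of the previous request

    @staticmethod
    def _http_get(url):
        # Plain standard-library download; returns the body as bytes.
        with urllib.request.urlopen(url) as response:
            return response.read()

    def get(self, url):
        # Sleep only for the part of the delay that has not already elapsed.
        if self._last_request is not None:
            remaining = self.delay - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request = self._clock()
        return self._fetch(url)
```

Every download would then go through a single `PoliteFetcher` instance, so that the pause applies across album pages and image files alike.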

Indeed, if those at The Boston Globe have a problem with how the program operates, they need only contact me and we can come to an agreement, but I’ve worked hard to make sure that the program contacts their servers as little as possible.

Bug reports will be automatically submitted through this website too, but if you have any unforeseen problems (e.g. a crash or a hang), email me with as much information as possible (the text of the “Traceback” printed before the crash, which album/photo the program was working on, etc.).

What can you do once you’ve got the entire 2GB collection of photos downloaded? Well, you can simply look through them at your own pace and comfort, or indeed choose to create a montage screensaver from them (although be warned – a screensaver that fades from a beautiful Antarctic landscape to a bloody photo of a victim of the war in Afghanistan might not be exactly what you had in mind.)

But in any event, hopefully it’ll be of some use. Enjoy!

This entry was posted on Wednesday, May 12th, 2010 at 8:02 pm and is filed under Art, Code, Computers, Internet, Media, Photography, Politics.

24 Responses to “Boston.com Big Picture Cataloguer”

  1. Vikram says:
    May 13, 2010 at 2:40 am

    BigPicture Cataloguer v0.1
    Aengus Walton, ventolin.org
    ===========================
    Please enter a path in which to store the downloaded pictures:
    (e.g. C:\bigpictures)
    C:\photos\bigpictures
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error getting picture caption… Logged and continuing…
    Error extracting image url – probably due to inclusion of youtube video.
    Traceback (most recent call last):
    File “bigpicture.py”, line 325, in
    File “bigpicture.py”, line 254, in downloadAlbum
    File “os.pyc”, line 157, in makedirs
    WindowsError: [Error 267] The directory name is invalid: u”C:\\photos\\bigpictures\\2008/10/Nachtwey’s Wish: Awareness of XDR-TB”

  2. aengus says:
    May 13, 2010 at 12:56 pm

    Hi Vikram,

    I’ve fixed that (very silly) bug – download version 0.2 and it should work fine. Thanks for the feedback.

  3. Tweets that mention Sredzkistraße – Boston.com Big Picture Cataloguer -- Topsy.com says:
    May 14, 2010 at 5:44 am

    [...] This post was mentioned on Twitter by Ty Williams. Ty Williams said: Fan of the #BostonBigPicture ? If so, you should check this app out –> http://bit.ly/d6TBx4 [...]

  4. Kaushik says:
    May 17, 2010 at 7:14 am

    Hi,

    The program crashes on Windows 7. No errors, simply exits and no photos are downloaded. I’ve downloaded the latest version. A quick response will be appreciated.

    http://www.instantfundas.com/2010/05/boston-globe-big-picture-gallery.html

  5. aengus says:
    May 17, 2010 at 6:32 pm

    (I had an email conversation with Kaushik while ventolin.org was down for a few hours, and he told me that once he ran the program again, it worked fine. No need to worry!)

  6. Ananda says:
    May 20, 2010 at 1:25 am

    The most urgent update needed for this program is the ability to select which series one wants to download. I love the Big Picture and regularly save the whole webpage (which includes the captions) but I do it selectively as not everything posted there is of interest.

  7. aengus says:
    May 20, 2010 at 1:38 am

    Hi Ananda,
    I’ll see about doing that. I hadn’t planned on doing anything so involved with the project, really. Since hard disk space is so cheap these days, I figured it wouldn’t be too much of a drag to have a few albums (each only a couple of megabytes) which you don’t want. If I were to make selections possible, I have the feeling I’d have to code a GUI to make it worthwhile, so that users could click a link and preview the album in their browser, and I’m not certain I’ll have the time in the coming weeks or months.

    But we’ll see! In the meantime, I hope this suffices.

  8. alick says:
    May 25, 2010 at 3:10 am

    Hi Aengus, the task stops here and can’t proceed. What shall I do next?

    d:\bigpicture
    Traceback (most recent call last):
    File “bigpicture.py”, line 328, in
    File “bigpicture.py”, line 172, in getUndoneMonths
    File “bigpicture.py”, line 105, in getURLList
    File “bigpicture.py”, line 83, in getPage
    File “mechanize\_opener.pyc”, line 426, in urlopen
    File “mechanize\_opener.pyc”, line 193, in open
    File “mechanize\_urllib2_fork.pyc”, line 344, in _open
    File “mechanize\_urllib2_fork.pyc”, line 332, in _call_chain
    File “mechanize\_urllib2_fork.pyc”, line 1142, in http_open
    File “mechanize\_urllib2_fork.pyc”, line 1118, in do_open
    urllib2.URLError:

  9. fish says:
    May 25, 2010 at 5:19 am

    I got this error:
    Processing and saving image: /home/fish/Pictures/BigPictures/2008/05/Cassini Nears Four-year Mark/02 – cassini2.jpg
    Traceback (most recent call last):
    File “bigpicture.py”, line 336, in
    downloadAlbum(album)
    File “bigpicture.py”, line 277, in downloadAlbum
    metadata = pyexiv2.ImageMetadata(path)
    AttributeError: ‘module’ object has no attribute ‘ImageMetadata’

  10. aengus says:
    May 25, 2010 at 5:59 am

    Hi,

    this occurs if you have the 0.1 branch of pyexiv2 installed. You’ll need to install the 0.2 version instead.

    Cheers,
    Aengus

  11. cngdxd says:
    May 26, 2010 at 4:24 am

    Using e:\Big Pictures as the root dir to store pictures in.
    Quiet mode enabled. Beginning immediately.
    Traceback (most recent call last):
    File “bigpicture.py”, line 336, in
    File “bigpicture.py”, line 190, in downloadAlbum
    File “bigpicture.py”, line 83, in getPage
    File “mechanize\_opener.pyc”, line 426, in urlopen
    File “mechanize\_opener.pyc”, line 193, in open
    File “mechanize\_urllib2_fork.pyc”, line 344, in _open
    File “mechanize\_urllib2_fork.pyc”, line 332, in _call_chain
    File “mechanize\_urllib2_fork.pyc”, line 1142, in http_open
    File “mechanize\_urllib2_fork.pyc”, line 1118, in do_open
    urllib2.URLError:

  12. aengus says:
    May 26, 2010 at 6:19 am

    Hi,
    which version are you using (Windows binary or source)? If source, which version of mechanize do you have installed? Are you behind an HTTP proxy?

    Aengus

  13. cngdxd says:
    May 27, 2010 at 10:50 am

    windows binary
    no proxy

  14. aengus says:
    May 27, 2010 at 11:06 am

    Hi,

    If you are in China, this is most likely caused by the “Great Firewall of China”. To get around this problem, you can download and install Tor from here – http://www.torproject.org/

    Install, set it up as your Windows proxy (in ‘Internet Settings’ in the control panel), and then re-run bigpicture.exe.

    If torproject.org is also blocked in China, I can host a copy here.

  15. VeryImp says:
    January 14, 2011 at 12:34 pm

    There is a very prominent issue in the metadata tagging.
    Whenever a hyperlink or special character (e.g. brackets) appears in the image caption, the rest of the caption is truncated in the metadata. Please fix it, since it ruins the full message of the caption.
    e.g. in http://www.boston.com/bigpicture/2011/01/haiti_one_year_later.html
    For images 9, 10, 18, 19, 20, 21, 22 and 23, the metadata contains the following caption: (
    For image 24: “In this photo, a boy takes a picture of an old woman during”

    Please fix it, I would be highly grateful.
    Tx.

  16. aengus says:
    January 19, 2011 at 11:26 pm

    Thanks for reporting this bug. I’ll work on getting a fix out by this weekend.

  17. aengus says:
    January 23, 2011 at 9:01 pm

    Hi,

    A fixed version (0.5) is now up.

  18. VeryImp says:
    March 11, 2011 at 7:17 pm

    Thanks a lot for fixing this. May god bless you!!

  19. Jerry T says:
    April 9, 2011 at 3:23 am

    I like this program very much, could you please check it and let it work?

    Cheers, Jerry

  20. aengus says:
    May 4, 2011 at 9:17 am

    Hi Jerry, regarding the problem you’re having downloading the ‘Mercury and MESSENGER’ series, have you tried deleting your ‘Mercury and MESSENGER’ folder and re-running the program?

    If that doesn’t work, can you provide me with more details – what OS are you using? Are you using the script or executable? What is your internet connection like?

  21. ben says:
    January 2, 2012 at 12:44 am

    hey there i get this error when trying to complete “The Turkey earthquake” from 2011-10

    Processing and saving image: c:\big pictures\2011/10/The Turkey earthquake\28 – bp28.jpg
    Traceback (most recent call last):
    File “bigpicture.py”, line 304, in
    File “bigpicture.py”, line 170, in downloadAlbum
    File “bigpicture.py”, line 61, in getPage
    File “mechanize\_opener.pyc”, line 426, in urlopen
    File “mechanize\_opener.pyc”, line 204, in open
    File “mechanize\_urllib2_fork.pyc”, line 457, in http_response
    File “mechanize\_opener.pyc”, line 227, in error
    File “mechanize\_urllib2_fork.pyc”, line 332, in _call_chain
    File “mechanize\_urllib2_fork.pyc”, line 477, in http_error_default
    urllib2.HTTPError: HTTP Error 404: Not Found

    any ideas on what this is?

  22. Stephen says:
    February 7, 2012 at 5:08 pm

    I have noticed that from time to time certain older photos in the Big Pictures become unavailable.

    Right now, I am seeing the same problem but for a different album:
    http://www.boston.com/bigpicture/2010/07/cleaning_dalian_harbor.html

    Going there, the pictures are not visible in a Web Browser.

    BigPicture Cataloguer v0.5
    Aengus Walton, ventolin.org
    ===========================
    Using C:\Big Picture as the root dir to store pictures in.
    Do you want to continue? (Y/N): y
    Processing and saving image: C:\Big Picture\2010/07/Cleaning Dalian harbor1 -d01_24419667.jpg
    Internal Server Error downloading file. Waiting 2 seconds and retrying…
    Traceback (most recent call last):
    File “bigpicture.py”, line 304, in
    File “bigpicture.py”, line 246, in downloadAlbum
    File “bigpicture.py”, line 76, in getImage
    File “urllib.pyc”, line 237, in retrieve
    File “urllib.pyc”, line 205, in open
    File “urllib.pyc”, line 360, in open_http
    File “urllib.pyc”, line 377, in http_error
    File “urllib.pyc”, line 383, in http_error_default
    IOError: (‘http error’, 404, ‘Not Found’, )

  23. aengus says:
    February 7, 2012 at 5:12 pm

    Hey Stephen,

    yep, that’s unfortunately true. Those at the Boston Globe have the strange habit of making photo albums and then deleting them out of the blue, but not deleting links to them.

    This is fixed in a new version of the big picture cataloguer which I hope to share soon, once I’ve cleaned up the code a little bit.

    If the wait is a problem, I can send you a new build of it by email.

  24. V. says:
    April 18, 2012 at 11:51 pm

    Hello Stephan,
    The program BigPictures.exe that I downloaded from your website has suddenly stopped working. It was working until a few days ago, but it did not download all the images from BG. For example, one folder should have contained 25 images, but only one was downloaded. I have tried several times, but no more downloads are possible. The version I have is 0.5.
    Can you let me know when the new version that you are cleaning up the code for will be released? I hope this version will allow me to download the missing images.
    Thanks,
    v.
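
A note on the “directory name is invalid” error reported in the first comment: the album title there contains a colon, which Windows does not allow in directory names, and the path mixes backslashes with forward slashes. The thread does not say how version 0.2 fixed this. One common approach, sketched below in Python, is to replace the forbidden characters before creating the folder. The function names `safe_name` and `album_dir` and the choice of “-” as a replacement are assumptions made for this example, not the program’s real code.

```python
import os
import re

# Characters that Windows does not allow in file or directory names,
# plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(title, replacement="-"):
    """Return a version of `title` usable as a single path component."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    return cleaned or "untitled"


def album_dir(root, year, month, title):
    """Build the album path with os.path.join instead of hard-coded slashes."""
    return os.path.join(root, year, month, safe_name(title))
```

With this, a title such as “Nachtwey’s Wish: Awareness of XDR-TB” would be stored in a folder whose name has the colon replaced by a hyphen.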
