Sredzkistraße

Archive for the ‘Internet’ Category

9 Jan 2012

@grammer_man who the fuck is this nigga and why u comin at me like that #Hoeassnigga

Had a spare hour last Thursday and decided to write a little Twitter bot. There he is above. His name is Grammer_Man, and he corrects other Twitter users’ misspellings using data scraped from these Wikipedia pages.

Responses have been pouring in already: some agitated, some confused, but most positive, which was a pleasant surprise. In any event, the minimal coding effort has paid off many times over in entertainment.

You can see who’s responding at the moment by searching for @grammer_man, and also by checking his list of favourites.

Here is the (somewhat slapdash) code that powers our fearless spelling Nazi:

grabber.py

This module grabs the spelling data from Wikipedia.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import pickle

import requests
from bs4 import BeautifulSoup  # BeautifulSoup 4; the original used BeautifulSoup 3


def grab(letter):
    '''
    Grabs spellings for one letter from Wikipedia.
    Returns a dictionary of {misspelling: correction}.
    '''
    url = 'https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/%s' % letter
    html = requests.get(url).content
    soup = BeautifulSoup(html, 'html.parser')
    retval = {}
    for bullet in soup.find_all('li'):
        # Heuristic: the list items holding entries contain "plainlinks" markup
        if 'plainlinks' in str(bullet):
            # Entries look like "misspelling (correction)"
            values = bullet.text.split('(')
            if len(values) == 2:
                # Strip surrounding whitespace and shave off the ) at the end
                retval[values[0].strip()] = values[1].strip().rstrip(')')
    return retval


def get_spellings():
    '''
    Returns a dictionary of {false: correct} spellings, cached in words.pkl
    '''
    if not os.path.exists('words.pkl'):
        retval = {}
        for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            print('Getting typos - %s' % c)
            retval.update(grab(c))
        print('Dumping...')
        # Pickle files must be opened in binary mode in Python 3
        with open('words.pkl', 'wb') as f:
            pickle.dump(retval, f)
        return retval
    with open('words.pkl', 'rb') as f:
        return pickle.load(f)


if __name__ == '__main__':
    get_spellings()
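To make the parsing step a little easier to follow, here is a minimal sketch of what grab() does to each list item, pulled out into a standalone function. parse_entry is a hypothetical helper, not part of grabber.py; it assumes entries read like “misspelling (correction)”, which is exactly what the split('(') above relies on, and it also folds in the comma/semicolon check that bot.py currently applies at run time.

```python
def parse_entry(text):
    """Split one list item such as 'abandonned (abandoned)' into
    (misspelling, correction). Returns None if the item doesn't fit."""
    # Hypothetical helper: mirrors the split('(') logic in grab()
    values = text.split('(')
    if len(values) != 2:
        return None
    mistake = values[0].strip()
    correct = values[1].strip().rstrip(')').strip()
    if not mistake or not correct:
        return None
    # Assumption: entries listing several corrections are too ambiguous to tweet,
    # so drop them here rather than filtering in bot.py's main loop
    if any(sep in mistake + correct for sep in ',;'):
        return None
    return mistake, correct
```

Doing the pruning once at scrape time means the cached dictionary only ever holds usable pairs.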

bot.py

The bot itself. It picks misspellings at random, searches Twitter for them and replies, pausing for roughly five to eight minutes between replies and for two to four hours after every hundred or so.
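The pacing can be summarised in a small, hypothetical helper. This is a sketch rather than the bot's actual structure: bot.py sleeps inline in main() and takes its long break once the counter passes 100, whereas pause_after() below simply uses every hundredth reply. The second ranges are the ones from bot.py.

```python
import random

SHORT_PAUSE = (300, 500)           # seconds between replies, as in bot.py
LONG_PAUSE = (120 * 60, 240 * 60)  # two to four hours, as in bot.py
BATCH_SIZE = 100                   # replies before a long pause (approximation)


def pause_after(replies_sent, rng=random):
    """Return how many seconds to sleep after `replies_sent` replies."""
    if replies_sent > 0 and replies_sent % BATCH_SIZE == 0:
        return rng.randint(*LONG_PAUSE)
    return rng.randint(*SHORT_PAUSE)
```

Passing a seeded random.Random as `rng` makes the schedule reproducible, which is handy when testing without actually sleeping.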

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import random
import time
import pickle

import twitter  # python-twitter
from grabber import get_spellings

API = twitter.Api(consumer_key='XXX',
                  consumer_secret='XXX',
                  access_token_key='XXX',
                  access_token_secret='XXX')

MESSAGES = '''
$USERNAME sooo you might wanna spell $CORRECT the right way next time!! Not your fault bro.
#
# All messages stored in here, one per line.
# Edited out in order to save space in this blog post.
#
'''.split('\n')
# Drop blank lines and comment lines so they can never be tweeted by mistake
MESSAGES = [m for m in MESSAGES if m.strip() and not m.startswith('#')]


def compose_message(twitter_post, mistake, correct):
    '''
    Choose a message from MESSAGES at random, substitute fields to personalise it and
    check that it fits the 140-character tweet limit. Try this 100 times before failing.
    '''
    for _ in range(100):
        message = random.choice(MESSAGES)
        message = message.replace('$USERNAME', '@%s' % twitter_post.user.screen_name)
        message = message.replace('$MISTAKE', '"%s"' % mistake).replace('$CORRECT', '"%s"' % correct)
        if len(message) <= 140:
            return message
    return None


def correct_spelling(twitter_post, mistake, correct):
    '''
    Correct someone's spelling in a twitter_post
    '''
    print('Correcting @%s for using %s...' % (twitter_post.user.screen_name, mistake))
    message = compose_message(twitter_post, mistake, correct)
    if not message:
        print('All messages were too long... Aborting...')
        return None
    API.PostUpdate(message, in_reply_to_status_id=twitter_post.id)
    return True


def search(word):
    '''
    Search Twitter for uses of a word and return one tweet that hasn't been
    answered yet. Otherwise return None.

    TODO: Add time awareness.
    '''
    print('Searching for uses of %s...' % word)
    results = API.GetSearch(word)
    for result in results or []:
        if (not check_if_done(result.id)
                and result.user.screen_name != 'grammer_man'
                and word in result.text):
            return result
    return None


def check_if_done(id):
    '''
    Checks if a tweet has already been responded to
    '''
    if os.path.exists('done.pkl'):
        with open('done.pkl', 'rb') as f:
            done = pickle.load(f)
        return id in done
    return False


def update_done(id):
    '''
    Updates a list of tweets that've been replied to
    '''
    if os.path.exists('done.pkl'):
        with open('done.pkl', 'rb') as f:
            done = pickle.load(f)
    else:
        done = []
    done.append(id)
    with open('done.pkl', 'wb') as f:
        pickle.dump(done, f)


def main():
    '''
    Main program flow
    '''
    words = get_spellings()
    counter = 0
    while True:
        # dict.keys() is a view in Python 3; random.choice needs a sequence
        word = random.choice(list(words))
        if counter > 100:
            rand_time = random.randint(120*60, 240*60)
            print('Done %s tweets, sleeping for %s minutes' % (counter, rand_time // 60))
            time.sleep(rand_time)
            counter = 0
        # TODO: PROPERLY PRUNE THE MISTAKES/CORRECTIONS FROM WIKIPEDIA AND REMOVE THIS:
        # Skip ambiguous entries before spending a search on them
        if ',' in word + words[word] or ';' in word + words[word]:
            continue
        post = search(word)
        if post and correct_spelling(post, word, words[word]):
            counter += 1
            print('#%s Done' % counter)
            update_done(post.id)
            time.sleep(random.randint(300, 500))


if __name__ == '__main__':
    main()
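The $USERNAME, $MISTAKE and $CORRECT placeholders happen to match the syntax of Python's standard string.Template, so the chain of replace() calls in compose_message() could be swapped for safe_substitute(). A sketch of that alternative follows; render is a hypothetical name, and the 140 limit is the one used in bot.py.

```python
from string import Template

TWEET_LIMIT = 140  # the limit used in bot.py


def render(template_text, username, mistake, correct, limit=TWEET_LIMIT):
    """Fill $USERNAME, $MISTAKE and $CORRECT; return None if the result is too long."""
    # safe_substitute leaves unknown placeholders untouched instead of raising
    message = Template(template_text).safe_substitute(
        USERNAME='@' + username,
        MISTAKE='"%s"' % mistake,
        CORRECT='"%s"' % correct,
    )
    return message if len(message) <= limit else None
```

One gotcha: Template reads "$MISTAKEs" as a placeholder called MISTAKEs, so a message that glues text onto a field should write ${MISTAKE}s instead.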

Grammer_Man uses the following libraries:

  • python-twitter (Be warned: no proxy support)
  • requests
  • BeautifulSoup
9 January, 2012 at 20:06 by aengus

Posted in Code, Computers, Funny, Idiots, Internet, Oddities | 14 Comments »

22 Oct 2010

But what does it mean?

500 / 200.

From a collective of people including the man behind King Lud’s Revenge.

22 October, 2010 at 15:15 by aengus

Posted in Art, Funny, Internet, Oddities, Photography | No Comments »

9 Oct 2010

Bigpicture Cataloguer 0.4 Released

New version of The Big Picture Cataloguer available from here. Thanks for your patience; sorry it took so long.

9 October, 2010 at 11:19 by aengus

Posted in Code, Internet | No Comments »

20 Aug 2010

The Videogame Music Preservation Foundation

A friend came across this website a few weeks ago, and I was very excited about it: an archive of a great deal of video game music (mainly for DOS, which is what I grew up with), all recorded properly to maximise the nostalgia and made available in Ogg format.

I contacted the guy who runs it about setting up a torrent of the entire archive, and he very kindly obliged. You can get the entire collection here (~4.4GB in total). Enjoy!

20 August, 2010 at 18:24 by aengus

Posted in Art, Computer Games, Computers, Downloads, Electronic, Internet, Music | No Comments »

24 Jul 2010

Why the future doesn’t need us.

I finally managed to get around to reading Bill Joy’s article Why the future doesn’t need us the other day while waiting to board a plane. Bill Joy is a renowned computer scientist who co-founded Sun Microsystems and authored the popular UNIX text editor vi. The article is concerned with the ever-increasing speed of “progress” in fields of new technology (primarily robotics, nanotechnology and genetic engineering), which Joy views with apprehension, arguing that the products of these fields will eventually render mankind obsolete and lead to our self-destruction.

There’s no point trying to quote it, so instead you can read the article here, read more about Bill Joy here, or read responses and criticism of the article here.

24 July, 2010 at 1:21 by aengus

Posted in Computer Science, Computers, Internet, Robotics, Science | No Comments »

20 May 2010

Big Picture Cataloguer: An update

In just over a week since I released the Big Picture Cataloguer, there’s been a surprising amount of interest and enthusiasm about it. Since I still haven’t gotten binary versions of the program for OS X and Linux up (I’ve no access to an OS X computer, and getting the required libraries installed on Linux has proved to be quite difficult), I’ve decided to relent and share the source code of the cataloguer under a Creative Commons license.

The script makes use of pyexiv2 – the 0.2 branch – for metadata editing, mechanize for grabbing pages and submitting error reports, the very handy unaccented_map() class (included) for unicode trickery and of course the wonderful XML parser, BeautifulSoup.
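For anyone curious about the “unicode trickery”: the usual trick is to decompose accented characters and throw the accents away. The sketch below uses the standard unicodedata module rather than the unaccented_map() class the cataloguer ships with, so treat it as an illustration of the idea, not what the program does. strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'é' -> 'e')."""
    # NFKD splits 'é' into 'e' plus a combining acute accent
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep everything except the combining marks
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Characters with no decomposition, such as ß, come through unchanged, which is one reason a hand-built mapping can still be useful.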

Naturally, it’s available from the Big Picture Cataloguer’s page in the Code section of this site.

Given how much The Big Picture galleries’ HTML format has subtly changed over time, and the fact I wrote this in a rush, it’s quite messy, but it does the job.

Today’s update brings version 0.3, which adds an optional “quiet mode” so that users can schedule the program to run frequently. Enjoy!

20 May, 2010 at 20:11 by aengus

Posted in Art, Code, Computers, Internet, Photography | No Comments »

12 May 2010

Boston.com Big Picture Cataloguer

I’m a big fan of The Boston Globe’s photojournalism series, The Big Picture. So much so, in fact, that I decided to dedicate a few hours this week to building a program that would not just download the entire series, but add caption metadata to each photo, since many are informative and look very nice in Picasa, for example.

Now, I’m happy that the application is stable enough to release to the world in the Code section of my website.

Since I don’t want people hammering The Boston Globe’s servers, I’ve made the script wait a fraction of a second between requests, and since I don’t want people to be able to disable this, only binaries will be available for the time being, unfortunately. Windows binaries are available already; OS X and Linux binaries will follow in a few days.
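The throttling idea is simple enough to sketch. This is not the cataloguer's own code: Throttle is a hypothetical class, and the post only says “a fraction of a second”, so any particular interval you pick (half a second, say) is an assumption.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds
        self._last = None

    def wait(self):
        # monotonic() is unaffected by system clock changes
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

In use, you would call throttle.wait() immediately before each request, so the gap is measured from the previous request rather than added blindly on top of however long parsing took.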

Indeed, if those at The Boston Globe have a problem with how the program operates, they need only contact me and we can come to an agreement, but I’ve worked hard to make sure that the program contacts their servers as little as possible.

Bug reports will be automatically submitted through this website too, but if you have any unforeseen problems (e.g. a crash or a hang), email me with as much information as possible (text describing the “Traceback” printed before the crash, what album/photo the program was working on, etc).

What can you do once you’ve got the entire 2GB collection of photos downloaded? Well, you can simply look through them at your own pace and comfort, or indeed choose to create a montage screensaver from them (although be warned – a screensaver that fades from a beautiful Antarctic landscape to a bloody photo of a victim of the war in Afghanistan might not be exactly what you had in mind.)

But in any event, hopefully it’ll be of some use. Enjoy!

12 May, 2010 at 20:02 by aengus

Posted in Art, Code, Computers, Internet, Media, Photography, Politics | 24 Comments »

25 Feb 2010

New discoveries

This was posted on reddit today. I agree entirely with the poster’s sentiment: interesting links on reddit rarely lead to the gateway of a whole website of interesting stuff. When they do point to a website’s front page, it’s generally a very narrow, single-purpose site that is quickly forgotten. Hopefully the poster’s subreddit (apparently yet to be made) will be a success.

In any event, having gone through the blog post he linked, I decided to share some of my own new discoveries here:

  • Building Maker: A Google app I was unaware of, which lets you add the 3D element to Google Maps. For all bored architects out there (since this is just what they want to be doing in their time off.)
  • Ikea Hacker: Neat stuff done with bog-standard Ikea furniture.
  • Strange Maps: A blog of, well, old and interesting maps. I don’t know if I’d go as far as to say strange…
  • Newseum: The front pages of newspapers from 78 countries around the world.
  • Cooking For Engineers: This one reminded me of my father, a pragmatist who insists on weighing pasta before cooking it, in order to make sure he’ll be doling out the correct amount. Nothing wrong with approaching cooking as a science, as opposed to an art!
  • GetHuman.com: An excellent idea for a website. This one tells you which keys you need to press in order to get an actual human operator on the line when calling a large company, saving you the time of listening to and trying to interact with a computerised system.
  • PDFGeni.com: Another great idea — a repository of PDF documents such as old technical manuals, academic texts, and so on.

I feel I must write a disclaimer, saying I haven’t used or read these sites extensively, having just discovered them a few hours ago, but from first impressions they do look like they deserve a bookmark.

25 February, 2010 at 16:21 by aengus

Posted in Architecture, Computers, Design, Internet, Oddities | No Comments »

3 Feb 2010

Goodbye Electronica

Came across a link to a song, “Goodbye Electronica” by Dave Graham, on the electronic music board xltronic tonight. Really, really enjoyed it – lovely guitar work, atmosphere and lyrics. He’s allowed me to share it with you here, saying it’s a “freebie”, so give it a listen and pass it on to anyone you think might enjoy it!

You can download it locally here:

http://ventolin.org/wp-content/uploads/2010/02/GoodbyeElectronica.mp3

3 February, 2010 at 1:26 by aengus

Posted in Art, Downloads, Electronic, Free music, Internet, Music | No Comments »

5 Jan 2010

Rette deine Freiheit

At the moment in Germany, there is fierce opposition growing against plans by the CDU to implement internet censorship under the guise of attacking the spread of child pornography. A movement championed by the German Piratenpartei (Pirate Party) has dubbed ex-minister for family affairs Ursula von der Leyen “Zensursula”, a portmanteau of Zensur (censorship) and Ursula, and is referring to the CDU’s plans as Stasi 2.0, a nod to the brutal secret police which operated in former East Germany.

Not only is there to be a secret list of blocked websites, such as exists in Australia, but the government is pushing for more data to be collected from citizens and retained for a long period of time.

A video which caught my attention a while back was entitled Du bist Terrorist (You are a terrorist). With soft ambient music playing, and deceptively pleasantly designed imagery, the two-minute video parodies the Du bist Deutschland ad-campaign with a soft, reassuring voice informing you of what the German government has in store for you, in terms of heavier and more invasive surveillance — because You are a terrorist.

Earlier this week I found that the same people had created a new video in the same vein, entitled Rette deine Freiheit (Save your freedom). The video focuses much more on the coming internet censorship in Germany than just data retention and physical surveillance.

Since there was no English translation available, I decided to translate it and re-upload it to YouTube. The result is below:

The translation is by no means perfect, but at least it’s something. There were a few tricky problems with it:

  • Einfach wegschauen: Literally “simply look away”, the video describes this as the method tried-and-tested by members of families with a history of domestic abuse. I was going to translate it as “simply look the other way” in its first instance, since this is the closest phrase in English that pertains to such a situation. However, this doesn’t exactly capture the double-meaning employed in the video, since it implies wilful ignorance which isn’t quite applicable to what the government is doing, so I decided to settle on “simply block it out”. I’m not sure I’m happy with this, however. Suggestions?
  • In the sentence, “In Prävention, Therapie und Personal investiert hätte dies vielen Opfern helfen können: Reinste Verschwendung”, the sarcastic implication is that the money which could have been invested in prevention, therapy and personnel is much better spent on building an internet block. I don’t think I captured this very well.

In any event, there’s likely to be an official translation soon, and these issues will cease to be. I just saw an “Englisch (bald verfügbar)” (“English (coming soon)”) notice at the top of the official page; perhaps my email asking for a transcript of the video got them in a rush.

One last thing — if you are interested in learning more about the situation in Germany regarding internet freedom and the child pornography scare, I’d not only urge you to visit the links above, but also this shocking, but morbidly fascinating account of one techie’s work in the murkiest of subcultures. Thankfully, he doesn’t go into detail about actual child abuse, but instead details exactly how child pornography rings work, using the internet and computers.

Put simply, it proves what anyone with a clue already knows: current proposals for internet censorship will have absolutely no impact whatsoever on paedophiles and child pornographers and will only serve to infringe the rights of normal, law-abiding internet users.

Thanks to Áine and Patricia for help with one or two minor parts of the translation.

5 January, 2010 at 22:19 by aengus

Posted in Art, Censorship, Computers, Design, Digital Rights, Germany, Internet, Politics, Words | 1 Comment »

Avatars by Sterling Adventures
Creative Commons License
Sredzkistraße is proudly powered by WordPress
Design & code by Jonk, modified for Sredzkistraße by aengus.