Spycyroll - Python RSS Aggregator

Demo

This demo might be broken at times...

Subscriptions

Aaron Swartz
Bryan J Richard
Eliot Landrum
Fredrik Lundh
Garth Kidd
Hans Nowak
Jeffrey Shell
Juri Pakaste
Kendall Clark
Kevin Altis
Mark Pilgrim
Matt Croydon
Paul Everitt
Richard Jones
Russ Lipton
Simon Brunning
Simon Willison
V. Satheesh Babu


Updated on February 08, 2003 07:03 PM

SpycyRoll

Old data can be deleted with find. For example, to delete items older than 5 days:
find path_to_data -mtime +5 -exec rm -f {} \;
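The same cleanup can be sketched in Python, which may help where find is not available. This is a minimal sketch, assuming the data directory holds only disposable cache files; `purge_old_files` and its parameters are hypothetical names and not part of Spycyroll. It compares modification times against an exact cutoff, so it only roughly mirrors the day rounding that `-mtime +5` applies.

```python
import os
import time


def purge_old_files(data_dir, max_age_days=5, now=None):
    """Delete regular files under data_dir last modified more than max_age_days ago.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's modification time against the cutoff.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Calling `purge_old_files("path_to_data")` from a scheduled job would then stand in for the find command.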

February 08, 2003

Mark invoked the lazy web earlier today in a bid to find a good way of bulk optimizing PNG files. Several people recommended pngcrush in the comments and it sounds like a fantastically useful piece of software - apparently it can run 114 different lossless compression methods on an image and automatically choose the most efficient one.


Via Scott, a clever PHP technique for ensuring data sent to the browser as a cookie or hidden form variable isn't tampered with by the user:

If you're expecting to receive data in a cookie or a hidden form field that you've previously sent to a client, make sure it hasn't been tampered with by sending a hash of the data and a secret word along with the data. Put the hash in a hidden form field (or in the cookie) along with the data. When you receive the data and the hash, re-hash the data and make sure the new hash matches the old one.

A further explanation and example code can be found in PHP and the OWASP Top Ten Security Vulnerabilities, a handy article describing how PHP coders can combat the top ten web application security problems highlighted by a recent report from OWASP. Incidentally, OWASP still haven't fixed the cross site scripting vulnerability on their own site, discovered by Tom Gilder several weeks ago.

Incidentally, while the hashing method is clever and should be nice and secure I personally advocate not sending the user any information unless absolutely necessary - use sessions and store sensitive data on the server instead. I suppose you could always use the hash to add an extra layer of security to the session identifier though.
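The quoted article's example is in PHP. As a rough illustration of the same idea in Python, the sketch below signs a value with a server-side secret and checks the signature when the value comes back. It uses HMAC-SHA256 from the standard library rather than a plain hash of data plus secret word, which is a substitution on my part. `SECRET`, `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it is never sent to the client.
SECRET = b"change-me"


def sign(data, secret=SECRET):
    """Return a hex signature to send alongside data in a cookie or hidden field."""
    return hmac.new(secret, data.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(data, received_hash, secret=SECRET):
    """Re-hash the returned data and compare it with the hash that came back."""
    expected = sign(data, secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received_hash.encode("utf-8"))
```

A page would emit both `data` and `sign(data)`, and reject the request if `verify` returns False when they come back.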


How to troll comp.lang.python <0.5 wink> Lots of arguing on c.l.py about the ternary operator. Some people even came up with mock PEPs to indicate what they think about this. ... [171 words]

I got a good response to yesterday's call for help on finding an HTML element's co-ordinates on a page. I ended up using PPK's findPos functions which seemed to do the trick just fine.

Here's the result:

Image Drag (bookmarklet, drag on to your links bar).

Image Drag makes every image on a page "draggable", using youngpup's DOM-Drag library. It works by "cloning" each image on the page and making the clone an absolutely positioned, draggable element then changing the original image to a transparent pixel of the correct width and height. I wrote it because I wanted my girlfriend to be able to play with GothMaker in Phoenix instead of IE, but it works for other pages too. There is a weird bug which affects any pages that use absolute positioning (such as this one) - I think it's a bug in DOM-Drag rather than a problem with findPos but I'm not entirely sure what's causing it.

The bookmarklet works fine in Phoenix (on both Linux and Windows) but doesn't work in IE. I'm not too bothered about this - with IE6 nearing its second birthday, if you are still using it you should seriously consider upgrading to something a bit more up to date anyway.

Since most of the sites linked to from this one use absolute positioning, here are a few which the bookmarklet works well on that are fun to mess around with:

Incidentally, although it's only meant as a fun distraction, having played with it a bit it looks like it could be quite useful as a tool for web site design tweaks such as seeing if the page would look better with images moved around a bit.


reStructuredText: slow and unforgiving?

V. Satheesh Babu likes reStructuredText, and points to a David Mertz article on IBM developerWorks. The comments are where things get interesting; reST is described as slow and unforgiving.

I'm not so sure about reST being unforgiving relative to, say, StructuredText. Both will misbehave if you don't form your text properly, but at least reST will give you an intelligent error response pointing out where you go wrong. StructuredText will just give you odd results.

Slow, I'd have to check...


Optimizing PNG files What is the best free tool on any platform for producing highly optimized PNG files? (201 words)

best thing about phoenix when i have a problem, i can look it up in Bugzilla: Bug 191637 - Customize Toolbar Bustage. i love... (35 words, 0 comment(s).)

February 07, 2003

Does anyone know if it is possible to find an HTML element's exact position on the page (in terms of pixels-from-the-top and pixels-from-the-left) using javascript? The element I have in mind is an image that has not had any positioning applied to it, but I imagine any solution will work for other elements as well. I need something that works in Mozilla/Phoenix, although a solution for IE would be nice as well. It's for a bookmarklet I'm thinking of writing.


Testing Trackbacks with tblib

Matthew Langham:

Frank is looking for someone to test whether his Trackback works. Radio doesn't support Trackback - so perhaps some kind person will help. Thanks!

I use Radio, and therefore don't have trackback support, but I did ping his weblog manually with a python trackback library that I wrote.  His trackbacks seem to be working fine.

[user @ box tb]$ ./tbclient.py -tburl http://www.koehntopp.com/cgi-bin/mt-tb.cgi/204 -title "Testing Trackback" -excerpt "I use Radio, which doesn't support trackback either, so I'm pinging gadgetguy.de with a command line client using tblib." -blogname "Matt Croydon::postneo" -url http://postneo.com/2003/02/07.html#a1992
Trackback command line client here.  Preparing TrackBack...
TrackBack URL: http://www.koehntopp.com/cgi-bin/mt-tb.cgi/204
TrackBack Title: Testing Trackback
TrackBack Excerpt: I use Radio, which doesn't support trackback either, so I'm pinging gadgetguy.de with a command line client using tblib.
Your Weblog Name: Matt Croydon::postneo
Your URL: http://postneo.com/2003/02/07.html#a1992
Pinging http://www.koehntopp.com/cgi-bin/mt-tb.cgi/204...
HTTP Response: 200 OK
TrackBack Error Code is: 0 (zero is okay)
Done!
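For readers curious what such a ping involves, here is a minimal sketch using only the Python standard library. It is not tblib's API, which is not shown above. It assumes the usual TrackBack convention of a form-encoded POST with `url`, `title`, `excerpt` and `blog_name` fields, answered by a small XML document containing an `<error>` code. All function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_ping(url, title=None, excerpt=None, blog_name=None):
    """Return the form-encoded body of a TrackBack ping as bytes."""
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    present = {key: value for key, value in fields.items() if value is not None}
    return urllib.parse.urlencode(present).encode("utf-8")


def parse_response(xml_data):
    """Return (error_code, message) from a TrackBack XML response."""
    root = ET.fromstring(xml_data)
    code = int(root.findtext("error", default="1").strip())
    return code, root.findtext("message", default="").strip()


def send_ping(trackback_url, **fields):
    """POST the ping and return the parsed (error_code, message)."""
    request = urllib.request.Request(
        trackback_url,
        data=build_ping(**fields),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_response(response.read())
```

An error code of 0 corresponds to the "zero is okay" line in the transcript above.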


Cool-2B-Real is a site for teenage girls. Real girls are "keepin' it real" by building strong bodies and strong minds... and they're feeling great about themselves! It has health and fitness tips, tips on feeling good about yourself, a poll ("What type of beef do you most like to eat with your friends?"), and a set of Smart Snackin' recipes such as Nacho Beef Dip, Beef on Bamboo, Easy Beef Chili and Roast Beef and Veggie Wrap. And beef games too.

Real girls eat beef. Cool-2B-Real was brought to you by the Cattlemen's Beef Board and the National Cattlemen's Beef Association.


It looks like Scott got burned by a PHP MeetUp arranged at an out of business restaurant that then failed to materialise at all. From his comments it seems like he's not the only person to hit problems. I have yet to attend a meetup (the Bristol UK ones rarely have anyone sign up for them) but I love the concept, so it's a real shame to hear about problems like this. Hopefully the MeetUp team are working on ways to stop this kind of thing from happening - some kind of short-lived email mailing list for each location/event might go some way to ensuring people who sign up for them know what's going on and bother to show up. At least their recent changes page shows that they have been actively trying to improve the overall experience.


Typing: Safety causes Bugs Alan Green: Typing: Safety causes Bugs.

Shipping the prototype Shipping the prototype - Let's promote scripting languages to the status they deserve My point is that languages like Python, but also Perl, Ruby, and JavaScript/JScript/ActionScript/EcmaScript, are strategic in ways

XYAPTU Anyone? I'm in the market for lightweight templating. I'm playing around with XYAPTU but I'm not sure if it's really what...

More SQLObject I'm actually very impressed by SQLObject. Ian Bicking has done a stellar job. The rational join support is stunning. Once...

Safari Bookmarks / UI John Gruber of Daring Fireball writes up some issues with the Safari User Interface. I have a couple of counterpoints...

still alive Well, I'm still alive after two classes and chapel. And I got back my dynamics test -- it's only the... (69 words, 2 comment(s).)

Rally for Peace

A new poll indicates that

Only 6 per cent of Australians are prepared to send Australian troops to war against Iraq without United Nations backing, an exclusive national Age poll has found.
In the AC Nielsen AgePoll - a blow to the Federal Government's stand on the war with the United States - 62 per cent of respondents said Australia should be involved in a conflict only if approved by the UN.
One in three believed war against Iraq was not acceptable under any circumstances.
.
.
"The government obviously doesn't disregard community attitude," he [Defence Minister Robert Hill] said.

That's good, since there's going to be a rally next Friday, as a part of an international weekend of events opposing the US war on Iraq. Thanks to Rachel for reminding me.


Via Keith Devens, Screen-scraping with WWW::Mechanize describes how Perl's WWW::Mechanize module can be used to grab information from sites that require a user login. I've always dismissed screen scraping as something of a wasted effort, given the fact that a major rewrite of the scraper is required whenever the target site tweaks its HTML. This article has encouraged me to reconsider - some of the functionality in WWW::Mechanize is fantastic:

We create a WWW::Mechanize object and tell it the address of the site we'll be working from. The Radio Times' front page has an image link with an ALT text of "My Diary", so we can use that to get to the right section of the site:


  my $agent = WWW::Mechanize->new();
  $agent->get("http://www.radiotimes.beeb.com/");
  $agent->follow("My Diary");

The returned page contains two forms - one to allow you to choose from a list box of program types, and then a login form for the diary function. We tell WWW::Mechanize to use the second form for input. (Something to remember here is that WWW::Mechanize's list of forms, unlike an array in Perl, is indexed starting at 1 rather than 0. Our index is, therefore, '2'.)


  $agent->form(2);

Now we can fill in our e-mail address for the '<INPUT name="email" type="text">' field, and click the submit button. Nothing too complicated.


  $agent->field("email", $email);
  $agent->click();

I'm still not quite impressed enough to learn Perl, but I'm very tempted to borrow some of the ideas and re-implement them in PHP or Python.
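As a starting point for the Python version, the form-selection step could be sketched with the standard library's HTML parser. This is an illustration only, not a port of WWW::Mechanize. `FormCollector`, `select_form` and `submit_body` are hypothetical names, and the sketch handles `<input>` fields only.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormCollector(HTMLParser):
    """Collect each <form> on a page with its action and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action", ""), "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and "name" in attrs:
            # Keep the default value, or an empty string if none is given.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def select_form(html, number):
    """Return form `number`, counting from 1 as the quoted article does."""
    collector = FormCollector()
    collector.feed(html)
    return collector.forms[number - 1]


def submit_body(form):
    """Return the form-encoded body that submitting the form would send."""
    return urlencode(form["fields"])
```

With this, `form = select_form(page, 2)` followed by `form["fields"]["email"] = email` would mirror the `form(2)` and `field("email", $email)` calls above. Fetching the page and following the form's action is left out.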


DHCP timeout Getting rid of an annoyance at boot time for Mandrake Linux. (143 Words)

Shipping the prototype Let's promote scripting languages to the status they deserve (305 Words)

Dict, Python clients, and Gophers, Oh My!

I don't use a spell checker for this weblog, and I'm truly sorry for that.  Sucks to be you, my reader.  Most of the time if I'm not sure of the spelling of a word, I'll either look it up at dict.org or find a similar, easier to spell word.

I was curious about DICT protocol clients, so I went to the links page.  This led me to John Goerzen's Python client class.  I'm taking a look at it now, it looks simple yet powerful.  John hosts his content with a gopher.  I miss gopher.  Gopher used to be the coolest thing on the planet.

Of course John has also written a gopher server in Python.  It's called PyGopherd (duh!).  I so have to set one up.


Shipping the Milestone Jon Udell has a piece in the latest Infoworld that cites the recent series of interviews with Guido van Rossum by...

pylibini 0.1.2 Released!

Another gem this evening from freshmeat, pylibini 0.1.2 is out.  It's a bugfix/tweak release.  It is also now released under the LGPL (previously GPL).  Here's a quick description:

pylibini is a Python module which provides powerful access and easy manipulation of .ini files in Python applications.

Another useful tool for the toolbox.  Most of my work in Python has been fairly lightweight, probably not requiring .inis, but it's good to know about.
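For comparison, Python's standard library also reads and writes .ini files. The sketch below uses the standard `configparser` module, not pylibini, whose API is not shown here. The section and option names are invented for the example.

```python
import configparser
import io

# Invented sample configuration, used for illustration only.
SAMPLE = """\
[aggregator]
title = Example Roll
max_age_days = 5
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Values are stored as strings; getint converts on the way out.
title = config["aggregator"]["title"]
max_age_days = config.getint("aggregator", "max_age_days")

# Change an option and render the file back out as text.
config["aggregator"]["max_age_days"] = "7"
buffer = io.StringIO()
config.write(buffer)
rendered = buffer.getvalue()
```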


February 06, 2003

Contracts Contracts in Python. The latest in a series of articles that I don't find very good. Let's dissect this one, for example. ... [356 words]

This is O'Reilly in reStructuredText Bill Bumgarner reveals that his article on PyObjC was written in reStructuredText and converted to the format O'Reilly needed with a custom parser (for which you'll have to visit Bill and follow his link; gotta hand this traffic around, after all).

PyObjC

OSNews:

Some programmers see the advantage of combining Python and Objective-C in the same environment, believing that a bridge between the two languages provides tremendous power and advantages to either language. For the Objective-C developer, access to Python provides a rapid application-development solution that's far more efficient than one requiring a compiler. For the Python developer, transparent access to Objective-C would allow the developer's scripts to leverage the full power and elegance of the MacOSX environment. In this article, Bill Bumgarner shows you how to bring these worlds together.

Python just screams out to be integrated into other languages and environments, doesn't it?

I don't currently have an OSX testbed, but I have an old G3 coming in that I have to troubleshoot and buy a few parts for.  We'll see how that goes.


Shipping the prototype by Jon Udell at InfoWorld says "Let's promote scripting languages to the status they deserve"

In brief: All hail the Benevolent Goat Masters! AdAware 6. SimpleComments MT plugin. New ads at NYTimes.com. Mozilla enhancers. Safari enhancers. The longest musical performance in history. A variety of amusing anecdotes and tales from writers more versed than myself in the subtle art of telling anecdotes and tales. (666 words)

And over on Blogzilla, Lim Chee Aun has finally solved the niggling bug with Phoenix 0.5 on Windows where the icon shown in the taskbar is an ugly default Windows image.


Optimoz have released Version 0.3.5 Release Candidate 3 of their mouse gestures add-on for Mozilla based browsers. I hadn't tried the version 0.3.5 series before and the improvements are impressive to say the least:

I have severe trouble browsing without gestures these days, but the biggest time saver of them all has to be the W3C validation gesture - draw a V on the page and instantly see if it is valid [X]HTML or not.


Interviews with Guido van Rossum Links to artima.com's A Conversation with Guido van Rossum interview have been floating all over the blogosphere, but I haven't seen any links to this Guido van Rossum interview.

Java 1.5 features An interesting set of stuff: Generics Autoboxing Enhanced For-loop Enumerations Static Imports It looks like Java is reacting to the competition!

Blog template updates XML button is present wherever I provide a feed; and fixed a bug with my templates. (150 Words)

Planning thoughts Thinking about writing thoughts on measuring work related stuff. (258 Words)

writing widgets in python zone.effbot.org: Writing Widgets in Python, Part I


Warnings, a final note

I had to wrap the warning suppression code in a test for the existence of FutureWarning because that warning doesn't exist in Python before 2.3. Gah.


reStructuredText intro David Mertz writes about reST (109 Words)

In law, not in code It is this last example that worries me the most, the belief that legal problems are merely annoyances that simply require technological workarounds. I see this everywhere in technical communities; I assume it's a variant of the "when all you have is a hammer" syndrome. Engineers generally have no legal expertise, but they can write code like nobody's business, so they write code to try to solve all their problems. Here's a stating-the-obvious news flash: legal problems require legal solutions, not technical ones. (860 words)

Turning Python warnings off

Aha! The warnings module provides a method of filtering out the nasty new warnings :)

Be warned though, the module documentation lies (ok, that's possibly a little strong ;) and the "message" and "module" arguments to filterwarnings() aren't actually a "compiled regular expression" as it states, but a compilable string. I ended up inserting this code just before I import portalocker:

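import warnings
# Ignore the hex/oct FutureWarning, but only when it is raised from portalocker.
# The message and module arguments are regular expressions given as plain strings.
warnings.filterwarnings("ignore",
    r'hex/oct constants > sys\.maxint .*', FutureWarning,
    'portalocker', 0)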

And now my unit tests are quiet, except for the calendar problem...


Roundup running under Python 2.3a1

Roundup produces some strange and new output when run under Python 2.3a1. All but one of the unit tests pass. The failure results from a change in the way the calendar module is implemented (it's now mostly implemented by the new datetime module). This means that the following now fails:

>>> calendar.timegm((2003,2,30,0,0,0,0,0,0))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/calendar.py", line 216, in timegm
    days = datetime.date(year, month, day).toordinal() - _EPOCH_ORD
ValueError: day is out of range for month

This is a shame. I'm not sure what will happen - I've posted a message to the python list to ask what the options are.
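One possible workaround, if the calling code relies on extra days rolling over into the next month, is to normalise the date before converting it. The sketch below is an assumption about what such a wrapper could look like, not what Roundup ended up doing. `timegm_lenient` is a hypothetical name.

```python
import calendar
import datetime


def timegm_lenient(time_tuple):
    """Like calendar.timegm, but let the day field run past the end of the month."""
    year, month, day, hour, minute, second = time_tuple[:6]
    # Start from the first of the month and add the remaining days, so that
    # 30 February 2003 becomes 2 March 2003 instead of raising ValueError.
    date = datetime.date(year, month, 1) + datetime.timedelta(days=day - 1)
    return calendar.timegm((date.year, date.month, date.day, hour, minute, second))
```

With this, `timegm_lenient((2003,2,30,0,0,0,0,0,0))` gives the same value as `calendar.timegm` for 2 March 2003.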

The second bit of fun was that my code has the literal 0xffff0000 in it (as part of the portalocker). This now generates a FutureWarning, which I've yet to discover how to turn off. I may resort to a silliness such as 0xffff00 << 8 to get around it, I don't know.

Finally, I have a couple of warnings about assigning to None, which is to become a keyword in some future Python version, and thus will be unassignable. The couple of places that I have that assignment were in the Zope Page Templates code (as hackish optimisations). This was an easy fix :)


MoinMoin Updates

I spent some time this evening chatting with Thomas Waldmann on #moin at irc.freenode.net. MoinMoin is currently at 1.0, though 1.1 is just around the corner. Hopefully 1.1 will include some caching support, which should speed things up a bit (Slashdotting a MoinMoin wiki is not recommended). 1.1 should also bring compatibility with persistent environments, such as the Twisted Framework and mod_python. More details on that can be found at the mailing list archives. Planned features for 1.2 include some darn smart internationalization. Linking WikiNames across languages sounds like a headache, but there is a working demo up and running.

Apparently Thomas isn't much of a Python guy, but he loves MoinMoin and is pleased to be working in Python.


PyCon DC 2003 Registration Open!

Excellent! You can now register for PyCon DC 2003.


reST on dW David Mertz on reStructuredText. reST is damn coOL. Kendall and I have a little project we've been cooking up that...

February 05, 2003

Via Leonard Lin, a nice demonstration of an enhanced HTML text area (with buttons to add tags) that works in IE, Mozilla and Phoenix. Until recently this had not been possible thanks to a long standing Mozilla bug.


Paul Tchistopolskii's XML Alternatives reminded me to take another look at YAML. The specification has been updated since I last looked and seems to be a bit more complicated, but it's still a very nicely designed format. Implementations are available for Perl, Python and Ruby with C and Java on the way but strangely no one seems to be doing one for PHP yet. I'm doing a course at Uni on compilers at the moment which includes quite a lot of stuff about writing parsers so I'm very tempted to have a go at a YAML implementation in the next few weeks just to try stuff out. The possibility of easily swapping relatively complex data structures between PHP and Python is pretty tempting as well.


Dave Winer asks why Joel Spolsky gets much more traffic when slashdotted than UserLand's hosted sites tend to. Joel explains (it's all down to network effects) and mpt kicks in a few ideas as well.


I'm really liking Jeffrey Zeldman's latest redesign. Aside from a pretty face, the markup holds some interesting ideas as well. For example, I've never seen a definition list used for a blogroll style list before:


<dl id="outside2" style="display:none;">
<dt>Relevant Externals:</dt>
<dd><a href="http://www.20things.org/" target="eljefe" 
  title="20 people make 20 things in 20 days.">20 things</a></dd>
<dd><a href="http://www.alistapart.com/stories/indexAccessibility.html" target="eljefe" 
  title="Accessibility articles and tutorials at A List Apart.">Access @ ALA</a></dd>
<dd><a href="http://www.gregstorey.com/airbag/" target="eljefe" 
  title="Greg Storey&#8217;s beautiful personal periodical.">Airbag</a></dd>
...

It makes sense in that "Relevant Externals" is a definition of the following list of defined terms. The official specification for definition lists is notoriously vague in any case:

Definition lists vary only slightly from other types of lists in that list items consist of two parts: a term and a description. The term is given by the DT element and is restricted to inline content. The description is given with a DD element that contains block-level content.

[...]

Another application of DL, for example, is for marking up dialogues, with each DT naming a speaker, and each DD containing his or her words.


Try to explain this to an American... CBR maakt rijbewijs duizend euro duurder [Dutch]. Basically the article says that getting a driver's license in the Netherlands will cost €1000 more than it does now. ... [257 words]

Global Potato News I just don't know how I ever managed to live without Global Potato News.

The Spycyroll Rolls On Kendall mentioned that the date on the roll was a bit twee (a phrase I hadn't heard since I...

Easy Click Someone finally went and did it! There's now a very simple interface available for pulling text from the Bible. Heal... (160 words, 0 comment(s).)

Fine-tuning garbage collection Fine-tuning Java garbage collection performance

Python Programmer Weblogs Python Programmer Weblogs is an online aggregator, feeding off thirty or so Python oriented weblogs, including mine. Might this be the beginnings of a Python version of java.blogs? The software

Dave Thomas has a blog Dave Thomas (co-author of The Pragmatic Programmer) has a new weblog - PragDave. One for the aggregator!

claims, frauds and hoaxes An Encyclopedia of Claims, Frauds and Hoaxes of the Occult and Supernatural

observations


Enabling IPv6 on OS X or How I Learned to Stop Worrying and Love the Dancing Kame Note: This will apparently not work if you’re…

reStructuredText for Zope 3 From the Zope 3 Dev mailing list this morning: I hereby decree that Zope3 doc strings and text documentation files...

Doctor's Report Good report from the doctor. (79 words, 0 comment(s).)