By Alvin Alexander. Last updated: June 6, 2016
I just updated my Radio Pi “RSS Feed” script, and in short, here is the source code:
#!/usr/bin/python
# NOTE: this is Python 2 code (print statements, long)

import feedparser
import time
from subprocess import check_output
import sys

#feed_name = 'TRIBUNE'
#url = 'http://chicagotribune.feedsportal.com/c/34253/f/622872/index.rss'

feed_name = sys.argv[1]
url = sys.argv[2]

db = '/var/www/radio/data/screensaver/feeds.db'
limit = 12 * 3600 * 1000    # 12 hours, in milliseconds

#
# function to get the current time
#
current_time_millis = lambda: int(round(time.time() * 1000))
current_timestamp = current_time_millis()

# return true if the title appears anywhere in the database
def post_is_in_db(title):
    with open(db, 'r') as database:
        for line in database:
            if title in line:
                return True
    return False

# return true if the title is in the database with a timestamp > limit
def post_is_in_db_with_old_timestamp(title):
    with open(db, 'r') as database:
        for line in database:
            if title in line:
                ts_as_string = line.split('|', 1)[1]
                ts = long(ts_as_string)
                if current_timestamp - ts > limit:
                    return True
    return False

def clean_string(string):
    return string.encode('utf-8').strip()

#
# get the feed data from the url
#
feed = feedparser.parse(url)

#
# figure out which posts to print
#
posts_to_print = []
posts_to_skip = []
for post in feed.entries:
    # if post is already in the database, skip it
    # TODO check the time
    title = clean_string(post.title)
    if post_is_in_db_with_old_timestamp(title):
        posts_to_skip.append(title)
    else:
        posts_to_print.append(title)

#
# add all the posts we're going to print to the database with the current timestamp
# (but only if they're not already in there)
#
# the titles were already cleaned above; encoding them a second time
# raises UnicodeDecodeError on non-ASCII titles in Python 2
f = open(db, 'a')
for title in posts_to_print:
    if not post_is_in_db(title):
        f.write(title + "|" + str(current_timestamp) + "\n")
f.close()

#
# output all of the new posts
#
count = 1
blockcount = 1
for title in posts_to_print:
    # start a new numbered block every five titles
    if count % 5 == 1:
        print("\n" + '((( ' + feed_name + ' - ' + str(blockcount) + ' )))')
        print("-------------------\n")
        blockcount += 1
    print(title + "\n")
    count += 1

ipAddr = check_output(["hostname", "-I"])
print "\n"
print '---------------------------------'
print 'IP Address: ' + ipAddr.strip()
print '---------------------------------'
print "\n"
When run from a crontab entry with a command like this:
get_feed.py NPR http://www.npr.org/rss/rss.php?id=100
this script produces output like this:
((( NPR - 1 )))
-------------------

Putin Faces Frosty Reception At G20 In Australia

The Wondrous World Of Tom Thumb Weddings

Hong Kong Democracy Leaders Barred From Traveling To Beijing

Not My Job: Ron Perlman, Who Played The Beast, Gets Quizzed On Beauty

In NPR Interview, Bill Cosby Declines To Discuss Assault Allegations


((( NPR - 2 )))
-------------------

The Good Listener: For Thanksgiving, Is There Music Everyone Can Agree On?

Fresh Air Weekend: Jon Stewart, Peter Mendelsund And A Review Of Bob Dylan

Gen. Dempsey Lands In Iraq As U.S. Presence Starts To Grow


---------------------------------
IP Address: 10.0.1.9
---------------------------------
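The grouping in that output comes from starting a new numbered header every five titles. As a rough illustration only, here is a minimal Python 3 sketch of the same idea as a standalone function. The name format_blocks, the per_block parameter, and the sample headlines are my own assumptions for this example and are not part of the original script.

```python
def format_blocks(feed_name, titles, per_block=5):
    """Group titles under numbered '((( NAME - n )))' headers.

    Hypothetical helper: mirrors the script's "new header every five
    titles" behavior, but returns a string instead of printing.
    """
    lines = []
    for index, title in enumerate(titles):
        if index % per_block == 0:
            # First title of a block: emit a blank line, header, and rule.
            block_number = index // per_block + 1
            lines.append("")
            lines.append("((( %s - %d )))" % (feed_name, block_number))
            lines.append("-------------------")
            lines.append("")
        lines.append(title)
        lines.append("")  # blank line after every title
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ["Headline %d" % n for n in range(1, 9)]  # made-up titles
    print(format_blocks("NPR", sample))
```

With eight titles this produces two blocks, five titles in the first and three in the second, which matches the shape of the NPR output above.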
That code and output depend on a database file (the feeds.db path set near the top of the script), and the code will make a little more sense if you know what my Radio Pi project is.
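For readers who haven't seen the database file: judging from the write call in the script, it is a plain text file with one "title|timestamp" line per post, where the timestamp is in milliseconds. The following is a minimal Python 3 sketch of reading that format and applying the 12-hour limit. The function names (parse_line, load_db, is_stale) and the sample headlines are assumptions for illustration. Unlike the script, which does a substring match on each line, this sketch matches titles exactly and splits on the last "|" so a title containing "|" still parses.

```python
import os
import tempfile
import time

LIMIT_MS = 12 * 3600 * 1000  # same 12-hour window the script uses


def parse_line(line):
    """Split one 'title|timestamp' line on the LAST '|' character."""
    title, ts = line.rstrip("\n").rsplit("|", 1)
    return title, int(ts)


def load_db(path):
    """Return {title: newest timestamp in ms} for a feeds.db-style file."""
    seen = {}
    with open(path, encoding="utf-8") as database:
        for line in database:
            if "|" not in line:
                continue  # skip blank or malformed lines
            title, ts = parse_line(line)
            seen[title] = max(ts, seen.get(title, 0))
    return seen


def is_stale(seen, title, now_ms, limit_ms=LIMIT_MS):
    """True if the title was recorded more than limit_ms ago."""
    return title in seen and now_ms - seen[title] > limit_ms


if __name__ == "__main__":
    now = int(round(time.time() * 1000))
    # Build a throwaway database file with made-up entries.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".db", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("Old headline|%d\n" % (now - LIMIT_MS - 1))
        tmp.write("Fresh | headline|%d\n" % now)
    seen = load_db(tmp.name)
    os.remove(tmp.name)
    print(is_stale(seen, "Old headline", now))
    print(is_stale(seen, "Fresh | headline", now))
```

Loading the file once into a dictionary also avoids reopening it for every title, though for a file this small either approach should be fine.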
One note: I get and print the IP address of the local system because the output from this script goes to a monitor on my Radio Pi system, and I usually don't have a keyboard or mouse attached to my Raspberry Pi, so it's often helpful to see its IP address on screen.
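On Linux, hostname -I prints every configured address separated by spaces, so a Pi with both wired and wireless connections reports more than one. Here is a small Python 3 sketch of parsing that output defensively. The function names parse_addresses and local_addresses are hypothetical, and the fallback to an empty list on systems where the command fails is my own assumption, not something the original script does.

```python
from subprocess import CalledProcessError, check_output


def parse_addresses(raw):
    """Turn `hostname -I` output (bytes or str) into a list of addresses."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", "replace")
    return raw.split()  # split() also drops the trailing space and newline


def local_addresses():
    """Best-effort lookup; returns [] where `hostname -I` is unavailable."""
    try:
        return parse_addresses(check_output(["hostname", "-I"]))
    except (OSError, CalledProcessError):
        # Command missing, or this platform's hostname rejects -I.
        return []


if __name__ == "__main__":
    addresses = local_addresses()
    print("---------------------------------")
    print("IP Address: " + (", ".join(addresses) or "unknown"))
    print("---------------------------------")
```

Keeping the parsing separate from the subprocess call makes it easy to check the formatting on a machine that isn't the Pi.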