I am attempting to convert from scratch the following PHP script into Python for my Django Project:
Note that it is my understanding that this script should handle values sent from a form, sign the data with the Secret_Key, encrypt the data in SHA256 and encode it in Base64
<?php
define ('HMAC_SHA256', 'sha256');
define ('SECRET_KEY', '<REPLACE WITH SECRET KEY>');
function sign ($params) {
return signData(buildDataToSign($params), SECRET_KEY);
}
function signData($data, $secretKey) {
return base64_encode(hash_hmac('sha256', $data, $secretKey, true));
}
function buildDataToSign($params) {
$signedFieldNames = explode(",",$params["signed_field_names"]);
foreach ($signedFieldNames as $field) {
$dataToSign[] = $field . "=" . $params[$field];
}
return commaSeparate($dataToSign);
}
function commaSeparate ($dataToSign) {
return implode(",",$dataToSign);
}
?>
Here is what I have done so far :
def sawb_confirmation(request):
if request.method == "POST":
form = SecureAcceptance(request.POST)
if form.is_valid():
access_key = 'afc10315b6aaxxxxxcfc912xx812b94c'
profile_id = 'E25C4XXX-4622-47E9-9941-1003B7910B3B'
transaction_uuid = str(uuid.uuid4())
signed_field_names = 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency'
signed_date_time = datetime.datetime.now()
signed_date_time = str(signed_date_time.strftime("20%y-%m-%dT%H:%M:%SZ"))
locale = 'en'
transaction_type = str(form.cleaned_data["transaction_type"])
reference_number = str(form.cleaned_data["reference_number"])
amount = str(form.cleaned_data["amount"])
currency = str(form.cleaned_data["currency"])
# Transform the String into a List
signed_field_names = [x.strip() for x in signed_field_names.split(',')]
# Get Values for each of the fields in the form
values = [access_key, profile_id, transaction_uuid,signed_field_names,'',signed_date_time,locale,transaction_type,reference_number,amount,currency]
# Insert the signedfieldnames in their place in the list (MUST BE KEPT)
values[3] = 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency'
# Merge the two lists as one
DataToSign = list(map('='.join, zip(signed_field_names, values)))
# Hash Sha-256
API_SECRET = 'bb588d4f96ac491ebd43cceb18xx149b79291f874f1a41fcbf5bc078bb6c8793af2df5ad4b174f80bd5f24a4e4eec6fdabdxxxxxc6c1410db40252deea613e0b976748539294438694ba08xx4ba831d3d850349cacfa445f9706aa57be7f8e61aab0be2288054dbe88ec6200ccd7c72888bcc0aa373f42059ec248d3c86b0f45'
message = '{} {}'.format(DataToSign, API_SECRET)
signature = hmac.new(bytes(API_SECRET , 'latin-1'), msg = bytes(message , 'latin-1'), digestmod = hashlib.sha256).hexdigest().upper()
base64string = base64.b64encode( bytes(signature, "utf-8") )
When printing the variables as they come, I obtain the following :
VALUES : ['afc10315b6aa3b2a8cfc91253812b94c', 'E25C4FE4-4622-47E9-9941-1003B7910B3B', '0b59b0ae-bd25-4421-a231-bb83dcfc91fa', 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency', '', '2021-03-06T22:07:30Z', 'en', 'authorization', '1615068450109', '100', 'USD']
DATATOSIGN : ['access_key=afc10315b6aa3b2a8cfc91253812b94c', 'profile_id=E25C4FE4-4622-47E9-9941-1003B7910B3B', 'transaction_uuid=0b59b0ae-bd25-4421-a231-bb83dcfc91fa', 'signed_field_names=access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency', 'unsigned_field_names=', 'signed_date_time=2021-03-06T22:07:30Z', 'locale=en', 'transaction_type=authorization', 'reference_number=1615068450109', 'amount=100', 'currency=USD']
SIGNATURE : 953C786EB9884CEC13C24118B00125BDCFE23AFF8AB02E7BEF29A83156C55C16
BASE64STRING : b'OTUzQzc4NkVCOTg4NENFQzEzQzI0MTE4QjAwMTI1QkRDRkUyM0FGRjhBQjAyRTdCRUYyOUE4MzE1NkM1NUMxNg=='
I think I am getting pretty close from the final result I would like to achieve since I would then simply have to post the Base64String to a specific URL.
However, I am unsure of a couple of things which may seem a bit off :
Is my "translation" of the PHP code into Python correct? Am I meant to merge my lists with a result in "DATATOSIGN"? I am not proficient in PHP so I might have misunderstood how to present the data.
The signature in Base64 should be 44 chars AT ALL TIME like "WrXOhTzhBjYMZROwiCug2My3jiZHOqATimcz5EBA07M=" when using the PHP Sample Code but mine way exceeds this limitation.
If you need any additional information, please do not hesitate to ask.
Hope you can give me pointers !
To approach this problem, it might be good to get an idea of what your final PHP result would be for given parameters.
Here are the parameters I'm using for this with your given PHP code:
$params = [
'access_key' => 'afc10315b6aaxxxxxcfc912xx812b94c',
'profile_id' => 'E25C4XXX-4622-47E9-9941-1003B7910B3B',
'transaction_uuid' => '12345',
'signed_field_names' => 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency',
'unsigned_field_names' => '',
'signed_date_time' => '2021-03-06 16:14:00',
'locale' => 'en',
'transaction_type' => 'credit',
'reference_number' => '12345',
'amount' => '50',
'currency' => 'usd'
];
When I run the original PHP code with these parameters, and these lines for outputting the code:
<?php
echo "build data to sign:\n";
print_r(buildDataToSign($params));
echo "\n";
echo "sign data:\n";
echo signData(buildDataToSign($params), 'secret');
?>
I get the following output:
build data to sign:
access_key=afc10315b6aaxxxxxcfc912xx812b94c,profile_id=E25C4XXX-4622-47E9-9941-1003B7910B3B,transaction_uuid=12345,signed_field_names=access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency,unsigned_field_names=,signed_date_time=2021-03-06 16:14:00,locale=en,transaction_type=credit,reference_number=12345,amount=50,currency=usd
sign data:
6V0iIqu3smGmadPK4KvRuHm1nNkuIVLBPbLg7VkA7M8=
So with your new Python version of this PHP code, you'll probably want to have a similar sign data value of 6V0iIqu3smGmadPK4KvRuHm1nNkuIVLBPbLg7VkA7M8= at the end with these parameters!
Because your Python example does not seem to get the same result as-is, after adding a return base64string to the end of your Python def, I get the following output:
sign_data:
b'NThFMjU4QTQyRjU2MkVDRDgzM0RCOEIwM0VDODczQTExNjc3MUNDMEM2OURGMDFGMjdFQkU3MEMzMDAyNjA3RQ=='
In order to match the PHP version of your code, I wanted to try to find out what was going on between the PHP and Python approaches in regard to the hmac and base64 parts.
When I broke down your PHP code example into steps relating to the hmac value and then later the base64 value, here is what I found (using a data message of 'hello' and a key of 'secret' to keep it simple):
Example PHP Code:
<?php
$hash_value = hash_hmac('sha256', 'hello', 'secret', true);
$base64_value = base64_encode($hash_value);
echo "hash value:\n";
echo $hash_value;
echo "\n";
echo "base64 value:\n";
echo $base64_value;
echo "\n";
?>
Example PHP Code Output:
;▒▒▒▒▒C▒|
base64 value:
iKqz7ejTrflNJquQ07r9SiCDBww7zOnAFO4EpEOEfAs=
That looks like some crazy binary-type data! Then, I wanted to try to make sure that the base64 value could be reproduced in Python. To do that, I used a simple approach in Python using the same values as earlier.
Example Python Code:
import base64
import hashlib
import hmac
# Based on your Python code example
hash_value = hmac.new(bytes('secret' , 'latin-1'), msg = bytes('hello', 'latin-1'), digestmod = hashlib.sha256).hexdigest().upper()
base64_value = base64.b64encode(bytes(hash_value, 'utf-8'))
print("hash value:")
print(hash_value)
print("base64 value:")
print(base64_value)
Example Python Code Output:
hash value:
88AAB3EDE8D3ADF94D26AB90D3BAFD4A2083070C3BCCE9C014EE04A443847C0B
base64 value:
b'ODhBQUIzRURFOEQzQURGOTREMjZBQjkwRDNCQUZENEEyMDgzMDcwQzNCQ0NFOUMwMTRFRTA0QTQ0Mzg0N0MwQg=='
So, like you found out earlier, something is causing this base64 value result on the Python side to be longer than the PHP version.
After looking into things more (especially seeing the strange data result in the PHP test code above), I found out that the hash_hmac() function in PHP has the option to return a result in binary form (with the true value at the end of the hash_hmac() in your PHP code example). On the Python side, it looks like you decided to use hmac.hexdigest() which I think I've used before in the past when I wanted a string-like value. For this case, however, I think you might want to get the value back as a binary value. To do this, it looks like you'll want to use hmac.digest() instead.
Modified Example Python Code:
import base64
import hashlib
import hmac
# Based on your Python code example
hash_value = hmac.new(bytes('secret' , 'latin-1'), msg = bytes('hello', 'latin-1'), digestmod = hashlib.sha256).digest()
base64_value = base64.b64encode(bytes(hash_value))
print("hash value:")
print(hash_value)
print("base64 value:")
print(base64_value)
Modified Example Python Code Output:
hash value:
b'\x88\xaa\xb3\xed\xe8\xd3\xad\xf9M&\xab\x90\xd3\xba\xfdJ \x83\x07\x0c;\xcc\xe9\xc0\x14\xee\x04\xa4C\x84|\x0b'
base64 value:
b'iKqz7ejTrflNJquQ07r9SiCDBww7zOnAFO4EpEOEfAs='
Now, the final base64 results appear to match between the example PHP and Python code.
In order for me to better understand what was different between the PHP and Python code, I ended up writing a simple translation of your PHP code into Python (and partly based on your Python code as well).
Here is what the related Python code looks like on my side (with example params):
import base64
import hmac
params = {
'access_key': 'afc10315b6aaxxxxxcfc912xx812b94c',
'profile_id': 'E25C4XXX-4622-47E9-9941-1003B7910B3B',
'transaction_uuid': "12345",
'signed_field_names': 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency',
'unsigned_field_names': '',
'signed_date_time': "2021-03-06 16:14:00",
'locale': 'en',
'transaction_type': "credit",
'reference_number': "12345",
'amount': "50",
'currency': "usd"
}
SECRET_KEY = 'secret'
def sign(params):
return sign_data(build_data_to_sign(params), SECRET_KEY)
def sign_data(data, secret_key):
return base64.b64encode(bytes(hmac.new(bytes(secret_key, 'latin-1'), msg=bytes(data, 'latin-1'), digestmod='sha256').digest()))
def build_data_to_sign(params):
data_to_sign = []
signed_field_names = params['signed_field_names'].split(',')
for field in signed_field_names:
data_to_sign.append(field + "=" + params[field])
return comma_separate(data_to_sign)
def comma_separate(data_to_sign):
return ','.join(data_to_sign)
When I use my code translation to check your Python code, I checked the values for the variables signed_field_names and DataToSign in your code, and I got the following results:
signed_field_names:
['access_key', 'profile_id', 'transaction_uuid', 'signed_field_names', 'unsigned_field_names', 'signed_date_time', 'locale', 'transaction_type', 'reference_number', 'amount', 'currency']
DataToSign:
['access_key=afc10315b6aaxxxxxcfc912xx812b94c', 'profile_id=E25C4XXX-4622-47E9-9941-1003B7910B3B', 'transaction_uuid=12345', 'signed_field_names=access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency', 'unsigned_field_names=', 'signed_date_time=2021-03-06 16:14:00', 'locale=en', 'transaction_type=credit', 'reference_number=12345', 'amount=50', 'currency=usd']
When I check the values with my code translation attempt, I get these values:
signed_field_names:
['access_key', 'profile_id', 'transaction_uuid', 'signed_field_names', 'unsigned_field_names', 'signed_date_time', 'locale', 'transaction_type', 'reference_number', 'amount', 'currency']
DataToSign:
access_key=afc10315b6aaxxxxxcfc912xx812b94c,profile_id=E25C4XXX-4622-47E9-9941-1003B7910B3B,transaction_uuid=12345,signed_field_names=access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency,unsigned_field_names=,signed_date_time=2021-03-06 16:14:00,locale=en,transaction_type=credit,reference_number=12345,amount=50,currency=usd
So it looks like your DataToSign = list(map('='.join, zip(signed_field_names, values))) line is specifying a list whereas my code attempt is specifying a string based on your original PHP example.
Because of this, I think you'll want to turn the result back into a string like this (though the variable name could also be written differently if you so choose):
DataToSignString = ','.join(DataToSign)
To save time in this long post, I also found that your message variable was different than my translation of your PHP code. To work around this, I made the message variable in your Python code set to the previously mentioned DataToSignString:
# Commenting out previous message line for now
# message = '{} {}'.format(DataToSignString, API_SECRET)
message = DataToSignString
Also, the following changes seem to be needed for your Python example:
signature = hmac.new(bytes(API_SECRET , 'latin-1'), msg = bytes(message , 'latin-1'), digestmod = hashlib.sha256).digest()
base64string = base64.b64encode(bytes(signature))
This way, you have a binary version of the hmac object. Also, it looks like the utf-8 part might not be needed for now in the base64encode part.
Finally, I added a return to return the calculated base64string while also converting it to a string before base64string is returned:
return str(base64string, 'utf-8')
When put together, here is what the modified code from your Python example looks like:
import base64
import datetime
import hashlib
import hmac
import pprint
import uuid
def sign():
access_key = 'afc10315b6aaxxxxxcfc912xx812b94c'
profile_id = 'E25C4XXX-4622-47E9-9941-1003B7910B3B'
transaction_uuid = "12345"
signed_field_names = 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency'
signed_date_time = "2021-03-06 16:14:00"
locale = 'en'
transaction_type = "credit"
reference_number = "12345"
amount = "50"
currency = "usd"
# Transform the String into a List
signed_field_names = [x.strip() for x in signed_field_names.split(',')]
# Get Values for each of the fields in the form
values = [access_key, profile_id, transaction_uuid,signed_field_names,'',signed_date_time,locale,transaction_type,reference_number,amount,currency]
# Insert the signedfieldnames in their place in the list (MUST BE KEPT)
values[3] = 'access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency'
# Merge the two lists as one
DataToSign = list(map('='.join, zip(signed_field_names, values)))
DataToSignString = ','.join(DataToSign)
# Hash Sha-256
API_SECRET = 'secret'
message = DataToSignString
signature = hmac.new(bytes(API_SECRET , 'latin-1'), msg = bytes(message , 'latin-1'), digestmod = hashlib.sha256).digest()
base64string = base64.b64encode(bytes(signature))
return str(base64string, 'utf-8')
result = sign()
print("sign_data:")
print(result)
The output for this code (with the given parameters) is:
sign_data:
6V0iIqu3smGmadPK4KvRuHm1nNkuIVLBPbLg7VkA7M8=
The value part of this output should be the same as the PHP output from earlier in this post. The earlier value was 6V0iIqu3smGmadPK4KvRuHm1nNkuIVLBPbLg7VkA7M8= and the latest output showed a result of 6V0iIqu3smGmadPK4KvRuHm1nNkuIVLBPbLg7VkA7M8=.
#summea You are god sent ! Thanks a ton ! I cannot believe how much of an effort you made, I am baffled !
If anyone is attempted to Implement Secure Acceptance / CyberSource and see this message, note that you would not be able to pass value="{{ signed_field_names }} in your front end as it is since the data looks like ['access_key','profile_id'].
You would need to either tweak it before sending it (which somehow gives out a Signature Mismatched on CYBS end) or simply hardcode the input field in payment_confirmation like so : value="access_key,profile_id,transaction_uuid,signed_field_names,unsigned_field_names,signed_date_time,locale,transaction_type,reference_number,amount,currency"/>
Is there any good library (or regex magic) which can convert a blog entry into a blog summary? I'd like the summary to display the first four sentences, first paragraph, or first X number of characters... not really sure what would be the best. Ideally, I would like it to keep html formatting tags such as <a>, <b>, <u> and <i>, but it could remove all other html tags, javascript and css.
More specifically, as input I'd give an html string representing an entire blog post. As output, I'd like an html string which contains the first few sentences, paragraph, or X number of characters. With all potentially unsafe html tags removed. In Python please.
If you're looking at the HTML you'll need to parse it. In addition to aforementioned BeautifulSoup, lxml.html has some nice HTML handling tools.
However if it's a blog you may find it even easier to work with RSS/Atom feeds. Feedparser is fantastic and would make it easy. You'd gain compatibility and durability (because RSS is more defined things will change less) but if the feed doesn't include what you need it won't help you.
I ended up using the gdata library and rolling my own blog summarizer, which uses the gdata library to fetch a Blogspot blog on Google App Engine (wouldn't be hard to port it to other platforms). The code is below. To use it, first set the constant blog_id_constant and then call get_blog_info to return a dictionary with the blog summaries.
I would not trust the code to create summaries of any random blog out there on the internet because it may not remove all unsafe html from the blog feed. However, for a simple blog that you write yourself, the code below should work.
Please feel free to copy but if you see any bugs or would like to make improvements, add them in the comments. (Sorry for the semicolons).
import sys
import os
import logging
import time
import urllib
from HTMLParser import HTMLParser
from django.core.cache import cache
# Import the Blogger API
sys.path.insert(0, 'gdata.zip')
from gdata import service
Months = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June", "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."];
blog_id_constant = -1 # YOUR BLOG ID HERE
blog_pages_at_once = 5
# -----------------------------------------------------------------------------
# Blogger
class BlogHTMLSummarizer(HTMLParser):
'''
An HTML parser which only grabs X number of words and removes
all tags except for certain safe ones.
'''
def __init__(self, max_words = 80):
self.max_words = max_words
self.allowed_tags = ["a", "b", "u", "i", "br", "div", "p", "img", "li", "ul", "ol"]
if self.max_words < 80:
# If it's really short, don't include layout tags
self.allowed_tags = ["a", "b", "u", "i"]
self.reset()
self.out_html = ""
self.num_words = 0
self.no_more_data = False
self.no_more_tags = False
self.tag_stack = []
def handle_starttag(self, tag, attrs):
if not self.no_more_data and tag in self.allowed_tags:
val = "<%s %s>"%(tag,
" ".join("%s='%s'"%(a,b) for (a,b) in attrs))
self.tag_stack.append(tag)
self.out_html += val
def handle_data(self, data):
if self.no_more_data:
return
data = data.split(" ")
if self.num_words + len(data) >= self.max_words:
data = data[:self.max_words-self.num_words]
data.append("...")
self.no_more_data = True
self.out_html += " ".join(data)
self.num_words += len(data)
def handle_endtag(self, tag):
if self.no_more_data and not self.tag_stack:
self.no_more_tags = True
if not self.no_more_tags and self.tag_stack and tag == self.tag_stack[-1]:
if not self.tag_stack:
logging.warning("mixed up blogger tags")
else:
self.out_html += "</%s>"%tag
self.tag_stack.pop()
def get_blog_info(short_summary = False, page = 1, year = "", month = "", day = "", post = None):
'''
Returns summaries of several recent blog posts to be displayed on the front page
page: which page of blog posts to get. Starts at 1.
'''
blogger_service = service.GDataService()
blogger_service.source = 'exampleCo-exampleApp-1.0'
blogger_service.service = 'blogger'
blogger_service.account_type = 'GOOGLE'
blogger_service.server = 'www.blogger.com'
blog_dict = {}
# Do the common stuff first
query = service.Query()
query.feed = '/feeds/' + blog_id_constant + '/posts/default'
query.order_by = "published"
blog_dict['entries'] = []
def get_common_entry_data(entry, summarize_len = None):
'''
Convert an entry to a dictionary object.
'''
content = entry.content.text
if summarize_len != None:
parser = BlogHTMLSummarizer(summarize_len)
parser.feed(entry.content.text)
content = parser.out_html
pubstr = time.strptime(entry.published.text[:-10], '%Y-%m-%dT%H:%M:%S')
safe_title = entry.title.text.replace(" ","_")
for c in ":,.<>!##$%^&*()+-=?/'[]{}\\\"":
# remove nasty characters
safe_title = safe_title.replace(c, "")
link = "%d/%d/%d/%s/"%(pubstr.tm_year, pubstr.tm_mon, pubstr.tm_mday,
urllib.quote_plus(safe_title))
return {
'title':entry.title.text,
'alllinks':[x.href for x in entry.link] + [link], #including blogger links
'link':link,
'content':content,
'day':pubstr.tm_mday,
'month':Months[pubstr.tm_mon-1],
'summary': True if summarize_len != None else False,
}
def get_blogger_feed(query):
feed = cache.get(query.ToUri())
if not feed:
logging.info("GET Blogger Page: " + query.ToUri())
try:
feed = blogger_service.Get(query.ToUri())
except DownloadError:
logging.error("Cant download blog, rate limited? %s"%str(query.ToUri()))
return None
except Exception, e:
web_exception('get_blogger_feed', e)
return None
cache.set(query.ToUri(), feed, 3600)
return feed
def _in_one(a, allBs):
# Return true if a is in one of allBs
for b in allBs:
if a in b:
return True
return False
def _get_int(i):
try:
return int(i)
except ValueError:
return None
(year, month, day) = (_get_int(year), _get_int(month), _get_int(day))
if not short_summary and year and month and day:
# Get one more than we need so we can see if we have more
query.published_min = "%d-%02d-%02dT00:00:00-08:00"%(year, month, day)
query.published_max = "%d-%02d-%02dT23:59:59-08:00"%(year, month, day)
feed = get_blogger_feed(query)
if not feed:
return {}
blog_dict['detail_view'] = True
blog_dict['entries'] = map(lambda e: get_common_entry_data(e, None), feed.entry)
elif not short_summary and year and month and not day:
# Get one more than we need so we can see if we have more
query.published_min = "%d-%02d-%02dT00:00:00-08:00"%(year, month, 1)
query.published_max = "%d-%02d-%02dT23:59:59-08:00"%(year, month, 31)
feed = get_blogger_feed(query)
if not feed:
return {}
blog_dict['detail_view'] = True
blog_dict['entries'] = map(lambda e: get_common_entry_data(e, None), feed.entry)
if post:
blog_dict['entries'] = filter(lambda f: _in_one(post, f['alllinks']), blog_dict['entries'])
elif short_summary:
# Get a summary of all posts
query.max_results = str(3)
query.start_index = str(1)
feed = get_blogger_feed(query)
if not feed:
return {}
feed.entry = feed.entry[:3]
blog_dict['entries'] = map(lambda e: get_common_entry_data(e, 18), feed.entry)
else:
# Get a summary of all posts
try:
page = int(page)
except ValueError:
page = 1
# Get one more than we need so we can see if we have more
query.max_results = str(blog_pages_at_once + 1)
query.start_index = str((page - 1)* blog_pages_at_once + 1)
logging.info("GET Blogger Page: " + query.ToUri())
feed = blogger_service.Get(query.ToUri())
has_older = len(feed.entry) > blog_pages_at_once
feed.entry = feed.entry[:blog_pages_at_once]
if page > 1:
blog_dict['newer_page'] = str(page-1)
if has_older:
blog_dict['older_page'] = str(page+1)
blog_dict['entries'] = map(lambda e: get_common_entry_data(e, 80), feed.entry)
return blog_dict
You will have to parse the html. A nice lib for doing that is BeautifulSoup. It will allow to remove specific tags and extract values (text between tags). The text can than be relatively easily cut down to four sentences, though I'd go for a fixed number of characters, as the sentence length might vary a lot.