I am trying to build a GAE app that processes an RSS feed and stores all the data from the feed into Google Datastore. I use Minidom to extract content from the RSS feed. I also tried using Feedparser and BeautifulSoup but they did not work for me.
My app currently parses the feed and saves it in the Google datastore in about 25 seconds on my local machine. I uploaded the app, and when I tried to use it I got a DeadlineExceededError.
I would like to know if there are any ways to speed up this process. The feed I use will grow to more than 100 items over time.
It shouldn't take anywhere near that long. Here is how you might use the Universal Feed Parser.
# pip install feedparser
And an example of using it:
import feedparser
feed = 'http://stackoverflow.com/feeds/tag?tagnames=python&sort=newest'
d = feedparser.parse(feed)
for entry in d['entries']:
    print(entry.title)
The documentation shows you how to pull other things out of a feed. If there is a specific issue you have, please post the details.
I found a way to work around this issue, though I am not sure if this is the optimal solution.
Instead of Minidom, I used cElementTree to parse the RSS feed. I process each "item" tag and its children in a separate task and add these tasks to the task queue.
This has helped me avoid the DeadlineExceededError. I get the "This resource uses a lot of CPU resources" warning though.
Any idea on how to avoid the warning?
A_iyer
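A minimal sketch of the cElementTree approach described above, assuming RSS 2.0 items with title, link and description children. On Python 3, xml.etree.ElementTree uses the C accelerator automatically, so a separate cElementTree import is no longer needed. The chunk() helper and the batch size are assumptions: grouping several items per task, rather than one task per item, may reduce per-task overhead, but the actual task-queue call is App Engine specific and is left out here.

```python
import xml.etree.ElementTree as ET  # C-accelerated automatically on Python 3


def parse_items(rss_xml):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def chunk(items, size):
    """Split items into lists of at most `size` elements (hypothetically one list per task)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel>
      <item><title>A</title><link>http://example.com/a</link></item>
      <item><title>B</title><link>http://example.com/b</link></item>
      <item><title>C</title><link>http://example.com/c</link></item>
    </channel></rss>"""
    for batch in chunk(parse_items(sample), 2):
        # In the App Engine app, each batch would be handed to one task.
        print([entry["title"] for entry in batch])
```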
I have a GAE RSS reader demo / prototype working with Feedparser: http://deliciourss.appspot.com/. Here's some code:
Fetch your feed.
data = urlfetch.fetch(feedUrl)
Parse with Feedparser
parsedData = feedparser.parse(data.content)
Change some features of the feed
# set main section to description if empty
for ix in range(len(parsedData.entries)):
    bItem = 0
    if hasattr(parsedData.entries[ix], 'content'):
        for item in parsedData.entries[ix].content:
            if item.value:
                bItem = 1
                break
        if bItem == 0:
            parsedData.entries[ix].content[0].value = parsedData.entries[ix].summary
    else:
        parsedData.entries[ix].content = [{'value': parsedData.entries[ix].summary}]
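The same normalization can be sketched on plain dicts so it can be tried without fetching a feed. The dict shape (an optional "content" list of {"value": ...} dicts plus a "summary" string) is an assumption that mirrors feedparser's entries. Unlike the loop above, this version also handles an entry whose content list exists but is empty, which would otherwise raise an IndexError at content[0].

```python
def fill_missing_content(entries):
    """Ensure every entry has a non-empty first content value, falling back to its summary.

    Entries are assumed to be plain dicts shaped like feedparser's output.
    """
    for entry in entries:
        content = entry.get("content")
        if not content:
            # No content list (or an empty one): build it from the summary.
            entry["content"] = [{"value": entry.get("summary", "")}]
        elif not any(item.get("value") for item in content):
            # Content exists but every value is empty: reuse the summary.
            content[0]["value"] = entry.get("summary", "")
    return entries
```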
Template, if you are using Django/webapp:
<?xml version="1.0" encoding="utf-8"?>
<channel>
<title>{{parsedData.channel.title}}</title>
<url>{{feedUrl}}</url>
<id>{{parsedData.channel.id}}</id>
<updated>{{parsedData.channel.updated}}</updated>
{% for entry in parsedData.entries %}
<item>
<id>{{entry.id}}</id>
<title>{{entry.title}}</title>
<link>
{% for link in entry.links %}
{% ifequal link.rel "alternate" %}
{{link.href|escape}}
{% endifequal %}
{% endfor %}
</link>
<author>{{entry.author_detail.name}}</author>
<pubDate>{{entry.published}}</pubDate>
<description>{{entry.summary|escape}}</description>
{% for item in entry.content %}
{% if item.value %}
<content>{{item.value|escape}}</content>
{% endif %}
{% endfor %}
</item>{% endfor %}
</channel>
I currently use a text editor to modify the email newsletter code that I send to vendors.
It is tedious, but it does allow me to create an HTML table, edit it, and send it out to customers.
I thought it might be easier to write something that lets me type the input and then outputs the formatted HTML code, ready to copy and paste into my newsletter email service.
This is what I have so far, but it does not print the code with the input values filled in:
"""
VAN_REEFER = input("VAN OR REEFER OR POWER ONLY?")
PICKUP_LOCATION = input("WHERE DOES THIS PICKUP?")
PICKUP_TIME = input("WHAT TIME DOES THIS PICKUP?")
DROP_LOCATION = input("WHERE DOES THIS DROP?")
DROP_TIME = input("WHAT TIME DOES THIS DROP?")
ACCEPT_NOW = input("ACCEPT NOW RATE")
print ("""<tr>
<td>(VAN_REEFER)</td>
<td>(PICKUP_LOCATION)<br><b>(PICKUP_TIME)</b></br></td>
<td>(DROP_LOCATION)<br><b>DROP_TIME</b></br></td>
<TD><B>(ACCEPT_NOW)</B></TD>
<td>
<a href="mailto:Dispatch%40MYEMAIL.com?subject=%20%F0%9F%90%A2%20-%20%20-
PICK-
(PICKUP_TIME)
%20-%20-
(PICKUP_LOCATION)
%20
TO
%20-
(DROP_TIME)
-%20
(DROP_LOCATION)
DROP-
%20-%20
(VAN_REEFER)
%0A&body=
I%20HAVE%20A%20TRUCK%20FOR%20THIS%20LOAD-
%20%0A
MY%20RATE%20IS-
%20%0A
THIS%20IS%20MY%20ETA%20TO%20PICKUP%20THIS%20LOAD-
%20%0A
THIS%20IS%20MY%20PHONE%20NUMBER-
%20%0A">🐢<b>BID HERE</b>🐢</a>
</td>
</tr>
""")
"""
What I would like this to do is take as many sets of inputs as I want and, when I am done, regenerate that code with the different inputs entered by the end user. Sometimes there might be one or two sets of inputs; sometimes there might be a dozen. Once finished, the program would combine each set of inputs into the code that I need to copy and paste into my email marketing service.
Can someone help?
Take a look at template engines such as Jinja or Mako.
You can write html templates like this (example is from this Jinja tutorial):
<p>My string: {{my_string}}</p>
<p>Value from the list: {{my_list[3]}}</p>
<p>Loop through the list:</p>
<ul>
{% for n in my_list %}
<li>{{n}}</li>
{% endfor %}
</ul>
and the template engine will dynamically replace the variables given in the double curly brackets {{}}. You can even create loops and conditions using {% ... %}.
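The answer suggests Jinja or Mako; as a dependency-free sketch of the same idea, the standard library's string.Template can fill a row template once per set of inputs and join the results. The field names and the shortened row markup are hypothetical (the mailto link from the question is left out; its subject and body would need URL-encoding, e.g. with urllib.parse.quote). Input collection lives in its own function so build_rows() can be reused or tested on its own.

```python
from html import escape
from string import Template

# Hypothetical, shortened row template; paste in the full <tr>...</tr> markup as needed.
ROW = Template(
    "<tr>\n"
    "<td>$equipment</td>\n"
    "<td>$pickup_location<br><b>$pickup_time</b></td>\n"
    "<td>$drop_location<br><b>$drop_time</b></td>\n"
    "<td><b>$rate</b></td>\n"
    "</tr>"
)


def build_rows(loads):
    """Render one <tr> per load dict and join them into a single HTML string."""
    rows = []
    for load in loads:
        # Escape user input so characters like & or < don't break the table.
        safe = {key: escape(str(value)) for key, value in load.items()}
        rows.append(ROW.substitute(safe))
    return "\n".join(rows)


def collect_loads():
    """Prompt for loads until the equipment prompt is left blank."""
    loads = []
    while True:
        equipment = input("VAN OR REEFER OR POWER ONLY? (blank to finish) ")
        if not equipment:
            break
        loads.append({
            "equipment": equipment,
            "pickup_location": input("WHERE DOES THIS PICKUP? "),
            "pickup_time": input("WHAT TIME DOES THIS PICKUP? "),
            "drop_location": input("WHERE DOES THIS DROP? "),
            "drop_time": input("WHAT TIME DOES THIS DROP? "),
            "rate": input("ACCEPT NOW RATE "),
        })
    return loads
```

To run it interactively, call print(build_rows(collect_loads())) and press Enter at the equipment prompt when you are done.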
I'd like to extend the behaviour of trans by rendering variables not as values from the context, but as HTML (without using the context). My aim is to be able to populate those variables on the client through JavaScript.
As far as I can tell, Jinja doesn't allow much customisation of this kind, or I'm just unable to find the right hooks.
Here's what I'd like to achieve:
{% etrans name=username %}
My name is {{ name }}
{% endetrans %}
This should render to:
My name is <span id='#username'></span>
Of course, I could just use the normal {% trans %} directive and pass my HTML code to template.render(html_code_params), but that would require defining the variables in both the template and the rendering code, which I'd like to avoid.
Here's what I've got so far (not much), which allows for a new etrans tag and the ability to use whatever goodies InternationalizationExtension has to offer.
from jinja2.ext import InternationalizationExtension
from jinja2.runtime import concat


class JavaScriptVariableExtension(InternationalizationExtension):
    tagname = 'etrans'
    tags = set([tagname])

    def _parse_block(self, parser, allow_pluralize):
        """Parse until the next block tag with a given name.
        Copy from InternationalizationExtension, as this uses hardcoded
        `name:endtrans` instead of relying on tag name
        """
        referenced = []
        buf = []
        while 1:
            if parser.stream.current.type == 'data':
                buf.append(parser.stream.current.value.replace('%', '%%'))
                next(parser.stream)
            elif parser.stream.current.type == 'variable_begin':
                next(parser.stream)
                name = parser.stream.expect('name').value
                referenced.append(name)
                buf.append('%%(%s)s' % name)
                parser.stream.expect('variable_end')
            elif parser.stream.current.type == 'block_begin':
                next(parser.stream)
                # can't use hardcoded "endtrans"
                # if parser.stream.current.test('name:endtrans'):
                if parser.stream.current.test('name:end%s' % self.tagname):
                    break
                elif parser.stream.current.test('name:pluralize'):
                    if allow_pluralize:
                        break
                    parser.fail('a translatable section can have only one '
                                'pluralize section')
                parser.fail('control structures in translatable sections are '
                            'not allowed')
            elif parser.stream.eos:
                parser.fail('unclosed translation block')
            else:
                assert False, 'internal parser error'
        return referenced, concat(buf)


i18n_extended = JavaScriptVariableExtension
I don't mind overloading more methods (although the reason for the one above should perhaps be fixed upstream).
Stepping through the code is quite an interesting adventure. However, I hit a snag and am interested if anyone can give some advice.
The problem I see is that during compilation, the call to context.resolve() gets baked into the compiled code. jinja2.compiler.CodeGenerator doesn't really allow any different handling here (correct me if I'm wrong). Ideally, I would define another node type for the variable that controls how it is handled during compilation, but I don't see how this is possible. I might be too focussed on this as a solution, so perhaps someone can suggest alternatives.
As suggested by @Garrett's comment, a much easier solution is to pass a function to the template renderer that interpolates the variables. In my case, my target client-side framework is Angular, but this also works for any JS variables that you want to use within a {% trans %} environment. Here are the building blocks:
def text_to_javascript(string):
    # modify as needed...
    return "<span>{{ %s }}</span>" % string


def render():
    tmpl = jinja_env.get_template(template_filename)
    return tmpl.render({'js': text_to_javascript})
And this is how I use it in the template file:
{% trans username=js('user.name') %}
My name is {{ username }}
{% endtrans %}
In the Angular controller, the variable user is bound to the $scope like so:
$scope.user = {'name': 'Bugs Bunny'}
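A simplified model of why this works, assuming Jinja's default (non-newstyle) gettext and autoescaping turned off: the trans block becomes a gettext() call on a message id with %(name)s placeholders (the same form the _parse_block code in the question builds), and the variables are then applied with %-formatting. The stdlib gettext.NullTranslations stands in for a real catalog here; this is not Jinja's actual compiled code. With autoescaping enabled, the helper may need to return markupsafe.Markup so the <span> is not escaped.

```python
import gettext


def text_to_javascript(string):
    # Same helper as in the answer: wrap an Angular expression in a <span>.
    return "<span>{{ %s }}</span>" % string


# Message id in the placeholder form built for {{ username }} inside the trans block.
msgid = "My name is %(username)s"

# With no catalog installed, NullTranslations returns the msgid unchanged.
translations = gettext.NullTranslations()
rendered = translations.gettext(msgid) % {"username": text_to_javascript("user.name")}
print(rendered)  # My name is <span>{{ user.name }}</span>
```

Angular then replaces {{ user.name }} on the client, which is why no server-side value for the user is needed.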
I am running into a rather weird issue while parsing results of a salt command. The command I am running is
{% set hostname = salt['publish.publish']('roles:*{}*'.format(role), 'grains.item', 'fqdn', 'grain') %}
The output looks like this:
OrderedDict([('1.server.com', OrderedDict([('fqdn', '1.server.com')])), ('0.server.com', OrderedDict([('fqdn', '0.server.com')]))])
My understanding is that calling items() on the result, as in the line below, should work:
{% for hostname, fqdn in salt['publish.publish']('roles:*{}*'.format(role), 'grains.item', 'fqdn', 'grain').items() %}
But as soon as I use items() in that line, I get an error:
failed: Jinja variable 'None' has no attribute 'items'
I tried a couple of other ways (doing items().items(), or storing the result in a variable and then looping over it) to get the list out of the OrderedDict, but none of them helped.
Either I don't know Python well enough or there is something weird going on. Simply adding a check has made the above work. The working block looks like this (partial code, of course):
{% set hostname = salt['publish.publish']('roles:*{}*'.format(role), 'grains.item', 'fqdn', 'grain') %}
{% if hostname is not none %}
{% for host, site in hostname.items() %}
My understanding was that the if check is only needed in case hostname is empty, but it looks like the check is needed even when there is data. I'm still curious to know what's going on!
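The error text says that, at least on the render pass that failed, publish.publish returned None rather than an OrderedDict; the post doesn't establish why. A common defensive pattern, sketched here in Python with the output from the question, is to fall back to an empty dict before calling items(). The same guard can be written inline in Jinja as {% for host, site in (hostname or {}).items() %}.

```python
from collections import OrderedDict


def iter_hosts(result):
    """Yield (minion_id, fqdn) pairs, treating a None result as "no minions"."""
    for minion_id, grains in (result or {}).items():
        yield minion_id, grains.get("fqdn")


published = OrderedDict([
    ("1.server.com", OrderedDict([("fqdn", "1.server.com")])),
    ("0.server.com", OrderedDict([("fqdn", "0.server.com")])),
])
print(list(iter_hosts(published)))
print(list(iter_hosts(None)))  # []
```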
I would like to set some basic pillar values for all boxes so that I can use them later in a unified way. Our minions are usually named in this format:
<project>-<env>.<role>-<sequence>.<domain>
Example pillar/base/top.sls:
base:
  '*':
    - basics
  'I@project:mycoolproject and I@role:nginx':
    - etc.
Example pillar/base/basics/init.sls:
{% if '-live.' in grains['id'] %}
env: production
{% elif '-qa.' in grains['id'] %}
env: qa
{% elif '-staging.' in grains['id'] %}
env: staging
{% else %}
env:
{% endif %}
{% set role = re.match("(?:live|qa|staging)\.([a-z_\-]+)\-", grains['id']).group(1) -%}
role: {{ role }}
The env part obviously works, but I can't get the regex working. As far as I understand, there is no way to import a Python module (e.g. import re) in a Jinja template. Any suggestions for how to get regex functionality in the pillar file, if it's possible at all?
The simple answer is "no". There is no way to inject regex functionality directly into the Jinja environment (I'm sure there's a way to extend Jinja, but anyway...).
The way I addressed this was with an external module function, id_info.explode() and an external pillar.
Enable external modules on the master:
extension_modules: /srv/extmod
External modules do not require any special infrastructure; they are just regular Python modules (not packages, mind you: the loader doesn't currently know how to properly side-load a package).
Put your Python + regex logic there and return a dictionary assembled to your liking.
Your external module would go in /srv/extmod/modules. You can call this function from your pillar.sls:
{% set id_info = __salt__['id_info.explode']() -%}
{% set subcomponent = id_info['subcomponent'] -%}
{% set project = id_info['project'] -%}
etc...
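A sketch of what /srv/extmod/modules/id_info.py could contain, assuming minion ids follow the <project>-<env>.<role>-<sequence>.<domain> pattern from the question and that the sequence is numeric. Unlike the answer's zero-argument explode(), this hypothetical version takes the minion id as a parameter (the pillar would call it as __salt__['id_info.explode'](grains['id'])), which keeps it independent of whose grains the loader injects; that calling convention is an assumption. The returned keys are illustrative; the answer's 'subcomponent' key isn't reproduced because the post doesn't say what it holds.

```python
"""id_info.py: hypothetical Salt extension module (path from the answer: /srv/extmod/modules)."""
import re

# <project>-<env>.<role>-<sequence>.<domain>, e.g. mycoolproject-live.nginx-01.example.com
_ID_RE = re.compile(
    r"^(?P<project>[^.]+)-(?P<env>[a-z]+)\."
    r"(?P<role>[a-z_\-]+)-(?P<sequence>\d+)\."
    r"(?P<domain>.+)$"
)

# Same env mapping as the basics pillar in the question.
_ENV_NAMES = {"live": "production", "qa": "qa", "staging": "staging"}


def explode(minion_id):
    """Split a minion id into its named parts; return {} if it doesn't match."""
    match = _ID_RE.match(minion_id)
    if not match:
        return {}
    parts = match.groupdict()
    parts["environment"] = _ENV_NAMES.get(parts["env"])
    return parts
```

Because the project group cannot contain a dot, a hyphenated project name such as my-proj still splits correctly: the regex backtracks so env is the last hyphen-separated segment before the first dot.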
A couple things to know:
The salt-master has to be restarted when an external module is added or modified. There isn't a way that I know of to incite the equivalent of a saltutil.refresh_modules() call on the salt-master, so there ya go.
The extension_modules directive is not just for execution modules. In this scenario, you would also create /srv/extmod/{pillar,runners,outputers,etc}.
These modules are only available on the master.
I'm getting a response from the server that is already escaped:
'item':'&lt;b&gt; Some Data &lt;/b&gt;'
I pass this data to the template using item = json.loads(response).
By default, Django templates (in Google App Engine) escape it again,
so it ends up double-escaped in the output.
I can use safe to remove one level of escaping like:
{{item|safe}}
How do I turn the entities back into their corresponding characters?
You can do this:
{% autoescape off %}
{{ your_text_var }}
{% endautoescape %}
Warning - THIS IS NOT A RECOMMENDED SOLUTION. You should be using autoescaping instead (check Rafael's answer).
The following should do the job:
response.replace('&amp;', '&').replace('&lt;', '<').replace('&gt;', '>')
Update:
Following a suggestion by Jan Schär, you should use this instead:
response.replace('&lt;', '<').replace('&gt;', '>').replace('&amp;', '&')
This is because if response is &amp;gt;, the first version would produce > instead of the correct &gt;. You should resolve &amp; last.
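For comparison, the standard library's html.unescape() (Python 3.4+) decodes all named and numeric character references in a single pass, so it sidesteps the ordering pitfall entirely. A small sketch checking both approaches on the examples from this question; as the warning says, only do this for content you trust, since the result renders as live HTML once it is marked safe in the template.

```python
import html


def unescape_entities(text):
    """Replace-chain version from the answer: &amp; must be handled last."""
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


item = "&lt;b&gt; Some Data &lt;/b&gt;"
print(unescape_entities(item))  # <b> Some Data </b>
print(html.unescape(item))      # <b> Some Data </b>

# The ordering case from the update: "&amp;gt;" should become the literal text "&gt;".
print(unescape_entities("&amp;gt;"))  # &gt;
print(html.unescape("&amp;gt;"))      # &gt;
```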