I am trying to use this package: http://pypi.python.org/pypi/guess-language/0.1
I've read the documentation/wiki, but can't find the solution.
Basically, this package allows you to pass in a string, and it will return a "language".
I'm able to make it print out "en".
htmlSource = download('http://feeds.feedburner.com/nchild')
soup = BeautifulStoneSoup(htmlSource)
justwords = ''.join(soup.findAll(text=True))
justwords = justwords.encode('utf-8')
true_lang = guess_language.guessLanguage(justwords)
I'd like to know...how does this person print out the scores/accuracy of the guess?
Am I passing the string correctly to the python library?
By modifying the source code of the module, from the sounds of this quote (taken from the answer you linked to):
Update after some experimentation, including inserting a print statement to show what script blocks were detected with what percentages
(Emphasis added by me.)
Related
I am using an API I found online for one of my scripts, and I am wondering if I can change one word from the API to something else. My code is:
import requests
people = requests.get('https://insult.mattbas.org/api/insult')
print("Welcome to the insult machine!\nType somebody you want to insult!")
b = input()
print(people.replace("You", b))
Is replace not a command? If so, what plugin and/or commands would I need to do it? Thanks!
The value returned from requests.get isn’t a string, it’s a response object and that class has no replace method.
Have a look at the structure of that class. For example, you can do r = requests.get(...) and r.text.replace(...).
In other words, you need to operate on the text part of the response object.
This is a WebSphere related question.
I am trying to turn this command into variables
AdminConfig.modify('(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)'
I've found that this command:
AdminConfig.list('J2EEResourceProperty', 'URL*cam_group*)').splitlines()
Will return:
['URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)', 'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1355156316906)']
So I turned that command into a variable:
j2ee = AdminConfig.list('J2EEResourceProperty', 'URL*cam_group*)').splitlines()
And i'm able to get the string that I want by typing "j2ee[0]" I get
'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)'
So that is exactly what I wanted, minus the URL part in the front. How can I get rid of those characters?!
I'm not sure if I understood your requirement, but it seems to me that you want to modify some attributes of J2EEResourceProperty object.
If this is the case, then you don't need to remove that "URL" string, actually you shouldn't do that. The string 'URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)' fully identifies WebSphere configuration object. Try this:
AdminConfig.modify('URL(cells/taspmociias204Cell01/clusters/cam_group|resources.xml#J2EEResourceProperty_1324400045826)', [['value', 'the new value'], ['description', 'the new description']])
BTW: you can also try using WDR library (https://github.com/WDR/wdr/). Then your script would look as follows:
prop = listConfigObjects('J2EEResourceProperty')[0]
prop.value = 'the new value'
prop.description = 'the new description'
Disclosure: I'm one of WDR contributors.
You could always use a simple replace regular expression to parse out the URL part.
For example:
import re
mystr = 'URL(blahblahblah)'
re.sub(r'^URL', "", mystr)
This is a handy tool to learn and test your regular expressions to make sure they are correct.
http://gskinner.com/RegExr/
I'm using a library ABPY (library here) for python but it is in older version i think. I'm using Python 3.3.
I did fix some PRINT errors, but that's how much i know, I'm really new on programing.
I want to fetch some webpage and filter it from advertising and then print it again.
EDITED after Sg'te'gmuj told me how to convert from python 2.x to 3.x this is my new code:
#!/usr/local/bin/python3.1
import cgitb;cgitb.enable()
import urllib.request
response = urllib.request.build_opener()
response.addheaders = [('User-agent', 'Mozilla/5.0')]
response = urllib.request.urlopen("http://www.youtube.com")
html = response.read()
from abpy import Filter
with open("easylist.txt") as f:
ABPFilter = Filter(file('easylist.txt'))
ABPFilter.match(html)
print("Content-type: text/html")
print()
print (html)
Now it is displaying a blank page
Just took a peek at the library, it seems that the file "easylist.txt" does not exist; you need to create the file, and populate it with the appropriate filters (in whatever format ABP specifies).
Additionally, it appears it takes a file object; try something like this instead:
with open("easylist.txt") as f:
ABPFilter = Filter(f)
I can't say this is wholly accurate though since I have no experience with the library, but looking at it's code I'd suspect either of the two are the problem, if not both.
Addendum #1
Looking at the code more in-depth, I have to agree that even if that fix I supplied does work, you're going to have more problems (it's in 2.x as you suggested, when you're using 3.x). I'd suggest utilizing Python's 2to3 function, to convert from typical Python 2 to Python 3 code (it's not foolproof though). The command line would be as so:
2to3 -w abpy.py
That will convert it from Python 2.x to 3.x code, and re-write the source file.
Addendum #2
The code to pass the file object should be the "f" variable, as shown above (modified to represent that; I wasn't paying attention and just left the old file function call in the argument).
You need to pass a URI to the function as well:
ABPFilter.match(URI)
You'll need to modify the code to pass those items into an array (I'm assuming at least); I'm playing with it now to see. At present I'm getting a rule error (not a Python error; but merely error handling used by abpy.py, which is good because it suggests that it's the right train of thought).
The code for the Filter.match function is as following (after using the 2to3 Python script):
def match(self, url, elementtype=None):
tokens = RE_TOK.split(url)
print(tokens)
for tok in tokens:
if len(tok) > 2:
if tok in self.index:
for rule in self.index[tok]:
if rule.match(url, elementtype=elementtype):
print(str(rule))
What this means is you're, at present, at a point where you need to program the functionality; it appears this module only indicates the rule. However, that is still useful.
What this means is that you're going to have to modify this function to take the HTML, in place of the the "url" parameter. You're going to regex the HTML (this may be rather intensive) for a list of URIs and then run each item through the match loop Where you go from there to actually filter the nodes, I'm not sure; but there is a list of filter types, so I'm assuming there is a typical procedural ABP does to remove the nodes (possibly, in some cases merely by removing the given URI from the HTML?)
References
http://docs.python.org/3.3/library/2to3.html
I have an application in which the main strings are in English and then various translations are made in various .po/.mo files, as usual (using Flask and Flask-Babel). Is it possible to get a list of all the English strings somewhere within my Python code? Specifically, I'd like to have an admin interface on the website which lets someone log in and choose an arbitrary phrase to be used in a certain place without having to poke at actual Python code or .po/.mo files. This phrase might change over time but needs to be translated, so it needs to be something Babel knows about.
I do have access to the actual .pot file, so I could just parse that, but I was hoping for a cleaner method if possible.
You can use polib for this.
This section of the documentation shows examples of how to iterate over the contents of a .po file. Here is one taken from that page:
import polib
po = polib.pofile('path/to/catalog.po')
for entry in po:
print entry.msgid, entry.msgstr
If you alredy use babel you can get all items from po file:
from babel.messages.pofile import read_po
catalog = read_po(open(full_file_name))
for message in catalog:
print message.id, message.string
See http://babel.edgewall.org/browser/trunk/babel/messages/pofile.py.
You alredy can try get items from mo file:
from babel.messages.mofile import read_mo
catalog = read_po(open(full_file_name))
for message in catalog:
print message.id, message.string
But when I try use it last time it's not was availible. See http://babel.edgewall.org/browser/trunk/babel/messages/mofile.py.
You can use polib as #Miguel wrote.
I have an app that will show images from reddit. Some images come like this http://imgur.com/Cuv9oau, when I need to make them look like this http://i.imgur.com/Cuv9oau.jpg. Just add an (i) at the beginning and (.jpg) at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on #jh314's response and made the code a little less compact and checked that the url started with http://imgur.com as that code would cause issues with other URLs, like the google search I included. It also only replaces the first instance, which could causes issues.
def fixImgurLinks(url):
if url.lower().startswith("http://imgur.com"):
url = url.replace("http://imgur", "http://i.imgur",1) # Only replace the first instance.
if not url.endswith(".jpg"):
url +=".jpg"
return url
for u in ["http://imgur.com/Cuv9oau","http://www.google.com/search?q=http://imgur"]:
print fixImgurLinks(u)
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You should use Python's regular expressions to place the i. As for the .jpg you can just append it.