Removing \r and \n from list - python

I'm trying to remove \r and \n from an Urban Dictionary JSON API response, but every time I use re.sub I get this:
expected string or buffer
I'm not sure why though, but here's the code:
elif used_prefix and cmd == "udi" and len(args) > 0 and self.getAccess(user) >= 1:
    try:
        f = urllib.request.urlopen("http://api.urbandictionary.com/v0/define?term=%s" % args.lower().replace(' ', '+'))
        data = json.loads(f.readall().decode("utf-8"))
        data = re.sub(r'\s+', ' ', data).replace("\\","")
        if (len(data['list']) > 0):
            definition = data['list'][0][u'definition']
            example = data['list'][0][u'example']
            permalink = data['list'][0][u'permalink']
            room.message("Urban Dictionary search for %s: %s Example: %s Link: %s" % (args.title(), definition, example, permalink), True)
        else: room.message("Word not found.")
    except:
        room.message((str(sys.exc_info()[1])))
        print(traceback.format_exc())
This is the traceback:
Traceback (most recent call last):
  File "C:\Users\dell\Desktop\b0t\TutorialBot.py", line 2186, in onMessage
    data = re.sub(r'\s+', ' ', data).replace("\\","")
  File "C:\lib\re.py", line 170, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

The problem is that you are trying to use re.sub on a dict rather than a string. Further, your code seems to be a little messy in places. Try this instead:
import urllib2
import json
import re
def test(*args):
    f = urllib2.urlopen("http://api.urbandictionary.com/v0/define?term=%s" % '+'.join(args).lower()) # note urllib2.urlopen rather than urllib.request.urlopen
    data = json.loads(f.read().decode("utf-8")) # note f.read() instead of f.readall()
    if len(data['list']) > 0:
        definition = data['list'][0][u'definition']
        example = data['list'][0][u'example']
        permalink = data['list'][0][u'permalink']
        return "Urban Dictionary search for %s: %s Example: %s Link: %s" % (str(args), definition, example, permalink) # returns a string
print test('mouth', 'hugging').replace('\n\n', '\n') # prints the string after replacing '\n\n' with '\n'
The result:
Urban Dictionary search for ('mouth', 'hugging'): When you put a beer bottle in your mouth, and keep your mouth wrapped around it all day. Example: Josh: "mhmgdfhwrmhhh (attempts to talk while drinking a beer)"
Ryan: "You know I can't hear you when you're mouth hugging."
Josh: "mmmffwrrggddsshh" Link: http://mouth-hugging.urbanup.com/7493517
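For Python 3, which the urllib.request call in the question suggests, the same idea can be sketched without touching the network: parse the JSON first, then clean the individual string fields rather than the whole dict. The sample payload and the helper name clean_text are made up for illustration and are not part of the original code.

```python
import json
import re

def clean_text(text):
    # Collapse any run of whitespace (CR, LF, tabs, spaces) into one space.
    return re.sub(r'\s+', ' ', text).strip()

# Hypothetical payload shaped like the API response described in the question.
raw = '{"list": [{"definition": "line one\\r\\nline two", "example": "a\\n\\nb"}]}'

data = json.loads(raw)   # data is a dict, so re.sub(pattern, repl, data) would fail
entry = data['list'][0]

# Apply the regex to the string values, not to the dict itself.
definition = clean_text(entry['definition'])
example = clean_text(entry['example'])
print(definition)
print(example)
```

Cleaning each field separately also leaves the rest of the parsed structure, such as the permalink, untouched.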

Related

Python Error: Need help resolving 2 errors (I'm a beginner)

I have a python script that downloads all the quotes of goodreads, from the given author by running: goodreadsquotes.py https://www.goodreads.com/author/quotes/1791.Seth_Godin > godin
However, I have problems executing it, since I'm a beginner in using Python. At the moment I have 2 errors. The code is as follows:
from pyquery import PyQuery
import sys, random, re, time
AUTHOR_REX = re.compile('\d+\.(\w+)$')
def grabber(base_url, i=1):
    url = base_url + "?page=" + str(i)
    page = PyQuery(url)
    quotes = page(".quoteText")
    auth_match = re.search(AUTHOR_REX, base_url)
    if auth_match:
        author = re.sub('_', ' ', auth_match.group(1))
    else:
        author = False
    # sys.stderr.write(url + "\n")
    for quote in quotes.items():
        quote = quote.remove('script').text().encode('ascii', 'ignore')
        if author:
            quote = quote.replace(author, " -- " + author)
        print (quote)
        print ('%')
    if not page('.next_page').hasClass('disabled'):
        time.sleep(10)
        grabber(base_url, i + 1)
if __name__ == "__main__":
    grabber(''.join(sys.argv[1:]))
After executing:
py goodreadsquotes.py https://www.goodreads.com/author/quotes/1791.Seth_Godin > godin
The error is as follows:
Traceback (most recent call last):
File "goodreadsquotes.py", line 43, in <module>
grabber(''.join(sys.argv[1:]))
File "goodreadsquotes.py", line 34, in grabber
quote = quote.replace(author, " -- " + author)
TypeError: a bytes-like object is required, not 'str'
From the screenshot you have posted ... The encode() method in Python returns a bytes object, so quote is no longer a str; it is a bytes object. Calling replace() on quote therefore requires both arguments to be bytes, not str. You can convert author and " -- " + author to bytes as shown below (line 34):
if author:
    # convert author and the replacement string both to bytes
    author_bytes = bytes(author, 'ascii')
    replace_string_bytes = bytes(" -- " + author, 'ascii')
    quote = quote.replace(author_bytes, replace_string_bytes)
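An alternative sketch, assuming the only purpose of encode('ascii', 'ignore') was to drop non-ASCII characters: decode straight back to str, so the rest of the script keeps working with text and print() shows no b'...' prefix. The sample quote and author below are made up.

```python
author = "Seth Godin"
raw_quote = "Ideas that spread, win.\u201d Seth Godin"

# encode() yields bytes; decode() turns them back into str,
# with the non-ASCII characters already dropped by 'ignore'.
quote = raw_quote.encode('ascii', 'ignore').decode('ascii')

# Both arguments are str again, so replace() works as in the original code.
quote = quote.replace(author, " -- " + author)
print(quote)
```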

Python 3 - IndexError: string index out of range

I'm currently creating a programming language in Python 3.6, and for some reason my interpreter script raises an IndexError: string index out of range.
I run it from a Windows batch file:
@echo off
python run-file.py test.ros
pause
and I get the following output:
Traceback (most recent call last):
File "run-file.py", line 16, in <module>
if not(value[1][0] == "!") and ignoreline == False:
IndexError: string index out of range
Press any key to continue . . .
The run-file.py file looks like this:
from sys import argv as args
from sys import exit as quit
import syntax
try:
    args[1]
except IndexError:
    print("ERROR: No ROS Code file provided in execution arguments")
    print("Ensure the execution code looks something like this: python run-file.py test.ros")
with open(args[1]) as f:
    ignoreline = False
    content = f.readlines()
    content = [x.strip() for x in content]
    for value in enumerate(content):
        if not(value[1][0] == "!") and ignoreline == False:
            firstpart = value[1].split(".")[0]
            lenoffirstpart = len(value[1].split(".")[0])
            afterpart = str(value[1][lenoffirstpart + 1:])
            apwithcomma = afterpart.replace(".", "', '")
            preprint = str(firstpart + "(" + apwithcomma + ")")
            printtext = preprint.replace("(", "('")
            lastprinttext = printtext.replace(")", "')")
            try:
                exec(str("syntax." + lastprinttext))
            except Exception as e:
                template = "ERROR: An error of type {0} occured while running line {1} because {2}"
                message = template.format(
                    type(e).__name__, str(value[0] + 1), str(e.args[0]))
                print(message)
                quit(1)
        elif content[value[0]][0] == "!!!":
            ignoreline = not(ignoreline)
quit(0)
The syntax.py file looks like this:
def print_message(contents=''):
    print(contents)
The test.ros file looks like this:
! This is a single line comment
!!!
This line should be ignored
and this one as well
!!!
print_message.Hello World
The problem appears to be in line 16 of the run-file.py file:
if not(value[1][0] == "!") and ignoreline == False:
I've already tried replacing value[1][0] with (value[1])[0] and other combinations with brackets to no avail.
It seems like when I try to print the value it behaves as expected and gives me ! which is the first character of the test.ros file but for some reason, it throws an exception when it's in the if statement.
If you want any more of the source, it's on Github and you can find the exact commit containing all the files here
Update/Solution
Big thanks to Idanmel and Klaus D. for helping me resolve my issue. You can view the changes I've made here
This happens because the 2nd line in test.ros is empty.
You create content in this example to be:
['! This is a single line comment',
'',
'!!!',
'This line should be ignored',
'and this one as well',
'!!!',
'',
'print_message.Hello World']
When you try to access content[1][0], you get an IndexError because content[1] is an empty string.
Try removing the empty lines from content by adding an if to the list comprehension:
content = [x.strip() for x in content if x.strip()]
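A small sketch of why the filter helps, plus an alternative that tolerates blank lines without removing them: str.startswith() returns False for an empty string instead of raising. The sample lines are a shortened version of the list above.

```python
lines = ['! This is a single line comment', '', '!!!', 'print_message.Hello World']

# Indexing an empty string raises IndexError...
try:
    lines[1][0]
    raised = False
except IndexError:
    raised = True

# ...while startswith() simply returns False for it.
is_comment = [line.startswith('!') for line in lines]

# Filtering as in the answer drops the blank entries entirely.
non_empty = [x.strip() for x in lines if x.strip()]

print(raised)
print(is_comment)
print(non_empty)
```

Note that filtering shifts the line numbers reported in error messages, since enumerate() then counts only the non-empty lines; the startswith() approach keeps the original numbering.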

Get rid of parenthesis in output

I think this is an easy question, as I am a beginner in Python 3.
When I print the header of a FASTA file, the output contains parentheses. How can I remove them?
import sys
from Bio import Entrez
from Bio import SeqIO
#define email for entrez login
db = "nuccore"
Entrez.email = "someone@email.com"
#load accessions from arguments
if len(sys.argv[1:]) > 1:
    accs = sys.argv[1:]
else: #load accesions from stdin
    accs = [ l.strip() for l in sys.stdin if l.strip() ]
#fetch
sys.stderr.write( "Fetching %s entries from GenBank: %s\n" % (len(accs), ", ".join(accs[:10])))
for i,acc in enumerate(accs):
    try:
        sys.stderr.write( " %9i %s \r" % (i+1,acc))
        handle = Entrez.efetch(db=db, rettype="fasta", id=acc)
        seq_record = SeqIO.read(handle, "fasta")
        if (len(seq_record.seq) > 0):
            header = ">" + seq_record.description + " Len:" , len(seq_record.seq)
            print(header)
            print(seq_record.seq)
    except:
        sys.stderr.write( "Error! Cannot fetch: %s \n" % acc)
./acc2fasta.py 163345 303239
It will return
(">M69206.1 Bovine MHC class I AW10 mRNA (haplotype AW10), 3' end Len:", 1379)
TCCTGCTGCTCTCGGGGGTCCTGGTCCTGACCGAGACCCGGGCTGGCTCCCACTCGATGAGGTATTTCAGCACCGCCGTGTCCCGGCCCGGCCTCGGGGAGCCCCGGTACCTGGAAGTCGGCTACGTGGACGACACGCAGTTCGTGCGGTTTGACAGCGACGCCCCGAATCCGAGGATGGAGCCGCGGGCGCGGTGGGTGGAGCAGGAGGGGCCGGAGTATTGGGATCGGGAGACGCAAAGGGCCAAGGGCAACGCACAATTTTTCCGAGTGAGCCTGAACAACCTGCGCGGCTACTACAACCAGAGCGAGGCCGGGTCTCACACCCTCCAGTGGATGTCCGGCTGCTACGTGGGGCCGGACGGGCGTCCTCCGCGCGGGTTCATGCAGTTCGGCTACGACGGCAGAGATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCGGTGGAGACGATGGCTCAGATCTCCAAACGCAAGATGGAGGCGGCCGGTGAAGCTGAGGTACAGAGGAACTACCTGGAGGGCCGGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGACACGCTGCTGCGCGCAGACCCTCCAAAGGCACATGTGACCCGTCACCCGATCTCTGGTCGTGAGGTCACCCTGAGGTGCTGGGCCCTGGGCTTCTACCCTGAAGAGATCTCACTGACCTGGCAGCGCAATGGGGAGGACCAGACCCAGGACATGGAGCTTGTGGAGACCAGGCCTTCAGGGGACGGAAACTTCCAGAAGTGGGCGGCCCTGTTGGTGCCTTCTGGAGAGGAGCAGAAATACACATGCCAAGTGCAGCACGAGGGGCTTCAGGAGCCCCTCACCCTGAAATGGGAACCTCCTCAGCCCTCCTTCCTCACCATGGGCATCATTGTTGGCCTGGTTCTCCTCGTGGTCACTGGAGCTGTGGTGGCTGGAGTTGTGATCTGCATGAAGAAGCGCTCAGGTGAAAAACGAGGGACTTATATCCAGGCTTCAAGCAGTGACAGTGCCCAGGGCTCTGATGTGTCTCTCACGGTTCCTAAAGTGTGAGACACCTGCCTTCGGGGGACTGAGTGATGCTTCATCCCGCTATGTGACATCAGATCCCCGGAACCCCTTTTTCTGCAGCTGCATCTGAATGTGTCAGTGCCCCTATTCGCATAAGTAGGAGTTAGGGAGACTGGCCCACCCATGCCCACTGCTGCCCTTCCCCACTGCCGTCCCTCCCCACCCTGACCTGTGTTCTCTTCCCTGATCCACTGTCCTGTTCCAGCAGAGACGAGGCTGGACCATGTCTATCCCTGTCTTTGCTTTATATGCACTGAAAAATGATATCTTCTTTCCTTATTGAAAATAAAATCTGTC
Error! Cannot fetch: 303239
How do I get rid of the parentheses in the output?
header = ">" + seq_record.description + " Len:" , len(seq_record.seq)
print(header)
Because of the comma, header is a tuple, so print() shows the tuple's representation: the comma (expected) but also the parentheses (unwanted).
The best way is to join the items instead, so that a comma is inserted between the fields but the tuple representation is left out:
print(",".join(header))
In your case it's a little trickier: you have to convert the non-string items to strings first (the tuple representation did that conversion, but join doesn't):
print(",".join([str(x) for x in header]))
result:
>M69206.1 Bovine MHC class I AW10 mRNA (haplotype AW10), 3' end Len:,1379
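If the comma in the joined output is not wanted either, another option is to build the header as a single string up front. A minimal sketch, using made-up stand-ins for seq_record.description and len(seq_record.seq):

```python
# Stand-ins for seq_record.description and len(seq_record.seq).
description = "M69206.1 Bovine MHC class I AW10 mRNA (haplotype AW10), 3' end"
length = 1379

# A trailing ", value" after a string builds a tuple, which prints with parentheses.
as_tuple = ">" + description + " Len:", length

# Formatting the number into the string keeps header a plain str.
header = ">{} Len:{}".format(description, length)
print(header)
```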

Python JSON KeyError for non missing key

For some unknown reason, when I run the script below, the following error is returned along with the desired output. This was working without any errors last night. The API output does change every minute, but I wouldn't expect a KeyError to be returned. I simply can't pinpoint where this error is coming from:
[u'#AAPL 151204C00128000'] <----- What I want to see printed
Traceback (most recent call last):
File "Options_testing.py", line 60, in <module>
main()
File "Options_testing.py", line 56, in main
if quotes[x]['greeks']['impvol'] > 0: #change this for different greek vals
KeyError: 'impvol'
Here is a little snippet of data:
{"results":{"optionchain":{"expire":"all","excode":"oprac","equityinfo":{"longname":"Apple Inc","shortname":"AAPL"},"money":"at","callput":"all","key":{"symbol":["AAPL"],"exLgName":"Nasdaq Global Select","exShName":"NGS","exchange":"NGS"},"symbolstring":"AAPL"},"quote":[{"delaymin":15,"contract":{"strike":108,"openinterest":3516,"contracthigh":6.16,"contractlow":0.02,"callput":"Put","type":"WEEK","expirydate":"2015-11-13"},"root":{"equityinfo":{"longname":"Apple Inc","shortname":"AAPL"},"key":{"symbol":["AAPL"],"exLgName":"Nasdaq Global Select","exShName":"NGS","exchange":"NGS"}},"greeks":{"vega":0,"theta":0,"gamma":0,"delta":0,"impvol":0,"rho":0}
Code:
#Options screener using Quotemedia's API
import json
import requests
#import csv
def main():
    url_auth= "https://app.quotemedia.com/user/g/authenticate/v0/102368/XXXXX/XXXXX"
    decode_auth = requests.get(url_auth)
    #print decode_auth.json()
    #print(type(decode_auth))
    auth_data = json.dumps(decode_auth.json())
    #Parse decode_auth, grab 'sid'
    sid_parsed = json.loads(auth_data)["sid"]
    #print sid_parsed
    #Pass sid into qm_options
    #Construct URL
    symbol = 'AAPL'
    SID = sid_parsed
    url_raw = 'http://app.quotemedia.com/data/getOptionQuotes.json?webmasterId=102368'
    url_data = url_raw + '&symbol=' + symbol + '&greeks=true' + '&SID=' + SID
    #print url_data
    response = requests.get(url_data)
    #print response
    data = json.dumps(response.json())
    #print data
    #save data to a file
    with open('AAPL_20151118.json', 'w') as outfile:
        json.dumps (data, outfile)
    #Turn into json object
    obj = json.loads(data)
    #slim the object
    quotes = obj['results']['quote']
    #find the number of options contracts
    range_count = obj['results']['symbolcount']
    #print all contracts with an implied vol > 0
    for x in range(0,range_count):
        if quotes[x]['greeks']['impvol'] > 0: #change this for different greek vals
            print quotes[x]['key']['symbol']
if __name__ == '__main__':
    main()
I can provide sample data if necessary.
for x in range(0,range_count):
    if quotes[x]['greeks']['impvol'] > 0: #change this for different greek vals
        print quotes[x]['key']['symbol']
This loops over multiple quotes, so it only takes one quote without an impvol key to raise the KeyError.
You should add some error handling, so you find out when that happens. Something like this:
# no need to iterate over indexes, just iterate over the items
for quote in quotes:
    if 'greeks' not in quote:
        print('Quote does not contain `greeks`:', quote)
    elif 'impvol' not in quote['greeks']:
        print('Quote does not contain `impvol`:', quote)
    elif quote['greeks']['impvol'] > 0:
        print(quote['key']['symbol'])
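A more compact variant, sketched here with made-up sample quotes, uses dict.get() with defaults so that a missing greeks or impvol key is treated as 0 rather than raising a KeyError. The symbols and values are hypothetical.

```python
# Hypothetical quotes: one complete, one without impvol, one without greeks.
quotes = [
    {'key': {'symbol': ['AAA']}, 'greeks': {'impvol': 0.31}},
    {'key': {'symbol': ['BBB']}, 'greeks': {}},
    {'key': {'symbol': ['CCC']}},
]

matches = []
for quote in quotes:
    # Fall back to an empty dict, then to 0, when a key is absent.
    impvol = quote.get('greeks', {}).get('impvol', 0)
    if impvol > 0:
        matches.append(quote['key']['symbol'])

print(matches)
```

The trade-off is that incomplete quotes are skipped silently, whereas the explicit checks above report which quotes are missing data.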

Dictionary changed size during iteration but I don't see where I've changed it

I'm new to Python, so bear with me, but I've tried to create a script that gets synonyms to a word if I don't already have it and add it to my dictionary in JSON format.
Here is my code:
import json, sys, urllib
from urllib.request import urlopen
f = open('dict.json', 'r')
string = json.loads(f.read())
tempString = string
url = 'http://words.bighugelabs.com/api/2/myapicode/%s/json'
def main():
    crawl()
def crawl():
    for a in string:
        for b in string[a]:
            for c in string[a][b]:
                for d in string[a][b][c]:
                    if not isInDict(d):
                        addWord(d, getWord(url % d))
                    else:
                        print('[-] Ignoring ' + d)
    f.seek(0)
    f.write(tempString)
    f.truncate()
    f.close()
def isInDict(value):
    for x in list(tempString.keys()):
        if x == value:
            return True
    return False
def getWord(address):
    try:
        return urlopen(address).read().decode('utf-8')
    except:
        print('[!] Failed to get ' + address)
        return ''
def addWord(word, content):
    if content != None and content != '':
        print('[+] Adding ' + word)
        tempString[word] = content
    else:
        print('[!] Ignoring ' + word + ': content empty')
if __name__ == '__main__':
    main()
When I run it, it works fine up until 'amour', and then it gives me this:
working fine
[+] Adding sex activity
[+] Adding sexual activity
[+] Adding sexual desire
[+] Adding sexual practice
[-] Ignoring amour
Traceback (most recent call last):
File "crawler.py", line 47, in <module>
main()
File "crawler.py", line 10, in main
crawl()
File "crawler.py", line 13, in crawl
for a in string:
RuntimeError: dictionary changed size during iteration
But I don't see where I've changed anything in string; I only modify tempString...
PS: If you want the JSON data I read:
{
"love": {
"noun": {
"syn": ["passion", "beloved", "dear", "dearest", "honey", "sexual love", "erotic love", "lovemaking", "making love", "love life", "concupiscence", "emotion", "eros", "loved one", "lover", "object", "physical attraction", "score", "sex", "sex activity", "sexual activity", "sexual desire", "sexual practice"],
"ant": ["hate"],
"usr": ["amour"]
},
"verb": {
"syn": ["love", "enjoy", "roll in the hay", "make out", "make love", "sleep with", "get laid", "have sex", "know", "do it", "be intimate", "have intercourse", "have it away", "have it off", "screw", "jazz", "eff", "hump", "lie with", "bed", "have a go at it", "bang", "get it on", "bonk", "copulate", "couple", "like", "mate", "pair"],
"ant": ["hate"]
}
}
}
In these lines:
string = json.loads(f.read())
tempString = string
You assign tempString to refer to the same dictionary object as string. Then, in addWord you change tempString:
tempString[word] = content
Because tempString is just another reference to the same dictionary object as string, that also changes string.
To avoid this, use:
import copy
tempString = copy.deepcopy(string)
Also, it's generally bad practice to use a variable name like string, which is also the name of a standard library module. It's not very descriptive, and it will shadow that module if you import it while the name is in scope.
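The difference between a second reference and a real copy can be seen in a short sketch (the sample dictionary and variable names are made up):

```python
import copy

original = {'love': {'noun': {'syn': ['passion']}}}

alias = original                     # same dict object, just a second name
snapshot = copy.deepcopy(original)   # independent copy, nested values included

# Adding a key through the alias changes the one shared dictionary.
alias['amour'] = 'added via alias'

print(alias is original)       # True: both names refer to one object
print('amour' in original)     # True: the change is visible through either name
print('amour' in snapshot)     # False: the deep copy is unaffected
```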
Let's take an example:
>>> d = {'one': 1, 'two': 2}
>>> for i in d:
...     if d[i] == 2:
...         d.pop(i)
...
2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around this, iterate over a copy of the keys. In Python 3, d.keys() is a live view of the dictionary, so it has to be wrapped in list():
>>> d = {'one': 1, 'two': 2}
>>> for i in list(d.keys()):
...     if d[i] == 2:
...         d.pop(i)
...
2
>>> d
{'one': 1}
So, for your specific code, try changing this:
def crawl():
    for a in string:
to:
def crawl():
    for a in list(string.keys()):
If this does not work, I will look into your code in more depth later today.
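A minimal sketch of iterating over a snapshot of the keys while adding entries, which is closer to what the crawler does. The sample data is made up, and list() is needed in Python 3, where keys() returns a live view of the dictionary:

```python
words = {'love': ['passion', 'amour']}

# list(words) snapshots the keys, so adding entries inside the loop is safe.
for key in list(words):
    for synonym in words[key]:
        if synonym not in words:
            words[synonym] = []

print(sorted(words))
```

Because the snapshot is taken once, keys added during the loop are not visited in the same pass; a crawler that should follow them would need another pass or a work queue.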
