I'm pretty new to Python and programming in general. I'm currently working on a script to scrape stock quotes from Google Finance. Here is my code:
import urllib.request as ur
import re
def getquote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = ur.urlopen(base_url + symbol).read()
    m = re.search(b'id="ref_(.*?)">(.*?)<', content)
    if m:
        quote = m.group(2)
    else:
        quote = 'no quote available for: ' + symbol
    return quote
which returns:
b'655.65'
(655.65 is the current price of Google stock, which is the symbol I passed in.)
My question is: is there a way to clean up the return value so I get just the price, without the b prefix or the quotes? Ideally I'd like it returned as a float, but if need be I can return it as a string and convert it to a float later when I need it.
I've referenced these other posts:
How to create a stock quote fetching app in python
Python TypeError on regex
How to convert between bytes and strings in Python 3?
Convert bytes to a Python string
Perhaps I've missed something in one of those, but I believe I've tried everything I could find, and it is still returning in the format shown above.
SOLVED
The problem I was having wasn't displaying a string without quotes; it was that I had a value set to a bytes literal that needed to be converted first to a string and then to a float. I had tried this, but outside of the if statement (noob move). The solution was as v1k45 suggested:
Add a line in the if statement:
quote = float(quote.decode('utf-8'))
to decode it and convert it to a float.
Thanks for the help!
Add a line in the if condition:
quote = float(quote.decode('utf-8'))
You have to decode the bytes to a (Unicode) str first; then float() converts that string into a float.
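A minimal sketch of the suggested fix, run against a hard-coded HTML fragment instead of a live request so it works offline. The `extract_price` helper and the sample markup are hypothetical (Google Finance's real page layout has changed since this was asked); the regex is the one from the question.

```python
import re

def extract_price(content):
    """Return the first ref_ value in raw HTML bytes as a float, or None if absent."""
    m = re.search(b'id="ref_(.*?)">(.*?)<', content)
    if m:
        # m.group(2) is bytes (b'655.65'): decode to str, then convert to float
        return float(m.group(2).decode('utf-8'))
    return None

# Hypothetical markup standing in for the downloaded page
sample = b'<span id="ref_694653_l">655.65</span>'
print(extract_price(sample))  # 655.65
```

Returning `None` instead of a message string keeps the return type predictable: callers get either a float or nothing.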
Related
I have a Python script that iterates over date/time-formatted values and returns just the hour.
Below is a similar script (the one I use for the iteration):
zaman = "06:00:00"  # hours:minutes:seconds
hm = zaman.split(":")
vaxt = [hm[1]]
saat = float(hm[0]) + float(hm[1]) / 60
print(f"{saat:,.2f}")
In one of the files, which has several rows, I get the error:
ValueError: could not convert string to float: ' caing low enough for rotary follow on'
I have checked, and this row does not differ from the previous ones, yet I get an error on that one.
Do you have suggestions on how to solve it? (Maybe by getting the hours from a datetime in another way?)
The issue is that you're not correctly identifying the datetime in the string, so you end up trying to convert the wrong bit to a float.
One potential fix for this would be to not rely on splitting the string at the :s, but instead to use a regex to look for the part of the string with the appropriate format.
For example:
import re
test_string = 'this is a string with 06:00:00 in it somewhere'
matches = re.search(r'(\d{2}):(\d{2}):(\d{2})', test_string)  # raw string so \d reaches the regex engine
matches = [float(m) for m in matches.groups()]
print(matches)
# [6.0, 0.0, 0.0]
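Building on that, here is a sketch that applies the regex row by row and skips rows with no time in them instead of crashing. The `hours_from_row` name and the sample rows are hypothetical; seconds are folded in as fractions of an hour, which the original script ignored.

```python
import re

TIME_RE = re.compile(r'(\d{2}):(\d{2}):(\d{2})')

def hours_from_row(row):
    """Return the first HH:MM:SS in row as fractional hours, or None if there is none."""
    m = TIME_RE.search(row)
    if m is None:
        return None
    h, mins, secs = (int(g) for g in m.groups())
    return h + mins / 60 + secs / 3600

rows = ["06:00:00", "logged at 07:30:00", " caing low enough for rotary follow on"]
for row in rows:
    saat = hours_from_row(row)
    if saat is None:
        print(f"no time found in {row!r}")
    else:
        print(f"{saat:,.2f}")
```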
I have tested the code you provided above and it works. However, after doing some research it appears:
The Python "ValueError: could not convert string to float" occurs when we pass a string that cannot be converted to a float (e.g. an empty string or one containing non-numeric characters) to the float() class. To solve the error, remove all unnecessary characters from the string.
So check your file to make sure every value passed to float() is clean.
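If the values are supposed to be exactly `HH:MM:SS`, another option (matching the question's idea of getting the hours from a datetime) is to let `datetime.strptime` validate each value and report the ones that fail. A sketch; `parse_hours` and the sample values are illustrative.

```python
from datetime import datetime

def parse_hours(value):
    """Parse an HH:MM:SS string into fractional hours; return None if it is malformed."""
    try:
        t = datetime.strptime(value.strip(), "%H:%M:%S")
    except ValueError:
        return None
    return t.hour + t.minute / 60 + t.second / 3600

values = ["06:00:00", "13:45:00", " caing low enough for rotary follow on"]
for line_no, value in enumerate(values, start=1):
    hours = parse_hours(value)
    if hours is None:
        print(f"line {line_no}: cannot parse {value!r}")
    else:
        print(f"line {line_no}: {hours:,.2f}")
```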
Hi, I have the following data (abstracted) that comes from an API.
"Product" : "T\u00e1bua 21X40"
I'm using the following code to decode the data bytes:
var = json.loads(cleanhtml(str(json.dumps(response.content.decode('utf-8')))))
The cleanhtml function is a regex function I've created to remove HTML tags from the returned data (it's working correctly). However, decode('utf-8') is not removing characters like \u00e1. My expected output is:
"Product" : "Tábua 21X40"
I've tried to use replace("\\u00e1", "á") with no success. How can I replace this type of character, and what type of character is this?
\u00e1 is another way of representing the á character when displaying the contents of a Python string.
If you open a Python interactive session and run print({"Product" : "T\u00e1bua 21X40"}) you'll see output of {'Product': 'Tábua 21X40'}. The \u00e1 doesn't exist in the string as those individual characters.
The \u escape sequence indicates that the following numbers specify a Unicode character.
Attempting to replace \u00e1 with á won't achieve anything because that's what it already is. Additionally, replace("\\u00e1", "á") is attempting to replace the individual characters of a backslash, a u, and so on, and, as mentioned, they don't actually exist in the string in that way.
If you explain the problem you're encountering further then we may be able to help more, but currently it sounds like the string has the correct content but is just being displayed differently than you expect.
what type of character is this
Here
"Product" : "T\u00e1bua 21X40"
you can see a \u escape sequence followed by 4 hex digits, 00e1. This is a different representation of the same character, so
print("\u00e1" == "á")
output
True
This kind of sequence is called a Unicode escape sequence; this particular \uXXXX form is used by JSON (and by Python string literals).
In Python, you can decode it with the json module as follows:
import json
string = json.loads(r'"T\u00e1bua 21X40"')  # raw string, so json.loads (not Python) decodes \u00e1
print(string)
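A small sketch of where the escape actually lives, assuming (as is common) that the API sends the six characters `\u00e1` inside its JSON body. Decoding the bytes only turns them into JSON text; `json.loads` is what interprets the escape, and `json.dumps` escapes non-ASCII again unless `ensure_ascii=False` is passed, which may be why the `json.dumps`/`json.loads` round trip in the question still shows it. The sample body is hypothetical.

```python
import json

# Hypothetical raw API body: the JSON text itself contains the six characters \u00e1
raw = b'{"Product": "T\\u00e1bua 21X40"}'

text = raw.decode('utf-8')    # bytes -> str, but still unparsed JSON text
print('\\u00e1' in text)      # True: a literal backslash followed by u00e1

data = json.loads(text)       # parsing the JSON turns the escape into the real character
print(data["Product"] == "Tábua 21X40")  # True

# Serialising again escapes non-ASCII by default; ensure_ascii=False keeps "á" as-is
print(json.dumps(data))                      # {"Product": "T\u00e1bua 21X40"}
print(json.dumps(data, ensure_ascii=False))  # {"Product": "Tábua 21X40"}
```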
I have to parse some data that is fetched from the web. The content may be in different regional languages, which I am handling without any problem. But some strings contain invalid characters, like:
I am working
8qîÚ4½-ôMºÝCQ´Dɬ)Q+R±}Ûýï7üÛ²ëlY&53|8ïôóg/^ÿûêþ?ï¯a #ï?¼ºy{5+B^ß¿ß~¾¿½¦ÓûÆk.c¹~WÚ#ë¤KÈh4rF-G¦!¹ÿ¬¦a~µuÓñµ_»|þì
daily statistics
I have to remove such strange characters and extract only the valid text. I am using Python, and I am encoding each string with UTF-8.
If by strange you mean non-ASCII, you could try:
import string
"".join(filter(lambda char: char in string.printable, s))
Where s is your string.
Here are some string constants you can filter for:
https://docs.python.org/3/library/string.html
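The same filter wrapped in a function, with a sample built from the question's text, makes its limits visible: it drops every non-ASCII character, including legitimate accented letters, and any ASCII characters that happen to sit inside the garbage survive. The function name is illustrative.

```python
import string

PRINTABLE = set(string.printable)  # ASCII digits, letters, punctuation and whitespace

def keep_printable_ascii(s):
    """Keep only characters found in string.printable; all non-ASCII text is dropped."""
    return "".join(ch for ch in s if ch in PRINTABLE)

print(keep_printable_ascii("I am working 8qîÚ4½-ôMºÝ daily statistics"))
# I am working 8q4-M daily statistics
print(keep_printable_ascii("Tábua 21X40"))
# Tbua 21X40
```

Because of the second result, this filter only fits when the text is expected to be plain ASCII.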
I am using Python requests to connect to a website, passing some strings to get data about them.
The problem is that some strings contain a slash (/), so when they are passed in the URL, I get a ValueError.
This is my URL:
"https://api.flipkart.net/sellers/skus/%s/listings" % string
When I pass a string that does not contain a slash, I get:
https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
It returns a valid response. But when I pass a string that contains a slash:
string = "L20-ORG/BLUE-109(38)"
I get a URL like:
https://api.flipkart.net/sellers/skus/L20-ORG/BLUE-109(38)/listings
Which throws the error.
How can I solve this?
Raw string literals in Python
string = r"L20-ORG/BLUE-109(38)"
urllib.quote_plus is your friend. As urllib is a module from the standard library, you just have to import it with import urllib. (That is the Python 2 name; in Python 3 the function is urllib.parse.quote_plus.)
If you want to be conservative, just use it with the default safe characters:
string = urllib.quote_plus("L20-ORG/BLUE-109(38)")
gives 'L20-ORG%2FBLUE-109%2838%29'
If you know that some characters are harmless for your use case (say parentheses):
string = urllib.quote_plus("L20-ORG/BLUE-109(38)", '()')
gives 'L20-ORG%2FBLUE-109(38)'
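A Python 3 sketch of the same idea. Because the SKU goes into the URL path rather than the query string, it uses `urllib.parse.quote` with `safe=""` (so `/` is encoded too) instead of `quote_plus`, which would turn spaces into `+`, a convention that only applies to query strings. `listings_url` is a hypothetical helper, and whether the API accepts `%2F` inside a path segment is up to the server.

```python
from urllib.parse import quote

BASE_URL = "https://api.flipkart.net/sellers/skus/{}/listings"

def listings_url(sku):
    # safe="" makes quote() percent-encode "/" as well, so the SKU stays one path segment
    return BASE_URL.format(quote(sku, safe=""))

print(listings_url("A35-Charry-228_39"))
# https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
print(listings_url("L20-ORG/BLUE-109(38)"))
# https://api.flipkart.net/sellers/skus/L20-ORG%2FBLUE-109%2838%29/listings
```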
I'm having an issue figuring out how to properly insert base64 data into a formatted string in Python 2.7. Here's the relevant code snippet:
fileExec = open(fileLocation, 'w+')
fileExec.write(base64.b64decode('%s')) %(encodedFile) # encodedFile is base64 data of a file grabbed earlier in the script.
fileExec.close()
os.startfile(fileLocation)
As silly as it may seem, I am required to use string formatting in this case, due to what this script is actually doing, but when I launch the script, I receive the following error:
TypeError: Incorrect Padding
I'm not quite sure what I need to do to the '%s' to get this to work. Any suggestions? Am I using the wrong string format?
Update: Here's a better idea of what I'm ultimately trying to accomplish:
encodedFile = randomString() # generates a random string for the variable name to be written
fileExec = randomString()
... snip ...
writtenScript += "\t%s.write(base64.b64decode(%s))\n" %(fileExec, encodedFile) # where writtenScript is the contents of the .py file that we are dynamically generating
I must use string formatting because the variable names will not always be the same in the Python file we are generating.
That error usually means your base64 string is not encoded properly. But here it is just a side effect of a logic error in your code.
What you have done is basically this:
a = base64.b64decode('%s')
b = fileExec.write(a)
c = b % (encodedFile)
So you are attempting to decode the literal string "%s", which fails.
It should look more like this:
fileExec.write(base64.b64decode(encodedFile))
[edit: this uses a redundant string format; please don't do this in real code]
fileExec.write(base64.b64decode("%s" % encodedFile))
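A runnable Python 3 sketch of both points: decoding the placeholder text itself raises an error, while decoding the actual data works. It also writes the decoded bytes in binary mode (`"wb"`); the question's `'w+'` is text mode, which on Windows translates newline bytes and can corrupt a binary file. The sample data and output path are stand-ins.

```python
import base64
import binascii
import os
import tempfile

# What the original line effectively did: decode the two characters "%s", which is not valid base64
try:
    base64.b64decode('%s')
except binascii.Error as exc:
    print("decoding the placeholder fails:", exc)

encoded_file = base64.b64encode(b"example file contents")  # stand-in for the real base64 data

path = os.path.join(tempfile.mkdtemp(), "decoded.bin")  # stand-in output location
with open(path, "wb") as f:  # binary mode: write the decoded bytes unchanged
    f.write(base64.b64decode(encoded_file))

with open(path, "rb") as f:
    print(f.read())  # b'example file contents'
```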
Your updated question shows that the b64decode part is inside a string, not in your code. That is a significant difference. The code in your string is also missing a set of inner quotes around the second %s placeholder:
writtenScript += "\t%s.write(base64.b64decode('%s'))\n" % (fileExec, encodedFile)
(notice the single quotes...)
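When the value is data rather than a variable name, `%r` is a convenient way to get those quotes, since it inserts the string's repr (a quoted literal). This Python 3 sketch assumes the second placeholder should hold the base64 data itself; in the question `encodedFile` is a random variable name, in which case it would stay unquoted and the generated script must define that variable. `random_name` is a hypothetical stand-in for `randomString()`, and the payload and output path are placeholders.

```python
import base64
import os
import random
import string
import tempfile

def random_name(length=8):
    """Hypothetical stand-in for randomString(): returns a valid Python identifier."""
    first = random.choice(string.ascii_letters)
    rest = "".join(random.choice(string.ascii_letters + string.digits) for _ in range(length - 1))
    return first + rest

payload_b64 = base64.b64encode(b"example payload").decode("ascii")  # placeholder file data
out_path = os.path.join(tempfile.mkdtemp(), "out.bin")             # placeholder output path
file_var = random_name()

written_script = "import base64\n"
written_script += "%s = open(%r, 'wb')\n" % (file_var, out_path)
# %r inserts payload_b64 as a quoted string literal, so the generated code decodes the data itself
written_script += "%s.write(base64.b64decode(%r))\n" % (file_var, payload_b64)
written_script += "%s.close()\n" % file_var

print(written_script)
```

Running the printed script writes the decoded payload to `out_path`.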