words into numbers scrape

When I scrape a webpage, it returns this: 42,834.56
Apparently it is extracted as text rather than as a number (when I try to sum it with the other values retrieved in Excel, it doesn't work). How can I convert it into a number?
Here is what I'm copying from the cmd.
Here is the error when I wrap it in int():
Traceback (most recent call last):
File "C:\Users\Windows\Desktop\py4e\callao.py", line 337, in <module>
print(int(peso))
ValueError: invalid literal for int() with base 10: '42,834.56\xa0'
Here is the error when I wrap it in float():
Traceback (most recent call last):
File "C:\Users\Windows\Desktop\py4e\callao.py", line 337, in <module>
print(float(peso))
ValueError: could not convert string to float: '42,834.56\xa0'

You need to remove the ',' thousands separator before converting. Try this:
float("".join(peso.split(',')))

a = "42,834.56"
b = float(a.replace(",",""))
print(type(b))
# output: <class 'float'>

You can try this:
1. Strip off the \xa0 character.
2. Remove the , from the string.
3. Convert it to float.
s = '42,834.56\xa0'
s = s.strip()
print(float(s.replace(',','')))
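The same three steps wrapped in a function, as a minimal sketch. It assumes the page uses ',' only as a thousands separator and '.' as the decimal point (a value written as '42.834,56' would need different handling). The name scraped_to_float is hypothetical.

```python
def scraped_to_float(text):
    # 1. str.strip() with no arguments removes Unicode whitespace,
    #    including the non-breaking space '\xa0' picked up from the page.
    cleaned = text.strip()
    # 2. Drop thousands separators (assumption: ',' is never the decimal mark).
    cleaned = cleaned.replace(',', '')
    # 3. Convert; genuinely non-numeric text still raises ValueError.
    return float(cleaned)

print(scraped_to_float('42,834.56\xa0'))  # 42834.56
```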

You can store the scraped value in a variable and convert it to float in Python. Here's the function:
def convert_to_float(peso):
    return float(peso.replace(',',''))
And here's the call to the function:
peso = '42,834.56'
print(convert_to_float(peso))
Output:
42834.56
Now you can sum it with the others.
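Edit -
It seems you have also scraped a \xa0 (a non-breaking space) along with the number. Note that '\xa0' is a single character, not a backslash followed by "xa0", so splitting on "\\" does not remove it. Strip it instead:
def convert_to_float(peso):
    peso = peso.strip()  # str.strip() also removes '\xa0'
    return float(peso.replace(',',''))
peso = '42,834.56\xa0'
print(convert_to_float(peso))
The output is the same as above.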
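Since the original goal was to add these values up, here is a sketch that converts a list of scraped strings with a variant of convert_to_float and sums them. The sample values in the list are made up for illustration.

```python
def convert_to_float(peso):
    # Remove surrounding whitespace (including '\xa0') and thousands separators.
    return float(peso.strip().replace(',', ''))

# Hypothetical scraped values, as they might come out of the page.
scraped = ['42,834.56\xa0', '1,200.00\xa0', '15.44\xa0']
total = sum(convert_to_float(p) for p in scraped)
print(round(total, 2))
```

Rounding only affects the display; floating-point sums of decimal values can carry tiny representation errors.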

Related

How can I create an array of Unicode characters in Python?

I am using Repl.it.
import array as arr
my_array = arr.array("u", [u"3", u"6", u"9", u"12"])
print(my_array)
print(type(my_array))
print(type(my_array[0]))
The above source code produces the following error:
Traceback (most recent call last):
File "main.py", line 3, in <module>
my_array = arr.array("u", [u"3", u"6", u"9", u"12"])
TypeError: array item must be unicode character
Why isn't the source code working?

Array typecode "u" corresponds to a single Unicode character, and the initializer contains the two-character item u"12", which is why the TypeError is raised. Perhaps you want something like:
arr.array("u", [u"3", u"6", u"9", u"1", u"2"])
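If the intent was to store the numbers 3, 6, 9 and 12 rather than individual characters (an assumption, the question does not say), an integer typecode such as "i" (C signed int) avoids the one-character restriction entirely. A minimal sketch:

```python
import array as arr

# 'i' stores C signed ints, so multi-digit values such as 12 are fine.
my_array = arr.array("i", [3, 6, 9, 12])
print(my_array)           # array('i', [3, 6, 9, 12])
print(type(my_array[0]))  # <class 'int'>

# Converting the original strings first gives the same array.
from_strings = arr.array("i", [int(s) for s in ["3", "6", "9", "12"]])
print(from_strings == my_array)  # True
```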

ValueError: invalid literal for int() with base 10: '11,440'

This code prints out the tasklist of the OS. However, the following error is seen when it is run. Is there an easy solution to this problem?
"mem_usage":int(m.group(5).decode('ascii', 'ignore'))})
ValueError: invalid literal for int() with base 10: '11,440'
Code:
import re
from subprocess import Popen, PIPE, check_output

def get_processes_running():
    """
    Takes tasklist output and parses the table into a dict
    """
    tasks = check_output(['tasklist']).decode('cp866', 'ignore').split("\r\n")
    p = []
    for task in tasks:
        m = re.match(b'(.*?)\\s+(\\d+)\\s+(\\w+)\\s+(\\w+)\\s+(.*?)\\s.*', task.encode())
        if m is not None:
            p.append({"image": m.group(1).decode(),
                      "pid": int(m.group(2).decode()),
                      "session_name": m.group(3).decode(),
                      "session_num": int(m.group(4).decode()),
                      "mem_usage": int(m.group(5).decode('ascii', 'ignore'))})
    return p

def test():
    print(*[line.decode('cp866', 'ignore') for line in Popen('tasklist', stdout=PIPE).stdout.readlines()])
    lstp = get_processes_running()
    for p in lstp:
        print(p)
    return

if __name__ == '__main__':
    test()

I haven't fully read your code, but from your error, I would suggest using the .replace method on the string that you are trying to convert to an integer to remove all commas.
For instance, at the moment, you are doing something like:
>>> int('123,456')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '123,456'
which throws an error because of the commas. Simply remove all commas to fix this:
>>> int('123,456'.replace(',',''))
123456
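Applied to the question's code, the fix belongs where group 5 is converted. A minimal sketch, assuming the memory column always looks like '11,440' with ',' as the thousands separator; parse_mem_usage is a hypothetical helper name.

```python
def parse_mem_usage(raw):
    # tasklist prints memory with thousands separators, e.g. '11,440'.
    # Remove surrounding whitespace and the commas before calling int().
    return int(raw.strip().replace(',', ''))

# In get_processes_running() the dict entry would then become:
#   "mem_usage": parse_mem_usage(m.group(5).decode('ascii', 'ignore'))
print(parse_mem_usage('11,440'))  # 11440
```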

See this:
>>> int('23ds4')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '23ds4'
This is the same case as yours: the "," character is not a digit, so the string is not a valid argument to int().
You should remove the commas from the string before converting it:
>>> s = '11,234,234'
>>> s.replace(',','')
'11234234'

Binascii.unhexlify typeerror Odd length string

I have written the following script to replace non-alphanumeric characters and restore them afterwards. However, I can't figure out why unhexlify will not work. Any suggestions?
import binascii, timeit, re
damn_string = "asjke5234nlkfs$sfj3.$sfjk."
def convert_string(s):
    return ''.join('__UTF%s__' % binascii.hexlify(c.encode('utf-16')) if not c.isalnum() else c for c in s.lower())
def convert_back(s):
    for i in re.findall('__UTF([a-f0-9]{8})__', s): # For testing
        print binascii.unhexlify(i).decode('utf-16')
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
convert = convert_string(damn_string)
print convert
print convert_back(convert)
This results in the following output:
asjke5234nlkfs__UTFfffe2400__sfj3__UTFfffe2e00____UTFfffe2400__sfjk__UTFfffe2e00__
$
.
$
.
Traceback (most recent call last):
File "test.py", line 131, in <module>
print convert_back(convert)
File "test.py", line 127, in convert_back
return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
TypeError: Odd-length string

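My bad. It took me a bit too long to realize that re.sub cannot pass the group string in this manner: binascii.unhexlify('\g<1>') is evaluated once, before re.sub even runs, on the literal five-character string '\g<1>', which is why it complains about an odd-length string. One way to fix this is to pass a function as the replacement, so it is called with each match:
return re.sub('__UTF([a-f0-9]{8})__', lambda x: binascii.unhexlify(x.group(1)).decode('utf-16'), s)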
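The question's code is Python 2. Here is a Python 3 sketch of the same round trip using the callable replacement. In Python 3, hexlify() returns bytes, so the result is decoded to str before being formatted into the marker. It assumes every replaced character is in the Basic Multilingual Plane, so its UTF-16 encoding (BOM plus one code unit) is exactly 8 hex digits, matching the {8} in the pattern.

```python
import binascii
import re

def convert_string(s):
    # Replace each non-alphanumeric character with __UTF<hex>__, where <hex>
    # is the hex of its UTF-16 encoding (BOM + 2 bytes = 8 hex digits).
    return ''.join(
        c if c.isalnum()
        else '__UTF%s__' % binascii.hexlify(c.encode('utf-16')).decode('ascii')
        for c in s.lower()
    )

def convert_back(s):
    # The lambda is called once per match, so unhexlify receives the
    # captured hex digits rather than the literal text '\g<1>'.
    return re.sub(
        '__UTF([a-f0-9]{8})__',
        lambda m: binascii.unhexlify(m.group(1)).decode('utf-16'),
        s,
    )

damn_string = "asjke5234nlkfs$sfj3.$sfjk."
encoded = convert_string(damn_string)
print(encoded)
print(convert_back(encoded))
```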

Why am I receiving this error when sorting a Text file by a certain column based upon number?

Here is my code for sorting the file:
g = open('Lapse File.txt', 'r')
column = []
i = 1
next(g)
for line in g:
    column.append(int(line.split('\t')[2]))
column.sort()
This is the error I get.
Traceback (most recent call last):
File "E:/Owles/new lapse .py", line 51, in <module>
column.append(int(line.split('\t')[2]))
ValueError: invalid literal for int() with base 10: '-8.3\n'
My main question is: why is there a \n? Earlier in the code I wrote another text file column by column from a previously read-in file.
This is my code for writing the file:
for columns in (raw.strip().split() for raw in Sounding):
    if (i >2 and i <=33):
        G.write(columns [3]+'\t'+columns[2]+'\t'+columns[4]+'\n')
        i = i + 1
    elif (i >= 34):
        G.write(columns [0]+'\t'+columns[1]+'\t'+columns[2]+'\n')
        i = i + 1
    else:
        i = i + 1
I am not sure whether writing the lines like that is the issue, since I append the newline character myself.

The traceback is telling you exactly what happened:
ValueError: invalid literal for int() with base 10: '-8.3\n'
The problem here is that, while int() can handle the negative sign and the trailing newline character, it can't handle the decimal point, '.'. As you know, -8.3 may be a real, rational number, but it's not an integer. If you want to preserve the fractional value to end up with -8.3, use float() instead of int(). If you want to discard the fractional value to end up with -8, use float() to parse the string and then use int() on the result.
-8.3:
column.append(float(line.split('\t')[2]))
-8:
column.append(int(float(line.split('\t')[2])))
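As for where the \n comes from: iterating over a file yields each line including its line terminator, so the '\n' appended by the writing code comes back attached to the last field. A sketch using io.StringIO as a stand-in for 'Lapse File.txt'; the sample rows are made up.

```python
import io

# Stand-in for 'Lapse File.txt' (made-up rows): a header line, then
# tab-separated values, each line ending in the '\n' the writer appended.
g = io.StringIO("col1\tcol2\tcol3\n1\t2\t-8.3\n4\t5\t12\n7\t8\t0.5\n")

next(g)  # skip the header line
column = []
for line in g:
    # line looks like '1\t2\t-8.3\n': the last field still carries the '\n'.
    # rstrip('\n') removes it explicitly, though float() would ignore it anyway.
    fields = line.rstrip('\n').split('\t')
    column.append(float(fields[2]))
column.sort()
print(column)  # [-8.3, 0.5, 12.0]
```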

Because only strings that represent integers can be converted with int(); look at this:
numeric_string = "109"
not_numeric_string = "f9"
This is okay:
>>> int(numeric_string)
109
And it cannot be cast:
>>> int(not_numeric_string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'f9'
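So somewhere in your script it is getting a string that is not an integer literal.
It seems the "-8.3\n" string raised the error. Stripping the trailing newline does no harm, but int() already ignores surrounding whitespace; what it rejects is the decimal point, so a value like that needs float().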

python convert string "0" to float gives error

I have the following code in my Python script to launch an application and grab its output.
An example of this output would be 'confirmed : 0'.
I only want the number, in this case zero, but normally this number is a float, like 0.005464.
When I run this code, it tells me it cannot convert "0" to float. What am I doing wrong?
This is the error I get now:
ValueError: could not convert string to float: "0"
cmd = subprocess.Popen('/Applications/Electrum.app/Contents/MacOS/Electrum getbalance', shell=True, stdout=subprocess.PIPE)
for line in cmd.stdout:
    if "confirmed" in line:
        a,b=line.split(': ',1)
        if float(b)>0:
            print "Positive amount"
        else:
            print "Empty"

According to the exception you got, the value contained in b is not 0 but "0" (including the quotes), and therefore cannot be converted to a float directly. You'll need to remove the quotes first. Because each line read from stdout also ends with a newline, strip that before the quotes, e.g. with float(b.strip().strip('"')).
As the following examples show, in Python 2 (which the question's print statements indicate) the exception message does not add quotes around the value, so they must have been part of the original string:
>>> float('"0"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: "0"
>>> float('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: a
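A Python 3 sketch of that clean-up, assuming the value arrives as text such as '"0"' followed by the newline that each line read from stdout ends with. parse_amount is a hypothetical helper name, and the sample lines are made up.

```python
def parse_amount(line):
    # Take everything after the first ': ', as the question's split(': ', 1) does.
    _, value = line.split(': ', 1)
    # Remove the trailing newline/whitespace first, then the surrounding quotes;
    # with the newline still present, strip('"') would leave the closing quote.
    return float(value.strip().strip('"'))

for line in ['confirmed : "0"\n', 'confirmed : "0.005464"\n']:
    amount = parse_amount(line)
    print("Positive amount" if amount > 0 else "Empty")
```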

I tested the code and found that the result of split(': ', 1) can still contain non-numeric text, for example when "confirmed" appears elsewhere in the line:
>>> line = "1: 456: confirmed"
>>> "confirmed" in line
True
>>> a,b=line.split(': ', 1)
>>> a
'1'
>>> b
'456: confirmed'

We Keep Coding
