I need to read a file with Python. The file contains a mix of letters, numbers and other characters.
After reading a single line from the file, how do I check whether that line is an integer or a float? (I need to know which of the two it is.)
I have tried the string methods .isdigit(), .isdecimal() and .isnumeric(), but they seem to return True only when every character in the string is a decimal digit.
Is there any method that can help me with this task?
P.S.: I can't use try or any exception-based approach.
============== Content of my File =================
0
[Begin Description]
xxx
[End Description]
1.1
[Begin Description]
....
I want to know if the current line I am reading is integer 0 or float 1.1. That makes my question.
I hope this will help
import re
s = "1236.0"
r = re.compile(r'[0-9]')  # any digit, so a plain "0" is recognised too
r2 = re.compile(r'(\.)')
if re.search(r,s) and re.search(r2,s):
    print("Float")
if re.search(r,s) and not re.search(r2,s):
    print("Integer")
You should use try and except.
But if you don't want to use it and need a different way, use a regex:
if re.match(r"[-+]?\d+(\.0*)?$", s):
    print("match")
For each line in the file you can check with regex whether it is a float or int or normal string
import re
float_match = re.compile("^[-+]?[0-9]*[.][0-9]+$")
int_match = re.compile("^[-+]?[0-9]+$")
lines = ["\t23\n", "24.5", "-23", "0.23", "-23.56", ".89", "-122", "-abc.cb"]
for line in lines:
    line = line.strip()
    if int_match.match(line):
        print("int")
    elif float_match.match(line):
        print("float")
    else:
        print("str")
Result:
int
float
int
float
float
float
int
str
How it works:
int_match = re.compile("^[-+]?[0-9]+$")
^: at the str beginning
[-+]?: optional + or -
[0-9]+: one or more numbers
$: end of string
float_match = re.compile("^[-+]?[0-9]*[.][0-9]+$")
^[-+]?: start with either + or - optional.
[0-9]*: any number of digits or none.
[.]: dot
[0-9]+: one or more digits
$:end
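As a rough sketch of how the two patterns above could be applied line by line, the snippet below wraps them in a helper. The names classify and classify_lines are hypothetical, and re.fullmatch is used in place of the explicit ^ and $ anchors. The sample list stands in for an open file object.

```python
import re

# Same patterns as above; fullmatch() makes the ^ and $ anchors unnecessary.
INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.[0-9]+")

def classify(line):
    """Return 'int', 'float' or 'str' for one line of text."""
    text = line.strip()
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return "str"

def classify_lines(lines):
    """Classify every line of an iterable such as an open file."""
    return [(line.strip(), classify(line)) for line in lines]

# Stand-in for the lines of the file shown in the question.
sample = ["0\n", "[Begin Description]\n", "xxx\n", "[End Description]\n", "1.1\n"]
for text, kind in classify_lines(sample):
    print(kind, text)
```

No exception handling is involved, which fits the constraint in the question. Note that a string such as "1." is reported as str, because the float pattern requires at least one digit after the dot.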
This is faster than re.
It is not real type checking, but since you are reading strings such as 0 or 1.1, you can simply do:
line='1.1'
if '.' in line:
    print("float")
else:
    print("int")
Try this :
import re
line1 = '0'
line2 = 'description one'
line3 = '1.1'
line4 = 'begin description'
lines = [line1, line2, line3, line4] # with readlines() you can get it directly
for i in lines:
    if re.findall(r"[+-]?\d+", i) and not re.findall(r"[+-]?\d+\.\d+", i):
        print('int found')
    elif re.findall(r"[+-]?\d+\.\d+", i):
        print('float found')
    else:
        print('no numeric found')
OUTPUT :
int found
no numeric found
float found
no numeric found
You could split it into parts using .split() and use string methods.
Example code (note that the split argument should be changed to a comma if your floats use a comma instead of a dot as the decimal separator):
def float_checker(strinput):
    parts = strinput.split('.')
    # a float has exactly one separator with digits on both sides of it
    return len(parts) == 2 and all(part.isnumeric() for part in parts)
if __name__ == '__main__':
    while True:
        print(float_checker(input('Input for float check (Stop with CTRL+C): ')))
I have this python function here. However when running I'm getting an error "could not convert string to float: '$50.50'". I need to add the input as a string including the "$" sign and then convert it to a float number in the backend. The function seems to work adding only "50.50" as input.
def main():
    dollars = dollars_to_float(input("How much was the meal? "))
    percent = percent_to_float(input("What percentage would you like to tip? "))
    tip = dollars * percent/100
    print(f"Leave $" , str(tip) )
def dollars_to_float(d):
    str.lstrip(d)
    return float(d)
def percent_to_float(p):
    str.rstrip(p)
    return float(p)
main()
Error:
File "/Users/nelsonlamounier/indoor/indoor.py", line 13, in <module>
main()
File "/Users/nelsonlamounier/indoor/indoor.py", line 9, in dollars_to_float
return float(d)
ValueError: could not convert string to float: '$50.50'
For a more general case, update dollars_to_float with the following:
def dollars_to_float(d):
    d = ''.join([ch for ch in d if ch.isnumeric() or ch == '.'])
    return float(d)
Just remove the $ before the number, like:
def dollars_to_float(d):
    # str.lstrip(d) on its own discards its result, so strip and replace inline
    return float(d.strip().replace("$",""))
Your code is error-prone, as a single space character would already be able to break it. A more robust version of the currency-to-float conversion can be found in the question "Remove currency symbols and literals from a string with a price universal solution".
Use locale to find out which decimal point character is in use (, or .), and a regex to remove all characters other than digits and the decimal point.
import re
import locale
def dollars_to_float(price_string: str) -> float:
    decimal_point_char = locale.localeconv()['decimal_point']
    clean = re.sub(r'[^0-9'+decimal_point_char+r']+', '', str(price_string))
    return float(clean)
If you want to remove specific characters from the beginning or end of a string then use strip()
In this case your dollars_to_float and percent_to_float could be generalised to:
def to_float(d):
    return float(d.strip('$%'))
This has the advantage of supporting the $ and % characters at either the beginning or end of the string.
However, say the input is like this '$1,200'
That's going to fail due to the thousands separator. Also, why limit functionality to just dollars. Why not allow for an arbitrary preamble to the input string - e.g., USD1,200 or £1,234,50
Here's a reasonably robust approach to this problem.
import re
from locale import LC_NUMERIC, setlocale, localeconv
class CC:
    dp = None
    def __init__(self):
        if CC.dp is None:
            setlocale(LC_NUMERIC, '')
            dp = localeconv()['decimal_point']
            CC.dp = f'[^0-9{dp}+-]+'
    def conv(self, s):
        return float(re.sub(CC.dp, '', s))
print(CC().conv('$1,200.5'))
print(CC().conv('£1,200'))
print(CC().conv('EUR1234.5'))
print(CC().conv('6543.21'))
Output:
1200.5
1200.0
1234.5
6543.21
I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and appending it to a string.
This is the code:
def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break
    return str
def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char
Now this is giving me this error:
TypeError: sequence item 0: expected str instance, int found
I'm also trying the following:
char = unpack('c',f.read(1)).decode("ascii")
But it throws me:
AttributeError: 'tuple' object has no attribute 'decode'
I don't even know how to read the chars and add them to the string. Is there a proper way to do this?
Here's a version that (ab)uses iter's lesser-known "sentinel" argument:
with open('file.txt', 'rb') as f:
    val = ''.join(iter(lambda: f.read(1).decode('ascii'), '\x00'))
How about:
myString = myNullTerminatedString.split("\x00")[0]
For example:
myNullTerminatedString = "hello world\x00\x00\x00\x00\x00\x00"
myString = myNullTerminatedString.split("\x00")[0]
print(myString) # "hello world"
This works by splitting the string on the null character. Since the string should terminate at the first null character, we simply grab the first item in the list after splitting. split will return a list of one item if the delimiter doesn't exist, so it still works even if there's no null terminator at all.
It also will work with byte strings:
myByteString = b'hello world\x00'
myStr = myByteString.split(b'\x00')[0].decode('ascii') # "hello world" as normal string
If you're reading from a file, you can do a relatively larger read - estimate how much you'll need to read to find your null string. This is a lot faster than reading byte-by-byte. For example:
resultingStr = b''
while True:
    buf = f.read(512)
    if len(buf)==0: break
    resultingStr += buf
    if (b"\x00" in resultingStr):
        nullIndex = resultingStr.index(b"\x00")
        # number of bytes that were read from the null byte onwards
        extraBytes = len(resultingStr) - nullIndex
        resultingStr = resultingStr[:nullIndex]
        f.seek(0 - extraBytes,1) # seek backwards by the number of bytes, now the pointer will be on the null byte in the file
        # or f.seek(1 - extraBytes,1) to skip the null byte in the file
        break
# now "resultingStr" contains the string (as bytes)
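A minimal sketch of the same chunked-read idea as a reusable Python 3 function, assuming a seekable binary file object. The name read_cstring and the chunk_size parameter are hypothetical, and io.BytesIO stands in for a real file.

```python
import io

def read_cstring(f, chunk_size=512):
    """Read bytes up to the next null byte and leave f just past that null."""
    data = bytearray()
    while True:
        buf = f.read(chunk_size)
        if not buf:                 # EOF reached without a terminator
            return bytes(data)
        pos = buf.find(b"\x00")
        if pos == -1:               # no terminator yet, keep the whole chunk
            data += buf
            continue
        data += buf[:pos]
        # Rewind over the bytes that were read past the null byte.
        f.seek(pos + 1 - len(buf), 1)
        return bytes(data)

# io.BytesIO stands in for a file opened in 'rb' mode.
stream = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00\xcd\xcc")
print(read_cstring(stream))
print(read_cstring(stream))
```

Because the file position ends up right after the terminator, consecutive calls return consecutive strings, and whatever follows can still be read normally.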
(edit version 2, added extra way at the end)
Maybe there are libraries out there that can help you with this, but since I don't know of any, let's attack the problem with what we know.
In Python 2, bytes and strings are basically the same thing. That changes in Python 3, where str is what Python 2 calls unicode and bytes is its own separate type. This means you don't need to define a readChar function in Python 2, because no extra work is required, so I don't think you need unpack for this particular case. With that in mind, let's define the new readString:
def readString(myfile):
    chars = []
    while True:
        c = myfile.read(1)
        if c == chr(0):
            return "".join(chars)
        chars.append(c)
Just like your code, I read one character at a time, but I save the characters in a list instead. The reason is that strings are immutable, so doing str+=char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This will exclude the null character; if it's needed, just move the appending...
Now lets test it with your sample file
for instance lets try to read "Sword_Wea_Dummy" from it
with open("sword.blendscn","rb") as archi:
    #lets simulate that some prior processing was made by
    #moving the pointer of the file
    archi.seek(6)
    string=readString(archi)
    print "string repr:", repr(string)
    print "string:", string
    print ""
    #and the rest of the file is there waiting to be processed
    print "rest of the file: ", repr(archi.read())
and this is the output
string repr: 'Sword_Wea_Dummy'
string: Sword_Wea_Dummy
rest of the file: '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
other tests
>>> with open("sword.blendscn","rb") as archi:
        print readString(archi)
        print readString(archi)
        print readString(archi)
sword
Sword_Wea_Dummy
ÍÌÌ=p=Š4:¦6¿JÆ=
>>> with open("sword.blendscn","rb") as archi:
        print repr(readString(archi))
        print repr(readString(archi))
        print repr(readString(archi))
'sword'
'Sword_Wea_Dummy'
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
>>>
Now that I think about it, you mention that the data portion is of fixed size. If that is true for all files, and the structure of all of them is as follows
[unknown size data][known size data]
then that is a pattern we can exploit: we only need to know the size of the file and we can get both parts smoothly as follows
import os
def getDataPair(filename,knowSize):
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknown = archi.read(size-knowSize)
        know = archi.read()
    return unknown, know
and by knowing the size of the data portion, its use is simple (which I get by playing with the prior example)
>>> string_data, data = getDataPair("sword.blendscn", 80)
>>> string_data
'sword\x00Sword_Wea_Dummy\x00'
>>> data
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
>>> string_data.split(chr(0))
['sword', 'Sword_Wea_Dummy', '']
>>>
Now, to get each string, a simple split will suffice, and you can pass the rest of the file contained in data to the appropriate function to be processed.
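The readString above targets Python 2, where read() returns str. As a hedged sketch of the same byte-at-a-time approach for Python 3, the variant below compares against bytes and also stops at end of file, so a missing terminator cannot cause an endless loop. The name read_string and the encoding parameter are hypothetical.

```python
import io

def read_string(f, encoding="ascii"):
    """Collect bytes up to a null byte (or EOF) and decode them."""
    chars = bytearray()
    while True:
        c = f.read(1)
        # b"" means end of file; b"\x00" is the terminator.
        if c == b"" or c == b"\x00":
            return chars.decode(encoding)
        chars += c

# io.BytesIO stands in for a file opened in 'rb' mode.
stream = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00")
print(read_string(stream))
print(read_string(stream))
```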
Doing file I/O one character at a time is horribly slow.
Instead use readline0, now on pypi: https://pypi.org/project/readline0/ . Or something like it.
In 3.x, there's a "newline" argument to open, but it doesn't appear to be as flexible as readline0.
Here is my implementation:
import struct
def read_null_str(f):
    r_str = ""
    while 1:
        back_offset = f.tell()
        try:
            r_char = struct.unpack("c", f.read(1))[0].decode("utf8")
        except:
            f.seek(back_offset)
            temp_char = struct.unpack("<H", f.read(2))[0]
            r_char = chr(temp_char)
        if ord(r_char) == 0:
            return r_str
        else:
            r_str += r_char
In order to make sure I start and stop reading a text file exactly where I want to, I am providing 'start1'<->'end1', 'start2'<->'end2' as tags in between the text file and providing that to my python script. In my script I read it as:
start_end = ['start1','end1']
line_num = []
with open(file_path) as fp1:
    for num, line in enumerate(fp1, 1):
        for i in start_end:
            if i in line:
                line_num.append(num)
fp1.close()
print '\nLine number: ', line_num
fp2 = open(file_path)
for k, line2 in enumerate(fp2):
    for x in range(line_num[0], line_num[1] - 1):
        if k == x:
            header.append(line2)
fp2.close()
This works well until I reach start10 <-> end10 and beyond. E.g., when it checks whether "start2" is in the line, it also matches text that contains "start21", and the same happens for the end tags, so providing "start1, end1" as input also reads "start10, end10". If I replace the line:
if i in line:
with
if i == line:
it throws an error.
How can I make sure that the script reads the line that contains ONLY "start1" and not "start10"?
import re
prog = re.compile('start1$')
if prog.match(line):
    print line
That should return None if there is no match and return a regex match object if the line matches the compiled regex. The '$' at the end of the regex says that's the end of the line, so 'start1' works but 'start10' doesn't.
or another way..
def test(line):
    import re
    prog = re.compile('start1$')
    return prog.match(line) is not None
> test('start1')
True
> test('start10')
False
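The anchored pattern above only matches when the tag starts the line. As a sketch for tags that may appear anywhere in a line, a word boundary (\b) on both sides also keeps start1 from matching start10. The helper name find_tag_lines is hypothetical, and the list of strings stands in for the open file.

```python
import re

def find_tag_lines(lines, tag):
    """Return the 1-based numbers of lines that contain tag as a whole word."""
    pattern = re.compile(r"\b" + re.escape(tag) + r"\b")
    return [num for num, line in enumerate(lines, 1) if pattern.search(line)]

# Stand-in for the lines of the tagged text file.
text = ["intro\n", "start1\n", "body\n", "end1\n", "start10\n", "more\n", "end10\n"]
print(find_tag_lines(text, "start1"))  # [2]
print(find_tag_lines(text, "end1"))    # [4]
```

re.escape is there only so that a tag containing regex metacharacters is treated literally.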
Since your markers are always at the end of the line, change:
start_end = ['start1','end1']
to:
start_end = ['start1\n','end1\n']
You probably want to look into regular expressions. The Python re library has some good regex tools. It would let you define a string to compare your line to and it has the ability to check for start and end of lines.
If you can control the input file, consider adding an underscore (or any non-number character) to the end of each tag.
'start1_'<->'end1_'
'start10_'<->'end10_'
The regular expression solution presented in other answers is more elegant, but requires using regular expressions.
You can do this with find():
for num, line in enumerate(fp1, 1):
    for i in start_end:
        if i in line:
            # make sure the next char isn't a digit (e.g. the '0' in 'start10')
            end = line.find(i) + len(i)
            if not line[end:end+1].isdigit():
                line_num.append(num)
I need to do a RegEx search and replace of all commas found inside quote blocks.
i.e.
"thing1,blah","thing2,blah","thing3,blah",thing4
needs to become
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
my code:
inFile = open(inFileName,'r')
inFileRl = inFile.readlines()
inFile.close()
p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
I need to filter only the columns I want based on a RegEx, filter further,
then do the RegEx replace, then reconstitute the line back.
How can I do this in Python?
The csv module is perfect for parsing data like this as csv.reader in the default dialect ignores quoted commas. csv.writer reinserts the quotes due to the presence of commas. I used StringIO to give a file like interface to a string.
import csv
import StringIO
s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = StringIO.StringIO(s)
dest = StringIO.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    wtr.writerow([item.replace('\\,',',').replace(',','\\,') for item in row])
print dest.getvalue()
result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"
General Edit
There was
"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4
in the question, and now it is not there anymore.
Moreover, I hadn't noticed r'[^\\],'.
So I have completely rewritten my answer.
"thing1,blah","thing2,blah","thing3,blah",thing4
and
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
being displays of strings (I suppose)
import re
ss = '"thing1,blah","thing2,blah","thing3\,blah",thing4 '
regx = re.compile('"[^"]*"')
def repl(mat, ri = re.compile('(?<!\\\\),') ):
    # replace each unescaped comma with backslash + comma (keeping the comma)
    return ri.sub('\\\\,',mat.group())
print ss
print repr(ss)
print
print regx.sub(repl, ss)
print repr(regx.sub(repl, ss))
result
"thing1,blah","thing2,blah","thing3\,blah",thing4
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '
You can try this regex.
>>> re.sub('(?<!"),(?!")', r"\\,",
'"thing1,blah","thing2,blah","thing3,blah",thing4')
#Gives "thing1\,blah","thing2\,blah","thing3\,blah",thing4
The logic behind this is to replace a , with \, only when it is neither immediately preceded nor immediately followed by a ". It does not check that the comma actually sits inside a quoted block.
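As an alternative sketch that restricts the replacement to quoted blocks, the snippet below first matches each "..." block and then escapes only the commas inside it that are not already preceded by a backslash. The names QUOTED, UNESCAPED_COMMA and escape_quoted_commas are hypothetical, and it assumes quoted blocks contain no embedded double quotes.

```python
import re

QUOTED = re.compile(r'"[^"]*"')            # one quoted block
UNESCAPED_COMMA = re.compile(r'(?<!\\),')  # comma not preceded by a backslash

def escape_quoted_commas(line):
    """Escape commas inside quoted blocks, leaving the rest of the line alone."""
    def fix_block(match):
        # A function replacement is inserted literally, so no template escaping.
        return UNESCAPED_COMMA.sub(lambda m: "\\,", match.group(0))
    return QUOTED.sub(fix_block, line)

line = '"thing1,blah","thing2,blah","thing3,blah",thing4'
print(escape_quoted_commas(line))
```

The commas that separate the fields are never touched, because they fall outside every quoted match.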
I came up with an iterative solution using several regex functions:
finditer(), findall(), group(), start() and end()
There's a way to turn all this into a recursive function that calls itself.
Any takers?
outfile = open(outfileName,'w')
p = re.compile(r'["]([^"]*)["]')
q = re.compile(r'([^\\])(,)')
for line in outfileRl:
    pg = p.finditer(line)
    pglen = len(p.findall(line))
    if pglen > 0:
        mpgstart = 0;
        mpgend = 0;
        for i,mpg in enumerate(pg):
            if i == 0:
                outfile.write(line[:mpg.start()])
            qg = q.finditer(mpg.group(0))
            qglen = len(q.findall(mpg.group(0)))
            if i > 0 and i < pglen:
                outfile.write(line[mpgend:mpg.start()])
            if qglen > 0:
                for j,mqg in enumerate(qg):
                    if j == 0:
                        outfile.write( mpg.group(0)[:mqg.start()] )
                    outfile.write( re.sub(r'([^\\])(,)',r'\1\\\2',mqg.group(0)) )
                    if j == (qglen-1):
                        outfile.write( mpg.group(0)[mqg.end():] )
            else:
                outfile.write(mpg.group(0))
            if i == (pglen-1):
                outfile.write(line[mpg.end():])
            mpgstart = mpg.start()
            mpgend = mpg.end()
    else:
        outfile.write(line)
outfile.close()
have you looked into str.replace()?
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
here is some documentation
hope this helps
I'm pretty new to Python programming and would appreciate some help to a problem I have...
Basically I have multiple text files which contain velocity values as such:
0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00
etc for many lines...
What I need to do is convert all the values in the text file that are less than 1 (e.g. 0.137865E+00 above) to an arbitrary value of 0.100000E+01. While it seems pretty simple to replace specific values with the 'replace()' method and a while loop, how do you do this if you want to replace a range?
thanks
I think when you are beginning programming, it's useful to see some examples; and I assume you've tried this problem on your own first!
Here is a break-down of how you could approach this:
contents='0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00'
The split method works on strings. It returns a list of strings. By default, it splits on whitespace:
string_numbers=contents.split()
print(string_numbers)
# ['0.259515E+03', '0.235095E+03', '0.208262E+03', '0.230223E+03', '0.267333E+03', '0.217889E+03', '0.156233E+03', '0.144876E+03', '0.136187E+03', '0.137865E+00']
The map command applies its first argument (the function float) to each of the elements of its second argument (the list string_numbers). The float function converts each string into a floating-point object.
float_numbers=map(float,string_numbers)
print(float_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 0.13786499999999999]
You can use a list comprehension to process the list, converting numbers less than 1 into the number 1. The conditional expression (1 if num<1 else num) equals 1 when num is less than 1, otherwise, it equals num.
processed_numbers=[(1 if num<1 else num) for num in float_numbers]
print(processed_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 1]
This is the same thing, all in one line:
processed_numbers=[(1 if num<1 else num) for num in map(float,contents.split())]
To generate a string out of the elements of processed_numbers, you could use the str.join method:
comma_separated_string=', '.join(map(str,processed_numbers))
# '259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1'
typical technique would be:
read file line by line
split each line into a list of strings
convert each string to the float
compare converted value with 1
replace when needed
write back to the new file
As I don't see you having any code yet, I hope that this would be a good start
def float_filter(input):
    for number in input.split():
        if float(number) < 1.0:
            yield "0.100000E+01"
        else:
            yield number
input = "0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00"
print " ".join(float_filter(input))
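A small Python 3 sketch of the same idea applied to a whole file, following the steps listed above. It keeps the original tokens, so the 0.259515E+03 style of formatting is preserved without reformatting any number. The names clamp_line and convert are hypothetical, and io.StringIO stands in for the real input and output files.

```python
import io

def clamp_line(line, threshold=1.0, replacement="0.100000E+01"):
    """Replace every value below threshold, keeping the other tokens as-is."""
    return " ".join(replacement if float(tok) < threshold else tok
                    for tok in line.split())

def convert(src, dst):
    """Copy src to dst line by line, clamping small values on the way."""
    for line in src:
        dst.write(clamp_line(line) + "\n")

# io.StringIO objects stand in for files opened with open(..., 'r') / open(..., 'w').
src = io.StringIO("0.259515E+03 0.137865E+00\n0.500000E-01 0.144876E+03\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

Comparing with float() covers the whole range below 1, including negative values and negative exponents, which a fixed-string replace cannot do.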
import numpy as np
a = np.genfromtxt('file.txt') # read file
a[a<1] = 1.0 # replace (0.100000E+01 is 1.0)
np.savetxt('converted.txt', a) # save to file
You could use regular expressions for parsing the string. I'm assuming here that the mantissa is never larger than 1 (ie, begins with 0). This means that for the number to be less than 1, the exponent must be either 0 or negative. The following regular expression matches '0', '.', unlimited number of decimal digits (at least 1), 'E' and either '+00' or '-' and two decimal digits.
0\.\d+E(-\d\d|\+00)
Assuming that you have the file read into variable 'text', you can use the regexp with the following python code:
result = re.sub(r"0\.\d+E(-\d\d|\+00)", "0.100000E+01", text)
Edit: Just realized that the description doesn't limit the valid range of input numbers to positive numbers. Negative numbers can be matched with the following regexp:
-0\.\d+E[-+]\d\d
This can be alternated with the first one using the (pattern1|pattern2) syntax which results in the following Python code:
result = re.sub(r"(0\.\d+E(-\d\d|\+00)|-0\.\d+E[-+]\d\d)", "0.100000E+01", text)
Also, if there's a chance that the exponent goes past 99, the regexp can be further modified by adding a '+' sign after the '\d\d' patterns. This allows matching exponents with two OR MORE digits.
I've got the script working as I want now...thanks people.
When writing the list to a new file I used the replace method to get rid of the brackets and commas - is there a simpler way?
ftext = open("C:\\Users\\hhp06\\Desktop\\out.grd", "r")
otext = open("C:\\Users\\hhp06\\Desktop\\out2.grd", "w+")
for line in ftext:
    stringnum = line.split()
    floatnum = map(float, stringnum)
    procnum = [(1.0 if num<1 else num) for num in floatnum]
    stringproc = str(procnum)
    s = (stringproc).replace(",", " ").replace("[", " ").replace("]", "")
    otext.writelines(s + "\n")
otext.close()