I am trying to convert two numeric strings separated by a space into integers, but I keep failing. I have tried numerous different solutions, but nothing seems to work.
f = open('new.txt', 'r')
list_author = []
for line in f:
    header1 = f.readline()
    header2 = f.readline()
    header3 = f.readline()
    line = line.strip().replace('\t', ' ')
    line = list(map(str, line.split()))
    list_author.append([line])
print(list_author[1:10])
Output (formatted for readability):
[[['#', 'Directed', 'graph', '(each', 'unordered', 'pair', 'of', 'nodes', 'is', 'saved']],
[['3466', '937']],
[['3466', '15931']],
[['10310', '1854']],
[['10310', '9572']],
[['10310', '16310']],
[['10310', '24814']],
[['5052', '3096']],
[['5052', '5740']],
[['5052', '10235']]]
It seems that the first line is a header, so you need to skip it. Then you can use numbers = line.split("\t") on every line to get both numbers, and append them to the author list like this: list_author.append((int(numbers[0]), int(numbers[1])))
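A minimal sketch of that approach, assuming the header lines start with # (as the first row of the output suggests) and that every data line holds exactly two whitespace-separated integers. The name parse_edges and the sample text are made up for illustration; in practice you would pass the open file object instead of the StringIO.

```python
import io


def parse_edges(lines):
    """Return a list of (int, int) tuples, skipping blank and '#' header lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header or empty line, nothing to convert
        first, second = line.split()  # split() handles tabs and spaces alike
        edges.append((int(first), int(second)))
    return edges


# Hypothetical sample standing in for new.txt
sample = io.StringIO("# Directed graph\n# FromNodeId\tToNodeId\n3466\t937\n3466\t15931\n")
print(parse_edges(sample))  # [(3466, 937), (3466, 15931)]
```

Checking for the # prefix instead of skipping a fixed number of lines keeps the loop working if the number of header lines changes.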
You can cast a str to an int using int("123"), but I would be careful: int() raises TypeError for unsupported types such as None, and ValueError for strings that are not valid integers, so you need to catch both. I would wrap it in a function that you can call on each iteration:
from typing import Optional

def coerce(value: str) -> Optional[int]:
    try:
        return int(value)
    except (TypeError, ValueError):
        # this is not convertible
        return None
Related
I have to use python to read a file, this file contains combination of characters, numbers and other stuff.
After reading a single line from the file, how do I check whether this line is an integer or a float? (I need to know which of the two the line is.)
I have tried the string methods .isdigit(), .isdecimal(), and .isnumeric(), but they seem to return True only when the string consists entirely of digits.
Is there any method that can help me to do this task?
P.S.: Can't use try or any exception approach.
============== Content of my File =================
0
[Begin Description]
xxx
[End Description]
1.1
[Begin Description]
....
I want to know if the current line I am reading is integer 0 or float 1.1. That makes my question.
I hope this will help
import re
s = "1236.0"
r = re.compile(r'\d')  # any digit, including 0
r2 = re.compile(r'(\.)')
if re.search(r, s) and re.search(r2, s):
    print("Float")
if re.search(r, s) and not re.search(r2, s):
    print("Integer")
You should use try and except:
But if you don't want to use it and need a different way, then use a regex:
if re.match(r"[-+]?\d+(\.0*)?$", s):
    print("match")
For each line in the file you can check with regex whether it is a float or int or normal string
import re
float_match = re.compile(r"^[-+]?[0-9]*[.][0-9]+$")
int_match = re.compile(r"^[-+]?[0-9]+$")
lines = ["\t23\n", "24.5", "-23", "0.23", "-23.56", ".89", "-122", "-abc.cb"]
for line in lines:
    line = line.strip()
    if int_match.match(line):
        print("int")
    elif float_match.match(line):
        print("float")
    else:
        print("str")
Result:
int
float
int
float
float
float
int
str
How it works:
int_match = re.compile("^[-+]?[0-9]+$")
^: at the str beginning
[-+]?: optional + or -
[0-9]+: one or more numbers
$: end of string
float_match = re.compile("^[-+]?[0-9]*[.][0-9]+$")
^[-+]?: start with either + or - optional.
[0-9]*: any number of digits or none.
[.]: dot
[0-9]+: one or more digits
$:end
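As a small sketch, the same two patterns can be wrapped in a helper that uses re.fullmatch, which makes the ^ and $ anchors unnecessary. The name classify is made up for illustration. The last two inputs show what these patterns do not treat as floats: a trailing dot ("5.") and exponent notation ("1e3").

```python
import re

INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?[0-9]*[.][0-9]+")


def classify(line):
    """Return 'int', 'float' or 'str' for one line, without using try/except."""
    text = line.strip()
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return "str"


for raw in ["0", "[Begin Description]", "1.1", "5.", "1e3"]:
    print(repr(raw), classify(raw))
```

If inputs such as "5." or "1e3" should count as floats, the float pattern would need to be extended accordingly.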
This is faster than re.
Although this is not type checking, since you are reading strings such as 0 or 1.1, you can simply do the following (it only works if you already know that the line is numeric):
line = '1.1'
if '.' in line:
    print("float")
else:
    print("int")
Try this :
import re
line1 = '0'
line2 = 'description one'
line3 = '1.1'
line4 = 'begin description'
lines = [line1, line2, line3, line4]  # with readlines() you can get it directly
for i in lines:
    if re.findall(r"[+-]?\d+", i) and not re.findall(r"[+-]?\d+\.\d+", i):
        print('int found')
    elif re.findall(r"[+-]?\d+\.\d+", i):
        print('float found')
    else:
        print('no numeric found')
OUTPUT :
int found
no numeric found
float found
no numeric found
You could split it into parts using .split() and use string methods.
Example code (note that the split argument should be changed to a comma if your floats use a comma instead of a dot as the decimal separator):
def float_checker(strinput):
    parts = strinput.split('.')
    # a float needs exactly one separator with digits on both sides
    if len(parts) != 2:
        return False
    return all(part.isnumeric() for part in parts)

if __name__ == '__main__':
    while True:
        print(float_checker(input('Input for float check (Stop with CTRL+C): ')))
I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and putting it together with a string.
This is the code:
def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break
    return str

def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char
Now this is giving me this error:
TypeError: sequence item 0: expected str instance, int found
I'm also trying the following:
char = unpack('c',f.read(1)).decode("ascii")
But it throws me:
AttributeError: 'tuple' object has no attribute 'decode'
I don't even know how to read the chars and add them to the string. Is there a proper way to do this?
Here's a version that (ab)uses the lesser-known "sentinel" argument of the built-in iter():
with open('file.txt', 'rb') as f:
    val = ''.join(iter(lambda: f.read(1).decode('ascii'), '\x00'))
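One caveat with the sentinel form: at end of file, f.read(1) returns an empty bytes object, which never equals the sentinel, so the loop would not stop if the terminator is missing. Below is a hedged sketch that adds an end-of-file guard. The name read_cstring is made up, and the BytesIO object stands in for a file opened in 'rb' mode.

```python
import io


def read_cstring(f, encoding="ascii"):
    """Read bytes up to (not including) the next null byte or end of file."""
    chunks = []
    # iter(callable, sentinel) calls f.read(1) until it returns b"\x00"
    for byte in iter(lambda: f.read(1), b"\x00"):
        if byte == b"":
            break  # end of file reached without a terminator
        chunks.append(byte)
    return b"".join(chunks).decode(encoding)


stream = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00")
print(read_cstring(stream))  # sword
print(read_cstring(stream))  # Sword_Wea_Dummy
```

The null byte itself is consumed but not returned, so consecutive calls read consecutive strings.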
How about:
myString = myNullTerminatedString.split("\x00")[0]
For example:
myNullTerminatedString = "hello world\x00\x00\x00\x00\x00\x00"
myString = myNullTerminatedString.split("\x00")[0]
print(myString) # "hello world"
This works by splitting the string on the null character. Since the string should terminate at the first null character, we simply grab the first item in the list after splitting. split will return a list of one item if the delimiter doesn't exist, so it still works even if there's no null terminator at all.
It also will work with byte strings:
myByteString = b'hello world\x00'
myStr = myByteString.split(b'\x00')[0].decode('ascii') # "hello world" as normal string
If you're reading from a file, you can do a relatively larger read - estimate how much you'll need to read to find your null string. This is a lot faster than reading byte-by-byte. For example:
resultingStr = b''
extraBytes = 0
while True:
    buf = f.read(512)
    if len(buf) == 0: break  # end of file, no null terminator found
    resultingStr += buf
    if b"\x00" in resultingStr:
        nullIndex = resultingStr.index(b"\x00")
        # number of bytes read from the null byte onwards (null byte included)
        extraBytes = len(resultingStr) - nullIndex
        resultingStr = resultingStr[:nullIndex]
        break
# now "resultingStr" contains the string (as bytes)
f.seek(0 - extraBytes, 1)  # seek backwards by the number of extra bytes, now the pointer will be on the null byte in the file
# or f.seek(1 - extraBytes, 1) to skip the null byte in the file
(edit version 2, added extra way at the end)
Maybe there are some libraries out there that can help you with this, but as I don't know of any, let's attack the problem at hand with what we know.
In Python 2, bytes and strings are basically the same thing. That changes in Python 3, where str is what Python 2 calls unicode and bytes is its own separate type. This means that you don't need to define a readChar function if you are on Python 2, as no extra work is required, so I don't think you need unpack for this particular case. With that in mind, let's define the new readString:
def readString(myfile):
    chars = []
    while True:
        c = myfile.read(1)
        if c == chr(0):
            return "".join(chars)
        chars.append(c)
Just like your code, I read one character at a time, but I save the characters in a list instead. The reason is that strings are immutable, so doing str += char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This excludes the null character; if it is needed, just move the appending...
Now let's test it with your sample file.
For instance, let's try to read "Sword_Wea_Dummy" from it:
with open("sword.blendscn","rb") as archi:
    # let's simulate that some prior processing was made by
    # moving the pointer of the file
    archi.seek(6)
    string = readString(archi)
    print "string repr:", repr(string)
    print "string:", string
    print ""
    # and the rest of the file is there waiting to be processed
    print "rest of the file: ", repr(archi.read())
and this is the output
string repr: 'Sword_Wea_Dummy'
string: Sword_Wea_Dummy
rest of the file: '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
other tests
>>> with open("sword.blendscn","rb") as archi:
        print readString(archi)
        print readString(archi)
        print readString(archi)
sword
Sword_Wea_Dummy
ÍÌÌ=p=Š4:¦6¿JÆ=
>>> with open("sword.blendscn","rb") as archi:
        print repr(readString(archi))
        print repr(readString(archi))
        print repr(readString(archi))
'sword'
'Sword_Wea_Dummy'
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
>>>
Now that I think about it, you mention that the data portion is of fixed size. If that is true for all files, and the structure of all of them is as follows
[unknown size data][known size data]
then that is a pattern we can exploit: we only need to know the size of the file and we can get both parts smoothly, as follows
import os

def getDataPair(filename,knowSize):
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknown = archi.read(size-knowSize)
        know = archi.read()
    return unknown, know
By knowing the size of the data portion (which I got by playing with the prior example), its use is simple:
>>> string_data, data = getDataPair("sword.blendscn", 80)
>>> string_data
'sword\x00Sword_Wea_Dummy\x00'
>>> data
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
>>> string_data.split(chr(0))
['sword', 'Sword_Wea_Dummy', '']
>>>
Now to get each string, a simple split will suffice, and you can pass the rest of the file, contained in data, to the appropriate function to be processed.
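The session above is Python 2. As a rough Python 3 sketch of the same idea: a file opened in "rb" mode yields bytes, so the separator passed to split must be bytes too, and each piece is decoded afterwards. The names get_data_pair and known_size and the temporary file contents are made up for illustration (the 8 zero bytes stand in for the fixed-size data portion).

```python
import os
import tempfile


def get_data_pair(filename, known_size):
    """Split a file into (leading bytes of unknown size, trailing bytes of known size)."""
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknown = archi.read(size - known_size)
        known = archi.read()
    return unknown, known


# Build a small hypothetical file: two null-terminated strings plus 8 data bytes
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sword\x00Sword_Wea_Dummy\x00" + bytes(8))
    path = tmp.name

string_data, data = get_data_pair(path, 8)
os.remove(path)

# bytes.split needs a bytes separator; the empty trailing piece is dropped
names = [part.decode("ascii") for part in string_data.split(b"\x00") if part]
print(names)      # ['sword', 'Sword_Wea_Dummy']
print(len(data))  # 8
```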
Doing file I/O one character at a time is horribly slow.
Instead use readline0, now on pypi: https://pypi.org/project/readline0/ . Or something like it.
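In 3.x, there's a "newline" argument to open, but it doesn't appear to be as flexible as readline0: it only accepts None, '', '\n', '\r', and '\r\n', so a null byte cannot be used as the line terminator.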
Here is my implementation:
import struct

def read_null_str(f):
    r_str = ""
    while True:
        back_offset = f.tell()
        try:
            r_char = struct.unpack("c", f.read(1))[0].decode("utf8")
        except UnicodeDecodeError:
            # not a valid single-byte character: re-read it as a 2-byte little-endian value
            f.seek(back_offset)
            temp_char = struct.unpack("<H", f.read(2))[0]
            r_char = chr(temp_char)
        if ord(r_char) == 0:
            return r_str
        else:
            r_str += r_char
I am asked to "return a list of tuples containing the subset name (as a string) and a list of floating point data values".
My code is:
def load_data(filename):
    fileopen = open(filename)
    result_open=[]
    for line in fileopen:
        answer = (line.strip().split(","))
        result_open.append((answer[0],(answer[1:])))
    return result_open
However, when I run the code, the following appears:
[('Slow Loris', [' 21.72', ' 29.3', ' 20.08', ' 29.98', ' 29.85', ' 26.22', ' 19......)]
Is there any way to change the tuple to appear without the apostrophes? I want it to look like:
[('Slow Loris', [21.72, 29.3, 20.08, 29.98, 29.85, 26.22, 19......)]
line is a string, and line.strip().split(",") is a list of strings. You need to convert the string values into float or Decimal values. One way would be:
result_open.append((answer[0], [float(val) for val in answer[1:]]))
That will raise an exception on values that can't be converted to a float, so you should think about how you want to handle such input.
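One possible way to handle such input, as a sketch only: skip values that cannot be converted. Silently dropping them is an assumption made here for illustration; depending on the assignment you may prefer to raise an error or store a placeholder. The function takes any iterable of lines so that it can be tried without a file, and the "Ocelot" row is made-up sample data.

```python
import io


def load_data(lines):
    """Return [(name, [float, ...]), ...] from comma-separated lines."""
    result = []
    for line in lines:
        fields = line.strip().split(",")
        if not fields[0]:
            continue  # blank line
        name = fields[0].strip()
        values = []
        for raw in fields[1:]:
            try:
                values.append(float(raw))  # float() tolerates surrounding spaces
            except ValueError:
                pass  # assumption: unparseable values are dropped
        result.append((name, values))
    return result


sample = io.StringIO("Slow Loris, 21.72, 29.3, 20.08\nOcelot, 57.51, n/a, 47.59\n")
print(load_data(sample))
# [('Slow Loris', [21.72, 29.3, 20.08]), ('Ocelot', [57.51, 47.59])]
```

With a real file, the call would be load_data(open(filename)), ideally inside a with block so the file gets closed.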
So I am trying to run this piece of code:
reader = list(csv.reader(open('mynew.csv', 'rb'), delimiter='\t'))
print reader[1]
number = [float(s) for s in reader[1]]
Inside reader[1] I have the following values:
'5/1/2013 21:39:00.230', '46.09', '24.76', '0.70', '0.53', '27.92',
I am trying to store each one of values into an array like so:
number[0] = 46.09
number[1] = 24.09
and so on.....
My question is: how would I skip the date and the number following it and just store the legitimate floats? Or how would I store the comma-separated contents in an array?
It throws an error when I try to run the code above:
ValueError: invalid literal for float(): 5/1/2013 21:39:00.230
Thanks!
Just skip values which cannot be converted to float:
number = []
for s in reader[1]:
    try:
        number.append(float(s))
    except ValueError:
        pass
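The code in this thread is Python 2. A hedged Python 3 sketch of the same skip-on-failure idea follows; the helper name floats_only is made up, and a StringIO stands in for the file (with a real file on Python 3 you would open it in text mode with newline='' rather than 'rb').

```python
import csv
import io


def floats_only(row):
    """Keep only the cells of a row that float() can parse."""
    numbers = []
    for cell in row:
        try:
            numbers.append(float(cell))
        except ValueError:
            continue  # e.g. the date/time stamp
    return numbers


sample = io.StringIO("5/1/2013 21:39:00.230\t46.09\t24.76\t0.70\t0.53\t27.92\n")
row = next(csv.reader(sample, delimiter="\t"))
print(floats_only(row))  # [46.09, 24.76, 0.7, 0.53, 27.92]
```

Keep in mind that float() also accepts strings such as "nan", "inf" and "1e5", so those would be kept as well.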
If it's always the first value that's not a float, you can take it out doing:
reader = list(csv.reader(open('mynew.csv', 'rb'), delimiter='\t'))
print reader[1]
number = [float(s) for s in reader[1][1:]]
Or you can search for / in the string and skip the value if it is there, something like this:
my_list_results = []
my_list = ['5/1/2013 21:39:00.230', '46.09', '24.76', '0.70', '0.53', '27.92']
for m in my_list:
    if '/' not in m:  # if we don't find /
        my_list_results.append(m)
print my_list_results
number = []
for s in reader[1]:
    number.append(int(float(s)))
This converts each string to a float and then truncates it to an int. Note that float(s) still raises ValueError for the date string.
I have a file that looks like:
1 1 C C 1.9873 2.347 3.88776
1 2 C Si 4.887 9.009 1.21
I would like to read in the contents of the file, line-by-line. When I only had numbers on the lines I used:
for line in readlines(file):
    data = map(float, line.split())
But this only works when all the elements of line.split are numbers. How can I make it store the letters as strings and the numbers as floats?
$ cat 1.py
def float_or_str(x):
    try:
        return float(x)
    except ValueError:
        return x

line = '1 1 C C 1.9873 2.347 3.88776'
print map(float_or_str, line.split())
$ python 1.py
[1.0, 1.0, 'C', 'C', 1.9873, 2.347, 3.88776]
for line in infile:
    data = [x if x.isalpha() else float(x) for x in line.split()]
There will be issues if your data contains fields that are neither alphabetic nor valid floating-point numbers (for example, "A1"). Your data doesn't seem to have these from what you said, but if it does, the try/except approach suggested by Igor would probably suit better.
I would probably use a more generic function that can be given the types to try, however:
def tryconvert(value, *types):
    for t in types:
        try:
            return t(value)
        except (ValueError, TypeError):
            continue
    return value

for line in infile:
    data = [tryconvert(x, int, float) for x in line.split()]
This will convert anything that can be converted to an integer to an int; failing that, it will try float, and then finally it just gives up and returns the original value, which we know will be a string. (If we didn't know it was a string, we could just stick str on the end of our call to tryconvert().)
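For reference, here is a self-contained run of tryconvert on the first sample line from the question. It also shows that the order of the types matters: with float listed first, "1" becomes 1.0 rather than 1.

```python
def tryconvert(value, *types):
    """Return value converted by the first type that accepts it, else value unchanged."""
    for t in types:
        try:
            return t(value)
        except (ValueError, TypeError):
            continue
    return value


line = "1 1 C C 1.9873 2.347 3.88776"
data = [tryconvert(x, int, float) for x in line.split()]
print(data)  # [1, 1, 'C', 'C', 1.9873, 2.347, 3.88776]

# Order matters: float is tried first here, so int is never reached for "1"
print(tryconvert("1", float, int))  # 1.0
```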
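You can use the methods str.isalpha(), str.isalnum(), and str.isdigit() to decide whether your string is a number or not. Note that str.isdigit() returns False for strings containing a decimal point or a sign, such as "1.9873" or "-1".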
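A rough sketch of how the string methods could be combined without try/except, assuming unsigned numbers written with ASCII digits and a dot as the decimal separator. The helper names are made up. The isascii() check (Python 3.7+) is there because isdigit() alone also returns True for characters such as superscript digits, which int() rejects.

```python
def is_ascii_digits(text):
    """True only for non-empty strings made of the ASCII digits 0-9."""
    return text.isascii() and text.isdigit()


def convert_without_try(token):
    """Return an int, a float, or the original string, using string methods only."""
    if is_ascii_digits(token):
        return int(token)
    whole, dot, frac = token.partition(".")
    if dot and is_ascii_digits(whole) and is_ascii_digits(frac):
        return float(token)
    return token  # letters, signed numbers, "1.", ".5", etc. stay strings


line = "1 2 C Si 4.887 9.009 1.21"
print([convert_without_try(x) for x in line.split()])
# [1, 2, 'C', 'Si', 4.887, 9.009, 1.21]
```

Signed values such as "-3" are left as strings by this sketch; handling them would need an extra check for a leading sign.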