Load text file python could not convert string to float - python

I have a text file that looks like this:
(1064.2966,1898.787,1064.2986,1898.787,1064.2986,1898.785,1064.2966,1898.785)
(1061.0567,1920.3816,1065.1361,1920.2276,1065.5847,1915.9657,1065.4726,1915.2927,1061.0985,1914.3955,1058.1824,1913.9468,1055.6028,1913.9468,1051.0044,1916.19,1051.5651,1918.8817,1056.0514,1918.9939,1058.9675,1919.6668,1060.8741,1920.4519)
etc (all rows have different lengths)
when I use
np.loadtxt(filename,dtype=float,delimiter=',')
I get
ValueError: could not convert string to float: (1031.4647

I think np.loadtxt expects plain numbers, so it does not know how to convert a value that starts with a '('. You have two choices here. The first is to parse the file yourself:
lines = []
with open('datafile') as infile:
    for line in infile:
        line = line.rstrip('\n')[1:-1]  # drop the newline, then the first and last parentheses
        lines.append([float(v) for v in line.split(',')])
This way you end up with lines, which is a list of lists of values (i.e. lines[0] is the list of values on line 1).
The other way to go is modifying the data file to remove the parentheses, which you can do in many ways depending on the platform you are working on.
In most Linux systems for instance you can just do something along the lines of this answer
EDIT: as suggested by @AlexanderHuszagh in the comments section, different systems can represent newlines in different ways, so a more robust solution would be:
lines = []
with open('datafile') as infile:
    file_lines = infile.read().splitlines()
for line in file_lines:
    lines.append([float(v) for v in line[1:-1].split(',')])
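A minimal sketch of the same idea as a reusable function, assuming every row is wrapped in one pair of parentheses. The name parse_rows and the sample data are made up for illustration, and io.StringIO stands in for the real data file:

```python
import io

def parse_rows(lines):
    """Parse lines like '(1.0,2.5,3.0)' into lists of floats (hypothetical helper)."""
    rows = []
    for line in lines:
        # Drop surrounding whitespace and newlines, then the enclosing parentheses.
        text = line.strip().strip("()")
        if not text:
            continue  # skip blank lines
        rows.append([float(v) for v in text.split(",")])
    return rows

# Stand-in for the data file; rows may have different lengths.
sample = io.StringIO("(1064.2966,1898.787)\n(1061.0567,1920.3816,1065.1361)\n")
rows = parse_rows(sample)
print(rows)
```

Because the rows have different lengths, the result is kept as a list of lists rather than forced into a single rectangular array.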

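You got the error because of the parentheses. You can remove them this way:
with open(filename) as f:
    s = f.read().replace('(', '').replace(')', '')
This returns a list of arrays (in Python 3, map returns an iterator, so it is wrapped in list; blank lines are skipped):
arrays = [np.array(list(map(float, line.split(",")))) for line in s.splitlines() if line]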

Related

Reading an nth line of a textfile in python determined from a list

I have a function gen_rand_index that generates a random group of numbers in list format, such as [3,1] or [3,2,1]
I also have a text file that reads something like this:
red $1
green $5
blue $6
How do I write a function so that once python generates this list of numbers, it automatically reads that # line in the text file? So if it generated [2,1], instead of printing [2,1] I would get "green $5, red $1" aka the second line in the text file and the first line in the text file?
I know that you can do print(line[2]) and commands like that, but this won't work in my case because each time I am getting a different random number of a line that I want to read, it is not a set line I want to read each time.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
I have this so far, but I am getting this
error: invalid literal for int() with base 10: '[4, 1]'
I also have gotten
TypeError: string indices must be integers
but I have tried replacing str with int and many things like that, and I'm thinking the way I'm approaching this is just wrong. Can anyone help me? (I have only been coding for a couple of days now, so I apologize in advance if this question is really basic.)
Okay, let us first get some stuff out of the way.
Whenever you access something from a list, the thing you put inside the square brackets [] must be an integer, e.g. [5]. This tells Python that you want the element at index 5 (the sixth element, since indexing starts at 0). It cannot be ["5"], because "5" in this case would be treated as a string.
Therefore the line row = str(result[gen_rand_index]) should actually just be row = ... without the call to str. This is why you got the TypeError saying that indices must be integers.
Secondly, as per your description, gen_rand_index would return a list of numbers.
So going by that, why don't you try this:
indices_to_pull = gen_rand_index()
with open("Foodinventory.txt", 'r') as file_handle:
    file_contents = file_handle.readlines()  # If the file is small and simple this works fine
answer = []
for index in indices_to_pull:
    answer.append(file_contents[index - 1])
Explanation
We get the indices of the file lines from gen_rand_index.
We read the entire file into memory using readlines().
Then we get the lines we want. Remember to subtract 1, as the list is indexed from 0.
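The steps above can be sketched as a small function. This is only an illustration: pick_lines is a made-up name, io.StringIO stands in for Foodinventory.txt, and the list [2, 1] stands in for whatever gen_rand_index() returns:

```python
import io

def pick_lines(file_obj, line_numbers):
    """Return the lines at the given 1-based line numbers, in the order requested."""
    contents = file_obj.read().splitlines()  # newline characters are dropped
    return [contents[n - 1] for n in line_numbers]

# Stand-in for open("Foodinventory.txt"); the contents are the question's sample.
inventory = io.StringIO("red $1\ngreen $5\nblue $6\n")
picked = pick_lines(inventory, [2, 1])
print(", ".join(picked))
```

With the sample data this prints the second line followed by the first, which is the output the question asks for.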
The error you are getting is because you're trying to index a string variable (line) with a string index (row). Presumably row will contain something like '[2,3,1]'.
However, even if row was a numerical index, you're not indexing what you think you're indexing. The variable line is a string, and it contains (on any given iteration) one line of the file. Indexing this variable will give you a single character. For example, if line contains green $5, then line[2] will yield 'e'.
It looks like your intent is to index into a list of strings, which represent all the lines of the file.
If your file is not overly large, you can read the entire file into a list of lines, and then just index that array:
with open('file.txt') as fp:
    lines = fp.readlines()
print(lines[2])
In this case, lines[2] will yield the string 'blue $6\n'.
To discard the trailing newline, use lines[2].strip() instead.
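To make the difference concrete, here is a small sketch that uses a hard-coded list in place of the result of readlines() (the data is the question's sample; no file is read):

```python
# Sample data standing in for what readlines() returns for the question's file.
lines = ["red $1\n", "green $5\n", "blue $6\n"]

line = lines[1]             # one line of the file, a string
char = line[2]              # indexing a string yields a single character
whole = lines[2].strip()    # indexing the list yields a whole line

try:
    line["2"]               # a string used as an index is rejected
    error = None
except TypeError as exc:
    error = str(exc)

print(repr(char), repr(whole), error)
```

The exact wording of the TypeError message varies between Python versions, but the cause is the same: the index is a string rather than an integer.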
I'll go line by line and raise some issues.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
Are you sure it is gen_rand_index and not gen_rand_index()? If gen_rand_index is a function, you should call the function. In the code you have, you are not calling the function, instead you are using the function directly as an index.
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
The correct python idiom for opening a file and reading line by line is
with open("Foodinventory.txt", "r") as f:
    for line in f:
        ...
This way you do not have to close the file; the with clause does this for you automatically.
Now, what you want to do is to print the lines of the file that correspond to the elements in your variable row. So what you need is an if statement that checks if the line number you just read from the file corresponds to the line number in your array row.
with open("Foodinventory.txt", "r") as f:
    for i, line in enumerate(f):
        if i == row[i]:
            print(line)
But this is wrong: it would work only if your list's elements are ordered. That is not the case in your question. So let's think a little bit. You could iterate over your file multiple times, and each time you iterate over it, print out one line. But this will be inefficient: it will take time O(nm) where n==len(row) and m == number of lines in your file.
A better solution is to read all the lines of the file and save them to an array, then print the corresponding indices from this array:
arr = []
with open("Foodinventory.txt", "r") as f:
    arr = list(f)
for i in row:
    print(arr[i - 1])  # subtract 1 because lists are zero-indexed
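If the file were too large to hold in memory, a single pass that keeps only the requested lines is another option. This is a sketch under that assumption, not part of the answer above; pick_lines_streaming is a made-up name and io.StringIO stands in for the file:

```python
import io

def pick_lines_streaming(file_obj, line_numbers):
    """Read the file once, keeping only the requested 1-based line numbers."""
    wanted = set(line_numbers)
    found = {}
    for number, line in enumerate(file_obj, start=1):
        if number in wanted:
            found[number] = line.rstrip("\n")
    # Return the lines in the order requested; a number past the end of the
    # file raises KeyError here.
    return [found[n] for n in line_numbers]

inventory = io.StringIO("red $1\ngreen $5\nblue $6\n")
selected = pick_lines_streaming(inventory, [3, 1])
print(selected)
```

This takes one pass over the file regardless of how many line numbers are requested, and the requested order does not need to be sorted.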

Getting specific element(s) from after read from file

After I read from file:
with open(fileName) as f:
    for line in f:
        print(line.split(",")) #split the file into multiple lists
How do I get some specific element(s) from those lists?
For example, only elements with index[0 to 3], but discard/ignore any elements after that.
If you want to save the first three items in each line, you could use a list comprehension
with open(fileName) as f:
    firstitems = [line.rstrip().split(",")[0:3] for line in f]
Note that the rstrip() is needed to remove the final newline character, which would otherwise stay attached to the last item when a line has fewer than four items. Note also that the "items" are all strings, even if they look like other types. If you want integers, for example, you will need to convert them to integers.
Then you can print them:
for line in firstitems:
    print(line)
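As a sketch of the conversion mentioned above, assuming every item in the file is a whole number (the sample data is made up, and io.StringIO stands in for the file):

```python
import io

# Stand-in for open(fileName); the second line has fewer than three items.
data = io.StringIO("1,2,3,4,5\n6,7\n")

# Keep at most the first three items of each line and convert them to int.
first_items = [[int(x) for x in line.rstrip().split(",")[:3]] for line in data]
print(first_items)
```

If some items are not numeric, int() raises ValueError, so real data may need a check or a try/except around the conversion.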
Try the code below (note that it prints the first three comma-separated items of the whole file, not of each line):
with open('f.txt') as f:
    print('\n'.join(f.read().split(',')[0:3]))

Find list similarities - set(a).intersection(b) not working on file read line by line

I found code to find the similarities (or differences) of lists on this page: How can I compare two lists in python and return matches
>>> set(a).intersection(b)
set([5])
However, it's not working when I compare a list I made to a list made by reading a file like so:
myvalues = ['a1', '2b', '3c'] # same values found in values.txt, line by line
with open('values.txt', 'r') as f:
    filevalues = f.readlines()
for line in filevalues:
    line = line.strip()
matches = set(myvalues).intersection(filevalues)
print matches
output:
set([])
It DOES work on two slightly different lists I made in the script itself, and DOES work when I compare the filevalues to filevalues. Not sure what I'm missing but I'm guessing the problem has something to do with the types or format of the list that is created by reading the file's lines.
Anyone know how to go about troubleshooting this?
The elements of f.readlines() will be terminated with a \n character, that is why you are getting zero matches.
In response to the comment:
That's what I thought, but I'm even doing this before the comparison: for line in filevalues: line = line.strip()
Your loop does nothing to the lines in filevalues: assigning to line only rebinds the loop variable and leaves the list itself unchanged. Use
filevalues = [x.strip() for x in filevalues]
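A short sketch that shows both behaviours side by side. The list filevalues is hard-coded here to mimic what readlines() returns; the values are made up for illustration:

```python
myvalues = ["a1", "2b", "3c"]
filevalues = ["a1\n", "2b\n", "zz\n"]  # mimics f.readlines()

# Rebinding the loop variable does not modify the list.
for line in filevalues:
    line = line.strip()
before = set(myvalues).intersection(filevalues)

# Building a new list of stripped strings does.
stripped = [x.strip() for x in filevalues]
after = set(myvalues).intersection(stripped)

print(before, after)
```

The first intersection is empty because every element of filevalues still ends with a newline; the second contains the two values that really match.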

Trouble sorting a list with python

I'm somewhat new to python. I'm trying to sort a list of strings and integers. The list contains some symbols that need to be filtered out (e.g. ro!ad should end up as road). Also, the items are all on one line, separated by spaces. I need to use 2 arguments: one for the input file and one for the output file. The output should be sorted with numbers first and then the words without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together, as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
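You may not even need the key function any more, since strings that start with a digit are placed before strings that start with a letter by default. Note that this is a string comparison, though, so '10' sorts before '9'. I don't see anything in your code to remove special characters.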
Since they are all on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read()  # now data = "item 1 item2 etc..."
You can use re to filter out unwanted characters:
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]", "", data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split()  # ["hello", "23frank", "sam", "wilbur"]
print(sorted(my_list))
However, you will need to do more to force the numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers, key=lambda x: int(re.sub("[^0-9]", "", x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
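Putting the pieces from these answers together, a minimal sketch might look like this. It makes assumptions the question does not confirm: every non-alphanumeric character counts as a "special character", and a token counts as a number only if it consists entirely of digits after cleaning. The name sort_tokens and the sample text are made up:

```python
import re

def sort_tokens(text):
    """Clean each whitespace-separated token, then put numbers before words."""
    # Remove everything that is not an ASCII letter or digit (assumed rule).
    cleaned = [re.sub(r"[^0-9A-Za-z]", "", tok) for tok in text.split()]
    cleaned = [tok for tok in cleaned if tok]  # drop tokens that became empty
    numbers = sorted((t for t in cleaned if t.isdigit()), key=int)  # numeric order
    words = sorted(t for t in cleaned if not t.isdigit())           # alphabetical
    return numbers + words

result = sort_tokens("ro!ad 10 app#le 9 ca$r 100")
print("\n".join(result))  # one item per line, as the question asks
```

To write the result to the output file instead of printing it, the joined string could be passed to the write method of a file opened in 'w' mode.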

How to read, in a line, all characters from column A to B

Is it possible in Python, given a file with 10000 lines, where all of them have this structure:
1, 2, xvfrt ert5a fsfs4 df f fdfd56 , 234
or similar, to read the whole string, and then to store in another string all characters from column 7 to column 17, including spaces, so the new string would be
"xvfrt ert5a" ?
Thanks a lot
lst = [line[6:17] for line in open(fname)]
another_list = []
for line in f:  # f is the already opened file object
    another_list.append(line[6:17])
Or as a generator (a memory-friendly solution):
another_list = (line[6:17] for line in f)
I'm going to take Michael Dillon's answer a little further. If by "columns 6 through 17" you mean "the first 11 characters of the third comma-separated field", this is a good opportunity to use the csv module. Also, for Python 2.6 and above it's considered best practice to use the 'with' statement when opening files. Behold:
import csv
with open(filepath, 'rt') as f:
    lst = [row[2][:11] for row in csv.reader(f)]
This will preserve leading whitespace; if you don't want that, change the last line to
lst = [row[2].lstrip()[:11] for row in csv.reader(f)]
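As a quick check of both approaches on the sample line from the question, here is a small sketch in which io.StringIO stands in for the real file:

```python
import csv
import io

sample_line = "1, 2, xvfrt ert5a fsfs4 df f fdfd56 , 234\n"

# Fixed character positions: columns 7 to 17 (1-based) are indices 6 to 16.
by_slice = [line[6:17] for line in io.StringIO(sample_line)]

# Field based: the third comma-separated field, without its leading space.
by_field = [row[2].lstrip()[:11] for row in csv.reader(io.StringIO(sample_line))]

print(by_slice, by_field)
```

Both give the same string for this line, but only the field-based version keeps working if the first two fields grow wider.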
This technically answers the direct question:
lst = [line[6:17] for line in open(fname)]
but there is a fatal flaw. It is OK for throwaway code, but that data looks suspiciously like comma separated values, and the third field may even be space delimited chunks of data. Far better to do it like this so that if the first two columns sprout an extra digit, it will still work:
lst = [x[2].strip()[0:11] for x in [line.split(',') for line in open(fname)]]
And if those space delimited chunks might get longer, then this:
lst = [x[2].strip().split()[0:2] for x in [line.split(',') for line in open(fname)]]
Don't forget a comment or two to explain what is going on. Perhaps:
# on each line, get the 3rd comma-delimited field and break out the
# first two space-separated chunks of the licence key
Assuming, of course, that those are licence keys. No need to be too abstract in comments.
You don't say how you want to store the data from each of the 10,000 lines -- if you want them in a list, you would do something like this:
my_list = []
for line in open(filename):
    my_list.append(line[6:17])  # columns 7 to 17 (1-based) are indices 6 to 16
for l in open("myfile.txt"):
    c7_17 = l[6:17]
    # Not sure what you want to do with c7_17 here, but go for it!
This function will compute the string that you want for each line and print it out:
def readCols(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            newString = line[6:17]
            print(newString)
