Python converting strings in a list to numbers - python

I have encountered the below error message:
invalid literal for int() with base 10: '"2"'
The 2 is enclosed by single quotes on outside, and double quotes on inside. This data is in the primes list from using print primes[0].
Sample data in primes list:
["2","3","5","7"]
The primes list is created from a CSV file via:
primes=csvfile.read().replace('\n',' ').split(',')
I am trying to convert the strings in the primes list into integers.
Via Google I have come across similar questions to mine on SE, and I have tried the two common answers that are relevant to my problem IMO.
Using map():
primes=map(int,primes)
Using list comprehension:
primes=[int(i) for i in primes]
Unfortunately, both give the same error message as listed above. I get a similar error message when I use long() instead of int().
Please advise.

You want to:
read each line of the CSV file
create a single list of integers from the flattened contents of all lines.
So you have to deal with the quotes (depending on how the file was created, they may not even be there). Also, replacing each linefeed with a space does not separate the last number of one line from the first number of the next line, so those two end up in a single string. You have a lot of issues.
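Both problems can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax, where `text` is a made-up stand-in for whatever `csvfile.read()` returns:

```python
# Python 3 sketch; `text` stands in for the result of csvfile.read().
text = '"2","3"\n"5","7"\n'

# The approach from the question: newlines become spaces, then split on commas.
pieces = text.replace('\n', ' ').split(',')
print(pieces)

# Problem 1: the double quotes are part of each string, so int() rejects them.
try:
    int(pieces[0])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Problem 2: the last cell of one line and the first cell of the next are merged.
merged_cell = pieces[1]
print(repr(merged_cell))
```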
Use csv module instead. Say f is the handle on the opened file then:
import csv
nums = [int(x) for row in csv.reader(f) for x in row]
This parses the cells, strips the quotes if present, and flattens and converts to integers, all in one line.
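A self-contained way to try this out, assuming Python 3 and using `io.StringIO` as a stand-in for the opened file `f` (the sample contents are made up):

```python
import csv
import io

# Stand-in for the opened CSV file; io.StringIO behaves like a text file handle.
f = io.StringIO('"2","3"\n"5","7"\n')

# csv.reader yields one list of cells per line, with the quotes already removed.
nums = [int(x) for row in csv.reader(f) for x in row]
print(nums)
```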
To limit the number of numbers read, you could create a generator comprehension instead of a list comprehension and consume only the n first items:
n = 20000 # number of elements to extract
z = (int(x) for row in csv.reader(f) for x in row)
nums = [next(z) for _ in xrange(n)] # xrange => range for python 3
Even better, to avoid a StopIteration exception you can use itertools.islice instead, so that if the CSV data ends early you get everything that was available:
import itertools
nums = list(itertools.islice(z,n))
(Note that you have to rewind the file before running this code again, or you'll get no elements.)
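The rewind can be folded into a small helper. This is only a sketch in Python 3: `first_n_ints` is a hypothetical name, and `io.StringIO` stands in for a seekable file handle.

```python
import csv
import io
import itertools

f = io.StringIO('"2","3"\n"5","7"\n')  # stand-in for the opened file

def first_n_ints(handle, n):
    """Return up to n integers from a CSV handle, reading from its start."""
    handle.seek(0)  # rewind so repeated calls see the whole file
    cells = (int(x) for row in csv.reader(handle) for x in row)
    return list(itertools.islice(cells, n))

first_three = first_n_ints(f, 3)
all_available = first_n_ints(f, 100)  # fewer than 100 cells, no StopIteration
print(first_three, all_available)
```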
Performing this task without the csv module is of course possible ([int(x.strip('"')) for x in csvfile.read().replace('\n',',').split(',')]) but more complex and error-prone.

You can try this:
primes=csvfile.read().replace('\n',' ').split(',')
final_primes = [int(i[1:-1]) for i in primes]

try this:
import csv
with open('csv.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
    primes = [int(j) for i in data for j in i]
print primes
or to avoid duplicates
print set(primes)

Related

How to make a list of numbers from a txt file in python?

I have a text file containing the numbers 52, 2,103,592,2090,34452,0,1, but arranged as one column (under each other). I want to import the numbers into Python and create a list
L=[52,2,103,592,2090,34452,0,1]
The best I have managed to do so far is:
txtfile=open('file.txt')
L=[]
for line in txtfile:
    L.append(line.rstrip())
print(L)
which returns:
L=['52','2','103','592','2090','34452','0','1']
but the ' around the numbers bother me.
Any help is appreciated.
You should use a list comprehension together with the "with" keyword, which makes sure you don't forget to close the file.
with open('test.txt') as f:
    l = [int(line) for line in f]
print(l)
You can use int() to convert string to integer, but I'd also like to emphasize using with keyword for handling files.
L = []
with open('file.txt') as txtfile:
    for line in txtfile:
        L.append(int(line.rstrip()))
Edit: You can also read without a for loop, by using map and split like so:
with open('file.txt') as txtfile:
    L = list(map(int, txtfile.read().split()))
You can convert them to integers using int:
txtfile=open('file.txt')
L=[]
for line in txtfile:
    L.append(int(line.rstrip()))
txtfile.close()
print(L)
[52, 2, 103, 592, 2090, 34452, 0, 1]
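The loops above raise ValueError on a blank line, because int('') is not a valid literal. A minimal Python 3 sketch that skips blank lines follows; `read_ints` is a hypothetical helper name, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile

def read_ints(path):
    """Read one integer per line, skipping blank lines."""
    with open(path) as handle:
        # int() tolerates surrounding whitespace, including the trailing newline.
        return [int(line) for line in handle if line.strip()]

# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('52\n2\n103\n\n592\n')
    sample_path = tmp.name

L = read_ints(sample_path)
os.remove(sample_path)
print(L)
```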
Similar to Asocia's answer, but I would allocate the list at its final length first (this may slightly increase speed, and is arguably a better practice):
txtfile=open('file.txt')
lines = list(txtfile)  # read once; the file iterator is exhausted afterwards
txtfile.close()
L = [0] * len(lines)
for lineIdx, line in enumerate(lines):
    L[lineIdx] = int(line.rstrip())
print(L)
I hope this helps.
try the following:
with open('s.txt') as num:
    numbers = num.read()
n= numbers.split()
lst = []
for x in range(len(n)):
    nu = n[x]
    lst.append(int(nu))
print(lst)
output:
[1, 2, 3, 3, 4, 5, 6]
If you want to convert them to integers, you can use the int(string) function in Python.
txtfile=open('file.txt')
L=[]
for line in txtfile:
    L.append(int(line.rstrip()))
print(L)
According to the official Python documentation, if you want to read all the lines of a file into a list, you can also use list(f) or f.readlines(), which is shorter and "more elegant". Note that this gives a list of strings, each still ending in its newline.
yourList = open("filename.txt").readlines()
Just as a recommendation:
Also, you might want to consider storing the data in a JSON file. The good thing about it is that you can use it to communicate between applications written in other languages.
From the docs:
Python allows you to use the popular data interchange format called JSON (JavaScript Object Notation). The standard module called json can take Python data hierarchies, and convert them to string representations; this process is called serializing. Reconstructing the data from the string representation is called deserializing. Between serializing and deserializing, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine.
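As a rough illustration of the JSON suggestion (Python 3, with the list from the question), the numbers keep their type across a round trip, so no string conversion is needed afterwards:

```python
import json

L = [52, 2, 103, 592, 2090, 34452, 0, 1]

# Serialize the list to a JSON string; json.dump(L, some_file) writes to a file instead.
text = json.dumps(L)

# Deserialize: the numbers come back as ints, not strings.
restored = json.loads(text)
print(text, restored)
```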
try this
txtfile=open('file.txt')
L=[]
for line in txtfile:
    L.append(line.rstrip())
a = L[0].split(',')
print sorted(a, key = int)
It is much better to open the file using with, so that it is closed automatically:
with open('file.txt') as txtfile:
    b = [x for x in txtfile]
c = b[0].split(',')
print sorted(list(map(int, c)))
but the ' around the numbers bother me : Use int(string, base)
It returns the integer value of the string, interpreted as a number in the given base (base 10 by default).
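A few calls show how the optional base argument behaves. The variable names are only illustrative:

```python
decimal_value = int('52')       # base 10 is the default
binary_value = int('1010', 2)   # binary digits
hex_value = int('ff', 16)       # hexadecimal digits
padded_value = int(' 42\n')     # surrounding whitespace is ignored

print(decimal_value, binary_value, hex_value, padded_value)
```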
but arranged as one column (under each other)
which returns L=['52','2','103','592','2090','34452','0','1']
Assumptions:
numbers arranged under each other without commas that returns ['52','2','103','592','2090','34452','0','1'] as per the question
file.txt:
52
2
103
592
2090
34452
0
1
Answer:
imagine file.txt has 1000 numbers.
Approach 1: Use List as mentioned in the question
print([int(i.rstrip()) for i in open('file.txt')])
Size in memory
putting all numbers into a list would take 9032 bytes
Suggestion
Approach 2: Use Generator
print(*(int(i.rstrip()) for i in open('file.txt')))
Size in memory
putting all 1000 numbers into a generator would take just 80 bytes.
A generator is iterable like a list but far more memory efficient, and is written in plain parentheses ()
Cons
Access by index not possible with generators
g = (int(i.rstrip()) for i in open('file.txt'))
print(g[4]) # TypeError: 'generator' object is not subscriptable
Conclusion
If you just want to iterate over the numbers once, the recommended way is to go with a generator, as it is memory efficient. Note that a generator can be consumed only once; to iterate again you have to recreate it.
If you want to access the values by index, you can always convert a generator into a list with list(generator).
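The trade-off can be checked with sys.getsizeof. This is a sketch in Python 3 with made-up data standing in for the lines of file.txt; exact byte counts vary between Python versions, and getsizeof measures only the container, not the integers it refers to.

```python
import sys

lines = [str(n) for n in range(1000)]  # stand-in for the lines of file.txt

as_list = [int(s) for s in lines]
as_gen = (int(s) for s in lines)

list_size = sys.getsizeof(as_list)  # grows with the number of elements
gen_size = sys.getsizeof(as_gen)    # small and constant

first_pass = sum(as_gen)
second_pass = sum(as_gen)  # the generator is already exhausted

print(list_size, gen_size, first_pass, second_pass)
```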
Hope this helps!

Lists - changing strings to integers in a list imported from CSV, not all data

I have the following code in python 3
import csv
import operator
with open('herofull.csv','r') as file:
    reader=csv.reader(file,delimiter=',')
    templist=list(reader)
print(templist)
and the data in the CSV looks like this:
[image: CSV file]
The program imports the data into a list. I then want to change the last 3 items on each row that are now in the list to integers so I can do calculations with them, is this possible? I have tried all sorts with no luck. I can do it with a simple list but this is imported like a list within a list which is making my brain hurt. Please help
Ross
Probably simplest to do a loop, especially if you know it's the last three elements.
for row in templist:
    for i in range(-3, 0):
        row[i] = int(row[i])
This will not create a new list in memory, instead simply changing the existing templist.
Alternatively, if you know that the last three numbers are always going to contain 2 digits or less, you can do the following:
for line in templist:
    for index, element in enumerate(line):
        if element.isdigit() and len(element) <= 2:
            line[index] = int(element)
This will create a new list with your data and convert any strings that are digits into integers.
new_list = []
for row in templist:
    new_list.append(
        [int(i) if i.isdigit() else i for i in row]
    )
print(new_list)
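One caveat with isdigit() is that it returns False for strings such as '-5', so negative numbers would stay as strings. A possible alternative, sketched here in Python 3 with a hypothetical helper name and made-up rows, is to attempt the conversion and fall back to the original cell:

```python
def to_int_if_possible(cell):
    """Return int(cell) when the cell parses as an integer, else the cell unchanged."""
    try:
        return int(cell)
    except ValueError:
        return cell

templist = [['Batman', 'DC', '90', '-5', '7.5']]  # hypothetical rows
new_list = [[to_int_if_possible(c) for c in row] for row in templist]
print(new_list)
```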
cool...cheers
Had a brain wave and got this to work
import csv
with open('herofull.csv','r') as file:
    reader=csv.reader(file,delimiter=',')
    templist=list(reader)
for row in templist:
    row[4]=int(row[4])
    row[5]=int(row[5])
    row[6]=int(row[6])
print(templist)
I can now do calculations and append to the list, thanks for your help. It appears I just needed to stop thinking about it for a while (wood, trees and all that)

Max value in list with decimals.

So, I'm having a lot of trouble finding the largest decimal value in a massive list of strings (1500ish). Here's what I have within a function (to find the max value):
all_data_lines = data.split('\n');
maxvalue = float(0.0);
for item in all_data_lines:
    temp = item.split(',')[1];
    if (float(temp) > maxvalue):
        maxvalue = float(temp);
return maxvalue
The data file is essentially a huge list separated by new lines and then separated by commas. So, I need to compare the second comma-separated element on every line.
This is what I have above. For some reason, I'm having this error:
in max_temperature
temp = item.split(',')[1];
IndexError: list index out of range
You apparently have lines that have no comma on them; perhaps you have empty lines. If you are using data.split('\n') then you are liable to end up with a last, empty value for example:
>>> '1\n2\n'.split('\n')
['1', '2', '']
>>> '1\n2\n'.splitlines()
['1', '2']
Using str.splitlines() on the other hand produces a list without a last empty value.
Rather than split on each line manually, and loop manually, use the csv module and a generator expression:
import csv
def foo(data):
    reader = csv.reader(data.splitlines(), quoting=csv.QUOTE_NONNUMERIC)
    return max(r[1] for r in reader if len(r) > 1)
This delegates splitting to the csv.reader() object, leaving you free to focus on testing for rows with enough elements to have a second column.
The csv.QUOTE_NONNUMERIC option tells the reader to convert values to floats for you so you don't even have to do that anymore either. This, however, works only if all columns without quotes are numeric. If this is not the case and you get ValueErrors instead, you can still do the conversion manually:
def foo(data):
    reader = csv.reader(data.splitlines())
    return max(float(r[1]) for r in reader if len(r) > 1)
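To see the row filter at work, here is a small Python 3 sketch with made-up sample data that contains a blank line. The function name `max_second_column` is hypothetical.

```python
import csv

# Hypothetical sample: the second column is the value; one blank line in the middle.
data = 'mon,21.5\ntue,30.25\n\nwed,9.0\n'

def max_second_column(data):
    reader = csv.reader(data.splitlines())
    # Rows with fewer than two cells (such as blank lines) are skipped.
    return max(float(r[1]) for r in reader if len(r) > 1)

result = max_second_column(data)
print(result)
```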

Writing list to file only 10 times, python

I have a list of lists called sorted_lists
I'm using this to write them into a txt file. Now the thing is, the line below writes ALL the lists. I'm trying to figure out how to write only the first n (n = any number), for example the first 10 lists.
f.write ("\n".join("\t".join (row) for row in sorted_lists)+"\n")
Try the following:
f.write ("\n".join("\t".join (row) for row in sorted_lists[0:N])+"\n")
where N is the number of the first N lists you want to print.
sorted_lists[0:N] will catch the first N lists (counting from 0 to N-1, there are N lists; list[N] is excluded). You could also write sorted_lists[:N] which implicitly means that it will start from the first item of the list (item 0). They are the same, the latter may be considered more elegant.
f.write ('\n'.join('\t'.join(row) for row in sorted_lists[:n])+'\n')
where n is the number of lists to write.
Why not simplify this code and use the right tools:
from itertools import islice
import csv
first10 = islice(sorted_lists, 10)
with open('output.tab', 'wb') as fout:
    tabout = csv.writer(fout, delimiter='\t')
    tabout.writerows(first10)
You should read up on Python's slicing features.
If you want to look at only the first 10 entries of sorted_lists, you could do sorted_lists[0:10].
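The slice and islice approaches select the same rows. The following Python 3 sketch uses made-up data and an in-memory buffer in place of the output file `f`:

```python
import io
from itertools import islice

sorted_lists = [[str(i), str(i * i)] for i in range(25)]  # hypothetical data
n = 10

out = io.StringIO()  # stand-in for the opened output file f
out.write("\n".join("\t".join(row) for row in islice(sorted_lists, n)) + "\n")

written_lines = out.getvalue().splitlines()
print(len(written_lines), written_lines[0], written_lines[-1])
```

islice also works on iterators that do not support slicing, which is the main reason to prefer it over sorted_lists[:n] in more general code.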

How to turn a list of strings into complex numbers in python?

I'm trying to write code which imports and exports lists of complex numbers in Python. So far I'm attempting this using the csv module. I've exported the data to a file using:
spamWriter = csv.writer(open('data.csv', 'wb'))
spamWriter.writerow(complex_data)
Where complex_data is a list of numbers generated by the complex(re,im) function. Ex:
print complex_data
[(37470-880j),(35093-791j),(33920-981j),(28579-789j),(48002-574j),(46607-2317j),(42353-1557j),(45166-2520j),(45594-232j),(41149+561j)]
To then import this at a later time, I try the following:
mycsv = csv.reader(open('data.csv', 'rb'))
out = list(mycsv)
print out
[['(37470-880j)','(35093-791j)','(33920-981j)','(28579-789j)','(48002-574j)','(46607-2317j)','(42353-1557j)','(45166-2520j)','(45594-232j)','(41149+561j)']]
(Note that this is a list of lists, I just happened to use only one row for the example.)
I now need to turn this into complex numbers rather than strings. I think there should be a way to do this with mapping as in this question, but I couldn't figure out how to make it work. Any help would be appreciated!
Alternatively, if there's any easier way to import/export complex-valued data that I don't know of, I'd be happy to try something else entirely.
Just pass the string to complex():
>>> complex('(37470-880j)')
(37470-880j)
Like int() it takes a string representation of a complex number and parses that. You can use map() to do so for a list:
map(complex, row)
>>> c = ['(37470-880j)','(35093-791j)','(33920-981j)']
>>> map(complex, c)
[(37470-880j), (35093-791j), (33920-981j)]
complex_out = []
for row in out:
    comp_row = [complex(x) for x in row]
    complex_out.append(comp_row)
CSV docs say:
Note that complex numbers are written out surrounded by parens. This may cause some problems for other programs which read CSV files (assuming they support complex numbers at all).
This should convert the elements in 'out' from strings to complex numbers. It is the simplest solution given your existing code, and it makes non-complex entries easy to handle.
for i,row in enumerate(out):
    for j,entry in enumerate(row):
        try:
            out[i][j] = complex(entry)
        except ValueError:
            # Print here if you want to know something bad happened
            pass
Otherwise using map(complex, row) on each row takes fewer lines.
for i,row in enumerate(out):
    out[i] = map(complex, row)
I think each method above is a bit complex.
The easiest way is this:
In [1]: complex_num = '-2+3j'
In [2]: complex(complex_num)
Out[2]: (-2+3j)
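Putting the pieces together, a full write and read round trip might look like the sketch below. It assumes Python 3 (the question uses Python 2 file modes) and uses an in-memory buffer in place of data.csv:

```python
import csv
import io

complex_data = [complex(37470, -880), complex(41149, 561)]

buffer = io.StringIO()  # stand-in for data.csv
csv.writer(buffer).writerow(complex_data)  # each value is written via str()

buffer.seek(0)  # rewind before reading back
out = [[complex(cell) for cell in row] for row in csv.reader(buffer)]
print(out)
```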
