Writing a list to a file only 10 times - python

I have a list of lists called sorted_lists.
I'm using the line below to write them into a txt file. The thing is, this line writes ALL the lists. I'm trying to figure out how to write only the first n (n = any number) of them, for example the first 10 lists.
f.write ("\n".join("\t".join (row) for row in sorted_lists)+"\n")

Try the following:
f.write ("\n".join("\t".join (row) for row in sorted_lists[0:N])+"\n")
where N is how many lists you want to write.
sorted_lists[0:N] selects the first N lists (indices 0 through N-1; sorted_lists[N] is excluded). You could also write sorted_lists[:N], which implicitly starts from the first item of the list (index 0). The two forms are equivalent; the latter is usually considered more idiomatic.
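As a quick illustration (a minimal sketch with made-up data, not from the original answer), both slice forms pick out the same first N rows:
# Made-up list of lists, just to show the two slice forms are equivalent.
sorted_lists = [["a", "1"], ["b", "2"], ["c", "3"], ["d", "4"]]
N = 2
assert sorted_lists[0:N] == sorted_lists[:N] == [["a", "1"], ["b", "2"]]

with open("output.txt", "w") as f:
    f.write("\n".join("\t".join(row) for row in sorted_lists[:N]) + "\n")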

f.write ('\n'.join('\t'.join(row) for row in sorted_lists[:n])+'\n')
where n is the number of lists you want to write (note that sorted_lists[:n+1] would give you n+1 lists, not n).

Why not simplify this code and use the right tools:
from itertools import islice
import csv
first10 = islice(sorted_lists, 10)
with open('output.tab', 'wb') as fout:
    tabout = csv.writer(fout, delimiter='\t')
    tabout.writerows(first10)
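The snippet above targets Python 2, where csv output files are opened in binary mode ('wb'). If you happen to be on Python 3 (an assumption on my part; the original answer doesn't say), the only change is how the file is opened. A minimal sketch:
import csv
from itertools import islice

# Python 3: open in text mode with newline='' so the csv module controls line endings.
with open('output.tab', 'w', newline='') as fout:
    tabout = csv.writer(fout, delimiter='\t')
    tabout.writerows(islice(sorted_lists, 10))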

You should read up on Python's slicing features.
If you want to look at only the first 10 entries of sorted_lists, you can do sorted_lists[0:10].

Related

Python converting strings in a list to numbers

I have encountered the error message below:
invalid literal for int() with base 10: '"2"'
The 2 is enclosed by single quotes on the outside and double quotes on the inside. This is what print primes[0] shows for the data in the primes list.
Sample data in primes list:
["2","3","5","7"]
The primes list is created from a CSV file via:
primes=csvfile.read().replace('\n',' ').split(',')
I am trying to convert the strings in the primes list into integers.
Via Google I have come across similar questions to mine on SE, and I have tried the two common answers that are relevant to my problem IMO.
Using map():
primes=map(int,primes)
Using list comprehension:
primes=[int(i) for i in primes]
Unfortunately, both of them give the same error message as listed above. I get a similar error message when I use long() instead of int().
Please advise.
You want:
to read each CSV line,
to create a single flat list of integers from all the lines.
So you have to deal with the quotes (sometimes they may not even be there, depending on how the file was created), and replacing the linefeed with a space means the last number of one line and the first number of the next line end up in the same comma-separated chunk. You have a lot of issues.
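A quick illustration of that second problem, using made-up two-line data:
# Replacing the newline with a space fuses the last cell of line 1 with the first cell of line 2.
raw = '"2","3"\n"5","7"'
print(raw.replace('\n', ' ').split(','))   # ['"2"', '"3" "5"', '"7"']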
Use the csv module instead. Say f is the handle of the opened file; then:
import csv
nums = [int(x) for row in csv.reader(f) for x in row]
That parses the cells, strips off the quotes if present, flattens the rows, and converts everything to integers, in one line.
To limit how many numbers are read, you could use a generator expression instead of a list comprehension and consume only the first n items:
n = 20000 # number of elements to extract
z = (int(x) for row in csv.reader(f) for x in row)
nums = [next(z) for _ in xrange(n)] # xrange => range for python 3
Even better, to avoid a StopIteration exception you could use itertools.islice instead, so if the CSV data runs out before n items, you simply get everything that was there:
nums = list(itertools.islice(z, n))
(Note that you have to rewind the file before calling this code a second time, or you'll get no elements.)
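For example (a minimal sketch, assuming f is the same already-open file handle used above):
import csv
import itertools

n = 20000
f.seek(0)                                     # rewind before reading the same handle again
z = (int(x) for row in csv.reader(f) for x in row)
nums = list(itertools.islice(z, n))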
Performing this task without the csv module is of course possible ([int(x.strip('"')) for x in csvfile.read().replace('\n',',').split(',')]) but more complex and error-prone.
You can try this:
primes=csvfile.read().replace('\n',' ').split(',')
final_primes = [int(i[1:-1]) for i in primes]
try this:
import csv
with open('csv.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
    primes = [int(j) for i in data for j in i]
print primes
or to avoid duplicates
print set(primes)

Lists - changing strings to integers in a list imported from CSV, not all data

I have the following code in python 3
import csv
import operator
with open('herofull.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    templist = list(reader)
print(templist)
and the data in the CSV looks like this:
[screenshot of the CSV file]
The program imports the data into a list. I then want to change the last 3 items of each row, which are now in the list, to integers so I can do calculations with them; is this possible? I have tried all sorts with no luck. I can do it with a simple list, but this is imported as a list within a list, which is making my brain hurt. Please help.
Ross
Probably simplest to do a loop, especially if you know it's the last three elements.
for row in templist:
    for i in range(-3, 0):
        row[i] = int(row[i])
This does not create a new list in memory; it simply changes the existing templist in place.
Alternatively, if you know that the last three numbers are always going to contain 2 digits or less, you can do the following:
for line in templist:
    for index, element in enumerate(line):
        if element.isdigit() and len(element) <= 2:
            line[index] = int(element)
This will create a new list with your data and convert any strings that are digits into integers.
new_list = []
for row in templist:
    new_list.append(
        [int(i) if i.isdigit() else i for i in row]
    )
print(new_list)
cool...cheers
Had a brain wave and got this to work
import csv
with open('herofull.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    templist = list(reader)
for row in templist:
    row[4] = int(row[4])
    row[5] = int(row[5])
    row[6] = int(row[6])
print(templist)
I can now do calculations and add to the list, thanks for your help. It appears I just needed to stop thinking about it for a while (wood, trees and all that).

Using Python to loop contents of a CSV

In Python I want to list out each number inside a simple CSV...
CSV:
07555555555, 07555555551
This is what I have tried:
for number in csv.reader(instance.data_file.read().splitlines()):
    print(number)
However, this outputs the whole line as a single list, like this...
['07446164630', '07755555555']
Why?
I have also tried to loop like this
for i, item in enumerate(csv.reader(instance.data_file.read().splitlines())):
    print(i)
    print(item)
I'm not sure I fully understand what I'm doing wrong so any help in explaining how to print each number in the file would be amazing.
csv.reader parses each line of a CSV, so your loop is iterating over the lines of the CSV file. Since both numbers are on one line, you get them back as one list. If you want to iterate over the values within each line, use a second, nested for loop:
for line in csv.reader(instance.data_file.read().splitlines()):
    for item in line:
        number = int(item)
        print(number)  # or whatever you want
Or using enumerate to get the indices of each number:
for line in csv.reader(instance.data_file.read().splitlines()):
    for index, item in enumerate(line):
        number = int(item)
        print(index, number)  # or whatever you want
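One caveat (my addition, not from the answers above): int('07555555555') drops the leading zero and gives 7555555555, so if these really are phone numbers you may want to keep them as strings. A minimal sketch, reusing instance.data_file and the csv import from the question:
# Keep each value as a string so the leading zero survives; strip() removes the
# space that follows the comma in the sample data.
for line in csv.reader(instance.data_file.read().splitlines()):
    for item in line:
        print(item.strip())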
Use numpy's flatten() method to convert the 2D array to a 1D array:
import numpy as np
data = np.loadtxt(file_name, delimiter=',').flatten()
for item in data:
    print(item)
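One caveat worth noting (my addition): np.loadtxt parses fields as floats by default, so a value like 07555555555 comes back as 7555555555.0 and loses its leading zero. If the values are phone-number-like strings, a sketch that keeps them as text:
import numpy as np

# dtype=str keeps every field as a string, so leading zeros survive.
data = np.loadtxt(file_name, delimiter=',', dtype=str).flatten()
for item in data:
    print(item)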

Max value in list with decimals.

So, I'm having a lot of trouble finding the largest decimal value in a massive list of strings (around 1500 of them). Here's what I have inside a function (to find the max value):
all_data_lines = data.split('\n');
maxvalue = float(0.0);
for item in all_data_lines:
    temp = item.split(',')[1];
    if (float(temp) > maxvalue):
        maxvalue = float(temp);
return maxvalue
The data file is essentially a huge list separated by new lines and then separated by commas, so I need to compare the second comma-separated element on every line.
This is what I have above. For some reason, I'm getting this error:
in max_temperature
temp = item.split(',')[1];
IndexError: list index out of range
You apparently have lines that have no comma on them; perhaps you have empty lines. If you use data.split('\n'), you are liable to end up with a last, empty value, for example:
>>> '1\n2\n'.split('\n')
['1', '2', '']
>>> '1\n2\n'.splitlines()
['1', '2']
Using str.splitlines() on the other hand produces a list without a last empty value.
Rather than splitting each line manually and looping manually, use the csv module and a generator expression:
import csv
def foo(data):
    reader = csv.reader(data.splitlines(), quoting=csv.QUOTE_NONNUMERIC)
    return max(r[1] for r in reader if len(r) > 1)
This delegates splitting to the csv.reader() object, leaving you free to focus on testing for rows with enough elements to have a second column.
The csv.QUOTE_NONNUMERIC option tells the reader to convert values to floats for you, so you don't have to do that conversion yourself. This, however, only works if every unquoted column is numeric. If that's not the case and you get ValueErrors instead, you can still do the conversion manually:
def foo(data):
    reader = csv.reader(data.splitlines())
    return max(float(r[1]) for r in reader if len(r) > 1)
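For example, with this second (manual conversion) version of foo and some made-up data:
data = "a,1.5,x\nb,27.3,y\n\nc,4.0,z"
print(foo(data))   # 27.3 -- the blank line is skipped by the len(r) > 1 check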

Difference of consecutive float numbers in a column

I have a list of floating point numbers in a file in column like this:
123.456
234.567
345.678
How can I generate an output file where each value has the value on the line just above it subtracted from it? For the input file above, the output should be computed as:
123.456-123.456
234.567-123.456
345.678-234.567
The first value should come out as zero, and every other value should have the value just above it subtracted from it. This is not a homework question; it's a small requirement of my bigger problem, and I am stuck at this point. Help much appreciated. Thanks!
This will work:
diffs = [0] + [j - data[i] for i,j in enumerate(data[1:])]
So, assuming data.txt contains:
123.456
234.567
345.678
then
with open('data.txt') as f:
    data = f.readlines()
diffs = [0] + [float(j) - float(data[i]) for i,j in enumerate(data[1:])]
print diffs
will yield
[0, 111.111, 111.11099999999999]
This answer assumes you want to keep the computed values for further processing.
If at some point you want to write these out to a file, line by line:
with open('result.txt', 'w') as outf:
    for i in diffs:
        outf.write('{0:12.5f}\n'.format(i))
Adjust the field widths to suit your needs (right now 12 characters are reserved, 5 of them after the decimal point); the output goes to the file result.txt.
UPDATE: Given (from the comments below) that there is possibly too much data to hold in memory, this solution should work. Python 2.6 doesn't allow opening both files in the same with statement, hence the separate statements.
with open('result2.txt', 'w') as outf:
    outf.write('{0:12.5f}\n'.format(0.0))
    prev_item = 0.0
    with open('data.txt') as inf:
        for i, item in enumerate(inf):
            item = float(item.strip())
            val = item - prev_item
            if i > 0:
                outf.write('{0:12.5f}\n'.format(val))
            prev_item = item
It has a bit of a hacky feel, but it doesn't create a huge list in memory.
Given a list of values:
[values[i] - values[i-1] if i > 0 else 0.0 for i in range(len(values))]
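An equivalent formulation (my addition, not part of the original answer) pairs each value with its predecessor using zip:
values = [123.456, 234.567, 345.678]           # made-up sample data
diffs = [0.0] + [b - a for a, b in zip(values, values[1:])]
print(diffs)                                    # [0.0, 111.111, 111.11099999999999]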
Instead of list comprehensions or generator expressions, why not write your own generator that can have arbitrarily complex logic and easily operate on enormous data sets?
from itertools import imap
def differences(values):
    yield 0  # The initial 0 you wanted
    iterator = imap(float, values)
    last = iterator.next()
    for value in iterator:
        yield value - last
        last = value
with open('data.txt') as f:
    data = f.readlines()
with open('outfile.txt', 'w') as f:
    for value in differences(data):
        f.write('%s\n' % value)
If data holds just a few values, the benefit wouldn't necessarily be so clear (although the explicitness of the code itself might be nice next year when you have to come back and maintain it). But suppose data was a stream of values from a huge (or infinite!) source and you wanted to process the first thousand values from it:
diffs = differences(enormousdataset)
for count in xrange(1000):
    print diffs.next()
Finally, this plays well with data sources that aren't indexable. Solutions that track index numbers to look up values don't play well with the output of generators.
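For instance (a sketch of my own, not from the original answer), a file object is itself a non-indexable iterator of lines, so it can be fed to differences() directly, without calling readlines():
# Python 2 sketch: stream the file straight through the generator.
with open('data.txt') as f:
    for value in differences(f):
        print value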
