So, I'm having a lot of trouble finding the largest decimal integer in a massive list of strings (1500ish). Here's what I have within a function (to find the max value):
all_data_lines = data.split('\n')
maxvalue = 0.0
for item in all_data_lines:
    temp = item.split(',')[1]
    if float(temp) > maxvalue:
        maxvalue = float(temp)
return maxvalue
The data file is essentially a huge list separated by newlines, with each line then separated by commas. So, I need to compare the second comma-separated element on every line.
This is what I have above. For some reason, I'm getting this error:
in max_temperature
temp = item.split(',')[1];
IndexError: list index out of range
You apparently have lines with no comma on them; perhaps you have empty lines. If you are using data.split('\n'), you are liable to end up with a last, empty value, for example:
>>> '1\n2\n'.split('\n')
['1', '2', '']
>>> '1\n2\n'.splitlines()
['1', '2']
Using str.splitlines(), on the other hand, produces a list without a last empty value.
Rather than split on each line manually, and loop manually, use the csv module and a generator expression:
import csv

def foo(data):
    reader = csv.reader(data.splitlines(), quoting=csv.QUOTE_NONNUMERIC)
    return max(r[1] for r in reader if len(r) > 1)
This delegates splitting to the csv.reader() object, leaving you free to focus on testing for rows with enough elements to have a second column.
The csv.QUOTE_NONNUMERIC option tells the reader to convert values to floats for you, so you don't even have to do that yourself anymore. This, however, works only if all unquoted columns are numeric. If that is not the case and you get ValueErrors instead, you can still do the conversion manually:
def foo(data):
    reader = csv.reader(data.splitlines())
    return max(float(r[1]) for r in reader if len(r) > 1)
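For illustration, here is a quick run of that last version on a made-up two-column string (the dates and readings are placeholders):

data = '2024-01-01,3.5\n2024-01-02,7.25\n\n2024-01-03,1.0\n'
print(foo(data))  # 7.25 -- the blank line and the trailing newline are skipped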
As a part of my code, I need to iterate over a slice of my list (which is obtained from a CSV file) and change the type of those elements from str to int. However, when I iterate through the indices and convert the elements, they don't change their type to int.
I'm confused about why this happens and how I can fix it.
import csv

def generate_order(bom, parts_cost_filename):
    myfile = open(parts_cost_filename, 'r')
    # Retrieving the headers for each of the files: part name and the price
    header = csv.reader(myfile)
    header_list = []
    for line in header:
        header_list += line
        break
    header_list = list(header_list)
    for number in header_list[1:1]:
        number = int(number)
    print(header_list)
    myfile.close()
Your number variable in the for loop is a separate name bound to the value, not a reference into the list, so if you assign a new value to it, it does not change the original list. (Note also that header_list[1:1] is an empty slice, so that loop never runs at all; you probably meant header_list[1:].)
You need to access the list value by indexing it with square brackets.
for index in range(len(some_list)):
    some_list[index] = int(some_list[index])
or better yet, use a more Pythonic list comprehension:
some_list = [int(x) for x in some_list]
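Applied to the question's header row, assuming the intent was to convert everything after the name column (the sample values here are made up):

header_list = ['part_name', '10', '25']  # hypothetical header row
header_list[1:] = [int(x) for x in header_list[1:]]
print(header_list)  # ['part_name', 10, 25]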
I have encountered the below error message:
invalid literal for int() with base 10: '"2"'
The 2 is enclosed by single quotes on the outside and double quotes on the inside. This is what I see when I print primes[0].
Sample data in primes list:
["2","3","5","7"]
The primes list is created from a CSV file via:
primes=csvfile.read().replace('\n',' ').split(',')
I am trying to convert the strings in the primes list into integers.
Via Google I have come across similar questions to mine on SE, and I have tried the two common answers that are relevant to my problem IMO.
Using map():
primes=map(int,primes)
Using list comprehension:
primes=[int(i) for i in primes]
Unfortunately when I use either of them these both give the same error message as listed above. I get a similar error message for long() when used instead of int().
Please advise.
You want:
to read each CSV line,
to create a single list of integers from the flattened version of all lines.
So you have to deal with the quotes (sometimes they may not even be there, depending on how the file was created), and replacing linefeeds with spaces doesn't separate the last number of one line from the first number of the next. You have several issues.
Use the csv module instead. Say f is the handle on the opened file; then:
import csv
nums = [int(x) for row in csv.reader(f) for x in row]
That parses the cells, strips off the quotes if present, and flattens and converts to integers, all in one line.
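As a quick check, with io.StringIO standing in for the open file handle f (Python 3 here; the sample values mirror the question):

import csv
import io

f = io.StringIO('"2","3","5"\n"7","11","13"\n')
nums = [int(x) for row in csv.reader(f) for x in row]
print(nums)  # [2, 3, 5, 7, 11, 13]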
To limit the number of numbers read, you could create a generator comprehension instead of a list comprehension and consume only the n first items:
n = 20000 # number of elements to extract
z = (int(x) for row in csv.reader(f) for x in row)
nums = [next(z) for _ in xrange(n)] # xrange => range for python 3
Even better, to avoid a StopIteration exception you could use itertools.islice instead, so if the csv data runs out early, you still get a list of everything that was there:
import itertools
nums = list(itertools.islice(z, n))
(Note that you have to rewind the file to call this code more than once or you'll get no elements)
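A sketch of that rewind, again with io.StringIO in place of a real file:

import csv
import io
import itertools

f = io.StringIO('"2","3"\n"5","7"\n"11","13"\n')
z = (int(x) for row in csv.reader(f) for x in row)
print(list(itertools.islice(z, 4)))   # [2, 3, 5, 7]

f.seek(0)  # rewind, then build a fresh generator over the same handle
z = (int(x) for row in csv.reader(f) for x in row)
print(list(itertools.islice(z, 10)))  # [2, 3, 5, 7, 11, 13] -- data ran out before 10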
Performing this task without the csv module is of course possible ([int(x.strip('"')) for x in csvfile.read().replace('\n',',').split(',')]) but more complex and error-prone.
You can try this:
primes=csvfile.read().replace('\n',' ').split(',')
final_primes = [int(i[1:-1]) for i in primes]
Try this:

import csv

with open('csv.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
    primes = [int(j) for i in data for j in i]
    print primes
or to avoid duplicates
print set(primes)
I need to read in lines from a text file (I already have done this). The lines are in the same format:
"Name", "number", "number".
I read in the lines and put each line in a separate list, to make a list of lists.
I need to divide the third number by the second number from each line, then store the resulting number as a value in a dictionary, with the "Name" as the key.
for line in f:
    list_words = [line.strip().split(',') for line in f]
That is what I have so far, assuming f is a text file that has already been opened. I'm using Python 3.
You can use a dictionary comprehension:
list_words = [line.strip().split(',') for line in f]
d = {lst[0]: float(lst[2])/float(lst[1]) for lst in list_words}
Note that the list comprehension that creates list_words eliminates the need for the enclosing for loop.
Caveat: A ZeroDivisionError will be raised if one of your divisors has value zero.
An alternative approach is to add new key-value pairs at each iteration of a for loop on list_words:
d = {}
for lst in list_words:
    try:
        d[lst[0]] = float(lst[2])/float(lst[1])
    except ZeroDivisionError:
        pass
Something like
d = {l[0]: float(l[2])/float(l[1]) for l in list_words}
will create a dictionary keyed on the first (i.e. position 0) item.
Notes:
If you've just read the file in, Python will regard the two numbers as strings, hence the need to convert them to floats.
You really should also consider some sort of error checking - e.g. what if there aren't exactly three items on the line? What if the numbers can't be parsed (e.g. what would happen if something in the name field contained a comma?)
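A minimal sketch of such checks, assuming malformed lines should simply be skipped:

d = {}
for lst in list_words:
    if len(lst) != 3:
        continue  # skip lines without exactly three fields (e.g. a comma in the name)
    try:
        d[lst[0]] = float(lst[2])/float(lst[1])
    except (ValueError, ZeroDivisionError):
        pass  # skip unparseable numbers and zero divisors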
I have a huge CSV file where I'm supposed to show only the columns "name" and "runtime".
My problem is that I have to sort the file and print the top 10 minimum and top 10 maximum values from the runtime column.
But the 'runtime' column contains entries like this:
['http://dbpedia.org/ontology/runtime',
'XMLSchema#double',
'http://www.w3.org/2001/XMLSchema#double',
'4140.0',
'5040.0',
'5700.0',
'{5940.0|6600.0}',
'NULL',
'6480.0',....n]
How do I sort the list showing only numbers?
My code so far:
import csv
import urllib

run = []
fp = urllib.urlopen('Film.csv')
reader = csv.DictReader(fp, delimiter=',')
for line in reader:
    if line:
        run.append(line)

name = []
for row in run:
    name.append(row['name'])

runtime = []
for row in run:
    runtime.append(row['runtime'])
runtime
The CSV file contains NULL values and values that look like {5940.0|6600.0}. The expected output is:
'4140.0',
'5040.0',
'5700.0',
'6600.0',
'6800.0',....n]
not containing the NULL values, and keeping only the highest value from entries that look like {5940.0|6600.0}.
You could filter it like this, but you should probably wait for better answers.
>>> l = [1, 1.3, 7, 'text']
>>> [i for i in l if type(i) in (int, float)]  # only ints and floats allowed
[1, 1.3, 7]
This should do though.
My workflow would probably be: use str.isdigit() as a filter, convert to a number with the built-in int() or float(), and then use sort() or sorted().
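A sketch along those lines, with two caveats: str.isdigit() alone rejects values like '4140.0' because of the dot, so a try/except around float() is more robust here, and keeping the highest value from {5940.0|6600.0}-style entries is an assumption based on the question's sample (best_runtime is a made-up helper name):

def best_runtime(value):
    # Strip the braces, split on '|', and keep whatever parses as a float.
    parsed = []
    for part in value.strip('{}').split('|'):
        try:
            parsed.append(float(part))
        except ValueError:
            pass  # drops 'NULL' and other non-numeric entries
    return max(parsed) if parsed else None

cleaned = sorted(v for v in (best_runtime(r) for r in runtime) if v is not None)
print(cleaned[:10])   # ten smallest runtimes
print(cleaned[-10:])  # ten largest runtimes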
While you could use one of the many answers that will show up here, I personally would exploit some domain knowledge of your csv file:
runtime = runtime[3:]
Based on your example values for the runtime column, the first three entries contain metadata. So you know more about the structure of your input file than just "it is a csv file".
Then, all you need to do is sort:
runtime = sorted(runtime)
max_10 = runtime[-10:]
min_10 = runtime[:10]
The syntax I'm using here is called a "slice", which allows you to access a range of a sequence by specifying the start index and the "up-to-but-not-including" index in the square brackets, separated by a colon. Neat trick: negative indexes are counted from the end of the sequence.
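A tiny illustration of those slices:

nums = sorted([5, 2, 9, 1, 7])
print(nums[:2])   # [1, 2] -- the two smallest
print(nums[-2:])  # [7, 9] -- the two largest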
I have a list of floating point numbers in a file in column like this:
123.456
234.567
345.678
How can I generate an output file where each line has the value from the line just above it subtracted from it? For the input file above, the output generated should be:
123.456-123.456
234.567-123.456
345.678-234.567
The first value should return zero, but each other value should have the value just above it subtracted. This is not a homework question; it is a small requirement of my bigger problem and I am stuck at this point. Help much appreciated. Thanks!
This will work:
diffs = [0] + [j - data[i] for i,j in enumerate(data[1:])]
So, assuming data.txt contains:
123.456
234.567
345.678
then
with open('data.txt') as f:
    data = f.readlines()

diffs = [0] + [float(j) - float(data[i]) for i, j in enumerate(data[1:])]
print diffs
will yield
[0, 111.111, 111.11099999999999]
This answer assumes you want to keep the computed values for further processing.
If at some point you want to write these out to a file, line by line:
with open('result.txt', 'w') as outf:
    for i in diffs:
        outf.write('{0:12.5f}\n'.format(i))
Adjust the field widths to suit your needs (right now 12 characters are reserved, 5 after the decimal point); the values are written out to the file result.txt.
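For instance:

print('{0:12.5f}'.format(111.111))  # '   111.11100' -- 12 characters wide, 5 decimals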
UPDATE: Given (from the comments below) that there is possibly too much data to hold in memory, this solution should work. Python 2.6 doesn't allow opening both files in the same with, hence the separate statements.
with open('result2.txt', 'w') as outf:
    outf.write('{0:12.5f}\n'.format(0.0))
    prev_item = 0
    with open('data.txt') as inf:
        for i, item in enumerate(inf):
            item = float(item.strip())
            val = item - prev_item
            if i > 0:
                outf.write('{0:12.5f}\n'.format(val))
            prev_item = item
Has a bit of a feel of a hack. Doesn't create a huge list in memory though.
Given a list of values:
[values[i] - values[i-1] if i > 0 else 0.0 for i in range(len(values))]
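For example, assuming the values are already floats:

values = [123.456, 234.567, 345.678]
diffs = [values[i] - values[i-1] if i > 0 else 0.0 for i in range(len(values))]
print(diffs)  # [0.0, 111.111, 111.11099999999999]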
Instead of list comprehensions or generator expressions, why not write your own generator that can have arbitrarily complex logic and easily operate on enormous data sets?
from itertools import imap

def differences(values):
    yield 0  # the initial 0 you wanted
    iterator = imap(float, values)
    last = iterator.next()
    for value in iterator:
        yield value - last
        last = value

with open('data.txt') as f:
    data = f.readlines()

with open('outfile.txt', 'w') as f:
    for value in differences(data):
        f.write('%s\n' % value)
If data holds just a few values, the benefit wouldn't necessarily be so clear (although the explicitness of the code itself might be nice next year when you have to come back and maintain it). But suppose data was a stream of values from a huge (or infinite!) source and you wanted to process the first thousand values from it:
diffs = differences(enormousdataset)
for count in xrange(1000):
    print diffs.next()
Finally, this plays well with data sources that aren't indexable. Solutions that track index numbers to look up values don't play well with the output of generators.
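For instance, fed from a generator (a hypothetical non-indexable source):

def number_stream():
    # Values arrive one at a time; there is no way to index back into them.
    for n in [123.456, 234.567, 345.678]:
        yield n

for d in differences(number_stream()):
    print d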