I want to find the name of the car with the maximum mpg; for my data that means printing 'Toyota'. I want to do this in a Pythonic way, and I'd rather not use pandas.
Here is my code:
import csv

dataset = []
f = open('auto-mpg-data.csv')
csv_f = csv.reader(f)
for row in csv_f:
    dataset.append(row)

# reading column
mpg = []
for row in dataset:
    mpg.append(float(row[0]))

a = max(mpg)
for a in dataset:
    print(carname)
This is my data:
Here are a couple of ways to improve your code:
When you are working with files, it's always best to close() your file after working with it, or to wrap your snippet of code in a with block, which closes the file automatically.
You are iterating over the lines in your file multiple times, which isn't necessary; there are more performant single-pass approaches to your problem.
This code worked for me:
import csv

with open('auto-mpg-data.csv', 'r') as f:
    csv_f = list(csv.reader(f))

best_mpg = 0
best_row = 0
for i, j in enumerate(csv_f):
    if i == 0:
        continue
    best_mpg = max(best_mpg, float(j[0]))
    if best_mpg == float(j[0]):
        best_row = i

print(csv_f[best_row][3])
# Output:
# 'Toyota'
First, every object that supports iteration can be converted directly to a list using the list function. Hence instead of
for row in csv_f:
    dataset.append(row)
you can do:
dataset = list(csv_f)
Next, since dataset is a list of rows (each row itself a list), you can use Python's max function to find the row with the maximum mpg, using the float value of each row's first field as the key:
max_row = max(dataset, key=lambda row: float(row[0]))
max_row now holds the row with the maximum mpg.
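Putting the two pieces together, here is a minimal sketch with inline sample rows (the data and the column layout are made up for illustration; as in the question, the car name is assumed to be the fourth field):

```python
# Inline sample rows standing in for the CSV contents (hypothetical data)
dataset = [
    ["18.0", "8", "307.0", "Chevrolet"],
    ["33.0", "4", "91.0", "Toyota"],
    ["15.0", "8", "350.0", "Buick"],
]

# The row whose first field has the largest float value
max_row = max(dataset, key=lambda row: float(row[0]))

print(max_row[3])  # -> Toyota
```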
The simplest way:
with open('auto-mpg-data.csv') as fo:
    reader = csv.reader(fo)
    next(reader)  # skip the header
    biggest_row = max(reader, key=lambda row: float(row[0]))

print(biggest_row[3])  # or whatever the index is
Note that if your CSV contains invalid data, this will fail. To make it fault-tolerant, you would have to replace max with a manual loop over reader and validate each row inside it.
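A sketch of what such a fault-tolerant loop might look like, using an inline list of rows in place of the reader (the malformed row and the two-field layout are hypothetical, for illustration only):

```python
rows = [
    ["18.0", "Chevrolet"],
    ["n/a", "Unknown"],   # malformed mpg value; must not crash the loop
    ["33.0", "Toyota"],
]

best_row = None
best_mpg = float("-inf")
for row in rows:
    try:
        mpg = float(row[0])
    except (ValueError, IndexError):
        continue  # skip rows with missing or non-numeric mpg
    if mpg > best_mpg:
        best_mpg, best_row = mpg, row

print(best_row[1])  # -> Toyota
```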
Also if you've already loaded the file then you can use next and max on lists as follows:
reader = iter(dataset)
next(reader) # skip the header
biggest_row = max(reader, key=lambda row: float(row[0]))
You mentioned that you don't like pandas, but, for completeness' sake, here is how you could use pandas.read_csv() to read the CSV file into a dataframe (which is quite convenient when dealing with tabular data) and then get the carname value for the maximum mpg value:
import pandas as pd
df = pd.read_csv('cars.csv', delim_whitespace=True)
print(df.loc[df['mpg'].idxmax()]['carname'])
Prints 'Toyota' for the provided sample CSV.
Using for loop iterator...
>>> mpg = [12,34,40.5,6]
>>> idx,maxMpg = 0,0
>>> for n,v in enumerate(mpg):
...     if v>maxMpg: idx,maxMpg = n,v
...
>>> idx
2
>>> maxMpg
40.5
>>> carnames = ['ford','bmw','toyota','bugatti']
>>> carnames[idx]
'toyota'
>>>
Using list comprehensions:...
>>> maxMpg = max(mpg)
>>> maxMpgId = [maxMpg == m for m in mpg]
>>> maxMpgId
[False, False, True, False]
>>> carname = [carnames[n] for n,m in enumerate(mpg) if maxMpg == m]
>>> carname
['toyota']
Nasty one liner...
carname = [carnames[n] for n,m in enumerate(mpg) if max(mpg) == m]
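One caveat with the one-liner: max(mpg) is re-evaluated for every element of the list. A sketch of two cheaper variants (same hypothetical mpg/carnames lists as above):

```python
mpg = [12, 34, 40.5, 6]
carnames = ['ford', 'bmw', 'toyota', 'bugatti']

# Evaluate max(mpg) once, then match against it
best = max(mpg)
carname = [carnames[n] for n, m in enumerate(mpg) if m == best]
print(carname)  # -> ['toyota']

# Or find the index of the maximum directly
idx = max(range(len(mpg)), key=mpg.__getitem__)
print(carnames[idx])  # -> toyota
```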
I asked this question before, but the answer was not provided as a function. I've tried to put it into a function but it didn't work, so I'm asking again :)
So here is a sample CSV file that I have to analyze
1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
and I'm trying to use this function, def popvote(list), to return the most popular value in the fourth column of each row of the CSV, which in the example above is vehicle.
Explanation down below
This is what I have so far
def popvote(list):
    for x in list:
        g = list(x)
        if x = max(g[x]):
            return x
However, this doesn't really work. What should I change to make sure this works?
Note: The answer should be returned as a set
Explanation: So what I'm trying to return the value that is repeated most in the list based on what's indicated in (** xxxx **) below
1,8dac2b,ewmzr,**jewelry**,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,**furniture**,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,**vehicle**,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,**vehicle**,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f
So in this case, vehicle should be the output.
Raw python approach, using collections.Counter:
import csv
from collections import Counter

def read_categories():
    with open("tmp.csv", "r") as f:
        reader = csv.reader(f)
        for row in reader:
            yield row[3]

counter = Counter(read_categories())
counter.most_common(n=1)
# [('vehicle', 2)]
Raw python only:
import csv

value_to_count = {}
with open("tmp.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        category = row[3]
        if category in value_to_count:
            value_to_count[category] += 1
        else:
            value_to_count[category] = 1

# sorted list of (count, value) pairs
count_to_value = sorted((v, k) for k, v in value_to_count.items())
if count_to_value:
    print("most common", count_to_value[-1])
# most common (2, 'vehicle')
If you find convtools useful, then:
from convtools import conversion as c
from convtools.contrib.tables import Table
rows = Table.from_csv("tmp.csv", header=False).into_iter_rows(tuple)
# this is where code generation happens, it makes sense to store
# the converter in a separate variable for further reuse
converter = c.aggregate(c.ReduceFuncs.Mode(c.item(3))).gen_converter()
converter(rows)
# "vehicle"
As pointed out in the comments, you can use df.mode() and typecast the result to a set.
import pandas as pd

df = pd.read_csv("filename.csv", header=None)
set(df[3].mode())
Out: {'vehicle'}
So I'm trying to find how to open csv files and sort all the details in it...
so an example of data contained in a CSV file is...
2,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,668d39,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,doctor,phone2,9759544365415694736,in,53.001.135.54,weqlhrerreuert6f
and so I'm trying to write a function sortCSV(File) that opens the CSV file and sorts it based on the very first number (0, 1, ...)
so the output should be
1,668d39,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
3,622r49,arqek,doctor,phone2,9759544365415694736,in,53.001.135.54,weqlhrerreuert6f
Here is my code so far, which clearly doesn't work....
import csv

def CSV2List(csvFilename: str):
    f = open(csvFilename)
    q = list(f)
    return q.sort()
What changes should I make to my code to make sure my code works??
Using pandas, set the first column as the index and use sort_index to sort on that index column:
import pandas as pd
file_path = '/data.csv'
df = pd.read_csv(file_path,header=None,index_col=0)
df = df.sort_index()
print(df)
There's a number of ways you could handle this but one of the easiest would be to install Pandas (https://pandas.pydata.org/).
First off, you will most likely need titles for each column, which should be on the first row of your CSV file. When you've added the column titles and installed pandas:
With pandas:
import pandas as pd
dataframe = pd.read_csv(filepath, index_col=0).sort_index()
This sets the first column as the index column and sorts the rows on that index.
Another way I've had to handle CSVs with difficult formatting (e.g. exported from Excel) is by reading the file as a regular text file and then iterating over the rows to handle them on my own.
final_data = []
with open(filepath, "r") as f:
    for row in f:
        # Split the row
        row_data = row.split(",")
        # Add to final data array
        final_data.append(row_data)

# This sorts the final data based on the first column
final_data.sort(key=lambda row: row[0])
# final_data is now a sorted list of the rows of your CSV
Try csv.reader(f). Note that list.sort() sorts in place and returns None, so use sorted() to return the result:
import csv

def CSV2List(csvFilename: str):
    f = open(csvFilename)
    q = csv.reader(f)
    return sorted(q, key=lambda x: x[0])
Using the csv module:
import csv

def csv_to_list(filename: str):
    # use a context manager here
    with open(filename) as fh:
        reader = csv.reader(fh)
        # convert the first item to an int for sorting
        rows = [[int(num), *row] for num, *row in reader]
    # sort the rows based on that value
    return sorted(rows, key=lambda row: row[0])
This is not the best way to deal with CSV files but:
def CSV2List(csvFilename: str):
    f = open(csvFilename, 'r')
    l = []
    for line in f:
        l.append(line.split(','))
    for item in l:
        item[0] = int(item[0])
    return sorted(l)

print(CSV2List('data.csv'))
However, I would probably use pandas instead; it is a great module.
I am new to Python, so I'm trying to read a CSV with 700 lines (including a header) and get a list of the unique values in the first column.
Sample CSV:
SKU;PRICE;SUPPLIER
X100;100;ABC
X100;120;ADD
X101;110;ABV
X102;100;ABC
X102;105;ABV
X100;119;ABG
I used the example here
How to create a list in Python with the unique values of a CSV file?
so I did the following:
import csv

mainlist = []
with open('final_csv.csv', 'r', encoding='utf-8') as csvf:
    rows = csv.reader(csvf, delimiter=";")
    for row in rows:
        if row[0] not in rows:
            mainlist.append(row[0])
print(mainlist)
I noticed in debugging that rows is 1 line, not 700, and I get only ['SKU']. What did I do wrong?
thank you
A solution using pandas. You'll need to call the unique method on the correct column; this returns a pandas Series with the unique values in that column, which you can then convert to a list using the tolist method.
An example on the SKU column below.
import pandas as pd
df = pd.read_csv('final_csv.csv', sep=";")
sku_unique = df['SKU'].unique().tolist()
If you don't know or care about the column name, you can use iloc with the column number. Note that indexing starts at 0:
df.iloc[:,0].unique().tolist()
If the question intends to get only the values occurring once, you can use the value_counts method. This creates a Series with the SKU values as the index and the counts as values; you must then convert the index of the filtered Series to a list in a similar manner. Using the first example:
import pandas as pd
df = pd.read_csv('final_csv.csv', sep=";")
sku_counts = df['SKU'].value_counts()
sku_single_counts = sku_counts[sku_counts == 1].index.tolist()
If you want the unique values of the first column, you could modify your code to use a set instead of a list. Maybe like this:
import collections
import csv

filename = 'final_csv.csv'

sku_list = []
with open(filename, 'r', encoding='utf-8') as f:
    csv_reader = csv.reader(f, delimiter=";")
    for i, row in enumerate(csv_reader):
        if i == 0:
            # skip the header
            continue
        try:
            sku = row[0]
            sku_list.append(sku)
        except IndexError:
            pass

print('All SKUs:')
print(sku_list)

sku_set = set(sku_list)
print('SKUs after removing duplicates:')
print(sku_set)

c = collections.Counter(sku_list)
sku_list_2 = [k for k, v in c.items() if v == 1]
print('SKUs that appear only once:')
print(sku_list_2)

with open('output.csv', 'w') as f:
    for sku in sorted(sku_set):
        f.write('{}\n'.format(sku))
A solution using neither pandas nor csv:
lines = open('file.csv', 'r').read().splitlines()[1:]
col0 = [v.split(';')[0] for v in lines]
uniques = filter(lambda x: col0.count(x) == 1, col0)
or, using map (but less readable):
col0 = list(map(lambda line: line.split(';')[0], open('file.csv', 'r').read().splitlines()[1:]))
uniques = filter(lambda x: col0.count(x) == 1, col0)
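Note that in Python 3, filter returns a lazy iterator, so you may want to wrap it in list(). The same idea can also be sketched with collections.Counter, which avoids the quadratic col0.count() calls (inline sample values stand in for the file's first column):

```python
from collections import Counter

# First-column values as in the sample CSV (hypothetical inline data)
col0 = ['X100', 'X100', 'X101', 'X102', 'X102', 'X100']

counts = Counter(col0)  # one pass to count every value
uniques = [sku for sku in col0 if counts[sku] == 1]
print(uniques)  # -> ['X101']
```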
I've got a text file that looks like this:
162.8008 EXP Set primary_image image=stimulus/Faces/face046.jpg
162.8008 EXP Set secondary_image image=stimulus/Scenes/scene57.jpg
162.8008 EXP Set primary_image opacity=1.0
162.8008 EXP Set secondary_image opacity=0.0
162.8008 EXP Set stimulus_instr text=press for repeated faces
And I've read it in like this:
log_data = []
with open('../filename.log.txt', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        log_data.append(row)
I want to access JUST that third column. Right now, when I say:
print log_data[2][:]
I'm returned all of a single row, like:
['8.8093', 'EXP', 'Started presenting text_2']
And when I switched and instead do:
print log_data[:][2]
I get the same exact result! I've been trying to convert it to an array with numpy and using a .split() function with no luck. Any expertise would be greatly appreciated - thanks a lot!
How about
print [row[2] for row in log_data]
If you use numpy, the following should be OK
#!/usr/bin/env python
import numpy as np
dat = np.genfromtxt('data.txt', delimiter='\t', dtype=str)
print dat[:,2]
Result: ['Set' 'Set' 'Set' 'Set' 'Set']
This post How to use numpy.genfromtxt when first column is string and the remaining columns are numbers? might be of some help.
This is the equivalent of writing
n = len(log_data[2])
print log_data[2][0:n]
That is, you are telling it to print every element within row 2. If you want to access only the third column of row 2, then you need to use
print log_data[2][2]
If you want to loop over the data
for row in log_data:
    # process row
    for col in row:
        # process each column
The reverse case that you mention, log_data[:][2], first takes a copy of the whole list and then prints row 2 of that copy, which is the equivalent of
n = len(log_data)
print log_data[0:n][2]
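You can see this directly: slicing a list with [:] just makes a shallow copy, so indexing the copy lands on the same row as indexing the original (a small sketch with made-up rows):

```python
# Made-up rows shaped like the asker's log data
log_data = [
    ['1.0', 'EXP', 'a'],
    ['2.0', 'EXP', 'b'],
    ['3.0', 'EXP', 'c'],
]

# [:] copies the whole list, so both expressions hit the same row
assert log_data[2][:] == log_data[2]
assert log_data[:][2] == log_data[2]

# What the asker actually wants: the third field of every row
print([row[2] for row in log_data])  # -> ['a', 'b', 'c']
```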
Numpy is not needed, and for the given data set I don't see why you would choose to use it.
def get_column(n, data):
    return [row[n] for row in data]

print(get_column(2, log_data))  # => ["Set", "Set", "Set", ...]
I was wondering how can I find minimum and maximum values from a dataset, which is basically a text file. It has 50 rows, 50 columns.
I know I can set up a control loop (for loop to be specific) to have it read each row and column, and determine the min/max values. But, I'm not sure how to do that.
I think the rows and columns need to be converted to list first and then I need to use the split() function. I tried setting something up as follows, but it doesn't seem to work:
for x in range(4,50):  # using that range as an example
    x.split()
    max(4,50)
    print x
New to Python. Please excuse my mistakes.
Try something like this:
data = []
with open('data.txt') as f:
    for line in f:                    # loop over the rows
        fields = line.split()         # parse the columns
        rowdata = map(float, fields)  # convert text to numbers
        data.extend(rowdata)          # accumulate the results

print 'Minimum:', min(data)
print 'Maximum:', max(data)
Note that split() takes an optional argument if you want to split on something other than whitespace (commas for example).
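For example, splitting on commas instead of whitespace (a minimal sketch with a made-up line):

```python
line = "1.5,2.5,3.5"

fields = line.split(',')             # -> ['1.5', '2.5', '3.5']
numbers = [float(f) for f in fields]  # convert each field to a number

print(min(numbers), max(numbers))  # -> 1.5 3.5
```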
If the file contains a regular (rectangular) matrix, and you know how many lines of header info it contains, then you can skip over the header info and use NumPy to do this particularly easily:
import numpy as np
f = open("file.txt")
# skip over header info
X = np.loadtxt(f)
max_per_col = X.max(axis=0)
max_per_row = X.max(axis=1)
Hmmm...are you sure that homework doesn't apply here? ;) Regardless:
You need to not only split the input lines, you need to convert the text values into numbers.
So assuming you've read the input line into in_line, you'd do something like this:
...
row = [float(each) for each in in_line.split()]
rows.append(row) # assuming you have a list called rows
...
Once you have a list of rows, you need to get columns:
...
columns = zip(*rows)
Then you can just iterate through each row and each column calling max():
...
for each in rows:
    print max(each)
for each in columns:
    print max(each)
Edit: Here's more complete code showing how to open a file, iterate through the lines of the file, close the file, and use the above hints:
in_file = open('thefile.txt', 'r')
rows = []
for in_line in in_file:
    row = [float(each) for each in in_line.split()]
    rows.append(row)
in_file.close()  # this'll happen at the end of the script / function / method anyhow

columns = zip(*rows)
for index, row in enumerate(rows):
    print "In row %s, Max = %s, Min = %s" % (index, max(row), min(row))
for index, column in enumerate(columns):
    print "In column %s, Max = %s, Min = %s" % (index, max(column), min(column))
Edit: For new-school goodness, don't use my old, risky file handling. Use the new, safe version:
rows = []
with open('thefile.txt', 'r') as in_file:
    for in_line in in_file:
        row = ....
Now you've got a lot of assurances that you don't accidentally do something bad like leave that file open, even if you throw an exception while reading it. Plus, you can entirely skip in_file.close() without feeling even a little guilty.
Will this work for you?
infile = open('my_file.txt', 'r')
file_lines = infile.readlines()
for line in file_lines[6:]:
    items = [int(x) for x in line.split()]
    max_item = max(items)
    min_item = min(items)
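Note that this loop overwrites max_item and min_item on every line, so after the loop you only have the extremes of the last line. To get the extremes of the whole file you'd accumulate across lines; a sketch using inline sample lines in place of the file:

```python
# Inline sample lines standing in for the file contents (hypothetical data)
sample_lines = [
    "3 7 1",
    "9 2 5",
    "4 8 6",
]

overall_max = float("-inf")
overall_min = float("inf")
for line in sample_lines:
    items = [int(x) for x in line.split()]
    # fold each line's extremes into the running totals
    overall_max = max(overall_max, max(items))
    overall_min = min(overall_min, min(items))

print(overall_min, overall_max)  # -> 1 9
```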