How to find min/max values from rows and columns in Python?

I was wondering how can I find minimum and maximum values from a dataset, which is basically a text file. It has 50 rows, 50 columns.
I know I can set up a control loop (for loop to be specific) to have it read each row and column, and determine the min/max values. But, I'm not sure how to do that.
I think the rows and columns need to be converted to list first and then I need to use the split() function. I tried setting something up as follows, but it doesn't seem to work:
for x in range(4,50): # using that range as an example
    x.split()
    max(4,50)
    print x
New to Python. Please excuse my mistakes.

Try something like this:
data = []
with open('data.txt') as f:
    for line in f:                    # loop over the rows
        fields = line.split()         # parse the columns
        rowdata = map(float, fields)  # convert text to numbers
        data.extend(rowdata)          # accumulate the results
print 'Minimum:', min(data)
print 'Maximum:', max(data)
Note that split() takes an optional argument if you want to split on something other than whitespace (commas for example).
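As a sketch of that, assuming a hypothetical comma-separated file data.csv (created here just for illustration), the same accumulation works with split(','):

```python
# create a small comma-separated sample file (made-up data, for illustration only)
with open('data.csv', 'w') as f:
    f.write('3,1,4\n1,5,9\n')

data = []
with open('data.csv') as f:
    for line in f:
        # split on commas instead of whitespace
        data.extend(float(field) for field in line.split(','))

print('Minimum:', min(data))  # Minimum: 1.0
print('Maximum:', max(data))  # Maximum: 9.0
```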

If the file contains a regular (rectangular) matrix, and you know how many lines of header info it contains, then you can skip over the header info and use NumPy to do this particularly easily:
import numpy as np

f = open("file.txt")
# skip over header info, e.g. with one f.readline() per header line
X = np.loadtxt(f)
max_per_col = X.max(axis=0)
max_per_row = X.max(axis=1)
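If the header is a fixed number of lines, loadtxt's skiprows parameter can skip it directly; a small sketch using a made-up two-row file:

```python
import numpy as np

# made-up data file with one header line
with open('file.txt', 'w') as f:
    f.write('header line\n1 2 3\n4 5 6\n')

X = np.loadtxt('file.txt', skiprows=1)  # skip the single header line
max_per_col = X.max(axis=0)  # column-wise maxima
min_per_row = X.min(axis=1)  # row-wise minima
```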

Hmmm...are you sure that homework doesn't apply here? ;) Regardless:
You need to not only split the input lines, but also convert the text values into numbers.
So assuming you've read the input line into in_line, you'd do something like this:
...
row = [float(each) for each in in_line.split()]
rows.append(row) # assuming you have a list called rows
...
Once you have a list of rows, you need to get columns:
...
columns = zip(*rows)
Then you can just iterate through each row and each column calling max():
...
for each in rows:
    print max(each)
for each in columns:
    print max(each)
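To see what zip(*rows) is doing, here is a tiny worked example with made-up numbers:

```python
rows = [[1, 2, 3],
        [4, 5, 6]]

# zip(*rows) pairs the first elements, then the seconds, and so on,
# i.e. it transposes the matrix
columns = list(zip(*rows))

print(columns)                        # [(1, 4), (2, 5), (3, 6)]
print([max(col) for col in columns])  # [4, 5, 6]
```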
Edit: Here's more complete code showing how to open a file, iterate through the lines of the file, close the file, and use the above hints:
in_file = open('thefile.txt', 'r')
rows = []
for in_line in in_file:
    row = [float(each) for each in in_line.split()]
    rows.append(row)
in_file.close() # this'll happen at the end of the script / function / method anyhow
columns = zip(*rows)
for index, row in enumerate(rows):
    print "In row %s, Max = %s, Min = %s" % (index, max(row), min(row))
for index, column in enumerate(columns):
    print "In column %s, Max = %s, Min = %s" % (index, max(column), min(column))
Edit: For new-school goodness, don't use my old, risky file handling. Use the new, safe version:
rows = []
with open('thefile.txt', 'r') as in_file:
    for in_line in in_file:
        row = ....
Now you've got a lot of assurances that you don't accidentally do something bad like leave that file open, even if you throw an exception while reading it. Plus, you can entirely skip in_file.close() without feeling even a little guilty.
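A small sketch of that guarantee (the file name and contents are made up): even when a conversion error escapes the with block, the file still ends up closed.

```python
# made-up file containing a token that float() cannot parse
with open('thefile.txt', 'w') as f:
    f.write('1 2 three\n')

in_file = open('thefile.txt', 'r')  # keep a reference so we can inspect it afterwards
try:
    with in_file:
        values = [float(tok) for line in in_file for tok in line.split()]
except ValueError:
    pass

print(in_file.closed)  # True: closed despite the exception
```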

Will this work for you?
infile = open('my_file.txt', 'r')
file_lines = infile.readlines()
for line in file_lines[6:]:
    items = [int(x) for x in line.split()]
    max_item = max(items)
    min_item = min(items)

Related

Finding the corresponding value in python

I want to find the name of the car which has maximum mpg. I want to print 'Toyota' which has maximum mpg. I want to do this in a Pythonic way. I don't like to use pandas.
Here is my code:
dataset=[]
f= open('auto-mpg-data.csv')
csv_f=csv.reader(f)
for row in csv_f:
    dataset.append(row)
#reading column
mpg=[]
for row in dataset:
    mpg.append(float(row[0]))
a=max(mpg)
for a in dataset:
    print(carname)
Here are a couple of ways to improve your code:
When you are working with files, it's always best to close() your file after working with it, or wrap your snippet of code in a with block. This closes your file automatically.
You are iterating multiple times through the lines in your file, which isn't necessary. There are much more performant approaches to solve your problem.
This code worked for me:
import csv

with open('auto-mpg-data.csv','r') as f:
    csv_f = list(csv.reader(f))

best_mpg = 0
best_row = 0

for i,j in enumerate(csv_f):
    if i == 0:
        continue
    best_mpg = max(best_mpg, float(j[0]))
    if best_mpg == float(j[0]):
        best_row = i

print (csv_f[best_row][3])
# Output:
# 'Toyota'
First, every object that supports iteration can be converted directly to a list using the list function. Hence instead of
for row in csv_f:
    dataset.append(row)
you can do:
dataset = list(csv_f)
Next, since dataset is a list of rows (each row is a list), you can use Python's max function to find the maximum row provided that the key to check against is the float value of the first number on each row, like so:
max_row = max(dataset, key=lambda row: float(row[0]))
max_row holds the row with maximum mpg
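A tiny worked example of max with a key function, using made-up rows in the same shape (mpg first, car name at index 3):

```python
dataset = [['18.0', '8', '130', 'Ford'],
           ['33.5', '4', '70', 'Toyota'],
           ['25.0', '4', '95', 'Honda']]

# the key converts the first field to a number, so rows compare by mpg
max_row = max(dataset, key=lambda row: float(row[0]))

print(max_row[3])  # Toyota
```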
The simplest way:
import csv

with open('auto-mpg-data.csv') as fo:
    reader = csv.reader(fo)
    next(reader)  # skip the header
    biggest_row = max(reader, key=lambda row: float(row[0]))
    print(biggest_row[3])  # or whatever the index is
Note that if your csv contains incorrect data, this will fail. To make it fault-tolerant you would have to write a manual loop over reader instead of using max, and validate each row inside it.
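Such a validating loop might look like the following sketch (the file and its rows are made up for illustration):

```python
import csv

# made-up sample file with one malformed row (for illustration only)
with open('auto-mpg-data.csv', 'w') as f:
    f.write('mpg,cyl,hp,carname\n30,4,90,Toyota\nbad,4,90,Junk\n18,8,150,Ford\n')

best_row = None
with open('auto-mpg-data.csv') as fo:
    reader = csv.reader(fo)
    next(reader)  # skip the header
    for row in reader:
        try:
            mpg = float(row[0])
        except (ValueError, IndexError):
            continue  # skip rows whose mpg field is not a valid number
        if best_row is None or mpg > float(best_row[0]):
            best_row = row

print(best_row[3])  # Toyota
```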
Also if you've already loaded the file then you can use next and max on lists as follows:
reader = iter(dataset)
next(reader) # skip the header
biggest_row = max(reader, key=lambda row: float(row[0]))
You mentioned that you don't like pandas, but, for completeness sake, here is how you could have used pandas.read_csv() to read the CSV file into a dataframe (which is quite convenient when dealing with tabular data) and then get the carname value for the maximum mpg value:
import pandas as pd
df = pd.read_csv('cars.csv', delim_whitespace=True)
print(df.loc[df['mpg'].idxmax()]['carname'])
Prints 'Toyota' for the provided sample CSV.
Using for loop iterator...
>>> mpg = [12,34,40.5,6]
>>> idx,maxMpg = 0,0
>>> for n,v in enumerate(mpg):
... if v>maxMpg: idx,maxMpg = n,v
...
>>> idx
2
>>> maxMpg
40.5
>>> carnames = ['ford','bmw','toyota','bugatti']
>>> carnames[idx]
'toyota'
>>>
Using list comprehensions:...
>>> maxMpg = max(mpg)
>>> maxMpgId = [maxMpg == m for m in mpg]
>>> maxMpgId
[False, False, True, False]
>>> carname = [carnames[n] for n,m in enumerate(mpg) if maxMpg == m]
>>> carname
['toyota']
Nasty one liner...
carname = [carnames[n] for n,m in enumerate(mpg) if max(mpg) == m]

Trouble with sorting a list and "for" statement syntax

I need help sorting a list from a text file. I'm reading a .txt and then adding some data, then sorting it by population change %, then lastly, writing that to a new text file.
The only thing that's giving me trouble now is the sort function. I think the for statement syntax is what's giving me issues -- I'm unsure where in the code I would add the sort statement and how I would apply it to the output of the for loop statement.
The population change data I am trying to sort by is the [1] item in the list.
#Read file into script
NCFile = open("C:\filelocation\NC2010.txt")
#Save a write file
PopulationChange = open("C:\filelocation\Sorted_Population_Change_Output.txt", "w")
#Read everything into lines, except for first(header) row
lines = NCFile.readlines()[1:]
#Pull relevant data and create population change variable
for aLine in lines:
    dataRow = aLine.split(",")
    countyName = dataRow[1]
    population2000 = float(dataRow[6])
    population2010 = float(dataRow[8])
    popChange = ((population2010-population2000)/population2000)*100
    outputRow = countyName + ", %.2f" %popChange + "%\n"
    PopulationChange.write(outputRow)
NCFile.close()
PopulationChange.close()
You can fix your issue with a couple of minor changes. Split the line as you read it in and loop over the sorted lines:
lines = [aLine.split(',') for aLine in NCFile][1:]
#Pull relevant data and create population change variable
for dataRow in sorted(lines, key=lambda row: row[1]):
    population2000 = float(dataRow[6])
    population2010 = float(dataRow[8])
    ...
However, if this is a csv you might want to look into the csv module. In particular DictReader will read in the data as a list of dictionaries based on the header row. I'm making up the field names below but you should get the idea. You'll notice I sort the data based on 'countyName' as it is read in:
from csv import DictReader, DictWriter

with open("C:\filelocation\NC2010.txt") as NCFile:
    reader = DictReader(NCFile)
    data = sorted(reader, key=lambda row: row['countyName'])
    for row in data:
        population2000 = float(row['population2000'])
        population2010 = float(row['population2010'])
        popChange = ((population2010-population2000)/population2000)*100
        row['popChange'] = "{0:.2f}".format(popChange)

with open("C:\filelocation\Sorted_Population_Change_Output.txt", "w") as PopulationChange:
    writer = DictWriter(PopulationChange, fieldnames=['countyName', 'popChange'],
                        extrasaction='ignore')
    writer.writeheader()
    writer.writerows(data)
This will give you a 2 column csv of ['countyName', 'popChange'] (extrasaction='ignore' tells DictWriter to drop the other fields). You would need to adjust these to the correct fieldnames.
You need to read all of the lines in the file before you can sort them. I've created a list called change to hold tuple pairs of the population change and the county name. This list is sorted and then saved.
with open("NC2010.txt") as NCFile:
    lines = NCFile.readlines()[1:]

change = []
for line in lines:
    row = line.split(",")
    county_name = row[1]
    population_2000 = float(row[6])
    population_2010 = float(row[8])
    pop_change = ((population_2010 / population_2000) - 1) * 100
    change.append((pop_change, county_name))

change.sort()

output_rows = ["{0}, {1:.2f}\n".format(pair[1], pair[0]) for pair in change]

with open("Sorted_Population_Change_Output.txt", "w") as PopulationChange:
    PopulationChange.writelines(output_rows)
I used a list comprehension to generate the output rows, which swaps each pair back into the desired order, i.e. county name first.
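The reason change.sort() orders by population change is that tuples compare element by element, so the numeric first item decides the order; a tiny demo with made-up county data:

```python
change = [(2.5, 'Wake'), (-1.0, 'Tyrrell'), (10.3, 'Mecklenburg')]
change.sort()  # tuples compare by their first element, so this sorts by the change value

print(change[0][1])   # Tyrrell: the smallest change
print(change[-1][1])  # Mecklenburg: the largest
```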

Add to Values in An Array in a CSV File

I imported my CSV File and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance if I wanted the value in the 2nd row, 2nd column.
Also how would I go about adding the two values together? Thanks.
import csv
import numpy as np

f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(np.array(row))
f.close()
There is no need to use the csv module.
This code reads a csv file and prints the value of the cell in the second row, second column. I am assuming that fields are separated by commas.
with open("Test.csv") as fo:
    table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
print table[1][1]
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
#!/usr/bin/python
import csv
#import numpy as np

csvaslist = []
f = open("TechCrunchcontinentalUSA.csv")
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()

# Now your data is in a list of lists. Everything past this point is just playing

# Add together a couple of arbitrary values...
print int(csvaslist[2][7]) + int(csvaslist[11][7])

# Add using a conditional...
print "\nNow let's see what Facebook has received..."
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print sublist
        fbsum += int(sublist[7])
print "Facebook has received", fbsum
I've commented lines at a couple points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then recasting as int, float, whatever you need. This is because the csv file has been changed to a list of lists.
To get specific values within your array/file, and add together:
import csv

f = open("Test.csv")
csv_f = list(csv.reader(f))

# returns the value in the second row, second column of your file
print csv_f[1][1]

# returns the sum of two specific values (here, second row, second column
# and first row, first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])
print total

Write last three entries per name in a file

I have the following data in a file:
Sarah,10
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
I would like to keep the last three rows for each person. The output would be:
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
In the example, the first row for Sarah was removed since there where three later rows. The rows in the output also maintain the same order as the rows in the input. How can I do this?
Additional Information
You are all amazing - thank you so much. The final code, which seems to have been deleted from this post, is:
import collections

with open("Class2.txt", mode="r", encoding="utf-8") as fp:
    count = collections.defaultdict(int)
    rev = reversed(fp.readlines())
    rev_out = []
    for line in rev:
        name, value = line.split(',')
        if count[name] >= 3:
            continue
        count[name] += 1
        rev_out.append((name, value))
out = list(reversed(rev_out))
print(out)
Since this looks like csv data, use the csv module to read and write it. As you read each line, store the rows grouped by the first column. Store the line number along with each row so that the output can maintain the same order as the input. Use a bounded deque to keep only the last three rows for each name. Finally, sort the rows and write them out.
import csv
from collections import defaultdict, deque

by_name = defaultdict(lambda: deque(maxlen=3))

with open('my_data.csv') as f_in:
    for i, row in enumerate(csv.reader(f_in)):
        by_name[row[0]].append((i, row))

# sort the rows for each name by line number, discarding the number
rows = [row for _, row in sorted(pair for value in by_name.values() for pair in value)]

with open('out_data.csv', 'w') as f_out:
    csv.writer(f_out).writerows(rows)
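The bounded deque is what does the trimming: once it holds maxlen items, each append silently discards the oldest. A quick demo:

```python
from collections import deque

d = deque(maxlen=3)
for x in [1, 2, 3, 4, 5]:
    d.append(x)  # once full, each append drops the leftmost item

print(list(d))  # [3, 4, 5]
```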

How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... is above, there actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
You will have to split your string into a list of values using split()
So,
lines = text_file.read().split(',')
EDIT:
I didn't realise there would be so much traction to this. Here's a more idiomatic approach.
import csv

with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something
        pass
You can also use numpy loadtxt like
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
So you want to create a list of lists... We need to start with an empty list
list_of_lists = []
next, we read the file content, line by line
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # in alternative, if you need to use the file content as numbers
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is that of columnar data, but our units of storage are the rows of the file, which we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom:
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be speeded up using the csv module, from the standard library. Another third party module is pandas, that lets you automate most aspects of a typical data analysis (but has a number of dependencies).
Update While in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.
If you need indexed access you can use
by_cols = list(zip(*list_of_lists))
that gives you a list of lists in both versions of Python.
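A quick check of the Python 3 behaviour with made-up numbers:

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]

by_cols = list(zip(*list_of_lists))  # materialise the zip object so it is indexable

print(by_cols[0])  # (1, 3, 5): the first column
```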
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
file = open('some_data.csv')
names = get_names(next(file))  # get_names is your own header-parsing helper
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv

with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
Now, you can easily iterate over spamreader (while still inside the with block, so the file is open) like this:
    for row in spamreader:
        print(', '.join(row))
See documentation for more examples.
I'm a bit late, but you can also read the text file into a DataFrame and then convert the corresponding column to a list.
import pandas as pd

lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()
Example:
lista = pd.read_csv('data/holdout.txt', sep=',', header=None)[0].tolist()
Note: the column names of the resulting DataFrame are integers; I chose 0 because I was extracting only the first column.
Better this way,
def txt_to_lst(file_path):
    try:
        with open(file_path, "r") as f:
            lines = f.read().split('\n')
        return lines
    except Exception as e:
        print(e)
