Get Column from .txt File using CSV.Reader in Python - python

I've got a text file that looks like this:
162.8008 EXP Set primary_image image=stimulus/Faces/face046.jpg
162.8008 EXP Set secondary_image image=stimulus/Scenes/scene57.jpg
162.8008 EXP Set primary_image opacity=1.0
162.8008 EXP Set secondary_image opacity=0.0
162.8008 EXP Set stimulus_instr text=press for repeated faces
And I've read it in like this:
import csv
log_data = []
with open('../filename.log.txt', 'rb') as f:
    reader = csv.reader(f, delimiter = '\t')
    for row in reader:
        log_data.append(row)
I want to access JUST that third column. Right now, when I say:
print log_data[2][:]
I'm returned all of a single row, like:
['8.8093', 'EXP', 'Started presenting text_2']
And when I switch and instead do:
print log_data[:][2]
I get the same exact result! I've been trying to convert it to an array with numpy and using a .split() function with no luck. Any expertise would be greatly appreciated - thanks a lot!

How about
print [row[2] for row in log_data]
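For anyone on Python 3, here is a minimal sketch of the same idea. The sample lines are made up to stand in for the log file, and io.StringIO replaces the real open(...) call; with a real file you would open it in text mode with newline='' rather than 'rb'.

```python
import csv
import io

# Hypothetical sample standing in for the log file; fields are tab-separated.
sample = (
    "162.8008\tEXP\tSet primary_image opacity=1.0\n"
    "162.8008\tEXP\tSet secondary_image opacity=0.0\n"
)

# csv.reader accepts any iterable of lines, so StringIO stands in for open(...).
reader = csv.reader(io.StringIO(sample), delimiter="\t")
log_data = list(reader)

# One element per row: the field at index 2, i.e. the third column.
third_column = [row[2] for row in log_data]
print(third_column)
```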

If you use numpy, the following should be OK
#!/usr/bin/env python
import numpy as np
dat = np.genfromtxt('data.txt', delimiter='\t', dtype=str)
print dat[:,2]
Result: ['Set' 'Set' 'Set' 'Set' 'Set']
This post, "How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?", might be of some help.

log_data[2][:] is the equivalent of writing
n = len(log_data[2])
print log_data[2][0:n]
That is, you are telling it to print every element within row 2. If you want to access only the third column of that row, then you need to use
print log_data[2][2]
If you want to loop over the data
for row in log_data:
    # process row
    for col in row:
        # process each column
        pass
The reverse case that you mention, log_data[:][2], first takes a slice containing every row and then prints row 2 of that slice, which is the equivalent of
n = len(log_data)
print log_data[0:n][2]
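To see why both expressions give the same row, here is a small sketch with made-up rows (the values are only illustrative). A full slice of a list of lists copies the outer list, so it never selects a column.

```python
# Hypothetical rows shaped like the parsed log file.
log_data = [
    ["8.8093", "EXP", "Started presenting text_1"],
    ["8.8093", "EXP", "Started presenting text_2"],
    ["162.8008", "EXP", "Set primary_image opacity=1.0"],
]

# [:] on the outer list makes a shallow copy of the list of rows.
all_rows = log_data[:]

row_then_slice = log_data[2][:]   # row 2, then a copy of that row
slice_then_row = log_data[:][2]   # copy of all rows, then row 2

# A column needs one lookup per row.
third_column = [row[2] for row in log_data]
print(third_column)
```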

Numpy is not needed, and for the given data set I don't see why you would choose to use it.
def get_column(n, data):
    return [row[n] for row in data]
print(get_column(2, log_data)) # => ["Set", "Set", "Set", ...]

Related

Python read csv single specific cell

I'm trying to read just a single cell, in order to bring in the date to use elsewhere.
Using pandas I get an error if I try to do this, generally that the dataframe can't be read: it expects a well-formed table, not a single cell value sitting above the actual tabular data that starts far below the first line. How can I get just that cell, i.e. [A2]?
CSV example
Yes, pandas is not very good at files with an inconsistent format (like a varying number of columns). For that purpose, I recommend the csv module from the standard library.
The code below should give you the desired value.
import csv
row_num = 2; col_num = 1 # A2 = (2,1) cell
with open("yourfile.csv") as f:
    reader = csv.reader(f)
    for i in range(row_num):
        row = next(reader)
value = row[col_num-1] # because python index starts at zero
print(value)
Demo (using a list of strings instead of a file):
import csv
row_num = 2; col_num = 1 # A2 = (2,1) cell
input_str = ["a,b", "c,d,e", "f,g,h,i"]
reader = csv.reader(input_str)
for i in range(row_num):
    row = next(reader)
value = row[col_num-1] # because python index starts at zero
print(value)
To be able to access the csv like a two-dimensional object, you first need to convert it to a 2D Python list:
import csv
with open("imdb.csv") as f:
    csv_as_list = list(csv.reader(f, delimiter=","))
print(csv_as_list[3][2])
The format for accessing element is csv_as_list[row_index][column_index].
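If you only need one cell and the file is large, you can stop reading as soon as the row is reached. This is a rough sketch using itertools.islice; read_cell is a helper name I made up, and the StringIO sample stands in for a real file.

```python
import csv
import io
from itertools import islice

def read_cell(lines, row_number, col_number):
    """Return the cell at 1-based (row_number, col_number), or None if absent."""
    reader = csv.reader(lines)
    # islice skips row_number - 1 rows and yields at most one row.
    row = next(islice(reader, row_number - 1, row_number), None)
    if row is None or col_number > len(row):
        return None
    return row[col_number - 1]

# Hypothetical ragged input, like the demo above.
sample = io.StringIO("a,b\nc,d,e\nf,g,h,i\n")
value = read_cell(sample, 2, 1)  # cell A2
print(value)
```

Returning None for a missing row or column is just one possible design; raising an exception would work as well.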

Finding the corresponding value in python

I want to find the name of the car which has maximum mpg. I want to print 'Toyota' which has maximum mpg. I want to do this in a Pythonic way. I don't like to use pandas.
Here is my code:
dataset=[]
f= open('auto-mpg-data.csv')
csv_f=csv.reader(f)
for row in csv_f:
    dataset.append(row)
#reading column
mpg=[]
for row in dataset:
    mpg.append(float(row[0]))
a=max(mpg)
for a in dataset:
    print(carname)
This is my data:
Here are a couple of ways to improve your code:
When you are working with files, it's always best to close() your file after working with it, or wrap your snippet of code in a with block. This closes your file automatically.
You are iterating multiple times through the lines in your file, which isn't necessary. There are much more performant approaches to solve your problem.
This code worked for me:
import csv
with open('auto-mpg-data.csv','r') as f:
    csv_f = list(csv.reader(f))
best_mpg = 0
best_row = 0
for i,j in enumerate(csv_f):
    if i == 0:
        continue
    best_mpg = max(best_mpg, float(j[0]))
    if best_mpg == float(j[0]):
        best_row = i
print (csv_f[best_row][3])
# Output:
# 'Toyota'
First, every object that supports iteration can be converted directly to a list using the list function. Hence instead of
for row in csv_f:
    dataset.append(row)
you can do:
dataset = list(csv_f)
Next, since dataset is a list of rows (each row is a list), you can use Python's max function to find the maximum row provided that the key to check against is the float value of the first number on each row, like so:
max_row = max(dataset, key=lambda row: float(row[0]))
max_row holds the row with maximum mpg
The simplest way:
import csv
with open('auto-mpg-data.csv') as fo:
    reader = csv.reader(fo)
    next(reader) # skip the header
    biggest_row = max(reader, key=lambda row: float(row[0]))
print(biggest_row[3]) # or whatever the index is
Note that if your csv contains invalid data then this will fail, so to make it fault-tolerant you would have to write a manual loop over reader instead of using max, and validate each row inside it.
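A manual loop of that kind could look roughly like this. It is only a sketch: best_row_by_first_column is a name I made up, the sample data is hypothetical, and I am assuming mpg is the first column and the car name the fourth.

```python
import csv
import io

def best_row_by_first_column(rows):
    """Return the row whose first field is the largest float, skipping bad rows."""
    best_row, best_value = None, None
    for row in rows:
        try:
            value = float(row[0])
        except (IndexError, ValueError):
            continue  # empty row or non-numeric field (this also skips a header)
        if best_value is None or value > best_value:
            best_row, best_value = row, value
    return best_row

# Hypothetical data; the column layout (mpg first, name fourth) is assumed.
sample = io.StringIO(
    "mpg,cylinders,weight,carname\n"
    "18.0,8,3504,Chevrolet\n"
    "n/a,4,2100,Unknown\n"
    "\n"
    "32.5,4,2130,Toyota\n"
)
biggest_row = best_row_by_first_column(csv.reader(sample))
print(biggest_row[3])
```

Silently skipping bad rows is a choice; you may prefer to log them or collect them for inspection.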
Also if you've already loaded the file then you can use next and max on lists as follows:
reader = iter(dataset)
next(reader) # skip the header
biggest_row = max(reader, key=lambda row: float(row[0]))
You mentioned that you don't like pandas, but, for completeness' sake, here is how you could have used pandas.read_csv() to read the CSV file into a dataframe (which is quite convenient when dealing with tabular data) and then get the carname value for the maximum mpg value:
import pandas as pd
df = pd.read_csv('cars.csv', sep=r'\s+')
print(df.loc[df['mpg'].idxmax(), 'carname'])
Prints 'Toyota' for the provided sample CSV.
Using for loop iterator...
>>> mpg = [12,34,40.5,6]
>>> idx,maxMpg = 0,0
>>> for n,v in enumerate(mpg):
...     if v>maxMpg: idx,maxMpg = n,v
...
>>> idx
2
>>> maxMpg
40.5
>>> carnames = ['ford','bmw','toyota','bugatti']
>>> carnames[idx]
'toyota'
>>>
Using list comprehensions:...
>>> maxMpg = max(mpg)
>>> maxMpgId = [maxMpg == m for m in mpg]
>>> maxMpgId
[False, False, True, False]
>>> carname = [carnames[n] for n,m in enumerate(mpg) if maxMpg == m]
>>> carname
['toyota']
Nasty one liner...
carname = [carnames[n] for n,m in enumerate(mpg) if max(mpg) == m]
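The one-liner above recomputes max(mpg) for every element. If the two lists are already in memory, a single pass over the pairs is enough. A small sketch with the same toy values:

```python
mpg = [12, 34, 40.5, 6]
carnames = ["ford", "bmw", "toyota", "bugatti"]

# Pair each mpg with its car name and compare on the mpg only,
# so ties are not broken by comparing the names.
best_mpg, best_name = max(zip(mpg, carnames), key=lambda pair: pair[0])
print(best_name, best_mpg)
```

On a tie, max returns the first pair it encountered, unlike the list comprehension, which returns every matching name.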

Python - CSV to Matrix

Can you help me with this problem?
I'm new to programming and want to find out how to create a matrix, which looks like this:
matrix = {"hello":["one","two","three"],
"world": ["five","six","seven"],
"goodbye":["one","two","three"]}
I want to import a csv which has all the strings (one, two, three, ...) in it. I tried the split method, but I'm not getting there...
Another problem is the names of the categories (hello, world, goodbye).
Do you have any suggestions?
Have you looked into the csv module?
https://docs.python.org/2/library/csv.html
import csv
TEST_TEXT = """\
hello,one,two,three
world,four,five,six
goodbye,one,two,three"""
TEST_FILE = TEST_TEXT.split("\n")
#file objects iterate over newlines anyway
#so this is how it would be when opening a file
#this would be the minimum needed to use the csv reader object:
for row in csv.reader(TEST_FILE):
    print(row)
#or to get a list of all the rows you can use this:
as_list = list(csv.reader(TEST_FILE))
#splitting off the first element and using it as the key in a dictionary
dict_I_call_matrix = {row[0]:row[1:] for row in csv.reader(TEST_FILE)}
print(dict_I_call_matrix)
without_csv = [row.split(",") for row in TEST_FILE] #...row in TEST_TEXT.split("\n")]
matrix = [row[1:] for row in without_csv]
labels = [row[0] for row in without_csv]
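With a real file you will probably also meet blank lines, which csv.reader yields as empty lists. Here is a minimal sketch that builds the same kind of dictionary while skipping them; the StringIO object stands in for something like open("categories.csv", newline=""), and that file name is made up.

```python
import csv
import io

# Stand-in for a real file object; note the blank line in the middle.
source = io.StringIO(
    "hello,one,two,three\n"
    "world,four,five,six\n"
    "\n"
    "goodbye,one,two,three\n"
)

matrix = {}
for row in csv.reader(source):
    if not row:           # csv.reader yields [] for blank lines
        continue
    label, *values = row  # first field is the category name
    matrix[label] = values
print(matrix)
```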

Add to Values in An Array in a CSV File

I imported my CSV File and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance if I wanted the value in the 2nd row, 2nd column.
Also how would I go about adding the two values together? Thanks.
import csv
import numpy as np
f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
print(np.array(row))
f.close()
There is no need to use the csv module.
This code reads the csv file and prints the value of the cell in the second row and second column. I am assuming that fields are separated by commas and that no field contains a quoted comma.
with open("Test.csv") as fo:
    table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
print table[1][1]
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
#!/usr/bin/python
import csv
#import numpy as np
csvaslist = []
f = open("TechCrunchcontinentalUSA.csv")
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()
# Now your data is in a list of lists. Everything past this point is just playing
# Add together a couple of arbitrary values...
print int(csvaslist[2][7]) + int(csvaslist[11][7])
# Add using a conditional...
print "\nNow let's see what Facebook has received..."
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print sublist
        fbsum += int(sublist[7])
print "Facebook has received", fbsum
I've commented lines at a couple points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then recasting as int, float, whatever you need. This is because the csv file has been changed to a list of lists.
To get specific values within your array/file, and add together:
import csv
with open("Test.csv") as f:
    csv_f = list(csv.reader(f))
# prints the value in the second row, second column of your file
print csv_f[1][1]
# sum of two specific values (in this example, the value in the second row, second column and the value in the first row, first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])
print total
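Here is a Python 3 sketch of the same steps, converting every cell to a number up front so the arithmetic is straightforward. The 3x3 sample is made up, and I am assuming that every cell is numeric and that there is no header row.

```python
import csv
import io

# Hypothetical numeric CSV standing in for Test.csv.
source = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

# Convert each cell once; float() raises ValueError on non-numeric text.
table = [[float(cell) for cell in row] for row in csv.reader(source)]

second_row_second_col = table[1][1]
total = table[1][1] + table[0][0]          # add two specific cells
column_sum = sum(row[1] for row in table)  # add up the whole second column
print(second_row_second_col, total, column_sum)
```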

How to find min/max values from rows and columns in Python?

I was wondering how can I find minimum and maximum values from a dataset, which is basically a text file. It has 50 rows, 50 columns.
I know I can set up a control loop (for loop to be specific) to have it read each row and column, and determine the min/max values. But, I'm not sure how to do that.
I think the rows and columns need to be converted to list first and then I need to use the split() function. I tried setting something up as follows, but it doesn't seem to work:
for x in range(4,50): # using that range as an example
    x.split()
    max(4,50)
    print x
New to Python. Please excuse my mistakes.
Try something like this:
data = []
with open('data.txt') as f:
    for line in f: # loop over the rows
        fields = line.split() # parse the columns
        rowdata = map(float, fields) # convert text to numbers
        data.extend(rowdata) # accumulate the results
print 'Minimum:', min(data)
print 'Maximum:', max(data)
Note that split() takes an optional argument if you want to split on something other than whitespace (commas for example).
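For example, with comma-separated values the loop could look like this. It is a small sketch with made-up lines in place of a file:

```python
# Hypothetical comma-separated lines standing in for the file contents.
lines = ["1.5, 2.0, -3.25", "4, 0.5, 10"]

data = []
for line in lines:
    fields = line.split(",")  # split on commas instead of whitespace
    # float() ignores leading and trailing whitespace in each field.
    data.extend(float(field) for field in fields)

print("Minimum:", min(data))
print("Maximum:", max(data))
```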
If the file contains a regular (rectangular) matrix, and you know how many lines of header info it contains, then you can skip over the header info and use NumPy to do this particularly easily:
import numpy as np
f = open("file.txt")
# skip over header info
X = np.loadtxt(f)
max_per_col = X.max(axis=0)
max_per_row = X.max(axis=1)
Hmmm...are you sure that homework doesn't apply here? ;) Regardless:
You need to not only split the input lines, you need to convert the text values into numbers.
So assuming you've read the input line into in_line, you'd do something like this:
...
row = [float(each) for each in in_line.split()]
rows.append(row) # assuming you have a list called rows
...
Once you have a list of rows, you need to get columns:
...
columns = zip(*rows)
Then you can just iterate through each row and each column calling max():
...
for each in rows:
    print max(each)
for each in columns:
    print max(each)
Edit: Here's more complete code showing how to open a file, iterate through the lines of the file, close the file, and use the above hints:
in_file = open('thefile.txt', 'r')
rows = []
for in_line in in_file:
    row = [float(each) for each in in_line.split()]
    rows.append(row)
in_file.close() # this'll happen at the end of the script / function / method anyhow
columns = zip(*rows)
for index, row in enumerate(rows):
    print "In row %s, Max = %s, Min = %s" % (index, max(row), min(row))
for index, column in enumerate(columns):
    print "In column %s, Max = %s, Min = %s" % (index, max(column), min(column))
Edit: For new-school goodness, don't use my old, risky file handling. Use the new, safe version:
rows = []
with open('thefile.txt', 'r') as in_file:
    for in_line in in_file:
        row = ....
Now you've got a lot of assurances that you don't accidentally do something bad like leave that file open, even if you throw an exception while reading it. Plus, you can entirely skip in_file.close() without feeling even a little guilty.
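Put together for Python 3, the row and column approach might look like the sketch below. row_and_column_extremes is a name I made up, and the StringIO sample stands in for the file opened in a with block. Note that in Python 3 zip returns an iterator, and it stops at the shortest row, so this assumes a rectangular matrix.

```python
import io

def row_and_column_extremes(lines):
    """Return per-row and per-column (min, max) pairs for whitespace-separated numbers."""
    # Skip blank lines; convert every field to float.
    rows = [[float(field) for field in line.split()] for line in lines if line.strip()]
    columns = list(zip(*rows))  # transpose: one tuple per column
    row_stats = [(min(r), max(r)) for r in rows]
    col_stats = [(min(c), max(c)) for c in columns]
    return row_stats, col_stats

# Hypothetical 2x3 matrix standing in for the 50x50 file.
sample = io.StringIO("1 5 3\n4 2 6\n")
row_stats, col_stats = row_and_column_extremes(sample)
print(row_stats)
print(col_stats)
```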
Will this work for you?
infile = open('my_file.txt', 'r')
file_lines = infile.readlines()
infile.close()
for line in file_lines[6:]: # skip the first six lines
    items = [int(x) for x in line.split()]
    max_item = max(items)
    min_item = min(items)
