How to find a specific row with the Python csv module

I need to get the third row of a CSV file, from column 4 to the end. How would I do that? I know I can get the values from the 4th column onward with
row[3:]
but how do I get specifically the third row?

You could convert the csv reader object into a list of lists. The rows are stored in a list, and each row is itself a list of its columns.
So:
import csv
with open('yourfile.csv', newline='') as file:
    csvr = list(csv.reader(file))
csvr[2]       # the 3rd row
csvr[2][3]    # the 4th column on the 3rd row
csvr[2][3:]   # the 3rd row, from the 4th column to the end
csvr[-4][-3]  # the 3rd column from the right on the 4th row from the end
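For the question's exact case (3rd row, 4th column to the end), slicing the row does it. A minimal sketch, assuming a small in-memory sample; the io.StringIO data below is hypothetical and stands in for a real file:

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk.
sample = io.StringIO(
    "a,b,c,d,e,f\n"
    "g,h,i,j,k,l\n"
    "m,n,o,p,q,r\n"
    "s,t,u,v,w,x\n"
)

rows = list(csv.reader(sample))    # list of lists: rows[row_index][col_index]
third_row_from_col4 = rows[2][3:]  # 3rd row (index 2), 4th column (index 3) to the end
print(third_row_from_col4)         # ['p', 'q', 'r']
```

Because list() reads the whole file into memory, this suits small to medium files.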

You could keep a counter for counting the number of rows:
import csv
with open('yourfile.csv', newline='') as f:
    reader = csv.reader(f)
    counter = 1
    for row in reader:
        if counter == 3:
            print('Interested in third row')
        counter += 1
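The counter can also come from enumerate, which avoids updating it by hand and lets the loop stop as soon as the row is found. A sketch; nth_row is a hypothetical helper name and the sample data is made up:

```python
import csv
import io

def nth_row(f, n):
    """Return the n-th row (1-based) of a CSV file object, or None if it has fewer rows."""
    for line_no, row in enumerate(csv.reader(f), start=1):
        if line_no == n:
            return row  # stop reading as soon as we have it
    return None

sample = io.StringIO("r1c1,r1c2\nr2c1,r2c2\nr3c1,r3c2\n")
print(nth_row(sample, 3))  # ['r3c1', 'r3c2']
```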

You could use itertools.islice to extract the row of data you want, then index into it.
Note that the rows and columns are numbered from zero, not one.
import csv
from itertools import islice

def get_row_col(csv_filename, row, col):
    # Python 3: open in text mode with newline='' ('rb' was the Python 2 idiom)
    with open(csv_filename, newline='') as f:
        return next(islice(csv.reader(f), row, row + 1))[col]
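A variant of the same islice idea that returns everything from a given column onward and gives None, instead of raising StopIteration, when the file has fewer rows. row_from_col is a hypothetical name, the sample is made up, and indices are 0-based as above:

```python
import csv
import io
from itertools import islice

def row_from_col(f, row, col):
    """Return columns col..end of row `row` (both 0-based), or None if the file is too short."""
    # islice skips the first `row` rows lazily; next() with a default avoids StopIteration.
    found = next(islice(csv.reader(f), row, row + 1), None)
    return None if found is None else found[col:]

sample = io.StringIO("1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n")
print(row_from_col(sample, 2, 3))  # ['14', '15']
```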

This is a very basic piece of code that will do the job, and you can easily turn it into a function.
import csv
target_row = 3  # 1-based row number
target_col = 4  # 1-based column number
data = None
with open('yourfile.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for n, row in enumerate(reader, start=1):
        if n == target_row:
            data = row[target_col - 1]  # row is already a list, no split() needed
            break
print(data)

Related

Iterate through a CSV and select row based on condition, and then select the row before it

SOLVED! Proper code in bottom half.
--
I included a link to a sample from a long CSV file, with any identifying data changed. I need every row that begins with "W", and the row before each of them as well. The code I included writes every "W" row to a list. The final line, of course, doesn't work. I would like every row that precedes a "W" row to be in its own list. Ultimately, I will combine them into an 8-column CSV (using the zip function?), since each pair is 2 rows of associated data.
(To clarify - the associated rows in the whole table are sometimes in sets of 2, and sometimes in sets of 3. So I can't approach it by counting rows. I don't care about the 3rd row, when it exists. The key is the "W" rows)
What am I not figuring out? I've been searching all day and am not nailing this.
Sample from table
import csv
rows1 = []  # all 'W' rows
rows2 = []  # all rows before 'W' rows
with open('Businesses.csv', 'r') as file1:
    csvreader = csv.reader(file1)
    for row in csvreader:
        previousrow = row
        if row[0].startswith('W'):
            rows1.append(row)
            rows2.append(previousrow)
FIGURED IT OUT! With this:
import csv
rows1 = []  # all 'W' rows
rows2 = []  # all rows before 'W' rows
with open('Businesses.csv', 'r') as file1:
    csvreader = csv.reader(file1)
    templist = []  # every non-'W' row seen so far
    for row in csvreader:
        if not row[0].startswith('W'):
            templist.append(row)
        else:
            rows1.append(row)
            rows2.append(templist[-1])  # most recent non-'W' row
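A last line such as rows2.append(row - 1) tries to take a list and subtract 1 from it, which isn't meaningful. Also, in the first version of the code, previousrow = row runs before the check, so previousrow is always the current row rather than the one before it.
What will work is to store the current row temporarily (as previousrow, for example) at the end of each pass through the loop; then, when the next row matches, save both that row and the stored previous row to your accumulator lists.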
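A minimal sketch of that approach, assuming (as in the question) that the key is whether the first cell starts with "W". pair_with_previous and the sample rows are hypothetical. Unlike the templist version, this pairs each "W" row with the row immediately before it, even if that row is itself a "W" row:

```python
import csv
import io

def pair_with_previous(f, prefix="W"):
    """Return (previous_row, w_row) pairs for every row whose first cell starts with `prefix`."""
    pairs = []
    previous = None
    for row in csv.reader(f):
        if row and row[0].startswith(prefix) and previous is not None:
            pairs.append((previous, row))
        previous = row  # update AFTER the check, so it really is the previous row
    return pairs

# Hypothetical data: sets of 2 or 3 associated rows, keyed on the "W" rows.
sample = io.StringIO(
    "Acme,1,2,3\n"
    "W100,a,b,c\n"
    "Extra,x,y,z\n"
    "Beta,4,5,6\n"
    "W200,d,e,f\n"
)
pairs = pair_with_previous(sample)

# Concatenate each pair into one 8-column row, ready for csv.writer.
combined = [prev + w for prev, w in pairs]
print(combined)
```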

Skip Rows in CSV Containing Specific String

I have a list of strings (longer than in this example). If one of the strings exists in a row of data, I want to skip that row. This is what I have so far but I get an index error, which leads me to believe I'm not looping correctly.
stringList = ["ABC", "AAB", "AAA"]
with open('filename.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # Skip header row
    for row in filereader:
        for k in stringList:
            if k not in row:
                data1 = column[1]
The error I get: IndexError: list index out of range. I realize I'm reading by row, but I need to extract the data by column.
The problem is that row is a list, and you are treating it (and column, which is never defined) as if it held a single value.
You can access individual columns by indexing the list row. For example, on the first iteration row[0] is the value in the first column of the first row, row[1] is the second column, and so on. On later iterations row holds the following rows, so the same indices give you the same columns further down the file.
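Here's a simple loop that skips any row containing one of the strings:
for row in filereader:
    # skip the row if any cell contains any of the unwanted strings
    if any(k in cell for cell in row for k in stringList):
        continue
    data1 = row[1]  # second column of a row we want to keep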
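With pandas you can do it easily with a boolean mask. Note that this checks a single column for exact matches:
import pandas as pd
data = pd.read_csv('filename.csv')
# keep only rows whose 'column_name' value is not in stringList
data = data.loc[~data['column_name'].isin(stringList)]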
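A csv-only sketch of skipping rows, assuming a header row and substring matching inside each cell (use "s in row" instead if you need exact cell matches). rows_without and the sample data are hypothetical:

```python
import csv
import io

def rows_without(f, unwanted, skip_header=True):
    """Yield rows in which no cell contains any of the unwanted substrings."""
    reader = csv.reader(f)
    if skip_header:
        next(reader, None)  # drop the header row
    for row in reader:
        if any(s in cell for cell in row for s in unwanted):
            continue  # skip this row entirely
        yield row

stringList = ["ABC", "AAB", "AAA"]
sample = io.StringIO("id,code\n1,ABC\n2,XYZ\n3,fooAAAbar\n4,QQQ\n")
kept = list(rows_without(sample, stringList))
second_column = [row[1] for row in kept]  # extract a column from the surviving rows
print(second_column)  # ['XYZ', 'QQQ']
```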

accessing the values of collections.defaultdict

I have a CSV file that I want to read column-wise; for that I have this code:
from collections import defaultdict
from csv import DictReader
columnwise_table = defaultdict(list)
with open("Weird_stuff.csv", newline='') as f:
    reader = DictReader(f)
    for row in reader:
        for col, dat in row.items():
            columnwise_table[col].append(dat)
#print(columnwise_table.items())  # this gives me everything
print(type(columnwise_table[2]))  # I'm looking for something like this
My question is: how can I get all the elements of only one specific column? I'm not using conda, and the matrix is big (2400x980).
UPDATE
I have 980 columns and over 2000 rows. I need to work with the file by column, for example 1st column[0]: feature1, 2nd column[0]: j_ss01, 50th column: Abs2, and so on.
Since I can't access the dict using the column names, I would like to use an index for that. Is this possible?
import csv
import collections

col_values = collections.defaultdict(list)
with open('Weird_stuff.csv', newline='') as f:
    reader = csv.reader(f)
    # skip field names
    next(reader)
    for row in reader:
        for col, value in enumerate(row):
            col_values[col].append(value)

# for each numbered column you want...
col_index = 33  # for example
print(col_values[col_index])
If you know the columns you want in advance, storing only those columns could save you some space:
cols = {1, 5, 6, 234}
...
for col, value in enumerate(row):
    if col in cols:
        col_values[col].append(value)
By iterating over row.items(), you get all columns.
If you want only one specific column by index number, use csv.reader and the column index instead.
from csv import reader
col_values = []
# Column index number to get values from
col = 1
with open("Weird_stuff.csv", newline='') as f:
    csv_reader = reader(f)  # a new name, so the imported reader function isn't shadowed
    for row in csv_reader:
        col_values.append(row[col])
# contains only values from column index <col> (including the header cell)
print(col_values)
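If you need many columns, another option is to read the rows once, transpose them with zip(*rows), and look columns up either by index or by name through a header-to-index dict. A sketch with made-up sample data (the column names are borrowed from the question); note that zip stops at the shortest row, so ragged rows would be truncated:

```python
import csv
import io

sample = io.StringIO("feature1,j_ss01,Abs2\n1,2,3\n4,5,6\n")
reader = csv.reader(sample)
header = next(reader)
rows = list(reader)

# zip(*rows) transposes: columns[i] is a tuple with every value of column i.
columns = list(zip(*rows))

# Map column names to their indices so either form of lookup works.
index_of = {name: i for i, name in enumerate(header)}

print(columns[0])                  # ('1', '4')
print(columns[index_of["Abs2"]])   # ('3', '6')
```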

Add to Values in An Array in a CSV File

I imported my CSV file and made the data into an array. Now I was wondering what I can do to print a specific value in the array, for instance the value in the 2nd row, 2nd column.
Also, how would I go about adding two values together? Thanks.
import csv
import numpy as np
f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(np.array(row))
f.close()
For a simple file with no quoted fields, you don't strictly need the csv module.
This code reads the CSV file and prints the value of the cell in the second row and second column. I am assuming that fields are separated by commas and that no field itself contains a comma.
with open("Test.csv") as fo:
    table = [line.split(",") for line in fo.read().splitlines()]
print(table[1][1])
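The catch with split(",") is quoted fields that contain commas, which csv.reader handles correctly. A quick comparison on a made-up line:

```python
import csv
import io

line = 'Acme,"Springfield, IL",42\n'

naive = line.strip().split(",")               # splits inside the quoted field
parsed = next(csv.reader(io.StringIO(line)))  # respects the quotes

print(naive)   # ['Acme', '"Springfield', ' IL"', '42']
print(parsed)  # ['Acme', 'Springfield, IL', '42']
```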
So, I grabbed a dataset ("Company Funding Records") from here. Then I rewrote your code a little:
#!/usr/bin/env python3
import csv
#import numpy as np
csvaslist = []
f = open("TechCrunchcontinentalUSA.csv", newline='')
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()
# Now your data is in a list of lists. Everything past this point is just playing
# Add together a couple of arbitrary values...
print(int(csvaslist[2][7]) + int(csvaslist[11][7]))
# Add using a conditional...
print("\nNow let's see what Facebook has received...")
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print(sublist)
        fbsum += int(sublist[7])
print("Facebook has received", fbsum)
I've commented out lines at a couple of points to show what's being used and what was unneeded. Notice at the end that referring to a particular data point is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then converting it to int, float, or whatever you need. This works because the CSV file has been turned into a list of lists, and every value in it is a string.
To get specific values within your array/file, and add them together:
import csv
with open("Test.csv", newline='') as f:
    csv_f = list(csv.reader(f))
# the value in the second row, second column of your file
print(csv_f[1][1])
# sum of two specific values (here: second row/second column and first row/first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])  # avoid shadowing the built-in sum()
print(total)
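Since csv.reader returns every cell as a string, the conversion step is what makes the addition numeric rather than string concatenation. A small sketch with a hypothetical cell helper and made-up data:

```python
import csv
import io

def cell(table, row, col, cast=int):
    """Return table[row][col] (0-based) converted with `cast` (int, float, ...)."""
    return cast(table[row][col])

sample = io.StringIO("1,2,3\n4,5,6\n")
table = list(csv.reader(sample))

total = cell(table, 1, 1) + cell(table, 0, 0)  # 5 + 1
print(total)  # 6
```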

CSV read specific row

I have a CSV file with 100 rows.
How do I read specific rows?
I want to read say the 9th line or the 23rd line etc?
You could use a list comprehension to filter the file like so:
import csv
with open('file.csv', newline='') as fd:
    reader = csv.reader(fd)
    interestingrows = [row for idx, row in enumerate(reader) if idx in (28, 62)]
# now interestingrows contains the 28th and the 62nd rows after the header (idx 0 is the header)
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python
import csv
with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)
print(rows[8])
print(rows[22])
You simply skip the necessary number of rows:
import csv
with open("test.csv", newline='') as infile:
    r = csv.reader(infile)
    for i in range(8):  # count from 0 to 7
        next(r)         # and discard the rows
    row = next(r)       # "row" contains row number 9 now
You could read all of them and then use normal lists to find them.
import csv
with open('bigfile.csv', newline='') as longishfile:
    reader = csv.reader(longishfile)
    rows = list(reader)
print(rows[9])   # the 10th row (indices start at 0)
print(rows[88])  # the 89th row
If you have a massive file, this can use up your memory, but if the file has fewer than about 10,000 lines you shouldn't run into any big slowdowns.
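If memory is a concern, you can instead stream the file and keep only the rows you want, stopping once the last wanted row has been read. A sketch; read_rows and the generated sample are hypothetical, and the indices are 0-based:

```python
import csv
import io

def read_rows(f, wanted):
    """Return {index: row} for the 0-based row indices in `wanted`, reading only as far as needed."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    for idx, row in enumerate(csv.reader(f)):
        if idx in wanted:
            found[idx] = row
        if idx >= last:
            break  # no need to read the rest of the file
    return found

# Hypothetical 100-row file.
sample = io.StringIO("".join(f"row{i},val{i}\n" for i in range(100)))
result = read_rows(sample, [8, 22])  # the 9th and 23rd rows
print(result[8], result[22])
```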
You can do something like this:
import csv
with open('raw_data.csv', newline='') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))
row_you_want = readCSV[index_of_row_you_want]  # 0-based index
Maybe this could help you: using pandas, you can easily do it with loc.
'''
Reading the 3rd record using pandas -> (loc)
Note: the index starts from 0,
so to read the 3rd record use 3 - 1 -> 2.
read_csv treats the first line as the header, so this is the 3rd data row.
loc[[2], :] -> [2] selects the row labelled 2 and `:` selects all of its columns
'''
import pandas as pd
df = pd.read_csv('employee_details.csv')
df.loc[[2], :]
