How to skip a column when reading a CSV file - Python

In my CSV, people's names are in the first column, and I'm trying to get the average of the three numbers in columns 1-3 (not the first column, which is column 0). Is there a way to skip a column so that I can pull out just columns 1-3? Here is my code for getting the average. Any help would be much appreciated. To be clear about what I want:
I want to skip a column so that I can get the mean of columns 1-3.
if order == "average score":
    with open("data.csv") as f:
        reader = csv.reader(f)
        columns = f.readline().strip().split(" ")
        numRows = 0
        sums = [0] * len(columns)
        for line in f:
            # Skip empty lines
            if not line.strip():
                continue
            values = line.split(" ")
            for i in range(len(values)):
                sums[i] += int(values[i])
            numRows += 1
        for index, summedRowValue in enumerate(sums):
            print(columns[index], 1.0 * summedRowValue / numRows)

All you should have to do is change your range to start at 1 instead of 0, to skip the first column:
for i in range(1, len(values)):
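As a fuller illustration, here is a minimal sketch of the same idea using csv.reader. It assumes a comma-delimited file with a header row, names in column 0 and integer scores in the remaining columns. The function name column_averages and the sample data are hypothetical, not taken from the question.

```python
import csv
import io

def column_averages(lines, skip=1):
    """Average every column except the first `skip` columns."""
    reader = csv.reader(lines)
    header = next(reader)[skip:]
    sums = [0] * len(header)
    num_rows = 0
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        for i, value in enumerate(row[skip:]):
            sums[i] += int(value)
        num_rows += 1
    if num_rows == 0:
        return {}
    return {name: total / num_rows for name, total in zip(header, sums)}

# Hypothetical sample data standing in for data.csv
sample = io.StringIO("name,test1,test2,test3\nAnn,4,6,8\nBob,6,8,10\n")
averages = column_averages(sample)
print(averages)
```

Slicing each row with row[skip:] has the same effect as starting the range at 1, and it keeps the sums list aligned with the header names that are actually averaged.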

Related

Python CSV Reader

I have a CSV from a system that has a load of rubbish at the top of the file, so the header row is about row 5 or could even be 14 depending on the gibberish the report puts out.
I used to use:
idx = next(idx for idx, row in enumerate(csvreader) if len(row) > 2)
to skip past the rows that had only one or two columns; when it hit the column headers, of which there are 12, it would stop, and I could then pass idx as skiprows when reading the CSV file.
The system has had an update, and someone thought it would be good to make the CSV file valid by appending 11 commas (blank fields) after the gibberish so that the column count lines up with the header.
So now I have a CSV like:
sadjfhasdkljfhasd,,,,,,,,,,
dsfasdgasfg,,,,,,,,,,
time,date,code,product
etc..
I tried:
idx = next(idx for idx, row in enumerate(csvreader) if row in (None, "") > 2)
but I think that's a Pandas thing and it just fails.
Any ideas on how I can get to my header row?
CODE:
lmf = askopenfilename(filetypes=(("CSV Files", ".csv"), ("All Files", "*.*")))
# Section gets row number where headers start
with open(lmf, 'r') as fin:
    csvreader = csv.reader(fin)
    print(csvreader)
    input('hold')
    idx = next(idx for idx, row in enumerate(csvreader) if len(row) > 2)
# Reopens file parsing the number for the row headers
lmkcsv = pd.read_csv(lmf, skiprows=idx)
lm = lm.append(lmkcsv)
print(lm)
Since your CSV is now a valid file and you just want to filter out the rows above the header that have fewer than a certain number of filled columns, you can do that in pandas directly.
import pandas as pd
minimum_cols_required = 3
lmkcsv = pd.read_csv(lmf)
# dropna returns a new DataFrame; with inplace=True it returns None, so do not combine the two
lmkcsv = lmkcsv.dropna(thresh=minimum_cols_required)
If your CSV data also contains a lot of empty values that get caught by this threshold, modify your original code slightly instead:
idx = next(idx for idx, row in enumerate(csvreader) if len(set(row)) > 3)
I'm not sure in what case a None would be returned, so set(row) should do. If your headers for whatever reason contain duplicates as well, do this:
from collections import Counter
# ...
idx = next(idx for idx, row in enumerate(csvreader) if len(row) - Counter(row)[''] > 2)
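The row-counting idea can be wrapped in a small helper. This is only a sketch under assumptions: the name find_header_index, the min_filled parameter and the final data row of the sample are hypothetical, and the header is taken to be the first row with at least three non-empty cells.

```python
import csv
import io

def find_header_index(lines, min_filled=3):
    """Return the index of the first row with at least `min_filled` non-empty cells."""
    for idx, row in enumerate(csv.reader(lines)):
        filled = sum(1 for cell in row if cell.strip())
        if filled >= min_filled:
            return idx
    raise ValueError("no header row found")

# The first three lines mirror the question; the last data row is made up
sample = io.StringIO(
    "sadjfhasdkljfhasd,,,,,,,,,,\n"
    "dsfasdgasfg,,,,,,,,,,\n"
    "time,date,code,product\n"
    "09:00,2020-01-01,A1,widget\n"
)
header_idx = find_header_index(sample)
print(header_idx)
```

The returned index could then be passed as skiprows to pd.read_csv, as in the question's original code. Unlike next() on a bare generator, the helper raises a descriptive error when no row qualifies.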
Another option is to erase the leading lines using some logic, such as checking how many ',' characters a line contains or whether it contains a particular word. Something like:
with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        if "sadjfhasdkljfhasd" not in i:
            f.write(i)
    f.truncate()
After that, read the file normally.

Replace value of specific column in all non header rows

Below is some python code that runs on a file similar to this (old_file.csv).
A,B,C,D
1,2,XX,3
11,22,XX,33
111,222,XX,333
How can I iterate through all lines in old_file.csv (if I don't know the length of the file) and replace all values in column C, i.e. index 2 or cells[row][2] (based on cells[row][col])? I'd like to ignore the header row. In new_file.csv, all values containing 'XX' should become 'YY', for example.
import csv
r = csv.reader(open('old_file.csv'))
cells = [l for l in r]
cells[1][2] = 'YY'
cells[2][2] = 'YY'
cells[3][2] = 'YY'
# Python 3: open in text mode with newline='' ('wb' only works on Python 2)
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(cells)
A small change to @Soviut's answer; I think this will help you:
import csv
rows = csv.reader(open('old_file.csv'))
newRows = []
for i, row in enumerate(rows):
    # ignore the first row, modify all the rest
    if i > 0:
        row[2] = 'YY'
    newRows.append(row)
# write all rows, header included, to the new CSV file
# Python 3: open in text mode with newline='' ('wb' only works on Python 2)
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(newRows)
You can very easily loop over the list of rows and replace values in the target cell.
import csv
# get rows from old CSV file; list() keeps them, since a csv.reader can only be iterated once
rows = list(csv.reader(open('old_file.csv')))
# iterate over each row and replace target cell
for i, row in enumerate(rows):
    # ignore the first row, modify all the rest
    if i > 0:
        row[2] = 'YY'
# write all rows, header included, to the new CSV file
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(rows)
csv.reader yields each row as a list, so once the rows are collected into a list (cells in the question) you could just run the replacement on cells[1:].
len(cells) is the number of rows. Iterating from 1 makes it skip the header line. Also the lines should be cells.
import csv
r = csv.reader(open('old_file.csv'))
cells = [l for l in r]
for i in range(1, len(cells)):
    cells[i][2] = 'YY'
# Python 3: open in text mode with newline='' ('wb' only works on Python 2)
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(cells)
read_handle = open('old_file.csv', 'r')
data = read_handle.read().split('\n')
read_handle.close()
new_data = []
new_data.append(data[0])
for line in data[1:]:
    if not line:
        new_data.append(line)
        continue
    line = line.split(',')
    line[2] = 'YY'
    new_data.append(','.join(line))
write_handle = open('new_file.csv', 'w')
write_handle.write('\n'.join(new_data))
write_handle.close()
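The answers above overwrite column 2 in every data row. Since the question asks to change only values containing 'XX', here is a hedged sketch that streams rows from one file object to another and touches only matching cells. The function name replace_in_column and its parameters are hypothetical, and the ZZ row in the sample is added purely to show a non-matching value.

```python
import csv
import io

def replace_in_column(src, dst, col=2, old="XX", new="YY"):
    """Copy CSV rows from src to dst, replacing `old` with `new` in one column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return
    writer.writerow(header)  # header row is copied unchanged
    for row in reader:
        if len(row) > col and old in row[col]:
            row[col] = row[col].replace(old, new)
        writer.writerow(row)

# In-memory stand-ins for old_file.csv and new_file.csv
source = io.StringIO("A,B,C,D\n1,2,XX,3\n11,22,XX,33\n111,222,ZZ,333\n")
target = io.StringIO()
replace_in_column(source, target)
print(target.getvalue())
```

With real files, src and dst would be opened with newline='' as the csv module documentation recommends. Because rows are written as they are read, the length of the file never needs to be known in advance.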

Python3.4 - enumeration through columns then rows in csv to obtain Max lengths

I would like to find the maximum length for each column in a tab-delimited CSV file.
I can find the maximum length of one column by using this:
import csv
oldlen = 0
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    for row in test:
        # keep the longest length seen so far
        if len(row[0]) > oldlen:
            oldlen = len(row[0])
print(oldlen)
If I wish to do all columns (and count them), I could just change row[] manually, but I wish to learn so I tried this:
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    ncol = len(test[0])
    for column in test:
        for row in test:
            if len(row[column]) > oldlen:
                newlen = len(row[0])
        print(column, newlen)
Which, of course, doesn't make programmatic sense. But it indicates, I hope, what my intention is. I have to do the columns first so I can get the max length out of each column, across all rows.
You can use a dict to store a column number->max length lookup and assign to that by looping over each column of each row.
lengths = {}
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    for row in test:
        for colno, col in enumerate(row):
            lengths[colno] = max(len(col), lengths.get(colno, 0))
The number of columns will be len(lengths), and the maximum length of each will be accessible as lengths[0] for the first column, lengths[1] for the second, and so on.
You can transpose the rows into columns with the zip() function:
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    columns = list(zip(*test))
and then get the maximum length per column (max(col) on its own would return the lexicographically largest string, not the longest one):
for col in columns:
    print(max(len(cell) for cell in col))
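One caveat with zip() is that it stops at the shortest row, so a file with ragged rows would lose its trailing columns. A possible variant, sketched here with itertools.zip_longest, pads short rows with empty strings. The function name max_column_lengths and the sample data are hypothetical.

```python
import csv
import io
from itertools import zip_longest

def max_column_lengths(lines):
    """Return the longest cell length for each column of a tab-delimited file."""
    rows = csv.reader(lines, dialect="excel-tab")
    # zip_longest pads short rows with "" so ragged rows do not truncate columns
    columns = zip_longest(*rows, fillvalue="")
    return [max(len(cell) for cell in col) for col in columns]

# Made-up sample; the last row is deliberately one column short
sample = io.StringIO("id\tname\tcity\n1\tAlexander\tRome\n22\tBo\n")
lengths = max_column_lengths(sample)
print(lengths)
```

The number of columns is then len(lengths). Note that unpacking with * reads the whole file into memory, so for very large files the dict-based loop is the more economical choice.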

How to find specific row in Python CSV module

I need to find the third row from column 4 to the end of a CSV file. How would I do that? I know I can find the values from the 4th column on with
row[3]
but how do I get specifically the third row?
You could convert the csv reader object into a list of lists... The rows are stored in a list, which contains lists of the columns.
So:
csvr = csv.reader(file)
csvr = list(csvr)
csvr[2]       # The 3rd row
csvr[2][3]    # The 4th column on the 3rd row.
csvr[-4][-3]  # The 3rd column from the right on the 4th row from the end
You could keep a counter for counting the number of rows:
counter = 1
for row in reader:
    if counter == 3:
        print('Interested in third row')
    counter += 1
You could use itertools.islice to extract the row of data you wanted, then index into it.
Note that the rows and columns are numbered from zero, not one.
import csv
from itertools import islice
def get_row_col(csv_filename, row, col):
    # Python 3: open in text mode with newline='' ('rb' only works on Python 2)
    with open(csv_filename, newline='') as f:
        return next(islice(csv.reader(f), row, row+1))[col]
This is very basic code that will do the job, and you can easily make a function out of it.
import csv
target_row = 3  # counted from 1
target_col = 4  # counted from 1
data = None
with open('yourfile.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    # enumerate supplies the row number; each row is already a list of cells
    for n, row in enumerate(reader, start=1):
        if n == target_row:
            data = row[target_col - 1]
            break
print(data)
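The question asks for the third row from column 4 to the end, which is a slice rather than a single cell. A minimal sketch combining islice with a list slice follows; the function name row_from_column and the sample data are hypothetical, and both indices are zero-based.

```python
import csv
import io
from itertools import islice

def row_from_column(lines, row_index, first_col):
    """Return the cells of one row, from `first_col` to the end (zero-based)."""
    reader = csv.reader(lines)
    row = next(islice(reader, row_index, row_index + 1), None)
    if row is None:
        raise IndexError("file has fewer rows than requested")
    return row[first_col:]

# Made-up sample; the third row is "6,7,8,9,10"
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n")
third_row_tail = row_from_column(sample, 2, 3)
print(third_row_tail)
```

Because islice stops as soon as the requested row has been read, the rest of the file is never parsed.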

Python - Convert a matrix to edge list/long form

I have a very large csv file, with a matrix like this:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
It is always an n*n matrix. The first column and the first row are the names. I want to convert it to a three-column format (also called an edge list or long form) like this:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
I have used:
row = 0
for line in fin:
    line = line.strip("\n")
    col = 0
    tokens = line.split(",")
    for t in tokens:
        fout.write("\n%s,%s,%s"%(row,col,t))
        col += 1
    row += 1
but it doesn't work.
Could you please help? Thank you.
You also need to enumerate the column titles as you print out the individual cells.
For a matrix file mat.csv:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
The following program:
with open("mat.csv") as f:
    # first line holds the column titles; drop the corner cell
    columns = f.readline().strip().split(',')[1:]
    for line in f:
        tokens = line.strip().split(',')
        row = tokens[0]
        for column, cell in zip(columns, tokens[1:]):
            print('{},{},{}'.format(row, column, cell))
prints out:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
For generating the upper triangle (including the diagonal), you can use the following script:
with open("mat.csv") as f:
    columns = f.readline().strip().split(',')[1:]
    for i, line in enumerate(f):
        tokens = line.strip().split(',')
        row = tokens[0]
        # on data row i, skip the first i columns so only cells on or above the diagonal remain
        for column, cell in zip(columns[i:], tokens[i+1:]):
            print('{},{},{}'.format(row, column, cell))
which results in the output:
A,A,0
A,B,2
A,C,3
B,B,4
B,C,2
C,C,4
You need to skip the first column in each line:
for t in tokens[1:]:
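Skipping the first column alone still leaves the question's code writing numeric row and column counters instead of names. A sketch that uses the csv module for both reading and writing, and streams one row at a time (which suits a very large file), might look like this. The function name matrix_to_edges is hypothetical.

```python
import csv
import io

def matrix_to_edges(src, dst):
    """Write one "row,column,value" line per matrix cell."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    columns = next(reader)[1:]  # drop the corner cell ("null")
    for tokens in reader:
        if not tokens:  # skip blank lines
            continue
        row_name = tokens[0]
        for column, cell in zip(columns, tokens[1:]):
            writer.writerow([row_name, column, cell])

# In-memory stand-ins for the input matrix and the output file
source = io.StringIO("null,A,B,C\nA,0,2,3\nB,3,4,2\nC,1,2,4\n")
target = io.StringIO()
matrix_to_edges(source, target)
print(target.getvalue())
```

With real files, both would be opened with newline='' and passed in place of the StringIO objects.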
