Replace value of specific column in all non-header rows - python

Below is some python code that runs on a file similar to this (old_file.csv).
A,B,C,D
1,2,XX,3
11,22,XX,33
111,222,XX,333
How can I iterate through all lines in old_file.csv (when I don't know the length of the file) and replace every value in column C (index 2, i.e. cells[row][2], based on cells[row][col]), while ignoring the header row? In new_file.csv, all values containing 'XX' should become 'YY', for example.
import csv

r = csv.reader(open('old_file.csv', newline=''))
cells = [l for l in r]
cells[1][2] = 'YY'
cells[2][2] = 'YY'
cells[3][2] = 'YY'
# 'wb' is Python 2 only; in Python 3 open in text mode with newline=''
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(cells)

Just a small change to #Soviut's answer; try this, I think it will help you:
import csv

rows = csv.reader(open('old_file.csv', newline=''))
newRows = []
for i, row in enumerate(rows):
    # ignore the first row, modify all the rest
    if i > 0:
        row[2] = 'YY'
        newRows.append(row)
# write rows to new CSV file, no header is written unless explicitly told to
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(newRows)

You can very easily loop over the array of rows and replace values in the target cell.
# get rows from old CSV file; materialize the reader into a list so the
# rows are still available after the loop has consumed the reader
rows = list(csv.reader(open('old_file.csv', newline='')))

# iterate over each row and replace target cell
for i, row in enumerate(rows):
    # ignore the first row, modify all the rest
    if i > 0:
        row[2] = 'YY'

# write rows to new CSV file, no header is written unless explicitly told to
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(rows)

The csv reader yields each row as a list, so once the rows are collected into a list (cells above) you can simply run the replacement over cells[1:] to skip the header.
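As a sketch of that slicing approach (the sample old_file.csv from the question is recreated first so the snippet is self-contained):

```python
import csv

# Recreate the sample old_file.csv from the question.
with open('old_file.csv', 'w', newline='') as f:
    f.write('A,B,C,D\n1,2,XX,3\n11,22,XX,33\n111,222,XX,333\n')

with open('old_file.csv', newline='') as f:
    cells = list(csv.reader(f))

# cells[1:] skips the header, so this works for any number of rows.
for row in cells[1:]:
    row[2] = 'YY'

with open('new_file.csv', 'w', newline='') as f:
    csv.writer(f).writerows(cells)
```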

len(cells) is the number of rows, and starting the range at 1 makes the loop skip the header line.
import csv

r = csv.reader(open('old_file.csv', newline=''))
cells = [l for l in r]
for i in range(1, len(cells)):
    cells[i][2] = 'YY'
w = csv.writer(open('new_file.csv', 'w', newline=''))
w.writerows(cells)

read_handle = open('old_file.csv', 'r')
data = read_handle.read().split('\n')
read_handle.close()

new_data = []
new_data.append(data[0])           # keep the header line unchanged
for line in data[1:]:
    if not line:                   # preserve blank lines as-is
        new_data.append(line)
        continue
    line = line.split(',')
    line[2] = 'YY'
    new_data.append(','.join(line))

write_handle = open('new_file.csv', 'w')
write_handle.writelines('\n'.join(new_data))
write_handle.close()

Related

Iteration and index dropping using general logic in python

So I've got this code I've been working on for a few days. I need to iterate through a set of csv's, and using general logic, find the indexes which don't have the same number of columns as index 2 and strip them out of the new csv. I've gotten the code to this point, but I'm stuck as to how to use slicing to strip the broken index.
Say each index in file A is supposed to have 10 columns, and for some reason index 2,000 logs with only 7 columns. How is the best way to approach this problem to get the code to strip index 2,000 out of the new csv?
# Comments to the right
for f in TD_files:                                            # FOR ALL TREND FILES:
    with open(f, newline='', encoding='latin1') as g:         # open file as read
        r = csv.reader(line.replace('\0', '') for line in g)  # read while stripping nulls
        data = [line for line in r]                           # set list to all data in file
    for j in range(0, len(data)):                             # set up data variable
        if data[j][2] != data[j-1][2] and j != 0:             # compare index j2 and j2-1
            print('Index Not Equal')                          # print debug
    data[0] = TDmachineID                                     # add machine ID line
    data[1] = trendHeader                                     # add trend header line
    with open(f, 'w', newline='') as g:                       # open file as write
        w = csv.writer(g)                                     # declare write variable
        w.writerows(data)
The Index To Strip
EDIT
Since you loop through the whole data anyway, I would replace the \0 in the same list comprehension that checks the length. It looks cleaner to me and works the same.
with open(f, newline='', encoding='latin1') as g:
    raw_data = csv.reader(g)
    data = [[elem.replace('\0', '') for elem in line] for line in raw_data if len(line) == 10]
data[0] = TDmachineID
data[1] = trendHeader
old answer:
You could add a condition to your list comprehension that only keeps lines of length 10.
with open(f, newline='', encoding='latin1') as g:
    r = csv.reader(line.replace('\0', '') for line in g)
    data = [line for line in r if len(line) == 10]  # condition to check if the line is added to your data
data[0] = TDmachineID
data[1] = trendHeader

Is there a function to concatenate two header rows into one?

Consider the following textfile excerpt
Distance,Velocity,Time
(m),(m/s),(s)
1,1,1
2,1,2
3,1,3
I want it to be transformed into this:
Distance(m),Velocity(m/s),Time(s)
1,1,1
2,1,2
3,1,3
In other words, I want to concatenate rows that contains text, and I want them to be concatenated column-wise.
I am initially manipulating a textfile that's generated from a software. I have successfully transformed it down to only numeric columns and their headers, in a csv format. But I have multiple headers for each column. And I need all the information in each header row, because the column attributes will differ from file to file. How can I do this in a smart way in python?
edit: Thank you for your suggestions, they helped me a lot. I used Daweo's solution and added a dynamic row count, because the number of header rows may differ from 2 to 7 depending on the generated output. Here's the code snippet I ended up with.
# Get column headers
a = 0
numlines = 0
header_rows = 0
with open(full, "r") as input:
    Lines = ""
    for line in input:
        l = line
        g = re.sub(' +', ' ', l)
        y = re.sub('\t', ',', g)
        numlines += 1
        if len(l.encode('ANSI')) > 250:
            # finds header start row
            a += 1
        if a > 0:
            # finds header end row
            if "---" in line:
                header_rows = numlines - (numlines - a + 1)
                break
            else:
                # Lines is my headers string
                Lines = Lines + "%s" % (y) + ' '
# Create concatenated column headers
rows = [i.split(',') for i in Lines.rstrip().split('\n')]
cols = [list(c) for c in zip(*rows)]
for i in (cols):
    for j in (rows):
        newcolz = [list(c) for c in zip(*rows)]
print(newcolz)
I would do it the following way:
txt = " Distance,Velocity,Time \n (m),(m/s),(s) \n 1,1,1 \n 2,1,2 \n 3,1,3 \n "
rows = [i.split(',') for i in txt.rstrip().split('\n')]
cols = [list(c) for c in zip(*rows)]
newcols = [[i[0]+i[1], *i[2:]] for i in cols]
newrows = [','.join(i) for i in zip(*newcols)]
newtxt = '\n'.join(newrows)  # join the merged rows back into a single string
print(newtxt)
Output:
Distance (m),Velocity(m/s),Time (s)
1,1,1
2,1,2
3,1,3
Crucial here is usage of zip to transpose your data, so I can deal with columns rather than rows. [[i[0]+i[1],*i[2:]] for i in cols] is responsible for actual concat, so if you would have headers spanning 3 lines you can do [[i[0]+i[1]+i[2],*i[3:]] for i in cols] and so on.
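A minimal illustration of that transpose trick, using the two-line header from the question:

```python
# Transpose with zip so each column's header cells sit next to each other,
# merge the first two cells of every column, then transpose back.
rows = [['Distance', 'Velocity', 'Time'],
        ['(m)', '(m/s)', '(s)'],
        ['1', '1', '1']]
cols = [list(c) for c in zip(*rows)]             # rows -> columns
merged = [[c[0] + c[1], *c[2:]] for c in cols]   # concat the two header cells
newrows = [','.join(c) for c in zip(*merged)]    # columns -> rows
print('\n'.join(newrows))
```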
I am not aware of anything existing that does this, so instead you can write a custom function. In the example below the function takes two strings and a separator, which defaults to ,.
It splits each string into a list, then uses a list comprehension with zip to pair up the lists and join each pair.
Lastly it joins the consolidated headers again with the separator.
def concat_headers(header1, header2, separator=","):
    headers1 = header1.split(separator)
    headers2 = header2.split(separator)
    consolidated_headers = ["".join(values) for values in zip(headers1, headers2)]
    return separator.join(consolidated_headers)

data = """Distance,Velocity,Time\n(m),(m/s),(s)\n1,1,1\n2,1,2\n3,1,3\n"""
header1, header2, *lines = data.splitlines()
consolidated_headers = concat_headers(header1, header2)
print(consolidated_headers)
print("\n".join(lines))
OUTPUT
Distance(m),Velocity(m/s),Time(s)
1,1,1
2,1,2
3,1,3
You don't really need a function to do it because it can be done like this using the csv module:
import csv

data_filename = 'position_data.csv'
new_filename = 'new_position_data.csv'
with open(data_filename, 'r', newline='') as inp, \
     open(new_filename, 'w', newline='') as outp:
    reader, writer = csv.reader(inp), csv.writer(outp)
    row1, row2 = next(reader), next(reader)
    new_header = [a + b for a, b in zip(row1, row2)]
    writer.writerow(new_header)
    # Copy the rest of the input file.
    for row in reader:
        writer.writerow(row)

Python CSV Reader

I have a CSV from a system that has a load of rubbish at the top of the file, so the header row is about row 5 or could even be 14 depending on the gibberish the report puts out.
I used to use:
idx = next(idx for idx, row in enumerate(csvreader) if len(row) > 2)
to go through the rows that had less than 2 columns, then when it hit the col headers, of which there are 12, it would stop, and then I could use idx with skiprows when reading the CSV file.
The system has had an update and someone thought it would be good to have the CSV file valid by adding in 11 blank commas after their gibberish to align the header count.
so now I have a CSV like:
sadjfhasdkljfhasd,,,,,,,,,,
dsfasdgasfg,,,,,,,,,,
time,date,code,product
etc..
I tried:
idx = next(idx for idx, row in enumerate(csvreader) if row in (None, "") > 2)
but I think that's a Pandas thing and it just fails.
Any ideas on how I can get to my header row?
CODE:
lmf = askopenfilename(filetypes=(("CSV Files", ".csv"), ("All Files", "*.*")))

# Section gets row number where headers start
with open(lmf, 'r') as fin:
    csvreader = csv.reader(fin)
    print(csvreader)
    input('hold')
    idx = next(idx for idx, row in enumerate(csvreader) if len(row) > 2)

# Reopens file parsing the number for the row headers
lmkcsv = pd.read_csv(lmf, skiprows=idx)
lm = lm.append(lmkcsv)
print(lm)
Since your csv is now a valid file and you just want to filter out the rows that don't have a certain number of columns, you can do that in pandas directly.
import pandas as pd

minimum_cols_required = 3
lmkcsv = pd.read_csv(lmf)  # lmf is the path selected earlier
lmkcsv.dropna(thresh=minimum_cols_required, inplace=True)  # inplace=True returns None, so don't reassign
If your csv data have a lot of empty values as well that gets caught in this threshold, then just slightly modify your code:
idx = next(idx for idx, row in enumerate(csvreader) if len(set(row)) > 3)
I'm not sure in what case a None would be returned, so the set(row) should do. If your headers are duplicated for whatever reason as well, do this:
from collections import Counter
# ...
idx = next(idx for idx, row in enumerate(csvreader) if len(row) - Counter(row)[''] > 2)
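Checked against rows shaped like the question's sample (a sketch; the row contents are made up to match the description):

```python
from collections import Counter

rows = [
    ['sadjfhasdkljfhasd', '', '', '', '', '', '', '', '', '', ''],
    ['dsfasdgasfg', '', '', '', '', '', '', '', '', '', ''],
    ['time', 'date', 'code', 'product'],
]
# len(row) - Counter(row)[''] counts the non-empty cells per row, so the
# comma-padded junk rows score 1 while the real header row scores 4.
idx = next(i for i, row in enumerate(rows) if len(row) - Counter(row)[''] > 2)
print(idx)  # index of the header row
```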
And how about erasing the starting lines with some logic, like checking how many ',' exist or looking for some word? Something like:
f = open("target.txt", "r+")
d = f.readlines()
f.seek(0)
for i in d:
    if "sadjfhasdkljfhasd" not in i:
        f.write(i)
f.truncate()
f.close()
After that, read the file normally.
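For instance (a sketch; the file contents are made up to match the question):

```python
import csv

# Stand-in for target.txt after its junk lines have been erased as above.
with open('target.txt', 'w', newline='') as f:
    f.write('time,date,code,product\n09:00,2020-01-01,A1,widget\n')

# With the junk gone, the header is the first row, so csv.reader
# (or pandas.read_csv without skiprows) reads the file directly.
with open('target.txt', newline='') as f:
    rows = list(csv.reader(f))
header, data = rows[0], rows[1:]
```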

Python - Convert a matrix to edge list/long form

I have a very large csv file, with a matrix like this:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
It is always a n*n matrix. The first column and the first row are the names. I want to convert it to a 3 column format (also could be called edge list, long form, etc) like this:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
I have used:
row = 0
for line in fin:
    line = line.strip("\n")
    col = 0
    tokens = line.split(",")
    for t in tokens:
        fout.write("\n%s,%s,%s" % (row, col, t))
        col += 1
    row += 1
doesn't work...
Could you please help? Thank you..
You also need to enumerate the column titles as you print out the individual cells.
For a matrix file mat.csv:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
The following program:
csv = open("mat.csv")
columns = csv.readline().strip().split(',')[1:]
for line in csv:
    tokens = line.strip().split(',')
    row = tokens[0]
    for column, cell in zip(columns, tokens[1:]):
        print('{},{},{}'.format(row, column, cell))
prints out:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
For generating the upper diagonal, you can use the following script:
csv = open("mat.csv")
columns = csv.readline().strip().split(',')[1:]
for i, line in enumerate(csv):
    tokens = line.strip().split(',')
    row = tokens[0]
    for column, cell in zip(columns[i:], tokens[i+1:]):
        print('{},{},{}'.format(row, column, cell))
which results in the output:
A,A,0
A,B,2
A,C,3
B,B,4
B,C,2
C,C,4
You need to skip the first column in each line:
for t in tokens[1:]:
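Putting both corrections together, a sketch of the fixed loop (the matrix is inlined here in place of the question's fin, and rows are collected into a list instead of written to fout):

```python
lines = ['null,A,B,C', 'A,0,2,3', 'B,3,4,2', 'C,1,2,4']
columns = lines[0].split(',')[1:]            # header names, first cell dropped
out = []
for line in lines[1:]:
    tokens = line.split(',')
    row = tokens[0]                          # row name from the first column
    for col, t in zip(columns, tokens[1:]):  # skip the name column
        out.append('%s,%s,%s' % (row, col, t))
print('\n'.join(out))
```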

Dynamically remove a column from a CSV

I want to dynamically remove a column from a CSV, this is what I have so far. I have no idea where to go from here though:
# Remove columns not needed.
column_numbers_to_remove = 3, 2
file = upload.filepath
# I READ THE FILE
file_read = csv.reader(file)
# REMOVE 3 and 2 column from the CSV
# UPDATE SAVE CSV
Use enumerate to get the column index, and create a new row without the columns you don't want... eg:
for row in file_read:
    new_row = [col for idx, col in enumerate(row) if idx not in (3, 2)]
Then write out your rows using csv.writer somewhere...
Read the csv and write into another file after removing the columns.
import csv

creader = csv.reader(open('csv.csv'))
cwriter = csv.writer(open('csv2.csv', 'w', newline=''))
for cline in creader:
    new_line = [val for col, val in enumerate(cline) if col not in (2, 3)]
    cwriter.writerow(new_line)
