I am trying to print out the differences by comparing a column between two CSV files.
CSV1:
SERVER, FQDN, IP_ADDRESS,
serverA, device1.com, 10.10.10.1
serverA,device2.com,10.11.11.1
serverC,device3.com,10.12.12.1
and so on..
CSV2:
FQDN, IP_ADDRESS, SERVER, LOCATION
device3.com,10.12.12.1,serverC,xx
device679.com,20.3.67.1,serverA,we
device1.com,10.10.10.1,serverA,ac
device345.com,192.168.2.0,serverA,ad
device2.com,192.168.6.0,serverB,af
and so on...
What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:
Output.csv:
FQDN, IP_ADDRESS, SERVER, LOCATION
device679.com,20.3.67.1,serverA,we
device345.com,192.168.2.0,serverA,ad
and so on..
I have tried, but I am not able to get the output.
This is my code; please tell me where I am going wrong:
import csv

data = {}  # creating list to store the data
with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        DATA[col[0]] = col[1]

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
    csv_output.writerow(fieldnames)  # prints header to the output file
    for col in reader1:
        if col[1] not in reader2:
            csv_output.writerow(col)
(EDIT) This is another approach that I have used:
import csv

f1 = open("CSV1.csv")
f2 = open("CSV2.csv")
csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)
for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
        print(col2[0])
Basically, here I am first only trying to find out whether the unmatched FQDNs get printed or not. But it is printing out the whole CSV1 column instead. Please help, guys; a lot of research has gone into this, but no luck yet! :(
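For what it's worth, the zip() approach pairs rows by position, so it can only work if both files list the same FQDNs in the same order. One way to get the expected Output.csv is to collect CSV1's FQDN column into a set and filter CSV2 against it. A minimal sketch using the sample data from the question, kept in memory so it runs as-is:

```python
import csv

# Sample data from the question.
csv1_rows = [
    ['SERVER', 'FQDN', 'IP_ADDRESS'],
    ['serverA', 'device1.com', '10.10.10.1'],
    ['serverA', 'device2.com', '10.11.11.1'],
    ['serverC', 'device3.com', '10.12.12.1'],
]
csv2_rows = [
    ['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'],
    ['device3.com', '10.12.12.1', 'serverC', 'xx'],
    ['device679.com', '20.3.67.1', 'serverA', 'we'],
    ['device1.com', '10.10.10.1', 'serverA', 'ac'],
    ['device345.com', '192.168.2.0', 'serverA', 'ad'],
    ['device2.com', '192.168.6.0', 'serverB', 'af'],
]

# FQDN is column 1 in CSV1; collect it into a set for fast membership tests.
known_fqdns = {row[1] for row in csv1_rows[1:]}

# Keep CSV2's header plus every row whose FQDN (column 0) is absent from CSV1.
missing = [csv2_rows[0]] + [r for r in csv2_rows[1:] if r[0] not in known_fqdns]

with open('Output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(missing)
```

With file-based input you would build `known_fqdns` from a csv.reader over CSV1 instead of the in-memory list.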
This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.
I use the Differ object for identifying line changes.
I assumed that you would not regard line swapping as a difference, that's why I added the sorted() function call.
from difflib import Differ

csv_file1 = sorted(open("file1.csv", 'r').readlines())
csv_file2 = sorted(open("file2.csv", 'r').readlines())
with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1, csv_file2):
        dmode, line = line[:2], line[2:]
        if dmode.strip() == "":
            continue
        f.write(line.rstrip("\n") + "\n")
Note that if the line differs somehow (not only in the FQDN column) it would appear in diff.csv
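If only the FQDN column should drive the comparison, an alternative is to diff just that column with sets instead of whole lines. A sketch, using StringIO stand-ins for file1.csv and file2.csv and assuming FQDN is the second column of the first file and the first column of the second:

```python
import csv
from io import StringIO

# Hypothetical stand-ins for file1.csv / file2.csv.
file1 = StringIO("SERVER,FQDN,IP_ADDRESS\n"
                 "serverA,device1.com,10.10.10.1\n")
file2 = StringIO("FQDN,IP_ADDRESS,SERVER,LOCATION\n"
                 "device1.com,99.99.99.99,serverZ,xx\n"
                 "device679.com,20.3.67.1,serverA,we\n")

fqdns1 = {row[1] for row in list(csv.reader(file1))[1:]}
fqdns2 = {row[0] for row in list(csv.reader(file2))[1:]}

# Symmetric difference: FQDNs present in exactly one file, regardless of
# whether the other columns match.
only_in_one = fqdns1 ^ fqdns2
```

Note that device1.com has a different IP in the two files but is not reported, because only the FQDN column is compared.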
import csv

with open('CSV1.csv', 'r') as lookuplist, open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader1 = csv.reader(lookuplist)
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = ['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION']
    csv_output.writerow(fieldnames)  # write the header to the output file

    _tempFqdn = []
    for i, dt in enumerate(reader1):
        if i == 0:  # skip the header row
            continue
        _tempFqdn.append(dt[1].strip())

    for i, col in enumerate(reader2):
        if i == 0:  # skip the header row
            continue
        if col[0].strip() not in _tempFqdn:
            csv_output.writerow(col)
import csv

data = {}  # dictionary keyed by the FQDN column of CSV1
with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        data[col[1]] = col[1]  # store the FQDN (column 1) as the lookup key

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = ['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2', 'MHT_3']
    csv_output.writerow(fieldnames)  # write the header to the output file
    for col in reader2:
        if col[0] not in data:  # FQDN in CSV2 (column 0) is missing from CSV1
            col = [col[0]]
            csv_output.writerow(col)  # write the unmatched FQDN to the output
So basically, I only had to change 'not in' in the for loop and change which column of CSV1 is stored in the data dictionary.
I started learning Python and was wondering whether there is a way to create multiple files from the unique values of a column. I know there are hundreds of ways of getting it done through pandas, but I am looking to have it done with built-in libraries only. I couldn't find a single example where it's done through built-in libraries.
Here is the sample csv file data:
uniquevalue|count
a|123
b|345
c|567
d|789
a|123
b|345
c|567
Sample output file:
a.csv
uniquevalue|count
a|123
a|123
b.csv
b|345
b|345
I am struggling with how to loop over the unique values in a column and then print them out. Can someone explain the logic of how to do it? That would be much appreciated. Thanks.
import csv
from collections import defaultdict

header = []
data = defaultdict(list)
DELIMITER = "|"

with open("inputfile.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=DELIMITER)
    for i, row in enumerate(reader):
        if i == 0:
            header = row
        else:
            key = row[0]
            data[key].append(row)

for key, value in data.items():
    filename = f"{key}.csv"
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter=DELIMITER)
        rows = [header] + value
        writer.writerows(rows)
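The grouping step can be checked without touching the filesystem; the defaultdict collects rows under their first-column key. An in-memory sketch with the sample data:

```python
from collections import defaultdict

# Sample rows from the question, already split on '|', header removed.
rows = [["a", "123"], ["b", "345"], ["c", "567"],
        ["d", "789"], ["a", "123"], ["b", "345"], ["c", "567"]]

# Group each row under its first column; missing keys start as empty lists.
data = defaultdict(list)
for row in rows:
    data[row[0]].append(row)
```

Each key of `data` then corresponds to one output file, and its value is the list of rows that file should contain.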
import csv

with open('sample.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    next(reader)  # skip the header row
    for row in reader:
        # note: csv.writer takes no fieldnames argument (that's DictWriter)
        with open(f"{row[0]}.csv", 'a', newline='') as inner:
            writer = csv.writer(inner, delimiter='|')
            writer.writerow(row)
The task can also be done without the csv module. The lines of the file are read, and with read_file.read().splitlines()[1:] the newline characters are stripped off and the header line of the CSV file is skipped. With a set, a unique collection of the input data is created, which is used both to count the duplicates and to create the output files.
with open("unique_sample.csv", "r") as read_file:
    items = read_file.read().splitlines()[1:]

for line in set(items):
    with open(line[:line.index('|')] + '.csv', 'w') as output:
        output.write((line + '\n') * items.count(line))
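The set/count idea can be checked on the sample data: duplicates collapse into one member of the set, and count() restores the multiplicity when writing. A sketch on in-memory lines rather than files:

```python
# Sample lines from the question's CSV, including the header.
lines = ["uniquevalue|count", "a|123", "b|345", "c|567",
         "d|789", "a|123", "b|345", "c|567"]
items = lines[1:]  # drop the header, as in the answer above

# Each unique line becomes one output file; count() preserves duplicates.
outputs = {line[:line.index('|')] + '.csv': (line + '\n') * items.count(line)
           for line in set(items)}
```

Here `outputs` maps each filename to the exact text the answer would write into it.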
I'd like to start a new line inside my for loop. Currently, since everything happens inside the for loop, all the data is stored in a single list, and when I write it at the end of my code it prints as one line.
fields = []
new_rows_list = []

file1 = open('CSV_sample_file.csv', 'rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()

file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()

file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
Alternatively, you can pass to writerows a generator expression that iterates through reader to write each row with columns converted by find_and_mask_cc to the output as you read it from the input, so it won't require reading the entire input into memory:
with open('CSV_sample_file.csv') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
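find_and_mask_cc is the asker's own function and isn't shown. A hypothetical stand-in that masks the middle six digits of a 16-digit number (matching the XXXXXX pattern in the expected output) could look like this, just to make the snippets above runnable:

```python
import re

def find_and_mask_cc(value):
    """Hypothetical: mask digits 7-12 of any 16-digit number in the cell."""
    return re.sub(r'\b(\d{6})\d{6}(\d{4})\b',
                  lambda m: m.group(1) + 'XXXXXX' + m.group(2),
                  value)
```

Non-matching cells pass through unchanged, so it is safe to apply to every column.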
I have two CSV files with 6 columns each, and both have one common column, EmpID (the primary key for comparison). For example, File1.csv is:
EmpID1,Name1,Email1,City1,Phone1,Hobby1
120034,Tom Hanks,tom.hanks@gmail.com,Mumbai,8888999,Fishing
And File2.csv is:
EmpID2,Name2,Email2,City2,Phone2,Hobby2
120034,Tom Hanks,hanks.tom@gmail.com,Mumbai,8888999,Running
The files need to be compared for differences and only rows and columns that are different should be added into a new output file as
EmpID1,Email1,Email2,Hobby1,Hobby2
120034,tom.hanks@gmail.com,hanks.tom@gmail.com,Fishing,Running
Currently I have written the below piece of code in Python. Now I am wondering how to identify and pick out the differences. Any pointers and help will be much appreciated.
import csv
import os

os.getcwd()
os.chdir('filepath')

with open('File1.csv', 'r') as csv1, open('File2.csv', 'r') as csv2:
    file1 = csv1.readlines()
    file2 = csv2.readlines()

with open('OutputFile.csv', 'w') as output:
    for line in file1:
        if line not in file2:
            output.write(line)

output.close()
csv1.close()
csv2.close()
First read the files into a dict structure, with the 'EmpID' value as key pointing to the entire row:
import csv

fieldnames = []  # to store all fieldnames
with open('File1.csv') as f:
    cf = csv.DictReader(f, delimiter=',')
    data1 = {row['EmpID1']: row for row in cf}
    fieldnames.extend(cf.fieldnames)

with open('File2.csv') as f:
    cf = csv.DictReader(f, delimiter=',')
    data2 = {row['EmpID2']: row for row in cf}
    fieldnames.extend(cf.fieldnames)
Then identify all ids that are in both dicts:
ids_to_check = set(data1) & set(data2)
Finally, iterate over the ids and compare the rows themselves
with open('OutputFile.csv', 'w', newline='') as f:
    cw = csv.DictWriter(f, fieldnames, delimiter=',')
    cw.writeheader()
    for id in ids_to_check:
        diff = compare_dict(data1[id], data2[id], fieldnames)
        if diff:
            cw.writerow(diff)
Here's the compare_dict function implementation:
def compare_dict(d1, d2, fields_compare):
    # strip the trailing 1/2 so the same base field is compared across files
    fields_compare = set(field.rstrip('12') for field in fields_compare)
    if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
        # they differ, return a new dict with all fields
        result = d1.copy()
        result.update(d2)
        return result
    else:
        return {}
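For example, with the sample rows from the question the function returns the merged dict because Email and Hobby differ (the definition is repeated here so the snippet stands alone, and only three of the six columns are shown for brevity):

```python
def compare_dict(d1, d2, fields_compare):
    # strip the trailing 1/2 so the same base field is compared across files
    fields_compare = set(field.rstrip('12') for field in fields_compare)
    if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
        result = d1.copy()
        result.update(d2)
        return result
    return {}

row1 = {'EmpID1': '120034', 'Email1': 'tom.hanks@gmail.com', 'Hobby1': 'Fishing'}
row2 = {'EmpID2': '120034', 'Email2': 'hanks.tom@gmail.com', 'Hobby2': 'Running'}
fields = ['EmpID1', 'Email1', 'Hobby1', 'EmpID2', 'Email2', 'Hobby2']
diff = compare_dict(row1, row2, fields)
```

When every base field matches, the function returns an empty dict and the row is skipped by the writer loop above.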
I have three CSV files, each with three named columns: 'Genus', 'Species', and 'Source'. I merged the files into a new document and now I need to alphabetize the columns, first by genus and then by species. I figured I could do this by first alphabetizing the species and then the genus, after which they should be in the proper order, but I haven't been able to find anything online that addresses how to sort named columns of strings. I tried lots of different ways of sorting, but it either changed nothing or replaced all the strings in the first column with the last string.
Here's my code for merging the files:
import csv, sys

with open('Footit_aphid_list_mod.csv', 'r') as inny:
    reader = csv.DictReader(inny)
    with open('Favret_aphid_list_mod.csv', 'r') as inny:
        reader1 = csv.DictReader(inny)
        with open('output_al_vonDohlen.csv', 'r') as inny:
            reader2 = csv.DictReader(inny)
            with open('aphid_list_complete.csv', 'w') as outty:
                fieldnames = ['Genus', 'Species', 'Source']
                writer = csv.DictWriter(outty, fieldnames=fieldnames)
                writer.writeheader()
                for record in reader:
                    writer.writerow(record)
                for record in reader1:
                    writer.writerow(record)
                for record in reader2:
                    writer.writerow(record)
                for record in reader:
                    g = record['Genus']
                    g = sorted(g)
                    writer.writerow(record)

inny.closed
outty.closed
If your files aren't insanely large, then read all the rows into a single list, sort it, then write it back:
#!python2
import csv

rows = []
with open('Footit_aphid_list_mod.csv', 'rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)
with open('Favret_aphid_list_mod.csv', 'rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)
with open('output_al_vonDohlen.csv', 'rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)

rows.sort(key=lambda d: (d['Genus'], d['Species']))

with open('aphid_list_complete.csv', 'wb') as outty:
    fieldnames = ['Genus', 'Species', 'Source']
    writer = csv.DictWriter(outty, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
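The lambda key can equivalently be written with operator.itemgetter, which fetches both fields in one call. A small in-memory check of the two-level sort:

```python
from operator import itemgetter

# Hypothetical merged rows, deliberately out of order.
rows = [
    {'Genus': 'Myzus', 'Species': 'persicae', 'Source': 'A'},
    {'Genus': 'Aphis', 'Species': 'gossypii', 'Source': 'B'},
    {'Genus': 'Aphis', 'Species': 'fabae', 'Source': 'C'},
]

# Sort by Genus first, then by Species within each genus.
rows.sort(key=itemgetter('Genus', 'Species'))
```

Because list.sort is a single pass over a composite key, there is no need for the two-stage "species first, then genus" approach the question describes.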
I'm trying to write to file2.csv with values from file1.csv using a keyfile.csv that contains the mapping between the two files, since they don't have the same column order.
def convert():
    Keyfile = open('keyfile.csv', 'rb')
    file1 = open('file1.csv', 'rb')
    file2 = open('file2.csv', 'w')
    reader_Keyfile = csv.reader(Keyfile, delimiter=",")
    reader_file1 = csv.reader(file1, delimiter=",")
    writer_file2 = csv.writer(file2, delimiter=",")
    for row_file1 in reader_file1:
        for row_Keyfile in reader_Keyfile:
            for index_val in row_Keyfile:
                file2.write(row_file1[int(index_val) - 1] + ',')
    # Closing all the files
    file2.close()
    Keyfile.close()
    file1.close()

# keyfile structure: 3,77,65,78,1,10,8...
# so 1st column of file2 is 3rd column of file1;
# col2 of file2 is col77 of file1, and so on
I'm only able to write one row in file2.csv. It should have as many rows as file1.csv. How do I move to the next row after one row is finished? I'm assuming the loop should take care of that, but that's not happening. What am I doing wrong?
You have two problems.
You should only read keyfile once and build a dict out of the mapping
You need to write a \n at the end of each line of your output file
I am assuming the KeyFile is just one row, giving the mappings for all rows. Something like the following should work:
import csv

def convert():
    with open('keyfile.csv') as Keyfile, open('file1.csv', 'r') as file1, open('file2.csv', 'wb') as file2:
        mappings = next(csv.reader(Keyfile, delimiter=","))
        # 1-based column numbers from the keyfile; blank entries map to None
        mappings = [int(x) - 1 if x else None for x in mappings]
        reader_file1 = csv.reader(file1, delimiter=",")
        writer_file2 = csv.writer(file2, delimiter=",")
        for row_file1 in reader_file1:
            row = [''] * len(mappings)
            # the keyfile's j-th entry names the file1 column for output column j
            for to_index, from_index in enumerate(mappings):
                if from_index is not None:
                    row[to_index] = row_file1[from_index]
            writer_file2.writerow(row)
It assumes column mappings start from 1.
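Following the keyfile convention from the question (one 1-based entry per output column, naming the file1 column it comes from), the remapping can be checked on a small example; a blank keyfile entry leaves the corresponding output column empty:

```python
# Hypothetical keyfile row "3,,1": output col 1 comes from input col 3,
# output col 2 is left blank, output col 3 comes from input col 1.
keyfile_entries = ['3', '', '1']
mappings = [int(x) - 1 if x else None for x in keyfile_entries]

row_file1 = ['alpha', 'beta', 'gamma']
row = [''] * len(mappings)
for to_index, from_index in enumerate(mappings):
    if from_index is not None:
        row[to_index] = row_file1[from_index]
```

The output row ends up with file1's third column first and its first column last, exactly as the keyfile prescribes.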
Your nested looping is problematic, as others mentioned. Instead, build the mapping outside of the row iteration, then write the rows based on that mapping. I use a dict object for this.
import csv

Keyfile = open('keyfile.csv', 'rb')
reader_file1 = csv.reader(open('file1.csv', 'rb'), delimiter=",")
writer_file2 = csv.writer(open('file2.csv', 'w'), delimiter=",")

mapDict = {}
# convert the first line in Keyfile to a dict: output column -> 1-based input column
reader = csv.reader(Keyfile, delimiter=',')
for i, v in enumerate(reader.next()):
    if v.strip():
        mapDict[i] = int(v)

# re-order each row of file1 based on mapDict (sorted keys keep the keyfile order,
# and the -1 converts the keyfile's 1-based column numbers to list indices)
for row in reader_file1:
    writer_file2.writerow([row[mapDict[k] - 1] for k in sorted(mapDict)])