I have two files in this format:
1.txt
What I want to do is merge these two files by matching on the first column, and append the remaining columns as in the following:
expected output
The script I have written is not working:
file1 = raw_input('Enter the first file name: ')
file2 = raw_input('Enter the second file name: ')
with open(file1, 'r') as f1:
    with open(file2, 'r') as f2:
        mydict = {}
        for row in f1:
            mydict[row[0]] = row[1:]
        for row in f2:
            mydict[row[0]] = mydict[row[0]].extend(row[1:])
        fout = csv.write(open('out.txt','w'))
        for k,v in mydict:
            fout.write([k]+v)
Your script doesn't work because of a few inaccuracies.
row is a string, so row[0] is the first character, not the first field.
The method .extend returns None, so it doesn't make sense to assign its result with =.
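A quick illustration of both points (a throwaway snippet, not part of the fix):

```python
row = "223456,233,334"         # iterating over a plain file yields strings like this
print(row[0])                  # '2' -- the first character, not the first field

parts = row.split(",")
result = parts.extend(["45"])  # extend mutates parts in place and returns None
print(result)                  # None
print(parts)                   # ['223456', '233', '334', '45']
```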
I would fix your script in this way:
import csv

mydict = {}
with open('1.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        mydict[row[0]] = row[1:]

with open('2.csv') as f:
    reader = csv.reader(f)
    with open('out.csv', 'w') as fout:
        writer = csv.writer(fout)
        for row in reader:
            new_row = row + mydict[row[0]]
            writer.writerow(new_row)
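One caveat about the script above: if 2.csv contains a key that never appeared in 1.csv, mydict[row[0]] raises KeyError. A defensive sketch using dict.get with an empty-list default (the in-memory data here is made up):

```python
import csv
import io

# In-memory stand-ins for 1.csv and 2.csv (hypothetical data)
f1 = io.StringIO("a,1,2\nb,3,4\n")
f2 = io.StringIO("a,5\nc,6\n")   # 'c' has no match in the first file

mydict = {row[0]: row[1:] for row in csv.reader(f1)}

out = []
for row in csv.reader(f2):
    # dict.get avoids a KeyError for unmatched keys like 'c'
    out.append(row + mydict.get(row[0], []))

print(out)  # [['a', '5', '1', '2'], ['c', '6']]
```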
The following approach should work:
import csv

d_1 = {}
with open('1.csv') as f_1:
    for row in csv.reader(f_1):
        d_1[row[0]] = row[4:]

with open('2.csv') as f_2, open('out.csv', 'wb') as f_out:
    csv_out = csv.writer(f_out)
    for row in csv.reader(f_2):
        if row[0] in d_1:
            row.extend(d_1[row[0]])
        csv_out.writerow(row)
This first reads 1.csv into a dictionary keyed on the first column, dropping the three columns that follow the key. It then reads each entry in 2.csv; if the first column matches an entry in the dictionary, the stored columns are appended before the row is written to the output.
Note: entries present in 1.csv but not in 2.csv are ignored, while entries in 2.csv that are not in 1.csv are written unchanged.
This gives you an out.csv file as follows:
223456,233,334,334,45,667,445,6667,77798,881,2234,44556,3333,22334,44555,22233,22334,22222,22334,2234,2233,222,55,666666
333883,445,445,4445,44,556,555,333,44445,5556,5555,223,334,5566,334,445,667,334,556,776,45,2223,3334,4444
For Python 2.6, split the with onto two lines as follows:
import csv

d_1 = {}
with open('1.csv') as f_1:
    for row in csv.reader(f_1):
        d_1[row[0]] = row[4:]

with open('2.csv') as f_2:
    with open('out.csv', 'wb') as f_out:
        csv_out = csv.writer(f_out)
        for row in csv.reader(f_2):
            if row[0] in d_1:
                row.extend(d_1[row[0]])
            csv_out.writerow(row)
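If you also want the entries that appear only in 1.csv, which the note above says are ignored, one possible variant tracks the matched keys and appends the leftovers at the end. A sketch with made-up in-memory data:

```python
import csv
import io

# stand-in for 1.csv: key in column 1, payload after column 4
d_1 = {row[0]: row[4:] for row in csv.reader(io.StringIO(
    "x,1,2,3,4,5\ny,1,2,3,6,7\n"))}

matched = set()
rows_out = []
for row in csv.reader(io.StringIO("x,9\nz,8\n")):  # stand-in for 2.csv
    if row[0] in d_1:
        row.extend(d_1[row[0]])
        matched.add(row[0])
    rows_out.append(row)

# append the entries that were only in the first file ('y' here)
for key in d_1:
    if key not in matched:
        rows_out.append([key] + d_1[key])

print(rows_out)  # [['x', '9', '4', '5'], ['z', '8'], ['y', '6', '7']]
```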
file1 = raw_input('Enter the first file name: ')
file2 = raw_input('Enter the second file name: ')
with open(file1, 'r') as f1:
    r1 = f1.read()
with open(file2, 'r') as f2:
    r2 = f2.read()
with open('out.txt','w') as o2:
    o2.write('{0},{1}'.format(r1, r2))
I'd like to write a new line inside my for loop. Currently, since everything happens inside the loop, all the data is collected into one list, and when I write it at the end of my code it comes out as a single line.
fields = []
new_rows_list = []

file1 = open('CSV_sample_file.csv','rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()

file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()

file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
Alternatively, you can pass writerows a generator expression that iterates through reader, converting each column with find_and_mask_cc and writing each row as it is read from the input, so the entire file never has to be held in memory:
with open('CSV_sample_file.csv') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
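find_and_mask_cc itself is not shown in the question. A hypothetical stand-in, assuming it masks the middle six digits of any 16-digit number in a cell (the regex and the exact masking format are guesses):

```python
import re

def find_and_mask_cc(value):
    # Assumed behavior: mask digits 7-12 of any standalone 16-digit number
    return re.sub(r'\b(\d{6})\d{6}(\d{4})\b', r'\1XXXXXX\2', value)

print(find_and_mask_cc("4111111111111111"))  # 411111XXXXXX1111
print(find_and_mask_cc("test1"))             # test1 (non-numbers pass through)
```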
I would like to read each row of the csv file and match each word in the row with a list of strings. If any of the strings appears in the row, then write that string at the end of the line separated by comma.
The code below doesn't give me what I want.
file = 'test.csv'
read_files = open(file)
lines = read_files.read()
text_lines = lines.split("\n")
name = ''
with open('testnew2.csv','a') as f:
    for line in text_lines:
        line = str(line)
        #words = line.split()
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                if row[0] in line:
                    name = row
                    print(name)
                    f.write(line+","+name[0]+'\n')
A sample of test.csv would look like this:
A,B,C,D
ABCD,,,
Total,Robert,,
Name,Annie,,
Total,Robert,,
And the names.csv would look:
Robert
Annie
Amanda
The output I want is:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
Currently the code will get rid of lines that don't result in a match, so I got:
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
Process each line by testing row[1], appending a fifth column, and then writing the row out. The name list isn't really a CSV; if it's long, use a set for the lookup, and read it only once for efficiency as well.
import csv

with open('names.txt') as f:
    names = set(f.read().strip().splitlines())

# newline='' per Python 3 csv documentation...
with open('input.csv', newline='') as inf:
    with open('output.csv', 'w', newline='') as outf:
        r = csv.reader(inf)
        w = csv.writer(outf)
        for row in r:
            row.append(row[1] if row[1] in names else '')
            w.writerow(row)
Output:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
I think the problem is you're only writing when the name is in the row. To fix that, move the write call outside of the if conditional:
if row[0] in line:
    name=row
    print(name)
f.write(line+","+name[0]+'\n')
I'm guessing that print statement is for testing purposes?
EDIT: On second thought, you may need to move name='' inside the loop as well so it is reset after each row is written, that way you don't get names from matched rows bleeding into unmatched rows.
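A minimal illustration of that bleed-over, with name declared once outside the loop versus reset inside it:

```python
names = ["Robert"]

# name declared once outside the loop: the bug
name = ""
out_buggy = []
for line in ["Total,Robert", "ABCD"]:
    for n in names:
        if n in line:
            name = n
    out_buggy.append(line + "," + name)
print(out_buggy)  # ['Total,Robert,Robert', 'ABCD,Robert'] -- 'ABCD' wrongly inherits Robert

# resetting name inside the loop fixes it
out_fixed = []
for line in ["Total,Robert", "ABCD"]:
    name = ""
    for n in names:
        if n in line:
            name = n
    out_fixed.append(line + "," + name)
print(out_fixed)  # ['Total,Robert,Robert', 'ABCD,']
```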
EDIT: Decided to show an implementation that should avoid the (possible) problem of two matched names in a row:
EDIT: Changed name=row and the call of name[0] in f.write() to name=row[0] and a call of name in f.write()
import csv

file = 'test.csv'
read_files = open(file)
lines = read_files.read()
text_lines = lines.split("\n")

with open('testnew2.csv','a') as f:
    for line in text_lines:
        name = ''
        line = str(line)
        #words = line.split()
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                if row[0] in line:
                    name = row[0]
                    print(name)
                    break  # stop at the first match so a second name can't overwrite it
        f.write(line+","+name+'\n')
Try this as well:
import csv

myFile = open('testnew2.csv', 'wb+')
writer = csv.writer(myFile)
reader2 = open('names.csv').readlines()

with open('test.csv') as File1:
    reader1 = csv.reader(File1)
    for row in reader1:
        for record in reader2:
            record = record.replace("\n","")
            if record in row:
                row.append(record)
                break
        writer.writerow(row)  # write every row; unmatched rows pass through unchanged
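Since reader2 is re-scanned for every input row, a set lookup is cheaper when the name list is long. A sketch of the same idea, with in-memory stand-ins for the two files:

```python
import csv
import io

# stand-ins for names.csv and test.csv (made-up data)
names = {line.strip() for line in io.StringIO("Robert\nAnnie\nAmanda\n") if line.strip()}

rows_out = []
for row in csv.reader(io.StringIO("Total,Robert,,\nABCD,,,\n")):
    # first cell that appears in the name set, or '' if none
    match = next((cell for cell in row if cell in names), "")
    if match:
        row.append(match)
    rows_out.append(row)

print(rows_out)  # [['Total', 'Robert', '', '', 'Robert'], ['ABCD', '', '', '']]
```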
I have two CSV files with 6 columns each, and both have one common column, EmpID (the primary key for comparison). For example, File1.csv is:
EmpID1,Name1,Email1,City1,Phone1,Hobby1
120034,Tom Hanks,tom.hanks#gmail.com,Mumbai,8888999,Fishing
And File2.csv is
EmpID2,Name2,Email2,City2,Phone2,Hobby2
120034,Tom Hanks,hanks.tom#gmail.com,Mumbai,8888999,Running
The files need to be compared for differences, and only the rows and columns that differ should be written to a new output file as:
EmpID1,Email1,Email2,Hobby1,Hobby2
120034,tom.hanks#gmail.com,hanks.tom#gmail.com,Fishing,Running
Currently I have written the below piece of code in Python. Now I am wondering how to identify and pick out the differences. Any pointers and help will be much appreciated.
import csv
import os

os.getcwd()
os.chdir('filepath')

with open('File1.csv', 'r') as csv1, open('File2.csv', 'r') as csv2:
    file1 = csv1.readlines()
    file2 = csv2.readlines()

with open('OutputFile.csv', 'w') as output:
    for line in file1:
        if line not in file2:
            output.write(line)
First read the files into dict structures, with the EmpID as key pointing to the entire row:
import csv

fieldnames = []  # to store all fieldnames
with open('File1.csv') as f:
    cf = csv.DictReader(f, delimiter=',')
    data1 = {row['EmpID1']: row for row in cf}
    fieldnames.extend(cf.fieldnames)

with open('File2.csv') as f:
    cf = csv.DictReader(f, delimiter=',')
    data2 = {row['EmpID2']: row for row in cf}
    fieldnames.extend(cf.fieldnames)
Then identify all ids that are in both dicts:
ids_to_check = set(data1) & set(data2)
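For example, calling set() on a dict yields its keys, so the intersection keeps only the shared ids:

```python
# two tiny stand-in dicts keyed by EmpID (made-up data)
data1 = {'120034': {'Name1': 'Tom Hanks'}, '120035': {'Name1': 'Jane Doe'}}
data2 = {'120034': {'Name2': 'Tom Hanks'}}

ids_to_check = set(data1) & set(data2)  # set(dict) is the set of its keys
print(ids_to_check)  # {'120034'}
```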
Finally, iterate over the ids and compare the rows themselves
with open('OutputFile.csv', 'w') as f:
    cw = csv.DictWriter(f, fieldnames, delimiter=',')
    cw.writeheader()
    for id in ids_to_check:
        diff = compare_dict(data1[id], data2[id], fieldnames)
        if diff:
            cw.writerow(diff)
Here's the compare_dict function implementation:
def compare_dict(d1, d2, fields_compare):
    fields_compare = set(field.rstrip('12') for field in fields_compare)
    if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
        # they differ, return a new dict with all fields
        result = d1.copy()
        result.update(d2)
        return result
    else:
        return {}
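A quick sanity check of compare_dict with two one-row dicts (field names taken from the sample files, values made up):

```python
def compare_dict(d1, d2, fields_compare):
    fields_compare = set(field.rstrip('12') for field in fields_compare)
    if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
        result = d1.copy()
        result.update(d2)
        return result
    return {}

row1 = {'EmpID1': '120034', 'Email1': 'tom.hanks#gmail.com'}
row2 = {'EmpID2': '120034', 'Email2': 'hanks.tom#gmail.com'}

diff = compare_dict(row1, row2, ['EmpID1', 'Email1', 'EmpID2', 'Email2'])
print(sorted(diff))  # ['Email1', 'Email2', 'EmpID1', 'EmpID2'] -- non-empty, so the row is written

# identical rows yield an empty dict, so nothing is written
same = compare_dict({'EmpID1': '1', 'Email1': 'a'},
                    {'EmpID2': '1', 'Email2': 'a'},
                    ['EmpID1', 'Email1', 'EmpID2', 'Email2'])
print(same)  # {}
```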
I am trying to print out the differences by comparing a column between 2 csv files.
CSV1:
SERVER, FQDN, IP_ADDRESS,
serverA, device1.com, 10.10.10.1
serverA,device2.com,10.11.11.1
serverC,device3.com,10.12.12.1
and so on..
CSV2:
FQDN, IP_ADDRESS, SERVER, LOCATION
device3.com,10.12.12.1,serverC,xx
device679.com,20.3.67.1,serverA,we
device1.com,10.10.10.1,serverA,ac
device345.com,192.168.2.0,serverA,ad
device2.com,192.168.6.0,serverB,af
and so on...
What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:
Output.csv:
FQDN, IP_ADDRESS, SERVER, LOCATION
device679.com,20.3.67.1,serverA,we
device345.com,192.168.2.0,serverA,ad
and so on..
I have tried, but I am not able to get the output. This is my code; please tell me where I am going wrong:
import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        DATA[col[0]] = col[1]

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
    csv_output.writerow(fieldnames)  # prints header to the output file
    for col in reader1:
        if col[1] not in reader2:
            csv_output.writerow(col)
(EDIT) This is another approach that I have used:
import csv

f1 = open("CSV1.csv")
f2 = open("CSV2.csv")
csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)
for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
        print(col2[0])
Basically, here I am first only trying to find out whether the unmatched FQDNs get printed at all, but it is printing out the whole CSV1 column instead. Please help; a lot of research has gone into this, but I have found no luck yet! :(
This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.
I use the Differ object for identifying line changes.
I assumed that you would not regard line swapping as a difference, that's why I added the sorted() function call.
from difflib import Differ

csv_file1 = sorted(open("file1.csv", 'r').readlines())
csv_file2 = sorted(open("file2.csv", 'r').readlines())

with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1, csv_file2):
        dmode, line = line[:2], line[2:]
        if dmode.strip() == "":
            continue
        f.write(line)  # lines from readlines() already end with '\n'
Note that if the line differs somehow (not only in the FQDN column) it would appear in diff.csv
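For reference, Differ prefixes every line with a two-character code: '- ' for lines only in the first sequence, '+ ' for lines only in the second, '  ' for common lines, and occasionally '? ' guide lines for near-matching pairs; the dmode check above keeps only the non-blank codes. A small demonstration with made-up rows:

```python
from difflib import Differ

a = ["device1.com,10.10.10.1\n", "device2.com,10.11.11.1\n"]
b = ["device1.com,10.10.10.1\n", "device2.com,192.168.6.0\n"]

for line in Differ().compare(a, b):
    # line[:2] is the code, line[2:] the original text
    print(repr(line[:2]), line[2:].rstrip())
```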
import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist, open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w') as f_output:
    reader1 = csv.reader(lookuplist)
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'])
    csv_output.writerow(fieldnames)  # prints header to the output file

    _tempFqdn = []
    for i, dt in enumerate(reader1):
        if i == 0:
            continue
        _tempFqdn.append(dt[1].strip())

    for i, col in enumerate(reader2):
        if i == 0:
            continue
        if col[0].strip() not in _tempFqdn:
            csv_output.writerow(col)
import csv

data = {}  # creating dictionary to store the data

with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        data[col[1]] = col[1]  # stores the FQDN column of CSV1 as both key and value

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
                   'MHT_3'])
    csv_output.writerow(fieldnames)  # prints header to the output file
    for col in reader2:
        if col[0] not in data:  # if column 0 of CSV2 matches no FQDN from CSV1
            col = [col[0]]
            csv_output.writerow(col)  # writes the unmatched FQDN to the output
So basically, I only had to switch to 'not in' inside the for loop, and change which CSV1 column is read into the data dictionary.
I'm trying to write file2.csv from the values in file1.csv using keyfile.csv, which contains the mapping between the two files, since they don't have the same column order.
def convert():
    Keyfile = open('keyfile.csv', 'rb')
    file1 = open('file1.csv', 'rb')
    file2 = open('file2.csv', 'w')

    reader_Keyfile = csv.reader(Keyfile, delimiter=",")
    reader_file1 = csv.reader(file1, delimiter=",")
    writer_file2 = csv.writer(file2, delimiter=",")

    for row_file1 in reader_file1:
        for row_Keyfile in reader_Keyfile:
            for index_val in row_Keyfile:
                file2.write(row_file1[int(index_val)-1]+',')

    # Closing all the files
    file2.close()
    Keyfile.close()
    file1.close()

# keyfile structure: 3,77,65,78,1,10,8...
# so 1st column of file2 is 3rd column of file1;
# col2 of file2 is col77 of file1 and so on
I'm only able to write one row to file2.csv, but it should have as many rows as file1.csv. How do I move to the next row after one row is finished? I assumed the loop would take care of that, but it isn't happening. What am I doing wrong?
You have two problems.
You should only read keyfile once and build a dict out of the mapping
You need to write a \n at the end of each line of your output file
I am assuming the KeyFile is just one row, giving the mappings for all rows. Something like the following should work:
def convert():
    with open('keyfile.csv') as Keyfile, open('file1.csv', 'r') as file1, open('file2.csv', 'wb') as file2:
        mappings = next(csv.reader(Keyfile, delimiter=","))
        # keyfile values are 1-based input column numbers, in output-column order
        mappings = [int(x) - 1 if x else None for x in mappings]
        reader_file1 = csv.reader(file1, delimiter=",")
        writer_file2 = csv.writer(file2, delimiter=",")
        for row_file1 in reader_file1:
            row = [''] * len(mappings)
            for to_index, from_index in enumerate(mappings):
                if from_index is not None:
                    row[to_index] = row_file1[from_index]
            writer_file2.writerow(row)
It assumes column mappings start from 1.
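The remapping logic in isolation, with a tiny made-up keyfile of 3,1,2 (output column 1 comes from input column 3, output column 2 from input column 1, and so on):

```python
# 1-based keyfile entries converted to 0-based input indices
mappings = [int(x) - 1 for x in "3,1,2".split(",")]

row_file1 = ['a', 'b', 'c']
row = [''] * len(mappings)
for to_index, from_index in enumerate(mappings):
    row[to_index] = row_file1[from_index]

print(row)  # ['c', 'a', 'b']
```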
Your nested looping is problematic as others mentioned. Instead, create the mapping outside of the row iteration, then write the rows based on the mapping. I use a dict object for this.
import csv

Keyfile = open('keyfile.csv', 'rb')
reader_file1 = csv.reader(open('file1.csv', 'rb'), delimiter=",")
writer_file2 = csv.writer(open('file2.csv', 'w'), delimiter=",")

mapDict = {}
# convert the first line of Keyfile to a dict: output position -> input column index
reader = csv.reader(Keyfile, delimiter=',')
for i, v in enumerate(next(reader)):
    if v.strip():
        mapDict[i] = int(v) - 1  # keyfile columns are 1-based

# re-order each row of file1 based on mapDict
# (relies on dicts preserving insertion order, i.e. Python 3.7+;
#  use a plain list of indices on older versions)
for row in reader_file1:
    writer_file2.writerow([row[c] for c in mapDict.values()])