I'd like each row to be written on its own line. Currently, the values are appended inside a for loop, so all of the data ends up in a single list. When I write that list at the end of my code, it is printed as one line.
import csv
fields = []
new_rows_list = []
file1 = open('CSV_sample_file.csv','rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX1111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
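Below is a self-contained sketch of the list-of-lists approach that runs without the original files. The question doesn't show find_and_mask_cc, so the version here is a hypothetical stand-in that keeps the first 6 and last 4 digits of a 16-digit number. The mask_csv helper and the sample data are also made up for illustration, and io.StringIO replaces the real files.

```python
import csv
import io
import re

# Hypothetical stand-in for the question's find_and_mask_cc():
# keep the first 6 and last 4 digits of a 16-digit number, mask the middle 6.
CARD_RE = re.compile(r'\b(\d{6})\d{6}(\d{4})\b')

def find_and_mask_cc(value):
    return CARD_RE.sub(r'\1XXXXXX\2', value)

def mask_csv(src, dst):
    """Copy CSV text from src to dst, masking every cell (hypothetical helper)."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator='\n')
    writer.writerow(next(reader))              # header row, unchanged
    new_rows_list = []
    for row in reader:
        # One inner list per input row, so each row becomes its own output line.
        new_rows_list.append([find_and_mask_cc(cell) for cell in row])
    writer.writerows(new_rows_list)

sample = (
    "Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c\n"
    "EMSPDGBELNAS,test1,4111111111111111\n"
    "EMSPDGCAJAPIN,test2,5111111111111111\n"
)
out = io.StringIO()
mask_csv(io.StringIO(sample), out)
print(out.getvalue())
```

The key difference from the question's code is that writerows() receives a list of rows, while writerow() received one flat list and therefore produced a single line.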
Alternatively, you can pass writerows a generator expression that iterates over reader. Each row is then written, with its columns converted by find_and_mask_cc, as soon as it is read from the input, so the entire input never has to be held in memory (this version uses Python 3 file modes):
import csv
with open('CSV_sample_file.csv', newline='') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))  # copy the header row unchanged
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
I have Python code that appends data to the same CSV file, but when I append the data it skips rows and starts from row 15 instead of row 4.
import csv
with open('csvtask.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    ls = []
    for line in csv_reader:
        if len(line['Values'])!= 0:
            ls.append(int(line['Values']))
    new_ls = ['','','']
    for i in range(len(ls)-1):
        new_ls.append(ls[i+1]-ls[i])
    print(new_ls)
with open('csvtask.csv','a',newline='') as new_file:
    csv_writer = csv.writer(new_file)
    for i in new_ls:
        csv_writer.writerow(('','','','',i))
    new_file.close()
Here is the image
It's not really feasible to update a file while you're reading it, so a common workaround is to create a new file. The following does that while preserving the fieldnames of the original file. The new column will be named Diff.
Since there's no previous value from which to calculate a difference for the first row, the rows of the file are processed with the built-in enumerate() function, which yields the index of each item together with the item itself as the sequence is iterated. The index tells you whether the current row is the first one, so that row can be handled specially.
import csv
# Read csv file and calculate values of new column.
with open('csvtask.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    fieldnames = reader.fieldnames  # Save for later.
    diffs = []
    prev_value = 0
    for i, row in enumerate(reader):
        row['Values'] = int(row['Values']) if row['Values'] else 0
        diff = row['Values'] - prev_value if i > 0 else ''
        prev_value = row['Values']
        diffs.append(diff)
# Read file again and write an updated file with the column added to it.
fieldnames.append('Diff')  # Name of new field.
with open('csvtask.csv', 'r', newline='') as inp:
    reader = csv.DictReader(inp)
    with open('csvtask_updated.csv', 'w', newline='') as outp:
        writer = csv.DictWriter(outp, fieldnames)
        writer.writeheader()
        for i, row in enumerate(reader):
            row.update({'Diff': diffs[i]})  # Add new column.
            writer.writerow(row)
print('Done')
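If the file is small enough to hold in memory, one variant reads it only once and keeps the rows in a list. This is a sketch, not the answer's code. The add_diff_column helper and the Name column in the sample are made up, and blank Values cells are treated as 0, as in the answer above.

```python
import csv
import io

def add_diff_column(src, dst, column='Values'):
    """Copy CSV rows from src to dst, adding a 'Diff' column (hypothetical helper)."""
    reader = csv.DictReader(src)
    rows = list(reader)                          # read the data once and keep it in memory
    fieldnames = reader.fieldnames + ['Diff']    # original columns plus the new one
    writer = csv.DictWriter(dst, fieldnames, lineterminator='\n')
    writer.writeheader()
    prev_value = 0
    for i, row in enumerate(rows):
        value = int(row[column]) if row[column] else 0     # blank cells count as 0
        row['Diff'] = value - prev_value if i > 0 else ''  # first row has nothing to compare with
        prev_value = value
        writer.writerow(row)

sample = "Name,Values\na,10\nb,15\nc,12\n"   # made-up input
out = io.StringIO()
add_diff_column(io.StringIO(sample), out)
print(out.getvalue())
```

The trade-off is memory for I/O: the rows are held in a list instead of being read from disk a second time.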
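You can use csv.DictWriter like this. Note that its writerows() method expects a sequence of dictionaries keyed by the fieldnames, not lists:
import csv
header = ["data", "values"]
writer = csv.DictWriter(file, fieldnames=header)  # file: an already-open output file
writer.writeheader()
data = [{"data": 1, "values": 2}, {"data": 4, "values": 6}]
writer.writerows(data)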
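Below is a runnable check of the DictWriter behaviour, using io.StringIO instead of a real file (an assumption for illustration). If your data starts out as lists, zipping each list with the header turns it into the dictionaries that DictWriter expects:

```python
import csv
import io

header = ["data", "values"]
list_rows = [[1, 2], [4, 6]]                          # rows as plain lists
data = [dict(zip(header, row)) for row in list_rows]  # -> [{'data': 1, 'values': 2}, ...]

buf = io.StringIO()                                   # stands in for a real output file
writer = csv.DictWriter(buf, fieldnames=header, lineterminator='\n')
writer.writeheader()
writer.writerows(data)
print(buf.getvalue())
```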
I am trying to print out the differences by comparing a column between 2 csv files.
CSV1:
SERVER, FQDN, IP_ADDRESS,
serverA, device1.com, 10.10.10.1
serverA,device2.com,10.11.11.1
serverC,device3.com,10.12.12.1
and so on..
CSV2:
FQDN, IP_ADDRESS, SERVER, LOCATION
device3.com,10.12.12.1,serverC,xx
device679.com,20.3.67.1,serverA,we
device1.com,10.10.10.1,serverA,ac
device345.com,192.168.2.0,serverA,ad
device2.com,192.168.6.0,serverB,af
and so on...
What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:
Output.csv:
FQDN, IP_ADDRESS, SERVER, LOCATION
device679.com,20.3.67.1,serverA,we
device345.com,192.168.2.0,serverA,ad
and so on..
I have tried, but I am not able to get the output.
This is my code; please tell me where I am going wrong:
import csv
data = {} # creating list to store the data
with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        DATA[col[0]] = col[1]
with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
    csv_output.writerow(fieldnames) # prints header to the output file
    for col in reader1:
        if col[1] not in reader2:
            csv_output.writerow(col)
(EDIT) This is another approach that I have used:
import csv
f1 = (open("CSV1.csv"))
f2 = (open("CSV2.csv"))
csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)
for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
        print(col2[0])
Basically, here I am first trying to find out whether the unmatched FQDNs are printed at all. But it prints out the whole CSV1 column instead. Please help, guys; a lot of research has gone into this, but I've had no luck yet! :(
This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.
I use the Differ object for identifying line changes.
I assumed that you would not regard reordered lines as a difference; that's why I added the sorted() calls.
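from difflib import Differ
# Sort so that lines appearing in a different order are not reported as changes.
with open("file1.csv", 'r') as f1, open("file2.csv", 'r') as f2:
    csv_file1 = sorted(f1.read().splitlines())
    csv_file2 = sorted(f2.read().splitlines())
with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1, csv_file2):
        dmode, line = line[:2], line[2:]
        # '- ' = only in file1, '+ ' = only in file2; skip common lines ('  ') and hint lines ('? ')
        if dmode not in ('- ', '+ '):
            continue
        f.write(line + "\n")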
Note that if a line differs in any way (not only in the FQDN column), it will appear in diff.csv.
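Here is a tiny in-memory example, with made-up lines, that shows which prefixes Differ produces. A '- ' prefix marks a line found only in the first sequence, '+ ' marks a line found only in the second, and two spaces mark a line common to both. Differ can also emit '? ' hint lines for near-matches, so only the '- ' and '+ ' lines are worth keeping here.

```python
from difflib import Differ

# Made-up lines standing in for the sorted contents of two files.
left = ['a', 'b']
right = ['b', 'c']
result = list(Differ().compare(left, right))
print(result)  # ['- a', '  b', '+ c']

# Keep only the lines unique to one side, as the answer does.
unique = [line[2:] for line in result if line[:2] in ('- ', '+ ')]
print(unique)  # ['a', 'c']
```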
import csv
with open('CSV1.csv', 'r', newline='') as lookuplist, open('CSV2.csv', 'r', newline='') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader1 = csv.reader(lookuplist)
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'])
    csv_output.writerow(fieldnames)  # prints header to the output file
    _tempFqdn = set()  # FQDNs (column 1) seen in CSV1
    for i, dt in enumerate(reader1):
        if i == 0:
            continue  # skip CSV1's header row
        _tempFqdn.add(dt[1].strip())
    for i, col in enumerate(reader2):
        if i == 0:
            continue  # skip CSV2's header row
        if col[0].strip() not in _tempFqdn:
            csv_output.writerow(col)  # CSV2's FQDN (column 0) was not found in CSV1
import csv
data = {}  # dictionary holding the FQDNs (column 1) of CSV1
with open('CSV1.csv', 'r') as lookuplist:
    reader1 = csv.reader(lookuplist)
    for col in reader1:
        data[col[1]] = col[1]  # key each CSV1 row by its FQDN
with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
    reader2 = csv.reader(csvinput)
    csv_output = csv.writer(f_output)
    fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
                   'MHT_3'])
    csv_output.writerow(fieldnames)  # prints header to the output file
    for col in reader2:
        if col[0] not in data:  # the FQDN in column 0 of CSV2 does not appear in CSV1
            col = [col[0]]
            csv_output.writerow(col)  # writes only the unmatched FQDN
So basically, I only had to change the 'not in' check inside the for loop and change which column of CSV1 is stored in the data dictionary.
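Here is a sketch of another way to do the comparison, assuming that both files have header rows as in the question. It reads the CSV1 FQDNs into a set and uses csv.DictReader to look up columns by header name, so the different column order of the two files doesn't matter. The fqdns_missing_from helper is hypothetical and io.StringIO stands in for the files. The sample data is taken from the question.

```python
import csv
import io

def fqdns_missing_from(csv1, csv2, out):
    """Write the rows of csv2 whose FQDN does not appear in csv1 (hypothetical helper)."""
    # skipinitialspace=True drops the blanks after commas seen in the sample headers.
    known = {row['FQDN'] for row in csv.DictReader(csv1, skipinitialspace=True)}
    reader2 = csv.DictReader(csv2, skipinitialspace=True)
    writer = csv.DictWriter(out, fieldnames=reader2.fieldnames, lineterminator='\n')
    writer.writeheader()
    for row in reader2:
        if row['FQDN'] not in known:   # set membership test, O(1) on average
            writer.writerow(row)

csv1 = io.StringIO(
    "SERVER, FQDN, IP_ADDRESS,\n"
    "serverA, device1.com, 10.10.10.1\n"
    "serverA,device2.com,10.11.11.1\n"
    "serverC,device3.com,10.12.12.1\n"
)
csv2 = io.StringIO(
    "FQDN, IP_ADDRESS, SERVER, LOCATION\n"
    "device3.com,10.12.12.1,serverC,xx\n"
    "device679.com,20.3.67.1,serverA,we\n"
    "device1.com,10.10.10.1,serverA,ac\n"
    "device345.com,192.168.2.0,serverA,ad\n"
    "device2.com,192.168.6.0,serverB,af\n"
)
out = io.StringIO()
fqdns_missing_from(csv1, csv2, out)
print(out.getvalue())
```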
I have two files in this format
1.txt
What I want to do is to merge both of these files on the first column and write the output as follows:
Expected output:
The script I have written is not working:
file1=raw_input('Enter the first file name: ')
file2=raw_input('Enter the second file name: ')
with open(file1, 'r') as f1:
    with open(file2, 'r') as f2:
        mydict = {}
        for row in f1:
            mydict[row[0]] = row[1:]
        for row in f2:
            mydict[row[0]] = mydict[row[0]].extend(row[1:])
        fout = csv.write(open('out.txt','w'))
        for k,v in mydict:
            fout.write([k]+v)
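Your script doesn't work because it contains a few mistakes:
row is a string, so row[0] is the first character, not the first number.
The .extend() method modifies the list in place and returns None, so assigning its result with = replaces the stored value with None.
csv.write does not exist (you want csv.writer, whose method is writerow), and for k,v in mydict iterates over the keys only; use mydict.items() to get key/value pairs.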
I would fix your script in this way:
import csv
mydict = {}
with open('1.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        mydict[row[0]] = row[1:]
with open('2.csv') as f:
    reader = csv.reader(f)
    with open('out.csv', 'w') as fout:
        writer = csv.writer(fout)
        for row in reader:
            new_row = row + mydict[row[0]]
            writer.writerow(new_row)
The following approach should work:
import csv
d_1 = {}
with open('1.csv') as f_1:
    for row in csv.reader(f_1):
        d_1[row[0]] = row[4:]
with open('2.csv') as f_2, open('out.csv', 'wb') as f_out:
    csv_out = csv.writer(f_out)
    for row in csv.reader(f_2):
        if row[0] in d_1:
            row.extend(d_1[row[0]])
        csv_out.writerow(row)
This first reads 1.csv into a dictionary keyed on the first column and drops the next three columns (the stored value is row[4:]). It then reads each row of 2.csv. If the row's first column matches a key in the dictionary, the stored columns are appended to the row before it is written to the output.
Note: entries present in 1.csv but not in 2.csv are ignored, and entries in 2.csv that are not in 1.csv are written unchanged.
This gives you an out.csv file as follows:
223456,233,334,334,45,667,445,6667,77798,881,2234,44556,3333,22334,44555,22233,22334,22222,22334,2234,2233,222,55,666666
333883,445,445,4445,44,556,555,333,44445,5556,5555,223,334,5566,334,445,667,334,556,776,45,2223,3334,4444
Python 2.6 does not allow several context managers in a single with statement, so for that version split the with onto two lines as follows:
import csv
d_1 = {}
with open('1.csv') as f_1:
    for row in csv.reader(f_1):
        d_1[row[0]] = row[4:]
with open('2.csv') as f_2:
    with open('out.csv', 'wb') as f_out:
        csv_out = csv.writer(f_out)
        for row in csv.reader(f_2):
            if row[0] in d_1:
                row.extend(d_1[row[0]])
            csv_out.writerow(row)
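Here is a self-contained Python 3 sketch of the same dictionary-lookup merge that can be run as-is. The merge_on_first_column helper and the small sample data are made up because the question's real files aren't shown, and io.StringIO stands in for the files. Unlike the answer above, this version keeps every column after the key instead of slicing with row[4:].

```python
import csv
import io

def merge_on_first_column(file_a, file_b, out):
    """Append the columns of file_a to the rows of file_b that share the same key (hypothetical helper)."""
    # Map each key (column 0) of file_a to the rest of its columns.
    extra = {row[0]: row[1:] for row in csv.reader(file_a) if row}
    writer = csv.writer(out, lineterminator='\n')
    for row in csv.reader(file_b):
        if not row:
            continue  # skip blank lines
        # Rows whose key is missing from file_a are written unchanged.
        writer.writerow(row + extra.get(row[0], []))

a = io.StringIO("1,x,y\n2,p,q\n")      # made-up sample data
b = io.StringIO("1,10\n3,30\n2,20\n")
out = io.StringIO()
merge_on_first_column(a, b, out)
print(out.getvalue())
```

Using dict.get() with an empty-list default avoids the KeyError that a plain mydict[row[0]] lookup would raise for unmatched keys.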
file1=raw_input('Enter the first file name: ')
file2=raw_input('Enter the second file name: ')
with open(file1, 'r') as f1:
    r1 = f1.read()
with open(file2, 'r') as f2:
    r2 = f2.read()
with open('out.txt','w') as o2:
    o2.write('{0},{1}'.format(r1, r2))
I open a file and read it with csv.DictReader. I iterate over it twice, but the second time nothing is printed. Why is this, and how can I make it work?
with open('MySpreadsheet.csv', 'rU') as wb:
    reader = csv.DictReader(wb, dialect=csv.excel)
    for row in reader:
        print row
    for row in reader:
        print 'XXXXX'
        # XXXXX is not printed
You read the entire file the first time you iterated, so there is nothing left to read the second time. Since you don't appear to be using the csv data the second time, it would be simpler to count the number of rows and just iterate over that range the second time.
import csv
with open('MySpreadsheet.csv', newline='') as f:  # 'rU' mode was removed in Python 3.11
    reader = csv.DictReader(f, dialect=csv.excel)
    row_count = 0
    for row in reader:
        row_count += 1  # count rows while printing them
        print(row)
for i in range(row_count):
    print('Stack Overflow')
If you need to iterate over the raw csv data again, it's simple to open the file again. Most likely, you should be iterating over some data you stored the first time, rather than reading the file again.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print('Stack Overflow')
If you don't want to open the file again, you can seek to the beginning, skip the header, and iterate again.
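with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
    f.seek(0)     # rewind to the start of the file
    next(reader)  # skip the header line, which would now be read as a data row
    for row in reader:
        print('Stack Overflow')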
You can create a list of dictionaries, each dictionary representing a row in your file, and then count the length of the list, or use list indexing to print each dictionary item.
Something like:
import csv
with open('YourCsv.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    rowslist = list(reader)
for i in range(len(rowslist)):
    print(rowslist[i])
Add wb.seek(0) (which goes back to the start of the file) and next(reader) (which skips the header row) before your second loop.
You can try storing the dicts in a list and then iterating over that list as many times as you need:
import csv
input_csv = []
with open('YourCsv.csv', 'r', encoding='UTF-8', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        input_csv.append(row)
for row in input_csv:
    print(row)
for row in input_csv:
    print(row)
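Here is a minimal in-memory demonstration, using made-up data in io.StringIO, of why the second loop prints nothing. A csv reader is a one-shot iterator, so a second pass yields nothing until the underlying file is rewound and a new reader is built.

```python
import csv
import io

data = io.StringIO("name,qty\napple,3\npear,5\n")   # made-up sample file
reader = csv.DictReader(data)

first_pass = [row['name'] for row in reader]
second_pass = [row['name'] for row in reader]   # reader is already exhausted
print(first_pass, second_pass)                  # ['apple', 'pear'] []

# Rewind the underlying file, rebuild the reader, and iterate again.
data.seek(0)
reader = csv.DictReader(data)
third_pass = [row['name'] for row in reader]
print(third_pass)                               # ['apple', 'pear']
```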
with open("test.txt", "r") as test:
    reader = csv.reader(test, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        for field in row:
            if field not in keywords:
                writer.writerow(row)
                break
It seems that this code writes out every row multiple times. I guess that it looks up every single field in each column. How can I specify a single column?
So this is the code I am using right now, and it seems to miss a few rows in which the keyword is not present in any column.
table = open("table.txt", "w")
with open("test.txt", "r") as test:
    reader = csv.reader(test, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        if all(field not in keywords for field in row):
            writer.writerow(row)
You can use zip to get your columns, and then use a generator expression inside all() to check that all of the elements meet the condition:
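import csv
with open("test.txt", "r", newline="") as Spenn, open("table.txt", "w", newline="") as table:
    # zip(*rows) transposes the data, so each item is a column rather than a row
    reader = zip(*csv.reader(Spenn, delimiter="\t"))
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        if all(field not in keywords for field in row):  # keywords as defined in the question
            writer.writerow(row)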
But if you just want to write the rows that meet the condition, you can use the following code:
import csv
with open("test.txt", "r", newline="") as Spenn, open("table.txt", "w", newline="") as table:
    reader = csv.reader(Spenn, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        if all(field not in keywords for field in row):  # no field in this row is a keyword
            writer.writerow(row)
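To answer the "single column" part of the question, here is a sketch that filters on one column index instead of checking every field. The keyword set, the sample rows, and the choice of column 1 are all hypothetical, and io.StringIO replaces test.txt and table.txt.

```python
import csv
import io

keywords = {"spam", "eggs"}          # hypothetical keyword set
src = io.StringIO("id\tword\tnote\n1\tspam\tx\n2\tham\teggs\n3\tham\ty\n")
rows = list(csv.reader(src, delimiter="\t"))

# Keep rows in which no field at all is a keyword.
no_keyword_anywhere = [row for row in rows if all(field not in keywords for field in row)]

# Keep rows whose value in one chosen column (index 1, "word") is not a keyword.
COLUMN = 1
no_keyword_in_column = [row for row in rows if row[COLUMN] not in keywords]

out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(no_keyword_in_column)
print(out.getvalue())
```

Each row is tested exactly once, so it can be written at most once, regardless of how many fields it has.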