Searching specific columns of a table for non-matching items - Python

with open("test.txt", "r") as test:
    reader = csv.reader(test, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        for field in row:
            if field not in keywords:
                writer.writerow(row)
                break
It seems that this code writes out every row multiple times. I guess it checks every single field in each column. How can I check a single column only?
So this is the code I am using right now, and it seems to miss a few rows where the keyword is not present in any column.
table = open("table.txt", "w")
with open("test.txt", "r") as test:
    reader = csv.reader(test, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        if all(field not in keywords for field in row):
            writer.writerow(row)

You can use zip to get your columns. Then you can use a generator expression inside the all function to check that all the elements meet the condition:
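import csv

# keywords is the collection from the question's code.
# The output must be a separate file opened for writing, not test.txt opened with "r".
with open("test.txt", "r", newline="") as Spenn, open("table.txt", "w", newline="") as table:
    # zip(*rows) transposes the table, so each item it yields is a column.
    reader = zip(*csv.reader(Spenn, delimiter="\t"))
    writer = csv.writer(table, delimiter="\t")
    for column in reader:
        if all(field not in keywords for field in column):
            # The matching column is written out as one row.
            writer.writerow(column)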
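To see what zip(*...) does on its own, here is a minimal sketch using an in-memory list of rows (the sample data is made up for illustration). Unpacking the rows into zip transposes the table, so each tuple it yields is one column:

```python
# Hypothetical sample table: a header row plus two data rows.
rows = [
    ["id", "name", "city"],
    ["1", "alice", "paris"],
    ["2", "bob", "rome"],
]

# zip(*rows) groups the i-th field of every row together, i.e. it yields columns.
columns = list(zip(*rows))
for column in columns:
    print(column)
```

Keep in mind that zip stops at the shortest row, so if some rows have fewer fields than others, the trailing columns are silently dropped.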
But if you just want to write the rows that meet the condition, you can use the following code:
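import csv

# keywords is the collection from the question's code.
with open("test.txt", "r", newline="") as Spenn, open("table.txt", "w", newline="") as table:
    reader = csv.reader(Spenn, delimiter="\t")
    writer = csv.writer(table, delimiter="\t")
    for row in reader:
        # Keep the row only if none of its fields is a keyword.
        if all(field not in keywords for field in row):
            writer.writerow(row)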
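To test a single column, as the question asks, you can index into each row instead of looping over every field. A minimal sketch, assuming the column of interest is at index 1, with a hypothetical keywords set and in-memory tab-separated data standing in for test.txt:

```python
import csv
import io

# Hypothetical keywords and input standing in for the question's data.
keywords = {"apple", "pear"}
infile = io.StringIO("1\tapple\tred\n2\tplum\tpurple\n3\tpear\tgreen\n4\tkiwi\tbrown\n")
outfile = io.StringIO()

column_index = 1  # assumption: the column to check is the second one

reader = csv.reader(infile, delimiter="\t")
writer = csv.writer(outfile, delimiter="\t")
for row in reader:
    # Only the chosen column is compared against the keywords.
    if row[column_index] not in keywords:
        writer.writerow(row)

print(outfile.getvalue())
```

With real files, open the output with newline="" as in the answers above; row[column_index] raises IndexError for rows that are shorter than expected.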


Adding/appending additional information in a column for a csv file

EDIT:
I need to store/add/append additional information in a specific column of a CSV file without using csv.DictReader.
If the cell in that column is empty, what do I need to do to skip that row?
For example:
Sample csv file:
$ cat file.csv
"A","B","C","D","E"
"a1","b1","c1","d1","e1"
"a2","b2","c2","d2","e2"
"a2","b2","c2",,"e2"
Code:
import csv

sample = ['dx;dy']
with open("file.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = next(reader)
    for row in reader:
        # sample.append(to the column D)
        pass
The Output should look like this:
$ cat file.csv
"A","B","C","D","E"
"a1","b1","c1","d1;dx;dy","e1"
"a2","b2","c2","d2;dx;dy","e2"
"a2","b2","c2",,"e2"
Since you know the header of the column you want to append to, you can find its index in the headers row and then modify that element of each row. To leave empty cells alone, only append when the cell has a value.
import csv

append_to_column = 'D'
separator = ';'
sample = ['dx;dy']
with open('file.csv', "r", newline="") as csvfile, open("outfile.csv", "w", newline="") as outfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = next(reader)
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    writer.writerow(headers)  # copy the header row to the output
    col_index = headers.index(append_to_column)
    for row in reader:
        value = row[col_index]
        if value:  # skip empty cells, such as column D of the last row
            row[col_index] = value + separator + sample[0]
        writer.writerow(row)
Which gives:
A,B,C,D,E
a1,b1,c1,d1;dx;dy,e1
a2,b2,c2,d2;dx;dy,e2
a2,b2,c2,,e2
Note that this file doesn't have quotes because they aren't required, since the fields don't contain any commas or other special characters. If you want to force the csv.writer to quote every field, add the quoting=csv.QUOTE_ALL argument to the csv.writer() call, like so: writer = csv.writer(outfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
Then you'll get (note that the empty cell is now written as ""):
"A","B","C","D","E"
"a1","b1","c1","d1;dx;dy","e1"
"a2","b2","c2","d2;dx;dy","e2"
"a2","b2","c2","","e2"

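Since the question is also about handling empty cells, here is a minimal sketch that wraps the same header-index approach in a generator. append_to_column is a hypothetical helper name, and the sample rows are held in memory with io.StringIO instead of being read from file.csv:

```python
import csv
import io


def append_to_column(rows, column_name, suffix, separator=";"):
    """Yield rows with `suffix` appended to the named column, skipping empty cells.

    The first row is treated as the header and passed through unchanged.
    """
    rows = iter(rows)
    headers = next(rows)
    yield headers
    col_index = headers.index(column_name)
    for row in rows:
        if row[col_index]:  # leave empty cells untouched
            row[col_index] = row[col_index] + separator + suffix
        yield row


# Sample data from the question, held in memory instead of file.csv.
source = io.StringIO(
    '"A","B","C","D","E"\n'
    '"a1","b1","c1","d1","e1"\n'
    '"a2","b2","c2","d2","e2"\n'
    '"a2","b2","c2",,"e2"\n'
)
result = io.StringIO()
writer = csv.writer(result, quoting=csv.QUOTE_ALL)
writer.writerows(append_to_column(csv.reader(source), "D", "dx;dy"))
print(result.getvalue())
```

With quoting=csv.QUOTE_ALL the empty cell comes out as "" rather than as nothing between two commas; with the default csv.QUOTE_MINIMAL it is written as an empty field, matching the question's expected output.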
Insert next line in For loop

I'd like each row to go on its own line. Currently, because the data is collected inside a for loop, it all ends up in one list, and when I write it at the end of my code it prints as one line.
fields = []
new_rows_list = []
file1 = open('CSV_sample_file.csv','rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX1111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
Alternatively, you can pass writerows a generator expression that iterates over reader and converts each row's columns with find_and_mask_cc as it reads them, so the entire input never has to be held in memory (this version is Python 3):
import csv

with open('CSV_sample_file.csv', newline='') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))  # copy the header row unchanged
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
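The question never shows find_and_mask_cc, so the sketch below uses a hypothetical stand-in that masks the middle six digits of any 16-digit number (keeping the first 6 and last 4, as in the expected output). The unmasked input rows are made up and held in io.StringIO:

```python
import csv
import io
import re


def find_and_mask_cc(value):
    """Hypothetical stand-in for the question's helper: mask the middle digits
    of any 16-digit number, keeping the first 6 and the last 4."""
    return re.sub(r"\b(\d{6})\d{6}(\d{4})\b", r"\1XXXXXX\2", value)


source = io.StringIO(
    "Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c\n"
    "EMSPDGBELNAS,test1,4111111111111111\n"
    "EMSPDGCAJAPIN,test2,5111111111111111\n"
)
target = io.StringIO()
reader = csv.reader(source)
writer = csv.writer(target, lineterminator="\n")
writer.writerow(next(reader))  # copy the header unchanged
# writerows writes one output line per inner iterable, not one line overall.
writer.writerows(map(find_and_mask_cc, row) for row in reader)
print(target.getvalue())
```

The key point is the last call: each map object is one row, so writerows emits one line per input row instead of a single long line.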

Loop over rows of csv.DictReader more than once

I open a file and read it with csv.DictReader. I iterate over it twice, but the second time nothing is printed. Why is this, and how can I make it work?
with open('MySpreadsheet.csv', 'rU') as wb:
    reader = csv.DictReader(wb, dialect=csv.excel)
    for row in reader:
        print row
    for row in reader:
        print 'XXXXX'
# XXXXX is not printed
You read the entire file the first time you iterated, so there is nothing left to read the second time. Since you don't appear to be using the csv data the second time, it would be simpler to count the number of rows and just iterate over that range the second time.
import csv

# 'rU' mode was removed in Python 3.11; newline='' is the recommended way to open CSV files.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    row_count = 0
    for row in reader:
        row_count += 1  # count the rows during the first pass
        print(row)
    for i in range(row_count):
        print('Stack Overflow')
If you need to iterate over the raw csv data again, it's simple to open the file again. Most likely, you should be iterating over some data you stored the first time, rather than reading the file again.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print('Stack Overflow')
If you don't want to open the file again, you can seek to the beginning, skip the header, and iterate again.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
    f.seek(0)     # go back to the start of the file
    next(reader)  # skip the header line
    for row in reader:
        print('Stack Overflow')
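To see both the exhaustion and the seek fix in one place, here is a small sketch that uses io.StringIO in place of MySpreadsheet.csv (the data is made up); the same calls work on a real file object:

```python
import csv
import io

# In-memory stand-in for MySpreadsheet.csv (hypothetical data).
f = io.StringIO("name,qty\napple,3\npear,5\n")
reader = csv.DictReader(f)

first_pass = [row["name"] for row in reader]
exhausted = [row["name"] for row in reader]  # nothing left to read

f.seek(0)     # rewind the underlying file
next(reader)  # fieldnames are already known, so this just consumes the header line
second_pass = [row["name"] for row in reader]

print(first_pass, exhausted, second_pass)
```

The reader itself holds no copy of the data; it only pulls lines from the file, which is why rewinding the file is enough to iterate again.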
You can create a list of dictionaries, each representing a row in your file, and then take the length of the list, or use list indexing to print each dictionary.
Something like:
import csv

with open('YourCsv.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    rowslist = list(reader)
for i in range(len(rowslist)):
    print(rowslist[i])
Add wb.seek(0) (goes back to the start of the file) and next(reader) (skips the header row) before your second loop.
You can store the rows in a list and then iterate over it as many times as you like:
import csv

input_csv = []
with open('YourCsv.csv', 'r', encoding='UTF-8', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        input_csv.append(row)
for row in input_csv:
    print(row)
for row in input_csv:
    print(row)

Printing out csv rows apart from first row

I want to read a CSV file in Python, and then print out every row apart from the first row.
I know how to print out all the rows:
with open('myfile.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print row
And the only way I can think of to skip printing the first row is:
with open('myfile.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for i, row in enumerate(reader):
        if i != 0:
            print row
But this doesn't seem very elegant. Any other solutions?
csv reader objects are iterators, which means you can skip single entries using next():
with open('myfile.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # just ignore the result
    for row in reader:
        print row
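Another option, sketched here in Python 3 with made-up in-memory data, is itertools.islice, which skips the first row lazily and works on any iterator, including a csv reader:

```python
import csv
import io
from itertools import islice

# In-memory stand-in for myfile.csv (hypothetical data).
csvfile = io.StringIO("header1,header2\na,1\nb,2\n")
reader = csv.reader(csvfile, delimiter=",")

# islice(iterable, 1, None) starts at index 1, so the first row is skipped.
rows = list(islice(reader, 1, None))
for row in rows:
    print(row)
```

Both next(reader) and islice avoid the per-row index check of the enumerate version; islice is handy when you want to pass the "rest of the file" to another function as a single iterable.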

How to read one single line of csv data in Python?

There are a lot of examples of reading CSV data with Python, like this one:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values:
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the csv file use next() on the reader object.
import csv

with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    # now do something here
    # if the first row is the header, call next() once more to get the next row:
    # row2 = next(reader)
Or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
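Since the question wants the first line split into several variables, next(reader) can be combined with map(int, ...) and tuple unpacking. A minimal sketch, assuming a four-column, all-integer layout (hypothetical data in io.StringIO stands in for some.csv):

```python
import csv
import io

# In-memory stand-in for some.csv; the column layout is hypothetical.
f = io.StringIO("7,10,20,30\n8,11,21,31\n")
reader = csv.reader(f)

# Read only the first row and convert every field to int in one step.
i, a1, b1, c1 = map(int, next(reader))
print(i, a1, b1, c1)
```

Unpacking raises ValueError if the row has a different number of fields than there are variables, and int() raises ValueError for non-numeric text, so match the variables to your file's actual layout.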
If the first line holds the column headings, you can get the first data row like this:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
You can use the pandas library to read only the first few lines of a huge dataset:
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can set the number of lines to read with the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    for row in reader:
        print(row)  # prints rows 2 and onward
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
A simple way to get any row of a CSV file is to read all the rows into a list and index it:
import csv

with open('some.csv', newline='') as csvfile:
    csvFileArray = []
    for row in csv.reader(csvfile, delimiter=','):
        csvFileArray.append(row)
print(csvFileArray[0])
To print a range of rows, in this case the rows at indices 4 to 6 (the slice [4:7] excludes index 7):
import csv
with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
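Note that list(data) reads the whole file into memory before slicing. For a large file, itertools.islice selects the same rows lazily. A sketch with generated in-memory data standing in for california_housing_test.csv:

```python
import csv
import io
from itertools import islice

# In-memory stand-in for california_housing_test.csv (hypothetical data).
csv_file = io.StringIO("".join(f"row{n},{n}\n" for n in range(10)))
data = csv.reader(csv_file)

# Same rows as list(data)[4:7] (indices 4, 5 and 6), without loading the whole file.
selected = list(islice(data, 4, 7))
for row in selected:
    print(row)
```

islice still has to read and parse the rows before index 4, but it stops after index 6 and never keeps more than one row in memory at a time.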
I think the simplest way is the best, and in this case (as in most others) that means not using an external library (pandas) or even a module (csv). So here is the simple answer:
# no need to give any mode, keep it simple
with open('some.csv') as f:
    # store in a variable to be used later
    my_line = f.readline()
    # do what you like with 'my_line' now
Note that my_line is the raw text of the first line, including the trailing newline; it is not split into fields.

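Because readline() returns unparsed text, a plain str.split does not handle quoted fields that contain commas. Wrapping the single line in a list, as in the documentation snippet above, lets csv.reader do the parsing. A sketch with made-up data:

```python
import csv
import io

# In-memory stand-in for some.csv (hypothetical data with a quoted comma).
f = io.StringIO('name,"city, country",age\nalice,"Paris, France",30\n')

my_line = f.readline()  # raw text, still ends with "\n"
# Wrap the single string in a list so csv.reader applies CSV quoting rules.
fields = next(csv.reader([my_line]))
print(fields)
```

By contrast, my_line.strip().split(",") would break "city, country" into two separate fields.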