How to use the set intersection method in Python

at_set = {'Num1', 'Num2', 'Num3'}
for files in os.listdir(zipped_trots_files):
    zipped_path = os.path.join(zipped_trots_files, files)
    with open(zipped_path, 'r') as output:
        reader = csv.reader(output, delimiter = '\t')
        for row in reader:
            read = [row for row in reader if row]
            for row in read:
                if set(row).intersection(at_set):
                    print(row)
I guess I'm using the intersection function wrong... can someone see it? I'm trying to print only the rows that contain either Num1, Num2 or Num3.
When I print, I get nothing...

There are duplicated iterations: the outer for loop and the list comprehension both consume the same reader. You need to remove the redundant iteration, or go back to the beginning of the file by calling output.seek(0).
at_set = {'Num1', 'Num2', 'Num3'}
for files in os.listdir(zipped_trots_files):
    zipped_path = os.path.join(zipped_trots_files, files)
    with open(zipped_path, 'r') as output:
        reader = csv.reader(output, delimiter = '\t')
        for row in reader:
            if row and set(row).intersection(at_set):
                print(row)
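For anyone who wants to try the single-pass version without real files, here is a minimal sketch. It assumes tab-separated input; the sample data and the helper name matching_rows are made up for illustration and are not from the question.

```python
import csv
import io

# Hypothetical tab-separated data standing in for one of the files.
sample = "Num1\tfoo\tbar\n\nbaz\tqux\tquux\nalpha\tNum3\tbeta\n"

at_set = {'Num1', 'Num2', 'Num3'}

def matching_rows(lines, wanted):
    """Return the rows that share at least one cell with `wanted`."""
    reader = csv.reader(lines, delimiter='\t')
    # Iterate over the reader exactly once; blank lines come back as [].
    return [row for row in reader if row and wanted.intersection(row)]

matches = matching_rows(io.StringIO(sample), at_set)
for row in matches:
    print(row)
```

Note that intersection compares whole cells, so a cell such as 'Num1 ' (with a trailing space) or 'Num10' would not match 'Num1'.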

Related

Insert next line in For loop

I'd like each row to be written on its own line. Currently, because everything happens inside the for loop, all the data is stored in a single list, and when I write it at the end of my code, it comes out as one line.
fields = []
new_rows_list = []
file1 = open('CSV_sample_file.csv','rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX1111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
Alternatively, you can pass writerows a generator expression that iterates through reader and converts each column with find_and_mask_cc. Each row is then written to the output as it is read from the input, so the entire input never has to be held in memory:
with open('CSV_sample_file.csv') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
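In case the demo link goes away, here is a small self-contained sketch of the streaming approach. The question never shows find_and_mask_cc, so mask_cell below is a hypothetical stand-in that masks the middle of 16-digit values, and the in-memory buffers replace the real files.

```python
import csv
import io

def mask_cell(value):
    """Hypothetical stand-in for find_and_mask_cc: mask the middle of 16-digit values."""
    if len(value) == 16 and value.isdigit():
        return value[:6] + 'XXXXXX' + value[-4:]
    return value

# In-memory buffers standing in for the input and output files.
source = io.StringIO(
    "Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c\n"
    "EMSPDGBELNAS,test1,4111111111111111\n"
    "EMSPDGCAJAPIN,test2,5111111111111111\n"
)
target = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(target)
writer.writerow(next(reader))  # copy the header row unchanged
# Each inner list becomes one output row, so rows stay on separate lines.
writer.writerows([mask_cell(cell) for cell in row] for row in reader)

output_lines = target.getvalue().splitlines()
print(output_lines)
```

The outer generator is consumed lazily by writerows, so only one row is in memory at a time.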

try to split the row of a csv output

Now I am writing some data into a csv file. I directly write a list to a row of a csv file, like below:
with open("files/data.csv", "wb") as f_csv:
    writer = csv.writer(f_csv,delimiter = ',')
    writer.writerow(flux_inteplt) ## here flux_inteplt is a list
But when I read the data like below:
with open('files/data.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        parts = row.split(",")
        print parts[0]
It fails with AttributeError: 'list' object has no attribute 'split'
Does anyone have an idea how to approach this problem?
import csv
with open('us-cities.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        str1 = ''.join(row) #Convert list into string
        parts = str1.split(",")
        print parts[0]
row is already a list. When you iterate over the reader object, you get a list of values split by the delimiter you pass, so just use each row:
for row in reader:
    print row[0] # first element from each row
If you have comma-separated values, use delimiter=',' and not delimiter=' '. Since you use csv.writer(f_csv,delimiter = ',') when writing, your file is comma-separated. The delimiter you pass when writing is what separates each element of your input iterable, so you need to pass the same delimiter when reading to get the same values back.
row is already a list. No need to split (:
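To illustrate the point about matching delimiters, here is a rough round-trip sketch in Python 3. The values in flux_inteplt are invented (the question does not show them), and an in-memory buffer is used instead of files/data.csv.

```python
import csv
import io

flux_inteplt = [1.5, 2.25, 3.0]  # hypothetical values; the real list is not shown

buffer = io.StringIO()
csv.writer(buffer, delimiter=',').writerow(flux_inteplt)

buffer.seek(0)
# Same delimiter as the writer, so each row is already a list of strings.
rows = list(csv.reader(buffer, delimiter=','))
first_value = float(rows[0][0])

buffer.seek(0)
# Wrong delimiter: the whole line comes back as a single cell.
wrong = list(csv.reader(buffer, delimiter=' '))

print(rows, first_value, wrong)
```

The csv module always hands back strings, so numeric values need an explicit conversion such as float() after reading.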

stripping the zeros in csv with python

Hello, I have a csv file and I need to remove the zeros with python.
Column 6 (index 5 in python) defaults to 7 digits, like this:
AFI12001,01,C-,201405,P,0000430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,0001550,2,0.03500000,US,30.0000
I need to remove the zeros in front, then add a zero or zeros to make sure the value has 4 digits total,
so I would need it to look like this:
AFI12001,01,C-,201405,P,0430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,1550,2,0.03500000,US,30.0000
This code adds the zeros:
import csv
new_rows = []
with open('csvpatpos.csv','r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        new_row = ""
        col = 0
        print row
        for x in row:
            col = col + 1
            if col == 6:
                if len(x) == 3:
                    x = "0" + x
            new_row = new_row + x + ","
        print new_row
However, I'm having trouble removing the zeros in front.
Convert the column to an int then back to a string in whatever format you want.
row[5] = "%04d" % int(row[5])
You could probably do this in several steps with .lstrip(), then finding the resulting string length, then adding on 4-len(s) 0s to the front. However, I think it's easier with regex.
with open('infilename', 'r') as infile:
    reader = csv.reader(infile)
    for row in reader:
        stripped_value = re.sub(r'^0{3}', '', row[5])
Yields
0430
1550
In the regex, we are using the format sub(pattern, substitute, original). The pattern breakdown is:
'^' - match start of string
'0{3}' - match 3 zeros
You said all the strings in the 6th column have 7 digits, and you want 4, so replace the first 3 with an empty string.
Edit: If you want to replace the rows, I would just write it out to a new file:
with open('infilename', 'r') as infile, open('outfilename', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        row[5] = re.sub(r'^0{3}', '', row[5])
        writer.writerow(row)
Edit2: In light of your newest requests, I would recommend doing the following:
with open('infilename', 'r') as infile, open('outfilename', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # strip all 0's from the front
        stripped_value = re.sub(r'^0+', '', row[5])
        # pad zeros on the left to smaller numbers to make them 4 digits
        row[5] = '%04d'%int(stripped_value)
        writer.writerow(row)
Given the following numbers,
['0000430', '0001550', '0013300', '0012900', '0100000', '0001000']
this yields
['0430', '1550', '13300', '12900', '100000', '1000']
You can use lstrip() and zfill() methods. Like this:
with open('input') as in_file:
    csv_reader = csv.reader(in_file)
    for row in csv_reader:
        stripped_data = row[5].lstrip('0')
        new_data = stripped_data.zfill(4)
        print new_data
This prints:
0430
1550
The line:
stripped_data = row[5].lstrip('0')
gets rid of all the zeros on the left. And the line:
new_data = stripped_data.zfill(4)
fills the front with zeros such that the total number of digits is 4.
Hope this helps.
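As a quick check of the approaches above, here is a small sketch that runs the lstrip/zfill version and the int-formatting version over a few values. The helper name repad and the all-zero sample are additions for illustration only.

```python
def repad(value, width=4):
    """Strip leading zeros, then left-pad with zeros to at least `width` characters."""
    return value.lstrip('0').zfill(width)

samples = ['0000430', '0001550', '0013300', '0000000']

repadded = [repad(s) for s in samples]
# Same result via integer formatting, as in the "%04d" answer.
formatted = ['%04d' % int(s) for s in samples]

print(repadded)
print(formatted)
```

Both give the same strings here. One difference to be aware of: stripping the zeros first and then calling int() on the result fails for an all-zero value, because int('') raises ValueError, while zfill simply pads the empty string back to '0000'.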
You can keep the last 4 characters:
columns[5] = columns[5][-4:]
Example:
data = '''AFI12001,01,C-,201405,P,0000430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,0001550,2,0.03500000,US,30.0000'''
for row in data.splitlines():
    columns = row.split(',')
    columns[5] = columns[5][-4:]
    print ','.join(columns)
result
AFI12001,01,C-,201405,P,0430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,1550,2,0.03500000,US,30.0000
EDIT:
The same code with the csv module, reading a real file instead of simulated data:
import csv
with open('csvpatpos.csv','r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        row[5] = row[5][-4:]
        print row[5] # print one element
        #print ','.join(row) # print full row
        print row # print full row

Python not entering in for

Why is unique[1] never accessed in the second for loop?
unique is an array of strings.
import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    for i in range(len(unique)):
        # print unique[i] #prints all the items in the array
        for row in reader:
            print unique[i] # always prints the first item unique[0]
            if row[1]==unique[i]:
                print row[1], row[0] # prints only the unique[0] stuff
Thank you
I think it would be useful to go through the program flow.
First, it will assign i=0 and read the entire CSV file, printing unique[0] for each line. After it finishes reading the file, it moves to the second iteration and assigns i=1. Since the reader has already reached the end of the file, the body of for row in reader: never runs again, so the inner loop is skipped for every remaining i.
Further Clarification
The csv.reader(f) won't actually read the file until you do for row in reader, and after that it has nothing more to read. If you want to read the file multiple times, then read it into a list first beforehand, like this:
import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    rows = [row for row in reader]
for i in range(len(unique)):
    for row in rows:
        print unique[i]
        if row[1]==unique[i]:
            print row[1], row[0]
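The exhaustion behaviour is easy to see in isolation. This is a minimal Python 3 sketch with made-up data in an in-memory buffer in place of file.csv:

```python
import csv
import io

f = io.StringIO("a,x\nb,y\n")  # hypothetical two-line file
reader = csv.reader(f)

first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # nothing left: the reader is exhausted

f.seek(0)                    # rewind the underlying file object
third_pass = list(csv.reader(f))  # a fresh reader sees all rows again

print(first_pass, second_pass, third_pass)
```

Reading into a list once, as in the answer above, is usually simpler than rewinding, unless the file is too large to hold in memory.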
I think you might have better luck if you change your nested structure to:
import csv
res = {}
for x in unique:
    res[x] = []
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        for i in range(len(unique)):
            # print unique[i] #prints all the items in the array
            if row[1]==unique[i]:
                res[unique[i]].append([row[1],row[0]])
                #print row[1], row[0] # prints only the unique[0] stuff
for x in unique:
    print res[x]

How to read one single line of csv data in Python?

There are a lot of examples of reading csv data using python, like this one:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values:
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the csv file use next() on the reader object.
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    # now do something here
    # if first row is the header, then you can do one more next() to get the next row:
    # row2 = next(reader)
Or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
You could get just the first row like:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
You can use the pandas library to read only the first few lines of a huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can mention the number of lines to be read in the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    for row in reader:
        print(row) # prints rows 2 and onward
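Since the question asks about putting one line into several variables, here is a short sketch of that using next() and tuple unpacking. The column layout (a header line followed by four integer columns) is an assumption, as the real file is not shown.

```python
import csv
import io

# Hypothetical file contents: a header line followed by one data line.
csvfile = io.StringIO("i,a1,b1,c1\n1,10,20,30\n")

reader = csv.reader(csvfile, delimiter=',', quotechar='"')
header = next(reader)  # skip the header row
# Read exactly one more row and unpack it, one variable per column.
i, a1, b1, c1 = (int(cell) for cell in next(reader))

print(i, a1, b1, c1)
```

If the file might be empty, next(reader, None) returns None instead of raising StopIteration, which can then be checked before unpacking.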
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print row
Just drop your string data into a singleton list.
The simple way to get any row in a csv file:
import csv
csvfile = open('some.csv','rb')
csvFileArray = []
for row in csv.reader(csvfile, delimiter = '.'):
    csvFileArray.append(row)
print(csvFileArray[0])
To print a range of lines, in this case the rows at indices 4 to 6 (the fifth through seventh lines):
import csv
with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
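As a variation (not part of the answer above), itertools.islice can pick out the same range without building a list of the whole file first. This sketch uses invented data in an in-memory buffer:

```python
import csv
import io
from itertools import islice

# Hypothetical ten-line file: row0, row1, ..., row9.
csv_file = io.StringIO("".join("row{}\n".format(n) for n in range(10)))

data = csv.reader(csv_file)
# islice(data, 4, 7) yields the rows at zero-based indices 4, 5 and 6.
selected = list(islice(data, 4, 7))

for row in selected:
    print(row)
```

The reader still has to walk past the first four rows, but they are discarded as they are read rather than stored.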
I think the simplest way is the best way, and in this case (and in most others) that is one without external libraries (pandas) or modules (csv). So, here is the simple answer.
""" no need to give any mode, keep it simple """
with open('some.csv') as f:
    """ store in a variable to be used later """
    my_line = f.readline()
    """ do what you like with 'my_line' now """
