Hello, I have a CSV file and I need to remove the leading zeros with Python.
Column 6 (index 5 in Python) is zero-padded to 7 digits, like this:
AFI12001,01,C-,201405,P,0000430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,0001550,2,0.03500000,US,30.0000
I need to remove the leading zeros, then pad with zeros on the left so the value has 4 digits in total,
so I would need it to look like this:
AFI12001,01,C-,201405,P,0430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,1550,2,0.03500000,US,30.0000
This code adds the zeros:
import csv

new_rows = []
with open('csvpatpos.csv', 'r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        new_row = ""
        col = 0
        print row
        for x in row:
            col = col + 1
            if col == 6:
                if len(x) == 3:
                    x = "0" + x
            new_row = new_row + x + ","
        print new_row
However, I'm having trouble removing the zeros in front.
Convert the column to an int then back to a string in whatever format you want.
row[5] = "%04d" % int(row[5])
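For example, applied to sample values like those in the question, the conversion gives:

```python
# int() drops the leading zeros; "%04d" pads back to at least 4 digits.
for value in ['0000430', '0001550', '0100000']:
    print("%04d" % int(value))
# 0430
# 1550
# 100000
```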
You could probably do this in several steps with .lstrip(), then finding the resulting string length, then adding on 4-len(s) 0s to the front. However, I think it's easier with regex.
import csv
import re

with open('infilename', 'r') as infile:
    reader = csv.reader(infile)
    for row in reader:
        stripped_value = re.sub(r'^0{3}', '', row[5])
        print stripped_value
Yields
0430
1550
In the regex, we are using the format sub(pattern, substitute, original). The pattern breakdown is:
'^' - match start of string
'0{3}' - match 3 zeros
You said all the strings in the 6th column have 7 digits, and you want 4, so replace the first 3 with an empty string.
Edit: If you want to replace the rows, I would just write it out to a new file:
with open('infilename', 'r') as infile, open('outfilename', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        row[5] = re.sub(r'^0{3}', '', row[5])
        writer.writerow(row)
Edit2: In light of your newest requests, I would recommend doing the following:
with open('infilename', 'r') as infile, open('outfilename', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # strip all 0's from the front
        stripped_value = re.sub(r'^0+', '', row[5])
        # pad zeros on the left of smaller numbers to make them 4 digits
        row[5] = '%04d' % int(stripped_value)
        writer.writerow(row)
Given the following numbers,
['0000430', '0001550', '0013300', '0012900', '0100000', '0001000']
this yields
['0430', '1550', '13300', '12900', '100000', '1000']
You can use lstrip() and zfill() methods. Like this:
with open('input') as in_file:
    csv_reader = csv.reader(in_file)
    for row in csv_reader:
        stripped_data = row[5].lstrip('0')
        new_data = stripped_data.zfill(4)
        print new_data
This prints:
0430
1550
The line:
stripped_data = row[5].lstrip('0')
gets rid of all the zeros on the left. And the line:
new_data = stripped_data.zfill(4)
fills the front with zeros such that the total number of digits are 4.
Hope this helps.
You can keep the last 4 characters:
columns[5] = columns[5][-4:]
example
data = '''AFI12001,01,C-,201405,P,0000430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,0001550,2,0.03500000,US,30.0000'''

for row in data.splitlines():
    columns = row.split(',')
    columns[5] = columns[5][-4:]
    print ','.join(columns)
result
AFI12001,01,C-,201405,P,0430,2,0.02125000,US,60.0000
AFI12001,01,S-,201404,C,1550,2,0.03500000,US,30.0000
EDIT:
The same code using the csv module, reading from the file instead of the simulated data.
import csv

with open('csvpatpos.csv', 'r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        row[5] = row[5][-4:]
        print row[5]          # print one element
        #print ','.join(row)  # print full row as a string
        print row             # print full row as a list
Related
at_set = {'Num1', 'Num2', 'Num3'}
for files in os.listdir(zipped_trots_files):
    zipped_path = os.path.join(zipped_trots_files, files)
    with open(zipped_path, 'r') as output:
        reader = csv.reader(output, delimiter='\t')
        for row in reader:
            read = [row for row in reader if row]
            for row in read:
                if set(row).intersection(at_set):
                    print(row)
I guess I'm using the intersection function wrong... can someone see it? I'm trying to print only the rows that contain Num1, Num2, or Num3.
When I print, I receive nothing...
There are duplicated iterations: the for row in reader loop and the list comprehension inside it both consume the same reader. You need to remove the excessive iteration, or go back to the beginning of reader by calling output.seek(0).
at_set = {'Num1', 'Num2', 'Num3'}
for files in os.listdir(zipped_trots_files):
    zipped_path = os.path.join(zipped_trots_files, files)
    with open(zipped_path, 'r') as output:
        reader = csv.reader(output, delimiter='\t')
        for row in reader:
            if row and set(row).intersection(at_set):
                print(row)
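To illustrate why the membership test works (the sample rows here are made up for the example):

```python
at_set = {'Num1', 'Num2', 'Num3'}

# A row matches when at least one of its fields is in at_set;
# set(row).intersection(at_set) is then non-empty, hence truthy.
rows = [
    ['Num1', 'foo'],
    ['bar', 'baz'],
    ['x', 'Num3'],
]
matches = [row for row in rows if set(row).intersection(at_set)]
print(matches)  # [['Num1', 'foo'], ['x', 'Num3']]
```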
I want to read from the pList.csv file and write all items into one string, with each row separated by a comma.
The file has only one column. For example, pList.csv is:
28469977
24446384
25968054
and output string must be:
28469977,24446384,25968054
To do this, I have considered the following code, but there is a little problem:
p_list = ""
with open("pList.csv", mode="r") as infile:
    reader = csv.reader(infile)
    for row in reader:
        p_list += row[0]
        if its_not_last_loop:
            p_list += ","
What expression is appropriate for its_not_last_loop so that the , is not appended after the last row of the file?
Try this:
with open("pList.csv", mode="r") as infile:
    reader = csv.reader(infile)
    out_list = []
    for row in reader:
        out_list.append(row[0])  # row[0] gets the value for the sample input
    p_list = ",".join(out_list)
print(p_list)
See What exactly does the .join() method do?
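In short, join concatenates the items of an iterable, placing the string it is called on between them, never after the last one, which is exactly the its_not_last_loop behaviour you wanted:

```python
# The separator goes only between items, not after the final one.
print(",".join(["28469977", "24446384", "25968054"]))
# 28469977,24446384,25968054
```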
This can be shortened to (and is faster)
with open("pList.csv", mode="r") as infile:
    reader = csv.reader(infile)
    p_list = ",".join(row[0] for row in reader)
print(p_list)
The following code stops at the first row of the data1 file.
Instead, it should go through all the values of the 2nd column of data1 and look whether each value is in the range given by the columns of each row of data2:
with open('data1.csv', 'r') as f:
    reader1 = csv.reader(f, delimiter=';')
    with open('data2.csv', 'r') as d:
        reader2 = csv.reader(d, delimiter=';')
        for row in reader1:
            for line in reader2:
                if (row[0] == line[1]) and (line[2] <= row[1] <= line[3]):
                    print(line[0] + ' ' + row[1])
reader1 and reader2 only go through the file once. So when you read through all of reader2 to check the first row of reader1, it gets all used up. When you try to check another row from reader1, there are no more rows to read from reader2.
A naive fix would be to put d.seek(0) before the line for line in reader2: which would reset the file pointer back to the beginning of the file. Don't do this because it is a very slow way to process your files.
A better way would be to store the lines of reader2 in such a way that you only iterate over lines that are likely to be matches. Since one of your criteria is row[0] == line[1] I have cached the lines in reader2 by line[1]. (I have kept your convention of naming each row in reader1 row and each row in reader2 line.)
import csv
from collections import defaultdict

reader2_by_item1 = defaultdict(list)
with open('data1.csv', 'r') as f, open('data2.csv', 'r') as d:
    reader1 = csv.reader(f, delimiter=';')
    reader2 = csv.reader(d, delimiter=';')
    for line in reader2:
        reader2_by_item1[line[1]].append(line)
    for row in reader1:
        for line in reader2_by_item1[row[0]]:  # this tests row[0] == line[1]
            if line[2] <= row[1] <= line[3]:
                print(line[0] + ' ' + row[1])
Note: line[2] <= row[1] <= line[3] is a lexicographical comparison (string comparison). If you are trying to compare numerical values, you need to convert them to numeric types first.
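A minimal sketch of the difference:

```python
# String comparison is character by character, so '10' sorts before '9':
print('10' <= '9')    # True, lexicographic
print(10 <= 9)        # False, numeric

# Converting the fields before comparing gives the numeric result:
low, value, high = '5', '12', '20'
print(low <= value <= high)                        # False as strings
print(float(low) <= float(value) <= float(high))   # True as numbers
```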
The content of the csv is as follows:
"Washington-Arlington-Al, DC-VA-MD-WV (MSAD)" 47894 1976
"Grand-Forks, ND-MN" 24220 2006
"Abilene, TX" 10180 1977
The output required is: read through the csv, find the content between the double quotes in column 1, fetch only DC-VA-MD-WV, ND-MN, and TX, and put this content in a new column (for normalization).
So far I have tried a lot of regex patterns in Python, but could not get the right one.
import csv

sample = '''"Washington-Arlington-Al, DC-VA-MD-WV (MSAD)",47894,1976
"Grand-Forks, ND-MN",24220,2006
"Abilene, TX",10180,1977'''
open('sample.csv', 'w').write(sample)

with open('sample.csv') as sample, open('output.csv', 'w') as output:
    reader = csv.reader(sample)
    writer = csv.writer(output)
    for row in reader:
        for comsplit in row[0].split(','):
            writer.writerow([comsplit, row[1]])

print open('output.csv').read()
Output Expected is:
DC-VA-MD-WV
ND-MN
TX
in a new row
There is no need to use regex here provided a couple of things:
The city (?) always has a comma after it followed by 1 space of whitespace (though I could add a modification to accept more than 1 bit of whitespace if needed)
There is a space after your letter sequence before encountering something like (MSAD).
This code gives your expected output against the sample input:
with open('sample.csv', 'r') as infile, open('expected_output.csv', 'wb') as outfile:
    reader = csv.reader(infile)
    expected_output = []
    for row in reader:
        split_by_comma = row[0].split(',')[1]
        split_by_space = split_by_comma.split(' ')[1]
        print split_by_space
        expected_output.append([split_by_space])
    writer = csv.writer(outfile)
    writer.writerows(expected_output)
I'd do it like this:
with open('csv_file.csv', 'r') as f_in, open('output.csv', 'w') as f_out:
    csv_reader = csv.reader(f_in, quotechar='"', delimiter=',',
                            quoting=csv.QUOTE_ALL, skipinitialspace=True)
    csv_writer = csv.writer(f_out)
    new_csv_list = []
    for row in csv_reader:
        first_entry = row[0].strip('"')
        relevant_info = first_entry.split(',')[1].split(' ')[0]
        row += [relevant_info]
        new_csv_list += [row]
    for row in new_csv_list:
        csv_writer.writerow(row)
Let me know if you have any questions.
I believe you could use this regex pattern, which will extract any alphanumeric expression (with hyphen or not) between a comma and a parenthesis:
import re

BETWEEN_COMMA_PAR = re.compile(ur',\s+([\w-]+)\s+\(')

test_str = 'Washington-Arlington-Al, DC-VA-MD-WV (MSAD)'
result = BETWEEN_COMMA_PAR.search(test_str)
if result is not None:
    print result.group(1)
This will print as a result: DC-VA-MD-WV, as expected.
It seems that you are having trouble finding the right regex for the expected values.
I have put together a small sample that should satisfy your requirement.
Basically, when you check the content of every value of the first column, you could use a regex like /(TX|ND-MN|DC-VA-MD-WV)/.
I hope this was useful! Let me know if you need further explanations.
I have two CSV files. data.csv and data2.csv.
I would like to first strip the two data files down to the data I am interested in. I have figured this part out with data.csv. I would then like to compare by row, making sure that if a row is missing it gets added.
Next I want to look at column 2. If there is a value there then I want to write to column 3; if there is data in column 3 then write to column 4, etc.
My current program looks like so. I need some guidance.
Oh, and I am using Python v3.4.
#!/usr/bin/python
__author__ = 'krisarmstrong'
import csv
searched = ['aircheck', 'linkrunner at', 'onetouch at']
def find_group(row):
    """Return the group index of a row:
    0 if the row contains searched[0],
    1 if the row contains searched[1],
    etc.
    -1 if not found.
    """
    for col in row:
        col = col.lower()
        for j, s in enumerate(searched):
            if s in col:
                return j
    return -1
inFile = open('data.csv')
reader = csv.reader(inFile)
inFile2 = open('data2.csv')
reader2 = csv.reader(inFile2)
outFile = open('data3.csv', "w")
writer = csv.writer(outFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
header = next(reader)
header2 = next(reader2)
"""Built a list of items to sort. If row 12 contains 'LinkRunner AT' (group 1),
one stores a triple (1, 12, row)
When the triples are sorted later, all rows in group 0 will come first, then
all rows in group 1, etc.
"""
stored = []
writer.writerow([header[0], header[3]])
for i, row in enumerate(reader):
    g = find_group(row)
    if g >= 0:
        stored.append((g, i, row))
stored.sort()
for g, i, row in stored:
    writer.writerow([row[0], row[3]])
inFile.close()
outFile.close()
Perhaps try:
import csv

col1, col2 = [], []
with open('some.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        col1.append(row[0])
        col2.append(row[1])

for i in range(len(col1)):
    if col1[i] == '':
        pass  # thing to do if there is nothing for col1
    if col2[i] == '':
        pass  # thing to do if there is nothing for col2
This is a start at "making sure that if a row is missing to add it".
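For the "if a row is missing, add it" part, here is a minimal sketch; it assumes whole rows are the comparison key, and merge_missing_rows and the sample rows are made up for illustration:

```python
def merge_missing_rows(rows1, rows2):
    """Return rows1 plus any row of rows2 not already present."""
    seen = {tuple(r) for r in rows1}
    merged = list(rows1)
    for r in rows2:
        if tuple(r) not in seen:
            merged.append(r)
            seen.add(tuple(r))
    return merged

rows1 = [['a', '1'], ['b', '2']]
rows2 = [['b', '2'], ['c', '3']]
print(merge_missing_rows(rows1, rows2))  # [['a', '1'], ['b', '2'], ['c', '3']]
```

The lists would come from reading data.csv and data2.csv with csv.reader before merging, and the merged result can be passed to csv.writer.writerows.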