This is an example of my code. It is not the whole code, just the part where I am having trouble. Does anyone understand why it prints like this rather than the full numbers, like 104.0 and 96.0? They are strings, but it will not let me convert them to floats because of the period in some of the digits.
with open('file.csv', 'w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num, row in enumerate(reader):
            outrow = []
            for x in row['numbers']:
                print(x)
When I execute this, it prints out the values I am looking for but separately like this:
1
0
4
.
0
9
6
.
0
N
a
N
1
3
6
.
0
N
a
N
6
2
.
0
The 'NaN' entries are values I am changing, but the rest of the numbers I have to use. I cannot insert them into a list because they will end up separated, right?
Seems like you want something like:
with open('file.csv', 'w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num, row in enumerate(reader):
            number = row['numbers']
            print(number)
for x in row['numbers'] means, "Iterate over every individual character in the numbers cell/value".
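A quick illustration of the difference, using a made-up value for the cell:

value = '104.0'          # hypothetical content of row['numbers']
for x in value:
    print(x)             # prints 1, 0, 4, ., 0 on separate lines

print(value)             # prints 104.0
print(float(value))      # the period is not a problem for float(): 104.0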
Also, what are you doing here?
file.write(','.join(row))
file.write('\n')
You don't have a row variable/object at that point (at least not one visible in your example). Are you trying to write the header? Presumably it's working, so you defined row earlier, maybe like row = ['col1', 'numbers'].
If so, maybe take this general approach:
import csv

# Do your reading and processing in one step
rows = []
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        # do some work on row, like...
        number = row['numbers']
        if row['numbers'] == 'NaN':
            row['numbers'] = '-1'  # whatever you do with NaN
        rows.append(row)

# Do your writing in another step
my_field_names = rows[0].keys()
with open('output.csv', 'w', newline='') as f:
    # Use the provided writer, in addition to reader
    writer = csv.DictWriter(f, fieldnames=my_field_names)
    writer.writeheader()
    writer.writerows(rows)
At the very least, use the provided writer and DictWriter classes; they will make your life much easier.
I mocked up this sample CSV:
input.csv
id,numbers
id_1,100.4
id2,NaN
id3,23
and the program above produced this:
output.csv
id,numbers
id_1,100.4
id2,-1
id3,23
Related
I have raw data.
I want to split it into csv/excel.
After that, if the data in a row is not stored correctly (for example, if 0 was entered instead of 121324), I want Python to identify those rows.
I mean that while splitting the raw data into csv through Python code, some rows might be formed incorrectly.
How can I identify those rows with Python?
example:
S.11* N. ENGLAND L -8' 21-23 u44'\n
S.18 TAMPA BAY W -7 40-7 u49'\n
S.25 Buffalo L -4' 18-33 o48
result i want:
S,11,*,N.,ENGLAND,L,-8',21-23,u44'\n
S,18,,TAMPA,BAY,W,-7,40-7,u49'\n
S,25,,Buffalo,L,-4',18-33,o48\n
suppose the output is like this:
S,11,N.,ENGLAND,L,-8',21-23u,44'\n
S,18,,TAMPA,BAY,W,-7,40-7,u49'\n
S,25,,Buffalo,L,-4',18-33,o48\n
You can see that in the first row the * is missing and u44' is stored as only 44', with the u appended to another column.
This row should be identified by the Python code and returned to me.
Likewise, I want all the rows with such errors.
This is what I have done so far:
import csv

input_filename = 'rawsample.txt'
output_filename = 'spreads.csv'

with open(input_filename, 'r', newline='') as infile, \
     open(output_filename, 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        new_cols = row[0].split('.')
        if not new_cols[1].endswith('*'):
            new_cols.extend([''])
        else:
            new_cols[1] = new_cols[1][:-1]
            new_cols.extend(['*'])
        row = new_cols + row[1:]
        # print(row)
        writer.writerow(row)

# df is a pandas DataFrame holding the split data (loaded elsewhere, not shown)
er = []
for index, row in df.iterrows():
    for i in row:
        if str(i).lower() == 'nan' or i == '':
            er.append(row)
# I was able to check for null values but nothing more.
please help me.
@mozway is right, you'd better give an example input and the expected result.
Anyway, if you're dealing with a variable number of columns in the input, please refer to Handling Variable Number of Columns with Pandas - Python
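If you do end up in pandas, a minimal sketch of flagging suspect rows might look like this (assuming the split data has already been written to spreads.csv; adjust the filename and the checks to your data):

import pandas as pd

# Read the already-split csv; header=None because the raw sample has no header row.
df = pd.read_csv('spreads.csv', header=None)

# Treat rows containing any NaN or empty-string cell as suspect.
bad_mask = df.isna().any(axis=1) | (df == '').any(axis=1)
bad_rows = df[bad_mask]
print(bad_rows)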
Best
When using different values in the range loop, I am getting different output in the file. I think it should not be like that. This is the initial content of data.csv:
a1,b1,c1
a2,b2,c2
a3,b3,c3
a4,b4,c4
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
This is the script (run on initial content):
import csv

csv_f = open("data.csv", "r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(1, 4):
    csv_w.writerow(["e"+str(x)] + ["f"+str(x)] + ["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e1,f1,g1
e2,f2,g2
e3,f3,g3
a4,b4,c4
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
Result as expected.
This is the modified script (range values only) (run on initial content):
import csv

csv_f = open("data.csv", "r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(12, 15):
    csv_w.writerow(["e"+str(x)] + ["f"+str(x)] + ["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e12,f12,g12
e13,f13,g13
e14,f14,g14
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
The output is not as expected; a row is lost.
If you play with the range values, the output does things you would not expect.
It definitely shouldn't do that. What is the reason? I know there are better ways to open CSV files, but I would like to know what the problem is here. Another example below, also run on the initial content.
import csv

csv_f = open("data.csv", "r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(13, 17):
    csv_w.writerow(["e"+str(x)] + ["f"+str(x)] + ["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e13,f13,g13
e14,f14,g14
e15,f15,g15
e16,f16,g16
,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
Again the output is not as expected.
You open the csv as a read/write buffer with open("data.csv", "r+", newline='').
This buffer is just a stream of characters. You can imagine it as one string: a1,b1,c1\r\na2,b2,c2\r\na3,b3,c3\r\na4,b4,c4
Now you start writing lines at the start of that buffer, e.g. first e12,f12,g12\r\n.
So the e overwrites the a at the first position, the 1 overwrites the 1, the 2 overwrites the comma, and so on.
Now you can see where this leads. Because double-digit numbers take more characters than were there before, the write behaviour is not what you expect.
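A minimal demonstration of that overwriting behaviour, using a throwaway file (the exact split point depends on the platform's line endings, but the effect is the same):

# Create a small throwaway file with two rows.
with open('demo.txt', 'w') as f:
    f.write('a1,b1,c1\na2,b2,c2\n')

# 'r+' opens for reading and writing with the file position at the start,
# so writing overwrites the existing characters in place.
with open('demo.txt', 'r+') as f:
    f.write('e12,f12,g12\n')

with open('demo.txt') as f:
    print(f.read())
# e12,f12,g12
# b2,c2          <- the 'a2,' at the start of the second row was overwritten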
Alternative solution:
So, to achieve the behavior that I think you wanted, you could first read in the old lines:
import csv

rows = []
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        rows.append(row)
Then you could replace the rows that you want:
In your case you wanted to replace the first four rows:
offset = 12
for x in range(0, 4):
    rows[x] = ["e"+str(x+offset)] + ["f"+str(x+offset)] + ["g"+str(x+offset)]
Then you can just adjust the offset for the actual content indices you want (12-15 or whatever).
In the end you just write the rows back:
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for x in range(0, len(rows)):
        writer.writerow(rows[x])
This leads to the following output:
e12,f12,g12
e13,f13,g13
e14,f14,g14
e15,f15,g15
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
The content of the csv is as follows:
"Washington-Arlington-Al, DC-VA-MD-WV (MSAD)" 47894 1976
"Grand-Forks, ND-MN" 24220 2006
"Abilene, TX" 10180 1977
The required output: read through the csv, find the content between the quotes ("")
in column 1, fetch only DC-VA-MD-WV, ND-MN, TX, and
put this content in a new column (for normalization).
So far I have tried a lot of regex patterns in Python, but could not get the right one.
sample=""" "Washington-Arlington-Al, DC-VA-MD-WV (MSAD)",47894,1976
"Grand-Forks, ND-MN",24220,2006
"Abilene, TX",10180,1977 """
open('sample.csv','w').write(sample)
with open('sample.csv') as sample, open('output.csv','w') as output:
reader = csv.reader(sample)
writer = csv.writer(output)
for comsplit in row[0].split(','):
writer.writerow([ comsplit, row[1]])
print open('output.csv').read()
The expected output is:
DC-VA-MD-WV
ND-MN
TX
in a new row
There is no need to use regex here, provided a couple of things hold:
The city (?) always has a comma after it, followed by one space of whitespace (though I could add a modification to accept more than one bit of whitespace if needed).
There is a space after your letter sequence, before encountering something like (MSAD).
This code gives your expected output against the sample input:
import csv

with open('sample.csv', 'r') as infile, open('expected_output.csv', 'wb') as outfile:
    reader = csv.reader(infile)
    expected_output = []
    for row in reader:
        split_by_comma = row[0].split(',')[1]
        split_by_space = split_by_comma.split(' ')[1]
        print split_by_space
        expected_output.append([split_by_space])
    writer = csv.writer(outfile)
    writer.writerows(expected_output)
I'd do it like this:
import csv

with open('csv_file.csv', 'r') as f_in, open('output.csv', 'w') as f_out:
    csv_reader = csv.reader(f_in, quotechar='"', delimiter=',',
                            quoting=csv.QUOTE_ALL, skipinitialspace=True)
    csv_writer = csv.writer(f_out)
    new_csv_list = []
    for row in csv_reader:
        first_entry = row[0].strip('"')
        # strip() drops the leading space left over from splitting on ','
        relevant_info = first_entry.split(',')[1].strip().split(' ')[0]
        row += [relevant_info]
        new_csv_list += [row]
    for row in new_csv_list:
        csv_writer.writerow(row)
Let me know if you have any questions.
I believe you could use this regex pattern, which will extract any alphanumeric expression (with hyphen or not) between a comma and a parenthesis:
import re

BETWEEN_COMMA_PAR = re.compile(ur',\s+([\w-]+)\s+\(')

test_str = 'Washington-Arlington-Al, DC-VA-MD-WV (MSAD)'
result = BETWEEN_COMMA_PAR.search(test_str)
if result is not None:
    print result.group(1)
This will print as a result: DC-VA-MD-WV, as expected.
It seems that you are having trouble finding the right regex for the expected values.
I have created a small sample on pythex which should satisfy your requirement.
Basically, when you check the content of every value of the first column, you could use a regex like /(TX|ND-MN|DC-VA-MD-WV)/.
I hope this was useful! Let me know if you need further explanations.
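If you go that route, a minimal sketch of applying such an alternation while reading the csv could look like this (the pattern only lists the three values from the sample, so it would need extending for real data):

import csv
import re

# Alternation covering only the values that appear in the sample data.
pattern = re.compile(r'(TX|ND-MN|DC-VA-MD-WV)')

with open('sample.csv') as sample:
    for row in csv.reader(sample):
        match = pattern.search(row[0])
        if match:
            print(match.group(1))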
I have a csv file created with 6 rows, 1 column (header and 5 numbers). I want to be able to do a conversion, say from centimeters to inches, and save it in a new csv with a new header.
So far I have only been able to import the csv, read it, and print it (using print row), but how can I do the conversion? Since the numbers are saved in the csv, would I have to convert them to float and then write them to a new csv? I only have 5 numbers because I want to figure out the correct code first, but I will use this for a lot of numbers.
I wasn't sure where the computation would be placed either. Help please! Also, this isn't homework or the like. I'm just doing this for fun.
This is the code I currently have:
import csv

with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader, None)  # I did this to skip the header I labelled Centimeters
    with open('test1.csv', 'wb') as o:
        writer = csv.writer(o)
        for row in reader:
            pass  # this is where I am stuck
f.close()
o.close()
I guess I don't know how to convert the numbers in the rows to float and then output the values. I just want to be able to multiply the number in each row by 0.393701 so that in the new csv the header is labelled inches, with the output beneath it in the rows.
This should work, assuming a single column (for multiple columns the handling would differ somewhat to output all the values, but the general concept would be the same):
import csv

with open('test.csv', 'rb') as f, open('test1.csv', 'wb') as o:
    reader = csv.reader(f)
    writer = csv.writer(o)
    # skip the header
    next(reader, None)
    # write the new header
    writer.writerow(['inches'])
    for row in reader:
        newVal = float(row[0]) * 0.393701
        writer.writerow([newVal])
import csv

float_rows = []
with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader, None)  # I did this to skip the header I labelled Centimeters
    for row in reader:
        # do the calculation and map the elements to float
        comp = [x * 0.393701 for x in map(float, row)]
        float_rows.append(comp)

with open('test1.csv', 'wb') as o:
    writer = csv.writer(o)
    writer.writerows(float_rows)  # write all computed data to the new csv
No close() is needed; with closes the files automatically.
Using map(float, iterable) is the same as [float(x) for x in my_iterable].
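For example (in Python 3, map returns a lazy iterator, so wrap it in list() to see the values):

row = ['1.0', '2.5', '10']
print([float(x) for x in row])   # [1.0, 2.5, 10.0]
print(list(map(float, row)))     # [1.0, 2.5, 10.0]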
I am new to handling csv files with Python and I want to write code that does the following: I have a pattern such as:
pattern = "3-5;7;10-16" (which may vary)
and I want to delete (in that case) rows 3 to 5, 7 and 10 to 16.
Does anyone have an idea how to do that?
You cannot simply delete lines from a csv. Instead, you have to read it in and then write it back with the accepted values. The following code works:
import csv

pattern = "3-5;7;10-16"

off = []
for i in pattern.split(';'):
    if '-' in i:
        off += range(int(i.split('-')[0]), int(i.split('-')[1]) + 1)
    else:
        off += [int(i)]

with open('test.txt') as f:
    reader = csv.reader(f)
    reader = [','.join(item) for i, item in enumerate(reader) if i + 1 not in off]
print reader

with open('input.txt', 'w') as f2:
    for i in reader:
        f2.write(i + '\n')