This is an example of my code. It is not the whole code, just the part where I am having trouble. Does anyone understand why it prints like this rather than the full numbers, like 104.0 and 96.0? They are strings, but it will not let me convert them to floats because of the period in some of the values.
with open('file.csv', 'w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num, row in enumerate(reader):
            outrow = []
            for x in row['numbers']:
                print(x)
When I execute this, it prints out the values I am looking for but separately like this:
1
0
4
.
0
9
6
.
0
N
a
N
1
3
6
.
0
N
a
N
6
2
.
0
The 'NaN' values are ones I am changing, but the rest of the numbers I have to use. I cannot insert them into a list because they will end up separated, right?
Seems like you want something like:
with open('file.csv', 'w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num, row in enumerate(reader):
            number = row['numbers']
            print(number)
for x in row['numbers'] means, "iterate over every individual character in the numbers cell/value".
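For example, with one of the values from your output above:

value = '104.0'
for x in value:
    print(x)        # prints '1', '0', '4', '.', '0' on separate lines

cells = []
cells.append(value) # appending the whole string keeps it in one piece
print(cells)        # ['104.0']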
Also, what are you doing here?
file.write(','.join(row))
file.write('\n')
You don't have a row variable/object at that point (at least not visible in your example). Are you trying to write the header? Presumably it's working, so you defined row earlier, maybe like row = ['col1', 'numbers'].
If so, maybe take this general approach:
import csv

# Do your reading and processing in one step
rows = []
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        # do some work on row, like...
        number = row['numbers']
        if row['numbers'] == 'NaN':
            row['numbers'] = '-1'  # whatever you do with NaN
        rows.append(row)

# Do your writing in another step
my_field_names = rows[0].keys()
with open('output.csv', 'w', newline='') as f:
    # Use the provided writer, in addition to reader
    writer = csv.DictWriter(f, fieldnames=my_field_names)
    writer.writeheader()
    writer.writerows(rows)
At the very least, use the provided DictReader and DictWriter classes; they will make your life much easier.
I mocked up this sample CSV:
input.csv
id,numbers
id_1,100.4
id2,NaN
id3,23
and the program above produced this:
output.csv
id,numbers
id_1,100.4
id2,-1
id3,23
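Back to the float part of the question: once you pass the whole cell value to float() instead of looping over its characters, the period is not a problem. A minimal sketch, assuming you still want to special-case 'NaN':

value = row['numbers']       # e.g. '104.0' or 'NaN'
if value == 'NaN':
    number = None            # or whatever placeholder you prefer
else:
    number = float(value)    # '104.0' -> 104.0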
I have two csv files result.csv and sample.csv.
result.csv
M11251TH1230
M11543TH4292
M11435TDS144
sample.csv
M11435TDS144,STB#1,Router#1
M11543TH4292,STB#2,Router#1
M11509TD9937,STB#3,Router#1
M11543TH4258,STB#4,Router#1
I have a Python script which compares both files: if a line in result.csv matches the first field of a line in sample.csv, it appends 1, otherwise 0, to that line in sample.csv.
It should look like M11435TDS144,STB#1,Router#1,1 and M11543TH4258,STB#4,Router#1,0, since M11543TH4258 is not found in result.csv.
script.py
import csv

with open('result.csv', 'rb') as f:
    reader = csv.reader(f)
    result_list = []
    for row in reader:
        result_list.extend(row)

with open('sample.csv', 'rb') as f:
    reader = csv.reader(f)
    sample_list = []
    for row in reader:
        if row[0] in result_list:
            sample_list.append(row + [1])
        else:
            sample_list.append(row + [0])

with open('sample.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(sample_list)
Sample output (sample.csv) if I run the script two times:
M11435TDS144,STB#1,Router#1,1,1
M11543TH4292,STB#2,Router#1,1,1
M11509TD9937,STB#3,Router#1,0,0
M11543TH4258,STB#4,Router#1,0,0
Every time I run the script, 1's and 0's are appended as a new column in sample.csv. Is there any way to make each run replace the appended column instead of adding more columns?
You write to sample.csv and then use it as the input file, with the additional column already there. That's why you have more and more 1's and 0's in this file.
Regards, Grzegorz
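One way around it, sketched here under the assumption that every original row in sample.csv has exactly three columns: slice each row back to those columns before appending the flag, so a rerun replaces the old flag instead of adding another column.

import csv

with open('result.csv', 'rb') as f:
    result_list = [cell for row in csv.reader(f) for cell in row]

with open('sample.csv', 'rb') as f:
    sample_list = []
    for row in csv.reader(f):
        base = row[:3]  # keep only the original three columns, dropping any old flag
        sample_list.append(base + [1 if base[0] in result_list else 0])

with open('sample.csv', 'wb') as f:
    csv.writer(f).writerows(sample_list)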
I have a CSV file that is constantly being appended to. It has multiple headers, and the only thing the headers have in common is that the first column is always "NAME".
How do I split the single CSV file into separate CSV files, one for each header row?
here is a sample file:
"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3
If the size of the csv files is not huge -- so all can be in memory at once -- just use read() to read the file into a string and then use a regex on this string:
import re

with open(ur_csv) as f:
    data = f.read()

chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', data, re.S | re.M)
for i, chunk in enumerate(chunks, 1):
    with open('/path/{}.csv'.format(i), 'w') as fout:
        fout.write(chunk.group(1))
If the size of the file is a concern, you can use mmap to create something that looks like a big string but is not all in memory at the same time.
Then use the mmap string with a regex to separate the csv chunks like so:
import mmap
import re

with open(ur_csv) as f:
    mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', mf, re.S | re.M)
    for i, chunk in enumerate(chunks, 1):
        with open('/path/{}.csv'.format(i), 'w') as fout:
            fout.write(chunk.group(1))
In either case, this will write all the chunks in files named 1.csv, 2.csv etc.
Copy the input to a new output file each time you see a header line. Something like this (not checked for errors):
partNum = 1
outHandle = None
for line in open("yourfile.csv", "r").readlines():
    if line.startswith('"NAME"'):
        if outHandle is not None:
            outHandle.close()
        outHandle = open("part%d.csv" % (partNum,), "w")
        partNum += 1
    outHandle.write(line)
outHandle.close()
The above will break if the input does not begin with a header line or if the input is empty.
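A slightly more defensive variant (same hypothetical file names) skips any lines before the first header and only closes a handle that was actually opened:

partNum = 1
outHandle = None
for line in open("yourfile.csv", "r"):
    if line.startswith('"NAME"'):
        if outHandle is not None:
            outHandle.close()
        outHandle = open("part%d.csv" % (partNum,), "w")
        partNum += 1
    if outHandle is not None:   # ignore anything before the first header
        outHandle.write(line)
if outHandle is not None:
    outHandle.close()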
You can use the Python csv package to read your source file and write multiple csv files, based on the rule that if element 0 in your row == "NAME", spawn off a new file. Something like this...
import csv

outfile_name = "out_%d.csv"
out_num = 1
with open('nameslist.csv', 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    csv_buffer = []
    for row in csvreader:
        if row[0] != "NAME":
            csv_buffer.append(row)
        else:
            if csv_buffer:  # nothing is buffered yet at the very first header
                with open(outfile_name % out_num, 'wb') as csvout:
                    csv.writer(csvout).writerows(csv_buffer)
                out_num += 1
            csv_buffer = [row]
P.S. I haven't actually tested this but that's the general concept
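One gap worth noting: the rows after the last "NAME" header are never written, because a file is only emitted when the next header shows up. A small, equally untested addition after the with block (same assumptions as above) flushes the final buffer:

# after the with block: write out whatever is still buffered from the last section
if csv_buffer:
    with open(outfile_name % out_num, 'wb') as csvout:
        csv.writer(csvout).writerows(csv_buffer)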
Given the other answers, the only modification that I would suggest would be to open using csv.DictReader. Pseudo code would be like this, assuming that the first line in the file is the first header.
Note that this assumes that there is no blank line or other indicator between the entries, so that a 'NAME' header occurs right after data. If there were a blank line between appended files, then you could use that as an indicator to use infile.fieldnames() on the next row. If you need to handle the inputs as a list, then the previous answers are better.
import csv

ifile = open(filename, 'rb')
infile = csv.DictReader(ifile)
infields = infile.fieldnames
filenum = 1
ofile = open('outfile' + str(filenum), 'wb')
outfields = infields  # This allows you to change the header field
outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
outfile.writerow(dict((fn, fn) for fn in outfields))
for row in infile:
    if row['NAME'] != 'NAME':
        # process this row here and do whatever is needed
        pass
    else:
        ofile.close()
        # build infields again from this row
        infields = [row["NAME"], ...]  # This assumes you know the names & order
        # A dict cannot be pulled as a list and keep the order that you want.
        filenum += 1
        ofile = open('outfile' + str(filenum), 'wb')
        outfields = infields  # This allows you to change the header field
        outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
        outfile.writerow(dict((fn, fn) for fn in outfields))
# This is the end of the loop. All data has been read and processed
ofile.close()
ifile.close()
If the exact order of the new header does not matter except for the name in the first entry, then you can transfer the new list as follows:
infields = [row['NAME']]
for k in row.keys():
    if k != 'NAME':
        infields.append(row[k])
This will create the new header with NAME in entry 0 but the others will not be in any particular order.
I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second column passwords. For example, I want to get the password of the email in row 38. So I need only the item from the 2nd column, row 38...
Say I have a csv file:
aaaaa#aaa.com,bbbbb
ccccc#ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...
import csv

mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments to the SO question here, a better, more robust version would be:
import csv

with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
        ............
Update: If what the OP actually wants is the last string in the last row of the csv file, there are several approaches that do not necessarily need csv. For example,
fulltxt = open(myfilepath, 'rb').read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because you load the complete text in memory but could be ok for small files. Note that laststring could include a newline character so strip it before use.
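For example, stripping it in the same step:

laststring = fulltxt.split(',')[-1].strip()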
And finally if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F. Sebastian (the credit is his):
import csv

line_number = 2
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
    ............
#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.

USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys

filename = sys.argv[1]
row_number, column_number = [int(arg, 10) - 1 for arg in sys.argv[2:]]

with open(filename, 'rb') as f:
    rows = list(csv.reader(f))
    print rows[row_number][column_number]
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file in memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, 'rb') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number + 1))
    print row[column_number]
import csv

def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1

print(read_cell(4, 8))
This example prints the cell in column 4, row 8 (both zero-indexed) in Python 3.
There is an interesting point you need to catch about the csv.reader() object: it is not a list and is not subscriptable.
This works:
for r in csv.reader(file_obj):  # file not closed
    print r
This does not:
r = csv.reader(file_obj)
print r[0]
So, you first have to convert to list type in order to make the above code work.
r = list( csv.reader(file_obj) )
print r[0]
Finally I got it!!!
import csv

def select_index(index):
    csv_file = open('oscar_age_female.csv', 'r')
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        l = line['Index']
        if l == index:
            print(line[' "Name"'])

select_index('11')
"Bette Davis"
The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
#where row_number is 38 maybe
import csv

inf = csv.reader(open('yourfile.csv', 'r'))
for row in inf:
    print row[1]
I have 2 csv files:
output.csv
output1.csv
output.csv has 5 columns of titles.
output1.csv has about 40 columns of different types of data.
I need to append all the content of output1.csv to output.csv. How can I do this?
Could somebody please give me a hint on how to go about it?
I have the following code:
import csv

reader = csv.DictReader(open("test.csv", "r"))
allrows = list(reader)
keepcols = [c for c in allrows[0] if all(r[c] != '0' for r in allrows)]
print keepcols
writer = csv.DictWriter(open("output.csv", "w"), fieldnames=keepcols, extrasaction='ignore')
writer.writerows(allrows)
with open("test1.csv","r") as f:
fields=next(f).split()
# print(fields)
allrows=[]
for line in f:
line=line.split()
row=dict(zip(fields,line))
allrows.append(row)
# print(row)
keepcols = [c for c in fields if any(row[c] != '0' for row in allrows)]
print keepcols
writer=csv.DictWriter(open("output1.csv","w"),fieldnames=keepcols,extrasaction='ignore')
writer.writerows(allrows)
test.csv generates output.csv and test1.csv generates output1.csv.
I'm trying to see if I can make both files generate my output in the same file.
If I understand your question correctly, you want to create a csv with 45 columns - the 5 from output.csv followed by the 40 from output1.csv.
I assume they have the same number of rows (if not - what is the necessary behavior?)
Try using the csv module:
import csv

reader = csv.reader(open('output.csv', 'rb'))
reader1 = csv.reader(open('output1.csv', 'rb'))
writer = csv.writer(open('appended_output.csv', 'wb'))
for row in reader:
    row1 = reader1.next()
    writer.writerow(row + row1)
If your csv files are formatted with special delimiters or quoting characters, you can use the optional keyword arguments for the csv.reader and csv.writer objects.
See Python's csv module documentation for details...
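For example, a hypothetical file that uses semicolons as delimiters and single quotes for quoting could be read with:

import csv

with open('output.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=';', quotechar="'")
    for row in reader:
        print row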
EDIT: Added 'b' flag, as suggested.
This recent discussion looks very similar to what you are looking for except that the OP there wanted to concatenate mp3 files.
EDIT:
import os, sys
target = '/path/to/target'
src1 = '/path/to/source1.csv'
src2 = '/path/to/source2.csv'
tf = open(target, 'a')
tf.write(open(src1).read())
tf.write(open(src2).read())
tf.close()
Try this; it should work, since you simply want to do the equivalent of the shell command cat src1 src2 > target.
"I need to append all the content of output1.csv to output.csv." ... taken literally that would mean write each row in the first file followed by each row in the second file. Is that what you want??
Titles of what? The 40 columns in the other file? If so, then assuming that you want the titles written as a row of column headings:
import csv

titles = [x[0] for x in csv.reader(open('titles.csv', 'rb'))]
writer = csv.writer(open('merged.csv', 'wb'))
writer.writerow(titles)
for row in csv.reader(open('data.csv', 'rb')):
    writer.writerow(row)
You could also use a generator from the reader if you want to pass a condition:
import csv

def read_generator(filepath: str, condition):
    with open(filepath, 'r', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            if row[0] == condition:
                yield row
and then write from that with:
writer = csv.writer(open("process.csv", "rb"))
write.writerow(read_generator(file_to_read.csv))