Python find matching string in each line


I would like to read each row of the csv file and match each word in the row with a list of strings. If any of the strings appears in the row, then write that string at the end of the line separated by comma.
The code below doesn't give me what I want.
file = 'test.csv'
read_files = open(file)
lines=read_files.read()
text_lines = lines.split("\n")
name=''
with open('testnew2.csv','a') as f:
    for line in text_lines:
        line=str(line)
        #words = line.split()
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                if row[0] in line:
                    name=row
                    print(name)
                    f.write(line+","+name[0]+'\n')
A sample of test.csv would look like this:
A,B,C,D
ABCD,,,
Total,Robert,,
Name,Annie,,
Total,Robert,,
And the names.csv would look:
Robert
Annie
Amanda
The output I want is:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
Currently the code will get rid of lines that don't result in a match, so I got:
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert

Process each line by testing row[1] and appending the 5th column, then writing it. The name list isn't really a csv. If it's really long use a set for lookup. Read it only once for efficiency as well.
import csv
with open('names.txt') as f:
    names = set(f.read().strip().splitlines())
# newline='' per Python 3 csv documentation...
with open('input.csv',newline='') as inf:
    with open('output.csv','w',newline='') as outf:
        r = csv.reader(inf)
        w = csv.writer(outf)
        for row in r:
            row.append(row[1] if row[1] in names else '')
            w.writerow(row)
Output:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
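The answer above only tests the second column. If a name may appear in any column, as the question's wording suggests, a minimal sketch could scan every field instead. Note that `append_first_match` is a hypothetical helper name, and the sample rows and names are copied from the question and held in memory rather than read from test.csv and names.csv.

```python
import csv
import io

def append_first_match(row, names):
    """Return a copy of row with the first field found in names appended ('' if none)."""
    match = next((field for field in row if field in names), '')
    return row + [match]

# Sample data from the question, kept in memory for the demonstration.
names = {"Robert", "Annie", "Amanda"}
source = io.StringIO("A,B,C,D\nABCD,,,\nTotal,Robert,,\nName,Annie,,\n")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(source):
    # Every row is written, whether or not a name was found.
    writer.writerow(append_first_match(row, names))

result = out.getvalue()
print(result)
```

Because the row is always written, unmatched lines keep their place in the output and simply get an empty last column.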

I think the problem is you're only writing when the name is in the row. To fix that move the writing call outside of the if conditional:
if row[0] in line:
    name=row
    print(name)
f.write(line+","+name[0]+'\n')
I'm guessing that print statement is for testing purposes?
EDIT: On second thought, you may need to move name='' inside the loop as well so it is reset after each row is written, that way you don't get names from matched rows bleeding into unmatched rows.
EDIT: Decided to show an implementation that should avoid the (possible) problem of two matched names in a row:
EDIT: Changed name=row and the call of name[0] in f.write() to name=row[0] and a call of name in f.write()
import csv

file = 'test.csv'
read_files = open(file)
lines=read_files.read()
read_files.close()
text_lines = lines.split("\n")
with open('testnew2.csv','a') as f:
    for line in text_lines:
        name=''  # reset for every line so an earlier match cannot leak
        line=str(line)
        #words = line.split()
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                # stop at the first match; a while loop around this for
                # loop would never end for lines without any match
                if row and row[0] in line:
                    name=row[0]
                    print(name)
                    break
        f.write(line+","+name+'\n')

Try this as well:
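import csv

# read the names once and strip the trailing newlines
with open('names.csv') as names_file:
    reader2 = [record.strip() for record in names_file]

# newline='' and text mode, as the Python 3 csv module expects
with open('test.csv', newline='') as File1, open('testnew2.csv', 'w', newline='') as myFile:
    reader1 = csv.reader(File1)
    writer = csv.writer(myFile)
    for row in reader1:
        name = ""
        for record in reader2:
            if record in row:
                name = record
                break
        # write every row, matched or not
        row.append(name)
        writer.writerow(row)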

Related

Delete rows from csv file using function in Python

def usunPsa(self, ImiePsa):
    with open('schronisko.csv', 'rb') as input, open('schronisko.csv', 'wb') as output:
        writer = csv.writer(output)
        for row in csv.reader(input):
            if row[0] == ImiePsa:
                writer.writerow(row)
    with open(self.plik, 'r') as f:
        print(f.read())
Dsac;Chart;2;2020-11-04
Dsac;Chart;3;2020-11-04
Dsac;Chart;4;2020-11-04
Lala;Chart;4;2020-11-04
Sda;Chart;4;2020-11-04
Sda;X;4;2020-11-04
Sda;Y;4;2020-11-04
pawel;Y;4;2020-11-04
If I call usunPsa("pawel"), every line gets removed.
The code above erases my whole CSV file instead of only the one line with the given ImiePsa.
What may be the problem there?

I found the problem. row[0] in your code returns the entire row, which means the lines are not parsed correctly. After a bit of reading, I found that csv.reader has a parameter called delimiter to specify the delimiter between columns.
Adding that parameter solves this problem, though not every problem.
The code that worked for me (just in case you still want to use your original code):
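import csv

def usunPsa(ImiePsa):
    # text mode with newline='' is what the Python 3 csv module expects
    with open('asd.csv', newline='') as input, open('schronisko.csv', 'w', newline='') as output:
        writer = csv.writer(output, delimiter=';')
        for row in csv.reader(input, delimiter=';'):
            # keep every row except the ones to delete
            if row[0] != ImiePsa:
                writer.writerow(row)

usunPsa("pawel")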
Notice that the input and output files now have different names. If you want to keep the filename the same, however, you have to use Hamza Malik's answer.

Just read the csv file in memory as a list, then edit that list, and then write it back to the csv file.
import csv

lines = list()
members = input("Please enter a member's name to be deleted.")
with open('mycsv.csv', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        # keep the row unless one of its fields equals the name
        if members not in row:
            lines.append(row)
with open('mycsv.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
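If the file is too large to hold in memory, one alternative is to stream the rows into a temporary file and then swap it in place of the original. The sketch below is only an illustration under some assumptions: `delete_rows` is a hypothetical name, the semicolon delimiter comes from the sample data in the question, and the temporary file is created in the same directory so that `os.replace` stays on one filesystem.

```python
import csv
import os
import tempfile

def delete_rows(path, name, delimiter=";"):
    """Rewrite the CSV at path without the rows whose first field equals name."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the surviving rows to a temporary file next to the original.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as tmp:
        writer = csv.writer(tmp, delimiter=delimiter)
        for row in csv.reader(src, delimiter=delimiter):
            if row and row[0] == name:
                continue  # skip the rows to delete
            writer.writerow(row)
    # Both files are closed here, so the swap is safe.
    os.replace(tmp.name, path)
```

The original file is never opened for writing, so it is not truncated before it has been read, which is what erased the data in the question.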

How to find the last item in a for loop when reading from a file - Python

I want to read from the pList.csv file and write all items into a string, such that the rows are separated by commas.
The file has only one column. For example, pList.csv is:
28469977
24446384
25968054
and output string must be:
28469977,24446384,25968054
To do this, I have considered the following code, but there is a small problem:
p_list = ""
with open("pList.csv", mode="r") as infile:
reader = csv.reader(infile)
    for row in reader:
        p_list += row[0]
        if its_not_last_loop :
            p_list += ","
What expression is appropriate for its_not_last_loop so that , is not applied to the last row of the file?

Try this:
import csv

with open("pList.csv", mode="r") as infile:
    reader = csv.reader(infile)
    out_list = []
    for row in reader:
        out_list.append(row[0]) #row[0] get value for the sample input
    p_list = ",".join(out_list)
print(p_list)
See What exactly does the .join() method do?

This can be shortened to (and is faster)
with open("pList.csv", mode="r") as infile:
    reader = csv.reader(infile)
    p_list = ",".join(row[0] for row in reader)
print(p_list)
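The reason no "last item" test is needed is that `str.join` only places the separator between items. A small in-memory sketch of that idea follows; `join_first_column` is a hypothetical helper name, the sample uses the values from the question, and skipping blank lines is an added assumption, since `row[0]` would otherwise raise IndexError on them.

```python
import csv
import io

def join_first_column(lines, sep=","):
    """Join the first field of every non-empty CSV row with sep."""
    reader = csv.reader(lines)
    # csv.reader returns an empty list for a blank line, so filter those out.
    return sep.join(row[0] for row in reader if row)

# Values from the question, with one blank line added to show the filter.
sample = io.StringIO("28469977\n24446384\n\n25968054\n")
p_list = join_first_column(sample)
print(p_list)
```

Any iterable of lines works as input, so an open file object can be passed in place of the `io.StringIO` sample.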

CSV file input and sorted output is formatted incorrectly

I am just learning Python and am working on a program that takes in information from a CSV file, sorts it, then outputs it into a new CSV file. My problem is that in the output, the data ends up in one row instead of the original 5 rows. I am not sure why. I am attempting to open the new sorted file in LibreOffice Calc and the formatting is off. Does this have something to do with the delimiters? Thanks
import csv
import operator
name = raw_input()
myfile = name
o = open(myfile, 'rU')
mydata = csv.reader(o)
sortedlist = sorted(mydata, key=operator.itemgetter(1), reverse=True)
for row in sortedlist:
    print(row[0], row[1], row[2], row[3], row[4])
o.close()
print('Enter the name for the output file')
ofile = raw_input()
with open(ofile, 'wb') as csvfile:
    sortwriter = csv.writer(csvfile)
    sortwriter.writerow(sortedlist)

writerow() writes a single row at a time. Since you want to write them all at once, you need to call its plural cousin, writerows().
I have taken the liberty of cleaning up your code:
import csv
from operator import itemgetter
fin_name = raw_input('Enter the name for the input file')
fout_name = raw_input('Enter the name for the output file')
with open(fin_name, 'rb') as fin: # switched filemode from 'rU'. change if really needed
    with open(fout_name, 'wb') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        rows = sorted(reader, key=itemgetter(1), reverse=True)
        writer.writerows(rows) # changed writerow to writerows
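The code above targets Python 2 (`raw_input`, binary file modes). For Python 3, a rough equivalent might look like the sketch below. `sort_csv` and its `has_header` flag are hypothetical additions, the demonstration data is made up, and the sort compares strings, just as in the original code.

```python
import csv
import io
from operator import itemgetter

def sort_csv(src, dst, column=1, has_header=False):
    """Copy CSV rows from src to dst, sorted descending by the given column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    if has_header:
        # Keep the header on top instead of sorting it with the data.
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
    # writerows() emits one output line per row; writerow() would emit one line total.
    writer.writerows(sorted(reader, key=itemgetter(column), reverse=True))

# In-memory demonstration with made-up rows.
src = io.StringIO("a,3,x\nb,1,y\nc,2,z\n")
dst = io.StringIO()
sort_csv(src, dst)
sorted_text = dst.getvalue()
print(sorted_text)
```

With real files, open both with `newline=''` in text mode, for example `open(fin_name, newline='')` and `open(fout_name, 'w', newline='')`.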

Python read a file replace a string in a word

I am trying to read a file with below data
Et1, Arista2, Ethernet1
Et2, Arista2, Ethernet2
Ma1, Arista2, Management1
I need to read the file and replace Et with Ethernet and Ma with Management. The digit at the end of each should stay the same. The actual output should be as follows:
Ethernet1, Arista2, Ethernet1
Ethernet2, Arista2, Ethernet2
Management1, Arista2, Management1
I tried code with regular expressions. I am able to get to the point where I can parse all of Et1, Et2 and Ma1, but I am unable to replace them.
import re
with open('test.txt','r') as fin:
    for line in fin:
        data = re.findall(r'\A[A-Z][a-z]\Z\d[0-9]*', line)
        print(data)
The output looks like this..
['Et1']
['Et2']
['Ma1']

import re
#to avoid compile in each iteration
re_et = re.compile(r'^Et(\d+),')
re_ma = re.compile(r'^Ma(\d+),')
with open('test.txt') as fin:
    for line in fin:
        # raw strings keep \g<1> from being read as an invalid escape
        data = re_et.sub(r'Ethernet\g<1>,', line.strip())
        data = re_ma.sub(r'Management\g<1>,', data)
        print(data)

This example follows Joseph Farah's suggestion
import csv

file_name = 'data.csv'
output_file_name = "corrected_data.csv"
data = []
# text mode with newline='' is what the Python 3 csv module expects
with open(file_name, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        data.append(row)
corrected_data = []
for row in data:
    tmp_row = []
    for col in row:
        # skip fields that already hold the full word
        if 'Et' in col and "Ethernet" not in col:
            col = col.replace("Et", "Ethernet")
        elif 'Ma' in col and "Management" not in col:
            col = col.replace("Ma", "Management")
        tmp_row.append(col)
    corrected_data.append(tmp_row)
with open(output_file_name, "w", newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in corrected_data:
        writer.writerow(row)
print(data)

Here are the steps you should take:
Read each line in the file
Separate each line into smaller list items using the commas as delimiters
Use str.replace() to replace the characters with the words you want; keep in mind that anything that says "Et" (including the beginning of the word "ethernet") will be replaced, so remember to account for that. Same goes for Ma and Management.
Roll it back into one big list and put it back in the file with file.write(). You may have to overwrite the original file.
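The steps above can be sketched for a single line as follows. This is only an illustration: `expand_line`, `EXPANSIONS` and `ABBREV` are hypothetical names, and instead of a plain `str.replace()` it uses an anchored regular expression so that a field that already reads "Ethernet1" is left alone. Writing the result back to the file is omitted.

```python
import re

# Hypothetical mapping of abbreviations to full names, based on the sample data.
EXPANSIONS = {"Et": "Ethernet", "Ma": "Management"}
# Match a whole field that is an abbreviation followed only by digits.
ABBREV = re.compile(r"^(Et|Ma)(\d+)$")

def expand_line(line):
    """Expand Et<n> / Ma<n> fields in one comma-separated line."""
    fields = [field.strip() for field in line.split(",")]
    expanded = [
        ABBREV.sub(lambda m: EXPANSIONS[m.group(1)] + m.group(2), field)
        for field in fields
    ]
    return ", ".join(expanded)

for sample in ("Et1, Arista2, Ethernet1", "Ma1, Arista2, Management1"):
    print(expand_line(sample))
```

Anchoring the pattern to the whole field is one way to handle the pitfall mentioned in the steps, since "Ethernet1" does not consist of "Et" followed only by digits.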

Error while reading csv file using python

I am trying to read a specific comma-separated value from a CSV file, but I am getting the full row value. How can I get the specific value?
My csv looks like this
Index,Time,Energy
1,1.0,45.034
I need to get the value of Energy in each row.

import csv
with open('somefile.csv') as f:
    reader = csv.DictReader(f, delimiter=',')
    rows = list(reader)
for row in rows:
    print(row['Energy'])

f = open('file.txt')
f.readline() # To skip header
for line in f:
    # strip the trailing newline before splitting
    print(line.strip().split(',')[2])
f.close()

If you want it working even if the position of column Energy changes, you can do this:
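with open('your_file.csv') as f:
    # Get header as a list of column names; calling index() on the raw
    # string would return a character offset, not a column number
    header = f.readline().strip().split(',')
    energy_index = header.index('Energy')
    # Get Energy value
    for line in f:
        energy = line.strip().split(',')[energy_index]
        # do whatever you want to do with Energy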

Check the below code. Hoping this is what you are looking for.
import csv

file_name = 'somefile.csv'  # assumed path; the original left file_name undefined
try:
    fobj = open(file_name, 'r', newline='')
except OSError:
    # the file could not be opened, so there is nothing to close
    fobj = None
if fobj:
    with fobj:
        file_content = csv.reader(fobj, delimiter=',', quotechar='|')
        for row_data in file_content:
            try:
                # This will return the 3rd column's value, i.e. 'Energy'
                print(row_data[2].strip())
            except IndexError:
                # skip rows with fewer than three columns
                pass
