I want to remove the "." character. When I use this
text = "Rmyname.lastname#mail.com"
text = (text.replace('.',' '))
head, sep, tail = text.partition('#')
print(head)
It works and this is the output: Rmyname lastname
But when I load an external file and read every line, it doesn't replace the "." character.
with open('found.txt', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        head = (row[0].replace('.', ' '))
        head, sep, tail = row[0].partition('#')
        print(head)
This is the output: Rmyname.lastname
How can I solve this?
You store the result of the replacement into the variable head. The original row[0] still has the period. Change row[0].partition('#') to head.partition('#').
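For reference, the corrected loop (same found.txt and column layout as in the question) would look like:
import csv

with open('found.txt', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        head = row[0].replace('.', ' ')
        # partition the already-replaced string, not the original row[0]
        head, sep, tail = head.partition('#')
        print(head)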
I'm accessing a CSV file, looping through all of its rows (strings), and I want to keep / print all parts of each string that start with a ".", have two words in the middle, and end with either a ".", "?" or "!".
For example, if the string was: "This is my new channel. Please subscribe!" I'd only want to keep the ". Please subscribe!"
So far I only have this to show me how many words are inside each string:
with open("data2.csv", encoding="utf-8", newline='') as f:
reader = csv.reader(f)
for row in reader:
rowstr = str(row[1])
res = len(row[1].split())
print(res)
I've tried:
with open("data2.csv", encoding="utf-8", newline='') as f:
reader = csv.reader(f)
for row in reader:
rowstr = row[1]
res = len(row[1].split())
re.findall(r"\.\S+\s\S+[.?!]", rowstr)
print(row[1])
I get no output from findall, only from printing row[1]
Fixed it
Working code:
with open("data2.csv", encoding="utf-8", newline='') as f:
reader = csv.reader(f)
for row in reader:
rowstr = row[1]
res = len(row[1].split())
finalData = re.findall(r"(\.\W\w+\W\w+[\.\?!])", rowstr)
print(finalData)
You can use a regular expression:
re.findall(r'(\.\W\w+\W\w+[\.\?!])$', "This is my new channel. Please subscribe!")
which outputs:
['. Please subscribe!']
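If you want to apply that pattern inside the CSV loop, here is a minimal sketch (reusing data2.csv and the second column from the question; drop the $ anchor if the sentence can occur in the middle of the string):
import csv
import re

pattern = re.compile(r"\.\W\w+\W\w+[.?!]$")  # compile once, reuse for every row

with open("data2.csv", encoding="utf-8", newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        matches = pattern.findall(row[1])
        if matches:
            print(matches)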
Regex is the best solution to problems like this. Please refer here!
My goal is to read a CSV file and then print the first 10 rows with a single space between the items. Following are the tasks to do that:
If the string has more than one word, then add double quotes around it.
The problem I am facing is that if it is a single string with whitespace at the end, I am supposed to remove that whitespace.
I tried strip and rstrip in Python, but they don't seem to work.
Following is the code for it:
with open(accidents_csv, mode='r', encoding="utf-8") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    count = 0
    for row in csv_reader:
        if count <= 10:
            new_row = [beautify_columns(item) for item in row]
            print(' '.join(new_row))
        count += 1
def beautify_columns(col):
    col.strip()
    if ' ' in col:
        col = f'"{col}"'
    return col
The current output of the code still shows the trailing spaces.
Kindly advise me how to remove the spaces at the end of a string.
You have to assign the result of strip(), i.e.
col = col.strip()
The only other thing to note is that strip() will remove whitespace (i.e. not just space characters) at the beginning as well as the end of the string.
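Applied to the helper from the question, that looks like:
def beautify_columns(col):
    col = col.strip()        # assign the stripped result back to col
    if ' ' in col:
        col = f'"{col}"'
    return col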
Partially unrelated, but a csv.writer could natively meet your other requirements, because it will automatically quote fields containing a separator:
import sys

with open(accidents_csv, mode='r', encoding="utf-8") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    csv_writer = csv.writer(sys.stdout, delimiter=" ")
    for count, row in enumerate(csv_reader):
        new_row = [item.strip() for item in row]
        csv_writer.writerow(new_row)
        if count >= 9: break
As said by @barny, strip() will remove all whitespace characters, including "\r" or "\t". Use strip(' ') if you want to remove only space characters.
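A quick illustration of the difference:
print(repr(' \tabc \t'.strip()))     # 'abc'       -> all leading/trailing whitespace removed
print(repr(' \tabc \t'.strip(' ')))  # '\tabc \t'  -> only the space characters are removed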
I simply want to replace the last character in my file. The reason is because when I write to my file, at the last point in which I write to the file, there is an extra , that is included at the end. I simply don't want to write that , at the end, but rather would want to replace it with a ] if possible. Here is my attempt:
reader = csv.DictReader(open(restaurantsCsv), delimiter=';')
with open(fileName, 'w+') as textFile:
    textFile.write('[')
    for row in reader:
        newRow = {}
        for key, value in row.items():
            if key == 'stars_count' or key == 'reviews_count':
                newRow[key] = float(value)
            else:
                newRow[key] = value
        textFile.write(json.dumps(newRow) + ',')
    textFile.seek(-1, os.SEEK_END)
    textFile.truncate()
    textFile.write(']')
It all works properly until I get to textFile.seek(-1, os.SEEK_END), where I want to seek to the end of the file and remove that last , in the file, but I get an error saying io.UnsupportedOperation: can't do nonzero end-relative seeks. Therefore, I tried opening my file in wb+ mode, but if I do that, then I can only write bytes to my file, not strings. Is there any way I can simply replace the last character in my file with a ] instead of a ,? I know I can open the file once to truncate it and then open it again to write the final ], but that seems inefficient (as shown here):
with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()
with open(filename, 'a') as filehandle:
    filehandle.write(']')
Any help would be appreciated. Thanks!
You can slightly modify your approach and instead of appending a comma at the end of each line, you just prepend a comma to every line but the first:
reader = csv.DictReader(open(restaurantsCsv), delimiter=';')
with open(fileName, 'w+') as text_file:
    text_file.write('[')
    for index, row in enumerate(reader):
        new_row = {}
        for key, value in row.items():
            if key in ('stars_count', 'reviews_count'):
                new_row[key] = float(value)
            else:
                new_row[key] = value
        if index != 0:
            text_file.write(',')
        text_file.write(json.dumps(new_row))
    text_file.write(']')
To replace the last character of the file, i.e. the last character of the last line, you can use sed.
To see if it's working properly (this prints the result without modifying the file):
sed '$ s/.$/]/' file_name
To replace the last character of the last line (i.e. the comma in your case) with ']' and change the file in place:
sed -i '$ s/.$/]/' file_name
To run it from within Python:
import os
print(os.system("sed -i '$ s/.$/]/' file_name"))
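If you prefer not to go through a shell, roughly the same call can be made with subprocess (a sketch only; file_name is the placeholder used above):
import subprocess

# -i edits the file in place; the sed expression replaces the last character of the last line with ]
subprocess.run(["sed", "-i", "$ s/.$/]/", "file_name"], check=True)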
As suggested by @Chris, accumulate all the new rows in a list, then write them all out at once. Then you won't have that pesky hanging comma.
......
rows = []
for row in reader:
    newRow = {}
    for key, value in row.items():
        if key == 'stars_count' or key == 'reviews_count':
            newRow[key] = float(value)
        else:
            newRow[key] = value
    rows.append(newRow)
textFile.write(json.dumps(rows))
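For completeness, a sketch of the whole thing with the setup from the question filled back in (restaurantsCsv and fileName are the question's own variables):
import csv
import json

reader = csv.DictReader(open(restaurantsCsv), delimiter=';')
with open(fileName, 'w') as textFile:
    rows = []
    for row in reader:
        newRow = {}
        for key, value in row.items():
            if key in ('stars_count', 'reviews_count'):
                newRow[key] = float(value)
            else:
                newRow[key] = value
        rows.append(newRow)
    # json.dumps on a list produces the [ ... ] brackets and the commas itself
    textFile.write(json.dumps(rows))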
I have a large CSV file (comma delimited). I would like to replace a few random cells that have the value "NIL" with an empty string "".
I tried this to find the keyword "NIL" and replace it with an empty string, but it gives me an empty CSV file:
ifile = open('outfile', 'rb')
reader = csv.reader(ifile,delimiter='\t')
ofile = open('pp', 'wb')
writer = csv.writer(ofile, delimiter='\t')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
    s = s.replace(item, replacement)
ofile.write(s)
From seeing your code, I feel you should directly read the file:
with open("test.csv") as opened_file:
data = opened_file.read()
then use a regex to change all NIL to "" or " " and save the data back to the file:
import re
data = re.sub("NIL"," ",data) # this code will replace NIL with " " in the data string
NOTE: you can use any regex instead of NIL.
For more info, see the re module.
EDIT 1: re.sub returns a new string, so you need to assign it back to data.
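Putting the pieces together (the file names 'outfile' and 'pp' are taken from the question), a minimal end-to-end version would be:
import re

with open('outfile') as opened_file:
    data = opened_file.read()

# re.sub returns a new string, so assign the result back
data = re.sub("NIL", " ", data)

with open('pp', 'w') as out_file:
    out_file.write(data)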
A few tweaks and your example works. I edited your question to get rid of some indenting errors - assuming those were a cut/paste problem. The next problem is that you don't import csv ... but even though you create a reader and writer, you don't actually use them, so they could just be removed. So, opening in text instead of binary mode, we have:
ifile = open('outfile') # 'outfile' is the input file...
ofile = open('pp', 'w')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
    s = s.replace(item, replacement)
ofile.write(s)
We could add 'with' clauses and use a dict to make the replacements clearer:
replace_this = {'NIL': ' '}
with open('outfile') as ifile, open('pp', 'w') as ofile:
    s = ifile.read()
    for item, replacement in replace_this.items():
        s = s.replace(item, replacement)
    ofile.write(s)
The only real problem now is that it also changes things like "NILIST" to "IST". If this is a csv with all numbers except for "NIL", that's not a problem. But you could also use the csv module to only change cells that are exactly "NIL".
import csv

with open('outfile') as ifile, open('pp', 'w') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
    for row in reader:
        # row is a list of columns. The following builds a new list
        # while checking and changing any column that is 'NIL'.
        writer.writerow([c if c.strip() != 'NIL' else ' '
                         for c in row])
I have a similar problem to this guy: find position of a substring in a string
The difference is that I don't know what my "mystr" is. I know my substring, but the string in the input file could be a random number of words in any order; I only know that one of those words includes the substring cola.
For example, a CSV file: fanta,coca_cola,sprite in any order.
If my substring is "cola", then how can I write code that says
mystr.find('cola')
or
match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)
or
if "cola" in mystr
When I don't know what my "mystr" is?
This is my code:
import csv

with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
    reader = csv.DictReader(fp_in)
    rows = [row for row in reader]
    writer = csv.writer(fp_out, delimiter=',')
    writer.writerow(["new_cola"])

    def headers1(name):
        if "cola" in name:
            return row.get("cola")

    for row in rows:
        writer.writerow([headers1("cola")])
and the first.csv:
fanta,cocacola,banana
0,1,0
1,2,1
so it prints out
new_cola
""
""
when it should print out
new_cola
1
2
Here is a working example:
import csv

with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(["new_cola"])

    def filter_cola(row):
        for k, v in row.iteritems():
            if "cola" in k:
                yield v

    for row in reader:
        writer.writerow(list(filter_cola(row)))
Notes:
rows = [row for row in reader] is unnecessary and inefficient (here you convert a generator to a list, which consumes a lot of memory for huge data)
instead of return row.get("cola") you meant return row.get(name)
in the statement return row.get("cola") you access the variable row from outside the function's own scope
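To illustrate those notes, here is a sketch of how headers1 could be rewritten (the row is passed in explicitly, and the lookup matches on the header name rather than the literal "cola"):
def headers1(row, name):
    # return the value of the first column whose header contains the given substring
    for key, value in row.items():
        if name in key:
            return value

for row in reader:
    writer.writerow([headers1(row, "cola")])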
You can also use the Unix tool cut. For example:
cut -d "," -f 2 < first.csv > second.csv