Python: replace a string in a CSV file

I am a beginner and I have an issue with a short piece of code. I want to replace a string in a CSV file with another string and write the result to a new
CSV file with a new name. The strings are separated by commas.
My code is a catastrophe:
import csv
f = open('C:\\User\\Desktop\\Replace_Test\\Testreplace.csv')
csv_f = csv.reader(f)
g = open('C:\\Users\\Desktop\\Replace_Test\\Testreplace.csv')
csv_g = csv.writer(g)
findlist = ['The String, that should replaced']
replacelist = ['The string that should replace the old striong']
#the function ?:
def findReplace(find,replace):
    s = f.read()
    for item, replacement in zip(findlist,replacelist):
        s = s.replace(item,replacement)
    g.write(s)
for row in csv_f:
    print(row)
f.close()
g.close()

You can do this with the regex package re. Also, if you use with, you don't have to remember to close your files, which helps me.
EDIT: Keep in mind that re.sub() treats find_str as a regular expression. For a string without regex metacharacters, like the one here, that amounts to an exact, case-sensitive match; if your string contains characters such as . or (, wrap it in re.escape() or use str.replace() instead. If you want looser matching, you probably need an actual regex to find the strings that need replacing. You would do this by replacing find_str in the re.sub() call with r'your_regex_here'.
import re
# open your csv and read as a text string
my_csv_path = './my_csv.csv' # placeholder: put the path to your input file here
with open(my_csv_path, 'r') as f:
    my_csv_text = f.read()
find_str = 'The String, that should replaced'
replace_str = 'The string that should replace the old striong'
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
# open new file and save
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(new_csv_str)
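If the text to find may contain regex metacharacters, a purely literal replacement is safer. The following is a minimal sketch under that assumption; replace_literal and the sample text are made up for illustration and are not part of the answer above.

```python
import re


def replace_literal(text, find_str, replace_str):
    """Replace every literal occurrence of find_str in text."""
    # re.escape() neutralises regex metacharacters such as '.', '$' or '('.
    # Passing a function as the replacement avoids backslash processing
    # in replace_str.
    return re.sub(re.escape(find_str), lambda match: replace_str, text)


sample = "id,price\n1,$5.00 (net)\n2,$5x00\n"
result = replace_literal(sample, "$5.00 (net)", "five dollars")
print(result)

# For plain literal text, str.replace() does the same job without re.
same_result = sample.replace("$5.00 (net)", "five dollars")
```

Only the first data row changes; the second row, which an unescaped pattern such as $5.00 could otherwise interact with, is left alone.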

Related

Make CSV escape Double Quotation Marks

I need to prepare a .csv file so that double quotation marks get ignored by the program processing it (ArcMap). Arc was blending the contents of all following cells on that line into any previous one containing a double quotation mark. For example:
...and no further rows would get processed at all.
How does one make a CSV escape Double Quotation Marks for successful processing in ArcMap (10.2)?
Let's say df is the DataFrame created for the csv files as follows
df = pd.read_csv('filename.csv')
Let us assume that comments is the name of the column where the issue occurs, i.e. you want to replace every double quote (") with an empty string ('').
The following one-liner does that for you. It replaces every double quote in every row of df['comments'] with an empty string.
df['comments'] = df['comments'].apply(lambda x: x.replace('"', ''))
The lambda receives each value of df['comments'] in turn as x.
EDIT: To escape the double quotes instead of removing them, it was suggested to convert the string to its raw format with another, very similar one-liner:
df['comments'] = df['comments'].apply(lambda x: r'{0}'.format(x))
Note, however, that the r prefix only tells Python not to interpret backslash escapes inside the string literal itself. Here the literal is just '{0}', so this line returns every value unchanged and does not escape anything.
You could try reading the file with the csv module and writing it back in the hopes that the output format will be more digestible for your other tool. See the docs for formatting options.
import csv
with open('in.csv', 'r', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fout, delimiter='\t')
    # alternative:
    # writer = csv.writer(fout, delimiter='\t', escapechar='\\', doublequote=False)
    for line in reader:
        writer.writerow(line)
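To see what the formatting options do to embedded double quotes, here is a small in-memory sketch. The sample rows and the helper names dump and load are assumptions made for illustration; which output ArcMap accepts is not verified here.

```python
import csv
import io

rows = [["id", "comment"], ["1", 'He said "hi", then left']]


def dump(rows, **fmt):
    """Write rows to a string using the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerows(rows)
    return buf.getvalue()


def load(text, **fmt):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), **fmt))


# Quote every field; embedded quotes are doubled (the csv default).
doubled = dump(rows, quoting=csv.QUOTE_ALL)
# Escape embedded quotes with a backslash instead of doubling them.
escaped = dump(rows, escapechar="\\", doublequote=False)

print(doubled)
print(escaped)
```

Both variants read back to the original rows as long as the reader is given the same options that the writer used.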
What worked for me was writing a module to do some "pre-processing" of the CSV file as follows. The key line is where the "writer" has the parameter "quoting=csv.QUOTE_ALL". Hopefully this is useful to others.
def work(Source_CSV):
    from __main__ import *
    import csv, arcpy, os
    # Derive name and location for newly-formatted .csv file
    Head = os.path.split(Source_CSV)[0]
    Tail = os.path.split(Source_CSV)[1]
    name = Tail[:-4]
    new_folder = "formatted"
    new_path = os.path.join(Head,new_folder)
    Formatted_CSV = os.path.join(new_path,name+"_formatted.csv")
    #arcpy.AddMessage("Formatted_CSV = "+Formatted_CSV)
    # Populate the new .csv file with quotation marks around all field contents ("quoting=csv.QUOTE_ALL")
    with open(Source_CSV, 'rb') as file1, open(Formatted_CSV,'wb') as file2:
        # Instantiate the .csv reader
        reader = csv.reader(file1, skipinitialspace=True)
        # Write column headers without quotes
        headers = reader.next() # 'next' function actually begins at the first row of the .csv (Python 2 syntax).
        str1 = ''.join(headers)
        writer = csv.writer(file2)
        writer.writerow(headers)
        # Write all other rows wrapped in double quotes
        writer = csv.writer(file2, delimiter=',', quoting=csv.QUOTE_ALL)
        # Write all other rows, at first quoting none...
        #writer = csv.writer(file2, quoting=csv.QUOTE_NONE,quotechar='\x01')
        for row in reader:
            # ...then manually doubling double quotes and wrapping 3rd column in double quotes.
            #row[2] = '"' + row[2].replace('"','""') + '"'
            writer.writerow(row)
    return Formatted_CSV

Removing punctuation and change to lowercase in python CSV file

The code below allows me to open the CSV file and change all the text to lowercase. However, I am having difficulty also removing the punctuation in the CSV file. How can I do that? Do I use string.punctuation?
file = open('names.csv','r')
lines = [line.lower() for line in file]
with open('names.csv','w') as out:
    out.writelines(sorted(lines))
print (lines)
A sample of a few lines from my file:
Justine_123
ANDY*#3
ADRIAN
hEnNy!
You can achieve this by importing the string module and using the example code below.
The other way you can achieve this is by using regex.
import string
str(lines).translate(None, string.punctuation) # Python 2 only; in Python 3 build a table with str.maketrans
Also, you may want to learn more about the string module and its features.
Here is the working example you asked for.
import string
with open("sample.csv") as csvfile:
    lines = [line.lower() for line in csvfile]
print(lines)
This will give you ['justine_123\n', 'andy*#3\n', 'adrian\n', 'henny!'].
punc_table = str.maketrans({key: None for key in string.punctuation})
new_res = str(lines).translate(punc_table)
print(new_res)
new_res will be justine123n andy3n adriann henny. The stray n after the first three names is left over from the \n escapes in the list's string representation: the backslash counts as punctuation and is removed, but the n is not.
Example with regular expressions.
import csv
import re
filename = 'names.csv'
def reg_test(name):
    reg_result = ''
    with open(name, 'r') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            row = re.sub('[^A-Za-z0-9]+', '', str(row))
            reg_result += row + ','
    return reg_result
print(reg_test(filename).lower())
justine123,andy3,adrian,henny,
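Both examples above work on the string representation of a list, which is where the leftover characters come from. A minimal sketch that cleans each line on its own could look like this; clean_line and PUNCT_TABLE are illustrative names, and the sample lines are the ones from the question.

```python
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_line(line):
    """Strip surrounding whitespace, lowercase, and drop punctuation."""
    return line.strip().lower().translate(PUNCT_TABLE)


sample_lines = ["Justine_123\n", "ANDY*#3\n", "ADRIAN\n", "hEnNy!"]
cleaned = [clean_line(line) for line in sample_lines]
print(cleaned)
```

To write the result back to a file, each cleaned name needs its newline added again, for example with out.writelines(name + "\n" for name in cleaned).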

Send keylogger log files to e-mail [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be one line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython closes the file when the file object is garbage-collected.
Other Python implementations may not do so promptly. To write portable code, it is better to use with or to close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
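The difference is easy to check. This is a small sketch that writes a throwaway file into a temporary directory (the file name and contents are made up) and confirms that the with block has closed the file by the time it ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")

    # Create a small sample file to read back.
    with open(path, "w") as f:
        f.write("ABC\nDEF\n")

    # The with statement closes the file as soon as the block ends,
    # even if an exception is raised inside it.
    with open(path) as f:
        data = f.read()
    closed_after_with = f.closed

print(repr(data), closed_after_with)
```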
To join all lines into a string and remove newlines, I normally use:
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
    data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
    print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and prefer to use read in combination with rstrip. Without rstrip("\n"), the string keeps the newline that ends the file's last line, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
print(file_content)
Here are four snippets to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    data = "".join([line.rstrip("\n") for line in file])
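These variants can be compared without touching the disk. The sketch below uses an in-memory string in place of the file contents (the sample text is the ABC/DEF file from the question) and io.StringIO as a stand-in for an open file object.

```python
import io

text = "ABC\nDEF\n"

# 1. Delete the newline characters from the whole text.
by_replace = text.replace("\n", "")

# 2. Split into lines (terminators removed) and glue them back together.
by_splitlines = "".join(text.splitlines())

# 3. Iterate line by line, as you would over an open file,
#    stripping the trailing newline from each line.
by_iteration = "".join(line.rstrip("\n") for line in io.StringIO(text))

print(by_replace, by_splitlines, by_iteration)
```

All three produce the same string here; they only differ in how they treat other whitespace and unusual line endings.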
You can compress this into two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
If your file reads:
hello how are you?
who are you?
blank blank
the Python output is:
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
    data = data + line.strip()
myfile.close()
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
    line = f.readline()
    if not line:break
    string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list is faster than a generator but heavier on memory; a generator is slower but lighter on memory, much like iterating over the lines. For "".join(), I think both should work well. Remove the "".join() call to get the list or the generator itself.
Note: these one-liners never close the file explicitly; that is probably not needed in a short script, but use with if it matters.
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '', you ended up creating a list. If you have a variable x and print it out with just
x
or print(x)
or str(x)
you will see the entire list with the brackets. If you access a single element of the list (an array of sorts)
x[0]
then it omits the brackets. If you print that element you will see just the data, without the '' either.
print(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
This is a convenient way to get all the lines of a file; the '\n' are already stripped by splitlines(), which recognizes Windows, Mac and Unix line endings.
If you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just an example here; you can process each line as you please.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
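Here is the same pathlib approach as a runnable sketch. It assumes a throwaway file created in a temporary directory; the file name and its contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.txt"
    path.write_text("  ABC \nDEF\n")

    # splitlines() drops the line terminators but keeps other whitespace.
    line_lst = path.read_text().splitlines()

    # Optional per-line processing, here stripping surrounding whitespace.
    stripped = [line.strip() for line in line_lst]

    # Concatenate everything into a single string.
    txt = "".join(stripped)

print(line_lst, stripped, txt)
```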
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
    print(word) # Prints each word in file on a different line
Or:
print(words[0] + ",", words[1]) # Note that "+" concatenates without adding a space
# The comma between the two arguments makes print insert a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
    data=myfile.readline()
words=data.split(" ")
word=words[0]
This code reads the first line and then uses split to store the space-separated words of that line in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
text = '' # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)
Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
    print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

Replace multiple cell in csv python module

I have a large CSV file (comma-delimited). I would like to replace a few scattered cells that contain the value "NIL" with an empty string "".
I tried this to find the keyword "NIL" and replace it with an empty string '', but it gives me an empty CSV file:
ifile = open('outfile', 'rb')
reader = csv.reader(ifile,delimiter='\t')
ofile = open('pp', 'wb')
writer = csv.writer(ofile, delimiter='\t')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
    s = s.replace(item, replacement)
ofile.write(s)
From looking at your code, I feel you should just read the file directly
with open("test.csv") as opened_file:
    data = opened_file.read()
then use regex to change all NIL to "" or " " and save the data back to the file.
import re
data = re.sub("NIL"," ",data) # this code will replace NIL with " " in the data string
NOTE: you can give any regex instead of NIL
For more info, see the re module.
EDIT 1: re.sub returns a new string, so you need to assign it back to data.
A few tweaks and your example works. I edited your question to get rid of some indenting errors - assuming those were a cut/paste problem. The next problem is that you don't import csv ... but even though you create a reader and writer, you don't actually use them, so it could just be removed. So, opening in text instead of binary mode, we have
ifile = open('outfile') # 'outfile' is the input file...
ofile = open('pp', 'w')
findlist = ['NIL']
replacelist = [' ']
s = ifile.read()
for item, replacement in zip(findlist, replacelist):
    s = s.replace(item, replacement)
ofile.write(s)
We could add 'with' clauses and use a dict to make the replacements clearer
replace_this = { 'NIL': ' '}
with open('outfile') as ifile, open('pp', 'w') as ofile:
    s = ifile.read()
    for item, replacement in replace_this.items():
        s = s.replace(item, replacement)
    ofile.write(s)
The only real problem now is that it also changes things like "NILIST" to " IST". If this is a csv with all numbers except for "NIL", that's not a problem. But you could also use the csv module to only change cells that are exactly "NIL".
import csv
with open('outfile', newline='') as ifile, open('pp', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
    for row in reader:
        # row is a list of columns. The following builds a new list
        # while checking and changing any column that is 'NIL'.
        writer.writerow([c if c.strip() != 'NIL' else ' '
                         for c in row])
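The difference between whole-text replacement and per-cell replacement can be checked in memory. This is a minimal sketch: blank_nil_cells and the sample data are assumptions for illustration, and it replaces matching cells with an empty string, as the question asks, rather than with a space.

```python
import csv
import io


def blank_nil_cells(src, dst, marker="NIL", replacement=""):
    """Copy CSV rows from src to dst, replacing cells equal to marker."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        # Only cells that are exactly the marker (ignoring surrounding
        # whitespace) are replaced; 'NILIST' is left untouched.
        writer.writerow(
            [replacement if cell.strip() == marker else cell for cell in row]
        )


src = io.StringIO("a,NIL,NILIST\n NIL ,4,NIL\n")
dst = io.StringIO()
blank_nil_cells(src, dst)
result = dst.getvalue()
print(result)
```

With real files, open both with newline='' as the csv documentation recommends and pass the file objects in place of the StringIO buffers.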

Python: How to capitalize the first column of a .txt file.

I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word and finally write the data back to the file.
To achieve this, I did the following:
newGuestList = []
with open("guestList.txt","r+") as guestFile :
    guestList = csv.reader(guestFile)
    for guest in guestList :
        for guestInfo in guest :
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly; in order to write this new list back to file, I will need to convert it to a string. I can achieve this using the .join method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly; this method, of nested for loops etc. seems highly convoluted, is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is the first character of the line. Note that capitalize() also lowercases the rest of the line:
with open("lst.txt","r") as guestFile :
    lines=guestFile.readlines()
newlines=[line.capitalize() for line in lines]
with open("lst.txt","w") as guestFile :
    guestFile.writelines(newlines)
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
import csv
import os
inp = open('a.txt', 'r')
out = open('b.txt', 'w')
reader = csv.reader(inp)
writer = csv.writer(out)
for row in reader:
    row[0] = row[0].capitalize()
    writer.writerow(row)
inp.close()
out.close()
os.rename('b.txt', 'a.txt') # if you want to keep the same name
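The reader/writer approach can be tried on in-memory data first. In this sketch capitalize_first_column is an illustrative name and the sample rows are taken from the question; only the first column is changed, and the other columns keep their original text and spacing.

```python
import csv
import io


def capitalize_first_column(src, dst):
    """Copy CSV rows from src to dst, capitalizing the first cell of each."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if row:  # skip the capitalization on blank lines
            row[0] = row[0].capitalize()
        writer.writerow(row)


src = io.StringIO("peter, 35, spain\ncaroline, 37, france\nclaire, 32, sweden\n")
dst = io.StringIO()
capitalize_first_column(src, dst)
result = dst.getvalue()
print(result)
```

Because the writer emits one row per line, this also takes care of the newline after every third field that the question asks about.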
