Python - remove spaces on the right side

I have text files in a folder, and the files contain data like the line below.
I don't know how to remove the spaces on the right side, just before the CRLF. These are the spaces:
33/22-BBB<there is a space to remove>CRLF
import os
root_path = "C:/Users/adm/Desktop/test"
if os.path.exists(root_path):
    files = []
    for name in os.listdir(root_path):
        if os.path.isfile(os.path.join(root_path, name)):
            files.append(os.path.join(root_path, name))
    for ii in files:
        # The with block closes the file automatically, so no file.close() is needed
        with open(ii) as file:
            for line in file:
                line = line.rstrip()
                if line:
                    print(line)
Does anyone have any idea how to get rid of this?

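If those are control characters rather than ordinary spaces, try changing your open() call to:
with open(ii, "r", errors="ignore") as file:
or
# Bytes mode: each line comes back as bytes, line ending included
with open(ii, "rb") as file:
or
# '\r\n' is CR LF (see the reference below)
with open(ii, "r", newline='\r\n') as file:
Note that errors="ignore" only skips bytes that cannot be decoded; it does not remove spaces.
Reference: Control characters in ASCII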

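If it is the CRLF line ending itself (the two characters '\r\n') that you want to remove from each string, use line.rstrip('\r\n'), or plain line.rstrip() to remove the trailing spaces as well. line.replace('CRLF', '') would only remove the literal text "CRLF", and in text mode Python already translates '\r\n' to '\n' when reading.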
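A minimal sketch of the whole task, assuming the files are plain text in your locale's default encoding and the unwanted characters are ordinary spaces or tabs. The names strip_trailing and clean_folder are hypothetical helpers, not part of any library. Each file is opened with newline="" so Python does not translate line endings, which means the CRLF endings are written back unchanged.

```python
import os
import re

# Spaces/tabs sitting right before an optional "\r" and the end of a line.
TRAILING_WS = re.compile(r"[ \t]+(?=\r?$)", re.MULTILINE)

def strip_trailing(text):
    """Remove spaces and tabs at the end of each line, keeping CRLF/LF endings."""
    return TRAILING_WS.sub("", text)

def clean_folder(root_path):
    """Rewrite every regular file in root_path without trailing spaces/tabs."""
    for name in os.listdir(root_path):
        path = os.path.join(root_path, name)
        if not os.path.isfile(path):
            continue
        # newline="" turns off newline translation, so "\r\n" survives the round trip.
        with open(path, "r", newline="") as f:
            text = f.read()
        with open(path, "w", newline="") as f:
            f.write(strip_trailing(text))
```

Usage would be clean_folder("C:/Users/adm/Desktop/test"); since it rewrites files in place, try it on a copy of the folder first.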


Python adding a string leaves extra characters

If you need any more info, just let me know.
I have a Python script that adds a string to the end of each line of a CSV file. The line file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()] is the troublemaker: for each line of the file, it adds the string and then a newline.
Here is the script:
# Add the .JPG string to the end of each line for the part numbers
string_to_add = ".JPG"
# Open the file and append .JPG to each current line
with open("PartNums.csv", 'r') as f:
    file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()]
# Write the new lines back to the file
with open("PartNums.csv", 'w') as f:
    f.writelines(file_lines)
The script works and does what it is supposed to; my issue comes later in the larger script. This script writes a CSV file that looks like this:
X00TB0001.JPG
X01BJ0003.JPG
X01BJ0004.JPG
X01BJ0005.JPG
X01BJ0006.JPG
X01BJ0007.JPG
X01BJ0008.JPG
X01BJ0026.JPG
X01BJ0038.JPG
X01BJ0039.JPG
X01BJ0040.JPG
X01BJ0041.JPG
...
X01BJ0050.JPG
X01BJ0058.JPG
X01BJ0059.JPG
X01BJ0060.JPG
X01BJ0061.JPG
X01BJ0170.JPG
X01BJ0178.JPG
Without the '\n' in that line (file_lines = [''.join([x.strip(), string_to_add]) for x in f.readlines()]), the CSV output looks like this:
X00TB0001.JPGX01BJ0003.JPGX01BJ0004.JPGX01BJ0005.JPGX01BJ0006.JPG
The problem comes when I later read this file and use it to move files with this script:
import os
import shutil
# If the string matches a file name, move it to a new directory
dst = r"xxx"
src = r"XXX"
with open('PicsWeHave.txt') as my_file:
    for filename in my_file:
        # shutil.copy(src, os.path.join(dst, filename.strip()))  # .strip() to avoid unwanted whitespace
        shutil.copy(os.path.join(src, filename), os.path.join(dst, filename))
When I run the whole script, it works until it has to move the files, and then I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'XXX\\X15SL0447.JPG\n'
I know the file exists, but the '\n' should not be there. That's why I'm asking: how can I keep each name on its own line without a '\n' at the end of each name, so that the strings match when I move the files?
Thank you for your help!
As mentioned above, you should use .strip():
shutil.copy(os.path.join(src, filename.strip()), os.path.join(dst, filename.strip()))
This way you get just the file name: .strip() removes leading and trailing whitespace, including the trailing '\n'.
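A minimal sketch of the same fix that strips each name once, assuming the list file has one file name per line. names_from_lines and copy_listed_files are hypothetical helper names; the blank-line check is an addition so that an empty last line does not turn into a copy of the source directory itself.

```python
import os
import shutil

def names_from_lines(lines):
    """Turn raw lines from the list file into clean file names."""
    # strip() drops the trailing '\n' (and stray spaces); the 'if' skips blank lines.
    return [line.strip() for line in lines if line.strip()]

def copy_listed_files(list_path, src_dir, dst_dir):
    """Copy every file named in list_path from src_dir to dst_dir."""
    with open(list_path) as my_file:
        for name in names_from_lines(my_file):
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
```

With the placeholders from the question this would be called as copy_listed_files('PicsWeHave.txt', r"XXX", r"xxx"). The writing script can keep adding '\n', since that is what puts each name on its own line.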

Broken CJK data when reading an ISO-8859-1 file in Python

I'm parsing some files that are ISO-8859-1 and contain Chinese, Japanese, and Korean characters.
import os
from os import listdir
cnt = 0
base_path = 'data/'
cwd = os.path.abspath(os.getcwd())
for f in os.listdir(base_path):
    path = cwd + '/' + base_path + f
    cnt = 0
    with open(path, 'r', encoding='ISO-8859-1') as file:
        for line in file:
            print('line {}: {}'.format(cnt, line))
            cnt += 1
The code runs, but it prints broken characters. Other Stack Overflow questions suggest using encode and decode. For example, for Korean text I tried file.read().encode('latin1').decode('euc-kr'), but that didn't do anything.
I also tried converting the files to UTF-8 with iconv, but the characters are still broken in the converted text file.
Any suggestions would be much appreciated.
Sorry, no. ISO-8859-1 cannot contain any Chinese, Japanese, or Korean characters; the code page doesn't support them in the first place.
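What your code does is ask Python to assume the file is ISO-8859-1 encoded and return the text as Unicode (which is how Python 3 strings are built). Because ISO-8859-1 assigns a character to every possible byte value, this decoding never fails; it just produces the wrong characters. If you don't specify the encoding parameter in open(), Python uses the platform's default (locale) encoding instead, which is often UTF-8 on Linux and macOS but frequently something else, such as cp1252, on Windows. Either way you still get Unicode strings, i.e. logical characters with no encoding attached.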
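The "never fails, but wrong" behaviour is also what makes the mistake reversible: encoding the misread text back to ISO-8859-1 returns exactly the original bytes. A minimal sketch, assuming the file really is EUC-KR; redecode is a hypothetical helper name and the sample word is made up for illustration.

```python
def redecode(text, actual_encoding):
    """Undo a wrong ISO-8859-1 decode by re-decoding the original bytes."""
    # Encoding back to ISO-8859-1 recovers the exact original bytes...
    raw = text.encode("iso-8859-1")
    # ...which can then be decoded with the encoding the file really uses.
    return raw.decode(actual_encoding)

korean = "안녕하세요"
# What reading EUC-KR bytes with encoding='ISO-8859-1' gives you: mojibake.
misread = korean.encode("euc-kr").decode("iso-8859-1")
fixed = redecode(misread, "euc-kr")  # equal to korean again
```

If file.read().encode('latin1').decode('euc-kr') seemed to do nothing, one possibility is that its return value was not used: strings are immutable, so the call changes nothing in place. Opening the file with the correct encoding, as below, is still the cleaner fix.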
The real question is how those CJK characters are encoded in the file. If you know that, you can pass the right encoding parameter to open() and it works right away. Say it is EUC-KR, as you mentioned; then the code should be:
cnt = 0
with open(path, 'r', encoding='euc-kr') as file:
    for line in file:
        print('line {}: {}'.format(cnt, line))
        cnt += 1
If you don't know the encoding, take a look at chardet, a third-party package that guesses the encoding from the raw bytes. Example:
import chardet
with open(path, 'rb') as file:
    rawdata = file.read()
guess = chardet.detect(rawdata)  # e.g. {'encoding': 'EUC-KR', 'confidence': 0.99}
text = rawdata.decode(guess['encoding'])  # decode the bytes, not the result dict
cnt = 0
for line in text.splitlines():
    print('line {}: {}'.format(cnt, line))
    cnt += 1
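If installing chardet is not an option, a cruder fallback is to try a short list of likely encodings in order. This is a swapped-in heuristic, not what chardet does: a clean strict decode only shows that the bytes are valid in that encoding, not that the encoding is right (bytes from one CJK encoding can sometimes decode without error in another), so put the most likely encoding for your data first. The function name and the candidate list are assumptions.

```python
def decode_with_candidates(raw, candidates=("utf-8", "euc-kr", "shift_jis", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings could decode the data")
```

It takes the bytes from open(path, 'rb').read(), just like chardet.detect().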

Replacing string in txt file with content of another txt file (regular expressions)

I have two files: "invoiceencoded.txt" (base64 code) and "invoice.txt". I want to replace the word 'INPUT' in the second file with the base64 code from the first. The goal is to loop over a given path containing many such pairs, but that doesn't matter here. I have the following code:
import re
import os
for f_name in os.listdir('C:/..'):
    if f_name.endswith('encoded.txt'):
        fin = open(f_name, "rt")
        filedata = fin.read()
        with open(f_name[:-11]+".txt", 'r+') as f:
            text = f.read()
            text = re.sub('INPUT', filedata, text)
            f.seek(0)
            f.write(text)
            f.truncate()
The 'INPUT' string appears in the middle of other text, as in 'abcINPUTdef'. However, instead of giving me
"abcbase64codedef", I get:
"abcbase64code
def"
Does anyone know how to remove this line break?
Thanks in advance
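The line break is probably at the end of your base64 string in invoiceencoded.txt: many tools add a trailing newline when writing base64, and some (such as GNU base64 by default) also wrap it into lines of 76 characters.
Remove those line breaks before substituting, for example with filedata = fin.read().strip(), or "".join(fin.read().split()) if the base64 is wrapped, and rerun your script.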
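A minimal sketch of the whole loop with that fix, assuming every "<name>encoded.txt" has a matching "<name>.txt" in the same folder. It swaps re.sub for plain str.replace, since the placeholder is a fixed string and no regular expression is needed, and it joins each name with the folder path so the script does not depend on the current working directory. fill_placeholder and process_folder are hypothetical names.

```python
import os

SUFFIX = "encoded.txt"

def fill_placeholder(template, encoded, placeholder="INPUT"):
    """Insert base64 text into template, dropping any line breaks it contains."""
    # Base64 never contains whitespace, so removing all of it is safe
    # and also handles output that was wrapped into fixed-width lines.
    clean = "".join(encoded.split())
    return template.replace(placeholder, clean)

def process_folder(folder):
    for name in os.listdir(folder):
        if not name.endswith(SUFFIX):
            continue
        encoded_path = os.path.join(folder, name)
        # "invoiceencoded.txt" -> "invoice.txt", same as f_name[:-11] + ".txt"
        target_path = os.path.join(folder, name[:-len(SUFFIX)] + ".txt")
        with open(encoded_path) as f:
            encoded = f.read()
        with open(target_path) as f:
            template = f.read()
        with open(target_path, "w") as f:
            f.write(fill_placeholder(template, encoded))
```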

Replace newlines with a space in all files in a directory - Python

I have about 4000 txt files in a directory, and I'd like to replace the newlines in each file with spaces using a for loop. The replacement itself works, but when I save the file, either it doesn't get saved or it gets saved with the newlines again. Here is my script:
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.write(data)
As I said, I'm able to replace the newlines with spaces, but in the end the change doesn't get saved. I don't get any errors either.
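To elaborate on my comment: it's almost always a bad idea to open a file in 'r+' mode, because of the way the current position is handled. After f.read(), the position is at the end of the file, so f.write(data) appends the modified text after the original instead of replacing it. Instead, open the file for reading, read the data, replace the newlines, then open the same file for writing and write the data: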
for file in glob.glob(path):
    with open(file) as f:
        data = f.read().replace('\n', ' ')
    with open(file, "w") as f:
        f.write(data)
Alternatively, you can keep the 'r+' mode: reset the file position to 0 with seek() before writing, then call truncate() after writing to cut off any leftover data (the text can get shorter, e.g. when '\r\n' line endings are read as '\n').
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.seek(0)
        f.write(data)
        f.truncate()
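To see why the original 'r+' script seems not to save anything, here is a small sketch that uses io.StringIO as a stand-in for a real file; it tracks the read/write position the same way. The two function names are hypothetical.

```python
import io

def rewrite_without_seek(text):
    """Mimic the question's 'r+' script on an in-memory file."""
    f = io.StringIO(text)
    data = f.read().replace("\n", " ")  # the position is now at the end
    f.write(data)                        # so this appends instead of overwriting
    return f.getvalue()

def rewrite_with_seek(text):
    """Same, but rewind before writing and cut off any leftover data."""
    f = io.StringIO(text)
    data = f.read().replace("\n", " ")
    f.seek(0)
    f.write(data)
    f.truncate()
    return f.getvalue()
```

rewrite_without_seek("a\nb\n") returns "a\nb\na b ": the original text with its newlines is still at the start of the file, which looks like the change was never saved.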

Can't escape control character "\r" when extracting file paths

I am trying to open each of the following files separately.
"C:\recipe\1,C:\recipe\2,C:\recipe\3,"
I attempt to do this using the following code:
import sys
import os
import re
line = "C:\recipe\1,C:\recipe\2,C:\recipe\3,"
line = line.replace('\\', '\\\\') # tried to escape control chars here
line = line.replace(',', ' ')
print line # should print "C:\recipe\1 C:\recipe\2 C:\recipe\3 "
for word in line.split():
    fo = open(word, "r+")
    # Do file stuff
    fo.close()
print "\nDone\n"
When I run it, it gives me:
fo = open(word, "r+")
IOError: [Errno 13] Permission denied: 'C:'
So it must be because the '\r' sequences in the original string are not being escaped correctly. I've tried many other ways of escaping control characters, but none of them seem to work. What am I doing wrong?
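Use a raw string. In a normal string literal, "\r" is already a carriage return and "\1", "\2", "\3" are octal escapes for control characters, so by the time replace() runs there are no backslashes left to double. split() then treats the carriage return as whitespace, which is why open() receives just 'C:':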
line = r"C:\recipe\1,C:\recipe\2,C:\recipe\3,"
If for whatever reason you can't use a raw string, escape each backslash by doubling it:
line = "C:\\recipe\\1,C:\\recipe\\2,C:\\recipe\\3,"
print(line.split(','))
Output:
['C:\\recipe\\1', 'C:\\recipe\\2', 'C:\\recipe\\3', '']
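The trailing comma produces the empty '' entry shown in the output, and opening '' fails. A minimal Python 3 sketch (the question's code is Python 2) that splits the raw string and skips empty entries; parse_paths and process_files are hypothetical names, and the body of the with block is a placeholder for your own code.

```python
def parse_paths(line):
    """Split a comma-separated list of paths and drop empty entries."""
    # The trailing comma would otherwise produce an empty '' entry.
    return [p.strip() for p in line.split(",") if p.strip()]

def process_files(line):
    """Open each listed file in turn."""
    for path in parse_paths(line):
        with open(path, "r+") as fo:
            pass  # Do file stuff with fo here

# The raw string keeps every backslash as a literal character.
line = r"C:\recipe\1,C:\recipe\2,C:\recipe\3,"
```

Calling process_files(line) then opens C:\recipe\1, C:\recipe\2 and C:\recipe\3 in turn.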
