Reading from file. Input character \ becomes \\ in file output. Best approach? - python

I have a Python script that reads an input file containing \ characters, alters some of the content, and writes the result to another output file. A simplified version of the script looks like this:
import sys

inputFile = open(sys.argv[1], 'r')
input = inputFile.read()
outputFile = open(sys.argv[2], 'w')
outputFile.write(input.upper())
Given this content in the input file:
My name\'s Bob
the output is:
MY NAME\\'S BOB
instead of:
MY NAME'S BOB
I suspect this is because of the input file's format, since direct string input yields the desired result (e.g. outputFile.write(('My name\'s Bob').upper())). This does not happen for all files (e.g. .txt files work, but .js files don't). Because I am reading different kinds of files as text, the ideal solution should not require the input file to be of a certain type, so is there a better way to read files? This also leads me to wonder whether I should use different read/write functions.
Thanks in advance for all help
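Neither read() nor str.upper() adds backslashes on its own, so one way to narrow this down is to compare what is literally stored in the files with what a debugger or repr() shows. A minimal diagnostic sketch (the slice to 80 characters is only to keep the output short):

import sys

# Print the raw bytes and the decoded text of the input file.
# If the file really contains  My name\'s Bob , the backslash shows up in
# both views; if the doubled \\ only appears in repr()/debugger output,
# it is just display escaping, not data.
with open(sys.argv[1], 'rb') as f:
    raw = f.read()
print(raw[:80])           # raw bytes, no decoding involved

with open(sys.argv[1], 'r') as f:
    text = f.read()
print(repr(text[:80]))    # repr() escapes a real backslash as \\
print(text[:80])          # plain print shows the characters as stored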

Related

Writing Python CSV files - encoding

I open a .csv file and write another .csv file as output.
I specified encoding='utf-8' for both files.
When I read the input file into a dictionary, I have an accented character (ì) which I can see correctly in the variables I use, but the "ì" comes out garbled (mojibake) when I write it to the output file.
I create the output line by concatenating some variables, like this:
output_line = [name, address, citizenship_flag]
citizenship_flag may be "sì" or "no".
In the output file the accented character comes out garbled.
Where am I going wrong?
Thanks.
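When both files are opened with encoding='utf-8' and the variables look right inside Python, the garbling is often introduced by whatever opens the output file (Excel, for instance, tends to assume a legacy code page unless the file starts with a BOM), or the input is not actually UTF-8. A minimal sketch using the csv module and utf-8-sig for the output; the file names and the assumption that the input has a header row are made up for illustration:

import csv

with open('input.csv', encoding='utf-8', newline='') as fin, \
        open('output.csv', 'w', encoding='utf-8-sig', newline='') as fout:
    reader = csv.DictReader(fin)        # assumes a header row in the input
    writer = csv.writer(fout)
    for row in reader:
        output_line = [row['name'], row['address'], row['citizenship_flag']]
        writer.writerow(output_line)    # "sì" is written as proper UTF-8

The utf-8-sig codec prepends a BOM, which is what Excel on Windows looks for to recognise a CSV as UTF-8.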

Adding text at the beginning of multiple txt files in a folder. Problem of overwriting the text inside

I'm trying to add the same text at the beginning of all the txt files that are in a folder.
With this code I can do it, but there is a problem: I don't know why it overwrites part of the text at the beginning of each txt file.
output_dir = "output"
if not os.path.exists(output_dir):
os.makedirs(output_dir)
for f in glob.glob("*.txt"):
with open(f, 'r', encoding="utf8") as inputfile:
with open('%s/%s' % (output_dir, ntpath.basename(f)), 'w', encoding="utf8") as outputfile:
for line in inputfile:
outputfile.write(line.replace(line,"more_text"+line+"text_that_is_overwrited"))
outputfile.seek(0,io.SEEK_SET)
outputfile.write('text_that_overwrite')
outputfile.seek(0, io.SEEK_END)
outputfile.write("more_text")
The content of the txt files that I'm trying to edit starts like this (each line is indented with 4 spaces):
    text_line_1
    text_line_2
The result is:
On file1.txt: text_that_overwriteited
On file1.txt: text_that_overwriterited
Your mental model of how writing a file works seems to be at odds with what's actually happening here.
If you seek back to the beginning of the file, you will start overwriting all of the file. There is no such thing as writing into the middle of a file. A file - at the level of abstraction where you have open and write calls - is just a stream; seeking back to the beginning of the stream (or generally, seeking to a specific position in the stream) and writing replaces everything which was at that place in the stream before.
Granted, there is a lower level where you could actually write new bytes into a block on the disk whilst that block still remains the storage for a file which can then be read as a stream. With most modern file systems, the only way to make this work is to replace that block with exactly the same amount of data, which is very rarely feasible. In other words, you can't replace a block containing 1024 bytes with data which isn't also exactly 1024 bytes. This is so marginally useful that it's simply not an operation which is exposed to the higher level of the file system.
With that out of the way, the proper way to "replace lines" is to not write those lines at all. Instead, write the replacement, followed by whichever lines were in the original file.
It's not clear from your question what exactly you want overwritten, so this is just a sketch with some guesses around that part.
output_dir = "output"
# prefer exist_ok=True over if not os.path.exists()
os.makedirs(output_dir, exist_ok=True)
for f in glob.glob("*.txt"):
# use a single with statement
# prefer os.path.basename over ntpath.basename; use os.path.join
with open(f, 'r', encoding="utf8") as inputfile, \
open(os.path.join(output_dir, os.path.basename(f)), 'w', encoding="utf8") as outputfile:
for idx, line in enumerate(inputfile):
if idx == 0:
outputfile.write("more text")
outputfile.write(line.rstrip('\n'))
outputfile.write("text that is overwritten\n")
continue
# else:
outputfile.write(line)
outputfile.write("more_text\n")
Given an input file like
here is some text
here is some more text
this will create an output file like
more texthere is some texttext that is overwritten
here is some more text
more_text
where the first line is a modified version of the original first line, and a new line is appended after the original file's contents.
I found this elsewhere on Stack Overflow: "Why does my text file keep overwriting the data on it?"
Essentially, opening a file in w mode truncates it, so whatever was there before is discarded and the new writes replace it.
Also, you seem to be writing a sitemap manually. If you are using a web framework like Flask or Django, they have plugin or built-in support for auto-generated sitemaps — you should use that instead. Alternatively, you could create an XML template for the sitemap using Jinja or DTL. Templates are not just for HTML files.
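For the template route, a minimal sketch with Jinja2; the URL list and the inline template text are made up for illustration, and in a real project the template would live in its own .xml file:

from jinja2 import Template

# Hypothetical sitemap template rendered to an XML file.
SITEMAP_TEMPLATE = Template("""\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{% for url in urls %}  <url><loc>{{ url }}</loc></url>
{% endfor %}</urlset>
""")

urls = ["https://example.com/", "https://example.com/about"]
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(SITEMAP_TEMPLATE.render(urls=urls))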

How do I copy .reg file to pure .txt in Python?

I file.readline() a registry file in order to filter some substrings out. I make a copy of it (just to preserve the original) using shutil.copyfile(), process it with foo(), and see nothing filtered out. I tried debugging, and the contents of the lines look very binary:
'˙ţW\x00i\x00n\x00d\x00o\x00w\x00s\x00 \x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00y\x00 \x00E\x00d\x00i\x00t\x00o\x00r\x00 \x00V\x00e\x00r\x00s\x00i\x00o\x00n\x00 \x005\x00.\x000\x000\x00\n'
which in hindsight is rather obvious, but I was not aware of this (Notepad++ presents the text neatly). My question is: how can I filter my strings out?
I see two options: a reg->txt->reg approach (what I meant by the title), or converting these strings to bytes and then comparing them with the contents.
When I create the files by hand (copy and paste the contents of the input file) and give them a .txt extension, everything works fine, but I wish it could be automated.
inputfile = "filename_in.reg"
outputfile = "filename_out.reg"
copyfile(inputfile, output file)
with open(outputfile, 'r+') as fd:
contents = fd.readlines()
for d in data:
foo(fd, d, contents)
Reg files are usually UTF-16 (usually referred to in MS documentation as "Unicode"). It looks like your debug output is treating the data as 8-bit characters (hence all the \x00 high-order bytes of the 16-bit characters). Notepad++ can be persuaded to display UTF-16.
The fix is to tell Python that the text you are reading is in UTF-16 format:
open(outputfile, 'r+', encoding='utf16')
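A minimal sketch of the reg-to-txt idea with the encoding specified; Python's utf-16 codec consumes the BOM that exported .reg files start with, and the list of substrings to filter is a hypothetical placeholder:

# Read the UTF-16 .reg file and write a plain UTF-8 .txt copy,
# dropping lines that contain any of the unwanted substrings.
unwanted = ["SomeKey", "SomeValue"]   # hypothetical substrings to filter out

with open("filename_in.reg", "r", encoding="utf-16") as src, \
        open("filename_out.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if not any(s in line for s in unwanted):
            dst.write(line)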

Find and replace texts in all files from the text file input using Python in Notepad++

I'm using Notepad++ to do find and replace. Currently I have a huge number of text files, and I need to replace a different string in each file. I want to do it in batch. For example:
I have a folder with a huge number of text files. I have another text file that lists the strings to find and their replacements, in order:
Text1 Text1-corrected
Text2 Text2-corrected
I have a small script that does this replacement, but only for the files currently open in Notepad++. To achieve this I'm using a Python script inside Notepad++ (the PythonScript plugin). The code is as follows:
# editor is the active document object provided by the Notepad++ PythonScript plugin
with open('C:/replace.txt') as f:
    for l in f:
        s = l.split()
        editor.replace(s[0], s[1])
In simple words, the find and replace function should fetch the input from a file.
Thanks in advance.
# read the find/replace pairs once
with open('replace.txt') as f:
    replacements = [tuple(line.split()) for line in f]

for filename in filenames:  # filenames: the text files you want to process
    # read the whole file first, then reopen it in 'w' mode to write it back
    with open(filename) as f:
        contents = f.read()
    for old, new in replacements:
        contents = contents.replace(old, new)
    with open(filename, 'w') as f:
        f.write(contents)
Read the replacements into a list of tuples, then go through each file: read its contents into memory, do the replacements, and write the result back. The file is read first and only then reopened in 'w' mode, because 'w' truncates the file as soon as it is opened.
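filenames isn't defined in the snippet above; assuming the files all sit in one folder, it could be built with glob, for example:

import glob
import os

# Hypothetical folder; adjust the path and pattern to the actual layout.
filenames = glob.glob(os.path.join('C:/my_text_files', '*.txt'))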

reading from multiple txt files - strip data and save to xls

I'm very new to Python. So far I have written the code below, which searches for text files in a folder, reads all the lines from each one, opens an Excel file, and saves the read lines in it. (I'm still unsure whether this does it for all the text files one by one.)
Having run this, I only see text data from one file being read and saved into the Excel file (first column). Or it could be that it is overwriting the data from multiple text files into the same column until it finishes.
Could anyone point me in the right direction on how to get it to write the stripped data from each text file to the next available column in Excel?
import os
import glob

list_of_files = glob.glob('./*.txt')
for fileName in list_of_files:
    fin = open(fileName, "r")
    data_list = fin.readlines()
    fin.close()  # closes file
    del data_list[0:17]
    del data_list[1:27]  # [*:*]
    fout = open("stripD.xls", "w")
    fout.writelines(data_list)
    fout.flush()
    fout.close()
This can be condensed to:
import glob

list_of_files = glob.glob('./*.txt')
with open("stripD.xls", "w") as fout:
    for fileName in list_of_files:
        data_list = open(fileName, "r").readlines()
        fout.write(data_list[17])
        fout.writelines(data_list[44:])
Are you aware that writelines() doesn't introduce newlines? readlines() keeps the newlines when reading, so newlines are present in the elements of data_list that writelines() writes to the file, but writelines() itself doesn't add any.
You may like to check this, and for simple needs also the csv module.
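A tiny illustration of the writelines() point (the file names are just for the example):

# writelines() only concatenates the strings it is given; it adds no newlines.
with open("demo.txt", "w") as f:
    f.writelines(["one", "two", "three"])
# demo.txt now contains: onetwothree

# readlines() keeps trailing newlines, so writing those back preserves the lines.
with open("demo2.txt", "w") as f:
    f.writelines(["one\n", "two\n", "three\n"])
# demo2.txt now contains three separate lines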
These lines are "interesting":
del data_list[0:17]
del data_list[1:27] # [*:*]
You are deleting as many of the first 17 lines of your input file as exist, keeping the 18th (if it exists), deleting another 26 (if they exist), and keeping any following lines. This is a very unusual procedure, and is not mentioned at all in your description of what you are trying to do.
Secondly, you are writing the output lines (if any) from each input file to the same output file, which is reopened in "w" mode on every pass through the loop. At the end of the script, the output file will contain data from only the last input file. Don't change your code to use append mode ... opening and closing the same file all the time just to append records is very wasteful, and is only justified if you have a real need to make sure that the data is flushed to disk in case of a power or other failure. Open your output file once, before you start reading files, and close it once when you have finished with all the input files.
Thirdly, any old arbitrary text file doesn't become an "excel file" just because you have named it "something.xls". You should write it with the csv module and name it "something.csv". If you want more control over how Excel will interpret it, write an xls file using xlwt.
Fourthly, you mention "column" several times, but as you have not given any details about how your input lines are to be split into "columns", it is rather difficult to guess what you mean by "next available column". It is even possible to suspect that you are confusing columns and rows ... assuming fewer than 45 lines in each input file, the 18th ROW of the last input file will be all you will see in the output file.
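On the third point, a minimal sketch of writing the stripped lines with the csv module instead of a fake .xls file, one row per input file; the choice of one-row-per-file is just an assumption for illustration (open the result in Excel and transpose it if you really want each file in its own column):

import csv
import glob

rows = []
for fileName in glob.glob('./*.txt'):
    with open(fileName, "r") as fin:
        data_list = fin.readlines()
    # keep the 18th line and everything from the 45th line on, as in the question;
    # this assumes each file has at least 18 lines
    kept = [data_list[17].rstrip('\n')] + [line.rstrip('\n') for line in data_list[44:]]
    rows.append(kept)

with open("stripD.csv", "w", newline="") as fout:
    csv.writer(fout).writerows(rows)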
