I'm trying to write 32 bytes of binary data to a file, but an extra byte is being added.
'wb' mode doesn't seem to accept a newline argument, so I'm not sure what to do here.
str_ = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
with open('test.bin', 'wb') as f:
    f.write(str_)
You MUST view the file in a hex editor to be able to see the extra byte being added.
Hex View of the file from VIM: https://i.imgur.com/0VcjTCT.png
os.system('touch efuse.bin')
with open('efuse.bin', 'wb') as f:
    f.write(generateBinString())
Why are you passing newline=""?
c=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
with open('efuse.bin', 'wb') as f:
    f.write(c.rstrip())
with open('efuse.bin', 'rb') as f:
    print(f.read())
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Line Feeds (0A) and Carriage Returns (0D)
See also: Python file.write creating extra carriage return
hope it helps
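One way to check whether Python itself writes the extra byte, independent of any editor, is to ask the operating system for the file size and read the bytes back. This is a minimal sketch that assumes a scratch file in a temporary directory; the helper name write_and_measure is hypothetical and not from the posts above.

```python
import os
import tempfile


def write_and_measure(payload):
    """Write payload in binary mode; return (size on disk, bytes read back)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.bin")  # scratch file, not the asker's path
        with open(path, "wb") as f:
            f.write(payload)
        size = os.path.getsize(path)  # size reported by the OS, no editor involved
        with open(path, "rb") as f:
            data = f.read()
    return size, data


size, data = write_and_measure(b"\x00" * 32)
print(size, data.endswith(b"\n"))
```

If the reported size matches the payload length, the trailing 0A seen in the hex view was probably introduced by the viewer rather than by f.write. That is a guess about the setup in the question, not something the post confirms.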
I want to write files as UTF-8 with a BOM, but my code only saves them as plain UTF-8. What can I do? I'm rather new to this, so could you please write the answer as a change to the sample code I shared?
import os
import codecs
a=1
filelist=os.listdir("name")
for file in filelist:
    filelen=len(os.listdir("name/"+file))
    if filelen==10:
        with open(file + ".iadx", "w", encoding="UTF-8") as f:
            f.write("<name>")
            f.write("\n")
            f.write('something')
From the Python documentation on codecs (search for "-sig"):
On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file.
So just doing:
with open(file + ".iadx", "w", encoding="utf-8-sig") as f:
#                                             ^^^^
will do the trick.
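To see the effect, the file can be read back in binary mode and its first three bytes compared with codecs.BOM_UTF8. A minimal sketch, assuming a scratch file in a temporary directory; the helper name write_with_bom and the file name sample.iadx are made up for illustration.

```python
import codecs
import os
import tempfile


def write_with_bom(path, text):
    """Write text as UTF-8 with a leading BOM via the utf-8-sig codec."""
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.iadx")  # hypothetical file name
    write_with_bom(path, "<name>\nsomething")
    with open(path, "rb") as f:  # binary mode, so the BOM is not decoded away
        raw = f.read()

print(raw[:3] == codecs.BOM_UTF8)
```

Reading the same file back with encoding="utf-8-sig" strips the BOM again, so the text round-trips unchanged.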
Let's say that I have a txt file that I have to get all in lowercase. I tried this
def lowercase_txt(file):
    file = file.casefold()
    with open(file, encoding = "utf8") as f:
        f.read()
Here I get "'str' object has no attribute 'read'"
then I tried
def lowercase_txt(file):
    with open(poem_filename, encoding="utf8") as f:
        f = f.casefold()
        f.read()
and here '_io.TextIOWrapper' object has no attribute 'casefold'
What can I do?
EDIT: I re-ran this exact code and now there are no errors (I don't know why), but the file doesn't change at all; all the letters stay the way they are.
This will rewrite the file. Warning: if there is some type of error in the middle of processing (power failure, you spill coffee on your computer, etc.) you could lose your file. So, you might want to first make a backup of your file:
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r', encoding = "utf8") as f:
        contents = f.read() # read contents of file
    contents = contents.lower() # convert to lower case
    with open(file_name, 'w', encoding = "utf8") as f: # open for output
        f.write(contents)
For example:
lowercase_txt('/mydirectory/test_file.txt')
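The backup suggested in the warning above could be made with shutil.copy2 before the file is rewritten. This is only a sketch under assumptions: the function name lowercase_txt_with_backup and the ".bak" suffix are hypothetical choices, and the demonstration runs on a scratch file in a temporary directory.

```python
import os
import shutil
import tempfile


def lowercase_txt_with_backup(file_name, backup_suffix=".bak"):
    """Copy file_name to file_name + backup_suffix, then lowercase it in place."""
    backup_name = file_name + backup_suffix  # assumed naming scheme
    shutil.copy2(file_name, backup_name)  # keep the original contents
    with open(file_name, "r", encoding="utf8") as f:
        contents = f.read()
    with open(file_name, "w", encoding="utf8") as f:
        f.write(contents.lower())
    return backup_name


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "test_file.txt")
    with open(demo_path, "w", encoding="utf8") as f:
        f.write("Hello World")
    demo_backup = lowercase_txt_with_backup(demo_path)
    with open(demo_path, encoding="utf8") as f:
        lowered = f.read()
    with open(demo_backup, encoding="utf8") as f:
        original = f.read()

print(lowered, "|", original)
```

If the rewrite fails partway, the untouched copy next to the file can be renamed back by hand.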
Update
The following version opens the file for both reading and writing. After the file is read, the file position is reset to the start of the file before the contents are rewritten. This might be a safer option.
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r+', encoding = "utf8") as f:
        contents = f.read() # read contents of file
        contents = contents.lower() # convert to lower case
        f.seek(0, 0) # position back to start of file
        f.write(contents)
        f.truncate() # in case new encoded content is shorter than older
In my case, I write some content to a file as a bytearray and then try to read back what I have written. The problem is that if I don't call seek, the content read back is empty. My understanding is that by default the file position is at the beginning of the file, which is equivalent to seek(0). Please help me understand this problem. Both scenarios are shown below.
Without seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
#file_handle.seek(0) #Here commenting the seek(0) part
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console
file_handle- b''
With seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
file_handle.seek(0)
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console is
file_handle- b'\x01\x02'
Is seek(0) mandatory here, even though the position defaults to the beginning of the file?
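The file position can be inspected with tell() at each step. The sketch below reuses the two bytes from the question on a scratch file in a temporary directory; the variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file")
    with open(path, "wb+") as fh:
        pos_at_open = fh.tell()  # position right after opening
        fh.write(bytearray([0x1, 0x2]))
        pos_after_write = fh.tell()  # write() advances the position
        read_without_seek = fh.read()  # reads from the current position onward
        fh.seek(0)
        read_after_seek = fh.read()

print(pos_at_open, pos_after_write, read_without_seek, read_after_seek)
```

The position is at the beginning only immediately after open(). Each write() moves it past the bytes just written, so a read() that follows starts at the end of the data and returns nothing until seek(0) moves the position back.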
I have about 4000 txt files in a directory. I'd like to replace newlines with spaces in each file using a for loop. The replacement itself works, but when I save the file, it either doesn't get saved or gets saved with the newlines still in it. Here is my script:
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.write(data)
As I said I'm able to replace the newlines with a space, but at the end, it doesn't get saved. I also don't get any errors.
To further elaborate on my comment ("It's almost always a bad idea to open a file in the 'r+' mode (because of the way the current position is handled). Open a file for reading, read the data, replace the newlines, open the same file for writing, write the data"):
for file in glob.glob(path):
    with open(file) as f:
        data = f.read().replace('\n', ' ')
    with open(file, "w") as f:
        f.write(data)
You need to reset the file position to 0 with seek, and then, after you finish writing the replacement string, cut off the leftover content with truncate.
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.seek(0)
        f.write(data)
        f.truncate()
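Both answers rest on the same observation: after f.read() the position is at the end of the file, so a bare f.write(data) in 'r+' mode appends the new text instead of overwriting the old. The sketch below compares the two loop bodies on a scratch file; the helper names are hypothetical, and newline="" is passed only so the demonstration behaves the same on every platform.

```python
import os
import tempfile


def rewrite_without_seek(path):
    """Body of the loop from the question: read, then write at the current position."""
    with open(path, "r+", newline="") as f:
        data = f.read().replace("\n", " ")
        f.write(data)  # position is at end of file, so this appends


def rewrite_with_seek(path):
    """Body of the loop from the answer: rewind, write, drop the leftover."""
    with open(path, "r+", newline="") as f:
        data = f.read().replace("\n", " ")
        f.seek(0)
        f.write(data)
        f.truncate()


def run(rewrite):
    """Apply rewrite to a fresh two-line scratch file and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", newline="") as f:
            f.write("a\nb\n")
        rewrite(path)
        with open(path, newline="") as f:
            return f.read()


appended = run(rewrite_without_seek)
replaced = run(rewrite_with_seek)
print(repr(appended), repr(replaced))
```

In the first case the original lines are still at the top of the file, which matches the report that the file appears to be saved with its newlines intact.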
I have the following string:
>>> line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
When I type the variable line in the Python terminal, it shows the following:
>>> line
'\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
When I print it, it shows the following:
>>> print line
7 Cardio Metabolic Care 12,788,528.04
In the variable line, the fields are separated by \t, and I want to save them to a CSV file. So I tried the following code:
import csv
with open('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(line.split('\t'))
When I look into the test.csv file, I get only the following:
,,,,,,
Is there any way to get the words into the CSV file? Kindly help.
Your input text is not corrupted, it's encoded - as UTF-16 (Big Endian in this case). And it's CSV itself, just with tab as the delimiter.
You must decode it into a string, after that you can use it normally.
Ideally you declare the proper byte encoding when you read it from a source. For example, when you open a file you can state the encoding the file uses so that the file reader will decode the contents for you.
If you have that byte string from a source where you can't declare an encoding while reading it, you can decode manually:
line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
decoded = line.decode('utf_16_be')
print decoded
# 7 Cardio Metabolic Care 12,788,528.04
But since I suppose that you are actually reading it from a file:
import csv
import codecs
with codecs.open('input.txt', 'r', encoding='utf16') as in_file, codecs.open('output.csv', 'w', encoding='utf8') as out_file:
    reader = csv.reader(in_file, delimiter='\t')
    writer = csv.writer(out_file, delimiter=',', quotechar='"')
    writer.writerows(reader)
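The snippets in this answer are written for Python 2. Under Python 3 the same idea might look like the sketch below, where the built-in open() takes the encoding directly and the byte string is a bytes literal. The function name tsv_utf16_to_csv, the shortened sample bytes, and the scratch file names are assumptions made for illustration; the input file written here starts with a BOM, which the 'utf-16' codec uses to detect byte order.

```python
import csv
import os
import tempfile


def tsv_utf16_to_csv(in_path, out_path):
    """Read a tab-delimited UTF-16 file and write it as comma-delimited UTF-8."""
    with open(in_path, "r", encoding="utf-16", newline="") as in_file, \
         open(out_path, "w", encoding="utf-8", newline="") as out_file:
        reader = csv.reader(in_file, delimiter="\t")
        writer = csv.writer(out_file, delimiter=",", quotechar='"')
        writer.writerows(reader)


# Manual decoding of a big-endian byte string without a BOM (shortened sample).
raw = b"\x00\t\x007\x00\t\x00C\x00a\x00r\x00e\x00\r\x00\n"
decoded = raw.decode("utf_16_be")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "output.csv")
    # Build a small UTF-16 input file; this codec writes a BOM first.
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("\t7\tCardio Metabolic Care\t12,788,528.04\r\n")
    tsv_utf16_to_csv(src, dst)
    with open(dst, encoding="utf-8", newline="") as f:
        csv_text = f.read()

print(repr(decoded))
print(repr(csv_text))
```

The amount field contains commas, so the csv writer quotes it in the output rather than splitting it into several columns.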