I'm trying to take the text output of a query to an SSD (pulling a log page, similar to pulling SMART data). I'm then trying to write this text data out to a log file that I update periodically.
My problem happens when the log data for some drives contains a pair of double quotes ("") as a placeholder for a blank field. Here is a snippet of the input:
VER 0x10200
VID 0x15b7
BoardRev 0x0
BootLoadRev ""
When this gets written out (appended) to my own log file, the text gets replaced with several null characters, and when I then try to open the file, all the text editors tell me it's corrupted.
The "" characters are replaced by something like this on my Linux system:
BootLoadRev "\00\00\00\00"
Some fields contain even longer runs of \00 characters. If the "" is not there, everything writes out OK.
The code is similar to this:
f=open(fileName, 'w')
test_bench.send_command('get_log_page')
identify_data = test_bench.get_data_in()
f.write(identify_data)
f.close()
Is there a way to send this text to a file w/o these nulls causing problems?
Assuming that this is Python 2 (and that your content is thus what Python 3 would call a bytestring), and that your intended data format is raw ASCII, the trivial solution is simply to remove the NULs from your content before you write to disk:
f.write(identify_data.replace('\0', ''))
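For completeness, a minimal sketch of the whole write path with the cleanup applied (test_bench and fileName are the question's own names, assumed to exist; 'a' is used here since the question says the log is appended to):
test_bench.send_command('get_log_page')
identify_data = test_bench.get_data_in()

# Strip embedded NUL bytes before they reach the log file.
cleaned = identify_data.replace('\0', '')

with open(fileName, 'a') as f:  # append to the running log
    f.write(cleaned)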
This is my code for creating the string to be written ('result' is the variable that holds the final text):
def writing(initial, tag, name, desc):
    fileobj = open('file_name.yml', 'a+')
    begin = initial + ":0 "
    n_name = '"§' + tag + name + '§!"'
    begin_d = initial + "_desc:0 "
    n_desc = '"§3' + desc + '§!"'
    title = ' ' + begin + n_name
    descript = ' ' + begin_d + n_desc
    result = title + '\n' + descript
    fileobj.close()
    return result
This is my code for actually writing it into the file:
text = writing(initial, tag, name, desc)
override = inserter(fileobj, country, text)
fileobj.close()
fileobj = open('file_name.yml','w+')
fileobj.write(override)
fileobj.close()
(P.S.: inserter is a function that works perfectly; it returns a longer string, override, to be written into the file.)
I have tried this with .txt and .yml files, but in both cases, instead of §, this is what takes its place: \xA7. (I cannot copy the actual text onto the internet, as it changes back into the correct character; in the file, however, it appears as \xA7.) Everything else is unaffected, and the code runs fine.
Do let me know if I can improve the question in any way.
You're running into a problem called character encoding. There are two parts to the problem: the first is getting the encoding you want into the file, and the second is getting whatever reads the file to use that same encoding.
The most flexible and common encoding is UTF-8, because it can handle any Unicode character while remaining backwards compatible with the very old 7-bit ASCII character set. Most Unix-like systems like Linux will handle it automatically.
fileobj = open('file_name.yml','w+',encoding='utf-8')
Note that the PYTHONIOENCODING environment variable only changes the default encoding of Python's standard streams; the default for open() comes from your locale, so it's safest to pass encoding= explicitly as above.
Windows operating systems are a little trickier because they'll rarely assume UTF-8, especially if it's a Microsoft program opening the file. There's a magic byte sequence called a BOM (byte order mark) that will get Microsoft software to use UTF-8 when it appears at the beginning of a file. Python can add that automatically for you:
fileobj = open('file_name.yml','w+',encoding='utf_8_sig')
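As a quick sanity check, here is a minimal sketch of the round trip with an explicit encoding (the tag/name values are placeholders shaped like the question's data):
text = '"§' + 'TAG' + 'name' + '§!"'  # placeholder values

with open('file_name.yml', 'w', encoding='utf-8') as fileobj:
    fileobj.write(text)

with open('file_name.yml', encoding='utf-8') as fileobj:
    print(fileobj.read())  # the § character survives the round trip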
I'm trying to use python-gitlab projects.files.create to upload string content to GitLab.
The string contains '\n', which I'd like to become real newline characters in the GitLab file, but it just writes '\n' as literal text, so after uploading, the file contains only one line.
I'm not sure how or at what point I should fix this; I'd like the file content to look as it does when I print the string with print() in Python.
Thanks for your help.
EDIT:
Sorry, I'm using Python 3.7, and the string is actually CSV content, so it's basically like:
',col1,col2\n1,data1,data2\n'
So when I upload it to the GitLab file I want it to be:
,col1,col2
1,data1,data2
I figured it out by saving the string to a file and reading it back; that way the \n in the string gets translated into an actual newline character.
I'm not sure if there's another way of doing this, but I'm posting it for anyone who encounters a similar situation.
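For anyone who wants to skip the temporary file: if the string really contains a literal backslash followed by n (which is what the symptom suggests), you can translate the escapes in memory before uploading. A minimal sketch; the URL, token, project ID, branch, and file path are all placeholders:
import gitlab

raw = ',col1,col2\\n1,data1,data2\\n'  # literal backslash-n, as received
content = raw.replace('\\n', '\n')     # turn the escapes into real newlines

gl = gitlab.Gitlab('https://gitlab.example.com', private_token='TOKEN')
project = gl.projects.get(123)  # hypothetical project ID
project.files.create({
    'file_path': 'data.csv',
    'branch': 'main',
    'content': content,
    'commit_message': 'Upload CSV content',
})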
When I copy and paste the sentence How brave they’ll all think me at home! into a blank TextEdit RTF document on the Mac, it looks fine. But if I create an apparently identical RTF file programmatically and write the same sentence into it, on opening in TextEdit it appears as How brave theyâ€™ll all think me at home! In the following code, output prints OK, but the file, when viewed in TextEdit, has problems with the right single quotation mark (here used as an apostrophe), Unicode U+2019.
header = r"""{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf400
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\paperw11900\paperh16840\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 """
sen = 'How brave they’ll all think me at home!'
with open('staging.rtf', 'w+') as f:
    f.write(header)
    f.write(sen)
    f.write('}')

with open('staging.rtf') as f:
    output = f.read()

print(output)
I’ve discovered from https://www.i18nqa.com/debug/utf8-debug.html that this may be caused by “UTF-8 bytes being interpreted as Windows-1252”, and that makes sense as it seems that ansicpg1252 in the header indicates US Windows.
But I still can’t work out how to fix it, even having read the similar issue here: Encoding of rtf file. I’ve tried replacing ansi with mac without effect. And adding ,encoding='utf8' to the open function doesn’t seem to help either.
(The reason for using rtf by the way is to be able to export sentences with colour-coded words, allow them to be manually edited, then read back in for further processing).
OK, I've found the answer myself: I needed to pass encoding='windows-1252' to open() both when writing the RTF file and when reading it back.
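In other words, a minimal version of the working round trip (header and sen as defined in the question):
# Write and read with the encoding the RTF header (ansicpg1252) declares.
with open('staging.rtf', 'w+', encoding='windows-1252') as f:
    f.write(header)
    f.write(sen)
    f.write('}')

with open('staging.rtf', encoding='windows-1252') as f:
    output = f.read()

print(output)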
We receive a .tar.gz file from a client every day and I am rewriting our import process using SSIS. One of the first steps in my process is to unzip the .tar.gz file which I achieve via a Python script.
After unzipping we are left with a number of CSV files which I then import into SQL Server. As an aside, I am loading using the CozyRoc DataFlow Task Plus.
Most of my CSV files load without issue, but I have five files that fail. Reading the log, I can see that the process reads the header and the first line as though there were no header-row delimiter (i.e. it tries to import the column header as ColumnHeader1ColumnValue1).
I took one of these CSVs, copied the top 5 rows into Excel, used Text-To-Columns to delimit the data then saved that as a new CSV file.
This version imported successfully.
That makes me think that somehow the original CSV isn't using {CR}{LF} as the row delimiter but I don't know how to check. Any suggestions?
I ended up using the suggestion @vahdet made in the comments, because I already had Notepad++ installed. I can't find the same option in EmEditor, but it may exist.
For those who are curious, the files are using {LF} which is consistent with the other files. My investigation continues...
Seeing that you have EmEditor, you can use it to find the end-of-line character in two ways:
Use View > Character Code Value... at the end of a line to display a dialog box showing information about the character at the current position.
Go to View > Marks and turn on Newline Characters and CR and LF with Different Marks to show the EOL characters while editing; LF is displayed as a down arrow, while CRLF is a right angle.
Some other things you could check for are the file encoding, the wrong type of data in a field, and an inconsistent number of columns.
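Since the unzip step is already a Python script, you could also check the row delimiters from Python. A minimal sketch that counts the line-ending styles in a file (the path is a placeholder):
with open('suspect.csv', 'rb') as f:  # binary mode preserves \r\n
    data = f.read()

crlf = data.count(b'\r\n')
lf = data.count(b'\n') - crlf  # bare LFs not part of a CRLF
cr = data.count(b'\r') - crlf  # bare CRs (classic Mac style)
print('CRLF: %d, LF: %d, CR: %d' % (crlf, lf, cr))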
I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th double-quoted section comes from whatever the user typed into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are areas in the chat log where the text the user typed ends up on a new line in the log file instead of on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in Notepad++ and I am able to find each "orphaned" line, but I was unable to make it join the line above it.
Then I thought of writing a Python script to automate it for me, but I'm kind of stuck on how to actually code it.
Python errors out on this line when running unutbu's code:
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""
The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
import csv

with open('data.log', 'r') as fin:
    with open('data2.log', 'w') as fout:
        # The log uses ^ to escape embedded quotes, so tell both the
        # reader and the writer about it.
        reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
        writer = csv.writer(fout, delimiter=',', escapechar='^',
                            doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            # Fold any newline embedded in the chat-text field back into one line.
            row[6] = row[6].replace('\n', ' ')
            writer.writerow(row)
If your data is valid CSV, you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly depending on what an embedded double-quote looks like coming from the source system. See: http://docs.python.org/library/csv.html#module-contents.
Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double-quote characters.
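For example, a minimal sketch of that approach, tracking whether the current character falls inside a quoted field (note this ignores the ^ escape character from the log, so the csv-based version above is more robust):
def join_quoted_lines(text):
    # Replace newlines that occur inside double quotes with a space.
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == '\n' and in_quotes:
            out.append(' ')
        else:
            out.append(ch)
    return ''.join(out)

with open('data.log') as fin:
    fixed = join_quoted_lines(fin.read())

with open('data2.log', 'w') as fout:
    fout.write(fixed)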