How Can I Remove Skipped Lines from Pastebin Output?

I am trying to use Pastebin to host two text files for me to allow any copy of my script to update itself through the internet. My code is working, but the resultant .py file has a blank line added between each line. Here is my script...
import os, inspect, urllib2
runningVersion = "1.00.0v"
versionUrl = "http://pastebin.com/raw.php?i=3JqJtUiX"
codeUrl = "http://pastebin.com/raw.php?i=GWqAQ0Xj"
scriptFilePath = (os.path.abspath(inspect.getfile(inspect.currentframe()))).replace("\\", "/")
def checkUpdate(silent=1):
    # silently attempt to update the script file by default, post messages if silent==0
    # never update if "No_Update.txt" exists in the same folder
    if os.path.exists(os.path.dirname(scriptFilePath)+"/No_Update.txt"):
        return
    try:
        versionData = urllib2.urlopen(versionUrl)
    except urllib2.URLError:
        if silent==0:
            print "Connection failed"
        return
    currentVersion = versionData.read()
    if runningVersion!=currentVersion:
        if silent==0:
            print "There has been an update.\nWould you like to download it?"
        try:
            codeData = urllib2.urlopen(codeUrl)
        except urllib2.URLError:
            if silent==0:
                print "Connection failed"
            return
        currentCode = codeData.read()
        with open(scriptFilePath.replace(".py","_UPDATED.py"), mode="w") as scriptFile:
            scriptFile.write(currentCode)
        if silent==0:
            print "Your program has been updated.\nChanges will take effect after you restart"
    elif silent==0:
        print "Your program is up to date"
checkUpdate()
I stripped the GUI (wxpython) and set the script to update another file instead of the actual running one. The "No_Update" bit is for convenience while working.
I noticed that opening the resultant file with Notepad does not show the skipped lines, opening with Wordpad gives a jumbled mess, and opening with Idle shows the skipped lines. Based on that, this seems to be a formatting problem even though the "raw" Pastebin file does not appear to have any formatting.
EDIT: I could just strip all blank lines or leave it as is without any problems, (that I've noticed) but that would greatly reduce readability.

Try adding the binary qualifier in your open():
with open(scriptFilePath.replace(".py","_UPDATED.py"), mode="wb") as scriptFile:
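I notice that your file on Pastebin is in DOS format, so its lines end in \r\n. When you call scriptFile.write() on a file opened in text mode on Windows, each \n is translated to \r\n, so every \r\n becomes \r\r\n. Editors handle the stray \r differently, which would explain why Notepad, Wordpad, and Idle each show the file differently.
Specifying "b" in open() makes scriptFile skip that translation and write the file in DOS format, exactly as it was downloaded.
Alternatively, you could ensure that the Pastebin file contains only \n line endings and keep mode="w" in your script.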
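The translation described above can be reproduced on any platform with a small sketch. This is written for Python 3 (the question uses Python 2), and it assumes that passing newline="\r\n" to open() is an acceptable stand-in for the default text-mode behaviour on Windows. The payload and the helper name write_and_read_back are made up for illustration.

```python
import os
import tempfile

# DOS-format bytes, standing in for what the download returns.
payload = b"line one\r\nline two\r\n"

def write_and_read_back(data, binary):
    """Write data to a temporary file and return the raw bytes found on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if binary:
            # Binary mode: bytes are written unchanged.
            with open(path, "wb") as f:
                f.write(data)
        else:
            # newline="\r\n" forces the Windows-style translation of "\n"
            # on every platform, purely for demonstration.
            with open(path, "w", newline="\r\n") as f:
                f.write(data.decode("ascii"))
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

text_mode_bytes = write_and_read_back(payload, binary=False)
binary_mode_bytes = write_and_read_back(payload, binary=True)
print(repr(text_mode_bytes))    # each "\n" gained an extra "\r" in front
print(repr(binary_mode_bytes))  # identical to the payload
```

The doubled \r in the text-mode result is what some editors render as an extra blank line.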

Related

Why does '\x01\x1A' (Start-of-Header and Substitute control characters) in a textfile line stop a for-loop prematurely?

I'm using Python 2.7.15, Windows 7
Context
I wrote a script to read and tokenize each line of a FileZilla log file (specifications here) for the IP address of the host that initiated the connection to the FileZilla server. I'm having trouble parsing the log text field that follows the > character. The script I wrote uses the:
with open('fz.log','r') as rh:
    for lineno, line in enumerate(rh):
        pass
construct to read each line. That for-loop stopped prematurely when it encountered a log text field that contained the SOH and SUB characters. I can't show you the log file since it contains sensitive information but the crux of the problem can be reproduced by reading a textfile that contains those characters on a line.
My goal is to extract the IP addresses (which I can do using re.search()) but before that happens, I have to remove those control characters. I do this by creating a copy of the log file where the lines containing those control characters are removed. There's probably a better way, but I'm more curious why the for-loop just stops after encountering the control characters.
Reproducing the Issue
I reproduced the problem with this code:
if __name__ == '__main__':
    fn = 'writetest.txt'
    fn2 = 'writetest_NoControlChars.txt'
    # Create the problematic textfile
    with open(fn, 'w') as wh:
        wh.write("This line comes first!\n")
        wh.write("Blah\x01\x1A\n") # Write Start-of-Header and Substitute control characters to line
        wh.write("This comes after!")
    # Try to read the file above, removing the SOH/SUB characters if encountered
    with open(fn, 'r') as rh:
        with open(fn2, 'w') as wh:
            for lineno, line in enumerate(rh):
                sline = line.translate(None,'\x01\x1A')
                wh.write(sline)
                print "Line #{}: {}".format(lineno, sline)
    print "Program executed."
Output
The code above creates 2 output files and produces the following in a console window:
Line #0: This line comes first!
Line #1: Blah
Program executed.
I step-debugged through the code in Eclipse and immediately after executing the
for lineno, line in enumerate(rh):
statement, rh (the handle for the opened file) was closed. I had expected it to move on to the third line, printing This comes after! to the console and writing it to writetest_NoControlChars.txt, but neither happened. Instead, execution jumped to print "Program executed.".
[Image: local variable values in the debug console]

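You have to open this file in binary mode if you know it contains non-text data: open(fn, 'rb'). On Windows, a file opened in text mode treats the SUB character (\x1A, Ctrl-Z) as an end-of-file marker, so reading stops at that byte; binary mode reads every byte as-is.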
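As a rough sketch of the binary-mode approach, here is the reproduction from the question reworked for Python 3 (the question uses Python 2.7, so this is an adaptation, not the asker's code). It uses a temporary file instead of writetest.txt and keeps the cleaned lines in a list rather than writing a second file.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Create the problematic file, writing raw bytes.
    with open(path, "wb") as wh:
        wh.write(b"This line comes first!\n")
        wh.write(b"Blah\x01\x1A\n")  # SOH and SUB control characters
        wh.write(b"This comes after!")

    # Read it back in binary mode and delete the two control bytes.
    with open(path, "rb") as rh:
        cleaned = [line.translate(None, b"\x01\x1A") for line in rh]
finally:
    os.remove(path)

for lineno, line in enumerate(cleaned):
    print("Line #{}: {!r}".format(lineno, line))
```

All three lines are read, including the one after the SUB character.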

Tailing a log file

I want to add a log viewer tab to my website. The tab is supposed to print the whole log file and, after that, print only new lines (like the tail -F command in Linux). The client side is in HTML and JavaScript, and the server side is in Python.
Here is my tail Python function (I found it on the web):
#cherrypy.expose
def tail(self):
    filename = '/opt/abc/logs/myLogFile.log'
    f = subprocess.Popen(['tail','-F',filename],\
            stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    p = select.poll()
    p.register(f.stdout)
    while True:
        if p.poll(1):
            print f.stdout.readline()
        time.sleep(1)
This code does print the whole log file. However, each time I add new lines to the file, the whole file is printed again from the beginning instead of only the new lines.
Any suggestions on how to fix it? I'm pretty new to Python, so I would appreciate any kind of help.

Check out pytailer
https://github.com/six8/pytailer
Specifically the follow command:
# Follow the file as it grows
for line in tailer.follow(open('/opt/abc/logs/myLogFile.log')):
    print line
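If adding a dependency is not an option, the same idea can be sketched with the standard library alone. This is only a minimal illustration in Python 3, not pytailer's implementation: the function follow() below is written for this example, and max_idle_polls is a hypothetical parameter added so that the demo terminates. Unlike tail -F, it does not handle log rotation, and it may yield a partial line if the writer has not finished it yet.

```python
import os
import tempfile
import time

def follow(path, poll_interval=1.0, max_idle_polls=None):
    """Yield the existing lines of path, then new lines as they are appended.

    max_idle_polls (hypothetical, for demos and tests): stop after this many
    consecutive polls that found nothing new. None means follow forever.
    """
    idle = 0
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if line:
                idle = 0
                yield line
            else:
                # Nothing new yet: the file position stays at the end,
                # so the next readline() only sees appended data.
                idle += 1
                if max_idle_polls is not None and idle >= max_idle_polls:
                    return
                time.sleep(poll_interval)

# Demo on a temporary file standing in for the real log.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "w") as f:
    f.write("first\nsecond\n")
demo_lines = list(follow(demo_path, poll_interval=0, max_idle_polls=1))
os.remove(demo_path)
print(demo_lines)
```

Because the file stays open and the read position is never reset, earlier lines are not printed again when the file grows.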

intermittent Bad File Descriptor error

I have a script to read messages on a mail server and save them in specific folders based on the content of the message bodies. Intermittently, usually about once or twice a day, it fails while executing this part of the code:
if not os.path.isfile(att_path) :
    # finally write the stuff
    fp = open(att_path, 'wb')
    fp.write(part.get_payload(decode=True))
    fp.close()
    ext = att_path.split(".")[-1]
    print "att_path",att_path
    f = open(att_path.replace("."+ext,".txt"),'wb')
    f.write(headers)
    f.write("\n\n\n")
    f.write(body)
    f.close()
    filelist.append(vdir+"/"+filename)
    messageReceived = True
else:
    noErrors = False
    errFiles.append(vdir+"/"+filename)
It saves the actual attachment in the expected directory, but not the subsequent text file with the headers and body information. Because an exception is thrown ("[Errno 9] Bad file descriptor"), the email is not marked for deletion and stays on the server until the saved attachment is either deleted or moved, at which point both files will be saved without any errors.
I'm stumped as to what could be causing it, since it processes several hundred emails every day without any problems, except for this intermittent issue.

I encountered an intermittent bad file descriptor error in a script run with pywin32 (running Python as a Windows service). A nearly identical script (without the pywin32 boilerplate) runs without issues from cmd. The tracebacks pointed to various print statements, so I commented out all the print statements, and it worked!
Please correct me if I'm wrong, but I suspect this has to do with the lack of stdout when running as a service. I used to debug with print statements but switched to the logging module after this.
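For what it's worth, the switch from print to logging can look roughly like the sketch below (Python 3). The logger name, log location, and attachment path are placeholders invented for this example; the point is only that the message goes to a file handler and never touches stdout.

```python
import logging
import os
import tempfile

# Placeholder log location; a real service would use a fixed path.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "mail_saver.log")

logger = logging.getLogger("mail_saver")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

att_path = "/tmp/example/attachment.pdf"  # placeholder value
# Instead of: print "att_path", att_path
logger.info("att_path %s", att_path)

# Detach and close the handler so the file is flushed before reading it back.
logger.removeHandler(handler)
handler.close()

with open(log_path) as f:
    log_text = f.read()
print(log_text)
```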

Local Blast empty xml file python

I am trying to write a small script to automate a local BLAST alignment.
I ran the commands in the terminal and they work perfectly. However, when I try to automate this, I get a message like: Empty XML file.
Do we have to add a "system" waiting time to let the file be written, or did I do something wrong?
The code:
# sequence identifier as key, sequence as value.
for element in dictionnaryOfSequence:
    # I make a little temporary fasta file because the blast command needs a fasta file as input.
    out_fasta = open("tmp.fasta", 'w')
    query = ">" + element + "\n" + str(dictionnaryOfSequence[element])
    out_fasta.write(query) # And I have this file with my sequence correctly filled
    out_fasta.close() # EDIT : It was out of my loop....
    # Now the blast command, which works well in the terminal, I have my tmp.xml file well filled.
    os.system("blastn -db reads.fasta -query tmp.fasta -out tmp.xml -outfmt 5 -max_target_seqs 5000")
    # Parsing of the xml file.
    handle = open("tmp.xml", 'r')
    blast_records = NCBIXML.read(handle)
    print blast_records
I get an error: Your XML file was empty, and the blast_records object doesn't exist.
Did I do something wrong with the file handles?
Any advice is welcome. Thanks a lot for your ideas and help.
EDIT: Problem solved, sorry for the useless question. I handled the file handle incorrectly: I did not open the file in the right place, and the same goes for closing it.
Sorry.

Try opening the file "tmp.xml" in Internet Explorer. Are all the tags closed?
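Regarding the handle problem mentioned in the asker's EDIT, one way to make sure the query file is closed before blastn reads it is a with block inside the loop. The following is only a sketch in Python 3: the sample sequences and the helper write_query_fasta are invented for illustration, and the blastn call is built but not executed so that the example runs without BLAST installed.

```python
import os
import tempfile

def write_query_fasta(path, seq_id, sequence):
    """Write one FASTA record; the with block closes the file before returning."""
    with open(path, "w") as out_fasta:
        out_fasta.write(">{}\n{}\n".format(seq_id, sequence))

# Hypothetical sample data standing in for the asker's dictionary.
sequences = {"read_1": "ACGTACGT", "read_2": "TTGACCA"}

work_dir = tempfile.mkdtemp()
fasta_path = os.path.join(work_dir, "tmp.fasta")
xml_path = os.path.join(work_dir, "tmp.xml")

commands = []
for seq_id, sequence in sequences.items():
    write_query_fasta(fasta_path, seq_id, sequence)
    # Same options as the os.system() call in the question.
    cmd = ["blastn", "-db", "reads.fasta", "-query", fasta_path,
           "-out", xml_path, "-outfmt", "5", "-max_target_seqs", "5000"]
    commands.append(cmd)
    # With BLAST installed, subprocess.check_call(cmd) could be called here.
    # It returns only after blastn has exited, so the XML file is complete
    # before it is opened for parsing.

with open(fasta_path) as f:
    last_fasta = f.read()
print(last_fasta)
```

Since both the write and the external command finish before the next step starts, no waiting time is needed.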

Python not splitting CRLF correctly

I'm writing a script to convert very simple function documentation to XML in python. The format I'm using would convert:
date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.
to:
<item name="date_time_of">
<arg>(date)</arg>
<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>
</item>
So far it works great (the XML I posted above was generated from the program) but the problem is that it should be working with several lines of documentation pasted, but it only works for the first line pasted into the application. I checked the pasted documentation in Notepad++ and the lines did indeed have CRLF at the end, so what is my problem?
Here is my code:
mainText = input("Enter your text to convert:\r\n")
try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
Any idea of what the issue is here?
Thanks.

input() only reads one line.
Try this. Enter a blank line to stop collecting lines.
lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)

The best way to handle reading lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do this, your code would look something like this:
from sys import stdin
try:
    for line in stdin:
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
That said, it's worth noting that your parsing code could be significantly simplified with a little help from regular expressions. Here's an example:
import re, sys
for line in sys.stdin:
    # group 1: function name, group 2: argument list, group 3: help text
    result = re.match(r"(.*?)\((.*?)\)(.*)", line)
    if result:
        name = result.group(1)
        arg = result.group(2)
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
    else:
        print("There was an error parsing this line: '%s'" % line)
I hope this helps you simplify your code.

Patrick Moriarty,
It seems to me that you didn't specifically mention the console, and that your main concern is to pass several lines at once to be processed. I found only one way to reproduce your problem: running the program in IDLE, manually copying several lines from a file, and pasting them into raw_input().
Trying to understand your problem led me to the following two facts:
When data is copied from a file and pasted into raw_input(), the \r\n newlines are transformed into \n, so the string returned by raw_input() no longer contains \r\n. Hence split('\r\n') cannot split this string.
When data containing isolated \r and \n characters is pasted into a Notepad++ window with the display of special characters enabled, CR LF symbols appear at the end of every line, even where there is only a lone \r or \n. Hence, using Notepad++ to verify the nature of the newlines leads to an erroneous conclusion.
The first fact is the cause of your problem. I don't know the underlying reason for this transformation of data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow:
Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()
The second fact is responsible for your confusion. Bad luck.
So, what can be done to solve your problem?
Here is code that reproduces the problem. Note the modified algorithm in it, which replaces the repeated splits you applied to each line.
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, copy-paste of file 'funcdoc.txt' ' s content has been performed"
print "\nrepr(mainText)==",repr(mainText)
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
Here's the solution mentioned by delnan: "read from the source instead of having a human copy and paste it."
It works with your split('\r\n'):
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
#####################################
with open('funcdoc.txt','rb') as f:
    mainText = f.read()
print "\nfile 'funcdoc.txt' has just been opened and its content copied and put to mainText"
print "\nrepr(mainText)==",repr(mainText)
print
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
And finally, here is Python's own solution for processing the altered pasted copy: the splitlines() method, which treats all kinds of newlines (\r, \n, or \r\n) as separators. So replace
for line in mainText.split('\r\n'):
by
for line in mainText.splitlines():
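To illustrate the difference, here is a small Python 3 sketch (the code above is Python 2). The two sample strings and the helper to_items are made up for the example: pasted imitates text whose \r\n was reduced to \n, and from_file imitates the same text read directly from a DOS-format file.

```python
pasted = ("date_time_of(date) Returns the time part.\n"
          "divmod(a, b) Returns quotient and remainder.")
from_file = ("date_time_of(date) Returns the time part.\r\n"
             "divmod(a, b) Returns quotient and remainder.")

def to_items(text):
    """Parse each line into a (name, arg, help) tuple using partition()."""
    items = []
    for line in text.splitlines():
        name, _, arghelp = line.partition("(")
        arg, _, hlp = arghelp.partition(") ")
        items.append((name, arg, hlp))
    return items

# split('\r\n') finds no separator in the pasted text: a single chunk remains.
chunks = pasted.split("\r\n")
print(len(chunks))

# splitlines() gives the same result for both kinds of line ending.
print(to_items(pasted) == to_items(from_file))
print("A\rB\nC".splitlines())
```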
