Python runs regex on variable but not on file with same content

I am writing a Python 2.7 script that uses regex to parse some logs from a Java application.
I used http://pythex.org/ to help test the patterns, and they work there just fine with a reduced log sample.
When I do the same in my script, it works if I put part of the log in a variable, but it won't work if I point it to a file.
Here is the code:
import re
regex_sql_java_error = "\[use.(.*?)\]\nThread:.{9}(GENERAL|LOADER).{17}(ERROR(.*?)\n)"
logfile = open('example_files/Log_file.txt', 'r')
data = logfile.read()
logfile.close()
filtered = re.finditer(regex_sql_java_error, data, re.DOTALL | re.MULTILINE)
if filtered:
    for item in filtered:
        print item.group(0)
The log file I am using is a measly 1 MB file.
I can't imagine the pattern being the issue, but here's a sample of the log file that matched just fine on pythex.org:
Thread: 5624 LOADER 08:26:37.078 INFO executeDdlStatements:
[use ADMINI;, SOME BROKEN SQL HERE;]
Thread: 5624 LOADER 08:26:37.086 ERROR 'executeDdlStatements' command failed with the error: Table 'ADMININTT' doesn't exist
RANDOM JAVA STUFF
Link to it on pythex http://goo.gl/mZSx4z
I've been bashing my head against this for a couple of days and have read a bunch of docs, but I can't figure out what I am doing wrong.
Hopefully it's something really stupid that I'll be able to laugh about later.
Anyway, if anybody can point me in the right direction, I'd really appreciate it.

This was dumb and quick to fix, and as I thought, I can laugh about it now.
The log files came from Windows, so their lines end in \r\n rather than \n: replace \n with \r\n everywhere in the pattern and be happy!
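A minimal sketch of the fix, assuming Python 3: instead of hard-coding \r\n, each \n in the question's pattern is widened to \r?\n, so the same pattern matches both Unix and Windows logs. The sample lines are hypothetical, with the column padding chosen so that .{9} and .{17} line up. Alternatively, reading the file in Python 3's default text mode (or with 'rU' in Python 2.7) translates \r\n to \n before the regex ever sees it.

```python
import re

# The question's pattern with every "\n" widened to "\r?\n", so it matches
# logs with Unix (\n) or Windows (\r\n) line endings.
REGEX_SQL_JAVA_ERROR = re.compile(
    r"\[use.(.*?)\]\r?\nThread:.{9}(GENERAL|LOADER).{17}(ERROR(.*?)\r?\n)",
    re.DOTALL | re.MULTILINE,
)


def find_sql_errors(text):
    """Return the full text of every SQL error block found in text."""
    return [m.group(0) for m in REGEX_SQL_JAVA_ERROR.finditer(text)]


# Hypothetical sample lines; padding is chosen so ".{9}" and ".{17}" fit.
SAMPLE_LINES = [
    "[use ADMINI;, SOME BROKEN SQL HERE;]",
    "Thread: 5624    LOADER 08:26:37.086    ERROR 'executeDdlStatements' command failed",
    "RANDOM JAVA STUFF",
]
unix_log = "\n".join(SAMPLE_LINES) + "\n"
windows_log = "\r\n".join(SAMPLE_LINES) + "\r\n"

print(len(find_sql_errors(unix_log)), len(find_sql_errors(windows_log)))  # prints: 1 1
```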

Related

How to use "dump_stats" properly to save the sorted data from "cProfile" into a "txt" file?

As the title indicates, I have trouble retrieving this information from dump_stats properly. Without further ado, here is my simple code.
Code
import cProfile
import pstats

def fun_to_profile():
    pass  # ... code to be profiled ...

profiler = cProfile.Profile()
profiler.runcall(fun_to_profile)
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats()
stats.dump_stats("output.txt")
This is the simplest code I could find, and I have really read the documentation multiple times.
Problem
My problem is that when I open the file "output.txt", it is either empty or full of unreadable characters. Do I need to give the file a specific extension, or is the issue with my compiler?
Thanks in advance.
Apparently working with cProfile is easy and straightforward. I figured out the solution to the problem.
First of all, dump_stats writes binary (marshal) data rather than text, so an extension such as "file.dat" is more appropriate than ".txt". Then we need to read that file back and write it out in a human-readable format, such as text.txt.
For that we need the following piece of code:
import cProfile
import pstats

# cProfile.run() executes the statement in the __main__ namespace,
# so fun_to_profile must be defined there; note the call parentheses.
cProfile.run("fun_to_profile()", "Out_put_profile.dat")  # run and save the binary stats
with open("Profile_time.txt", "w") as f:
    p = pstats.Stats("Out_put_profile.dat", stream=f)
    p.sort_stats("time").print_stats()  # sort the report by time spent inside each function
And just like that, we have more material for analyzing the code, in a human-readable format. Thanks to IDG TECHtalk for sharing the solution.
Link to the youtube video: https://youtu.be/dmnA3axZ3FY.
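A self-contained sketch of the same approach, assuming Python 3: it profiles with cProfile.Profile.runcall, saves the binary stats with dump_stats to a temporary .dat file, and then renders them as text through pstats.Stats(..., stream=...). fun_to_profile is a hypothetical stand-in workload, and profile_to_text is an illustrative helper name, not part of either module.

```python
import cProfile
import io
import os
import pstats
import tempfile


def fun_to_profile():
    # Hypothetical workload standing in for the code you want to profile.
    return sum(i * i for i in range(10000))


def profile_to_text(func, sort_key="cumulative"):
    """Profile func() and return the pstats report as a readable string."""
    profiler = cProfile.Profile()
    profiler.runcall(func)

    fd, dat_path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        # dump_stats() writes binary (marshal) data, which is why a .txt
        # file produced this way looks like garbage in a text editor.
        profiler.dump_stats(dat_path)

        # Load the binary stats back and print a text report into a buffer.
        buffer = io.StringIO()
        stats = pstats.Stats(dat_path, stream=buffer)
        stats.sort_stats(sort_key).print_stats()
        return buffer.getvalue()
    finally:
        os.remove(dat_path)


report = profile_to_text(fun_to_profile)
print(report)
```

Writing report to a file opened with open("Profile_time.txt", "w") gives the human-readable text file the question was after.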

Python subprocess.call with multiline string EOF

I've hit an issue that I don't really understand how to overcome. I'm trying to create a subprocess in Python to run another Python script. Not too difficult. The issue I'm unable to get around is an EOF error when a Python file includes a very long string.
Here's an example of what my files look like.
Subprocess.py:
import subprocess
### call longstr.py from the primary pyfile
subprocess.call(['python longstr.py'], shell=True)
Longstr.py:
### called from subprocess.py
### the actual string is a lot longer; this is an example to illustrate how the string is formatted
lngstr = """here is a really long
string (which has many n3w line$ and "characters")
that are causing the python file to state the file is ending early
"""
print lngstr
Printed error in terminal:
SyntaxError: EOF while scanning triple-quoted string literal
As a workaround, I tried removing all line breaks as well as all spaces to see if the problem was the string being multi-line. That still returned the same result.
My assumption is that while the subprocess is running, the shell is doing something with the file contents, and when a newline is reached the shell itself is freaking out and terminating the process, not the file.
What is the correct workaround for having subprocess run a file like this?
Thank you for your help.
Answering my own question here: my problem was that I didn't call file.close() on the freshly written file before running subprocess.call, so part of its contents was still in Python's write buffer and had not reached the disk yet.
If you encounter this problem and are working with recently written files, this could be your issue too. Thank you to everyone who read or responded to this thread.
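A minimal Python 3 sketch of the corrected workflow, assuming the parent script generates the child script itself: the with block guarantees the file is closed (and therefore flushed) before the child interpreter reads it. SCRIPT and write_and_run are hypothetical names; the child is launched with sys.executable and a list of arguments instead of shell=True.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical generated script, modelled on the question's longstr.py.
SCRIPT = '''lngstr = """here is a really long
string (which has many n3w line$ and "characters")
that are causing the python file to state the file is ending early
"""
print(lngstr)
'''


def write_and_run(source):
    """Write source to a temporary .py file, run it, and return its stdout."""
    fd, path = tempfile.mkstemp(suffix=".py")
    os.close(fd)
    try:
        # Leaving the with block closes the file, flushing every buffered
        # byte to disk before the child interpreter reads it.
        with open(path, "w") as f:
            f.write(source)
        # Pass the interpreter and script as separate arguments; no shell needed.
        return subprocess.check_output([sys.executable, path], universal_newlines=True)
    finally:
        os.remove(path)


print(write_and_run(SCRIPT))
```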

Python not inserting an EOF character after file close?

I'm having a strange problem where some Python code that prints to a file is not inserting an EOF character. Basically, the Python script generates runscripts to later be submitted as jobs on a cluster. I essentially wrote the entire runscript between """'s, allowing for variables to be plugged in (to vary some parameters in my simulation). I write the runscripts using the
with open(file_name, 'w') as runscrpt:
    runscrpt.write("""ENTIRE_FILE_CONTENTS_HERE""")
syntax. I can give the actual code if necessary, but it's not much more than the above. Despite the script running fine and generating all of my runscripts, whenever I submitted them nothing happened. It took me a long time to figure out why, but it's because they're missing an EOF character. I can fix it by, for example, opening one, adding some trailing whitespace or a blank line somewhere in vim, and resaving the file.
Why isn't Python inserting the EOF character, and is there a better way to fix this than manually making trivial edits to all the files with vim?
Sounds like you mean there is no EOL (not EOF!) at the end, because that's what diff will typically tell you. Just add a newline at the end of the write (make sure there is a newline before the final """ terminator, or write a separate newline explicitly).
with open(file_name, 'w') as runscript:
    runscript.write("""ENTIRE_FILE_CONTENTS_HERE\n""")
(As a bonus, I added the missing vowel.)
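A small helper along the lines of the answer, assuming Python 3; write_runscript and TEMPLATE are hypothetical names, and the template content is only a placeholder for a real runscript. It appends a trailing newline only when the text lacks one, then reads the file back in binary mode to confirm that the last byte is a newline.

```python
import os
import tempfile

# Hypothetical runscript template; note there is no newline before the closing quotes.
TEMPLATE = """#!/bin/sh
echo running job {job_name} with parameter {param}"""


def write_runscript(path, text):
    """Write a runscript, appending a final newline if the text lacks one."""
    # POSIX defines a line as ending with a newline character, and some tools
    # skip or mishandle a last line that has none.
    if not text.endswith("\n"):
        text += "\n"
    with open(path, "w") as runscript:
        runscript.write(text)


fd, path = tempfile.mkstemp(suffix=".sh")
os.close(fd)
write_runscript(path, TEMPLATE.format(job_name="demo", param=0.5))
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
print(data.endswith(b"\n"))  # True
```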

Why do I get a SyntaxError <unicode error> on my import statement? Should this not be simple?

Here's my first simple test program in Python. Without importing the os library, the program runs fine, which leads me to believe there's something wrong with my import statement; however, this is the only way I ever see them written. Why am I still getting a syntax error?
import os # <-- why does this line give me a syntax error?!?!?! <unicode error> -->
CalibrationData = r'C:/Users/user/Desktop/blah Redesign/Data/attempts at data gathering/CalibrationData.txt'
File = open(CalibrationData, 'w')
File.write('Test')
File.close()
My end goal is to write a simple program that will look through a directory and tabularize data from relevant .ini files within it.
Well, as MDurant pointed out, I had pasted in an unprintable character, probably when I entered the URL.
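To locate a stray character like this, one option is to scan the raw bytes of the script. This is a sketch, assuming Python 3 and that the culprit is a non-ASCII character such as a zero-width or non-breaking space. find_non_ascii is a hypothetical helper; column numbers count bytes, not characters, so one multi-byte UTF-8 character is reported once per byte.

```python
def find_non_ascii(data):
    """Yield (line, column, byte) for every non-ASCII byte in raw file data."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            # Iterating over a bytes object yields ints in Python 3.
            if byte > 127:
                yield lineno, col, byte


# Hypothetical example: a zero-width space (U+200B) pasted after "import os".
source = "import os\u200b\nprint('ok')\n".encode("utf-8")
for lineno, col, byte in find_non_ascii(source):
    print("line %d, column %d: byte 0x%02x" % (lineno, col, byte))
```

For a real script, pass it the contents of open("script.py", "rb").read().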

Problem with exiting a Word doc using Python

This is my first time using this, so be kind :) I am making a program that opens many Microsoft Word 2007 documents (well in excess of 1000), reads a certain table in each one, and writes that information to an Excel file. I have all of this working, but when I run my code it does not close MS Word after opening each document; I have to close it manually at the end of the run by opening Word and selecting the Exit Word option from the Home menu. Another problem is that if I run the program twice in a row, on the second run everything goes to hell: it prints the same thing repeatedly no matter which document is selected. I think this may have to do with how MS Word decides which document is active, e.g. it is still opening the last active document that was not closed in the previous run. Anyway, here is my code for the opening and closing part; I won't bore you with the rest:
import win32com.client
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = 0
# Open a specific file
#myWordDoc = tkFileDialog.askopenfilename()
MSWord.Documents.Open("C:\\Documents and Settings\\fdosier" + chosen_doc)
#Get the textual content
docText = MSWord.Documents[0].Content
charText = MSWord.Documents[0].Characters
# Get a list of tables
ListTables = MSWord.Documents[0].Tables
# ------ Main Code ------
MSWord.Documents.Close
MSWord.Documents.Quit
del MSWord
Basically, Python is not VBA, so this:
MSWord.Documents.Close
is equivalent to:
getattr(MSWord.Documents, "Close")
i.e. you just get some method object and do nothing with it. You need to call the method with the call operator (the parentheses :) :
MSWord.Documents.Close()
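The same applies to .Quit, except that Quit is a method of the Application object rather than of Documents, so call MSWord.Quit().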
Before your MSWord.Quit, did you try using:
MSWord.ActiveWindow.Close()
Or, even more simply, just doing:
MSWord.Quit()
I don't really understand whether you are trying to close a document or the application.
I think you need an MSWord.Quit() at the end (before and/or instead of the del).
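Putting the answers together, here is a sketch of the open/read/close cycle, assuming pywin32 and Word are available and that iterating over doc.Tables works through pywin32's COM collection enumeration. read_word_tables and handle_table are hypothetical names. The sketch keeps the Document returned by Documents.Open instead of indexing Documents[0], which is likely what lets a document left open by an earlier run be read again, and try/finally ensures Close() and Quit() are actually called even if reading a table fails.

```python
def read_word_tables(word_app, paths, handle_table):
    """Open each document, pass each of its tables to handle_table, then close it.

    word_app is assumed to be a Word.Application COM object, for example
    the result of win32com.client.Dispatch("Word.Application").
    """
    try:
        for path in paths:
            # Keep the Document returned by Open() instead of indexing
            # Documents, so a leftover document is never read by mistake.
            doc = word_app.Documents.Open(path)
            try:
                for table in doc.Tables:
                    handle_table(path, table)
            finally:
                # Note the parentheses: without them the method is never called.
                doc.Close(False)  # False: do not save changes
    finally:
        word_app.Quit()  # Quit() belongs to the Application, not to Documents


# Example (Windows with pywin32 and Word installed; the path is hypothetical):
#   import win32com.client
#   word = win32com.client.Dispatch("Word.Application")
#   word.Visible = False
#   read_word_tables(word, [r"C:\docs\report1.docx"], lambda path, table: print(path))
```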
