python question regarding re.search and indexing - python

Im new to python and coding and im trying to understand how re.search and indexing works.
below is what I have so far and I want physical_state to equal what comes after the Completion: (in this case it is success) but I dont really understand how the match.group(1) and re.search works.
import ops # Import the OPS module.
import sys # Import the sys module.
import re
# Subscription processing function
def ops_condition (o):
status, err_str = o.timer.relative("tag",10)
return status
def ops_execute (o):
handle, err_desp = o.cli.open()
print("OPS opens the process of command:",err_desp)
result, n11, n21 = o.cli.execute(handle,"return")
result, n11, n21 = o.cli.execute(handle,"dis nqa results test-instance sla 1 | i Completion:")
match = re.search(r"Completion:", result)
if not match:
print("Could not determine the state.")
return 0 # Look into what the return values mean.
physical_state = match.group(1) # Gets the first group from the match.
print (physical_state)
result = o.cli.close(handle)
return 0
output of result, n11, n21 = o.cli.execute(handle,"dis nqa results test-instance sla 1 | i Completion:")
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
Completion:success RTD OverThresholds number: 0
error when run
<setup>('OPS opens the process of command:', 'success')
Oct 18 2018 06:12:57+00:00 setup %%01OPSA/3/OPS_RESULT_EXCEPTION(l)[410]:Script is test3.py, current event is tag, instance is 1381216156, exception reason is Traceback (most recent call last):
File ".lib/frame.py", line 114, in <module>
ret = m.ops_execute(o)
File "flash:$_user/test3.py", line 21, in ops_execute
physical_state = match.group(1) # Gets the first group from the match.
IndexError: no such group
Thanks in advance

Related

Python scripting with ete3 to query NCBI's Taxonomy: "sqlite3 Warning (can only execute one statement at a time)"

I am using this script:
import csv
import time
import sys
from ete3 import NCBITaxa
ncbi = NCBITaxa()
def get_desired_ranks(taxid, desired_ranks):
lineage = ncbi.get_lineage(taxid)
names = ncbi.get_taxid_translator(lineage)
lineage2ranks = ncbi.get_rank(names)
ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
return{'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}
if __name__ == '__main__':
file = open(sys.argv[1], "r")
taxids = []
contigs = []
for line in file:
line = line.split("\n")[0]
taxids.append(line.split(",")[0])
contigs.append(line.split(",")[1])
desired_ranks = ['superkingdom', 'phylum']
results = list()
for taxid in taxids:
results.append(list())
results[-1].append(str(taxid))
ranks = get_desired_ranks(taxid, desired_ranks)
for key, rank in ranks.items():
if rank != '<not present>':
results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
else:
results[-1].append(rank)
i = 0
for result in results:
print(contigs[i] + ','),
print(','.join(result))
i += 1
file.close()
The script takes taxids from a file and fetches their respective lineages from a local copy of NCBI's Taxonomy database. Strangely, this script works fine when I run it on small sets of taxids (~70, ~100), but most of my datasets are upwards of 280k taxids and these break the script.
I get this complete error:
Traceback (most recent call last):
File "/data1/lstout/blast/scripts/getLineageByETE3.py", line 31, in <module>
ranks = get_desired_ranks(taxid, desired_ranks)
File "/data1/lstout/blast/scripts/getLineageByETE3.py", line 11, in get_desired_ranks
lineage = ncbi.get_lineage(taxid)
File "/data1/lstout/.local/lib/python2.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 227, in get_lineage
result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %taxid)
sqlite3.Warning: You can only execute one statement at a time.
The first two files from the traceback are simply the script I referenced above, the third file is one of ete3's. And as I stated, the script works fine with small datasets.
What I have tried:
Importing the time module and sleeping for a few milliseconds/hundredths of a second before/after my offending lines of code on lines 11 and 31. No effect.
Went to line 227 in ete3's code...
result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %merged_conversion[taxid])
and changed the "execute" function to "executescript" in order to be able to handle multiple queries at once (as that seems to be the problem). This produced a new error and led to a rabbit hole of me changing minor things in their script trying to fudge this to work. No result. This is the complete offending function:
def get_lineage(self, taxid):
"""Given a valid taxid number, return its corresponding lineage track as a
hierarchically sorted list of parent taxids.
"""
if not taxid:
return None
result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %taxid)
raw_track = result.fetchone()
if not raw_track:
#perhaps is an obsolete taxid
_, merged_conversion = self._translate_merged([taxid])
if taxid in merged_conversion:
result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %merged_conversion[taxid])
raw_track = result.fetchone()
# if not raise error
if not raw_track:
#raw_track = ["1"]
raise ValueError("%s taxid not found" %taxid)
else:
warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
track = list(map(int, raw_track[0].split(",")))
return list(reversed(track))
What bothers me so much is that this works on small amounts of data! I'm running these scripts from my school's high performance computer and have tried running on their head node and in an interactive moab scheduler. Nothing has helped.

Python TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'

Im brand new to python and coding, im trying to get below working.
this is test code and if I can get this working I should be able to build on it.
Like I said im new to this so sorry if its a silly mistake.
# coding=utf-8
import ops # Import the OPS module.
import sys # Import the sys module.
import re
# Subscription processing function
def ops_condition (o):
enter code herestatus, err_str = o.timer.relative("tag",10)
return status
def ops_execute (o):
handle, err_desp = o.cli.open()
print("OPS opens the process of command:",err_desp)
result, n11, n21 = o.cli.execute(handle,"return")
result, n11, n21 = o.cli.execute(handle,"display interface brief | include Ethernet0/0/1")
match = re.search(r"Ethernet0/0/1\s*(\S+)\s*", result)
if not match:
print("Could not determine the state.")
return 0
physical_state = match[1] # Gets the first group from the match.
print (physical_state)
if physical_state == "down":
print("down")
result = o.cli.close(handle)
else :
print("up")
return 0
Error
<setup>('OPS opens the process of command:', 'success')
Oct 17 2018 11:53:39+00:00 setup %%01OPSA/3/OPS_RESULT_EXCEPTION(l)[4]:Script is test.py, current event is tag, instance is 1515334652, exception reason is Trac eback (most recent call last):
File ".lib/frame.py", line 114, in <module>
ret = m.ops_execute(o)
File "flash:$_user/test.py", line 22, in ops_execute
physical_state = match[1] # Gets the first group from the match.
TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'
The __getitem__ method for the regex match objects was only added since Python 3.6. If you're using an earlier version, you can use the group method instead.
Change:
physical_state = match[1]
to:
physical_state = match.group(1)
Please refer to the documentation for details.

How to parse multiple line catalina log in python - regex

I have catalina log:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated
WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
at ais.api.rest.rdss.Resource.lookAT(Resource.java:22)
at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
I try to parse it in python. My problem is that I dont know how many lines there are in log. Minimum are 2 lines. I try read from file and when first line start with j,m,s,o etc. it mean it is first line of log, because this are first letters of months. But I dont know how to continue. When I stop read the lines ? When next line will starts with one of these letters ? But how I do that?
import datetime
import re
SPACE = r'\s'
TIME = r'(?P<time>.*?M)'
PATH = r'(?P<path>.*?\S)'
METHOD = r'(?P<method>.*?\S)'
REQUEST = r'(?P<request>.*)'
TYPE = r'(?P<type>.*?\:)'
REGEX = TIME+SPACE+PATH+SPACE+METHOD+SPACE+TYPE+SPACE+REQUEST
def parser(log_line):
match = re.search(REGEX,log_line)
return ( (match.group('time'),
match.group('path'),
match.group('method'),
match.group('type'),
match.group('request')
)
)
db = MySQLdb.connect(host="localhost", user="myuser", passwd="mypsswd", db="Database")
with db:
cursor = db.cursor()
with open("Mylog.log","rw") as f:
for line in f:
if (line.startswith('j')) or (line.startswith('f')) or (line.startswith('m')) or (line.startswith('a')) or (line.startswith('s')) or (line.startswith('o')) or (line.startswith('n')) or (line.startswith('d')) :
logLine = line
result = parser(logLine)
sql = ("INSERT INTO ..... ")
data = (result[0])
cursor.execute(sql, data)
f.close()
db.close()
Best idea I have is read just two lines at a time. But that means discard all another data. There must be better way.
I want read lines like this:
1.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
2.line - oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl java:43)
3.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
So I want start read when line starts with datetime (this is no problem). Problem is that I want stop read when next line starts with datetime.
This may be what you want.
I read lines from the log inside a generator so that I can determine whether they are datetime lines or other lines. Also, importantly, I can flag that end-of-file has been reached in the log file.
In the main loop of the program I start accumulating lines in a list when I get a datetime line. The first time I see a datetime line I print it out if it's not empty. Since the program will have accumulated a complete line when end-of-file occurs I arrange to print the accumulated line at that point too.
import re
a_date, other, EOF = 0,1,2
def One_line():
with open('caroline.txt') as caroline:
for line in caroline:
line = line.strip()
m = re.match(r'[a-z]{3}\s+[0-9]{1,2},\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s+[AP]M', line, re.I)
if m:
yield a_date, line
else:
yield other, line
yield EOF, ''
complete_line = []
for kind, content in One_line():
if kind in [a_date, EOF]:
if complete_line:
print (' '.join(complete_line ))
complete_line = [content]
else:
complete_line.append(content)
Output:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

(python)(2.7.7) odd Syntax errors in code

from PIL import Image
import input_class
import input_function
import input_crop_values
import input_translate
class ImageChanger():
def __init__ ():
image_filename = input_class.input_file_name()
self.my_image = Image.open(image_filename)
chosen_Function = input_function.input_vaild_function()
if chosen_Function == "Crop" or chosen_Function == "crop":
crop_values = input_crop_values.input_crop_data()
my_image.crop(crop_values)
elif chosen_Function == "Translate" or chosen_function == "translate":
translate_values = input_translate.input_translate_data()
my_image.crop(translate_values)
else:
print("unexpected error while running code")
def printState():
print( "Your image file name is %s" % self.my_filename );
return (self.my_filename);
def translate(self,x_cord,y_cord):
return (self.my_image.offset(x_cord,y_cord));
def crop(self, box):
return (self.my_image.crop(box))
sorry for odd formatting,
Error:
Traceback (most recent call last):
File "C:\Users\Alexander\Desktop\Final_Udacity_project\dragon_picture_test.py", line 1, in <module>
import picture_changer
File "C:\Users\Alexander\Desktop\Final_Udacity_project\picture_changer.py", line 3, in <module>
import input_function
File "C:\Users\Alexander\Desktop\Final_Udacity_project\input_function.py", line 14
else
^
SyntaxError: invalid syntax
I have some other functions, not listed that use the input command sort of like a user interface, then it returns it to the larger class.
The places that the values are returned and what is return is:
line 8*: image_filename = input_class.input_file_name() # you would give a name of a picture here,
line 10*: chosen_function = input_function.input_vaild_function() # you Would give it either crop or translate can be sentence case or lowercase
line 11*: crop_values = input_crop_values.input_crop_data() # you would give it data in brackets like [1,2,3,4] and it would crop it.
line 15*: translate_values = input_translate.input_translate_data() # you would give it information in ( ), like (1,2,3,4) and it would translate it.
The code is acting really weird with the Boolean and the error message isn't helping me that much.
else, like other statements that come before a block, requires a colon.
else:

.strip won't remove last quotation mark in Python3 Program

EDIT: UPDATE, so I ran this with #Alex Thornton's suggestion.
This is my output:
'100.00"\r'
Traceback (most recent call last):
File "budget.py", line 48, in <module>
Main()
File "budget.py", line 44, in Main
budget = readBudget("budget.txt")
File "budget.py", line 21, in readBudget
p_value = float(maxamount)
ValueError: invalid literal for float(): 100.00"
Under Windows though, I just get the list of numbers, with the qutations and the \r's stripped off.
Now, I don't know too much about the way Windows and Linux handle text files, but isn't it due to the way Windows and Linux handle the return/enter key?
So I have this code:
def readBudget(budgetFile):
# Read the file into list lines
f = open(budgetFile)
lines = f.readlines()
f.close()
budget = []
# Parse the lines
for i in range(len(lines)):
list = lines[i].split(",")
exptype = list[0].strip('" \n')
if exptype == "Type":
continue
maxamount = list[1].strip('$" \n')
entry = {'exptype':exptype, 'maxamnt':float(maxamount)}
budget.append(entry)
#print(budget)
return budget
def printBudget(budget):
print()
print("================= BUDGET ==================")
print("Type".ljust(12), "Max Amount".ljust(12))
total = 0
for b in budget:
print(b['exptype'].ljust(12), str("$%0.2f" %b['maxamnt']).ljust(50))
total = total + b['maxamnt']
print("Total: ", "$%0.2f" % total)
def Main():
budget = readBudget("budget.txt")
printBudget(budget)
if __name__ == '__main__':
Main()
Which reads from this file:
"Type", "MaxAmount"
"SCHOOL","$100.00"
"UTILITIES","$200.00"
"AUTO", "$100.00"
"RENT", "$600.00"
"MEALS", "$300.00"
"RECREATION", "$100.00"
It is supposed to extract the budget type (school, utilities, etc) and the max amount. The max amount is supposed to be converted to a float. However, when I run the program, I get this error.
Traceback (most recent call last):
File "budget.py", line 47, in <module>
Main()
File "budget.py", line 43, in Main
budget = readBudget("budget.txt")
File "budget.py", line 22, in readBudget
entry = {'exptype':exptype, 'maxamnt':float(maxamount)}
ValueError: invalid literal for float(): 100.00"
Shouldn't the strip function in readBudget remove the last quotation mark?
When I tried this:
>>> attempt = '"$100.00"'
>>> new = attempt.strip('$" \n')
'100.00'
>>> float(new)
100.00
I got exactly what one would expect- so it must be something to do with what we cannot see from the file. From what you've posted, it's not clear whether there is something subtly wrong with the string you're trying to pass to float() (because it looks perfectly reasonable). Try adding a debug print statement:
print(repr(maxamount))
p_value = float(maxamount)
Then you can determine exactly what is being passed to float(). The call to repr() will make even normally invisible characters visible. Add the result to your question and we will be able to comment further.
EDIT:
In which case, replace:
maxamount = list[1].strip('$" \n')
With:
maxamount = list[1].strip('$" \n\r')
That should then work fine.
Adding in this:
maxamount = list[1].strip('$" \n\r')
Or more specifically, the \r, removed the error.
You can use a regex to capture either all or most of the floating point info in a string.
Consider:
import re
valid='''\
123.45"
123.
123"
.123
123e-16
-123e16
123e45
+123.45'''
invalid='''\
12"34
12f45
e123'''
pat=r'(?:^|\s)([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)'
for e in [valid, invalid]:
print
for line in e.splitlines():
m=re.search(pat, line)
if m:
print '"{}" -> {} -> {}'.format(line, m.group(1), float(m.group(1)))
else:
print '"{}" not valid'.format(line)
Prints:
"123.45"" -> 123.45 -> 123.45
"123." -> 123 -> 123.0
"123"" -> 123 -> 123.0
".123" -> .123 -> 0.123
"123e-16" -> 123e-16 -> 1.23e-14
"-123e16" -> -123e16 -> -1.23e+18
"123e45" -> 123e45 -> 1.23e+47
"+123.45" -> +123.45 -> 123.45
"12"34" -> 12 -> 12.0
"12f45" -> 12 -> 12.0
"e123" not valid
Just modify the regex to capture what you consider a valid floating point data point -- or invalid.

Categories