I have the below data frame Df1 with columns 'Summary' and 'Closing Group':
Summary                       Closing Group
XX012 job abended with error  Automation
XX015 job abended with error  Automation
Front End issue TSL error     Automation
XX015 job abended with error  Automation
Front End issue TSL error     Automation
Front End issue TSL error     Automation
File not present error        Automation
I have another data frame, Df2, below with the column 'Label':
Label
TSL error
job abended
File not present
I want to map each Label against the Summary column if the exact string from Label exists in Summary.
I have written the below script to handle my condition using a for loop:
import re

list_label = Df2['Label']

def is_phrase_in(phrase, text):
    return re.search(r"\b{}\b".format(phrase), text, re.IGNORECASE) is not None

for idx2, row2 in Df1.iterrows():
    for label in list_label:
        print(label)
        if is_phrase_in(label, row2['Summary']):
            Df1.at[idx2, 'Label'] = label
            break
The above code gave me the expected results, but it takes a long time when run on a list of 7,000 labels and 20,000 summaries.
To optimize this, I used a lambda function as below:
Df1['Label'] = Df1['Summary'].apply(lambda x: next((l for l in list_label if is_phrase_in(l, x)), 'No Label Found'))
But this script takes even more time than the for loop.
Can anyone tell me if I am doing anything wrong here, or is there any other way to optimize this code?
My expected output:
Summary                       Closing Group  Label
XX012 job abended with error  Automation     job abended
XX015 job abended with error  Automation     job abended
Front End issue TSL error     Automation     TSL error
Server down error             Server         No Label Found
XX015 job abended with error  Automation     job abended
Front End issue TSL error     Automation     TSL error
Front End issue TSL error     Automation     TSL error
File not present error        Automation     File not present
Most of the processing time in the above code is spent on the regular expression search (re.search).
Try the alternative Python string find() method, i.e. str.find(sub, start, end), with your data, for example wrapped in a small helper:
def find_label(phrase, text):
    if text.find(phrase) == -1:
        return 'No Label Found'
    else:
        return phrase
Replacing the regex comparison with plain string containment ("in" or find()) makes the code a little faster.
However, from the example you have provided, the summaries repeat ("XX015 job abended with error" occurs twice and "Front End issue TSL error" occurs three times). You could take the set of unique summaries, do the string matching once per unique summary, store the results in a dictionary, and then do the final mapping. I guess this makes it a lot faster compared to computing the function every time you see the same string.
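A minimal sketch of that idea, assuming Df1, Df2 and is_phrase_in are defined as in the question (the dictionary cache and the map() call are the only new parts):

# Build the label lookup only once per *unique* summary, then map it back.
unique_summaries = Df1['Summary'].unique()
labels = Df2['Label'].tolist()

summary_to_label = {}
for summary in unique_summaries:
    # First label whose phrase occurs in the summary, else a default.
    summary_to_label[summary] = next(
        (label for label in labels if is_phrase_in(label, summary)),
        'No Label Found'
    )

Df1['Label'] = Df1['Summary'].map(summary_to_label)

Because the matching now runs once per unique summary rather than once per row, repeated summaries cost nothing extra.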
Related
The answer I'm looking for is a reference to documentation. I'm still debugging, but can't find a reference for the error code.
The full error message is:
snowflake.connector.errors.OperationalError: 255005: Failed to read next arrow batch: b'Array length did not match record batch length'
Some background, if it helps:
The error is in response to the call to fetchall, as shown here (Python):
cs.execute(f'SELECT {specific_column} FROM {table};')
all_starts = cs.fetchall()
The code context: when running from cron (i.e., a timed job), the connection succeeds; then, for a list of tables, the third time through there's an error (i.e., two tables are "successful"). When the same script is run at other times (via the command line, not cron), there's no error (all tables are "successful").
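One way to narrow this down might be to replace the single fetchall() with batched fetchmany() calls, so the batch that triggers the failure can be isolated. A rough sketch, reusing the cursor cs and the specific_column/table names from the question (the batch size is arbitrary):

cs.execute(f'SELECT {specific_column} FROM {table};')

all_starts = []
batch_no = 0
while True:
    rows = cs.fetchmany(1000)  # fetch in chunks instead of one fetchall()
    if not rows:
        break
    all_starts.extend(rows)
    batch_no += 1
    print(f'table {table}: batch {batch_no} ok ({len(rows)} rows)')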
I have a script that detects the language of incoming text (which arrives in bulk) with the help of the langdetect module. I have also set up an email alert script for errors, so when I get an error a mail is sent to me. My problem is that whenever langdetect is not able to recognise a language (which happens a lot, as I get many random texts from the internet), it throws the exception "No features in text". Because of this, my daily email sending capacity gets exhausted. What I want is to check whether the error is "no features in text": if it is, skip sending the email; otherwise, send it.
How can I do this?
I tried using an if statement:
if LangDetectException.code == 'no features in text':
    pass
else:
    sendmail()
Thank you
I solved it using the get_code() method, as follows:
if error.get_code() == 5:
    pass
else:
    sendmail()
5 is for no features in text.
Thank you
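For completeness, a minimal sketch of wiring that check into the detection call, assuming the langdetect package and the sendmail() helper from the question; to the best of my knowledge the named constant for code 5 is ErrorCode.CantDetectError, but comparing against the literal 5 as above works the same way:

from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException, ErrorCode

def detect_language(text):
    try:
        return detect(text)
    except LangDetectException as error:
        # Code 5 ("No features in text") is expected for random junk input,
        # so skip the alert; any other langdetect error still triggers a mail.
        if error.get_code() == ErrorCode.CantDetectError:  # == 5
            return None
        sendmail()  # alert helper from the question
        return None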
So I'm practicing some regex in Python, and essentially I want to look through a log of transaction numbers and see whether any of them return an error such as "Error in phone Activation".
I was successful in searching a dictionary for something that starts with "Error" and ends with "Activation", so that if it was tablet, watch, etc., it would still find the error. However, against a bulk text file, it will not find the pattern.
The code I used to find it in a dictionary was such that the dictionary key was a transaction number and the error (or lack thereof) was the value:
for i in Transaction_Log:
    if bool(re.search("^Error.* Activation$", Transaction_Log[i])):
        print("Found requested error in transaction number " + i)
        error_count += 1
This works; however, the same search function can't find anything in a text file set up like this:
Transnum: 20190510001 error: Error in phone Activation,
Transnum: 20190510002 error: none,
Transnum: 20190510003 error: Error in tablet Activation,
Ideally, it should find the type of errors, and once it works I can add a counter to see how many there are; however, my boolean statement is never True when searching the text file this way.
Searching for just the word Error does work though.
With the help of @CAustin, I figured out that I was searching for the wrong pattern: the line does not start with "Error", and it also ends with a comma. By removing both anchors, I was able to find what I needed in this example, so for anyone else looking for something similar, it was this...
for line in testingDoc:
    if bool(re.search("Error.* Activation", line)):
        print("found error in transaction")
I hooked up the Keithley 2701 DMM, installed the software, and set the IPs correctly. I can access and control the instrument via the Internet Explorer web page and the Keithley Communicator. When I try to use Python, it detects the instrument,
i.e. a = visa.instrument("COM1") doesn't give an error.
I can write to the instrument as well:
a.write("*RST")
a.write("DISP:ENAB ON/OFF")
a.write("DISP:TEXT:STAT ON/OFF")
etc. None of these give an error, but no change is seen on the instrument screen.
However, when I try to read back, a.ask("*IDN?") etc. gives me an error
saying the timeout expired before the operation completed.
I tried redefining as:
a=visa.instrument("COM1",timeout=None)
a=visa.instrument("TCPIP::<the IP adress>::1354::SOCKET")
and a few other possible combinations but I'm getting the same error.
Please do help.
The issue with communicating with the 2701 might be an invalid termination character. By default the termination character has the value CR+LF, which is "\r\n".
The Python code to set the termination character is:
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET", term_chars="\n")
or
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET")
theInstrument.term_chars = "\n"
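Note that newer PyVISA versions removed visa.instrument and term_chars in favor of a ResourceManager; a rough equivalent with the current API (the IP address and port are placeholders, as above) would be:

import pyvisa

rm = pyvisa.ResourceManager()
inst = rm.open_resource(
    "TCPIP::<IPaddress>::1394::SOCKET",
    read_termination="\n",
    write_termination="\n",
)
inst.timeout = 5000  # milliseconds

inst.write("*RST")
print(inst.query("*IDN?"))  # should return the instrument identification string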
I hope this helps,
I am learning Hive. I have set up a table named records with the following schema:
year : string
temperature : int
quality : int
Here are sample rows
1999 28 3
2000 28 3
2001 30 2
Now I wrote a sample MapReduce script in Python, exactly as specified in the book Hadoop: The Definitive Guide:
import re
import sys

for line in sys.stdin:
    (year, tmp, q) = line.strip().split()
    if (tmp != '9999' and re.match("[01459]", q)):
        print "%s\t%s" % (year, tmp)
I run this using the following command:
ADD FILE /usr/local/hadoop/programs/sample_mapreduce.py;
SELECT TRANSFORM(year, temperature, quality)
USING 'sample_mapreduce.py'
AS year,temperature;
Execution fails. On the terminal I get this:
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 18:30:28,506 Stage-1 map = 0%, reduce = 0%
2012-08-23 18:30:59,647 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201208231754_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201208231754_0005_m_000002 (and more) from job job_201208231754_0005
Exception in thread "Thread-103" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://master:50060/tasklog?taskid=attempt_201208231754_0005_m_000000_2&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
I go to the failed job list, and this is the stack trace:
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 8 more
The same trace is repeated three more times.
Please, can someone help me with this? What is wrong here? I am following the book exactly. There seem to be two errors: on the terminal it says it can't read from the task log URL, while in the failed job list the exception says something different. Please help.
I went to the stderr log from the Hadoop admin interface and saw that there was a syntax error from Python. Then I found that when I created the Hive table the field delimiter was tab, and I hadn't specified it in split(). So I changed it to split('\t') and it worked all right!
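In other words, the only change needed in the script from the question is the explicit tab delimiter; a minimal sketch (otherwise identical to the Python 2 script above):

import re
import sys

for line in sys.stdin:
    # Hive feeds TRANSFORM scripts tab-separated columns, so split on '\t'
    # explicitly instead of on arbitrary whitespace.
    (year, tmp, q) = line.strip().split('\t')
    if (tmp != '9999' and re.match("[01459]", q)):
        print "%s\t%s" % (year, tmp)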
Just use 'describe formatted <table_name>' and near the bottom of the output you'll find 'Storage Desc Params:', which describes any delimiters used.