The answer I'm looking for is a reference to documentation. I'm still debugging, but can't find a reference for the error code.
The full error message is:
snowflake.connector.errors.OperationalError: 255005: Failed to read next arrow batch: b'Array length did not match record batch length'
Some background, if it helps:
The error is raised by the call to fetchall, as shown here (Python):
cs.execute(f'SELECT {specific_column} FROM {table};')
all_starts = cs.fetchall()
The code context: when the script runs from cron (i.e., as a timed job), the connection succeeds, and then, iterating over a list of tables, the error occurs on the third pass (i.e., two tables are "successful"). When the same script is run at other times via the command line, not cron, there's no error (all tables are "successful").
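For reference, a minimal sketch of the loop described above (the list name tables is illustrative; the cursor and column name come from the code above):
# Iterate over the tables; under cron, the third iteration's fetchall()
# raises OperationalError 255005.
for table in tables:
    cs.execute(f'SELECT {specific_column} FROM {table};')
    all_starts = cs.fetchall()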
I am doing research on AWS OpenSearch, and one of the things I'm trying to measure is the run time (execution time) of different queries and index commands. For example, how long does it take to perform an action such as a query search, create index, or delete index?
Right now I am using the awswrangler Python library for interacting with OpenSearch (see that library's API documentation).
Read Index Code I currently have:
awswrangler.opensearch.search(client=self.client, index="index_name", search_body=any_dsl_query, size=100)
awswrangler.opensearch.search_by_sql(client=self.client, sql_query="SELECT * from index_name limit 100")
Delete Index Code:
awswrangler.opensearch.delete_index(client=self.client, index="index_name")
Create Index Code (this one actually returns Elapsed time as desired):
awswrangler.opensearch.index_csv(client=self.client, path=csv_file_path, index="index_name")
Unfortunately, none of these except Create Index return the runtime out of the box.
I know that I can write my own timer to get the runtime, but I don't want to do this client side, because that would include network latency in the execution time. Is there any way to do this with OpenSearch?
I couldn't find a way in the awswrangler Python library I was using, or with any other method so far.
I was able to resolve this by using the Python requests library and looking at the "took" value in the response, which is the time it took to run the query in ms. Here is the code I used to get this working:
import json
import requests

headers = {"Content-Type": "application/json"}
sample_sql_query = "SELECT * FROM <index_name> LIMIT 5"
sql_result = requests.post("<opensearch_domain>/_plugins/_sql?format=json", auth=(username, password), data=json.dumps({"query": sample_sql_query}), headers=headers).json()
print(sql_result)
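The server-side execution time can then be read straight from the response, assuming the "took" field sits at the top level of the JSON as described above:
# "took" is the query execution time in milliseconds.
print(sql_result["took"])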
I have a data frame Df1 below with columns 'Summary' and 'Closing Group':
Summary Closing Group
XX012 job abended with error Automation
XX015 job abended with error Automation
Front End issue TSL error Automation
XX015 job abended with error Automation
Front End issue TSL error Automation
Front End issue TSL error Automation
File not present error Automation
I have another data frame Df2 below with column 'Label'
Label
TSL error
job abended
File not present
I want to map each Label against the Summary column if the exact string from Label exists in Summary.
I have written the below script to handle my condition using a for loop:
import re

list_label = Df2['Label']

def is_phrase_in(phrase, text):
    # re.escape guards against labels that contain regex metacharacters
    return re.search(r"\b{}\b".format(re.escape(phrase)), text, re.IGNORECASE) is not None

for idx2, row2 in Df1.iterrows():
    for label in list_label:
        if is_phrase_in(label, row2['Summary']):
            Df1.at[idx2, 'Label'] = label
            break
The above code gives me the expected results, but it takes a long time when run on a 7,000-label list and 20,000 summaries.
To optimize this, I used a lambda function as below:
Df1['Label'] = Df1['Summary'].apply(lambda x: next((l for l in list_label if is_phrase_in(l, x)), 'No Label Found'))
But this script takes even more time than the for loop.
Can anyone tell me if I am doing anything wrong here, or is there any other way to optimize this code?
My expected output:
Summary Closing Group Label
XX012 job abended with error Automation job abended
XX015 job abended with error Automation job abended
Front End issue TSL error Automation TSL error
Server down error Server No Label Found
XX015 job abended with error Automation job abended
Front End issue TSL error Automation TSL error
Front End issue TSL error Automation TSL error
File not present error Automation File not present
It should be clear that most of the processing time in the above code is spent in the regular expression search (re.search).
Try the alternative Python string find() method, i.e. str.find(sub, beg=0, end=len(string)), with your data:
def find_label(phrase, text):
    # str.find returns -1 when the substring is absent
    if text.find(phrase) == -1:
        return 'No Label Found'
    else:
        return phrase
Replacing the regex comparison with a plain substring test ("in") makes the code a little faster. However, from the example you provided, it seems that the summaries repeat ("XX015 job abended with error" occurs twice and "Front End issue TSL error" occurs three times). You could take the set of unique summaries, do your string operations once per unique summary, store the results in a dictionary, and then do the final mapping. I expect this to be a lot faster than recomputing the function every time you see the same string.
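A minimal sketch of that caching idea, reusing the data frame names from the question (the helper name first_matching_label is my own, and a plain "in" test stands in for the regex):
# Compute the label once per unique summary, then map the results back.
labels = Df2['Label'].tolist()

def first_matching_label(text):
    # Plain substring test instead of a regex search.
    return next((l for l in labels if l in text), 'No Label Found')

label_map = {s: first_matching_label(s) for s in Df1['Summary'].unique()}
Df1['Label'] = Df1['Summary'].map(label_map)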
Problem
I am connecting with JayDeBeApi to SQL Server 2017 and running a script like:
1. SELECT ... INTO #a-temp-table
2. DELETE FROM a-table
3. INSERT INTO a-table SELECT ... FROM #a-temp-table
4. DELETE #a-temp-table
During step 3 I get the following error:
Cannot insert duplicate key row in object 'dbo.a-table' with unique index 'UQ_a-table'. The duplicate key value is (11, 0001, 3751191, T70206CAT, 0000).
Instead of ~360k records, only ~180k get inserted. So step 3 aborts.
The temp table however gets deleted. So step 4 completes.
I am able to fix the duplicate key error itself. But with JayDeBeApi, I am not seeing the error at all; from the Python point of view it seems like everything went fine.
My goal is to capture those errors so I can handle them appropriately.
Any idea how to achieve that?
What I've tried
My Python code looks like this:
try:
    localCursor = dbConnection.cursor()
    x = localCursor.execute(query)
    logInfo("Run script %s... done" % (scriptNameAndPath), "run script", diagnosticLog)
except Exception as e:
    logError("Error running sql statement " + scriptNameAndPath + ". Skipping rest of row.",
             "run script", e, diagnosticLog)
    myrow = skipRowAndLogRecord(startRowTime, cursor, recordLog)
    continue
x = localCursor.execute(query) completes successfully, so no exception is thrown. x is None, and when inspecting localCursor I see no sign of any error messages or codes.
Step 3 should be all-or-none, so a-table should be empty following the duplicate key error unless your actual code has a WHERE clause.
Regarding the undetected exception, add SET NOCOUNT ON as the first statement in the script. That will suppress the DONE_IN_PROC (row count) messages, which will interfere with script execution unless your code handles multiple result sets.
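A minimal sketch of that change on the Python side, reusing the variable names from the question:
# Prepend SET NOCOUNT ON so row-count messages don't mask later errors.
script = "SET NOCOUNT ON;\n" + query
try:
    localCursor = dbConnection.cursor()
    localCursor.execute(script)
except Exception as e:
    # The duplicate key error from step 3 should now surface here.
    logError("Error running sql statement " + scriptNameAndPath + ".",
             "run script", e, diagnosticLog)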
You can also detect and handle errors within the script itself using TRY...CATCH: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/try-catch-transact-sql?view=sql-server-2017
-- Create procedure to retrieve error information.
CREATE PROCEDURE usp_GetErrorInfo
AS
SELECT
    ERROR_NUMBER() AS ErrorNumber,
    ERROR_SEVERITY() AS ErrorSeverity,
    ERROR_STATE() AS ErrorState,
    ERROR_PROCEDURE() AS ErrorProcedure,
    ERROR_LINE() AS ErrorLine,
    ERROR_MESSAGE() AS ErrorMessage;
GO

BEGIN TRY
    -- Generate divide-by-zero error.
    SELECT 1/0;
END TRY
BEGIN CATCH
    -- Execute error retrieval routine.
    EXECUTE usp_GetErrorInfo;
END CATCH;
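If you want the failure reported back to Python rather than only handled server-side, one option (a sketch only, not tested against JayDeBeApi; THROW requires SQL Server 2012+) is to build the batch in Python and re-raise from the CATCH block:
# Wrap the original script so any error is re-raised to the driver.
wrapped = (
    "SET NOCOUNT ON;\n"
    "BEGIN TRY\n"
    + query +
    "\nEND TRY\n"
    "BEGIN CATCH\n"
    "    THROW;  -- re-raise so JayDeBeApi surfaces the error\n"
    "END CATCH;"
)
localCursor.execute(wrapped)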
I hooked up the Keithley 2701 DMM, installed the software, and set the IPs correctly. I can access and control the instrument via the Internet Explorer web page and the Keithley Communicator. When I try to use Python, it detects the instrument,
i.e. a = visa.instrument("COM1") doesn't give an error.
I can write to the instrument as well:
a.write("*RST")
a.write("DISP:ENAB ON/OFF")
a.write("DISP:TEXT:STAT ON/OFF")
etc. None of these give any error, but no change is seen on the instrument screen.
However, when I try to read back, a.ask("*IDN?") etc. give me an error saying the timeout expired before the operation completed.
I tried redefining as:
a=visa.instrument("COM1",timeout=None)
a=visa.instrument("TCPIP::<the IP adress>::1354::SOCKET")
and a few other possible combinations but I'm getting the same error.
Please do help.
The issue with communicating with the 2701 might be an invalid termination character. By default the termination character has the value CR+LF, which is "\r\n".
The Python code to set the termination character is:
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET", term_chars="\n")
or
theInstrument = visa.instrument("TCPIP::<IPaddress>::1394::SOCKET")
theInstrument.term_chars = "\n"
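With the termination character set, the read that previously timed out should complete, for example:
print(theInstrument.ask("*IDN?"))  # should now return the instrument's ID string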
I hope this helps,
I am learning Hive. I have set up a table named records, with the schema as follows:
year : string
temperature : int
quality : int
Here are some sample rows:
1999 28 3
2000 28 3
2001 30 2
Now I wrote a sample MapReduce script in Python, exactly as specified in the book Hadoop: The Definitive Guide:
import re
import sys

for line in sys.stdin:
    (year, tmp, q) = line.strip().split()
    if tmp != '9999' and re.match("[01459]", q):
        print "%s\t%s" % (year, tmp)
I run this using the following commands:
ADD FILE /usr/local/hadoop/programs/sample_mapreduce.py;
SELECT TRANSFORM(year, temperature, quality)
USING 'sample_mapreduce.py'
AS year,temperature;
Execution fails. On the terminal I get this:
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 18:30:28,506 Stage-1 map = 0%, reduce = 0%
2012-08-23 18:30:59,647 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201208231754_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201208231754_0005_m_000002 (and more) from job job_201208231754_0005
Exception in thread "Thread-103" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://master:50060/tasklog?taskid=attempt_201208231754_0005_m_000000_2&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
I go to the failed job list, and this is the stack trace:
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 8 more
The same trace is repeated 3 more times.
Please, can someone help me with this? What is wrong here? I am going exactly by the book. There seem to be two errors: on the terminal it says that it can't read from the task log URL, while in the failed job list the exception says something different. Please help.
I went to the stderr log from the Hadoop admin interface and saw that there was a syntax error from Python. Then I found that when I created the Hive table, the field delimiter was a tab, and I hadn't specified it in split(). So I changed it to split('\t') and it worked all right!
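For completeness, here is the question's script with that fix applied:
import re
import sys

for line in sys.stdin:
    # The Hive table is tab-delimited, so split on '\t' explicitly.
    (year, tmp, q) = line.strip().split('\t')
    if tmp != '9999' and re.match("[01459]", q):
        print "%s\t%s" % (year, tmp)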
Just use 'describe formatted <table>' and near the bottom of the output you'll find 'Storage Desc Params:', which describes any delimiters used.