Python Locust stopped without error message, how should I check?

I run locust using this command:
locust -f locustfile.py --no-web -c10 -r10 &> locust.log &
My understanding is that all output (stdout and stderr) will go to locust.log.
However, the program stopped without me triggering a stop, and the last lines of locust.log show only the stats below; no error message can be found:
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
GET /*******/**********/ 931940 8(0.00%) 45 10 30583 | 23 101.20
GET /**************/************/ 931504 14(0.00%) 47 11 30765 | 24 104.10
GET /**************/***************/ 594 92243(99.36%) 30 12 549 | 23 0.00
--------------------------------------------------------------------------------------------------------------------------------------------
Total 1864038 92265(4.95%) 205.30
Since I didn't specify a number of requests, the job should run forever rather than stop.
Where and how should I check why the job is stopping?
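One low-effort way to get more detail into locust.log is to enable Python's faulthandler at the top of the locustfile, so that a hard crash (segfault, abort) at least dumps a traceback to stderr, which the &> redirection already captures; if your Locust version supports them, the --logfile and --loglevel DEBUG options help too. This is only a diagnostic sketch, not a Locust feature, and it cannot catch a SIGKILL (for example from the kernel OOM killer), which only shows up in the system logs:
# Top of locustfile.py -- a minimal diagnostic sketch, not Locust-specific.
import atexit
import faulthandler
import logging

# Dump a Python traceback to stderr if the process dies on SIGSEGV/SIGFPE/SIGABRT/SIGBUS/SIGILL.
faulthandler.enable()

@atexit.register
def _log_exit():
    # Runs on a normal interpreter shutdown (not on SIGKILL), so an entry here
    # means the process chose to exit rather than being killed from outside.
    logging.getLogger(__name__).warning("locust process is exiting")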

Related

Selenium: Element 'name=login' not visible after 10 seconds

When I launch a test in Robot Framework with Jenkins, I get this error:
Full Name: 03 VariableList
Source: /data/application/xxxxxxx/robot/03_VariableList.robot
Start / End / Elapsed: 20230214 11:18:46.964 / 20230214 11:19:08.600 / 00:00:21.636
Status: 7 tests total, 0 passed, 7 failed, 0 skipped
Message: Suite setup failed:
Element 'name=login' not visible after 10 seconds.
Does anyone know how to solve this problem?
I expect to have something like this:
Full Name: 03 VariableList
Source: /data/application/xxxxxxxxx/robot/03_VariableList.robot
Start / End / Elapsed: 20230214 11:18:46.964 / 20230214 11:19:08.600 / 00:00:21.636
Status: 7 tests total, 7 passed, 0 failed, 0 skipped
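The failing suite setup looks like a SeleniumLibrary visibility wait timing out on the login field, so the usual suspects are a wrong locator, a page that has not finished loading, or an element inside a frame. As a rough illustration only, here is the plain-Selenium equivalent of that check in Python with a longer timeout; the URL and the 30-second value are made up for the example:
# Hypothetical plain-Selenium version of waiting for the element 'name=login' to be visible.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.invalid/login")   # placeholder URL, not from the question

# Raise the timeout from the default 10 seconds to 30 and wait for visibility.
wait = WebDriverWait(driver, 30)
login = wait.until(EC.visibility_of_element_located((By.NAME, "login")))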

PySpark: how do I get the mean of several FFT arrays?

I'm doing some analysis on a large set of waveform data. I have to get all the FFTs of all waveforms and then take the mean/min/max and standard deviation from them to check if all the tests were sound.
What I wrote is working... on a very small data set. I think Spark is trying to compute all the FFTs and only afterwards take the mean. I'm getting a lot of errors, and it seems that I'm either out of memory or that the request is taking too long. Should I try another solution, such as taking one FFT at a time and doing something like a rolling average? I'm trying to find a solution but I can't wrap my head around it.
I tried using F.avg() but that doesn't work on ArrayTypes. I thought of writing my own UDF but I couldn't figure out how to inject the group size into it. (I left the grouping stage out of the example though)
This is what I got. First you'll see the code that sets up the dataframe I get from every test instance. Then I'll show you the DF shape. After that I'll try to aggregate all of those into a single DF.
import numpy as np
import pyspark.sql.types as T
import pyspark.sql.functions as F
from pyspark.sql.functions import col
# Maybe some other imports

# Setup:
def _rfft(x):
    transformed = np.fft.rfft(x)
    return transformed.real.tolist(), transformed.imag.tolist(), np.abs(transformed).tolist()

spark_complex_abs = T.StructType([
    T.StructField("real", T.ArrayType(T.DoubleType(), False), False),
    T.StructField("imag", T.ArrayType(T.DoubleType(), False), False),
    T.StructField("abs", T.ArrayType(T.DoubleType(), False), False),
])
spark_rfft = F.udf(_rfft, spark_complex_abs)

def _rfft_bins(size, periodMicroSeconds):
    return np.fft.rfftfreq(size, d=(periodMicroSeconds / 10**6)).tolist()

spark_rfft_bins = F.udf(_rfft_bins, T.ArrayType(T.DoubleType(), False))

df = df.select('samplePeriod') \
    .withColumn('dataSize', col("waveformData")['dimensions'][0]) \
    .withColumn("data", col("waveformData")['elements']) \
    .withColumn('fft', spark_rfft('data')) \
    .withColumn('fftAmplitude', col('fft')['abs']) \
    .withColumn('fftBins', spark_rfft_bins('dataSize', 'samplePeriod'))
# Other selects (but not part of this example)

# DataFrame shape:
# | samplePeriod | dataSize   | data                  | fft         | fftAmplitude          | fftBins               |
# | DoubleType   | DoubleType | ArrayType(DoubleType) | ComplexType | ArrayType(DoubleType) | ArrayType(DoubleType) |

# Grouping stage not part of this example

# Aggregation
# This will not work on large data sets :(
def _index_avg(arr):
    return np.mean(arr, axis=0).tolist()

spark_index_avg = F.udf(_index_avg, T.ArrayType(T.DoubleType(), False))

df = df.agg(
    spark_index_avg(F.collect_list(col('fftAmplitude'))).alias('avg'),
    F.first('fftBins').alias('fftBins')
)
result = df.toPandas()
(Censored) error messages:
23/02/14 10:36:52 ERROR YarnScheduler: Lost executor 14 on XXXXXXX: Container from a bad node: container_XXXXXXX on host: XXXXXXX. Exit status: 143. Diagnostics: [2023-02-14 10:36:51.901]Container killed on request. Exit code is 143
[2023-02-14 10:36:51.922]Container exited with a non-zero exit code 143.
[2023-02-14 10:36:51.926]Killed by external signal
.
23/02/14 10:36:52 WARN TaskSetManager: Lost task 0.0 in stage 13.0 XXXXXXX: ExecutorLostFailure (executor 14 exited caused by one of the running tasks) Reason: Container from a bad node: containerXXXXXXX on host: XXXXXXX. Exit status: 143. Diagnostics: [2023-02-14 10:36:51.901]Container killed on request. Exit code is 143
[2023-02-14 10:36:51.922]Container exited with a non-zero exit code 143.
[2023-02-14 10:36:51.926]Killed by external signal
.
23/02/14 10:36:52 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 14 for reason Container from a bad node: container XXXXX on host: XXXXXXX. Exit status: 143. Diagnostics: [2023-02-14 10:36:51.901]Container killed on request. Exit code is 143
[2023-02-14 10:36:51.922]Container exited with a non-zero exit code 143.
[2023-02-14 10:36:51.926]Killed by external signal
EDIT:
I also tried
df = df.agg(F.avg(F.explode(F.collect_list('fftAmplitude'))).alias('avg'))
But then it complains with: The generator is not supported: nested in expressions "avg(explode(collect_list(fft.abs)))"
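One way around both the collect_list UDF and the explode-inside-avg restriction might be to explode the amplitude arrays together with their positions, average per position, and reassemble the ordered array afterwards. This is only a sketch, assuming Spark 2.4+ for the transform higher-order function and reusing the column names from the question:
from pyspark.sql import functions as F

# Element-wise mean of the 'fftAmplitude' arrays without collecting them onto one node.
exploded = df.select(F.posexplode('fftAmplitude').alias('pos', 'amplitude'))

per_bin_avg = exploded.groupBy('pos').agg(F.avg('amplitude').alias('avg'))

# Reassemble the per-bin averages into a single ordered array.
result = (per_bin_avg
          .agg(F.sort_array(F.collect_list(F.struct('pos', 'avg'))).alias('pairs'))
          .select(F.expr("transform(pairs, x -> x.avg)").alias('avg')))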

IndexError while using OligoMiner commands in WSL

I am trying to design oligo probes using the OligoMiner tool in Windows Subsystem for Linux (WSL) with an activated Anaconda environment. Please have a look at the commands given below. In the middle I am facing an IndexError and I am not able to fix it. I copied the FASTA DNA files into the OligoMiner folder before giving the commands in WSL. These were as follows:
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ python blockParse.py 5.fa
0 of 345789
100000 of 345789
200000 of 345789
6635 candidate probes identified in 345.77 kb yielding 19.19 candidates/kb
Program took 14.599275 seconds
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ bowtie2-build 5.fa 5
Settings:
Output files: "5.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
5.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 86447
Using parameters --bmax 64836 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 64836 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 345789 (target: 64835)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 345789 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 345790 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 82484
fchr[G]: 168573
fchr[T]: 263073
fchr[$]: 345789
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4309819 bytes to primary EBWT file: 5.1.bt2
Wrote 86452 bytes to secondary EBWT file: 5.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 345789
bwtLen: 345790
sz: 86448
bwtSz: 86448
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 21612
offsSz: 86448
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 1801
numLines: 1801
ebwtTotLen: 115264
ebwtTotSz: 115264
color: 0
reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 86447
Using parameters --bmax 64836 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 64836 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 345789 (target: 64835)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 345789 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 345790 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 82484
fchr[G]: 168573
fchr[T]: 263073
fchr[$]: 345789
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4309819 bytes to primary EBWT file: 5.rev.1.bt2
Wrote 86452 bytes to secondary EBWT file: 5.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 345789
bwtLen: 345790
sz: 86448
bwtSz: 86448
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 21612
offsSz: 86448
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 1801
numLines: 1801
ebwtTotLen: 115264
ebwtTotSz: 115264
color: 0
reverse: 1
Total time for backward call to driver() for mirror index: 00:00:01
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ bowtie2 -x ~/ol_dir/OligoMiner$ bowtie2 -x ~/ol_dir/OligoMiner/5 -U 5.fa
stq --no-hd -t -k 100 --very-sensitive-local -S 5_u.sam
(ERR): "/home/rahul/ol_dir/OligoMiner$" does not exist or is not a Bowtie 2 index
Exiting now ...
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ bowtie2 -x ~/ol_dir/OligoMiner/5 -U 5.fa
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): bowtie2-align exited with value 134
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ bowtie2 -x ~/ol_dir/OligoMiner/5 -U 5.fastq --no-hd -t -k 100 --very-sensitive-local -S 5_u.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:00
6635 reads; of these:
6635 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
6578 (99.14%) aligned exactly 1 time
57 (0.86%) aligned >1 times
100.00% overall alignment rate
Time searching: 00:00:00
Overall time: 00:00:00
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$ python outputClean.py -u -f 5_u.sam
Traceback (most recent call last):
File "outputClean.py", line 486, in <module>
main()
File "outputClean.py", line 480, in main
reportVal, debugVal, metaVal, outNameVal, startTime)
File "outputClean.py", line 70, in cleanOutput
if x[0] is not '#' else ' ' for x in file_read]
IndexError: list index out of range
(ol) rahul#DESKTOP-J3Q9JD9:~/ol_dir/OligoMiner$
I was using an activated Anaconda environment in which the OligoMiner tool was used to create oligo probes from the input DNA FASTA file.
I was expecting to get the probes after the commands given, but could not get them.
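For what it's worth, the traceback points at a list comprehension in outputClean.py that indexes x[0] for every line of the SAM file, so a blank line in 5_u.sam (a trailing empty line is enough) raises IndexError: list index out of range. A defensive sketch of that idea, with variable names guessed from the traceback rather than taken from the real script:
# Hypothetical guard mirroring the comprehension at outputClean.py line 70:
# drop empty lines before looking at the first character of each line.
with open('5_u.sam') as handle:
    file_read = [x for x in handle if x.strip()]

cleaned = [x if not x.startswith('#') else ' ' for x in file_read]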

opencv_traincascade - is it not possible to train opencv_traincascade with just 26 samples or less?

I am trying to run this command in the terminal:
C:\Ankit\VirEnv\opencv\build\x64\vc15\bin\opencv_traincascade.exe -data cascade/ -vec C:\Ankit\VirEnv\pos3.vec -bg neg.txt -w 24 -h 24 -negPos 20 -numNeg 1000 -minHitRate 0.9
This is my output
(env) PS C:\Ankit\VirEnv> C:\Ankit\VirEnv\opencv\build\x64\vc15\bin\opencv_traincascade.exe -data cascade/ -vec C:\Ankit\VirEnv\pos3.vec -bg neg.txt -w 24 -h 24 -negPos 20 -numNeg 1000 -minHitRate 0.9
PARAMETERS:
cascadeDirName: cascade/
vecFileName: C:\Ankit\VirEnv\pos3.vec
bgFileName: neg.txt
numPos: 2000
numNeg: 1000
numStages: 20
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 24
sampleHeight: 24
boostType: GAB
minHitRate: 0.9
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: BASIC
Number of unique features given windowSize [24,24] : 162336
===== TRAINING 0-stage =====
<BEGIN
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(3.4.16) Error: Bad argument (> Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
> ) in CvCascadeImageReader::PosReader::get, file C:\build\3_4_winpack-build-win64-vc15\opencv\apps\traincascade\imagestorage.cpp, line 158
I am getting this error
OpenCV(3.4.16) Error: Bad argument (> Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
> ) in CvCascadeImageReader::PosReader::get, file C:\build\3_4_winpack-build-win64-vc15\opencv\apps\traincascade\imagestorage.cpp, line 158
I only have 26 samples. I have tried to mess with the numPos value and the minHitRate value, but this error is still there, and I don't know what to do next. Is it not possible to do this with just 26 samples? Is it necessary to have more than 26 samples to make this work?
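For reference, the printed parameters show numPos: 2000 even though the vec file holds only 26 samples, because -negPos is not a recognised opencv_traincascade option and numPos stayed at its default; that alone explains the "Can not get new positive sample" error. A hedged example of a command that keeps numPos below the vec count (the concrete numbers are illustrative; a commonly cited rule of thumb is vec count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos):
C:\Ankit\VirEnv\opencv\build\x64\vc15\bin\opencv_traincascade.exe -data cascade/ -vec C:\Ankit\VirEnv\pos3.vec -bg neg.txt -w 24 -h 24 -numPos 20 -numNeg 100 -numStages 3 -minHitRate 0.9
Even then, a cascade trained from 26 positives is unlikely to generalise; more samples are strongly recommended.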

MapReduce output not the complete set expected?

I'm running a streaming Hadoop job in Python on a single pseudo-distributed Hadoop node, also using hadoop-lzo to produce splits on an .lzo-compressed input file.
Everything works as expected when using small compressed or uncompressed test datasets; the MapReduce output matches that from a simple 'cat | map | sort | reduce' pipeline in Unix, whether the input is compressed or not.
However, once I move to processing the single large .lzo (pre-indexed) dataset (~40GB compressed) and the job is split across multiple mappers, the output looks to be truncated - only the first few key values are present.
The code + outputs follow - as you can see, it's a very simple count for testing the whole process.
Output from a straightforward Unix pipeline on test data (a subset of the large dataset):
lzop -cd objectdata_input.lzo | ./objectdata_map.py | sort | ./objectdata_red.py
3656 3
3671 3
51 6
Output from the Hadoop job on test data (the same test data as above):
hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input objectdata_input.lzo -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat -output retention_counts -mapper objectdata_map.py -reducer objectdata_red.py -file /home/bob/python-dev/objectdata_map.py -file /home/bob/python-dev/objectdata_red.py
3656 3
3671 3
51 6
Now, the test data is a small subset of lines from the real dataset, so I would at least expect to see the keys from above in the resulting output when the job is run against the full dataset. However, what I get is:
hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-streaming-*.jar -input objectdata_input_full.lzo -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat -output retention_counts -mapper objectdata_map.py -reducer objectdata_red.py -file /home/bob/python-dev/objectdata_map.py -file /home/bob/python-dev/objectdata_red.py
1 40475582
12 48874
14 8929777
15 219984
16 161340
17 793211
18 78862
19 47561
2 14279960
20 56399
21 3360
22 944639
23 1384073
24 956886
25 9667
26 51542
27 2796
28 336767
29 840
3 3874316
30 1776
33 1448
34 12144
35 1872
36 1919
37 2035
38 291
39 422
4 539750
40 1820
41 1627
42 97678
43 67581
44 11009
45 938
46 849
47 375
48 876
49 671
5 262848
50 5674
51 90
6 6459687
7 4711612
8 20505097
9 135592
...There are far fewer keys than I would expect based on the dataset.
I'm less bothered by the keys themselves - this set could be expected given the input dataset. I am more concerned that there should be many, many more keys, in the thousands. When I run the code in a Unix pipeline against the first 25 million records in the dataset, I get keys in the range of approximately 1 to 7000.
So this output appears to be just the first few lines of what I would actually expect, and I'm not sure why. Am I missing collating many part-0000# files, or something similar? This is just a single-node pseudo-distributed Hadoop setup I am testing at home, so if there are more part-# files to collect I have no idea where they could be; they do not show up in the retention_counts dir in HDFS.
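For reference, one quick way to rule out the "many part files" theory is to list and merge everything under the output directory with the standard HDFS shell (retention_counts is the output directory from the job above; the local file name is arbitrary):
hadoop fs -ls retention_counts
hadoop fs -cat retention_counts/part-*
hadoop fs -getmerge retention_counts merged_retention_counts.txt
If -ls shows only a single part-00000, the truncation happens before the output stage rather than being a collation problem.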
The mapper and reducer code is as follows - effectively the same as the many word-count examples floating about:
objectdata_map.py
#!/usr/bin/env python
import sys

RETENTION_DAYS = (8321, 8335)

for line in sys.stdin:
    line = line.strip()
    try:
        retention_days = int(line[RETENTION_DAYS[0]:RETENTION_DAYS[1]])
        print "%s\t%s" % (retention_days, 1)
    except:
        continue
objectdata_red.py
#!/usr/bin/env python
import sys

last_key = None
key_count = 0

for line in sys.stdin:
    key = line.split('\t')[0]
    if last_key and last_key != key:
        print "%s\t%s" % (last_key, key_count)
        key_count = 1
    else:
        key_count += 1
    last_key = key

print "%s\t%s" % (last_key, key_count)
This is all on a manually installed hadoop 1.1.2, pseudo-distributed mode, with hadoop-lzo built and installed from
https://github.com/kevinweil/hadoop-lzo
