We are trying to add a Locust gRPC client and got the code from the examples directory here.
Our use case is such that Locust will be used as a library and not through the UI.
The issue we are seeing is that no stats are output, either in the CSV file or in the code. The only difference between the code we see in the Locust docs here and our code is that the service is started independently.
The issue is reproduced in this GitHub repo : https://github.com/saurabhsharma721/grpc_locust_stats_repro_prj
This issue is only observed in library mode. If we run it through the UI, the stats produced are correct. Please help.
I'm not sure, but have you tried adding this?
# start a greenlet that saves current stats to history
gevent.spawn(stats_history, env.runner)
(from the documentation https://docs.locust.io/en/stable/use-as-lib.html)
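For reference, here is the fuller sketch from that page, adapted to this question (TestClass stands in for your gRPC user class; the 60-second run time is arbitrary):
import gevent
from locust.env import Environment
from locust.stats import stats_printer, stats_history

env = Environment(user_classes=[TestClass])
runner = env.create_local_runner()

# start greenlets that print stats to the console and save current stats to history
gevent.spawn(stats_printer(env.stats))
gevent.spawn(stats_history, env.runner)

runner.start(1, spawn_rate=1)
gevent.spawn_later(60, lambda: runner.quit())
runner.greenlet.join()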
I was able to solve this with the following code:
Added two static methods to be called on init and teardown:
@staticmethod
def on_init(environment, **kwargs) -> None:  # type: ignore[no-untyped-def]
    print("Perform global setup to create a global state in LocustRunner")

@staticmethod
def on_quit(environment, **kwargs) -> None:  # type: ignore[no-untyped-def]
    print("Perform global teardown to clear the global state in LocustRunner")
Then added this code in the driver:
events.init.add_listener(LocustRunner.on_init)
env = Environment(user_classes=[TestClass], events=events)
# Using a local runner to start with
runner = env.create_local_runner()
# init event
env.events.init.fire(environment=env, runner=runner)
And at the end of the driver:
# quit event
env.events.quitting.fire(environment=env, reverse=True)
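Put together, the driver flow looks roughly like this (a sketch using the names above; listener registration must happen before the init event is fired, and the 30-second run time is arbitrary):
import gevent
from locust import events
from locust.env import Environment

events.init.add_listener(LocustRunner.on_init)
events.quitting.add_listener(LocustRunner.on_quit)

env = Environment(user_classes=[TestClass], events=events)
runner = env.create_local_runner()
# init event
env.events.init.fire(environment=env, runner=runner)

runner.start(1, spawn_rate=1)
gevent.spawn_later(30, lambda: runner.quit())
runner.greenlet.join()

# quit event
env.events.quitting.fire(environment=env, reverse=True)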
There is already a Stack Overflow answer for this, but I'm unable to find it at the moment.
I have a Flyte task function like this:
@task
def do_stuff(framework_obj):
    framework_obj.get_outputs()  # This calls Types.Blob.fetch(some_uri)
Trying to load a blob URI using flytekit.sdk.types.Types.Blob.fetch, but getting this error:
ERROR:flytekit: Exception when executing No temporary file system is present. Either call this method from within the context of a task or surround with a 'with LocalTestFileSystem():' block. Or specify a path when calling this function. Note: Cleanup is not automatic when a path is specified.
I can confirm I can load blobs using with LocalTestFileSystem() in tests, but when actually trying to run a workflow I'm not sure why I'm getting this error, as the function that calls the blob processing is decorated with @task, so it's definitely a Flyte task. I also confirmed that the task node exists on the Flyte web console.
What path is the error referencing and how do I call this function appropriately?
Using Flyte Version 0.16.2
Could you please give a bit more information about the code? Is this flytekit version 0.15.x? I'm a bit confused, since that version shouldn't have the @task decorator. It should only have @python_task, which is an older API. If you want to use the new Python-native typing API you should install flytekit==0.17.0 instead.
Also, could you point to the documentation you're looking at? We've updated the docs a fair amount recently, so maybe there's some confusion around that. These are the examples worth looking at. There are also two new Python classes, FlyteFile and FlyteDirectory, that have replaced the Blob class in flytekit (though Blob remains what the IDL type is called).
(would've left this as a comment but I don't have the reputation to yet.)
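To illustrate, a minimal sketch of the new typed API (the task and its parameter are hypothetical; assumes flytekit>=0.17): a FlyteFile parameter stands in where you would previously have used a Blob:
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def count_lines(report: FlyteFile) -> int:
    report.download()  # materialize the remote file locally
    with open(report.path, "rb") as fh:
        return len(fh.readlines())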
Some code to help with fetching outputs and reading from a file output:
# NB: import paths assume flytekit >= 0.17
from flytekit import task
from flytekit.clients.friendly import SynchronousFlyteClient
from flytekit.core.context_manager import FlyteContext
from flytekit.core.type_engine import TypeEngine
from flytekit.models.core.identifier import WorkflowExecutionIdentifier
from flytekit.types.file import FlyteFile

@task
def task_file_reader():
    client = SynchronousFlyteClient("flyteadmin.flyte.svc.cluster.local:81", insecure=True)
    exec_id = WorkflowExecutionIdentifier(
        domain="development",
        project="flytesnacks",
        name="iaok0qy6k1",
    )
    data = client.get_execution_data(exec_id)
    lit = data.full_outputs.literals["o0"]

    ctx = FlyteContext.current_context()
    ff = TypeEngine.to_python_value(ctx, lv=lit, expected_python_type=FlyteFile)
    with open(ff, 'rb') as fh:
        print(fh.readlines())
I maintain a Python tool that runs automation against a Perforce server. For obvious reasons, parts of my test suite (which are unittest.TestCase classes run with Pytest) require a live server. Until now I've been using a remote testing server, but I'd like to move that into my local environment, and make server initialization part of my pre-test setup.
I'm experimenting with dockerization as a solution, but I get strange connection errors when trying to run Perforce commands against the server in my test code. Here's my test server code (using a custom docker image, Singleton metaclass based on https://stackoverflow.com/a/6798042, and with the P4Python library installed):
import docker
from P4 import P4
# Singleton and P4TestServerConfig are the custom helpers described above

class P4TestServer(metaclass=Singleton):
    def __init__(self, conf_file='conf/p4testserver.conf'):
        self.docker_client = docker.from_env()
        self.config = P4TestServerConfig.load_config(conf_file)
        self.server_container = None
        try:
            self.server_container = self.docker_client.containers.get('perforce')
        except docker.errors.NotFound:
            self.server_container = self.docker_client.containers.run(
                'perforce-server',
                detach=True,
                environment={
                    'P4USER': self.config.p4superuser,
                    'P4PORT': self.config.p4port,
                    'P4PASSWD': self.config.p4superpasswd,
                    'NAME': self.config.p4name
                },
                name='perforce',
                ports={
                    '1667/tcp': 1667
                },
                remove=True
            )
        self.p4 = P4()
        self.p4.port = self.config.p4port
        self.p4.user = self.config.p4superuser
        self.p4.password = self.config.p4superpasswd
And here's my test code:
class TestSystemP4TestServer(unittest.TestCase):
    def test_server_connection(self):
        testserver = P4TestServer()
        with testserver.p4.connect():
            info = testserver.p4.run_info()
            self.assertIsNotNone(info)
So this is the part that's getting to me: the first time I run that test (i.e. when it has to start the container), it fails with the following error:
E P4.P4Exception: [P4#run] Errors during command execution( "p4 info" )
E
E [Error]: 'TCP receive failed.\nread: socket: Connection reset by peer'
But on subsequent runs, when the container is already running, it passes. What's frustrating is that I can't otherwise reproduce this error. If I run that test code in any other context, including:
in a Python interpreter, or
in a debugger stopped just before the testserver.p4.run_info() invocation,
the code completes as expected regardless of whether the container was already running.
All I can think at this point is that there's something unique about the pytest environment that's tripping me up, but I'm at a loss for even how to begin diagnosing. Any thoughts?
I had a similar issue recently where I would start a Postgres container and then immediately run a Python script to set up the database as per my app requirements.
I had to introduce a sleep command between the two steps, and that resolved the issue.
Ideally you should check that the start sequence of the Docker container is done before trying to use it, but for my local development use case a five-second sleep was a good enough workaround.
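A sketch of what that readiness check could look like here, polling the server instead of sleeping for a fixed time (wait_for_p4 is a hypothetical helper; the timeout values are arbitrary):
import time
from P4 import P4Exception

def wait_for_p4(p4, timeout=30.0, interval=1.0):
    # Poll the server with p4 info until it responds or the timeout expires.
    deadline = time.time() + timeout
    while True:
        try:
            with p4.connect():
                p4.run_info()  # only succeeds once the server accepts commands
            return
        except P4Exception:
            if time.time() >= deadline:
                raise
            time.sleep(interval)
Calling wait_for_p4(testserver.p4) at the end of P4TestServer.__init__ should make the first test run wait for the container to finish starting.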
I'm writing an aggregation in PySpark.
I'm also adding tests to this project, where I create a session, put in some data, run my aggregation, and check the results.
The code looks like as following:
def mapper_convert_row(row):
    # ... business-logic specifics; eventually return one string value
    return my_str

def run_spark_query(spark: SparkSession, from_dt, to_dt):
    query = get_hive_query_str(from_dt, to_dt)
    df = spark.sql(query).rdd.map(lambda row: Row(mapper_convert_row(row)))

    out_schema = StructType([StructField("data", StringType())])
    df_conv = spark.createDataFrame(df, out_schema)
    df_conv.write.mode('overwrite').format("csv").save(folder)
And here is my test class
class SparkFetchTest(unittest.TestCase):
    @staticmethod
    def getOrCreateSC():
        conf = SparkConf()
        conf.setMaster("local")
        spark = (SparkSession.builder.config(conf=conf).appName("MyPySparkApp")
                 .enableHiveSupport().getOrCreate())
        return spark

    def test_fetch(self):
        dt_from = datetime.strptime("2019-01-01-10-00", '%Y-%m-%d-%H-%M')
        dt_to = datetime.strptime("2019-01-01-10-05", '%Y-%m-%d-%H-%M')
        spark = self.getOrCreateSC()
        self.init_and_populate_table_with_test_data(spark, input_tbl, dt_from, dt_to)
        run_spark_query(spark, dt_from, dt_to)
        # assert on results
I've added the PySpark dependencies via a Conda environment and run this code via PyCharm. Just to make it clear: there is no Spark installation on my local machine except the PySpark Conda package.
When I set a breakpoint, it works for me in the driver code, but it does not stop inside the mapper_convert_row function.
How can I debug this business-logic function in a local test environment?
The same approach works perfectly in Scala, but this code has to be in Python.
PySpark is a conduit to the Spark runtime, which runs on the JVM and is written in Scala. The connection is through Py4J, which provides a TCP-based socket from the Python executable to the JVM. Unfortunately that means:
No local debugging.
I'm no happier about it than you are. I might just write/maintain a parallel code branch in Scala to figure out the things that are tiring to do without the debugger.
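In the meantime, one workaround: since mapper_convert_row is a plain Python function, you can exercise it outside Spark entirely, and the debugger will stop inside it. A sketch (the field name value is hypothetical; build the Row with whatever fields your query actually returns):
from pyspark.sql import Row

def test_mapper_convert_row():
    # build a Row shaped like one row of the Hive query result
    row = Row(value="some raw input")
    result = mapper_convert_row(row)
    assert isinstance(result, str)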
Update: PyCharm is able to debug Spark programs. I have been using it nearly daily; see Pycharm Debugging of Pyspark.
I am trying to generate a feature set with the Essentia MusicExtractor from a yaml profile via Python, as described in the documentation here and here.
My code snippet:
from essentia.standard import MusicExtractor
profile = "some_profile.yaml"
audio = "some_audio.mp3"
features, frames = MusicExtractor(profile=profile)(audio)
My yaml profile:
This produces the following error:
RuntimeError:
Error while configuring MusicExtractor:
Pool: Cannot set/add/merge value to the pool under the name 'rhythm.stats'
because that name already exists but contains a different data type than value.
It does not really look like I am doing something wrong.
I ran into the same problem and fixed it this way:
1. Downloaded a sample profile from the essentia repo's examples.
2. Ran the profile.
3. Commented out the conflicting lines after each run; there are just a few, basically the stats and statsMFCC lines.
From this I could derive a working profile.
I have created a pretty simple MapReduce pipeline, but I am getting a cryptic error:
PipelineSetupError: Error starting production.cron.pipelines.ItemsInfoPipeline(*(), **{})#a741186284ed4fb8a4cd06e38921beff:
when I try to start it. This is the pipeline code:
class ItemsInfoPipeline(base_handler.PipelineBase):
    """
    """
    def run(self):
        output = yield mapreduce_pipeline.MapreducePipeline(
            job_name="items_job",
            mapper_spec="production.cron.mappers.items_info_mapper",
            input_reader_spec="mapreduce.input_readers.DatastoreInputReader",
            mapper_params={
                "input_reader": {
                    "entity_kind": "production.models.Transaction"
                }
            }
        )
        yield ItemsInfoStorePipeline(output)

class ItemsInfoStorePipeline(base_handler.PipelineBase):
    """
    """
    def run(self, statistics):
        print statistics
        return "OK"
Of course I have double-checked that the mapper path is right. Note that ItemsInfoStorePipeline does not do anything yet, because I am focusing on getting the pipeline started, which is not happening.
It is all triggered by the following Flask view:
class ItemsInfoMRJob(views.MethodView):
    """
    It's based on transactions.
    """
    def get(self):
        """
        :return:
        """
        pipeline = ItemsInfoPipeline()
        pipeline.start()
        redirect_url = "%s/status?root=%s" % (pipeline.base_path, pipeline.pipeline_id)
        return flask.redirect(redirect_url)
I am using GoogleAppEngineMapReduce==1.9.22.0
Thanks for any help.
UPDATE
The above code works once deployed.
UPDATE 2
Apparently more people are dealing with this:
https://github.com/GoogleCloudPlatform/appengine-mapreduce/issues/103
I'm updating this. I have a code base that uses pipelines and works fine on OS X. I have another developer, also on OS X, for whom nothing I do seems to get it working; he gets this:
Encountered unexpected error from ProtoRPC method implementation: PipelineSetupError
I've tried swapping versions around and making our machines match perfectly, and it continues to happen. I finally broke down and built an Ubuntu image in Docker, doing my best to match our versions of App Engine and the libraries exactly.
It also refuses to start, with the same message. I started working through the libraries, uncommenting the part that swallows the error, but that turned into a long rabbit hole, because lots of code above it also seems to swallow whatever is actually going on.