Problem getting data from Redis using aioredis (Python)

I am using Redis to store tokens and their vector representations in a database. First I convert the vector (a list of floats) to a str; when reading it back, I convert the str back to a list of floats. However, "AttributeError: 'int' object has no attribute 'strip'" occurs:
import aioredis  # aioredis==1.3.1

app = FastAPI()
redis = None

@app.on_event('startup')
async def startup_event():
    global redis
    redis = await aioredis.create_redis(address=('redis', 6379))

@app.on_event('shutdown')
async def shutdown_event():
    redis.close()
    await redis.wait_closed()

@app.get("/vectorize_token")
async def vectorize_token(
    token: str = Query("python", max_length=250),
    model_name: ModelSelection = ModelSelection.BERT,
):
    # get cache from memory
    cache = await redis.get(token)
    # check value from cache, if exists return it
    if cache is not None:
        vector = [float(char.strip('[,]')) for char in cache]
        return {'query': token, 'vector': vector}
    vector = model_mapping[model_name.value].vectorize_token(token)
    response = QueryResponse(query=token, vector=vector)
    # save cache in memory
    await redis.set(token, str(vector))
    return response
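
For context on the traceback: aioredis 1.x returns GET replies as bytes, and iterating over a bytes object yields ints, which is why char.strip fails. A minimal sketch of a round trip that sidesteps manual string parsing, assuming plain JSON serialization (the helper below is illustrative, not part of the original app):

import json

async def cache_roundtrip(redis, token: str, vector: list) -> list:
    # store: serialize the list explicitly instead of str(vector)
    await redis.set(token, json.dumps(vector))
    # load: the reply is bytes; decode it, then parse it back into floats
    cache = await redis.get(token)
    return json.loads(cache.decode('utf-8'))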

Related

The date in the server response is different from the date in the database (FastAPI, SQLAlchemy)

I wrote a simple function that should return records from my database (Postgres):
async def all_message(self, chat_id: int) -> List[MessageAll]:
    query = message.select().where(message.c.chat_id == chat_id).order_by(desc(message.c.date_send))
    return await self.database.fetch_all(query)
My endpoint:
@mes_router.get('/{chat_id}')
async def get_message_for_chat(
    chat_id: int,
    message: MessageRepository = Depends(get_message_repository),
):
    return await message.all_message(chat_id=chat_id)
My model:
class MessageAll(BaseModel):
    id: int
    message: str
    date_send: d
    chat_id: int
    sender_id: int
However, the time in my database is different from the time I get in the response from the server.
Example:
Database entry (correct):
message: hello
date: 2023-01-15 01:49:06.535 +0300
Response from the server (wrong):
message: hello
date: 2023-01-14T22:49:06.535436+00:00
How can I display the correct time in the response from the server?
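
For what it's worth, the two timestamps denote the same instant: 2023-01-15 01:49:06 +03:00 is exactly 2023-01-14 22:49:06 UTC, so the driver is normalizing the value to UTC rather than corrupting it. A minimal sketch of rendering it back in a fixed offset before serialization, assuming date_send is a timezone-aware datetime and pydantic v1 (the +03:00 offset is taken from the example above):

from datetime import datetime, timedelta, timezone
from pydantic import BaseModel, validator

LOCAL_TZ = timezone(timedelta(hours=3))  # assumption: the +03:00 shown in the database entry

class MessageAll(BaseModel):
    id: int
    message: str
    date_send: datetime
    chat_id: int
    sender_id: int

    @validator('date_send')
    def to_local(cls, v: datetime) -> datetime:
        # Convert the UTC value the driver returns into the local offset,
        # so the serialized response matches what the database shows.
        return v.astimezone(LOCAL_TZ)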

Returning a big object from activity function throws an error

I have a Durable Functions instance with the following functions:
Starter/trigger function. It uses a Storage Queue trigger. It also implements the singleton-orchestrator pattern, because I want only one instance of the orchestrator to run at a time:
async def main(msg: func.QueueMessage, starter: str) -> None:
    client = df.DurableOrchestrationClient(starter)
    payload = {"day": msg.get_body().decode('utf-8')}
    instance_id = msg.get_body().decode('utf-8')
    await client.terminate(instance_id, "New instance to be called")
    existing_instance = await client.get_status(instance_id)
    status = existing_instance.runtime_status
    if existing_instance.runtime_status in [df.OrchestrationRuntimeStatus.Completed, df.OrchestrationRuntimeStatus.Failed, df.OrchestrationRuntimeStatus.Terminated, None]:
        instance_id = await client.start_new("Orchestrator", instance_id, payload)
        logging.info(f"Started orchestration with ID = '{instance_id}'.")
        return
    else:
        return {
            'status': 409,
            'body': f"An instance with ID '${existing_instance.instance_id}' already exists"
        }
Orchestrator function:
def orchestrator_function(context: df.DurableOrchestrationContext):
    input_context = context.get_input()
    day = input_context.get('day')
    day_date = datetime.datetime.strptime(day, '%y%m%d')
    day_raw = day_date.strftime('%Y%m%d')
    gps_data = yield context.call_activity('GetGPSData', day)
    logging.info("Downloaded blobs")
Activity function:
def main(day: str) -> str:
    blob_data = download_blob(day)
    df_gps = pd.read_csv(blob_data)
    return df_gps.to_json()
As you can see, the activity function simply downloads a blob (around 400MiB) from Azure Blob Storage, creates a dataframe out of it and returns the dataframe in JSON.
However, on execution, I get the following error:
[2022-12-12T10:53:03.677Z] 221101: Function 'Orchestrator (Orchestrator)' failed with an error. Reason: Message: Too many characters. The resulting number of bytes is larger than what can be returned as an int. (Parameter 'count')
Unhandled exception while executing task: System.ArgumentOutOfRangeException: Too many characters. The resulting number of bytes is larger than what can be returned as an int. (Parameter 'count')
[2022-12-12T10:53:03.678Z] at System.Text.UTF32Encoding.GetByteCount(Char* chars, Int32 count, EncoderNLS encoder)
[2022-12-12T10:53:03.678Z] at System.Text.UTF32Encoding.GetByteCount(String s)
[2022-12-12T10:53:03.679Z] at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableTaskExtension.GetIntputOutputTrace(String rawInputOutputData) in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\DurableTaskExtension.cs:line 1480
I would like to split the dataframe and process it concurrently in other activity functions using the fan-in pattern; however, it seems the output of the activity function is too big for the orchestrator. What should I do instead in this scenario? I would be thankful for any suggestions, as I have officially entered the 'losing my mind' phase of debugging this. I've already tried compressing the DataFrame in many ways, to no avail.
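
One common workaround, sketched here under stated assumptions, is to pass a reference instead of the data: the activity writes its result to blob storage and returns only the blob path, which keeps the activity output and the orchestration history small. The container name, connection lookup, and parquet format below are illustrative choices, not anything from the original setup:

import os
import pandas as pd
from azure.storage.blob import BlobServiceClient  # assumption: azure-storage-blob is installed

CONTAINER = "intermediate"  # hypothetical container for intermediate results

def main(day: str) -> str:
    blob_data = download_blob(day)
    df_gps = pd.read_csv(blob_data)
    # Persist the dataframe instead of returning it through the orchestrator.
    service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    blob_path = f"gps/{day}.parquet"
    service.get_blob_client(CONTAINER, blob_path).upload_blob(
        df_gps.to_parquet(), overwrite=True  # to_parquet() with no path returns bytes (pandas >= 1.2)
    )
    return blob_path  # downstream fan-out activities receive only this reference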

How to map shuffled response to input using python asyncio aiohttp

I make asynchronous POST requests to an API using asyncio and aiohttp. I send a parameter (X, Y) (float, float) to get a list of data in response; let's call it scores. The data points in the response are not in the order they were sent, so I cannot zip them to the input by index, which I can do with synchronous requests. I tried mapping input to response on the (X, Y) parameter, which is included in the response, but it gets rounded and decimal places get cut off on the API side. I have no way of finding out the exact rounding mechanism the API uses, and I can't round the values before sending the request.
Is there a way to somehow tag requests and send the tag along as a kind of passive attribute, so that responses can be mapped back?
Or maybe there is another way to map input to response?
I am not sure if my code is needed, but here is a sample.
The scores in the response have to be matched to the corresponding xy input.
By the way: yes, I know that one request carries 1000 xy pairs; you will notice that if you read the _get_scores_async method. It is just the way the API is built, that you can send 1000 xy at a time.
import asyncio
import logging
import random
from typing import Awaitable, Dict, List, Tuple

import aiohttp

logger = logging.getLogger(__name__)

class APIWrapper:
    base_urls = {
        "prod": "https://apiprodlink.com/",
        "stage": "https://apistagelink.com/",
    }
    _max_concurrent_connections = 20

    def __init__(self, user: str, secret: str, env: str) -> None:
        try:
            self.base_url = self.base_urls[env]
        except KeyError:
            raise EnvironmentNotSupported(f"Environment {env} not supported.")
        self._user = user
        self._secret = secret

    @property
    def _headers(self) -> Dict:
        """Returns headers for requests"""
        return {"Accept": "application/json"}

    @property
    def _client_session(self) -> aiohttp.ClientSession:
        """Returns aiohttp ClientSession"""
        session = aiohttp.ClientSession(
            auth=aiohttp.BasicAuth(self._user, self._secret), headers=self._headers
        )
        return session

    async def _post_url_async(
        self,
        url: str,
        session: aiohttp.ClientSession,
        semaphore: asyncio.Semaphore,
        **params,
    ) -> Awaitable:
        """Creates awaitable post request. To be awaited with async function.

        Parameters
        ----------
        url : str
            post request will be done to this url
        session : aiohttp.ClientSession
            instance of ClientSession with auth and headers
        semaphore : asyncio.Semaphore
            Semaphore with defined max concurrent connections

        Returns
        -------
        Awaitable
            Coroutine object from response
        """
        async with semaphore, session.post(url=url, json=params) as res:
            res.raise_for_status()
            response = await res.json()
        return response

    async def _get_scores_async(self, xy: List[Tuple]) -> Awaitable:
        """Creates coroutine of awaitable requests to scores endpoint

        Parameters
        ----------
        xy : List[Tuple]

        Returns
        -------
        Awaitable
            Coroutine of tasks to be run
        """
        PER_REQUEST_LIMIT = 1000
        semaphore = asyncio.Semaphore(self._max_concurrent_connections)
        tasks = []
        async with self._client_session as session:
            for batch in range(0, len(xy), PER_REQUEST_LIMIT):
                subset = xy[batch : batch + PER_REQUEST_LIMIT]
                task = asyncio.create_task(
                    self._post_url_async(
                        f"{self.base_url}scores/endpoint",
                        session,
                        semaphore,
                        xy_param=subset,
                    )
                )
                tasks.append(task)
            responses = await asyncio.gather(*tasks)
        return responses

    def get_scores(self, xy: List[Tuple]) -> List[Dict]:
        """Get scores for given xy

        Parameters
        ----------
        xy : List[Tuple]

        Returns
        -------
        List[Dict]
        """
        response = asyncio.run(self._get_scores_async(xy))
        return [x for batch in response for x in batch]

if __name__ == "__main__":
    api_client = APIWrapper("user", "secret", "prod")
    xy = [(random.uniform(1, 100), random.uniform(1, 100)) for i in range(0, 500000)]
    scores = api_client.get_scores(xy)
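
One hedged observation that may already solve this: asyncio.gather returns results in the order the tasks were created, so each batch response can be paired with the subset that produced it without the API cooperating at all. A minimal sketch of tagging each request with its own input (the _post_tagged wrapper is hypothetical, not part of the original class):

async def _post_tagged(self, subset, session, semaphore):
    # Carry the exact input alongside its response so the caller can
    # re-associate them even though the API echoes only rounded (X, Y).
    response = await self._post_url_async(
        f"{self.base_url}scores/endpoint", session, semaphore, xy_param=subset
    )
    return subset, response

get_scores would then yield (input_batch, scores) pairs instead of a flat list; if the API also shuffles items within a single batch, the tag at least narrows the matching problem to 1000 candidates with known inputs.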

Why isn't the value updated when publishing to a Redis channel?

I'm trying to use Redis in my FastAPI application and struggling to update a value when publishing to a channel.
This is my Redis client setup:
import logging

import redis as _redis

logger = logging.getLogger(__name__)

REDIS_URL = "redis://redis:6379"

class RedisClient:
    __instance = None
    client: _redis.Redis

    def __new__(cls) -> "RedisClient":
        if cls.__instance is None:
            cls.__instance = object.__new__(cls)
            try:
                logger.info("Connecting to Redis")
                client = _redis.Redis.from_url(REDIS_URL)
                cls.client = client
            except Exception:
                logger.error("Unable to connect to Redis")
            logger.info("Connected to Redis")
        return cls.__instance

    def disconnect(self):
        logger.info("Closing Redis connection")
        self.client.connection_pool.disconnect()

redis = RedisClient().client
redis_pubsub = redis.pubsub()
This is the POST route I'm using to test updating the value:
@router.post("/", response_model=Any)
async def post(tracing_no: int) -> Any:
    data = {"tracing_no": tracing_no, "result": 200, "client_id": "10"}
    data = json.dumps(data)
    result = redis.publish(TestChannel.TestTopic.value, data)
    return redis.get(TestChannel.TestTopic.value)
This code always returns "{\"tracing_no\": 5, \"result\": 200, \"client_id\": \"10\"}", no matter what number I give as tracing_no when I send the request. I think 5 was the first number I used when testing, and that value is in my dump.rdb file.
Any ideas why my tracing_no isn't updating?
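
For context on what GET returns here: PUBLISH only fans the message out to current subscribers and never writes a key, so redis.get reads whatever was last stored under that name (presumably by an earlier SET during testing). A minimal sketch of reading the published value through a subscriber instead (the channel name is an assumption standing in for TestChannel.TestTopic.value):

import json
import time

import redis as _redis

r = _redis.Redis.from_url("redis://redis:6379")
pubsub = r.pubsub(ignore_subscribe_messages=True)
pubsub.subscribe("TestTopic")  # assumption: TestChannel.TestTopic.value == "TestTopic"

r.publish("TestTopic", json.dumps({"tracing_no": 7, "result": 200, "client_id": "10"}))

# Poll briefly: the payload arrives as a pub/sub message, not as a stored key.
deadline = time.time() + 2.0
while time.time() < deadline:
    message = pubsub.get_message(timeout=0.1)
    if message is not None:
        print(json.loads(message["data"]))
        break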

A blocked Python async function invocation also blocks another async function

I use FastAPI to develop data-layer APIs accessing SQL Server. No matter whether I use pytds or pyodbc, if a database transaction causes any request to hang, all the other requests are blocked (even requests without any database operation).
To reproduce:
Intentionally open a serializable SQL Server session, begin a transaction, and do not roll back or commit:
INSERT INTO [dbo].[KVStore] VALUES ('1', '1', 0)
begin tran
SET TRANSACTION ISOLATION LEVEL Serializable
SELECT * FROM [dbo].[KVStore]
Then send a request to the API with an async handler function like this:
def kv_delete_by_key_2_sql():
    conn = pytds.connect(dsn='192.168.0.1', database=cfg.kvStore_db, user=cfg.kvStore_uid,
                         password=cfg.kvStore_upwd, port=1435, autocommit=True)
    engine = conn.cursor()
    try:
        sql = "delete KVStore; commit"
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future = executor.submit(engine.execute, sql)
            rs = future.result()
        j = {
            'success': True,
            'rowcount': rs.rowcount
        }
        return jsonable_encoder(j)
    except Exception as exn:
        j = {
            'success': False,
            'reason': exn_handle(exn)
        }
        return jsonable_encoder(j)

@app.post("/kvStore/delete")
async def kv_delete(request: Request, type_: Optional[str] = Query(None, max_length=50)):
    request_data = await request.json()
    return kv_delete_by_key_2_sql()
And send a request to the API of the same app with an async handler function like this:
async def hangit0(request: Request, t: int = Query(0)):
    print(t, datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    await asyncio.sleep(t)
    print(t, datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    j = {
        'success': True
    }
    return jsonable_encoder(j)

@app.get("/kvStore/hangit/")
async def hangit(request: Request, t: int = Query(0)):
    return await hangit0(request, t)
I expected step 2 to hang and step 3 to return after about 2 seconds. However, step 3 never returns while the transaction is neither committed nor rolled back...
How do I make these handler functions run concurrently?
The reason is that rs = future.result() is actually a blocking call - see the Python docs. Unfortunately, executor.submit() doesn't return an awaitable object (concurrent.futures.Future is different from asyncio.Future).
You can use asyncio.wrap_future, which takes a concurrent.futures.Future and returns an asyncio.Future (see the Python docs). The new Future object is awaitable, so you can convert your blocking function into an async function.
An example:
import asyncio
import concurrent.futures

async def my_async():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(lambda x: x + 1, 1)
        return await asyncio.wrap_future(future)

print(asyncio.run(my_async()))
In your code, simply change the rs = future.result() to rs = await asyncio.wrap_future(future) and make the whole function async. That should do the magic, good luck! :)
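
For completeness, the same effect is often written with the event loop's own executor hook, which avoids managing a ThreadPoolExecutor by hand; a minimal sketch, not specific to the code above:

import asyncio

def blocking_call(x: int) -> int:
    return x + 1  # stands in for a blocking call such as engine.execute(sql)

async def my_async() -> int:
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default thread pool; the blocking call
    # runs in a worker thread while the event loop stays free to serve others.
    return await loop.run_in_executor(None, blocking_call, 1)

print(asyncio.run(my_async()))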
