How to decode Arrow Flight `FlightData` with a pure gRPC client

How to decode Arrow Flight `FlightData` with a pure gRPC client - python

I came across a situation where we need to use a plain gRPC client (through the grpc.aio API) to talk to an Arrow Flight gRPC server.
The DoGet call did make it to the server, and we have received a FlightData in response. If our understanding of the Flight gRPC definition is correct, the response contains a flatbuffers message that can somehow be decoded into a RecordBatch.
Following, is the client-side code,
import asyncio
import pathlib
import grpc
import pyarrow as pa
import pyarrow.flight as pf
import flight_pb2, flight_pb2_grpc
async def main():
ticket = pf.Ticket("tick")
sock_file = pathlib.Path.cwd().joinpath("arena.sock").resolve()
async with grpc.aio.insecure_channel(f"unix://{sock_file}") as channel:
stub = flight_pb2_grpc.FlightServiceStub(channel)
async for data in stub.DoGet(flight_pb2.Ticket(ticket=ticket.ticket)):
assert type(data) is flight_pb2.FlightData
print(data)
# How to convert data into a RecordBatch?
asyncio.run(main())
Currently we stuck on this last step of decoding the FlightData response.
The question is two fold,
are there some existing facilities form pyarrow.flight that we can use to decode a python grpc object of the FlightData type;
if #1 is not possible, what are some other options to decode the content of the FlightData and reconstruct a RecordBatch from scratch?
The main interest here is to use the AsyncIO of plain gRPC client. Supposedly, this is not feasible with the current version of Arrow Flight gRPC client.

There is indeed no utility exposed in pyarrow.flight for this.
ArrowData contains, among other things, the Arrow IPC header and body. So you can instead decode it using pyarrow.ipc. Here's an example:
import asyncio
import pathlib
import struct
import grpc
import pyarrow as pa
import pyarrow.flight as pf
import Flight_pb2, Flight_pb2_grpc
async def main():
ticket = pf.Ticket("tick")
async with grpc.aio.insecure_channel("localhost:1234") as channel:
stub = Flight_pb2_grpc.FlightServiceStub(channel)
schema = None
async for data in stub.DoGet(Flight_pb2.Ticket(ticket=ticket.ticket)):
# 4 bytes: Need IPC continuation token
token = b'\xff\xff\xff\xff'
# 4 bytes: message length (little-endian)
length = struct.pack('<I', len(data.data_header))
buf = pa.py_buffer(token + length + data.data_header + data.data_body)
message = pa.ipc.read_message(buf)
print(message)
if schema is None:
# This should work but is unimplemented
# print(pa.ipc.read_schema(message))
schema = pa.ipc.read_schema(buf)
print(schema)
else:
batch = pa.ipc.read_record_batch(message, schema)
print(batch)
print(batch.to_pydict())
asyncio.run(main())
Server:
import pyarrow.flight as flight
import pyarrow as pa
class TestServer(flight.FlightServerBase):
def do_get(self, context, ticket):
table = pa.table([[1,2,3,4]], names=["a"])
return flight.RecordBatchStream(table)
TestServer("grpc://localhost:1234").serve()
There's some discussion about async Flight APIs, please join the dev# mailing list if you would like to chime in.

Related

Sending payload to IoT Hub for using in Azure Digital Twin using an Azure Function

Apologies for any incorrect formatting, long time since I posted anything on stack overflow.
I'm looking to send a json payload of data to Azure IoT Hub which I am then going to process using an Azure Function App to display real-time telemetry data in Azure Digital Twin.
I'm able to post the payload to IoT Hub and view it using the explorer fine, however my function is unable to take this and display this telemetry data in Azure Digital Twin. From Googling I've found that the json file needs to be utf-8 encrypted and set to application/json, which I think might be the problem with my current attempt at fixing this.
I've included a snipped of the log stream from my azure function app below, as shown the "body" part of the message is scrambled which is why I think it may be an issue in how the payload is encoded:
"iothub-message-source":"Telemetry"},"body":"eyJwb3dlciI6ICIxLjciLCAid2luZF9zcGVlZCI6ICIxLjciLCAid2luZF9kaXJlY3Rpb24iOiAiMS43In0="}
2023-01-27T13:39:05Z [Error] Error in ingest function: Cannot access child value on Newtonsoft.Json.Linq.JValue.
My current test code is below for sending payloads to IoT Hub, with the potential issue being that I'm not encoding the payload properly.
import datetime, requests
import json
deviceID = "JanTestDT"
IoTHubName = "IoTJanTest"
iotHubAPIVer = "2018-04-01"
iotHubRestURI = "https://" + IoTHubName + ".azure-devices.net/devices/" + deviceID + "/messages/events?api-version=" + iotHubAPIVer
SASToken = 'SharedAccessSignature'
Headers = {}
Headers['Authorization'] = SASToken
Headers['Content-Type'] = "application/json"
Headers['charset'] = "utf-8"
datetime = datetime.datetime.now()
payload = {
'power': "1.7",
'wind_speed': "1.7",
'wind_direction': "1.7"
}
payload2 = json.dumps(payload, ensure_ascii = False).encode("utf8")
resp = requests.post(iotHubRestURI, data=payload2, headers=Headers)
I've attempted to encode the payload correctly in several different ways including utf-8 within request.post, however this produces an error that a dict cannot be encoded or still has the body encrypted within the Function App log stream unable to decipher it.
Thanks for any help and/or guidance that can be provided on this - happy to elaborate further on anything that is not clear.

is there any particular reason why you want to use Azure IoT Hub Rest API end point instead of using Python SDK? Also, even though you see the values in JSON format when viewed through Azure IoT Explorer, the message format when viewed through a storage end point such as blob reveals a different format as you pointed.
I haven't tested the Python code with REST API, but I have a Python SDK that worked for me. Please refer the code sample below
import os
import random
import time
from datetime import date, datetime
from json import dumps
from azure.iot.device import IoTHubDeviceClient, Message
def json_serial(obj):
"""JSON serializer for objects not serializable by default json code"""
if isinstance(obj, (datetime, date)):
return obj.isoformat()
raise TypeError("Type %s not serializable" % type(obj))
CONNECTION_STRING = "<AzureIoTHubDevicePrimaryConnectionString>"
TEMPERATURE = 45.0
HUMIDITY = 60
MSG_TXT = '{{"temperature": {temperature},"humidity": {humidity}, "timesent": {timesent}}}'
def run_telemetry_sample(client):
print("IoT Hub device sending periodic messages")
client.connect()
while True:
temperature = TEMPERATURE + (random.random() * 15)
humidity = HUMIDITY + (random.random() * 20)
x = datetime.now().isoformat()
timesent = dumps(datetime.now(), default=json_serial)
msg_txt_formatted = MSG_TXT.format(
temperature=temperature, humidity=humidity, timesent=timesent)
message = Message(msg_txt_formatted, content_encoding="utf-8", content_type="application/json")
print("Sending message: {}".format(message))
client.send_message(message)
print("Message successfully sent")
time.sleep(10)
def main():
print("IoT Hub Quickstart #1 - Simulated device")
print("Press Ctrl-C to exit")
client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
try:
run_telemetry_sample(client)
except KeyboardInterrupt:
print("IoTHubClient sample stopped by user")
finally:
print("Shutting down IoTHubClient")
client.shutdown()
if __name__ == '__main__':
main()
You can edit the MSG_TXT variable in the code to match the payload format and pass the values. Note that the SDK uses Message class from Azure IoT Device library which has an overload for content type and content encoding. Here is how I have passed the overloads in the code message = Message(msg_txt_formatted, content_encoding="utf-8", content_type="application/json")
I have validated the message by routing to a Blob Storage container and could see the telemetry data in the JSON format. Please refer below image screenshot referring the data captured at end point.
Hope this helps!

How do I authorize with a Telethon QR code?

I'm trying to authorize in telethon via QR.
In the docs of telegram I found the method exportLoginToken, which allows you to create a token for qr code.
If I understand it correctly, the desktop telegram client uses this mechanics. You scan the qr from an authorized device and the session opens on the pc.
Telethon also has it. Example from the documentation:
with TelegramClient(name, api_id, api_hash) as client:
result = client(functions.auth.ExportLoginTokenRequest(
api_id=42,
api_hash='some string here',
except_ids=[42]
))
print(result.stringify())
If we don't have an active session, it will create one when we enter as telethon.
For this we need a number and a code. Or an active session and a connected client
The telethon docs say:
Note that you must be connected before invoking this, as with any other request.
So in order to create an authorization token on the new device I must already be authorized?
How do I get a token for the qr code on a device that has no active sessions?

When the documentation says that you must be connected, it means that you must call the method TelegramClient.connect which connects you to telegram.
So if you do this, it will work out:
import telethon
from telethon import TelegramClient
from qrcode import QRCode
from base64 import urlsafe_b64encode as base64url
qr = QRCode()
def gen_qr(token:str):
qr.clear()
qr.add_data(token)
qr.print_ascii()
def display_url_as_qr(url):
print(url) # do whatever to show url as a qr to the user
gen_qr(url)
async def main(client: telethon.TelegramClient):
if(not client.is_connected()):
await client.connect()
client.connect()
qr_login = await client.qr_login()
print(client.is_connected())
r = False
while not r:
display_url_as_qr(qr_login.url)
# Important! You need to wait for the login to complete!
try:
r = await qr_login.wait(10)
except:
await qr_login.recreate()
TELEGRAM_API_ID=
TELEGRAM_API_HASH=
client = TelegramClient("SessionName", TELEGRAM_API_ID, TELEGRAM_API_HASH)
client.loop.run_until_complete(main(client))

Python-binance futures user data websocket

I need help with the code to stary binance futures userdata stream as the package is not working.
I tried following changes in the code but it failed stating that APIERROR(code=0) Invalid Jason error message from Binance. The tweaks I made to subscribe futures userdatastream are as follows:
In my script:
bm = BinanceSocketManager(client)
def processmessage(msg):
print(msg)
conn_keys = bm.start_user_socket(processmsg)
bm.start()
In websockets.py in def start_socket: I replaced stream_url with fstream_url
In client.py in def create_api_uri: I replaced api_url with futures_url
I am in need of binance futures’s userdata stream websockets.

I am also using the package as you.
What you need is to re-write the process_data logic to get the data, and then us the code as following:
bm.start_futures_socket(process_data)
bm.start()

I was also trying to do the same. I couldn't get python-binance to work, so I switched to the unicorn_binance_websocket_api.
Quick example:
from unicorn_binance_websocket_api.unicorn_binance_websocket_api_manager import BinanceWebSocketApiManager
binance_websocket_api_manager = BinanceWebSocketApiManager(exchange="binance.com-futures")
binance_websocket_api_manager.create_stream('arr', '!userData', api_key=<Your API key>, api_secret=<Your secret API key>, output="UnicornFy")
while True:
data = binance_websocket_api_manager.pop_stream_data_from_stream_buffer()
if data:
#do something with that data
pass
Maybe that helps.

this is my code:
binance = await AsyncClient.create(config.API_KEY, config.API_SECRET)
bsm = BinanceSocketManager(binance)
async with bsm.futures_multiplex_socket(streams) as ms:
while True:
msg = await ms.recv()
...

Threads can only be started once in Django Channels

I created a simple Django Channels consumer that should connects to an external source, retrieve data and send it to the client. So, the user opens the page > the consumer connects to the external service and gets the data > the data is sent to the websocket.
Here is my code:
import json
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
import json
from binance.websockets import BinanceSocketManager
import time
import asyncio
client = Client('', '')
trades = client.get_recent_trades(symbol='BNBBTC')
bm = BinanceSocketManager(client)
class EchoConsumer(AsyncJsonWebsocketConsumer):
async def connect(self):
await self.accept()
await self.send_json('test')
bm.start_trade_socket('BNBBTC', self.process_message)
bm.start()
def process_message(self, message):
JSON1 = json.dumps(message)
JSON2 = json.loads(JSON1)
#define variables
Rate = JSON2['p']
Quantity = JSON2['q']
Symbol = JSON2['s']
Order = JSON2['m']
asyncio.create_task(self.send_json(Rate))
print(Rate)
This code works when i open one page; if i try to open a new window with a new account, though, it will throw the following error:
File "C:\Users\User\Desktop\Heroku\github\master\myapp\consumers.py", line 54, in connect
bm.start()
File "C:\Users\User\lib\threading.py", line 843, in start
raise RuntimeError("threads can only be started once")
threads can only be started once
I'm new to Channels, so this is a noob question, but how can i fix this problem? What i wanted to do was: user opens the page and gets the data, another user opens the page and gets the data; is there no way to do that? Or am i simply misunderstanding how Django Channels and websockets works?

Do you really need a secondary thread ?
class EchoConsumer(AsyncJsonWebsocketConsumer):
symbol = ''
async def connect(self):
self.symbol = 'BNBBTC'
# or, more probably, retrieve the value for "symbol" from query_string
# so the client can specify which symbol he's interested into:
# socket = new WebSocket("ws://.../?symbol=BNBBTC");
await self.accept()
def process_message(self, message):
# PSEUDO-CODE BELOW !
if self.symbol == message['symbol']:
await self.send({
'type': 'websocket.send',
'text': json.dumps(message),
})
For extra flexibility, you might also accept al list of symbols from the client, instead:
//HTML
socket = new WebSocket("ws://.../?symbols=XXX,YYY,ZZZ");
then (in the consumer):
class EchoConsumer(AsyncJsonWebsocketConsumer):
symbols = []
async def connect(self):
# here we need to parse "?symbols=XXX,YYY,ZZZ" ...
# the code below has been stolen from another project of mine and should be suitably adapted
params = urllib.parse.parse_qs(self.scope.get('query_string', b'').decode('utf-8'))
try:
self.symbols = json.loads(params.get('symbols', ['[]'])[0])
except:
self.symbols = []
def process_message(self, message):
if message['symbol'] in self.symbols:
...

I'm no Django developer, but if I understand correctly, the function connect is being called more than once-- and bm.start references the same thread most likely made in bm.start_trade_socket (or somewhere else in connect). In conclusion, when bm.start is called, a thread is started, and when it is done again, you get that error.

Here start() Start the thread’s activity.
It should be called at most once per thread object. You have made a global object of BinanceSocketManager as "bm".
It will always raise a RuntimeError if called more than once on the same thread object.
Please refer the below mentioned code, it may help you
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
import json
from binance.websockets import BinanceSocketManager
import time
import asyncio
class EchoConsumer(AsyncJsonWebsocketConsumer):
client = Client('', '')
trades = client.get_recent_trades(symbol='BNBBTC')
bm = BinanceSocketManager(client)
async def connect(self):
await self.accept()
await self.send_json('test')
self.bm.start_trade_socket('BNBBTC', self.process_message)
self.bm.start()
def process_message(self, message):
JSON1 = json.dumps(message)
JSON2 = json.loads(JSON1)
#define variables
Rate = JSON2['p']
Quantity = JSON2['q']
Symbol = JSON2['s']
Order = JSON2['m']
asyncio.create_task(self.send_json(Rate))
print(Rate)

How can I download the chat history of a group in Telegram?

I would like to download the chat history (all messages) that were posted in a public group on Telegram. How can I do this with python?
I've found this method in the API https://core.telegram.org/method/messages.getHistory which I think looks like what I'm trying to do. But how do I actually call it? It seems there's no python examples for the MTproto protocol they use.
I also looked at the Bot API, but it doesn't seem to have a method to download messages.

You can use Telethon. Telegram API is fairly complicated and with the telethon, you can start using telegram API in a very short time without any pre-knowledge about the API.
pip install telethon
Then register your app (taken from telethon):
the link is: https://my.telegram.org/
Then to obtain message history of a group (assuming you have the group id):
chat_id = YOUR_CHAT_ID
api_id=YOUR_API_ID
api_hash = 'YOUR_API_HASH'
from telethon import TelegramClient
from telethon.tl.types.input_peer_chat import InputPeerChat
client = TelegramClient('session_id', api_id=api_id, api_hash=api_hash)
client.connect()
chat = InputPeerChat(chat_id)
total_count, messages, senders = client.get_message_history(
chat, limit=10)
for msg in reversed(messages):
# Format the message content
if getattr(msg, 'media', None):
content = '<{}> {}'.format( # The media may or may not have a caption
msg.media.__class__.__name__,
getattr(msg.media, 'caption', ''))
elif hasattr(msg, 'message'):
content = msg.message
elif hasattr(msg, 'action'):
content = str(msg.action)
else:
# Unknown message, simply print its class name
content = msg.__class__.__name__
text = '[{}:{}] (ID={}) {}: {} type: {}'.format(
msg.date.hour, msg.date.minute, msg.id, "no name",
content)
print (text)
The example is taken and simplified from telethon example.

With an update (August 2018) now Telegram Desktop application supports saving chat history very conveniently.
You can store it as json or html formatted.
To use this feature, make sure you have the latest version of Telegram Desktop installed on your computer, then click Settings > Export Telegram data.
https://telegram.org/blog/export-and-more

The currently accepted answer is for very old versions of Telethon. With Telethon 1.0, the code can and should be simplified to the following:
# chat can be:
# * int id (-12345)
# * str username (#chat)
# * str phone number (+12 3456)
# * Peer (types.PeerChat(12345))
# * InputPeer (types.InputPeerChat(12345))
# * Chat object (types.Chat)
# * ...and many more types
chat = ...
api_id = ...
api_hash = ...
from telethon.sync import TelegramClient
client = TelegramClient('session_id', api_id, api_hash)
with client:
# 10 is the limit on how many messages to fetch. Remove or change for more.
for msg in client.iter_messages(chat, 10):
print(msg.sender.first_name, ':', msg.text)
Applying any formatting is still possible but hasattr is no longer needed. if msg.media for example would be enough to check if the message has media.
A note, if you're using Jupyter, you need to use async directly:
from telethon import TelegramClient
client = TelegramClient('session_id', api_id, api_hash)
# Note `async with` and `async for`
async with client:
async for msg in client.iter_messages(chat, 10):
print(msg.sender.first_name, ':', msg.text)

Now, you can use TDesktop to export chats.
Here is the blog post about Aug 2018 update.
Original Answer:
Telegram MTProto is hard to use to newbies, so I recommend telegram-cli.
You can use third-party tg-export script, but still not easy to newbies too.

You can use the Telethon library. for this you need to register your app and connect your client code to it (look at this).
Then to obtain message history of a entry (such as channel, group or chat):
from telethon.sync import TelegramClient
from telethon.errors import SessionPasswordNeededError
client = TelegramClient(username, api_id, api_hash, proxy=("socks5", proxy_ip, proxy_port)) # if in your country telegram is banned, you can use the proxy, otherwise remove it.
client.start()
# for login
if not client.is_user_authorized():
client.send_code_request(phone)
try:
client.sign_in(phone, input('Enter the code: '))
except SessionPasswordNeededError:
client.sign_in(password=input('Password: '))
async for message in client.iter_messages(chat_id, wait_time=0):
messages.append(Message(message))
# write your code

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to decode Arrow Flight `FlightData` with a pure gRPC client - python

Related

Sending payload to IoT Hub for using in Azure Digital Twin using an Azure Function

How do I authorize with a Telethon QR code?

Python-binance futures user data websocket

Threads can only be started once in Django Channels

How can I download the chat history of a group in Telegram?

Categories

Resources