Can't send BytesIO to Telegram bot - Python

Situation:
I need to create an in-memory CSV file and send it to a bot.
According to this article and the bot documentation, I assume I can send CSV files. One difference: I use another API wrapper, telebot. According to its docs, it also allows sending binary files.
So I am trying to test this like so:
def test_buf():
    import csv
    import io
    test_data = [[1, 2, 3], ["a", "b", "c"]]
    # csv module can write data to an io.StringIO buffer only
    s = io.StringIO()
    csv.writer(s).writerows(test_data)
    s.seek(0)
    # python-telegram-bot library can send files only from an io.BytesIO buffer
    # we need to convert StringIO to BytesIO
    buf = io.BytesIO()
    # extract csv-string, convert it to bytes and write to buffer
    buf.write(s.getvalue().encode())
    buf.seek(0)
    # set a filename with the file's extension
    buf.name = 'test_data.csv'
    return buf
And then
csv_output = test_buf()
bot.send_document(chat_id, csv_output, caption='Caption_text')
And I get an error:
"ApiTelegramException occurred, args=("A request to the Telegram API was unsuccessful. Error code: 400 Description: Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 7",)
What do i do wrong? Maybe I don't fully understand difference between BytesIO and binary format? If it's different how to transform bytesio to binary in-memory so i could send data
For now I can send the file, just without the telebot library:
requests.post(f'https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendDocument',
              data={"chat_id": chat_id, "caption": 'caption_text'},
              files={'document': csv_output})
So I do not understand what the issue with the library is. I'm sure I'm just missing some basic stuff.
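A hedged observation rather than a definitive answer: byte offset 7 in 'Caption_text' is exactly the underscore, and Telegram's "can't parse entities" error is about parsing the caption text, not the document bytes. So the failure likely comes from a Markdown parse_mode being applied to the caption (the raw requests.post call sends no parse_mode, which would explain why it works). A minimal sketch, assuming pyTelegramBotAPI's send_document and its parse_mode parameter:

import telebot  # pyTelegramBotAPI

bot = telebot.TeleBot('TOKEN')  # placeholder token; no global parse_mode set
csv_output = test_buf()
# with no parse_mode in effect, the underscore is sent literally
bot.send_document(chat_id, csv_output, caption='Caption_text')
# if a Markdown parse_mode is in effect, escape the underscore instead:
# bot.send_document(chat_id, csv_output, caption='Caption\\_text', parse_mode='MarkdownV2')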

Related

How to use Avro with Python to serialize a dictionary, write it as bytes to BytesIO, and read and deserialize it with a schema correctly?

I want to use Avro to serialize a dictionary to produce a bytestring, write it to io.BytesIO, then read it back and deserialize it.
Q1: Shall I load the schema from an Avro file as avro.schema.RecordSchema, or can I load it from a JSON file with json.load?
Q2: When BytesIO is used, shall I call seek(0)?
Q3: I use BytesIO just to pass the serialized bytestring along so I can read it back and deserialize it. I want to do this in memory, hence why I do not write/read a file. Is that OK?
import io
import json
import avro.io
import avro.schema

msg = {"name": "foo", "favorite_number": 1, "favorite_color": "pink"}

with open("schema", "rb") as f:
    SCHEMA = avro.schema.parse(f.read())

writer = avro.io.DatumWriter(SCHEMA)
bytes_writer = io.BytesIO()
encoder = avro.io.BinaryEncoder(bytes_writer)
writer.write(msg, encoder)

b = bytes_writer.getvalue()

reader = avro.io.DatumReader(SCHEMA)
bytes_reader = io.BytesIO(b)
decoder = avro.io.BinaryDecoder(bytes_reader)
deserialized_json = reader.read(decoder)
EDIT:
The documentation contains an example of serialization/deserialization with file write/read:
https://avro.apache.org/docs/1.8.2/gettingstartedpython.pdf
They use DataFileWriter, which, according to the documentation, will
"verify that the items we write are valid items and write the appropriate fields."
If I don't use it and instead use DatumWriter alone to write to BytesIO, am I doing everything OK? The documentation says I can use DatumWriter separately.
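For comparison, a hedged sketch of the same round trip through the file-container API (DataFileWriter/DataFileReader), which embeds the schema and validates each record. The flush()-instead-of-close() detail and the seek(0) before reading are assumptions about keeping the BytesIO usable; the rewind also illustrates Q2 (yes, seek(0) before read(), though getvalue() works without it):

import io
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

with open("schema", "rb") as f:
    SCHEMA = avro.schema.parse(f.read())

buf = io.BytesIO()
writer = DataFileWriter(buf, DatumWriter(), SCHEMA)
writer.append({"name": "foo", "favorite_number": 1, "favorite_color": "pink"})
writer.flush()  # flush instead of close(), so buf is not closed along with the writer
buf.seek(0)     # rewind before reading the container back
reader = DataFileReader(buf, DatumReader())
for record in reader:
    print(record)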

Minio Python Client: Upload Bytes directly

I read the Minio docs and I see two methods to upload data:
put_object() - this needs an io-stream
fput_object() - this reads a file on disk
I want to test Minio and upload some data I just created with numpy.random.bytes().
How do I upload data which is stored in a variable in the Python interpreter?
Take a look at io.BytesIO. It lets you wrap a bytes object in a stream which you can give to Minio.
For example:
import io
from minio import Minio

value = "Some text I want to upload"
value_as_bytes = value.encode('utf-8')
value_as_a_stream = io.BytesIO(value_as_bytes)

client = Minio("my-url-here", ...)  # Edit this bit to connect to your Minio server
client.put_object("my_bucket", "my_key", value_as_a_stream, length=len(value_as_bytes))
I was in a similar situation: trying to store a pandas DataFrame as a Feather file in Minio.
I needed to store bytes directly using the Minio client. In the end the code looked like this:
from io import BytesIO
import pandas as pd
import numpy
import minio

# Create the client
client = minio.Minio(
    endpoint="localhost:9000",
    access_key="access_key",
    secret_key="secret_key",
    secure=False
)

# Create sample dataset
df = pd.DataFrame({
    "a": numpy.random.random(size=1000),
})

# Create a BytesIO instance that will behave like a file opened in binary mode
feather_output = BytesIO()
# Write feather file
df.to_feather(feather_output)
# Get number of bytes
nb_bytes = feather_output.tell()
# Go back to the start of the opened file
feather_output.seek(0)

# Put the object into minio
client.put_object(
    bucket_name="datasets",
    object_name="demo.feather",
    length=nb_bytes,
    data=feather_output
)
I had to use .seek(0) in order for Minio to insert the correct number of bytes.
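As a side note: if computing the length up front is awkward, minio-py 7.x (to the best of my knowledge) also accepts length=-1 for streams of unknown size, provided an explicit part_size is given. A hedged sketch reusing the names from the snippet above:

# assumes minio-py 7.x, where length=-1 requires an explicit part_size
client.put_object(
    bucket_name="datasets",
    object_name="demo.feather",
    data=feather_output,
    length=-1,
    part_size=10 * 1024 * 1024,  # 10 MiB parts
)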
@gcharbon: this solution did not work for me. client.put_object() only accepts bytes-like objects.
Here is my solution:
from minio import Minio
import pandas as pd
import io

# df is an existing DataFrame; a string with csv data would work here as well
csv_bytes = df.to_csv().encode('utf-8')
csv_buffer = io.BytesIO(csv_bytes)

# Create the client
client = Minio(
    endpoint="localhost:9000",
    access_key="access_key",
    secret_key="secret_key",
    secure=False
)

client.put_object("bucketname",
                  "objectname",
                  data=csv_buffer,
                  length=len(csv_bytes),
                  content_type='application/csv')

Send data as .csv email attachment - Python

I have data, let's say:
data = [
    ['header_1', 'header_2'],
    ['row_1_1', 'row_1_2'],
    ['row_2_1', 'row_2_2'],
]
I need to send that data as a .csv file attachment to an email message.
I cannot save it as a .csv and then attach the existing file - the application runs in the Google App Engine sandbox environment, so no files can be saved.
As I understand it, an email attachment consists of a file name and the file encoded as base64.
I tried to build the attachment body in the following way:
import sys
import base64
import csv

if sys.version_info >= (3, 0):
    from io import StringIO
else:
    from StringIO import StringIO

in_memory_data = StringIO()
csv.writer(in_memory_data).writerows(data)
encoded = base64.b64encode(in_memory_data.getvalue())
But as a result, I received by email not a valid file with 2 columns and 3 rows, but just one long string in the file.
What am I doing wrong?
I've found the mistake. I should have converted it to a bytearray instead of encoding it to base64:
encoded = bytearray(in_memory_data.getvalue(), "utf-8")
It worked fine that way.
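For what it's worth, on Python 3 the stdlib email package can handle the transfer encoding itself; a minimal sketch, assuming an EmailMessage-based flow (the addresses and filename are placeholders):

import csv
import io
from email.message import EmailMessage

buf = io.StringIO()
csv.writer(buf).writerows(data)

msg = EmailMessage()
msg['Subject'] = 'CSV attached'
msg['From'] = 'sender@example.com'    # placeholder
msg['To'] = 'recipient@example.com'   # placeholder
msg.set_content('See the attached CSV.')
# add_attachment base64-encodes the payload and sets the MIME headers for us
msg.add_attachment(buf.getvalue().encode('utf-8'),
                   maintype='text', subtype='csv',
                   filename='data.csv')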

How do I read a csv stored in S3 with csv.DictReader?

I have code that fetches an AWS S3 object. How do I read this StreamingBody with Python's csv.DictReader?
import boto3, csv
session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']
#csv.DictReader(???)
The code would be something like this:
import boto3
import csv

# get a handle on s3
s3 = boto3.resource(u's3')
# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')
# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')
# get the object
response = obj.get()
# read the contents of the file and split it into a list of lines
# (splitlines, not split, so spaces inside fields survive)
# for python 2:
lines = response[u'Body'].read().splitlines()
# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').splitlines()
# now iterate over those lines
for row in csv.DictReader(lines):
    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)
You can compact this a bit in actual code, but I tried to keep it step-by-step to show the object hierarchy with boto3.
Edit: Per your comment about avoiding reading the entire file into memory: I haven't run into that requirement so I can't speak authoritatively, but I would try wrapping the stream so I could get a text file-like iterator. For example, you could use the codecs library to replace the csv parsing section above with something like:
import codecs
for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
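Putting the pieces together, a hedged, self-contained sketch of the streaming variant (the bucket and key are placeholders; codecs.getreader only needs the body's read() method, so it works with boto3's StreamingBody):

import codecs
import csv

import boto3

s3 = boto3.resource('s3')
body = s3.Object('bucket-name', 'test.csv').get()['Body']

# decode and parse lazily instead of reading the whole object into memory
for row in csv.DictReader(codecs.getreader('utf-8')(body)):
    print(row)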

Working with Python Requests response raw file-like object (process pcap file without saving it to disk)

A pcap file is downloaded from a URL with the help of the Python (2.7.9) Requests library:
import requests
response = requests.get('http://example.com/path/1.pcap', stream=True)
According to the documentation, response.raw is a file-like object, and my goal is to process the downloaded file without saving it to disk.
I first looked at the Scapy and Pyshark libraries for .pcap file parsing, but their functions (rdpcap and FileCapture) accept a file path string as an argument. pcap.Reader from the dpkt library accepts a file object. The first try, pcap = dpkt.pcap.Reader(response.raw), gave an error:
AttributeError: 'HTTPResponse' object has no attribute 'name'
A name attribute was added:
setattr(response.raw, 'name', 'test.pcap')
After that, pcap = dpkt.pcap.Reader(response.raw) didn't give any errors, but pcap.readpkts() failed with
io.UnsupportedOperation: seek
And indeed, response.raw.seekable() returns False.
I tried setting response.raw.decode_content = True but that didn't help.
Is there a solution for processing the object the way I'm trying to? Maybe additional request parameters are required to get a seekable response object?
By the way, if the response object is written to a file (shutil.copyfileobj(response.raw, file)), dpkt succeeds in working with that file afterwards.
Support for StringIO objects was recently added to dpkt, so now you can create a StringIO object from your string and then pass that to pcap.Reader.
To create a StringIO object from a string:
from StringIO import StringIO
data = StringIO("aaaaa..aa")
You can then do:
import dpkt
from StringIO import StringIO
import requests

response = requests.get('http://example.com/path/1.pcap', stream=True)
# note: StringIO needs the downloaded bytes (a str in Python 2), not the raw file object
data = StringIO(response.content)
pcap = dpkt.pcap.Reader(data)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ...
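On Python 3 the same idea would use io.BytesIO instead of StringIO, since dpkt's pcap.Reader just needs a binary file-like object; a hedged sketch:

import io

import dpkt
import requests

response = requests.get('http://example.com/path/1.pcap')
pcap = dpkt.pcap.Reader(io.BytesIO(response.content))  # in-memory and seekable
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)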
