Writing blobs to the blobstore

Writing blobs to the blobstore - python

I am trying to write to the blobstore using the method described here:
http://code.google.com/appengine/docs/python/blobstore/overview.html#Writing_Files_to_the_Blobstore
I tried using the remote_api to execute the following code:
file_name = files.blobstore.create(mime_type='text/html',_blobinfo_uploaded_filename='sample.txt')
with files.open(file_name, 'a') as f:
f.write('sample text for the sample blob')
files.finalize(file_name)
invariably raises the error (at the third line above):
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 310, in write
self._make_rpc_call_with_retry('Append', request, response)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 388, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 236, in _make_call
_raise_app_error(e)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 179, in _raise_app_error
raise FileNotOpenedError()
FileNotOpenedError
The file i am trying to write is very small (< 20KB) so its not a quota issue. Are there additional steps i am missing?

maybe you can need to add below module but if you dont previously add.
from __future__ import with_statement
-->from google.appengine.api import files
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
file_name = files.blobstore.create(mime_type='text/plain',_blobinfo_uploaded_filename='sample.txt')
with files.open(file_name, 'a') as f:
f.write('sample text for the sample blob')
files.finalize(file_name)

Related

How to read all parquet files from a s3 bucket

I currently have an s3 bucket that has folders with parquet files inside. I want to read all the individual parquet files and concatenate them into a pandas dataframe regardless of the folder they are in.
I am trying the following code:
import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()
pandas_dataframe = pq.ParquetDataset('s3://vivienda-test/2022/11', filesystem=s3).read_pandas().to_pandas()
print(pandas_dataframe)
I realize that it only works for concatenation the parquets of a specific folder of the bucket and it also gives me the following error:
Traceback (most recent call last):
File "/Users/Documents/inf.py", line 5, in <module>
pandas_dataframe = pq.ParquetDataset('s3://vivienda-test/2022/11', filesystem=s3).read_pandas().to_pandas()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1790, in __init__
self.validate_schemas()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1824, in validate_schemas
self._schema = self._pieces[0].get_metadata().schema
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1130, in get_metadata
f = self.open()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1137, in open
reader = self.open_file_func(self.path)
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1521, in _open_dataset_file
return ParquetFile(
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 286, in __init__
self.reader.open(
File "pyarrow/_parquet.pyx", line 1227, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Parquet file size is 0 bytes
can someone help me?, thanks

You can use the aws wrangler api's to achieve the same.
https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.read_parquet.html
#Reading all Parquet files under a prefix
import awswrangler as wr
df = wr.s3.read_parquet(path='s3://bucket/prefix/')

How Do We Convert HTML to PDF using Python, Is there Any code please share it to me?

I have tried the Library called pytotree, But i didnt get any Answer
This is the code:
import pdftotree
file= open('C:/Users/chaitanya.naidu/Downloads/test.pdf', 'rb')
f = pdftotree.parse(file)
I am getting this error
Traceback (most recent call last):
File "<ipython-input-4-4a9a6b72801d>", line 1, in <module>
f = pdftotree.parse(file)
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\core.py", line 63, in parse
if not extractor.is_scanned():
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 121, in is_scanned
self.parse()
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 91, in parse
for page_num, layout in enumerate(analyze_pages(self.pdf_file)):
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_utils.py", line 117, in analyze_pages
with open(os.path.realpath(file_name), "rb") as fp:
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\ntpath.py", line 542, in abspath
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader

You can use pdfkit, example:
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

Transform docx to html raises python MemoryError

I have a function that converts a docx to html and a large docx file to be converted.
The problem is this function is part of a bigger program and the converted html is parsed afterwards so I cannot afford to use another converter without impacting the rest of the code (which is not wanted). Running on python 2.7.13 installed on 32-bit, but changing to 64-bit is also not desired.
This is the function:
import logging
from ooxml import serialize
def trasnformDocxtoHtml(inputFile, outputFile):
logging.basicConfig(filename='ooxml.log', level=logging.INFO)
dfile = ooxml.read_from_file(inputFile)
with open(outputFile,'w') as htmlFile:
htmlFile.write( serialize.serialize(dfile.document))
and here's the error:
>>> import library
>>> library.trasnformDocxtoHtml(r'large_file.docx', 'output.html')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "library.py", line 9, in trasnformDocxtoHtml
dfile = ooxml.read_from_file(inputFile)
File "C:\Python27\lib\site-packages\ooxml\__init__.py", line 52, in read_from_file
dfile.parse()
File "C:\Python27\lib\site-packages\ooxml\docxfile.py", line 46, in parse
self._doc = parse_from_file(self)
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 655, in parse_from_file
document = parse_document(doc_content)
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 463, in parse_document
document.elements.append(parse_table(document, elem))
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 436, in parse_table
for p in tc.xpath('./w:p', namespaces=NAMESPACES):
File "src\lxml\etree.pyx", line 1583, in lxml.etree._Element.xpath
MemoryError
no mem for new parser
MemoryError
Could I somehow increase the buffer memory in python? Or fix the function without impacting the html output format?

Reading TDMS File with python nptdms, cannot open tdms file

I am having issues with getting basic function of the nptdms module working.
First, I am just trying to open a TDMS file and print the contents of specific channels within specific groups.
Using python 2.7 and the nptdms quick start here
Following this, I will be writing these specific pieces of data into a new TDMS file. Then, my ultimate goal is to be able to take a set of source files, open each, and write (append) to a new file. The source data files contain far more information that is needed, so I am breaking out the specifics into their own file.
The problem I have is that I cannot get past a basic error.
When running this code, I get:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 27, in <module>
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms","r")
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 94, in __init__
self._read_segments(f)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 119, in _read_segments
object._initialise_data(memmap_dir=self.memmap_dir)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 709, in _initialise_data
mode='w+b', prefix="nptdms_", dir=memmap_dir)
File "C:\Anaconda2\lib\tempfile.py", line 475, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
File "C:\Anaconda2\lib\tempfile.py", line 244, in _mkstemp_inner
fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: 'r\\nptdms_yjfyam'
Here is my code:
from nptdms import TdmsFile
import numpy as np
import pandas as pd
#set Tdms file path
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms","r")
# set variable for TDMS groups
group_nameone = '101'
group_nametwo = '752'
# set objects for TDMS channels
channel_dataone = tdms_file.object(group_nameone 'Payload_1')
channel_datatwo = tdms_file.object(group_nametwo, 'Payload_2')
# set data from channels
data_dataone = channel_dataone.data
data_datatwo = channel_datatwo.data
print data_dataone
print data_datatwo
Big thanks to anyone who may have encountered this before and can help point to what I am missing.
Best,
- Dan
edit:
Solved the read data issue by removing the 'r' argument from the file path.
Now I am having another error I can't trace when trying to write.
from nptdms import TdmsFile, TdmsWriter, RootObject, GroupObject, ChannelObject
import numpy as np
import pandas as pd
newfilepath = r"C:\\Users\daniel.worts\Desktop\Mined.tdms"
datetimegroup101_channel_object = ChannelObject('101', DateTime, data_datetimegroup101)
with TdmsWriter(newfilepath) as tdms_writer:
tdms_writer.write_segment([datetimegroup101_channel_object])
Returns error:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 82, in <module>
tdms_writer.write_segment([datetimegroup101_channel_object])
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 68, in write_segment
segment = TdmsSegment(objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in __init__
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in <genexpr>
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 254, in path
self.channel.replace("'", "''"))
AttributeError: 'TdmsObject' object has no attribute 'replace'

Python lib execute error

I made this python lib and it had this function with uses urllib and urllib2 but when i execute the lib's functions from python shell i get this error
>>> from sabermanlib import geturl
>>> geturl("roblox.com","ggg.html")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
geturl("roblox.com","ggg.html")
File "sabermanlib.py", line 21, in geturl
urllib.urlretrieve(Address,File)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 94, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 463, in open_file
return self.open_local_file(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 477, in open_local_file
raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the file specified: 'roblox.com'
>>>
and here's the code for the lib i made:
import urllib
import urllib2
def geturl(Address,File):
urllib.urlretrieve(Address,File)
EDIT 2
I cant understand why i get this error in the python shell executing:
geturl(Address,File)

You don't want urllib.urlretrieve. This takes a file-like object. Instead, you want urllib.urlopen:
>>> help(urllib.urlopen)
urlopen(url, data=None, proxies=None)
Create a file-like object for the specified URL to read from.
Additionally, if you want to download and save a document, you'll need a more robust geturl function:
def geturl(Address, FileName):
html_data = urllib.urlopen(Address).read() # Open the URL
with open(FileName, 'wb') as f: # Open the file
f.write(html_data) # Write data from URL to file
geturl(u'http://roblox.com') # URL's must contain the full URI, including http://

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Writing blobs to the blobstore - python

Related

How to read all parquet files from a s3 bucket

How Do We Convert HTML to PDF using Python, Is there Any code please share it to me?

Transform docx to html raises python MemoryError

Reading TDMS File with python nptdms, cannot open tdms file

Python lib execute error

Categories

Resources