Writing blobs to the blobstore - python

I am trying to write to the blobstore using the method described here:
http://code.google.com/appengine/docs/python/blobstore/overview.html#Writing_Files_to_the_Blobstore
I tried using the remote_api to execute the following code:
file_name = files.blobstore.create(mime_type='text/html',_blobinfo_uploaded_filename='sample.txt')
with files.open(file_name, 'a') as f:
f.write('sample text for the sample blob')
files.finalize(file_name)
invariably raises the error (at the third line above):
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 310, in write
self._make_rpc_call_with_retry('Append', request, response)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 388, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 236, in _make_call
_raise_app_error(e)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\file
s\file.py", line 179, in _raise_app_error
raise FileNotOpenedError()
FileNotOpenedError
The file i am trying to write is very small (< 20KB) so its not a quota issue. Are there additional steps i am missing?

maybe you can need to add below module but if you dont previously add.
from __future__ import with_statement
-->from google.appengine.api import files
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
file_name = files.blobstore.create(mime_type='text/plain',_blobinfo_uploaded_filename='sample.txt')
with files.open(file_name, 'a') as f:
f.write('sample text for the sample blob')
files.finalize(file_name)

Related

How to read all parquet files from a s3 bucket

I currently have an s3 bucket that has folders with parquet files inside. I want to read all the individual parquet files and concatenate them into a pandas dataframe regardless of the folder they are in.
I am trying the following code:
import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()
pandas_dataframe = pq.ParquetDataset('s3://vivienda-test/2022/11', filesystem=s3).read_pandas().to_pandas()
print(pandas_dataframe)
I realize that it only works for concatenation the parquets of a specific folder of the bucket and it also gives me the following error:
Traceback (most recent call last):
File "/Users/Documents/inf.py", line 5, in <module>
pandas_dataframe = pq.ParquetDataset('s3://vivienda-test/2022/11', filesystem=s3).read_pandas().to_pandas()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1790, in __init__
self.validate_schemas()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1824, in validate_schemas
self._schema = self._pieces[0].get_metadata().schema
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1130, in get_metadata
f = self.open()
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1137, in open
reader = self.open_file_func(self.path)
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 1521, in _open_dataset_file
return ParquetFile(
File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 286, in __init__
self.reader.open(
File "pyarrow/_parquet.pyx", line 1227, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Parquet file size is 0 bytes
can someone help me?, thanks
You can use the aws wrangler api's to achieve the same.
https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.read_parquet.html
#Reading all Parquet files under a prefix
import awswrangler as wr
df = wr.s3.read_parquet(path='s3://bucket/prefix/')

How Do We Convert HTML to PDF using Python, Is there Any code please share it to me?

I have tried the Library called pytotree, But i didnt get any Answer
This is the code:
import pdftotree
file= open('C:/Users/chaitanya.naidu/Downloads/test.pdf', 'rb')
f = pdftotree.parse(file)
I am getting this error
Traceback (most recent call last):
File "<ipython-input-4-4a9a6b72801d>", line 1, in <module>
f = pdftotree.parse(file)
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\core.py", line 63, in parse
if not extractor.is_scanned():
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 121, in is_scanned
self.parse()
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 91, in parse
for page_num, layout in enumerate(analyze_pages(self.pdf_file)):
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_utils.py", line 117, in analyze_pages
with open(os.path.realpath(file_name), "rb") as fp:
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\ntpath.py", line 542, in abspath
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader
You can use pdfkit, example:
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

Transform docx to html raises python MemoryError

I have a function that converts a docx to html and a large docx file to be converted.
The problem is this function is part of a bigger program and the converted html is parsed afterwards so I cannot afford to use another converter without impacting the rest of the code (which is not wanted). Running on python 2.7.13 installed on 32-bit, but changing to 64-bit is also not desired.
This is the function:
import logging
from ooxml import serialize
def trasnformDocxtoHtml(inputFile, outputFile):
logging.basicConfig(filename='ooxml.log', level=logging.INFO)
dfile = ooxml.read_from_file(inputFile)
with open(outputFile,'w') as htmlFile:
htmlFile.write( serialize.serialize(dfile.document))
and here's the error:
>>> import library
>>> library.trasnformDocxtoHtml(r'large_file.docx', 'output.html')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "library.py", line 9, in trasnformDocxtoHtml
dfile = ooxml.read_from_file(inputFile)
File "C:\Python27\lib\site-packages\ooxml\__init__.py", line 52, in read_from_file
dfile.parse()
File "C:\Python27\lib\site-packages\ooxml\docxfile.py", line 46, in parse
self._doc = parse_from_file(self)
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 655, in parse_from_file
document = parse_document(doc_content)
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 463, in parse_document
document.elements.append(parse_table(document, elem))
File "C:\Python27\lib\site-packages\ooxml\parse.py", line 436, in parse_table
for p in tc.xpath('./w:p', namespaces=NAMESPACES):
File "src\lxml\etree.pyx", line 1583, in lxml.etree._Element.xpath
MemoryError
no mem for new parser
MemoryError
Could I somehow increase the buffer memory in python? Or fix the function without impacting the html output format?

Reading TDMS File with python nptdms, cannot open tdms file

I am having issues with getting basic function of the nptdms module working.
First, I am just trying to open a TDMS file and print the contents of specific channels within specific groups.
Using python 2.7 and the nptdms quick start here
Following this, I will be writing these specific pieces of data into a new TDMS file. Then, my ultimate goal is to be able to take a set of source files, open each, and write (append) to a new file. The source data files contain far more information that is needed, so I am breaking out the specifics into their own file.
The problem I have is that I cannot get past a basic error.
When running this code, I get:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 27, in <module>
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms","r")
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 94, in __init__
self._read_segments(f)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 119, in _read_segments
object._initialise_data(memmap_dir=self.memmap_dir)
File "C:\Anaconda2\lib\site-packages\nptdms\tdms.py", line 709, in _initialise_data
mode='w+b', prefix="nptdms_", dir=memmap_dir)
File "C:\Anaconda2\lib\tempfile.py", line 475, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
File "C:\Anaconda2\lib\tempfile.py", line 244, in _mkstemp_inner
fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: 'r\\nptdms_yjfyam'
Here is my code:
from nptdms import TdmsFile
import numpy as np
import pandas as pd
#set Tdms file path
tdms_file = TdmsFile(r"C:\\Users\daniel.worts\Desktop\this_is_my_tdms_file.tdms","r")
# set variable for TDMS groups
group_nameone = '101'
group_nametwo = '752'
# set objects for TDMS channels
channel_dataone = tdms_file.object(group_nameone 'Payload_1')
channel_datatwo = tdms_file.object(group_nametwo, 'Payload_2')
# set data from channels
data_dataone = channel_dataone.data
data_datatwo = channel_datatwo.data
print data_dataone
print data_datatwo
Big thanks to anyone who may have encountered this before and can help point to what I am missing.
Best,
- Dan
edit:
Solved the read data issue by removing the 'r' argument from the file path.
Now I am having another error I can't trace when trying to write.
from nptdms import TdmsFile, TdmsWriter, RootObject, GroupObject, ChannelObject
import numpy as np
import pandas as pd
newfilepath = r"C:\\Users\daniel.worts\Desktop\Mined.tdms"
datetimegroup101_channel_object = ChannelObject('101', DateTime, data_datetimegroup101)
with TdmsWriter(newfilepath) as tdms_writer:
tdms_writer.write_segment([datetimegroup101_channel_object])
Returns error:
Traceback (most recent call last):
File "PullTDMSdataIntoNewFile.py", line 82, in <module>
tdms_writer.write_segment([datetimegroup101_channel_object])
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 68, in write_segment
segment = TdmsSegment(objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in __init__
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 88, in <genexpr>
paths = set(obj.path for obj in objects)
File "C:\Anaconda2\lib\site-packages\nptdms\writer.py", line 254, in path
self.channel.replace("'", "''"))
AttributeError: 'TdmsObject' object has no attribute 'replace'

Python lib execute error

I made this python lib and it had this function with uses urllib and urllib2 but when i execute the lib's functions from python shell i get this error
>>> from sabermanlib import geturl
>>> geturl("roblox.com","ggg.html")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
geturl("roblox.com","ggg.html")
File "sabermanlib.py", line 21, in geturl
urllib.urlretrieve(Address,File)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 94, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 463, in open_file
return self.open_local_file(url)
File "C:\Users\Andres\Desktop\ddd\Portable Python 2.7.5.1\App\lib\urllib.py", line 477, in open_local_file
raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the file specified: 'roblox.com'
>>>
and here's the code for the lib i made:
import urllib
import urllib2
def geturl(Address,File):
urllib.urlretrieve(Address,File)
EDIT 2
I cant understand why i get this error in the python shell executing:
geturl(Address,File)
You don't want urllib.urlretrieve. This takes a file-like object. Instead, you want urllib.urlopen:
>>> help(urllib.urlopen)
urlopen(url, data=None, proxies=None)
Create a file-like object for the specified URL to read from.
Additionally, if you want to download and save a document, you'll need a more robust geturl function:
def geturl(Address, FileName):
html_data = urllib.urlopen(Address).read() # Open the URL
with open(FileName, 'wb') as f: # Open the file
f.write(html_data) # Write data from URL to file
geturl(u'http://roblox.com') # URL's must contain the full URI, including http://

Categories