os.pipe() function in Google App Engine - Python

I am currently trying to zip a large file (> 1GB) using Python on Google App Engine, and because of the memory limits App Engine places on a process I have used the solution from "Create a zip file from a generator in Python?".
When I run the code on the app engine, I get the following error
Traceback (most recent call last):
File "/base/data/home/apps/s~whohasfiles/frontend.379535120592235032/gluon/restricted.py", line 212, in restricted
exec ccode in environment
File "/base/data/home/apps/s~whohasfiles/frontend.379535120592235032/applications/onefile/controllers/page.py", line 742, in <module>
File "/base/data/home/apps/s~whohasfiles/frontend.379535120592235032/gluon/globals.py", line 194, in <lambda>
self._caller = lambda f: f()
File "/base/data/home/apps/s~whohasfiles/frontend.379535120592235032/applications/onefile/controllers/page.py", line 673, in download
zip_response = page_store.gcs_zip_page(page, visitor)
File "applications/onefile/modules/page_store.py", line 339, in gcs_zip_page
w = z.start_entry(ZipInfo('%s-%s' %(file.created_on, file.name) ))
File "applications/onefile/modules/page_store.py", line 481, in start_entry
r, w = os.pipe()
OSError: [Errno 38] Function not implemented
Does Google App Engine not support the os.pipe() function?
Is there a workaround for this?

The 'os' module is available, but unsupported features such as pipe() are disabled because they operate on file objects [1]. You would need to use a Google Cloud Storage bucket as a temporary object, since there is no concept of file objects you can use for storage local to the App Engine runtime. The GCS Client Library will give you file-like access to a bucket, which you can use for this purpose [2]. Every app has access to a default storage bucket, which you may need to activate first [3].
[1] https://cloud.google.com/appengine/docs/python/#Python_Pure_Python
[2] https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
[3] https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/activate
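As a rough sketch of how the pieces fit together (assumptions: the chunk-producing zip generator from the linked question is available as zip_chunks, the App Engine GCS client library is importable as cloudstorage, and the bucket path is illustrative):

import cloudstorage as gcs

def write_zip_to_gcs(bucket_path, zip_chunks):
    # bucket_path is something like '/your-default-bucket/archives/pages.zip'
    gcs_file = gcs.open(bucket_path, 'w', content_type='application/zip')
    try:
        for chunk in zip_chunks:
            # stream each generated chunk straight into the bucket instead of through os.pipe()
            gcs_file.write(chunk)
    finally:
        gcs_file.close()

The GCS file object only needs sequential write() calls here, which avoids the missing pipe support in the sandbox.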

Related

Unable to append new dataframe to previous dataframe from blob storage

After an HTTP trigger, I want to read a .csv file from blob storage, append new data to it, and save the result back to blob storage in .csv format. Please help me out.
import os
import pandas as pd
from azure.storage.blob import ContentSettings  # block_blob_service and test_df are created elsewhere

scriptpath = os.path.abspath(__file__)
scriptdir = os.path.dirname(scriptpath)
train = os.path.join(scriptdir, 'train.csv')
train_df = pd.read_csv(train)
train_df = train_df.append(test_df)            # test_df holds the new data to append
train_df.to_csv(scriptdir + 'tt.csv')          # writes next to the function code; this is what fails
block_blob_service.create_blob_from_path(
    'files',
    'mytest.csv',
    scriptdir + 'tt.csv',
    content_settings=ContentSettings(content_type='application/CSV')
)
My problem is that after appending the data I have to save it back to blob storage as a CSV file, but the above error occurs: the HTTP-triggered function does not let me save the CSV file locally. The error shows:
Exception: PermissionError: [Errno 13] Permission denied:
'C:\\Users\\Shiva\\Desktop\\project\\loumus\\Imagetrigger'
Stack: File "C:\Program Files\Microsoft\Azure Functions Core Tools\workers\python\3.7/WINDOWS/X64\azure_functions_worker\dispatcher.py", line 357, in _handle__invocation_request
self.__run_sync_func, invocation_id, fi.func, args)
File "C:\Users\Shiva\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\Program Files\Microsoft\Azure Functions Core Tools\workers\python\3.7/WINDOWS/X64\azure_functions_worker\dispatcher.py", line 542, in __run_sync_func
return func(**params)
File "C:\Users\Shiva\Desktop\project\loumus\Imagetrigger\__init__.py", line 276, in main
mm.to_csv(scriptdir,'tt.csv')
File "C:\Users\Shiva\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3403, in to_csv
storage_options=storage_options,
File "C:\Users\Shiva\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\format.py", line 1083, in to_csv
csv_formatter.save()
File "C:\Users\Shiva\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\csvs.py", line 234, in save
storage_options=self.storage_options,
File "C:\Users\Shiva\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 647, in get_handle
newline="",
There are several issues.
Azure Functions run inside a managed runtime environment; you do not have the same level of access to local storage/disk as you would when running on a laptop. That is not to say you have no local disk at all. Search the web and read the docs:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#service-limits (see storage)
https://github.com/Azure/Azure-Functions/issues/179
how much local disk is available in an Azure Function execution context
https://github.com/Azure/azure-functions-host/wiki/Retrieving-information-about-the-currently-running-function
Azure Functions Temp storage
Use the tempfile module to create a temp directory; it will be created in the area the underlying OS designates as temp storage (see the sketch below).
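A minimal sketch, assuming train_df is the DataFrame from the question's snippet and the file name is illustrative:

import os
import tempfile

tmp_dir = tempfile.gettempdir()                  # OS-designated temp area, writable inside the sandbox
tmp_csv = os.path.join(tmp_dir, 'tt.csv')
train_df.to_csv(tmp_csv, index=False)            # write here instead of the function's script directory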
There is also no particular reason to write to local storage and then upload to ADLS. You could:
Write the CSV to memory (e.g. a StringIO) and then use the SDK to write that string to ADLS/blob storage, as sketched after this list.
Install the appropriate drivers (depending on whether you are using pyspark, pandas, or something else) and write the DataFrame to ADLS directly.
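A minimal sketch of the in-memory option (assumptions: the legacy azure-storage SDK that provides BlockBlobService, matching the create_blob_from_path call in the question; container and blob names are illustrative):

import io
from azure.storage.blob import ContentSettings

def upload_df_as_csv(block_blob_service, df, container='files', blob_name='mytest.csv'):
    buf = io.StringIO()
    df.to_csv(buf, index=False)                  # serialize entirely in memory, no local file
    block_blob_service.create_blob_from_text(
        container,
        blob_name,
        buf.getvalue(),
        content_settings=ContentSettings(content_type='text/csv'))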

ReadFromKafka throws ValueError: Unsupported signal: 2

I am currently trying to get the hang of Apache Beam together with Apache Kafka.
The Kafka service is running (locally) and I write some test messages with the kafka-console-producer.
First I wrote this Java code snippet to test Apache Beam with a language I know, and it works as expected.
public class Main {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        Read<Long, String> kafkaReader = KafkaIO.<Long, String>read()
                .withBootstrapServers("localhost:9092")
                .withTopic("beam-test")
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializer(StringDeserializer.class);
        kafkaReader.withoutMetadata();

        pipeline
                .apply("Kafka", kafkaReader)
                .apply("Extract words", ParDo.of(new DoFn<KafkaRecord<Long, String>, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        System.out.println("Key:" + c.element().getKV().getKey()
                                + " | Value: " + c.element().getKV().getValue());
                    }
                }));

        pipeline.run();
    }
}
My goal is to write the same thing in Python, and this is what I currently have:
def run_pipe():
    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Kafka Unbounded' >> ReadFromKafka(
               consumer_config={'bootstrap.servers': 'localhost:9092'},
               topics=['beam-test'])
         | 'Test Print' >> beam.Map(print)
        )

if __name__ == '__main__':
    run_pipe()
Now to the problem. When I try to run the python code, I get the following error:
(app) λ python ArghKafkaExample.py
Traceback (most recent call last):
File "ArghKafkaExample.py", line 22, in <module>
run_pipe()
File "ArghKafkaExample.py", line 10, in run_pipe
(p
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\transforms\ptransform.py", line 1028, in __ror__
return self.transform.__ror__(pvalueish, self.label)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\transforms\ptransform.py", line 572, in __ror__
result = p.apply(self, pvalueish, label)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\pipeline.py", line 648, in apply
return self.apply(transform, pvalueish)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\pipeline.py", line 691, in apply
pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\runners\runner.py", line 198, in apply
return m(transform, input, options)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\runners\runner.py", line 228, in apply_PTransform
return transform.expand(input)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\transforms\external.py", line 322, in expand
self._expanded_components = self._resolve_artifacts(
File "C:\Users\gamef\AppData\Local\Programs\Python\Python38\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\transforms\external.py", line 372, in _service
yield stub
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\transforms\external.py", line 523, in __exit__
self._service_provider.__exit__(*args)
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\utils\subprocess_server.py", line 74, in __exit__
self.stop()
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\utils\subprocess_server.py", line 133, in stop
self.stop_process()
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\utils\subprocess_server.py", line 179, in stop_process
return super(JavaJarServer, self).stop_process()
File "C:\Users\gamef\git\BeamMeScotty\app\lib\site-packages\apache_beam\utils\subprocess_server.py", line 143, in stop_process
self._process.send_signal(signal.SIGINT)
File "C:\Users\gamef\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1434, in send_signal
raise ValueError("Unsupported signal: {}".format(sig))
ValueError: Unsupported signal: 2
From googling I found out that it has something to do with program exit signals (like Ctrl+C), but overall I have absolutely no idea what the problem is.
Any advice would be helpful!
Greetings Pascal
Your pipeline code seems correct here. The issue is due to the requirements of the Kafka IO in the Python SDK. From the module documentation:
These transforms are currently supported by Beam portable runners (for example, portable Flink and Spark) as well as Dataflow runner.
Transforms provided in this module are cross-language transforms implemented in the Beam Java SDK. During the pipeline construction, Python SDK will connect to a Java expansion service to expand these transforms. To facilitate this, a small amount of setup is needed before using these transforms in a Beam Python pipeline.
Kafka IO is implemented in Python as a cross-language transform in Java and your pipeline is failing because you haven't set up your environment to execute cross-language transforms. To explain what a cross-language transform is in layman's terms: it means that the Kafka transform is actually executing on the Java SDK rather than the Python SDK, so it can make use of the existing Kafka code on Java.
There are two barriers preventing your pipeline from working. The easier one to fix is that only the runners quoted above support cross-language transforms, so if you're running this pipeline with the Direct runner it won't work; you'll want to switch to either the Flink or Spark runner in local mode.
The more tricky barrier is that you need to start up an Expansion Service to be able to add external transforms to your pipeline. The stacktrace you're getting is happening because Beam is attempting to expand the transform but is unable to connect to the expansion service, and the expansion fails.
If you still want to try running this with cross-language despite the extra setup, the documentation I linked contains instructions for running an expansion service. At the time I am writing this answer this feature is still new, and there might be blind spots in the documentation. If you run into problems, I encourage you to ask questions on the Apache Beam users mailing list or Apache Beam slack channel.
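For reference, a minimal sketch of what the pipeline from the question could look like when targeting a portable runner (assumptions: a local Flink runner is acceptable, Java is installed so the bundled expansion service can be started automatically, and the topic and bootstrap server are taken from the question):

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

def run_pipe():
    options = PipelineOptions([
        '--runner=FlinkRunner',      # a portable runner is required for cross-language transforms
        '--flink_master=[local]',    # embedded local Flink cluster
    ])
    with beam.Pipeline(options=options) as p:
        (p
         | 'Kafka Unbounded' >> ReadFromKafka(
               consumer_config={'bootstrap.servers': 'localhost:9092'},
               topics=['beam-test'])
         | 'Test Print' >> beam.Map(print))

if __name__ == '__main__':
    run_pipe()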

Logging into GCP SQL: How to Ensure PromptSession Is Imported or Otherwise Resolve

I am trying to use the Cloud Shell to update some user permissions. I am logging in using gcloud sql connect my-instance --user=root
gcloud sql connect my-instance
Whitelisting your IP for incoming connection for 5 minutes...done.
Connecting to database with SQL user [sqlserver].********************************************************************************
Python command will soon point to Python v3.7.3.
Python 2 will be sunsetting on January 1, 2020.
See http://https://www.python.org/doc/sunset-python-2/
Until then, you can continue using Python 2 at /usr/bin/python2, but soon
/usr/bin/python symlink will point to /usr/local/bin/python3.
To suppress this warning, create an empty ~/.cloudshell/no-python-warning file.
The command will automatically proceed in seconds or on any key.
********************************************************************************
> Password:
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/mssqlcli/main.py", line 117, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/mssqlcli/main.py", line 110, in main
run_cli_with(mssqlcli_options)
File "/usr/local/lib/python2.7/dist-packages/mssqlcli/main.py", line 43, in run_cli_with
from mssqlcli.mssql_cli import MssqlCli
File "/usr/local/lib/python2.7/dist-packages/mssqlcli/mssql_cli.py", line 18, in <module>
from prompt_toolkit.shortcuts import PromptSession, CompleteStyle
ImportError: cannot import name PromptSession
A) I have made the root user's password so insecure and easy there is no way I am mistyping it.
B) It is the third of January, so I really don't know what this Python version warning is on about. I made the file, but FYI ~/.cloudshell did not exist, so I had to create it first. Even so, it just suppresses the version warning; the main error persists when I try to log in.
The documentation acknowledges there are a couple of other login methods using gcloud beta sql connect, but that gets me another error:
2020/01/04 18:38:41 Rlimits for file descriptors set to {&{8500 1048576}}
2020/01/04 18:38:41 invalid json file "/tmp/tmp.s38C662KKr/legacy_credentials/me#gmail.com/adc.json": open /tmp/tmp.s38C662KKr/legacy_credentials/me#gmail.com/adc.json: no such file or directory
ERROR: (gcloud.beta.sql.connect) Failed to start the Cloud SQL Proxy.
Same for alpha.
This is the first thing I have typed into Cloud Shell, so I can't imagine what could have broken PromptSession.
How can I resolve this error and log into SQL Server using Cloud Shell?
There is most likely an issue when attempting to connect from Cloud Shell (I managed to connect from a Compute Engine instance with this command), possibly related to the Python runtime / environment variables. It has been reported here. Engineering is aware and is looking into it.

Error running Google's Cloud Vision API Example (Face Detection)

I am trying to run the face detection example in Google's Cloud Vision API, specifically faces.py [1].
When I run the following:
python faces.py demo-image.jpg
below is the error I get:
ubuntu@ubuntu-VirtualBox:~/Documents/code/python-stuff/googleapis/cloudvisionapi/cloud-vision/python/face_detection$ python faces.py demo-image.jpg
Traceback (most recent call last):
File "faces.py", line 121, in <module>
main(args.input_image, args.output, args.max_results)
File "faces.py", line 98, in main
faces = detect_face(image, max_results)
File "faces.py", line 62, in detect_face
service = get_vision_service()
File "faces.py", line 35, in get_vision_service
credentials = GoogleCredentials.get_application_default()
File "/home/ubuntu/.local/lib/python2.7/site- packages/oauth2client/client.py", line 1398, in get_application_default
return GoogleCredentials._get_implicit_credentials()
File "/home/ubuntu/.local/lib/python2.7/site- packages/oauth2client/client.py", line 1388, in _get_implicit_credentials
raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
oauth2client.client.ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application- default-credentials for more information.
ubuntu#ubuntu-VirtualBox:~/Documents/code/python- stuff/googleapis/cloudvisionapi/cloud-vision/python/face_detection$
[1]: https://github.com/GoogleCloudPlatform/cloud- vision/tree/master/python/face_detection
I guess my question is -- how do I do this:
Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials.
You need to download the service-account key; typically a JSON file.
If you have not created the credentials/obtained the key, follow the steps:
Go to your API manager;
Create credentials;
Choose "Service Account Key";
Select "Key Type" as JSON.
After this point, you should obtain a JSON file.
Once you obtain the key, go to your BASHRC (~/.bashrc) and add the following:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/JSON
Then restart your bash by
exec bash
Now, re-run your faces.py.
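If editing ~/.bashrc is inconvenient, setting the variable from Python before the credentials are loaded should also work; a minimal sketch (the key path is illustrative):

import os
from oauth2client.client import GoogleCredentials

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/your/key.json'  # illustrative path
credentials = GoogleCredentials.get_application_default()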

SSLError on Google App Engine (local dev-server)

When I try to use the boto library on App Engine, I get the following error:
Traceback (most recent call last):
File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp\_webapp25.py", line 701, in __call__
handler.get(*groups)
File "E:\Probes\pruebas\pruebasAWS\main.py", line 26, in get
conn = S3Connection('<KEY1>','<KEY2>')
File "E:\Probes\pruebas\pruebasAWS\boto\s3\connection.py", line 148, in __init__
path=path, provider=provider)
File "E:\Probes\pruebas\pruebasAWS\boto\connection.py", line 231, in __init__
self.http_unretryable_exceptions.append(ssl.SSLError)
AttributeError: 'module' object has no attribute 'SSLError'
I've installed OpenSSL and Python 2.7. OpenSSL and the Python SSL library are working, and when I deploy the app to Google's infrastructure it works fine. The problem only occurs when I run the app on my local machine.
The code is:
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
from boto.s3.connection import S3Connection
import hashlib

class MainHandler(webapp.RequestHandler):
    def get(self):
        conn = S3Connection('<KEY1>', '<KEY2>')
        bucket = conn.create_bucket(hashlib.md5('noTRePeaTedBuCket').hexdigest() + "probe")
        if bucket:
            self.response.out.write('Bucket creado')
        else:
            self.response.out.write('Bucket NO creado')
The actual issue here is that App Engine's sandbox interferes with things in a way that makes it impossible to import certain standard, built-in Python modules such as ssl.
There was some conversation about this on the boto IRC and one of the users came up with this patch:
https://github.com/samba/boto/commit/6f1ab73d92ff6fb2589362bbdadf6bbe66811e7e
Some form of this will probably be merged into boto master soon.
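Until that lands, here is a sketch of the kind of defensive check such a patch applies (assumption: the real change in boto may differ in detail): only register ssl.SSLError as unretryable when the sandboxed ssl module actually exposes it.

http_unretryable_exceptions = []                 # stands in for the list boto builds in connection.py
try:
    import ssl
    ssl_error = getattr(ssl, 'SSLError', None)   # can be missing under the dev-server sandbox
except ImportError:
    ssl_error = None

if ssl_error is not None:
    http_unretryable_exceptions.append(ssl_error)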
