Fetching .zip files from database - python

I have a database which contains my released projects in .zip format. The problem is that after a certain period of time I have to download those .zip files from my DB and mail them to customers. It's a manual process. So is there any way/framework available in Python that can automate this process? Any leads or suggestions will be very helpful. I can handle the sending-mail part; the main thing I'm asking is how to automate fetching the files from the DB.
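There is no special framework needed for the fetching part: any database driver (sqlite3, mysql.connector, psycopg2, or SQLAlchemy on top of them) can read the BLOB column and write it out as a .zip, and the whole script can then be run on a schedule with cron or APScheduler. Below is a minimal sketch assuming a SQLite database and a hypothetical releases(project_name, version, archive) table; adjust the names to your schema.

```python
# A minimal sketch: dump every stored .zip to disk so it can be mailed out.
# "releases" and its columns are hypothetical -- adjust to your real schema,
# and swap sqlite3 for your own driver (mysql.connector, psycopg2, ...).
import sqlite3
from pathlib import Path

def fetch_release_zips(db_path, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT project_name, version, archive FROM releases")
        for name, version, blob in rows:
            target = out_dir / "{}-{}.zip".format(name, version)
            target.write_bytes(blob)      # BLOB columns come back as bytes
            yield target
    finally:
        conn.close()

if __name__ == "__main__":
    for path in fetch_release_zips("releases.db", "outgoing"):
        print("fetched", path)            # hand these paths to the mail code
```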

Related

Transfer file from S3 to Windows server

I have just been introduced to Python (PySpark). I have a requirement to achieve the following steps:
Extract data from a Hive table (on EMR) into a csv file on AWS S3
Transfer the csv file created on S3 (EMR cluster running Spark on YARN) to a remote Windows server (at a certain folder path)
Once the file has been transferred, trigger a batch file that exists on the Windows server at a certain folder path
The Windows batch script, when triggered, updates/enriches the transferred file with additional information, so the updated csv file then has to be transferred/copied back to S3
Load the updated file into a Hive table once it is back on S3
I have figured out how to extract the data from the table into a csv file on S3 and how to load the file back into the table. However, I am struggling to get a bearing on how to perform the file transfer/copy between the servers and, most importantly, how to trigger the Windows batch script on the remote machine.
Could someone please point me in the right direction and hint at where I should start? I searched the internet but couldn't get a concrete answer. I understand that I have to use the Boto3 library to interact with S3; however, if there is any other established solution, please share it with me (code snippets, articles, etc.), along with any specific configuration I might have to incorporate to achieve the result.
Thanks
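For the middle steps, here is one hedged sketch (not a full solution): boto3 for the S3 side and pywinrm for running commands on the Windows server, which requires WinRM to be enabled on that host. The bucket, keys, host, credentials and Windows paths below are all placeholders.

```python
# A rough sketch only: boto3 handles S3 and pywinrm (pip install pywinrm)
# runs commands on the remote Windows server, where WinRM must be enabled.
# Bucket, keys, host, credentials and Windows paths are all placeholders.
import boto3
import winrm

S3_BUCKET = "my-data-bucket"
CSV_KEY = "exports/hive_extract.csv"
ENRICHED_KEY = "exports/hive_extract_enriched.csv"

# Optional: keep a local copy on the EMR side for inspection/archiving.
boto3.client("s3").download_file(S3_BUCKET, CSV_KEY, "/tmp/hive_extract.csv")

# Connect to the Windows box over WinRM and drive the whole round trip from
# there: pull the extract from S3 (AWS CLI installed on the Windows server),
# run the enrichment batch file, push the enriched file back to S3.
session = winrm.Session("windows-host.example.com",
                        auth=("svc_user", "secret"), transport="ntlm")
steps = [
    ("aws", ["s3", "cp", "s3://%s/%s" % (S3_BUCKET, CSV_KEY),
             r"C:\data\extract.csv"]),
    (r"C:\jobs\enrich.bat", [r"C:\data\extract.csv"]),
    ("aws", ["s3", "cp", r"C:\data\extract.csv",
             "s3://%s/%s" % (S3_BUCKET, ENRICHED_KEY)]),
]
for cmd, args in steps:
    result = session.run_cmd(cmd, args)
    if result.status_code != 0:
        raise RuntimeError(result.std_err.decode(errors="replace"))
```

Letting the Windows side pull from and push to S3 itself avoids having to copy the file over SMB at all; the Python script only orchestrates the remote commands.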

Heroku - CSV Files and TXT Logfiles

I want to deploy a Python bot to Heroku.
The bot writes all logging data to txt files and also exports a CSV file to the filesystem; that file stores data that is important for the next run of the bot and also makes it possible to track the bot's past performance.
Since I know it is not possible to store files persistently on a Heroku dyno, the question is: how/where should I store the data?
A database for the data in the CSV file is not suitable for me, because I sometimes have to edit the file between two runs, and doing that via a database would be too much effort for me.
Any suggestions?
You need to save the file(s) on external storage like S3, Dropbox, GitHub, etc.
Check Files on Heroku to see the options and examples.
You can decide to read/write the files directly from the storage (i.e. no local copy), or keep them locally and make sure they are saved back to the storage at some point (every 10 minutes, before every restart).
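A minimal sketch of the "no local copy" option with S3 and boto3 (the bucket name and key are placeholders; AWS credentials would typically come from Heroku config vars):

```python
import csv
import io
import boto3

BUCKET = "my-bot-state"       # placeholder bucket name
KEY = "state/bot_state.csv"   # placeholder object key

s3 = boto3.client("s3")       # reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
                              # from the environment (Heroku config vars)

def load_state():
    """Read the CSV straight from S3 into a list of rows (no local file)."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")
    return list(csv.reader(io.StringIO(body)))

def save_state(rows):
    """Write the rows back to S3 at the end of a run."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=buf.getvalue().encode("utf-8"))
```

Writing straight to S3 like this makes dyno restarts harmless, and you can still download, hand-edit and re-upload the object between runs with the AWS console or CLI.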

Concurrent file upload/download and running background processes

I want to create a minimal webpage where concurrent users can upload a file, and I can process the file (which is expected to take some hours) and email the results back to the user later on.
Since I am hosting this on AWS, I was thinking of invoking some background process once I receive the file, so that even if the user closes the browser window, the processing keeps going and I am able to send the results after a few hours, all through some pre-written scripts.
Can you please help me with the logistics of how I should do this?
Here's how it might look (hosting-agnostic):
A user uploads a file on the web server
The file is saved in a storage that can be accessed later by the background jobs
Some metadata (location in the storage, user's email etc) about the file is saved in a DB/message broker
Background jobs that track the DB/message broker pick up the metadata, start handling the file (this is why, per step 2, it needs to be in storage they can access) and notify the user
More specifically, in the case of Python/Django + AWS you might use the following stack:
Let's assume you're using Python + Django
You can save the uploaded files in a private AWS S3 bucket
The metadata might be saved in the DB, or you can use Celery + AWS SQS, AWS SQS directly, or bring up something like RabbitMQ or Redis (+ pub/sub)
Have Python code handling the job - this depends on what you opt for in point 3. The only requirement is that it can pull data from your S3 bucket. After the job is done, notify the user via AWS SES (a rough sketch of this S3 + SQS + SES flow follows this list)
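The sketch below assumes boto3 with a private S3 bucket, an SQS queue and SES; the bucket name, queue URL and e-mail addresses are placeholders, and error handling is omitted.

```python
# A rough sketch of the upload + queue + worker flow; the bucket name, queue
# URL and addresses are placeholders, and error handling is omitted.
import json
import uuid
import boto3

BUCKET = "my-private-uploads"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-jobs"
s3 = boto3.client("s3")
sqs = boto3.client("sqs")

def enqueue_upload(uploaded_file, user_email):
    """Called from the Django view: store the file, then queue the job."""
    key = "incoming/%s-%s" % (uuid.uuid4(), uploaded_file.name)
    s3.upload_fileobj(uploaded_file, BUCKET, key)   # bucket stays private
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"s3_key": key,
                                             "email": user_email}))

def worker_loop():
    """Long-running background process (or cron-started batch)."""
    ses = boto3.client("ses")
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            s3.download_file(BUCKET, job["s3_key"], "/tmp/input")
            # ... the hours-long processing of /tmp/input happens here ...
            ses.send_email(
                Source="noreply@example.com",
                Destination={"ToAddresses": [job["email"]]},
                Message={"Subject": {"Data": "Your file has been processed"},
                         "Body": {"Text": {"Data": "Processing finished."}}})
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```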
The simplest single-server setup that doesn't require any intermediate components:
Your Python script simply saves the file in a folder and gives it a name like someuser#yahoo.com-f9619ff-8b86-d011-b42d-00cf4fc964ff
A cron job looks for files in this folder, handles any it finds, and notifies the user. Note that if you need multiple background jobs running in parallel you'll need to complicate the scheme slightly to avoid race conditions (i.e. rename the file being processed so that only a single job handles it); a sketch of such a worker follows below
In a production app you'll likely need something in between, depending on your needs
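A minimal sketch of that single-server scheme, assuming a Linux host with a local SMTP server; all paths and addresses are placeholders:

```python
#!/usr/bin/env python3
# Cron worker sketch, e.g. "*/5 * * * * /usr/bin/python3 /opt/app/worker.py".
# It claims one uploaded file at a time by renaming it (the rename is atomic
# on the same filesystem, so parallel cron runs can't grab the same file),
# processes it and emails the user. All paths and addresses are placeholders.
import smtplib
from email.message import EmailMessage
from pathlib import Path

INBOX = Path("/var/app/uploads")       # where the web app drops the files
WORK = Path("/var/app/processing")     # claimed files are moved here

def claim_next():
    WORK.mkdir(parents=True, exist_ok=True)
    for path in sorted(INBOX.iterdir()):
        try:
            claimed = WORK / path.name
            path.rename(claimed)       # losing the race raises OSError below
            return claimed
        except OSError:
            continue                   # another worker claimed it first
    return None

def notify(user_email, text):
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = user_email
    msg["Subject"] = "Your file has been processed"
    msg.set_content(text)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    path = claim_next()
    if path:
        # names look like "someuser#yahoo.com-<uuid>", so the address can be
        # recovered straight from the file name
        user_email = path.name.split("-", 1)[0].replace("#", "@")
        # ... the hours-long processing of `path` happens here ...
        notify(user_email, "Processing finished.")
```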

GAE better output information of appcfg.py bulkupload on daily routine

I have a web service on Google App Engine (programmed in Python), and every day I have to update it with data from an FTP source.
My daily job, which runs outside of GAE, downloads the data from the FTP server, then parses and enriches it with other information sources; this process takes nearly 2 hours.
After all this, I upload the data to my server using the bulk upload function of appcfg.py (command line).
Since I want better reports on this process, I need to know how many records were actually uploaded by each call to appcfg (there are more than 10 calls).
My question is: can I get the number of records uploaded from appcfg.py without having to parse its output?
Bonus question: does anyone else do this kind of daily routine, or is it a bad practice?
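If parsing does turn out to be the only option, driving appcfg.py from a small wrapper at least keeps it automated. The flags below are the common upload_data ones (you may also need --config_file/--url), and the "Uploaded N entities" pattern is only an assumption about the bulkloader's output that you would need to confirm against a real run.

```python
import re
import subprocess

def bulk_upload(app_dir, kind, filename):
    """Run one appcfg.py upload_data call and return the entity count,
    or None if the count couldn't be found in the output."""
    proc = subprocess.run(
        ["appcfg.py", "upload_data",
         "--kind", kind, "--filename", filename, app_dir],
        capture_output=True, text=True)
    # Assumption: the bulkloader prints a line like "Uploaded 1234 entities";
    # adjust the pattern to whatever your appcfg version actually prints.
    match = re.search(r"Uploaded\s+(\d+)\s+entit", proc.stdout + proc.stderr)
    return int(match.group(1)) if match else None
```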

How to export data into an html or txt file using a python script running on a server?

The main purpose of my Python script is to parse a website and then save the results as an html or txt file on the server. I also want the script to repeat this operation every 15 minutes without any action from me.
Google App Engine doesn't allow saving files on the server; instead I should use the database. Is it really possible to save txt or HTML in the DB? And how do I make the script run on a schedule without stopping?
Thanks in advance for helping.
You are correct in saying that it is impossible to save files directly to the server. Your only option is the datastore, as you say. The data type best suited to you is probably the long text string (TextProperty); however, you are limited to 1 MB per entity. See https://developers.google.com/appengine/docs/python/datastore/entities for more information.
Regarding the scheduling, you are looking for cron jobs. You can set up a cron job to run at any configurable interval. See https://developers.google.com/appengine/docs/python/config/cron for details describing how cron jobs work.
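A minimal sketch of that combination on the (old) Python runtime, assuming the ndb client library, webapp2 and urlfetch; the handler would be wired to a cron.yaml entry with "schedule: every 15 minutes" pointing at /tasks/scrape (the URL and target site are placeholders):

```python
import webapp2
from google.appengine.api import urlfetch
from google.appengine.ext import ndb


class ScrapeResult(ndb.Model):
    """One parsed snapshot; TextProperty is unindexed, so the 1 MB
    entity limit is the only practical size constraint."""
    fetched_at = ndb.DateTimeProperty(auto_now_add=True)
    content = ndb.TextProperty()


class ScrapeHandler(webapp2.RequestHandler):
    """Hit by cron every 15 minutes (schedule configured in cron.yaml)."""
    def get(self):
        page = urlfetch.fetch("http://example.com/page-to-parse")
        parsed = page.content.decode("utf-8")   # real parsing would go here
        ScrapeResult(content=parsed).put()
        self.response.write("saved %d characters" % len(parsed))


app = webapp2.WSGIApplication([("/tasks/scrape", ScrapeHandler)])
```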
