In Portia, I want to save the scraped data to a database such as MySQL, or do some cleaning of the data first, but I don't know how to do that. Can you give me some suggestions?
I'm new to Scrapy, and I'll wait online. Thank you very much!
You need to add a new item-processing pipeline for storing data in MySQL. To do this, go to the Portia project folder, add a new pipelines.py file that can save items to MySQL, and edit the settings.py file to enable this pipeline.
Here is an example of an item pipeline for storing data in MySQL:
https://github.com/darkrho/dirbot-mysql/blob/master/dirbot/pipelines.py#L36
Here is the documentation on how to enable the pipeline and how it works:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html
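If it helps as a starting point, here is a minimal sketch of such a pipeline; it assumes the pymysql driver and a made-up items table with title and url columns, so adapt the connection settings and SQL to your own fields:

# pipelines.py -- minimal sketch, assuming pymysql and a hypothetical "items" table
import pymysql

class MySQLPipeline(object):
    def open_spider(self, spider):
        # One connection per spider run; credentials here are placeholders.
        self.conn = pymysql.connect(host='localhost', user='root',
                                    password='secret', db='scraping',
                                    charset='utf8mb4')

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        # This is also a natural place to clean/normalize fields before saving.
        with self.conn.cursor() as cur:
            cur.execute("INSERT INTO items (title, url) VALUES (%s, %s)",
                        (item.get('title'), item.get('url')))
        self.conn.commit()
        return item

# settings.py -- enable the pipeline (module path is a placeholder)
ITEM_PIPELINES = {
    'yourproject.pipelines.MySQLPipeline': 300,
}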
After running the MongoDB Docker image, I created a database called MovieFlix and some collections with items inside the database.
What I want is a way to store all the data from the MovieFlix db in a JSON file, so it is saved for later use with docker-compose.
Should I do it with Python code using pymongo, or is there a simpler way?
MongoDB has command-line tools for this. The keyword to look for is dump: mongodump dumps a whole database, and mongoexport writes JSON or CSV per collection. Hope it helps!
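Concretely, mongodump --db MovieFlix --out ./backup does the whole database from the command line. If you would rather use pymongo as you mentioned, a rough sketch could look like this (host, port, and output paths are assumptions; it writes one JSON file per collection):

# Rough pymongo sketch for exporting every collection of MovieFlix to JSON.
import json
from pymongo import MongoClient
from bson import json_util

client = MongoClient('localhost', 27017)
db = client['MovieFlix']

for name in db.list_collection_names():
    docs = list(db[name].find())
    with open(name + '.json', 'w') as f:
        # json_util handles ObjectId and datetime values that plain json can't.
        f.write(json_util.dumps(docs, indent=2))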
I give up. Migrations are messed up; I'm encountering error after error. I'm so sick and tired of this right now.
I just want to move on, so I guess it's better to wipe all the data because one table is messed up.
Is it possible to back up ONE table, not the whole database, before wiping? And then afterwards import it back into that table?
I use a Postgres DB.
You can use Django's dumpdata for a single model.
It will export a JSON file for an app or a single model, and then you can restore it with loaddata.
See the Django dumpdata documentation.
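For example, assuming the broken table is backed by a model called Movie in an app called catalog (both names are placeholders), the commands would be roughly:

python manage.py dumpdata catalog.Movie --indent 2 --output movie_backup.json
# ...wipe/recreate the database and run migrate...
python manage.py loaddata movie_backup.json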
I added my models and created a migration file using makemigrations.
I want the initial data to be inserted into the database at the same time as migrate.
However, no matter how much I edit the migrations file, I get an error because the table does not exist in the database before migrate runs.
Help me...
This will help: https://docs.djangoproject.com/en/2.2/topics/migrations/#data-migrations
This is also a nice blog post showing how to create a data migration, similar to how you create a database migration:
https://simpleisbetterthancomplex.com/tutorial/2017/09/26/how-to-create-django-data-migrations.html
You might want to look into Django data migrations:
https://docs.djangoproject.com/en/2.2/topics/migrations/#data-migrations
In the migration's operations, run the actual table creation before the data initialization. Please post a code example if you run into problems.
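A minimal sketch of such a data migration; the app name, model, and rows are placeholders, and you can generate the empty file with python manage.py makemigrations yourapp --empty:

# yourapp/migrations/0002_seed_initial_data.py -- minimal sketch with placeholder names
from django.db import migrations

def load_initial_data(apps, schema_editor):
    # Use the historical model, not a direct import of your models module.
    Category = apps.get_model('yourapp', 'Category')
    Category.objects.bulk_create([
        Category(name='Default'),
        Category(name='Archive'),
    ])

def unload_initial_data(apps, schema_editor):
    Category = apps.get_model('yourapp', 'Category')
    Category.objects.filter(name__in=['Default', 'Archive']).delete()

class Migration(migrations.Migration):
    # Depending on 0001_initial guarantees the table exists before this runs,
    # which is exactly the error you are hitting.
    dependencies = [
        ('yourapp', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(load_initial_data, unload_initial_data),
    ]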
I downloaded some PDFs and stored them in a directory. I need to insert them into a MongoDB database with Python code, so how could I do this? I need to store them with three fields, something like (pdf_name, pdf_ganerateDate, FlagOfWork).
You can use GridFS. Please check the GridFS documentation.
It will help you store any file in MongoDB and retrieve it later. You can save the file metadata in a separate collection.
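A rough sketch with pymongo and GridFS; the database name, directory, and metadata collection name are assumptions, while the field names are taken from your question:

# Rough sketch: store PDFs in GridFS and their metadata in a separate collection.
import os
import datetime
import gridfs
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['pdf_store']
fs = gridfs.GridFS(db)           # stores the raw PDF bytes
meta = db['pdf_metadata']        # separate collection for metadata

pdf_dir = 'pdfs'
for filename in os.listdir(pdf_dir):
    if not filename.lower().endswith('.pdf'):
        continue
    with open(os.path.join(pdf_dir, filename), 'rb') as f:
        file_id = fs.put(f, filename=filename)
    meta.insert_one({
        'pdf_name': filename,                        # field names from the question
        'pdf_ganerateDate': datetime.datetime.utcnow(),
        'FlagOfWork': False,
        'gridfs_id': file_id,                        # link back to the stored file
    })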
Background
I am loading files from my local machine into BigQuery. Each file has a variable number of fields, so I am using 'autodetect=true' while running the load job.
The issue is: when the load job is run for the first time and the destination table doesn't exist, BigQuery creates the table by inferring the fields present in our file, and that becomes the new table's schema.
Now, when I run a load job with a different file, which contains some extra fields (e.g. "Middle Name": "xyz"), BigQuery throws an error saying the field doesn't exist in the table.
From this post: BigQuery : add new column to existing tables using python BQ API, I learnt that columns can be added dynamically. However, what I don't understand is:
Query
How will my program come to know that the file being uploaded contains extra fields and that a schema mismatch will occur? (Not a problem if the table doesn't exist, because a new table will be created.)
If my program can somehow infer the extra fields present in the file being uploaded, I could add those columns to the existing table and then run the load job.
I am using the Python BQ API.
Any thoughts on how to automate this process would be helpful.
You should check the schema update options. There is an option named ALLOW_FIELD_ADDITION that will help you.
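A minimal sketch of a load job with that option set, using a recent google-cloud-bigquery client (project, dataset, table, and file names are placeholders); combined with autodetect and WRITE_APPEND, new columns found in the file are added to the existing table instead of raising an error:

# Minimal sketch: append a JSON file and allow new columns to be added.
from google.cloud import bigquery

client = bigquery.Client()
table_id = 'my-project.my_dataset.my_table'

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
# Let the load job add columns it detects in the file but not in the table.
job_config.schema_update_options = [bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION]

with open('data.json', 'rb') as f:
    job = client.load_table_from_file(f, table_id, job_config=job_config)
job.result()  # wait for the load to finish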
A naive solution would be:
1. Get the target table schema using
service.tables().get(projectId=projectId, datasetId=datasetId, tableId=tableId)
2. Generate the schema of the data in your file.
3. Compare the two schemas (a kind of "diff") and then add to the target table those columns that are extra in your data schema (see the sketch right after this list).
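A rough sketch of that diff-and-patch step with the google-cloud-bigquery client (the table id and detected field names are placeholders; extra columns are added as NULLABLE STRING fields for simplicity):

# Rough sketch: add any fields found in the file but missing from the table.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table('my-project.my_dataset.my_table')  # step 1

existing = {field.name for field in table.schema}

# Step 2: suppose these are the field names detected in the new file
# (e.g. from its header row or its JSON keys).
file_fields = ['first_name', 'last_name', 'middle_name']

# Step 3: append the extra fields to the table schema.
new_schema = list(table.schema)
for name in file_fields:
    if name not in existing:
        new_schema.append(bigquery.SchemaField(name, 'STRING', mode='NULLABLE'))

table.schema = new_schema
client.update_table(table, ['schema'])  # schema updates may only add columns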
Any better ideas or approaches would be highly appreciated!