Allow flexible CSV structure import into a partitioned BigQuery table from GCS - Python

I have a partitioned table in BigQuery that gets refreshed weekly with CSV files from Google Cloud Storage. During the refresh process the entire table is deleted and rewritten.
A new column was added to the CSV files and the table structure was updated accordingly, but now during the refresh I get a column mismatch error because the old CSV files don't contain the new column.
Is there a way to use gsutil or the bq command line to smoothly add the column to the old CSV files during the import into the partitioned table, without having to download each file onto my local computer and alter the CSVs one by one?
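If the new column was added at the end of the row (and is NULLABLE in the table schema), BigQuery's CSV option allow_jagged_rows should let the old files load as-is, with the missing trailing column read as NULL, so you would not need to touch the files in GCS at all. Below is a minimal sketch using the google-cloud-bigquery Python client; the bucket path, project, dataset and table names are placeholders. The bq CLI has an equivalent --allow_jagged_rows flag for bq load.

```python
# Minimal sketch: weekly full refresh of a partitioned table from GCS CSVs.
# allow_jagged_rows makes BigQuery treat missing trailing columns in the
# older CSV files as NULL instead of raising a column mismatch error.
# All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,              # skip the header row
    allow_jagged_rows=True,           # old files without the new column load as NULL
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # weekly delete-and-rewrite
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/*.csv",                  # placeholder GCS path
    "my-project.my_dataset.my_partitioned_table",    # placeholder table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```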

Related

Bigquery table to GCS

I am trying to automate a job where files are written to GCS using data queried from BQ.
I have a BigQuery table and I need to export files to GCS, named according to a particular field:
field1    file_name
w         filea
x         fileb
y         filec
z         filed
So in this case, I need to produce 4 CSV files: filea.csv, fileb.csv, filec.csv, filed.csv.
Is there a way I can automate this with Python, so that if a new value (fileE) shows up in the BQ table, the job exports it to GCS with the proper name fileE.csv?
I tried exporting the files one by one using BQ data export and it worked, but I was looking for a Python solution.
Thanks!
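One way to do this from Python is to list the distinct values of the field and write one CSV per value straight to GCS. Below is a minimal sketch, assuming hypothetical project, dataset, table and bucket names, and that the field is called file_name as in the example:

```python
# Minimal sketch: export one CSV per distinct file_name value to GCS.
# All project/dataset/table/bucket names are placeholders.
from google.cloud import bigquery, storage

bq_client = bigquery.Client()
gcs_client = storage.Client()
bucket = gcs_client.bucket("my-export-bucket")      # placeholder bucket
table = "my-project.my_dataset.my_table"            # placeholder table

# Find every file name currently present in the table.
names = bq_client.query(f"SELECT DISTINCT file_name FROM `{table}`").result()

for row in names:
    file_name = row.file_name
    # Pull the rows for this file name into a DataFrame.
    df = bq_client.query(
        f"SELECT * FROM `{table}` WHERE file_name = @name",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("name", "STRING", file_name)
            ]
        ),
    ).to_dataframe()
    # Write the CSV directly to GCS, named after the field value.
    bucket.blob(f"{file_name}.csv").upload_from_string(
        df.to_csv(index=False), content_type="text/csv"
    )
```

Because the loop is driven by SELECT DISTINCT, a new value such as fileE is picked up automatically on the next run and lands in GCS as fileE.csv. For very large tables you may prefer BigQuery's EXPORT DATA statement instead of pulling the rows through a DataFrame.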

Create automatic table columns SQL code from a .CSV file in PostgreSQL

I have some large (500+ MB) .CSV files that I need to import into a PostgreSQL database.
I am looking for a script or tool that helps me to:
Generate the CREATE TABLE SQL code, ideally taking into account the data in the .CSV file in order to choose the optimal data type for each column.
Use the header of the .CSV as the column names.
It would be perfect if such functionality existed in PostgreSQL or could be added as an add-on.
Thank you very much
You can use the open source tool pgfutter to create a table from your CSV file.
GitHub link
PostgreSQL also has COPY functionality; however, COPY expects that the table already exists.
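If you would rather stay in Python than use pgfutter, pandas can infer a type for each column and issue the CREATE TABLE for you through to_sql, using the CSV header as column names. A minimal sketch, assuming a hypothetical connection string, file path and table name; the types are inferred from the data pandas sees, so you may still want to adjust them afterwards:

```python
# Minimal sketch: let pandas create the table and load a large CSV in chunks,
# so the 500+ MB file never has to fit in memory at once.
# Connection string, file path and table name are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")

reader = pd.read_csv("data/big_file.csv", chunksize=100_000)
for i, chunk in enumerate(reader):
    chunk.to_sql(
        "my_table",
        engine,
        if_exists="replace" if i == 0 else "append",  # first chunk creates the table
        index=False,
    )
```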

Using Python: Extracting tables from a Postgres DB as CSV files and adding a current date column to each CSV file

I have 25 tables in a single schema in my Postgres DB. I would like to extract these tables as CSV files. These tables don't have a current date column, so we would like to add a current date column to the CSV files.
I am planning to do this in Python, and the code will be executed on an EC2 instance.
One of the options I explored is the COPY command.
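Building on the COPY idea, psycopg2's copy_expert can stream each table out as CSV while the SELECT adds the current date column on the fly. A minimal sketch, assuming hypothetical connection details, schema name and output directory:

```python
# Minimal sketch: export every table in a schema to CSV with an extra
# export_date column. Connection details, schema and output path are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="user", password="password")
schema = "my_schema"        # placeholder: the schema holding the 25 tables
out_dir = "/tmp/exports"    # placeholder: output directory on the EC2 instance

with conn, conn.cursor() as cur:
    # List the tables in the schema.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND table_type = 'BASE TABLE'",
        (schema,),
    )
    tables = [r[0] for r in cur.fetchall()]

    for table in tables:
        # Identifiers come from information_schema; quote them properly
        # (e.g. with psycopg2.sql.Identifier) if your table names need it.
        copy_sql = (
            f"COPY (SELECT t.*, CURRENT_DATE AS export_date "
            f"FROM {schema}.{table} AS t) TO STDOUT WITH CSV HEADER"
        )
        with open(f"{out_dir}/{table}.csv", "w") as f:
            cur.copy_expert(copy_sql, f)
```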

Python/Pandas/BigQuery: How to efficiently update existing tables with a lot of new time series data?

I have one program that downloads time series (ts) data from a remote database and saves the data as CSV files. New ts data is appended to old ts data, and my local folder keeps growing as more data is downloaded. After downloading new ts data and saving it, I want to upload it to a Google BigQuery table. What is the best way to do this?
My current workflow is to download all of the data to CSV files, convert the CSV files to gzip files on my local machine, and then use gsutil to upload those gzip files to Google Cloud Storage. Next, I delete any existing table in Google BigQuery and manually create a new one by loading the data from Google Cloud Storage. I feel like there is room for significant automation/improvement, but I am a Google Cloud newbie.
Edit: Just to clarify, the data that I am downloading can be thought of as time series data from Yahoo Finance. With each new day there is fresh data that I download and save to my local machine. I have to upload all of the data that I have to Google BigQuery so that I can do SQL analysis on it.
Consider breaking up your data into daily tables (or partitions). Then you only need to upload the CSVs from the current day.
The script you have currently defined otherwise seems reasonable.
Extract your new day of CSVs from your source of time series data.
Gzip them for fast transfer.
Copy them to GCS.
Load the new CSVs into the current daily table/partition.
This avoids the need to delete existing tables and reduces the amount of data and processing that you need to do. As a bonus, it is easier to backfill a single day if there is an error in processing.
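As a sketch of the load step with the Python client: writing to the table$YYYYMMDD partition decorator with WRITE_TRUNCATE should replace only that day's partition, so the rest of the table is never touched (the bq load command accepts the same decorator). Bucket, dataset and table names below are placeholders, and the table is assumed to be date-partitioned:

```python
# Minimal sketch: load today's gzipped CSVs into today's partition only.
# BigQuery reads gzip-compressed CSVs from GCS directly.
import datetime
from google.cloud import bigquery

client = bigquery.Client()
day = datetime.date.today().strftime("%Y%m%d")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # replaces just this partition
)

job = client.load_table_from_uri(
    f"gs://my-bucket/daily/{day}/*.csv.gz",          # placeholder GCS path
    f"my-project.my_dataset.prices${day}",           # partition decorator
    job_config=job_config,
)
job.result()
```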

Import a set of text files into MySQL using MySQL or Python code

I have a folder which contains a set of .txt files with the same structure.
The folder directory is E:\DataExport
It contains 1000 .txt files: text1.txt, text2.txt, ...
Each .txt file contains price data for one ticker. The data is updated daily. An example of one .txt file is below:
Ticker,Date/Time,Open,High,Low,Close,Volume
AAA,7/15/2010,19.581,20.347,18.429,18.698,174100
AAA,7/16/2010,19.002,19.002,17.855,17.855,109200
AAA,7/19/2010,19.002,19.002,17.777,17.777,104900
....
My question is, I want to:
Load all the files into the MySQL database with a single piece of code. I can do it one by one in the MySQL command line, but I do not know how to import them all at once into one table in MySQL (I have already created the table).
Every day, when I get new data, how can I update only the new data into the table in MySQL?
I would like a solution using either Python or MySQL. I have tried to google and apply some solutions but without success; the data does not load into MySQL.
You could use the Python package pandas.
Read the data with read_csv into a DataFrame and use the to_sql method with the proper con and schema.
You will have to keep track of what has already been imported. For example, you could record in a file or in the database that the last import stopped at line 54 of the 1032nd file, and then perform an update that reads the rest and imports it.
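As a minimal sketch of that approach, assuming a hypothetical MySQL connection string and table name, and the E:\DataExport folder from the question:

```python
# Minimal sketch: read every .txt file in the folder and append it to one
# MySQL table. Connection string and table name are placeholders.
import glob
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/prices")

for path in glob.glob(r"E:\DataExport\*.txt"):
    df = pd.read_csv(path)   # files are comma-separated with a header row
    df.to_sql("price_data", engine, if_exists="append", index=False)
```

For the daily update, one option is to query the latest Date/Time already stored for each ticker and append only the rows newer than that, or to keep a small watermark (file name and line number) as suggested above.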
