Truncate MySQL table using Pandas+PyMySQL+SQLAlchemy - python

Is there a way to truncate a table using pandas? I'm using a config.json to transfer the db config.
from json import load
from sqlalchemy import create_engine

with open("config.json") as jsonfile:
    db_config = load(jsonfile)['database']

engine = create_engine(db_config['config'], echo=False)
config.json:
{
    "database": {
        "config": "mysql+pymysql://root:password@localhost:3306/some_db"
    }
}
I tried something like:
sql = "TRUNCATE TABLE some_db.some_table;"
pd.read_sql(sql=sql, con=engine)
Error:
sqlalchemy.exc.ResourceClosedError: This result object does not return
rows. It has been closed automatically
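pandas' read_sql is meant for statements that return rows, which is why the TRUNCATE ends in a ResourceClosedError. A minimal sketch of one way around it, assuming the engine built from config.json above, is to run the statement on the engine directly:
from sqlalchemy import text

# TRUNCATE returns no result rows, so execute it on a connection
# instead of going through pd.read_sql.
with engine.begin() as conn:  # begin() commits the transaction on exit
    conn.execute(text("TRUNCATE TABLE some_db.some_table;"))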

Related

Upload table to bigquery using Colab, specifying schema in job_config

I am trying to write a table to BigQuery using Colab. The best way I have found is using a client and a job_config. It is important that I keep control over how data is written, as I plan to use the code below for different tasks. The last step that eludes me is setting up the schema. I do not want someone's query to crash because, say, Year is suddenly an integer instead of a string. Should the code below work? Or perhaps I need to use "job_config.schema_update_options", but I am not sure what the schema object should look like. I cannot use pandas-gbq as it is too slow to write to a dataframe first. The table will be overwritten each month, which is why I use WRITE_TRUNCATE. Thanks
schema_1 = [
    { "name": "Region", "type": "STRING", "mode": "NULLABLE" },
    { "name": "Product", "type": "STRING", "mode": "NULLABLE" }
]
schemma2 = [('Region', 'STRING', 'NULLABLE', None, ()),
            ('Product', 'STRING', 'NULLABLE', None, ())]
"""Create a Google BigQuery input table.
In the code below, the following actions are taken:
* A new dataset is created "natality_regression."
* A query is run, the output of which is stored in a new "regression_input" table.
"""
from google.cloud import bigquery
# Create a new Google BigQuery client using Google Cloud Platform project defaults.
project_id = 'nproject'
client = bigquery.Client(project=project_id)
# Prepare a reference to a new dataset for storing the query results.
dataset_id = "natality_regression"
table_id = "regression_input"
table_id_full = f"{project_id}.{dataset_id}.{table_id}"
# Configure the query job.
job_config = bigquery.QueryJobConfig()
# Set the destination table to where you want to store query results.
job_config.destination = table_id_full
job_config.write_disposition = 'WRITE_TRUNCATE' # WRITE_APPEND
job_config.schema = schemma2
#job_config.schema_update_options = ???
#job_config.schema = schema_1
# Set up a query in Standard SQL
query = """
SELECT * FROM `nproject.SalesData1.Sales1` LIMIT 15
"""
# Run the query.
query_job = client.query(query, job_config=job_config)
query_job.result() # Waits for the query to finish
print('danski')
This is effectively handled by the job_config, but the Python syntax looks like this:
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("mode", "STRING"),
    ]
)
You can find more details here: https://cloud.google.com/bigquery/docs/schemas?hl=es_419
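If you prefer to keep the schema declared as plain dicts (like schema_1 in the question), one possible sketch, not from the original answer, is to convert those dicts into SchemaField objects before handing them to the job config:
from google.cloud import bigquery

schema_1 = [
    { "name": "Region", "type": "STRING", "mode": "NULLABLE" },
    { "name": "Product", "type": "STRING", "mode": "NULLABLE" }
]

# Convert each dict into the SchemaField object the client library expects.
schema_fields = [
    bigquery.SchemaField(f["name"], f["type"], mode=f["mode"]) for f in schema_1
]

job_config = bigquery.LoadJobConfig(schema=schema_fields)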
You can also create the table first, before running the query:
# Construct a BigQuery client object.
client = bigquery.Client()
# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

How to remove _id field while extracting MongoDB documents to JSON in Python

How do I remove the _id field while extracting MongoDB documents to JSON in Python? I have written the code below but get nothing in the JSON output.
The MongoDB document looks like:
db.collection.find().pretty()
{
    "_id" : ObjectId("612334997e2f032b9f077eb7"),
    "sourceAttribute" : "first_name",
    "domainAttribute" : "First_Name"
}
Code tried:
import pymongo
from bson.json_util import dumps  # assuming dumps comes from bson.json_util

myclient = pymongo.MongoClient('mongodb://localhost:27017/')
mydb = myclient["guid"]
mycol = mydb["mappedfields"]
cursor = mydb.mycol.find({}, {'_id': False})
list_cur = list(cursor)
json_data = dumps(list_cur, indent=1)
with open('mapping_files/mapping_from_mongodb.json', 'w') as file:
    file.write(json_data)
Output I am getting:
[]
Expected output
[
{
"sourceAttribute": "first_name",
"domainAttribute": "First_Name"
}
]
cursor = mycol.find({},{'_id':False})
mycol is the collection object; mydb.mycol looks up a collection literally named "mycol", which is empty, hence the [] output.
The {'_id': False} projection belongs in the second argument of find().
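Putting the fix together, a minimal sketch of the corrected extraction (connection string, database, collection, and output path are taken from the question; dumps is assumed to come from bson.json_util):
import pymongo
from bson.json_util import dumps

myclient = pymongo.MongoClient('mongodb://localhost:27017/')
mydb = myclient["guid"]
mycol = mydb["mappedfields"]

# Query the collection object and exclude _id via the projection (second argument).
cursor = mycol.find({}, {'_id': False})
json_data = dumps(list(cursor), indent=1)

with open('mapping_files/mapping_from_mongodb.json', 'w') as file:
    file.write(json_data)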

psycopg2.ProgrammingError: can't adapt type 'dict'

I have an SQL class in Python which inserts data into my DB. In my table, one column is a JSON field, and when I insert data into that table I get the error (psycopg2.ProgrammingError: can't adapt type 'dict').
I have used json.load, json.loads, json.dump, and json.dumps; none of them worked. I even tried string formatting, which did not work either.
Any idea how to do this?
My demo code is:
json_data = {
    "key": "value"
}
query = """INSERT INTO table(json_field) VALUES(%s)"""
self.cursor.execute(query, ([json_data,]))
self.connection.commit()
The block of code below worked for me:
import psycopg2
import json
json_data = {
    "key": "value"
}
json_object = json.dumps(json_data, indent=4)
query = """INSERT INTO json_t(field) VALUES(%s)"""
dbConn = psycopg2.connect(database='test', port=5432, user='username')
cursor=dbConn.cursor()
cursor.execute(query, ([json_object,]))
dbConn.commit()
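As an alternative to serializing the dict yourself, psycopg2 also ships a Json adapter; a minimal sketch, reusing the table and connection details from the block above:
import psycopg2
from psycopg2.extras import Json

json_data = {
    "key": "value"
}

dbConn = psycopg2.connect(database='test', port=5432, user='username')
cursor = dbConn.cursor()

# Json() tells psycopg2 how to adapt the dict for a json/jsonb column.
cursor.execute("""INSERT INTO json_t(field) VALUES(%s)""", (Json(json_data),))
dbConn.commit()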

Create table from dictionary data in a safe way

I have a problem where I have a list of dictionaries with, for example, the following data:
columns = [{
    'name': 'column1',
    'type': 'varchar'
},
{
    'name': 'column2',
    'type': 'decimal'
},
...
]
From that list I need to dynamically create a CREATE TABLE statement, based on each dictionary in the list (which contains the name of the column and the type), and execute it on a PostgreSQL database using the psycopg2 adapter.
I managed to do it with:
columns = "(" + ",\n".join(["{} {}".format(col['name'], col['type']) for col in columns]) + ")"
cursor.execute("CREATE TABLE some_table_name\n {}".format(columns))
But this solution is vulnerable to SQL injection. I tried to do the exact same thing with the sql module from psycopg2, but without luck: I always get a syntax error, because it wraps the type in quotes.
Is there some way this can be done safely?
You can make use of AsIs to get the column types added unquoted:
import psycopg2
from psycopg2.extensions import AsIs
import psycopg2.sql as sql
conn = psycopg2.connect("dbname=mf port=5959 host=localhost user=mf_usr")
columns = [{
    'name': "column1",
    'type': "varchar"
},
{
    'name': "column2",
    'type': "decimal"
}]

# create a dict, so we can use dict placeholders in the CREATE TABLE query.
column_dict = {c['name']: AsIs(c['type']) for c in columns}

createSQL = sql.SQL("CREATE TABLE some_table_name\n ({columns})").format(
    columns=sql.SQL(',').join(
        sql.SQL(' ').join([sql.Identifier(col), sql.Placeholder(col)])
        for col in column_dict
    )
)
print(createSQL.as_string(conn))
cur = conn.cursor()
cur.execute(createSQL, column_dict)
cur.execute("insert into some_table_name (column1) VALUES ('foo')")
cur.execute("select * FROM some_table_name")
print('Result: ', cur.fetchall())
Output:
CREATE TABLE some_table_name
("column1" %(column1)s,"column2" %(column2)s)
Result: [('foo', None)]
Note:
psycopg2.sql is safe against SQL injection; AsIs probably is not.
Testing with 'type': "varchar; DROP TABLE foo" resulted in a Postgres syntax error:
b'CREATE TABLE some_table_name\n ("column1" varchar; DROP TABLE foo,"column2" decimal)'
Traceback (most recent call last):
File "pct.py", line 28, in <module>
cur.execute(createSQL, column_dict)
psycopg2.errors.SyntaxError: syntax error at or near ";"
LINE 2: ("column1" varchar; DROP TABLE foo,"column2" decimal)
Expanding on my comment, a complete example:
import psycopg2
from psycopg2 import sql
columns = [{
    'name': 'column1',
    'type': 'varchar'
},
{
    'name': 'column2',
    'type': 'decimal'
}]
con = psycopg2.connect("dbname=test host=localhost user=aklaver")
cur = con.cursor()
col_list = sql.SQL(',').join( [sql.Identifier(col["name"]) + sql.SQL(' ') + sql.SQL(col["type"]) for col in columns])
create_sql = sql.SQL("CREATE TABLE tablename ({})").format(col_list)
print(create_sql.as_string(con))
CREATE TABLE tablename ("column1" varchar,"column2" decimal)
cur.execute(create_sql)
con.commit()
test(5432)=> \d tablename
               Table "public.tablename"
 Column  |       Type        | Collation | Nullable | Default
---------+-------------------+-----------+----------+---------
 column1 | character varying |           |          |
 column2 | numeric           |           |          |
Iterate over the column list of dicts, turning the column names into SQL identifiers and the column types into plain SQL inside the sql.SQL construct, and use the result as the parameter to the CREATE TABLE statement.
Caveat: sql.SQL() does not do escaping, so those type values would have to be validated before they are used.
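One way to do that validation, sketched here as a suggestion rather than part of the original answer, is to check every requested type against an allowlist before building the statement:
# Hypothetical allowlist; extend it with whatever types your schema actually allows.
ALLOWED_TYPES = {"varchar", "decimal", "integer", "text", "boolean", "timestamp"}

def validate_columns(columns):
    """Raise if any requested column type is not in the allowlist."""
    for col in columns:
        if col["type"].lower() not in ALLOWED_TYPES:
            raise ValueError("Disallowed column type: {!r}".format(col["type"]))
    return columns

columns = validate_columns([
    {'name': 'column1', 'type': 'varchar'},
    {'name': 'column2', 'type': 'decimal'},
])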

Google BigQuery: In Python, column addition makes all the other columns Nullable

I have a table that already exists with the following schema:
{
    "schema": {
        "fields": [
            {
                "mode": "required",
                "name": "full_name",
                "type": "string"
            },
            {
                "mode": "required",
                "name": "age",
                "type": "integer"
            }
        ]
    }
}
It already contains entries like:
{'full_name': 'John Doe',
'age': int(33)}
I want to insert a new record with a new field and have the load job automatically add the new column as it loads. The new format looks like this:
record = {'full_name': 'Karen Walker',
'age': int(48),
'zipcode': '63021'}
My code is as follows:
from google.cloud import bigquery
client = bigquery.Client(project=projectname)
table = client.get_table(table_id)
config = bigquery.LoadJobConfig()
config.autodetect = True
config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
]
job = client.load_table_from_json([record], table, job_config=config)
job.result()
This results in the following error:
400 Provided Schema does not match Table my_project:my_dataset:mytable. Field age has changed mode from REQUIRED to NULLABLE
I can fix this by changing config.schema_update_options as follows:
config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION
]
This allows me to insert the new record, with zipcode added to the schema, but it causes both full_name and age to become NULLABLE, which is not the behavior I want. Is there a way to prevent schema auto-detect from changing the existing columns?
If you need to add fields to your schema, you can do the following:
from google.cloud import bigquery
client = bigquery.Client()
table = client.get_table("your-project.your-dataset.your-table")
original_schema = table.schema # Get your current table's schema
new_schema = original_schema[:] # Creates a copy of the schema.
# Add new field to schema
new_schema.append(bigquery.SchemaField("new_field", "STRING"))
# Set new schema in your table object
table.schema = new_schema
# Call API to update your table with the new schema
table = client.update_table(table, ["schema"])
After updating your table's schema, you can load your new records containing the additional field without any extra schema update options.
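For completeness, a hedged sketch of what the follow-up load could look like once the table's schema already contains the new field; the record and table reference follow the question, and no schema_update_options are set, so the REQUIRED modes on full_name and age are left alone:
from google.cloud import bigquery

client = bigquery.Client(project="my_project")
table = client.get_table("my_project.my_dataset.mytable")

record = {'full_name': 'Karen Walker', 'age': int(48), 'zipcode': '63021'}

config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema=table.schema,  # pass the updated schema explicitly; no relaxation needed
)

job = client.load_table_from_json([record], table, job_config=config)
job.result()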
