I'm trying to extract the column names when pulling data from a Vertica database in Python via a SQL query. I am using vertica-python 0.6.8. So far I am creating a dictionary from the first row, but I was wondering if there is an easier way of doing it. This is how I am doing it right now:
import vertica_python
import csv
import sys
import ssl
import psycopg2

conn_info = {'host': '****',
             'port': 5433,
             'user': '****',
             'password': '****',
             'database': '****',
             # 10 minutes timeout on queries
             'read_timeout': 600,
             # default throw error on invalid UTF-8 results
             'unicode_error': 'strict',
             # SSL is disabled by default
             'ssl': False}

connection = vertica_python.connect(**conn_info)
cur = connection.cursor('dict')

query = "SELECT * FROM something WHERE something_happens LIMIT 1"
cur.execute(query)
temp = cur.fetchall()

ColumnList = []
for column in temp[0]:
    ColumnList.append(column)
cheers
Two ways:
First, you can just access the dict's keys if you want the column list, this is basically like what you have, but shorter:
ColumnList = temp[0].keys()
Second, you can access the cursor's field list, which I think is what you are really looking for:
ColumnList = [d.name for d in cur.description]
The second one is better because it'll let you see the columns even if the result is empty.
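As a minimal sketch following the answer above (assuming the same connection and table), the names are available from cur.description even when the query returns no rows; on driver versions where the description entries are plain sequences, d[0] holds the name instead of d.name:

# Hedged sketch: column names without fetching any data
cur.execute("SELECT * FROM something WHERE FALSE")   # matches no rows
column_list = [d.name for d in cur.description]      # or d[0] on plain-tuple drivers
print(column_list)       # e.g. ['col_a', 'col_b', ...]
print(cur.fetchall())    # [] -- empty, but the names are still known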
If I am not wrong, you are asking about the title of each column.
You can get that from the data descriptors of the hp_vertica_client.cursor class.
It can be found here:
https://my.vertica.com/docs/7.2.x/HTML/Content/python_client/cursor.html
I am trying to translate a set of columns in my MySQL database using Python's googletrans library.
Sample MySQL table Data:
Label        Answer                 Label_Translated   Answer_Translated
cómo estás   Wie heißen sie?        NULL               NULL
wie gehts    per favore rivisita    NULL               NULL
元気ですか    Cuántos años tienes    NULL               NULL
Below is my sample code:
import pandas as pd
import googletrans
from googletrans import Translator
import sqlalchemy
import pymysql
import numpy as np
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import sessionmaker
engine = create_engine("mysql+pymysql:.....")
Session = sessionmaker(bind = engine)
session = Session()
translator = Translator()
I read the database table using:
sql_stmt = "SELECT * FROM translate"
data = session.execute(sql_stmt)
I perform the translation steps using:
for to_translate in data:
    to_translate.Answer_Translated = translator.translate(to_translate.Answer, dest = 'en')
    to_translate.Label_Translated = translator.translate(to_translate.Label, dest = 'en')
I tried session.commit() but the changes are not reflected in the database. Could someone please let me know how to make the changes permanent in the database.
Also when I try:
for rows in data:
    print(rows)
I don't see any output. Before enforcing the changes in the database, is there a way we can view the changes in Python ?
Rewriting my answer because I missed OP was using a raw query to get his set.
Your issue seems to be that there is no real update logic in your code (although you might have left that out). Here is what you could do. Keep in mind that it's not the most efficient or elegant way to deal with this, but it might get you in the right direction.
# assuming: import sqlalchemy as sa
for to_translate in data:
    session = Session()
    print(to_translate)
    mappings = {}
    mappings['Label'] = to_translate[0]
    # .text extracts the translated string from the googletrans result object
    mappings['Answer_Translated'] = translator.translate(to_translate.Answer, dest="en").text
    mappings['Label_Translated'] = translator.translate(to_translate.Label, dest="en").text
    update_str = ("update translate set Answer_Translated = :Answer_Translated, "
                  "Label_Translated = :Label_Translated where Label = :Label")
    session.execute(sa.text(update_str), mappings)
    session.commit()
This will update your db. Now I can't guarantee it will work out of the box, because your actual table might differ from the sample you posted, but the print statement should be able to guide you in fixing update_str. Note that using the ORM would make this a lot nicer.
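Since the sample already uses SQLAlchemy, here is a rough ORM-based sketch of the same update. It assumes the table is named translate, has a primary key (automap needs one), and that the column names match the sample in the question:

# Hedged ORM sketch: table 'translate' with columns Label, Answer,
# Label_Translated, Answer_Translated and a primary key.
from sqlalchemy.ext.automap import automap_base

Base = automap_base()
Base.prepare(engine, reflect=True)
Translate = Base.classes.translate  # mapped class reflected from the table

session = Session()
for row in session.query(Translate):
    row.Answer_Translated = translator.translate(row.Answer, dest='en').text
    row.Label_Translated = translator.translate(row.Label, dest='en').text
session.commit()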
I'm trying to get the names of all databases that exist in MongoDB, iterate over all databases and collections, and then print their documents. I can print the documents when I pass the collection directly as an attribute, but I can't do it when iterating over all databases and collections (i.e. when the names are held in variables). Does anyone know whether pymongo supports looking up a database or collection dynamically by the value of a variable, rather than by the literal attribute name?
from pymongo import MongoClient
from pprint import pprint

client = MongoClient('mongodb://localhost:27017/')
names = client.database_names()
for dbName in names:
    print(dbName)
    db = client.dbName
    collectionNames = client[dbName].collection_names()
    for colecao in collectionNames:
        print(colecao)
        cursor = db.colecao  # choosing the collection you need
        print(cursor)
        cursor2 = cursor.find()  # get documents
        for document in cursor2:
            pprint(document)
The database names and collection names print normally, but printing the cursor returns:
"Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'dbName'), u'colecao')"
It uses the literal names of the variables instead of their values.
Instead of
client.dbName
use
client.get_database(dbName)
and instead of
cursor = db.colecao
use
cursor = db.get_collection(colecao)
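Putting it together, a minimal sketch of the corrected loop (using database_names()/collection_names() as in the question; newer pymongo releases rename these to list_database_names()/list_collection_names()):

from pymongo import MongoClient
from pprint import pprint

client = MongoClient('mongodb://localhost:27017/')
for db_name in client.database_names():
    db = client.get_database(db_name)  # looked up by value, not by attribute name
    for col_name in db.collection_names():
        collection = db.get_collection(col_name)
        for document in collection.find():  # get documents
            pprint(document)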
I'm trying to insert into a MySQL table from data in this Excel sheet: https://www.dropbox.com/s/w7m282386t08xk3/GA.xlsx?dl=0
The script should start from the second sheet "Daily Metrics" at row 16. The MySQL table already has the fields called date, campaign, users, and sessions.
Using Python 2.7, I've already created the MySQL connection and opened the sheet, but I'm not sure how to loop over those rows and insert into the database.
import MySQLdb as db
from openpyxl import load_workbook
wb = load_workbook('GA.xlsx')
sheetranges = wb['Daily Metrics']
print(sheetranges['A16'].value)
conn = db.connect('serverhost','username','password','database')
cursor = conn.cursor()
cursor.execute('insert into test_table ...')
conn.close()
Thank you for your help!
Try this and see if it does what you are looking for. You will need to update to the correct workbook name and location. Also, update the range that you want to iterate over in for rw in wb["Daily Metrics"].iter_rows("A16:B20"):
from openpyxl import load_workbook

wb = load_workbook("c:/testing.xlsx")
for rw in wb["Daily Metrics"].iter_rows("A16:B20"):
    for cl in rw:
        print cl.value
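To tie this back to the insert, here is a rough sketch under the assumption that the values sit in columns A to D starting at row 16 and that the target table is the test_table from the question (adjust the names and the range to your data):

import MySQLdb as db
from openpyxl import load_workbook

conn = db.connect('serverhost', 'username', 'password', 'database')
cursor = conn.cursor()

wb = load_workbook('GA.xlsx')
for rw in wb['Daily Metrics'].iter_rows('A16:D100'):  # adjust the range to your sheet
    values = [cl.value for cl in rw]
    # parameter placeholders let the driver handle quoting
    cursor.execute('insert into test_table (date, campaign, users, sessions) '
                   'values (%s, %s, %s, %s)', values)
conn.commit()
conn.close()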
Only basic knowledge of MySQL and openpyxl is needed; you can work out the details from their tutorials.
Before executing the script, you need to create the database and table. I'm assuming you've already done that.
import openpyxl
import MySQLdb

wb = openpyxl.load_workbook('/path/to/GA.xlsx')
ws = wb['Daily Metrics']

# map() is a convenient way to build the list of rows. Slicing the Worksheet
# instance from row 16 up to ws.max_row (the last row of the worksheet) yields
# a tuple of rows, and each row is a tuple of cells.
data = map(lambda x: {'date': x[0].value,
                      'campaign': x[1].value,
                      'users': x[2].value,
                      'sessions': x[3].value},
           ws[16: ws.max_row])

# filter() is another builtin; drop rows with blank cells if needed
data = filter(lambda x: None not in x.values(), data)

db = MySQLdb.connect('host', 'user', 'password', 'database')
cursor = db.cursor()
for row in data:
    # build and execute the raw INSERT statement for each row
    cursor.execute('insert into test_table (date, campaign, users, sessions) '
                   'values ("{date}", "{campaign}", {users}, {sessions});'
                   .format(**row))  # construct MySQL syntax through format function
db.commit()
I'm retrieving a json file from online that I want to insert into my database using peewee. The problem is that some of the rows may already exist in my database. The solution should be to either ignore or replace the duplicate rows.
The InsertQuery function supports adding multiple rows, but I cannot figure out how to either suppress the error raised when an instance already exists or replace the existing instance.
Starting with an empty database test, I run the following code
from peewee import *
from peewee import InsertQuery

database = MySQLDatabase('test', **{'password': 'researchServer', 'user': 'root'})

class BaseModel(Model):
    class Meta:
        database = database

class Image(BaseModel):
    url = CharField(unique=True)

database.connect()
database.create_tables([Image])

images = [{'url': 'one'}, {'url': 'two'}]
try:
    image_entry = InsertQuery(Image, rows=images)
    image_entry.execute()
except:
    print 'error'
This produces no errors and successfully adds 'one' and 'two' to my table.
If I then run,
images = [{'url': 'three'}, {'url': 'one'}, {'url': 'four'}]
try:
    image_entry = InsertQuery(Image, rows=images)
    image_entry.execute()
except:
    print 'error'
The execute function throws an error and neither 'three' or 'four' get added to the database.
I suppose one solution would be to check each row before adding it to the database, but that seems inefficient.
You can use on_conflict() or upsert() on your InsertQuery.
on_conflict() will add an SQL ON CONFLICT clause with the given argument but only works with SQLite. upsert() basically turns the query into a REPLACE INTO on MySQL. You still need to call execute() after.
Image.insert_many(images).upsert(True).execute()
peewee doc upsert
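For the SQLite case mentioned above, a small hedged sketch of the on_conflict() variant (peewee 2.x API; verify against the docs linked above):

# peewee 2.x, SQLite backend: skip duplicate urls instead of replacing them
Image.insert_many(images).on_conflict('IGNORE').execute()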
I haven't been able to find a solution in peewee, but here's one that I wrote for SQLAlchemy.
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy import create_engine

# Make the table
metadata = MetaData()
image = Table('image', metadata,
              Column('url', String(250), primary_key=True))

engine = create_engine('mysql://root:researchServer@localhost/test3')
metadata.create_all(engine)
conn = engine.connect()

# Insert the first set of rows
images = [{'url': 'one'}, {'url': 'two'}]
inserter = image.insert()
conn.execute(inserter, images)

# Insert the second set of rows with some duplicates
images = [{'url': 'three'}, {'url': 'one'}, {'url': 'four'}]
try:
    inserter = image.insert().prefix_with("IGNORE")
    conn.execute(inserter, images)
except:
    print 'error'
The key is using the 'prefix_with' method to add the 'ignore' to the SQL expression. I was greatly helped by SQLAlchemy INSERT IGNORE
I'm trying to upload a pandas data frame to an SQL table. pandas' to_sql function seemed to be the best solution for larger data frames, but I can't get it to work. I can easily extract data, but I get an error message when trying to write it to a new table:
import pyodbc
import pandas as pd

# connect to Exasol DB
exaString = 'DSN=exa'
conDB = pyodbc.connect(exaString)

# get some data from somewhere, works without error
sqlString = "SELECT * FROM SOMETABLE"
data = pd.read_sql(sqlString, conDB)

# now upload this data to a new table
data.to_sql('MYTABLENAME', conDB, flavor='mysql')
conDB.close()
The error message I get is
pyodbc.ProgrammingError: ('42000', "[42000] [EXASOL][EXASolution driver]syntax error, unexpected identifier_chain2, expecting
assignment_operator or ':' [line 1, column 6] (-1)
(SQLExecDirectW)")
Unfortunately I have no idea what the query that caused this syntax error looks like, or what else is wrong. Can someone please point me in the right direction?
(Second) EDIT:
Following Humayun's and Joris's suggestions, I now use pandas version 0.14 and SQLAlchemy in combination with the Exasol dialect (?). Since I am connecting to a defined schema, I am using the metadata option, but the program crashes with "Bus error (core dumped)".
import sqlalchemy
from sqlalchemy import create_engine
import pandas as pd
from pandas.io import sql

engine = create_engine('exa+pyodbc://uid:passwd@exa/mySchemaName', echo=True)

# get some data
sqlString = "SELECT * FROM SOMETABLE"  # SOMETABLE is a view in mySchemaName
df = pd.read_sql(sqlString, con=engine)  # works

print engine.has_table('MYTABLENAME')  # MYTABLENAME is a view in mySchemaName
# prints "True"

# upload it to a new table
meta = sqlalchemy.MetaData(engine, schema='mySchemaName')
meta.reflect(engine, schema='mySchemaName')
pdsql = sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'MYTABLENAME')
I am not sure about setting "mySchemaName" in create_engine(..), but the outcome is the same.
Pandas does not support the EXASOL syntax out of the box, so it needs to be changed a bit. Here is a working example of your code without SQLAlchemy:
import pyodbc
import pandas as pd
con = pyodbc.connect('DSN=EXA')
con.execute('OPEN SCHEMA TEST2')
# configure pandas to understand EXASOL as mysql flavor
pd.io.sql._SQL_TYPES['int']['mysql'] = 'INT'
pd.io.sql._SQL_SYMB['mysql']['br_l'] = ''
pd.io.sql._SQL_SYMB['mysql']['br_r'] = ''
pd.io.sql._SQL_SYMB['mysql']['wld'] = '?'
pd.io.sql.PandasSQLLegacy.has_table = \
    lambda self, name: name.upper() in [t[0].upper() for t in con.execute('SELECT table_name FROM cat').fetchall()]
data = pd.read_sql('SELECT * FROM services', con)
data.to_sql('SERVICES2', con, flavor = 'mysql', index = False)
If you use the EXASolution Python package, the code would look as follows:
import exasol
con = exasol.connect(dsn='EXA') # normal pyodbc connection with additional functions
con.execute('OPEN SCHEMA TEST2')
data = con.readData('SELECT * FROM services') # pandas data frame per default
con.writeData(data, table = 'services2')
The problem is that in pandas 0.14 the read_sql and to_sql functions still cannot deal with schemas, and using Exasol without schemas makes no sense. This will be fixed in 0.15. If you want to use it now, look at this pull request: https://github.com/pydata/pandas/pull/7952
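Once you are on pandas 0.15 or later, the schema keyword should make this simpler. A minimal sketch, assuming the SQLAlchemy engine and data frame from the question above:

# pandas >= 0.15: to_sql accepts a schema argument directly
df.to_sql('MYTABLENAME', engine, schema='mySchemaName', index=False)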