I want to insert Hindi sentences into a MySQL database, but I have run into a problem: the Hindi sentences become garbled after insertion. I have set the encoding format to UTF-8. My code is as follows. Thanks a lot!
#coding = utf-8
import MySQLdb
import sys

reload(sys)
sys.setdefaultencoding('utf-8')

dbs = MySQLdb.connect(host='x.x.x.x', user='x', passwd='x', db='x', port=x)
cursor = dbs.cursor()
with open('hindi.wiki.set', 'r') as file:
    count = 1
    for line in file.readlines():
        if count == 5:
            break
        sql = """insert into `lab_ime_test_set_2` (id_, type_, lang_, text_, anno_) values(%s, %s, %s, '%s', %s)""" % ("null", "'wiki'", "'hindi'", MySQLdb.escape_string(line.strip()), "'not_anno'")
        try:
            cursor.execute(sql)
            dbs.commit()
        except Exception as eh:
            print("error")
print("total count", count)
cursor.close()
dbs.close()
The same SQL can be run in Navicat for MySQL, and the Hindi text is shown correctly there. But when I run this code, the sentences are inserted into the database, yet they cannot be displayed correctly, for example:
such as "संतरे के जायके वाले मूल टैंग को 1957 में जनरल फूडà¥à¤¸ कॉरपोरेशन के लिठविलियम à¤"
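For reference, mojibake of this shape (UTF-8 bytes displayed as Latin-1 characters such as à¤ and à¥) usually means the connection itself is not using UTF-8. A minimal sketch of the usual fix, assuming the text_ column is a UTF-8 column (the table and column names below simply mirror the question, and the credentials are placeholders): pass charset to connect() and use query parameters instead of string formatting.

```python
def connect_utf8():
    # Placeholder credentials, as in the question. charset='utf8mb4'
    # makes the connection itself use UTF-8; without it MySQLdb may
    # negotiate latin-1 and the stored text comes back garbled.
    import MySQLdb
    return MySQLdb.connect(host='x.x.x.x', user='x', passwd='x', db='x',
                           charset='utf8mb4', use_unicode=True)

def make_insert(line):
    # Query parameters replace escape_string() and manual quoting;
    # the driver handles encoding and escaping of the Hindi text.
    sql = ("INSERT INTO `lab_ime_test_set_2` (id_, type_, lang_, text_, anno_) "
           "VALUES (NULL, %s, %s, %s, %s)")
    return sql, ('wiki', 'hindi', line.strip(), 'not_anno')
```

With this in place the loop body becomes cursor.execute(*make_insert(line)) followed by a commit.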
When running this code I am getting an error: "Error while connecting to MySQL: Not all parameters were used in the SQL statement". I have also tried to ingest the data with other techniques. Here is my code:
import mysql.connector as msql
from mysql.connector import Error
import pandas as pd

empdata = pd.read_csv('path_to_file', index_col=False, delimiter=',')
empdata.head()

try:
    conn = msql.connect(host='localhost', user='test345',
                        password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE timetheft")
        print("Database is created")
except Error as e:
    print("Error while connecting to MySQL", e)

try:
    conn = msql.connect(host='localhost', database='timetheft', user='test345', password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS company;')
        print('Creating table....')
        create_contracts_table = """
        CREATE TABLE company (
            ID VARCHAR(40) PRIMARY KEY,
            Company_Name VARCHAR(40),
            Country VARCHAR(40),
            City VARCHAR(40),
            Email VARCHAR(40),
            Industry VARCHAR(30),
            Employees VARCHAR(30)
        );
        """
        cursor.execute(create_company_table)
        print("Table is created....")
        for i, row in empdata.iterrows():
            sql = "INSERT INTO timetheft.company VALUES (%S, %S, %S, %S, %S, %S, %S, %S)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
            # the connection is not auto committed by default, so we must commit to save our changes
            conn.commit()
except Error as e:
    print("Error while connecting to MySQL", e)
The second technique I tried:
LOAD DATA LOCAL INFILE 'path_to_file'
INTO TABLE company
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
This worked better, but with many errors; only 20% of the rows were ingested. Finally, here is an excerpt from the .csv (the data is consistent throughout all 1K rows):
"ID";"Company_Name";"Country";"City";"Email";"Industry";"Employees"
217520699;"Enim Corp.";"Germany";"Bamberg";"posuere#diamvel.edu";"Internet";"51-100"
352428999;"Lacus Vestibulum Consulting";"Germany";"Villingen-Schwenningen";"egestas#lacusEtiambibendum.org";"Food Production";"100-500"
371718299;"Dictum Ultricies Ltd";"Germany";"Anklam";"convallis.erat#sempercursus.co.uk";"Primary/Secondary Education";"100-500"
676789799;"A Consulting";"Germany";"Andernach";"massa#etrisusQuisque.ca";"Government Relations";"100-500"
718526699;"Odio LLP";"Germany";"Eisenhüttenstadt";"Quisque.varius#euismod.org";"E-Learning";"11-50"
I fixed these issues to get the code to work:
- make the number of placeholders in the INSERT statement equal to the number of columns (the table has seven columns, but the statement has eight placeholders)
- the placeholders should be lower-case %s, not %S
- the cell delimiter appears to be a semicolon, not a comma
- the table-creation SQL is assigned to create_contracts_table, but cursor.execute() is called with create_company_table
For simply reading a csv with ~1000 rows, Pandas is overkill (and iterrows does not behave the way you seem to expect). I've used the csv module from the standard library instead:
import csv
...
sql = "INSERT INTO company VALUES (%s, %s, %s, %s, %s, %s, %s)"
with open("67359903.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    # Skip the header row.
    next(reader)
    # For large files it may be more efficient to commit
    # rows in batches.
    cursor.executemany(sql, reader)
    conn.commit()
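The batched-commit idea mentioned in the comment could be sketched like this (a sketch only: itertools.islice takes fixed-size chunks from the reader, and the batch size of 500 is an arbitrary choice):

```python
import csv
import itertools

def insert_in_batches(cursor, conn, path, sql, batch_size=500):
    # Read the semicolon-delimited file and insert rows in
    # fixed-size batches, committing after each batch.
    with open(path, "r", newline="") as f:
        reader = csv.reader(f, delimiter=";")
        next(reader)  # skip the header row
        while True:
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            cursor.executemany(sql, batch)
            conn.commit()
```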
If using the csv module is not convenient, the dataframe's itertuples method may be used to iterate over the data:
empdata = pd.read_csv('67359903.csv', index_col=False, delimiter=';')
for tuple_ in empdata.itertuples(index=False):
    cursor.execute(sql, tuple_)
conn.commit()
Or the dataframe can be dumped to the database directly.
import sqlalchemy as sa
engine = sa.create_engine('mysql+mysqlconnector:///test')
empdata.to_sql('company', engine, index=False, if_exists='replace')
This is my first time using SQLite; I've only worked with MySQL before. I have a program that streams live Twitter tweets and stores them in a db. The program creates a database and then runs Tweepy to fetch the data from Twitter. I'm having trouble printing out data for exploration from my db file, twitter.db. I do see the tweets stream in real time on my console; I just cannot seem to query the data from the db.
Below is my database.
conn = sqlite3.connect('twitter.db')
c = conn.cursor()

def create_table():
    try:
        c.execute("CREATE TABLE IF NOT EXISTS sentiment(unix REAL, tweet TEXT, sentiment REAL)")
        c.execute("CREATE INDEX fast_unix ON sentiment(unix)")
        c.execute("CREATE INDEX fast_tweet ON sentiment(tweet)")
        c.execute("CREATE INDEX fast_sentiment ON sentiment(sentiment)")
        conn.commit()
    except Exception as e:
        print(str(e))

create_table()
After I run the program once, I comment out the create_table() call so that later runs don't try to create the table again. Below is how I stream the data to my db.
def on_data(self, data):
    try:
        data = json.loads(data)
        tweet = unidecode(data['text'])
        time_ms = data['timestamp_ms']
        analysis = TextBlob(tweet)
        sentiment = analysis.sentiment.polarity
        print(time_ms, tweet, sentiment)
        c.execute("INSERT INTO sentiment (unix, tweet, sentiment) VALUES (?, ?, ?)",
                  (time_ms, tweet, sentiment))
        conn.commit()
    except KeyError as e:
        print(str(e))
    return(True)
The streaming from the Twitter API seems to work well; however, when I want to print out rows for data exploration and check whether the data is being stored, I receive this error: OperationalError: no such table: sentiment. The code below produces said error:
import sqlite3
conn = sqlite3.connect('twitter.db')
c = conn.cursor()
c.execute("SELECT * FROM sentiment")
print(c.fetchall())
When I run c.execute("SELECT * FROM sqlite_master"), I get [] printed on screen, which makes me think something is very wrong. What is wrong with the code above?
Thanks.
Are you executing the scripts from the same directory? If you're not sure, I suggest writing in both scripts

import os
print("I am in following directory: ", os.getcwd())
conn = sqlite3.connect('twitter.db')

instead of

conn = sqlite3.connect('twitter.db')

and checking whether both really look in the same directory for twitter.db.
If they do, then go to the command line, change into this directory, and type

sqlite3 twitter.db

then type

.tables

and look at which tables are listed. You can then even run queries (if the table exists) to check in more detail:

SELECT * FROM sentiment;
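The same .tables check can also be done from Python by querying sqlite_master (a small sketch; the path is assumed to be the same twitter.db):

```python
import sqlite3

def list_tables(path):
    # sqlite_master holds one row per table/index in the file; an
    # empty result means the file has no tables, e.g. a freshly
    # created, empty twitter.db opened in the wrong directory.
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```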
I think gelonida is right on the money, and command-line sqlite3 is your friend. I slightly modified your code, and it works fine when the db is in the same location:
import os
import sqlite3

os.remove('twitter.db')
conn = sqlite3.connect('twitter.db')
c = conn.cursor()

def create_table():
    try:
        c.execute("CREATE TABLE IF NOT EXISTS sentiment(unix REAL, tweet TEXT, sentiment REAL)")
        c.execute("CREATE INDEX fast_unix ON sentiment(unix)")
        c.execute("CREATE INDEX fast_tweet ON sentiment(tweet)")
        c.execute("CREATE INDEX fast_sentiment ON sentiment(sentiment)")
        conn.commit()
    except Exception as e:
        print(str(e))

create_table()

def on_data():
    try:
        tweet = 'text'
        time_ms = 123.2
        sentiment = 1.234
        print(time_ms, tweet, sentiment)
        c.execute("INSERT INTO sentiment (unix, tweet, sentiment) VALUES (?, ?, ?)",
                  (time_ms, tweet, sentiment))
        conn.commit()
    except KeyError as e:
        print(str(e))
    return(True)

on_data()
conn.close()

# Now check the result
conn = sqlite3.connect('twitter.db')
c = conn.cursor()
c.execute("SELECT * FROM sentiment")
print(c.fetchall())
Running it prints:
123.2 text 1.234
[(123.2, 'text', 1.234)]
I'm trying to make a dbf-to-MySQL connector in Python. So far I have got it to connect to the MySQL server and read the dbf file, but when I run the program, none of the data appears in MySQL. Here's my code so far:
from dbfpy import dbf
import MySQLdb

source = dbf.Dbf("foxpro.Dbf")
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="mydb")
cur = db.cursor()
for r in source:
    query = """INSERT mytb SET column1 = %s, column2 = %s, column3 = %s"""
    values = (r["column1"], r["column2"], r["column3"])
    print r["column1"], r["column2"], r["column3"]
You've written the query to insert, but you never execute() it. Inside the loop, add:

# since your `values` is already a tuple
cur.execute(query, values)
# otherwise it can be written as...
cur.execute(query, (r["column1"], r["column2"], r["column3"]))

After the loop, also call db.commit(), since MySQLdb does not auto-commit by default.
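Putting it together, the loop might look like the sketch below (the column names are the ones assumed in the question, and `records` stands in for the dbf.Dbf("foxpro.Dbf") object, or any iterable of mapping-like rows):

```python
def copy_rows(cur, db, records):
    # Insert each dbf record into MySQL using query parameters,
    # then commit once at the end to persist the rows.
    query = "INSERT INTO mytb SET column1 = %s, column2 = %s, column3 = %s"
    count = 0
    for r in records:
        cur.execute(query, (r["column1"], r["column2"], r["column3"]))
        count += 1
    db.commit()  # MySQLdb does not auto-commit by default
    return count
```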
This question already has answers here:
Python & MySql: Unicode and Encoding
(3 answers)
Closed 9 years ago.
I have a Python program in which I access a URL and extract data, then insert this data into a MySQL table. The table has columns pid, position, club, points, s, availability, rating, name. I have no issues with the Python program (I hope), but the database apparently does not accept names containing non-ASCII characters, e.g. Jääskeläinen. How do I make the database accept these names? I tried using the answer given here, but the program still gives me the following error:
Traceback (most recent call last):
File "C:\Users\GAMER\Desktop\Padai\Fall 13\ADB\player_extract.py", line 49, in <module>
sql += "('{0}', '{1}', '{2}', '{3}', '{4}','{5}','{6}','{7}')".format(count,position,club, points,s,availability, rating,name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
excepted Goalkeepers Jääskeläinen West Ham 67 £5.5
My Python code is this:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib import urlopen
from pprint import pprint
import MySQLdb
import traceback
import re

# fetch players list from site
url = "http://fantasy.premierleague.com/player-list/"
html = urlopen(url).read()
soup = BeautifulSoup(html)
h2s = soup.select("h2")
tables = soup.select("table")
first = True
title = ""
players = []
for i, table in enumerate(tables):
    if first:
        title = h2s[int(i/2)].text
    for tr in table.select("tr"):
        player = (title,)
        for td in tr.select("td"):
            player = player + (td.text,)
        if len(player) > 1:
            players.append(player)
    first = not first

## SQL connectivity and data entry
db = MySQLdb.connect(host="localhost", user="root", passwd="hassan28", db="adbpro")
cur = db.cursor()
try:
    count = 1
    for i in players:
        position, name, club, points, price = i
        s = price[1:]
        name = name.replace("'", " ")
        rating = 4
        availability = 1
        sql = "INSERT INTO players (pid,position,club,points,price,availability,rating,name) VALUES "
        try:
            sql += "('{0}', '{1}', '{2}', '{3}', '{4}','{5}','{6}','{7}')".format(count, position, club, points, s, availability, rating, name)
            cur.execute(sql)
            count += 1
        except UnicodeError:
            traceback.print_exc()
            print "excepted", position, name, club, points, price
            continue
        # print sql
    db.commit()
except:
    print sql
    traceback.print_exc()
    db.rollback()
cur.execute("SELECT * FROM PLAYERS")
print "done"
Any help will be greatly appreciated.
This is not a database problem; you are trying to interpolate Unicode values into a byte string, triggering an implicit encoding.
Don't use string formatting here, use SQL parameters instead:
sql = "INSERT INTO players (pid,position,club,points,price,availability,rating,name) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
params = (count, position, club, points, s, availability, rating, name)
cur.execute(sql, params)
Here the %s markers tell MySQLdb where to expect SQL parameters, and you pass the parameters as a separate sequence to cursor.execute().
Do remember to tell the database connection that you want to use UTF-8 for Unicode values:
db = MySQLdb.connect(host="localhost", user="root", passwd="hassan28",
                     db="adbpro", charset='utf8')
Seems like a duplicate of this question. Just for others: the solution is to pass the charset='utf8' parameter when you connect() to your database.
Inserting Chinese characters into SQLite3 through a CGI script is not working for me. I can insert and select Chinese characters from the same database using a query browser tool, but when I use a Python script, it shows an error. This is the query I used to create the database:
CREATE TABLE registrations (
m_username VARCHAR PRIMARY KEY
COLLATE 'BINARY',
m_identity VARCHAR,
m_updatetime DATETIME
);
And this is the CGI script I used to update and select values from the database:
#! /Python26/python
dbFile = 'D:/sqlite/registrations'

import cgi
import sqlite3
import xml.sax.saxutils

query = cgi.parse()
db = sqlite3.connect(dbFile)
user = query.get('username', [None])[0]
identity = query.get('identity', [None])[0]
friends = query.get('friends', [])

print 'Content-type: text/plain\n\n<?xml version="1.0" encoding="utf-8"?>\n'
print "<result>"
if user:
    try:
        c = db.cursor()
        c.execute("insert or replace into registrations values (?, ?, datetime('now'))", (user, identity))
        print "\t<update>true</update>"
    except:
        print '\t<update>false</update>'
for f in friends:
    print "\t<friend>\n\t\t<user>%s</user>" % (xml.sax.saxutils.escape(f), )
    c = db.cursor()
    c.execute("select m_username, m_identity from registrations where m_username = ? and m_updatetime > datetime('now', '-1 hour')", (f, ))
    for result in c.fetchall():
        eachIdent = result[1]
        if not eachIdent:
            eachIdent = ""
        print "\t\t<identity>%s</identity>" % (xml.sax.saxutils.escape(eachIdent), )
        if f != result[0]:
            print "\t\t<registered>%s</registered>" % (xml.sax.saxutils.escape(result[0]), )
    print "\t</friend>"
db.commit()
print "</result>"
I think I need to set the charset to UTF-8 somehow, but I don't know how to do it. I googled but couldn't find a good way to solve this issue. Could someone please help me?
I have handled this on the client side: I Base64-encoded the Chinese data and sent that to the db. I don't think this is the most direct way, but I couldn't find any other.
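The Base64 workaround described above can be sketched as follows (a minimal sketch with the encoding done in Python rather than in the client's EncodeBase64 helper, and a simplified two-column table; note that sqlite3 stores UTF-8 text natively, so passing the raw string as a query parameter usually works without any encoding step at all):

```python
import base64
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (m_username TEXT, m_identity TEXT)")

# Encode the Chinese text as Base64 before storing it ...
name = "\u4e2d\u6587"  # "中文"
encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")
conn.execute("INSERT INTO registrations VALUES (?, ?)", (encoded, "id1"))

# ... and decode it again when reading it back.
row = conn.execute("SELECT m_username FROM registrations").fetchone()
decoded = base64.b64decode(row[0]).decode("utf-8")
```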