Storing data from a text file into a MySQL table - Python

I have a text file and a MySQL table. The text file looks like this:
new.txt
apple| 3
ball | 4
cat | 2
From this text file I want to store the data in the MySQL table below, which has these columns:
| query | count | is_prod_ready | time_of_created | last_updated |
I want to store apple, ball, cat in the query column and the numbers 3, 4, 2 in the count column. The is_prod_ready column should default to false, time_of_created should take the current time, and last_updated should take the update time.
I have already created the table, but I am not able to load the data from the text file into the database. I have tried the code below.
import MySQLdb
con = MySQLdb.connect(host="localhost",user="root",passwd="9090547207",db="Test")
cur = con.cursor()
query = 'load data local infile "new.txt" into table data field terminated by "|" lines terminated by "\n" '
cur.execute(query)
con.commit()
con.close()
Here my database name is Test and the table name is data.
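A minimal sketch of one way to do this (not tested against your exact table): parse the file in Python and insert with parameterized queries. The other columns (is_prod_ready, time_of_created, last_updated) are left to their table defaults, and stripping around "|" removes the stray spaces in the file:
import MySQLdb

con = MySQLdb.connect(host="localhost", user="root", passwd="9090547207", db="Test")
cur = con.cursor()

rows = []
with open("new.txt") as f:
    for line in f:
        if not line.strip():
            continue
        # "ball | 4" -> ("ball", 4); strip() removes the spaces around "|"
        name, count = [part.strip() for part in line.split("|")]
        rows.append((name, int(count)))

# count is backtick-quoted just in case it clashes with the COUNT keyword
cur.executemany("INSERT INTO data (query, `count`) VALUES (%s, %s)", rows)
con.commit()
con.close()
If you would rather keep LOAD DATA LOCAL INFILE, note that the clause is FIELDS TERMINATED BY (plural), the connection usually needs local INFILE enabled (e.g. MySQLdb.connect(..., local_infile=1)), and it helps to name the target columns at the end of the statement, e.g. ... fields terminated by "|" lines terminated by "\n" (query, count). The spaces around "|" would still end up in the stored values, though.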

Invalid Date when inserting to Teradata using Python

I'm working on a Python piece that will insert a dataframe into a Teradata table using pyodbc. The error I can't get past is:
File "file.py", line 33, in <module>
cursor.execute("INSERT INTO DB.TABLE (MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS)VALUESrow['MASDIV'],'trx_chtr',row['STATION'],row['TUNING_EVNT_START_DT'],row['DOW'],row['MOY'],row['TRANSACTIONS'])
pyodbc.DataError: ('22008', '[22008] [Teradata][ODBC Teradata Driver][TeradataDatabase] Invalid date supplied for Table.TUNING_EVNT_START_DT. (-2666) (SQLExecDirectW)')
To fill you in: I've got a Teradata table that I want to insert a dataframe into. The table is created as:
CREATE SET TABLE DB.TABLE, FALLBACK
(PK decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 1
--MAXVALUE 2147483647
NO CYCLE),
TRXTYPE VARCHAR(10),
MASDIV VARCHAR(30),
STATION VARCHAR(50),
TUNING_EVNT_START_DT DATE format 'MM/DD/YYYY',
DOW VARCHAR(3),
MOY VARCHAR(10),
TRANSACTIONS INT,
ANOMALY_FLAG INT NOT NULL DEFAULT 1)
PRIMARY INDEX (PK);
The primary key and anomaly_flag will be filled in automatically. Below is the script that I am using and running into the error with. It reads in a CSV and creates a dataframe. The header and first two rows of the CSV look like this:
MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS
Staten Island | WFUTDT4 | 9/12/18 | Wed | September | 538
San Fernando Valley | American Heroes Channel HD | 6/28/2018 | Thu | June | 12382
Here is the script that I am using...
'''
Written by Bobby October 1st, 2018
REFERENCE
https://tomaztsql.wordpkress.com/2018/07/15/using-python-pandas-dataframe-to-read-and-insert-data-to-microsoft-sql-server/
'''
import pandas as pd
import pyodbc
from datetime import datetime
#READ IN CSV TEST DATA
df = pd.read_csv('Data\\test_set.csv')
print('CSV LOADED')
#ADJUST DATE FORMAT
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df.TUNING_EVNT_START_DT)
#df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%m/%d/%Y')
df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%Y-%m-%d')
print('DATE FORMAT CHANGED')
print(df)
#PUSH TO DATABASE
conn = pyodbc.connect('dsn=ConnectR')
cursor = conn.cursor()
# Database table has columns...
# PK | TRXTYPE | MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS | ANOMALY_FLAG
# PK is autoincrementing, TRXTYPE needs to be specified on the insert command, and ANOMALY_FLAG defaults to 1 for yes
for index, row in df.iterrows():
cursor.execute("INSERT INTO DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet(MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS)VALUES(?,?,?,?,?,?,?)", row['MASDIV'],'trx_chtr',row['STATION'],row['TUNING_EVNT_START_DT'],row['DOW'],row['MOY'],row['TRANSACTIONS'])
conn.commit()
print('RECORD ENTERED')
print('DF SUCCESSFULLY WRITTEN TO DB')
#PULL FROM DATABASE
sql_conn = pyodbc.connect('dsn=ConnectR')
query = 'SELECT * FROM DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet;'
df = pd.read_sql(query, sql_conn)
print(df)
So in this I am converting the date format and trying to insert row by row into the Teradata table. The first record reads in and ends up in the database. The second record throws the error shown at the top. Its date is 6/28/18, and I've changed it to 6/11/18 just to see if there was a mix-up between day and month, but that still had the same problem. Are the columns getting shifted somewhere, so that it is trying to insert a different column's value into the date column?
Any ideas or help is much appreciated!
So the issue was the date format declared on the table. Initially the column was built with the MM/DD/YYYY format to match the CSV, but changing it to the YYYY-MM-DD format made the script run perfectly.
Thanks!
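In other words, the DATE FORMAT declared on the column has to agree with the string the script sends. A minimal sketch of the two pieces kept consistent (same column name as in the DDL above):
# Teradata side, in the CREATE TABLE:
#   TUNING_EVNT_START_DT DATE FORMAT 'YYYY-MM-DD',
# Python side: send strings in that same format
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df.TUNING_EVNT_START_DT).dt.strftime('%Y-%m-%d')
Passing real date objects (pd.to_datetime(...).dt.date) instead of strings is another option the driver may handle more robustly, but that was not needed here.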

How to update all column values with new values in a MySQL database

I am using Python to pull data from an API and update a MySQL database with those values. I realized that, as my code is right now, the data I want from the API is only INSERTED into the database the first time the code is run, but the database needs to be updated with new values whenever the code is executed after that. The API provides close to real-time values of position, speed, altitude, etc. of aircraft currently in flight. My database looks like this:
table aircraft:
+-------------+------------+------------+-----------+-----------+
| longitude | latitude | velocity | altitude | heading |
+-------------+------------+------------+-----------+-----------+
| | | | | |
| | | | | |
I am quite new to MySQL and I am having trouble finding the right way to do this. The point is to run the code to update the table whenever I want, or possibly every 10 seconds or so. I am using Python's MySQLdb module to execute SQL commands from within the Python code. Here is the main part of what I currently have:
#"states" here is a list of state vectors from the api that have the data I want
states = api.get_states()
#creates a cursor object to execute SQL commands
#the parameter to this function should be an SQL command
cursor = db.cursor()
#"test" is the database name
cursor.execute("USE test")
print("Adding states from API ")
for s in states.states:
    if s.longitude is not None and s.latitude is not None:
        cursor.execute("INSERT INTO aircraft(longitude, latitude) VALUES (%r, %r);", (s.longitude, s.latitude))
    else:
        cursor.execute("INSERT INTO aircraft(longitude, latitude) VALUES (NULL, NULL);")
    if s.velocity is not None:
        cursor.execute("INSERT INTO aircraft(velocity) VALUES (%r);" % s.velocity)
    else:
        cursor.execute("INSERT INTO aircraft(velocity) VALUES (NULL);")
    if s.altitude is not None:
        cursor.execute("INSERT INTO aircraft(altitude) VALUES (%r);" % s.altitude)
    else:
        cursor.execute("INSERT INTO aircraft(altitude) VALUES (NULL);")
    if s.heading is not None:
        cursor.execute("INSERT INTO aircraft(heading) VALUES (%r);" % s.heading)
    else:
        cursor.execute("INSERT INTO aircraft(heading) VALUES (NULL);")
db.commit()
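One way to keep the table current instead of only ever inserting is to write each state as a single row and upsert it on a unique key, rather than inserting each column as its own row. This is only a sketch, reusing states, cursor and db from the code above; it assumes the aircraft table has (or can be given) a unique column identifying the aircraft, here a hypothetical icao24 column matched by an s.icao24 attribute from the API:
upsert = (
    "INSERT INTO aircraft (icao24, longitude, latitude, velocity, altitude, heading) "
    "VALUES (%s, %s, %s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE "
    "longitude = VALUES(longitude), latitude = VALUES(latitude), "
    "velocity = VALUES(velocity), altitude = VALUES(altitude), heading = VALUES(heading);"
)
for s in states.states:
    # icao24 is assumed to be a UNIQUE column / attribute; adjust to your schema and API.
    # Parameterized values let the driver turn None into NULL automatically.
    cursor.execute(upsert, (s.icao24, s.longitude, s.latitude,
                            s.velocity, s.altitude, s.heading))
db.commit()
If you only ever want the latest snapshot and don't care about history, simply clearing the table (DELETE FROM aircraft or TRUNCATE) before the inserts on each run is another option.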

Check if the data in a database is empty

I'm fetching data from the database, and for a given channel I would like to check the row above the one I fetched to see whether its program_id cell is empty.
Here is an example table:
---------------------------
| channel | program_id
---------------------------
| ITV |
| ITV | 3021
| ITV | 3022
| ITV | 3023
Here is the code:
def update_in_database(self):
    profilePath = xbmc.translatePath(os.path.join('special://userdata/addon_data/script.tvguide', 'source.db'))
    conn = database.connect(profilePath)
    cur = conn.cursor()
    program_id = ''.join(str(x) for x in self.program_id)
    cur.execute('SELECT channel, program_id FROM programs WHERE program_id=?;', (program_id,))
    data = cur.fetchone()
    if data:
        #check if the data in a database is empty
Here is the output for the data:
(u'103 ITV', u'3021')
I have got a program_id string, which is 3021, and I want to look it up in the database to see whether the program_id of the row above it is empty, so I can act on that.
How can I check in the database whether the cell above that string is empty or not?
Relational databases and SQL are not really designed to be used this way: rows in a table have no inherent order, so you should not have to fetch the row "above" another row.
I would advise you to change the design of your database. That will solve your problem and will probably also make the database easier to work with.
If you can give more information on all the data that needs to be saved in the database, I can help you come up with a better design.
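That said, if all you actually need is to know whether the same channel has any row with no program_id, you can ask the database for that directly instead of relying on row positions. A small sketch along those lines, using the table and columns from the question (channel stands for whatever channel value you already have, do_something() is a placeholder, and "empty" is taken to mean NULL or ''):
cur.execute(
    "SELECT COUNT(*) FROM programs "
    "WHERE channel = ? AND (program_id IS NULL OR program_id = '')",
    (channel,))
if cur.fetchone()[0] > 0:
    # the channel has at least one row with an empty program_id
    do_something()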

Mapping row ids with an external csv file?

I have a CSV file with address information: zip, city, state, country, street, house_no (the last one is the house number). It is being imported through the OpenERP import interface. There you can reference related data by providing one of three things: a name, a database id, or an external id. The simplest is providing the name.
For example, for the city I don't need to explicitly provide its id (and change the column from street to street_id and then supply that id), but just its real name, like Some city. If such a city name exists in the city table, everything is imported without problems.
But problems arise when there is more than one city with the same name. To resolve the name clashes I need to explicitly provide those cities' ids. The problem is that there are so many addresses that it is nearly impossible to go through them and manually change names to ids.
So I'm wondering whether it's possible to write a script, or pass that CSV file to PostgreSQL (or to OpenERP using the ORM) as a condition, so that it returns the list of ids matching the conditions from the CSV file.
All the needed streets (in the street table) and cities (in the city table) are already imported into my database.
The city table has this structure (with example data):
id | name  | state_id
1  | City1 | 1
2  | City1 | 2
3  | City2 | 2
state table example:
id | name
1  | State1
2  | State2
So as you can see, identical names can be distinguished by their id, or by state_id (or the state name if you go to the state table).
And here is an example of the addresses CSV file (the database also has a table into which this information is imported):
zip | city  | state_id | country  | street  | house_no
123 | City1 | 1        | Country1 | Street1 | 25a
124 | City1 | 2        | Country1 | Street2 | 34
125 | City2 | 2
If I validate such a CSV file through the OpenERP interface, I get a warning that there are two cities with the same name. If I proceed anyway, it picks the city that was imported into the database first, so some addresses end up with a city in the wrong state (keep in mind that the city column is also used for various villages etc., which is why the same names appear in different states).
So I need to change the city names to their ids, but as I said there are hundreds of thousands of lines, and doing that manually is nearly impossible and would take lots of time.
Finally, what I need is to somehow pass all that information from the addresses CSV file to the database, specifically against the city table, and get a list of ids back.
For example, if I input this (as a condition against the city table):
name  | state_id
City1 | 1
City1 | 2
City2 | 2
City1 | 1
it should output this:
1
2
3
1
Could someone suggest how I can get such a result?
I was able to solve this problem by writing this script:
# -*- encoding: utf-8 -*-
#!/usr/bin/python
import psycopg2
import csv
#Connect to database
conn = psycopg2.connect(database="db_name",
user="user", password="password", host="127.0.0.1", port="5432")
cur = conn.cursor()
#Get all cities ids and names with specific state
cur.execute("SELECT id, name from res_country_state_city WHERE state_id = 53")
rows = cur.fetchall()
rows_dict = {}
#Generate dict from data provided
for row in rows:
    rows_dict[row[1]] = row[0]
#Check which name from cities-names.csv matches a name in the database
#(a match returns that city's id)
with open('cities-names.csv') as csvfile:
    with open('cities-ids.csv', 'wb') as csvfile2:
        reader = csv.reader(csvfile)
        writer = csv.writer(csvfile2)
        #create the ids csv file and write the ids that were matched
        for row in reader:
            if rows_dict.get(row[0]):
                writer.writerow([rows_dict.get(row[0])])
conn.close()
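One caveat with the script above: it matches by name only and is hard-wired to state_id = 53, so identical names in different states can still collide. If that becomes a problem, here is a sketch of a variation that looks up each (name, state_id) pair from the CSV directly (it assumes name and state_id are the first two columns of the file):
import csv
import psycopg2

conn = psycopg2.connect(database="db_name", user="user", password="password",
                        host="127.0.0.1", port="5432")
cur = conn.cursor()

with open('cities-names.csv') as src, open('cities-ids.csv', 'wb') as dst:
    writer = csv.writer(dst)
    for name, state_id in csv.reader(src):
        cur.execute("SELECT id FROM res_country_state_city "
                    "WHERE name = %s AND state_id = %s",
                    (name.strip(), int(state_id)))
        match = cur.fetchone()
        # write the matched id, or an empty cell if the pair was not found
        writer.writerow([match[0] if match else ''])

conn.close()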

How to get matched Rows from MySQLdb.cursors.Cursor python2.6

I'm working with Python 2.6 and MySQLdb. I have a table with this data:
+----+--------+
| id | status |
+----+--------+
| 1 | A |
| 2 | B |
| 3 | B |
+----+--------+
I want to do a MySQL update like this example:
UPDATE my_table SET status = "A" where id in (1,2,3,10001);
Query OK, 2 rows affected (0.03 sec)
Rows matched: 3 Changed: 2 Warnings: 0
And I need to know whether all the ids in the update exist in the database. My idea was to compare the number of ids I tried to update against the number of matched rows; in the example those numbers are 4 vs 3.
The problem is that I don't know how to get the "Rows matched" count from the cursor. I only see this information in cursor._info = 'Rows matched: 3 Changed: 2 Warnings: 0'.
cursor.rowcount is the number of changed rows, so that doesn't help.
Thanks!
If cursor._info contains that string, then you can just extract the 3 with a regex: re.search(r'Rows matched: (\d+)', cursor._info).group(1)
Alternatively, if you are using InnoDB tables (which support transactions), you can execute two queries: first just SELECT id FROM my_table WHERE id IN (1,2,3,10001) and read cursor.rowcount, which will return the number of matching rows; then execute your update. All queries run on the same cursor are part of the same transaction, so you are guaranteed that no other process writes to the database between the queries.
Sources: see http://zetcode.com/databases/mysqlpythontutorial/
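A rough sketch of that two-query approach, with the IN-list placeholders built dynamically (cursor and the connection are whatever you already have from MySQLdb):
ids = (1, 2, 3, 10001)
placeholders = ", ".join(["%s"] * len(ids))
cursor.execute("SELECT id FROM my_table WHERE id IN (%s)" % placeholders, ids)
matched = cursor.rowcount  # how many of the ids actually exist (3 in the example)
cursor.execute("UPDATE my_table SET status = 'A' WHERE id IN (%s)" % placeholders, ids)
connection.commit()  # commit on your connection object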
The FOUND_ROWS option makes cursor.rowcount return the number of matched rows instead:
db_connection = MySQLdb.connect(
    host = settings['dbHost'],
    user = settings['dbUser'],
    passwd = settings['dbPass'],
    db = settings['dbName'],
    client_flag = MySQLdb.constants.CLIENT.FOUND_ROWS
)
Docs:
http://mysql-python.sourceforge.net/MySQLdb-1.2.2/public/MySQLdb.constants.CLIENT-module.html
http://dev.mysql.com/doc/refman/5.6/en/mysql-real-connect.html
(There's a typo in the MySQLdb docs. "client_flags" should be "client_flag")
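With that flag set, rowcount on the UPDATE itself reports the matched rows rather than the changed ones, so the check becomes (a short usage sketch):
cursor = db_connection.cursor()
cursor.execute("UPDATE my_table SET status = 'A' WHERE id IN (1, 2, 3, 10001)")
matched = cursor.rowcount   # 3 in the example above, even though only 2 rows changed
db_connection.commit()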
