Mapping row ids with an external csv file? - python

I have a csv file with address information: zip, city, state, country, street, house_no (the last one is the house number). It is being imported through the OpenERP import interface, where related data can be referenced in one of three ways - name, database id or external id. The simplest is by providing the name.
For example, for city I don't need to provide its id specifically (i.e. rename the column from street to street_id and then supply that street's id), but just its real name, like Some city. If such a city name exists in the city table, everything is imported without problems.
But problems arise when there is more than one city with the same name. To resolve those name clashes I need to provide the cities' ids explicitly. The problem is that there are so many addresses that it is nearly impossible to look them up and change the names to ids by hand.
So I'm wondering if it's possible to write a script, or pass that csv file to PostgreSQL (or to OpenERP using the ORM) as a condition, so it would return a list of ids matching the conditions from the csv file.
All the needed streets are already imported in the street table and the cities in the city table.
city table has this structure (with example data):
id | name  | state_id
1  | City1 | 1
2  | City1 | 2
3  | City2 | 2
state table example:
id | name
1  | State1
2  | State2
So, as you can see, identical names can be distinguished by their id, or by state_id (or the state name, if you go to the state table).
And here is an example of the addresses csv file (there is also a table in the database to import that information into):
zip | city  | state_id | country  | street  | house_no
123 | City1 | 1        | Country1 | Street1 | 25a
124 | City1 | 2        | Country1 | Street2 | 34
125 | City2 | 2        |          |         |
If I validate such a csv file through the OpenERP interface, I get a warning that there are two cities with the same name. If I proceed anyway, it picks whichever city was imported into the database first, so some addresses end up assigned to a city in the wrong state (keep in mind that the city column is also used for various villages etc., which is why the same names appear in different states).
So I need to change the city names to their ids, but as I said there are hundreds of thousands of lines, and doing it manually is nearly impossible and would take a lot of time.
Finally, what I need is to somehow pass all the information from the addresses csv file into the database, specifically into the city table, and get a list of matching ids in return.
For example, if I input (as a condition for the city table):
name  | state_id
City1 | 1
City1 | 2
City2 | 2
City1 | 1
It should output this:
1
2
3
1
Could someone suggest how to get such a result?

I was able to solve this problem by writing this script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv

import psycopg2

# Connect to the database
conn = psycopg2.connect(database="db_name",
                        user="user", password="password",
                        host="127.0.0.1", port="5432")
cur = conn.cursor()

# Get the ids and names of all cities in one specific state
cur.execute("SELECT id, name FROM res_country_state_city WHERE state_id = 53")
rows = cur.fetchall()

# Build a name -> id dict from the data returned
rows_dict = {}
for row in rows:
    rows_dict[row[1]] = row[0]

# Check which names from cities-names.csv match a name in the database
# (a match returns that city's id) and write the matched ids to a new csv
with open('cities-names.csv') as csvfile:
    with open('cities-ids.csv', 'wb') as csvfile2:
        reader = csv.reader(csvfile)
        writer = csv.writer(csvfile2)
        for row in reader:
            if rows_dict.get(row[0]):
                writer.writerow([rows_dict.get(row[0])])

conn.close()
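The script above only resolves one hard-coded state (state_id = 53). For the general case, a variation could key the lookup on the (name, state_id) pair instead of the name alone. This is only a sketch I have not run against the real data, and it assumes the input csv has exactly two columns, name and state_id:

import csv

import psycopg2

conn = psycopg2.connect(database="db_name", user="user", password="password",
                        host="127.0.0.1", port="5432")
cur = conn.cursor()

# Build a lookup keyed on (name, state_id) so identical city names
# in different states resolve to the right id
cur.execute("SELECT id, name, state_id FROM res_country_state_city")
lookup = {(name, state_id): city_id for city_id, name, state_id in cur.fetchall()}

with open('cities-names.csv') as src, open('cities-ids.csv', 'w') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for name, state_id in reader:
        city_id = lookup.get((name, int(state_id)))
        if city_id is not None:
            writer.writerow([city_id])

conn.close()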

Related

Invalid Date when inserting to Teradata using Python

I'm working on a python piece that will insert a dataframe into a teradata table using pyodbc. The error I can't get past is...
File "file.py", line 33, in <module>
cursor.execute("INSERT INTO DB.TABLE (MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS)VALUESrow['MASDIV'],'trx_chtr',row['STATION'],row['TUNING_EVNT_START_DT'],row['DOW'],row['MOY'],row['TRANSACTIONS'])
pyodbc.DataError: ('22008', '[22008] [Teradata][ODBC Teradata Driver][TeradataDatabase] Invalid date supplied for Table.TUNING_EVNT_START_DT. (-2666) (SQLExecDirectW)')
To fill you in... I've got a Teradata table that I want to insert a dataframe into. That table is created as:
CREATE SET TABLE DB.TABLE, FALLBACK
(PK decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 1
--MAXVALUE 2147483647
NO CYCLE),
TRXTYPE VARCHAR(10),
MASDIV VARCHAR(30),
STATION VARCHAR(50),
TUNING_EVNT_START_DT DATE format 'MM/DD/YYYY',
DOW VARCHAR(3),
MOY VARCHAR(10),
TRANSACTIONS INT,
ANOMALY_FLAG INT NOT NULL DEFAULT 1)
PRIMARY INDEX (PK);
The primary key and anomaly_flag will be filled in automatically. Below is the script that I am using and that runs into the error. It reads in a csv and creates a dataframe. The first rows of the csv (including the header) look like...
MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS
Staten Island | WFUTDT4 | 9/12/18 | Wed | September | 538
San Fernando Valley | American Heroes Channel HD | 6/28/2018 | Thu | June | 12382
Here is the script that I am using...
'''
Written by Bobby October 1st, 2018
REFERENCE
https://tomaztsql.wordpkress.com/2018/07/15/using-python-pandas-dataframe-to-read-and-insert-data-to-microsoft-sql-server/
'''
import pandas as pd
import pyodbc
from datetime import datetime
#READ IN CSV TEST DATA
df = pd.read_csv('Data\\test_set.csv')
print('CSV LOADED')
#ADJUST DATE FORMAT
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df.TUNING_EVNT_START_DT)
#df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%m/%d/%Y')
df['TUNING_EVNT_START_DT'] = df['TUNING_EVNT_START_DT'].dt.strftime('%Y-%m-%d')
print('DATE FORMAT CHANGED')
print(df)
#PUSH TO DATABASE
conn = pyodbc.connect('dsn=ConnectR')
cursor = conn.cursor()
# Database table has columns...
# PK | TRXTYPE | MASDIV | STATION | TUNING_EVNT_START_DT | DOW | MOY | TRANSACTIONS | ANOMALY_FLAG
# PK is autoincrementing, TRXTYPE needs to be specified on the insert command,
# and ANOMALY_FLAG defaults to 1 for yes
for index, row in df.iterrows():
    cursor.execute("INSERT INTO DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet(MASDIV,TRXTYPE,STATION,TUNING_EVNT_START_DT,DOW,MOY,TRANSACTIONS)VALUES(?,?,?,?,?,?,?)", row['MASDIV'],'trx_chtr',row['STATION'],row['TUNING_EVNT_START_DT'],row['DOW'],row['MOY'],row['TRANSACTIONS'])
    conn.commit()
    print('RECORD ENTERED')
print('DF SUCCESSFULLY WRITTEN TO DB')
#PULL FROM DATABASE
sql_conn = pyodbc.connect('dsn=ConnectR')
query = 'SELECT * FROM DLABBUAnalytics_Lab.Anomaly_Detection_SuperSet;'
df = pd.read_sql(query, sql_conn)
print(df)
So in this I am converting the date format and trying to insert the dataframe row by row into the Teradata table. The first record reads in and makes it into the database. The second record throws the error shown at the top. Its date is 6/28/18, and I've changed it to 6/11/18 just to see if there was a mix-up between day and month, but that still had the same problem. Are the columns getting offset somewhere, so that it is trying to insert a different column's value into the date column?
Any ideas or help is much appreciated!
So the issue was the date format on the table. Initially it was built with the MM/DD/YYYY format from the CSV, but changing it to the YYYY-MM-DD format made the script run perfectly.
Thanks!
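As an aside (not what was done here, just a sketch): the error can also be avoided on the Python side by binding real date objects instead of formatted strings, so the insert no longer depends on the column's FORMAT clause.

import pandas as pd

# Replaces the strftime conversion in the script above: parse the csv dates
# and bind datetime.date objects. Newer pandas versions may need
# format='mixed' for the mixed M/D/YY and M/D/YYYY values in the sample rows.
df['TUNING_EVNT_START_DT'] = pd.to_datetime(df['TUNING_EVNT_START_DT']).dt.date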

Append sqlite3 data from csv to a table whose first column is id INTEGER PRIMARY KEY AUTOINCREMENT

So I have a table whose first column, id, is an autoincrement column.
Now, suppose I have data in the table with ids 1, 2, 3,
and I also have some data in the csv that starts with ids 1, 2, 3.
This is the code that I am trying to use:
cur.execute("CREATE TABLE IF NOT EXISTS sub_features (id INTEGER PRIMARY KEY AUTOINCREMENT,featureId INTEGER, name TEXT, FOREIGN KEY(featureId) REFERENCES features(id))")
df = pd.read_csv(csv_location+'/sub_features_table.csv')
df.to_sql("sub_features", con, if_exists='append', index=False)
I am getting this error-
sqlite3.IntegrityError: UNIQUE constraint failed: sub_features.id
How do I make sure that the data gets appended, that the id gets set as required, and that in case the entire row is a duplicate it gets ignored?
To explain further, say I have a table:
id | Name
1 | Abhisek
2 | Amit
And I am trying to import this csv to the same table:
id | Name
1 | Abhisek
2 | Rahul
Then my resultant table should be:
id | Name
1 | Abhisek
2 | Amit
3 | Rahul
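A minimal sketch of one approach (assuming the csv has the same id, featureId, name columns as the sub_features table): drop the csv's id column so AUTOINCREMENT assigns fresh values, and skip rows whose content is already in the table before appending.

import sqlite3
import pandas as pd

con = sqlite3.connect('mydb.db')                # hypothetical database path
df = pd.read_csv('sub_features_table.csv')      # csv as in the question

# Drop the csv id so sqlite's AUTOINCREMENT assigns fresh ids
df = df.drop(columns=['id'])

# Keep only rows whose (featureId, name) pair is not already stored
existing = pd.read_sql('SELECT featureId, name FROM sub_features', con)
merged = df.merge(existing, on=['featureId', 'name'], how='left', indicator=True)
new_rows = merged[merged['_merge'] == 'left_only'].drop(columns=['_merge'])

new_rows.to_sql('sub_features', con, if_exists='append', index=False)
con.close()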

How to split comma delimited values into multiple rows using Sqlite

I'm using Python and SQLite to manipulate a string in android.
I have a SQLite Table that looks like this:
| ID | Country
+----------------+-------------
| 1 | USA, Germany, Mexico
| 2 | Brazil, Canada
| 3 | Peru
I would like to split the comma-delimited values of the Country column and insert them into another table, Countries, so that the Countries table looks like this:
| ID | Country
+----------------+-------------
| 1 | USA
| 1 | Germany
| 1 | Mexico
| 2 | Brazil
| 2 | Canada
| 3 | Peru
How do I split the values from the Country column in one table and insert them into the Country column of another table?
There is no split function in SQLite.
There is of course the substr function, but it's not suitable for your needs since every row could contain more than one comma.
If you were an expert in SQLite, I guess you could create a recursive statement using substr to split each row.
If you're not, use Python to read the data, split each row and write it back to the db.
You can use a recursive common table expression to split the comma-delimited column by extracting substrings of the Country column recursively.
CREATE TABLE country_split AS
WITH RECURSIVE split(id, value, rest) AS (
    SELECT ID, '', Country || ',' FROM country
    UNION ALL
    SELECT id,
           substr(rest, 0, instr(rest, ',')),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest != ''
)
SELECT id, value
FROM split
WHERE value != '';
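For what it's worth, the trick here is the trailing ',' appended to Country: each recursive step peels off the text before the first comma into value and keeps the remainder in rest, and the final WHERE value != '' drops the empty seed rows. Since country_split is a real table, its rows can afterwards be copied into an existing Countries(ID, Country) table with a plain INSERT INTO Countries SELECT id, value FROM country_split; if that is preferred.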
I solved it using Python:

import sqlite3

db = sqlite3.connect('mydb.db')
cursor = db.cursor()

# Read every row from the source table
cursor.execute("""SELECT * FROM Countries""")
all_data = cursor.fetchall()

# Target table: one row per (id, country) pair
cursor.execute("""CREATE TABLE IF NOT EXISTS Countriess
                  (ID TEXT,
                   Country TEXT)""")

# Split each comma-separated Country value and insert the pieces
for single_data in all_data:
    countriess = single_data[1].split(",")
    for single_country in countriess:
        cursor.execute("INSERT INTO Countriess VALUES(:id, :name)",
                       {"id": single_data[0], "name": single_country.strip()})
db.commit()

and after that I can use the sqlite db in another project :)

How to select only certain sections from a tab-delimited file to put into a Database using sqlite3 in Python

I have a tab-delimited file from which I need to select only certain columns to populate a database using the sqlite3 module in Python. The file contains the vertical line symbol "|", and in some cases a column is empty (I tried showing those empty fields by typing spaces, but they don't show, so just keep in mind that some fields can be empty instead of having a number). For example, here are some rows of how the file looks:
78 | 43 | texret | 453 | 4321 | 32 | 433 |
20 | 291 | texttt 372 | 228 | 19 | 999
121 | 46 | textee | 3882 | 322 | 432 | 63 |
You can see the rows (each of which needs to be treated as one row of the file) take up two lines.
In the new database that I will build using sqlite3, I would only like to put the 2nd field (e.g. the number 43 from the first row), the text field (i.e. the 5th field, as I understand it), and the 13th field (e.g. 433 from the same row). So this is what I have in terms of code:
import sqlite3

con = sqlite3.connect('database_1.db')
cur = con.cursor()
cur.execute('CREATE TABLE Number10(address_no INT, area TEXT, street_no INT, PRIMARY KEY (address_no))')

for line in open('my_tab_text.txt', 'r'):
    fields = line.strip().split("\t")
    address_no = fields[2]
    area = fields[4]
    street_no = fields[12]
    data = [row for row in fields]
    cur.executemany("INSERT INTO Number10 (address_no, area, street_no) VALUES (?, ?, ?);", (data))

con.commit()
So from the above code, when I run it I get the error:
"sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 3, and there are 14 supplied."
In other words, it counts the "|" symbols as fields as well. I don't know how to work around it. I even tried substituting that line with:
fields = line.strip("|").split("\t")
But it still gave me a ProgrammingError, just with a smaller number of fields supplied.
The expected output would be to create a database named "Number10.db" that would look like this:
address_no   area     street_no
78           texret   433
20           texttt   19
121          textee   63
So please note that some rows have empty fields here and there; I guess I would need to put a default value of 0 there. Any help would be greatly appreciated.
Instead of data = [row for row in fields] (a list containing all fields), you want to create a list (or tuple) containing only the fields you want:
data = (address_no, area, street_no)
then pass data in as you have done (with cur.execute rather than executemany, since you are inserting one row per loop iteration).
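Putting that together, a sketch of the corrected loop (untested, and assuming the "|" separators really do occupy their own tab-separated fields, as the 14-field count suggests); it executes one insert per row and falls back to 0 for empty numeric fields:

import sqlite3

con = sqlite3.connect('database_1.db')
cur = con.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Number10 '
            '(address_no INT, area TEXT, street_no INT, PRIMARY KEY (address_no))')

with open('my_tab_text.txt', 'r') as f:
    for line in f:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 13:
            continue  # skip blank or short lines
        # Default empty numeric fields to 0; note that repeated 0s would
        # clash on the address_no primary key
        address_no = int(fields[2]) if fields[2].strip() else 0
        area = fields[4].strip()
        street_no = int(fields[12]) if fields[12].strip() else 0
        cur.execute('INSERT INTO Number10 (address_no, area, street_no) VALUES (?, ?, ?)',
                    (address_no, area, street_no))

con.commit()
con.close()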

Storing the data from text file into mysql table

I have a text file and a MySQL table. The text file looks like below.
new.txt
apple| 3
ball | 4
cat | 2
From this text file I want to store the data in the MySQL table below:
| query | count | is_prod_ready | time_of_created | last_updated |
I want to store apple, ball, cat in the query column, and the numbers 3, 4, 2 in the count column. The is_prod_ready column will be false by default, time_of_created will take the current time, and last_updated will take the update time.
I have already created the table, but I am not able to store all the data from the text file into the database. I have tried the code below:
import MySQLdb
con = MySQLdb.connect(host="localhost",user="root",passwd="9090547207",db="Test")
cur = con.cursor()
query = 'load data local infile "new.txt" into table data fields terminated by "|" lines terminated by "\n" '
cur.execute(query)
con.commit()
con.close()
Here my database name is Test and the table name is data.
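A sketch of one way to map the two file fields onto the table's columns (untested; it assumes the columns are named exactly query, count, is_prod_ready, time_of_created and last_updated, and that local_infile is enabled on both the client and the server):

import MySQLdb

con = MySQLdb.connect(host="localhost", user="root", passwd="9090547207",
                      db="Test", local_infile=1)
cur = con.cursor()

# Read the two fields into user variables, trim the padding around '|',
# and set the remaining columns explicitly
query = """
    LOAD DATA LOCAL INFILE 'new.txt' INTO TABLE data
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
    (@q, @c)
    SET `query` = TRIM(@q),
        `count` = TRIM(@c),
        is_prod_ready = FALSE,
        time_of_created = NOW(),
        last_updated = NOW()
"""
cur.execute(query)
con.commit()
con.close()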
