Querying a csv file with default values for columns - python

I have to query this table with a query of the form:
query(id,mtype,competition,gender,team1,team2,venue,date)
If all of the parameters are given, I can use if statements to filter the results. But some of the parameters may not be provided, and in that case the query should match every value in that column.
The data is in a csv file. I want to read the csv file into a list and then query it. The only catch is that if the user doesn't provide a parameter in the query, it should consider all the values in that column.
Can someone suggest a way to do this with only a few if-else statements, or suggest some other approach?

You can use pandas with read_csv and query, i.e.:
import pandas as pd

# csv file should have the field names on the first row:
# id,mtype,competition,gender,team1,team2,venue,date
df = pd.read_csv("the_file.csv", sep=",")
df['date'] = pd.to_datetime(df['date'])  # convert date column to datetime

mtype = "ODM"
sd = "2017-02-18"
ed = "2017-02-20"
df_query = df.query("mtype == '{}' and date > '{}' and date < '{}'".format(mtype, sd, ed))
print(df_query)
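If some of the query parameters may be missing, one option is to build the filter mask only from the arguments that were actually supplied, which avoids a per-column if-else chain. A minimal sketch along those lines (the query function itself is illustrative, not part of pandas; the column names come from the question):

import pandas as pd

df = pd.read_csv("the_file.csv", sep=",")

def query(**kwargs):
    """Return rows matching every supplied column=value pair.

    Columns that are not passed are not filtered on at all,
    so an omitted parameter matches every value in that column.
    """
    mask = pd.Series(True, index=df.index)   # start by keeping every row
    for column, value in kwargs.items():
        if value is not None:                 # skip parameters the user left out
            mask &= df[column] == value
    return df[mask]

# e.g. only mtype given; all other columns are unrestricted
print(query(mtype="ODM"))

A date range can be folded into the same mask by using comparison operators instead of equality.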
Option 2:
You can also convert the csv file into an SQLite db and issue the queries there, something like:
Import csv to sqlite:
import csv
import sqlite3
import os.path

csv_file = "csv_to_db.csv"
db_file = "csv_to_db.sqlite"

if not os.path.exists(db_file):  # if there is no db_file yet, create one
    con = sqlite3.connect(db_file, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    # csv fields: id,mtype,competition,gender,team1,team2,venue,date
    cur.execute('CREATE TABLE "venues" ("id" int primary key, "mtype" text,'
                ' "competition" text, "gender" text, "team1" text, '
                '"team2" text, "venue" text, "venue_date" date);')
    f = open(csv_file)
    csv_reader = csv.reader(f, delimiter=',')
    next(csv_reader)  # skip the header row
    cur.executemany('INSERT INTO venues VALUES (?, ?, ?, ?, ?, ?, ?, ?)', csv_reader)
    cur.close()
    con.commit()
    con.close()
    f.close()
Now we can start querying the db. You've asked:
Can you provide an example of a query of the type query(mtype,start_date,end_date) with all the other parameters missing?
For that you can use:
conn = sqlite3.connect(db_file, detect_types=sqlite3.PARSE_DECLTYPES)
c = conn.cursor()
start_date = "2017-02-15"
end_date = "2017-02-20"
c.execute("SELECT * FROM {table} WHERE mtype='{query}' AND venue_date BETWEEN "
          "date('{start_date}') AND date('{end_date}')".format(
              table="venues", query="ODM", start_date=start_date, end_date=end_date))
all_rows = c.fetchall()
print(all_rows)
Grab the complete gist
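The same missing-parameter handling works on the SQLite side by assembling the WHERE clause from only the supplied values and letting placeholders handle escaping. This isn't from the gist above, just a rough sketch of the idea (the example call at the end uses made-up values):

import sqlite3

def query_venues(db_file, **filters):
    """Select from venues, filtering only on the columns that were passed in.

    Any parameter the caller leaves out contributes no WHERE condition,
    so all values in that column are matched.
    """
    conditions = []
    params = []
    for column, value in filters.items():
        if value is not None:
            # column names come from the fixed query() signature, not user input
            conditions.append("{} = ?".format(column))
            params.append(value)
    sql = "SELECT * FROM venues"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    with sqlite3.connect(db_file) as conn:
        return conn.execute(sql, params).fetchall()

# e.g. only mtype and gender supplied; the other columns are unconstrained
rows = query_venues("csv_to_db.sqlite", mtype="ODM", gender="female")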

You can use Pandas, which provides a way to filter rows.
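For example, a boolean mask per supplied condition does the filtering; a small sketch with made-up values:

import pandas as pd

df = pd.read_csv("the_file.csv")

# keep rows where both conditions hold; drop a condition to leave that column unrestricted
filtered = df[(df["mtype"] == "ODM") & (df["gender"] == "male")]
print(filtered)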

Related

Trying to build a database for stock prices in SQLite3 but getting this error

# Import libraries (in VSCode)
import sqlite3
import investpy
import pandas as pd

connection = sqlite3.connect('app.db')
connection.row_factory = sqlite3.Row
cursor = connection.cursor()

# Executing database query
cursor.execute("""
    SELECT ID, Symbol, Name FROM Stock
""")
rows = cursor.fetchall()

symbols = [row['symbol'] for row in rows]
stock_dict = {}
for row in rows:
    symbol = row['symbol']
    symbols.append(symbol)
    stock_dict[symbol] = row['ID']

# Reading symbols from CSV file
with open('C:/companies.csv') as f:
    companies = f.read().splitlines()

# Looping through all stocks
for company in companies:
    try:
        stock = company.split(',')[0]
        print(f"processing symbol {stock}")
        df = investpy.get_stock_historical_data(stock=stock, country='pakistan',
                                                from_date='01/06/2021', to_date='11/06/2021',
                                                interval='Daily')
        df = pd.DataFrame(df)
        df = df.reset_index(inplace=False)
        #print(df)
        Stock_ID = stock_dict[symbol]
        cursor.execute("""
            INSERT INTO Stock_Price (Stock_ID, Date, Open, High, Low, Close, Volume, Currency)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", (Stock_ID, df['Date'], df['Open'], df['High'],
                                                 df['Low'], df['Close'], df['Volume'],
                                                 df['Currency']))
    except Exception as e:
        print(symbol)
        print(e)

connection.commit()
Error output:
processing symbol ELCM
INTR
Error binding parameter 1 - probably unsupported type.
processing symbol BEST
INTR
Error binding parameter 1 - probably unsupported type.
processing symbol CYAN
INTR
Error binding parameter 1 - probably unsupported type.
Does your csv have a header line? If yes, you should skip it.
For example, if you have a csv like this:
id, name
1, John
the first insert will take the literal strings "id" and "name" as parameters. That will cause a type error, since id would be an integer in your table definition but you are giving it a string.
Code sample:
import csv

with open("test.csv", "r") as my_file:
    reader = csv.reader(my_file)
    next(reader)  # skip the header row
    for line in reader:
        # do stuff
        ...
EDIT
Can you check your table definition? It seems you are passing text. Is this correct?

Error in python showing : Error binding parameter 0 - probably unsupported type

I can't figure out what I am missing in this code.
import pandas as pd
import sqlite3
conn = sqlite3.connect('test.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS testing')
cur.execute('CREATE TABLE testing (Name TEXT, Tutorial TEXT, Datetime Text, Duration TEXT)')
url='source.csv'
data = pd.read_csv(url, sep=",")  # use sep="," for comma separation
print(data)
cur.execute('INSERT INTO testing (Name, Tutorial, Datetime, Duration) VALUES (?, ?, ? ,? )',
(data['Name'], data['Tutorial'], data['Datetime'], data['Duration'] ))
conn.commit()
It is because you are passing pandas Series objects instead of plain values as parameters. cursor.execute expects a single tuple of values, and cursor.executemany expects a list of such tuples.
You first have to convert the data frame into a list of tuples. You can use the itertuples function:
data_tuples = list(data.itertuples(index=False, name=None))
cur.executemany('INSERT INTO testing (Name, Tutorial, Datetime, Duration) VALUES (?, ?, ?, ?)', data_tuples)
Or you can simply use the DataFrame.to_sql() function:
data.to_sql('testing', conn, if_exists='append', index=False)
Using the Pandas library to load the CSV file is overkill and the cause of your issue. csv is part of the standard library and is better suited to your use case.
import csv
import sqlite3
import os
DB_FILE = os.path.join(os.getcwd(), "db_filename.db")
CSV_FILE = os.path.join(os.getcwd(), "source.csv")
con = sqlite3.connect(DB_FILE)
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS testing;")
cur.execute("CREATE TABLE testing (Name TEXT, Tutorial TEXT, Datetime TEXT, Duration TEXT);")
with open(CSV_FILE, "r") as f:
    csv_data = csv.DictReader(f)
    for_db = [(r["Name"], r["Tutorial"], r["Datetime"], r["Duration"]) for r in csv_data]

cur.executemany("INSERT INTO testing (Name, Tutorial, Datetime, Duration) VALUES (?, ?, ?, ?);", for_db)
con.commit()
con.close()

Optimizing reading very large csv and writing it to SQLite

I have a 10gb csv file of userIDs and genders which are sometimes duplicated.
userID,gender
372,f
37261,m
23,m
4725,f
...
Here's my code for importing csv and writing it to SQLite database:
import sqlite3
import csv
path = 'genders.csv'
user_table = 'Users'
conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()
cur.execute(f'''DROP TABLE IF EXISTS {user_table}''')
cur.execute(f'''CREATE TABLE {user_table} (
userID INTEGER NOT NULL,
gender INTEGER,
PRIMARY KEY (userID))''')
with open(path) as csvfile:
    datareader = csv.reader(csvfile)
    # skip header
    next(datareader, None)
    for counter, line in enumerate(datareader):
        # change gender string to integer
        line[1] = 1 if line[1] == 'f' else 0
        cur.execute(f'''INSERT OR IGNORE INTO {user_table} (userID, gender)
                        VALUES ({int(line[0])}, {int(line[1])})''')
conn.commit()
conn.close()
For now, it takes 10 seconds to process a 1 MB file (in reality I have more columns and also create more tables).
I don't think pd.to_sql can be used because I want to have a primary key.
Instead of using cursor.execute for every line, use cursor.executemany and insert many rows at once.
Store your values as a list of tuples, e.g. rows = [(a, b), (a2, b2), (a3, b3), ...], and then:
cur.executemany(f'''INSERT OR IGNORE INTO {user_table} (userID, gender)
                    VALUES (?, ?)''', rows)
conn.commit()
Info:
https://docs.python.org/2/library/sqlite3.html#module-sqlite3
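Since a 10 GB file won't fit comfortably in memory, the executemany call can be fed in fixed-size batches while streaming the csv. This is only a sketch built on the question's schema (it assumes the Users table has already been created; the batch size is an arbitrary choice):

import csv
import sqlite3
from itertools import islice

BATCH_SIZE = 50_000  # tune for your machine

conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()

with open('genders.csv') as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader, None)  # skip header
    while True:
        # pull the next batch of rows off the reader and convert gender to 0/1
        batch = [(int(uid), 1 if gender == 'f' else 0)
                 for uid, gender in islice(datareader, BATCH_SIZE)]
        if not batch:
            break
        cur.executemany('INSERT OR IGNORE INTO Users (userID, gender) VALUES (?, ?)', batch)
        conn.commit()

conn.close()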

csv into sqlite table python

Using python, I am trying to import a csv into an sqlite table and use the headers in the csv file to become the headers in the sqlite table. The code runs but the table "MyTable" does not appear to be created. Here is the code:
with open('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    # Strip white space in header
    columns = [h.strip() for h in columns]
    #reader = csv.DictReader(f, fieldnames=columns)
    for row in reader:
        print(row)

    con = sqlite3.connect("city_spec.db")
    cursor = con.cursor()

    # Inserts data from csv into table in sql database.
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)

    cursor = con.cursor()
    for row in reader:
        cursor.execute(query, row)
    #cursor.commit()

con.commit()
con.close()
Thanks in advance for any help.
You can use Pandas to make this easy (you may need to pip install pandas first):
import sqlite3
import pandas as pd
# load data
df = pd.read_csv('dict_output.csv')
# strip whitespace from headers
df.columns = df.columns.str.strip()
con = sqlite3.connect("city_spec.db")
# drop data into database
df.to_sql("MyTable", con)
con.close()
Pandas will do all of the hard work for you, including creating the actual table!
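One caveat: by default to_sql fails if MyTable already exists and also writes the DataFrame index as an extra column, so if the script may be re-run it is worth setting those arguments explicitly. The same snippet with them added:

import sqlite3
import pandas as pd

df = pd.read_csv('dict_output.csv')
df.columns = df.columns.str.strip()
con = sqlite3.connect("city_spec.db")
# overwrite any existing MyTable and skip the index column
df.to_sql("MyTable", con, if_exists="replace", index=False)
con.close()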
You haven't marked your question solved yet, so here goes.
Connect to the database just once, and create a cursor just once.
You can only read the csv records once, so don't loop over the reader before inserting (your first for loop exhausts it).
I've added code that creates a crude form of the database table based on the column names alone. Again, this is done just once.
Your insertion code works fine.
import sqlite3
import csv

con = sqlite3.connect("city_spec.sqlite")  ## these statements belong outside the loop
cursor = con.cursor()                      ## execute them just once

first = True
with open('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    columns = [h.strip() for h in columns]

    if first:
        sql = 'CREATE TABLE IF NOT EXISTS MyTable (%s)' % ', '.join(['%s text' % column for column in columns])
        print(sql)
        cursor.execute(sql)
        first = False

    #~ for row in reader:  ## we will read the rows later in the loop
    #~     print(row)

    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)

    for row in reader:
        cursor.execute(query, row)

con.commit()
con.close()
You can also do this easily with the peewee ORM. For this you only use an extension from peewee, the playhouse.csv_loader:
from playhouse.csv_loader import *

db = SqliteDatabase('city_spec.db')
Test = load_csv(db, 'dict_output.csv')
This creates the database city_spec.db with the headers as fields and the data from dict_output.csv.
If you don't have peewee you can install it with
pip install peewee

How to insert huge CSV file at once into SQL Server in python?

I have a large CSV file and I want to insert it all at once, instead of row by row. This is my code:
import pypyodbc
import csv
import time

con = pypyodbc.connect('driver={SQL Server};' 'server=server_name;' 'database=DB-name;' 'trusted_connection=true')
cur = con.cursor()
csfile = open('out2.csv', 'r')
csv_data = csv.reader(csfile)
for row in csv_data:
    try:
        cur.execute("BULK INSERT INTO Table_name(Attribute, error, msg, Value, Success, TotalCount, SerialNo)"
                    " VALUES (?, ?, ?, ?, ?, ?, ?)", row)
    except Exception:
        time.sleep(60)
cur.close()
con.commit()
con.close()
Bulk Insert should do it for you.
BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO

-- Check the content of the table.
SELECT *
FROM CSVTest
GO
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
Also, check out this link.
https://www.simple-talk.com/sql/learn-sql-server/bulk-inserts-via-tsql-in-sql-server/
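If you'd rather trigger that statement from the existing Python script than from SSMS, it can be sent through the same pypyodbc connection. A sketch only, assuming the csv sits at a path the SQL Server service account can read and that the target table already exists:

import pypyodbc

con = pypyodbc.connect('driver={SQL Server};' 'server=server_name;'
                       'database=DB-name;' 'trusted_connection=true')
cur = con.cursor()

# BULK INSERT runs server-side, so the file path must be local to the
# SQL Server machine (or a UNC share it can reach); the path here is made up.
cur.execute(r"""
    BULK INSERT Table_name
    FROM 'C:\data\out2.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
""")
con.commit()
cur.close()
con.close()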
It really depends on your system resources. You can store the CSV file in memory and then insert it into the database, but if the CSV file is larger than your RAM you will run into problems. You can save each row of the csv file as an element of a Python list. Here is my code:
import csv

csvRows = []
csvFileObj = open('yourfile.csv', 'r')
readerObj = csv.reader(csvFileObj)
for row in readerObj:
    element1 = row[0]
    .......
    csvRows.append((element1, element2, ...))
After that, read the elements of the list and insert them into your db. I don't think there is a direct way to insert all csv rows into the SQL db at once; you need some preprocessing.
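For that follow-up insert, one option is a single executemany call (standard DB-API, so pypyodbc supports it); a sketch reusing the column names from the question:

import csv
import pypyodbc

con = pypyodbc.connect('driver={SQL Server};' 'server=server_name;'
                       'database=DB-name;' 'trusted_connection=true')
cur = con.cursor()

with open('out2.csv', 'r') as csvfile:
    # if the file has a header row, skip it first with next(reader)
    rows = [tuple(row) for row in csv.reader(csvfile)]

# a single executemany call instead of one execute per row
cur.executemany(
    "INSERT INTO Table_name (Attribute, error, msg, Value, Success, TotalCount, SerialNo) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows)
con.commit()
con.close()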
