I am currently learning how to modify data with Python using Visual Studio and SQLite. My assignment is to count how many times each email address appears in a file, so that every address gets its own count. Then I must insert these counts into SQLite as a table named Counts with two columns (org, count). I have written code that runs and prints the results in the Visual Studio output window, but the data does not show up in the database.
This is my program:
import sqlite3
conn = sqlite3.connect('database3.db')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('''CREATE TABLE Counts (email TEXT, count INTEGER)''')
#cur.execute("INSERT INTO Counts Values('mlucygray@gmail.com',1)")
# Save (commit) the changes
conn.commit()
fname = input('Enter file name: ')
if (len(fname) < 1): fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    cur.execute('SELECT count FROM Counts WHERE email = ? ', (email,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (email, count) VALUES (?, 1)''', (email,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE email = ?', (email,))
cur.execute('SELECT * FROM Counts')
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'
conn.commit()
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
conn.commit()
cur.close()
Thank you for any suggestions
You need to commit after your INSERT/UPDATE statements, and you DON'T need to commit after executing SELECT statements.
for line in fh:
    if not line.lower().startswith('from: '): continue
    pieces = line.split()
    email = pieces[1]
    cur.execute('SELECT count FROM Counts WHERE email = ?', (email,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (email, count) VALUES (?, 1)''', (email,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE email = ?', (email,))
    conn.commit()
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
cur.close()
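To see why the commit matters, here is a minimal sketch that is not part of either post. It writes to a throwaway database file in a temporary directory (demo.sqlite, count_rows and commit_demo are made-up names). The connection that runs the INSERT sees the new row immediately, but a second connection to the same file, which plays the role of a database browser, only sees it after commit():

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # fetchall() drains the cursor, so this connection keeps no read lock open
    return conn.execute('SELECT COUNT(*) FROM Counts').fetchall()[0][0]


def commit_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.sqlite')  # hypothetical throwaway file
        writer = sqlite3.connect(path)
        writer.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')
        writer.commit()
        reader = sqlite3.connect(path)  # stands in for another program viewing the file

        writer.execute('INSERT INTO Counts (email, count) VALUES (?, 1)',
                       ('cwen@iupui.edu',))
        own_view = count_rows(writer)  # the writer sees its own pending change
        before = count_rows(reader)    # the other connection sees nothing yet
        writer.commit()
        after = count_rows(reader)     # now the row is visible to everyone
        writer.close()
        reader.close()
        return own_view, before, after


print(commit_demo())  # (1, 0, 1)
```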
Related
I am doing an assignment from the Coursera course "Using Databases with Python", and in one of the assignments I ran into an issue where a column in my database result is returned with brackets and quotation marks around it. (It should return org as iupui.edu, but my current result is ['iupui.edu'].)
Please refer to my code below:
import sqlite3
import re
conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')
fname = input('Enter file name: ')
if (len(fname) < 1): fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    org = str(re.findall(r'@(\S+)', email))
    cur.execute('SELECT count FROM Counts WHERE org = ? ', (org,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (org, count)
                VALUES (?, 1)''', (org,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                    (org,))
    conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
cur.close()
The mbox file is here: https://www.py4e.com/code3/mbox.txt
I have a feeling I shouldn't convert org to string class but I don't know what else to convert it to because
I would greatly appreciate your help as I've been trying to fix it for hours!
This has to do with how you're saving it:
org = str(re.findall(r'@(\S+)', email))
Here you find all the email orgs, but how do you process them? Instead of taking the first value, you convert the whole list to a string, and that's the problem: findall returns a list, even when there is only one match. This is what you can do:
org = re.findall(r'@(\S+)', email)[0]
Now org is still a string, but it no longer has brackets, because you take the first element of the list instead of converting the list itself to a string.
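A quick way to see the difference, as a minimal sketch: org_from_email is a made-up helper name, and returning None for an address without an @ is an assumption about what you would want, since indexing [0] on an empty list raises IndexError.

```python
import re


def org_from_email(email):
    """Return the text after '@', or None when the address has no '@'."""
    matches = re.findall(r'@(\S+)', email)  # always a list, even for a single match
    return matches[0] if matches else None


email = 'cwen@iupui.edu'
print(str(re.findall(r'@(\S+)', email)))  # ['iupui.edu']  (what ended up in the table)
print(org_from_email(email))              # iupui.edu
print(org_from_email('not-an-address'))   # None
```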
I need help with this Python code. I am making an application that reads the mailbox data (mbox.txt) and counts the number of email messages per organization (i.e. the domain name of the email address), using a database with the following schema to maintain the counts. The top organizational count should be 536.
This is the schema: CREATE TABLE Counts (org TEXT, count INTEGER)
I've tried many times, but I just can't get the count of 536. Here's my code:
import sqlite3
conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')
fname = input('Enter file name: ')
if (len(fname) < 1): fname = 'mbox.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    org = pieces[1]
    cur.execute('SELECT count FROM Counts WHERE org = ? ', (org,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (org, count)
                VALUES (?, 1)''', (org,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                    (org,))
    conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
cur.close()
The highest number that I got is 195. Here is the output of the code above:
Enter the file name:
zqian@umich.edu 195
mmmay@indiana.edu 161
cwen@iupui.edu 158
chmaurer@iupui.edu 111
aaronz@vt.edu 110
ian@caret.cam.ac.uk 96
jimeng@umich.edu 93
rjlowe@iupui.edu 90
dlhaines@umich.edu 84
david.horwitz@uct.ac.za 67
Here's the link where I got the text, which I saved to a file called mbox.txt:
https://www.py4e.com/code3/mbox.txt
You're not extracting the domain from the email. So multiple emails at the same domain are being treated as different organizations.
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    pieces = email.split('@')
    org = pieces[1]
    ...
Also, you might want to use the code in SQLite INSERT - ON DUPLICATE KEY UPDATE (UPSERT) so you don't have to do a SELECT query to see if the organization already exists.
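Here is a minimal sketch of that UPSERT approach with Python's sqlite3 module. It rests on two assumptions: the org column is declared UNIQUE (the assignment's schema doesn't have this, and ON CONFLICT needs a uniqueness constraint to fire), and the SQLite library bundled with your Python is 3.24 or newer, which is when INSERT ... ON CONFLICT DO UPDATE was added. The function name count_orgs and the sample lines are made up for illustration, and an in-memory database is used so nothing is written to disk:

```python
import sqlite3


def count_orgs(lines):
    conn = sqlite3.connect(':memory:')
    cur = conn.cursor()
    # UNIQUE on org is an added assumption: ON CONFLICT needs a uniqueness constraint
    cur.execute('CREATE TABLE Counts (org TEXT UNIQUE, count INTEGER)')
    for line in lines:
        if not line.startswith('From: '):
            continue
        email = line.split()[1]
        org = email.split('@')[1]  # keep only the domain part
        # Insert a new org with count 1, or bump the existing row's count
        cur.execute('''INSERT INTO Counts (org, count) VALUES (?, 1)
                       ON CONFLICT(org) DO UPDATE SET count = count + 1''', (org,))
    conn.commit()
    rows = cur.execute('SELECT org, count FROM Counts ORDER BY count DESC').fetchall()
    conn.close()
    return rows


# Hypothetical sample lines in mbox format
sample = ['From: zqian@umich.edu\n', 'Subject: test\n',
          'From: cwen@iupui.edu\n', 'From: jimeng@umich.edu\n']
print(count_orgs(sample))  # [('umich.edu', 2), ('iupui.edu', 1)]
```

One statement per line replaces the SELECT plus INSERT-or-UPDATE pair, so there is no separate existence check to get out of sync.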
Your retrieved results are email addresses, not email domains. You have to split the email addresses at the '@' symbol to get domain names:
if not line.startswith('From: '):
    continue
pieces = line.split('@')  # this is what you want
org = pieces[1].strip()  # strip() drops the trailing newline
cur.execute('SELECT count FROM Counts WHERE org = ? ', (org,))
Explanation: instead of splitting the string at every run of whitespace, which is the default behaviour of Python's str.split() method, we split the string at the '@' sign. So a line in your text file like 'From: name@email.com' becomes a list with two parts: ['From: name', 'email.com']. When the line is read from the file, the second part still ends with a newline character, so strip it off before using it as the org.
Then you can keep track of that second part (the domain) instead of the full address, and the counts should then group by organization.
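You can check the split behaviour directly in the interpreter. This uses a sample From: line built from one of the addresses in the output above:

```python
line = 'From: david.horwitz@uct.ac.za\n'

print(line.split())     # ['From:', 'david.horwitz@uct.ac.za']  (splits on whitespace)
print(line.split('@'))  # ['From: david.horwitz', 'uct.ac.za\n']  (splits on the @ sign)

org = line.split('@')[1].strip()  # strip() removes the trailing newline
print(org)              # uct.ac.za
```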
I am trying to read a JSON file from the web and create an SQL database from the data. I am using ijson to read the data as a stream, but when the code fails I have to start over to retrieve the data. Is there any way to continue reading the JSON file from where the program failed?
I could read the whole document with json.loads, but I assume the data is too big to read all at once.
You can see my code below.
import sqlite3
import ssl
import urllib.request
import json
import ijson
import time
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
conn = sqlite3.connect('rawdata.sqlite')
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS DailyData ''')
cur.execute('''DROP TABLE IF EXISTS Countries ''')
cur.execute('''DROP TABLE IF EXISTS Continents ''')
cur.execute('''CREATE TABLE IF NOT EXISTS DailyData
    (id INTEGER, Day TEXT, Month TEXT, Year TEXT, country_id INTEGER, continent_id INTEGER, Cases TEXT, Deaths TEXT)''')
cur.execute('''CREATE TABLE IF NOT EXISTS Countries
    (id INTEGER, CountryCode TEXT UNIQUE, Country TEXT UNIQUE, Population TEXT, continent_id INTEGER)''')
cur.execute('''CREATE TABLE IF NOT EXISTS Continents
    (id INTEGER, Continent TEXT UNIQUE)''')
url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/json/"
f = urllib.request.urlopen(url, context=ctx)
reports = ijson.items(f, 'records.item')
sum = 0
count = 0
# error = 0
for item in reports:
    iDataRep = item.get('dateRep')
    iCases = item.get('cases')
    iDeaths = item.get('deaths')
    iCountryCode = item.get('countryterritoryCode')
    iCountry = item.get('countriesAndTerritories')
    iPopulation = item.get('popData2018')
    iContinent = item.get('continentExp')
    if len(iDataRep) < 0: iDataRep = 0
    if len(iCases) < 0: iCases = 0
    if len(iDeaths) < 0: iDeaths = 0
    if len(iCountryCode) < 0: iCountryCode = 0
    if len(iCountry) < 0: iCountry = 0
    if len(iPopulation) < 0: iPopulation = 0
    if len(iContinent) < 0: iContinent = 0
    Spl = iDataRep.split('/')
    iDay = Spl[0]
    iMonth = Spl[1]
    iYear = Spl[2]
    id = count + 1
    cur.execute('''INSERT OR IGNORE INTO Continents (id, Continent)
        VALUES ( ?, ? )''', (id, iContinent))
    cur.execute('''SELECT id FROM Continents WHERE Continent = ? ''', (iContinent, ))
    continent_id = cur.fetchone()[0]
    cur.execute('''INSERT OR IGNORE INTO Countries (id, CountryCode, Country, Population, continent_id)
        VALUES ( ?, ?, ?, ?, ? )''', (id, iCountryCode, iCountry, iPopulation, continent_id) )
    cur.execute('''SELECT id FROM Countries WHERE Country = ? ''', (iCountry, ))
    country_id = cur.fetchone()[0]
    cur.execute('''INSERT OR IGNORE INTO DailyData (id, Day, Month, Year, country_id, continent_id, Cases, Deaths)
        VALUES ( ?, ?, ?, ?, ?, ?, ? ,?)''', (id, iDay, iMonth, iYear, country_id, continent_id, iCases, iDeaths) )
    conn.commit()
    # except:
    #     error = error + 1
    #     print(error)
    #     continue
    count = count + 1
    print(count, 'data retrieved...')
    if count % 95 == 0:
        time.sleep(1)
        print('Program slept a second.')
# fetchone()[0] pulls the number out of the result; execute() alone returns the cursor
numCountry = cur.execute('SELECT max(id) FROM Countries').fetchone()[0]
numContinent = cur.execute('SELECT max(id) FROM Continents').fetchone()[0]
print('From', numCountry, 'different countries and', numContinent, 'continents', count, 'data retrieved.')
cur.close()
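One common way to make a loader like this resumable is a checkpoint: store how many records have been committed, commit each batch of rows together with the updated checkpoint in one transaction, and on restart skip that many records. Below is a minimal sketch of the idea, not the author's code. Assumptions: Progress, ensure_tables, load_records and fake_stream are hypothetical names; the two-column DailyData table stands in for the real schema; the script must stop running the DROP TABLE statements at startup, otherwise the saved progress is wiped; and skipped records are still downloaded and parsed again (itertools.islice just discards them), because this does not seek inside the HTTP stream. With ijson you would pass ijson.items(f, 'records.item') as records, and conn would point at rawdata.sqlite so the checkpoint survives a restart. The usage at the bottom uses an in-memory database and a simulated failure so it runs on its own:

```python
import itertools
import sqlite3


def ensure_tables(conn):
    # Single-row table holding how many records are safely committed
    conn.execute('CREATE TABLE IF NOT EXISTS Progress '
                 '(id INTEGER PRIMARY KEY CHECK (id = 1), done INTEGER)')
    conn.execute('INSERT OR IGNORE INTO Progress (id, done) VALUES (1, 0)')
    conn.execute('CREATE TABLE IF NOT EXISTS DailyData (dateRep TEXT, cases INTEGER)')
    conn.commit()


def load_records(conn, records, batch_size=100):
    """Insert records, resuming after the last committed checkpoint."""
    ensure_tables(conn)
    done = conn.execute('SELECT done FROM Progress WHERE id = 1').fetchone()[0]
    try:
        # Skip the records that an earlier run already committed
        for item in itertools.islice(records, done, None):
            conn.execute('INSERT INTO DailyData (dateRep, cases) VALUES (?, ?)',
                         (item.get('dateRep'), item.get('cases')))
            done += 1
            if done % batch_size == 0:
                # Rows and checkpoint are committed in the same transaction
                conn.execute('UPDATE Progress SET done = ? WHERE id = 1', (done,))
                conn.commit()
    except Exception:
        conn.rollback()  # drop the unfinished batch so data matches the checkpoint
        raise
    conn.execute('UPDATE Progress SET done = ? WHERE id = 1', (done,))
    conn.commit()
    return done


def fake_stream(n, fail_at=None):
    """Hypothetical stand-in for ijson.items(...): yields dicts, may fail midway."""
    for i in range(n):
        if i == fail_at:
            raise ConnectionError('simulated network failure')
        yield {'dateRep': str(i), 'cases': i}


conn = sqlite3.connect(':memory:')
try:
    load_records(conn, fake_stream(250, fail_at=230))
except ConnectionError:
    pass  # the first run dies after 230 records; only the first 200 were committed
print(load_records(conn, fake_stream(250)))  # resumes at record 200, prints 250
print(conn.execute('SELECT COUNT(*), COUNT(DISTINCT dateRep) FROM DailyData').fetchone())
```

Committing in batches rather than after every row also cuts down on disk syncs; the batch size only controls how much work a crash can cost.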
I wrote a simple script that parses a CSV file and inserts the data into SQL Server.
The very strange issue is that some variables seem to get lost when I use them inside an if condition.
This is the script:
import csv
import pypyodbc
# DB connection
conn = pypyodbc.connect('DRIVER={SQL Server};SERVER=xxx.xxx.xxx.xxx;DATABASE=SCAN;UID=user;PWD=password')
cursor = conn.cursor()
def main() :
    reader = csv.reader(file(filename, "rb"), delimiter=';')
    for row in reader :
        ip = row[0]
        host = row[1]
        domain = row[2]
        # get Operating System ID
        os_id = getOperatingSystem(row[3])
        manufacturer = row[4]
        computer_model = row[5]
        # get computer_manufacturer ID
        computer_manufacturer = getManufacturer(manufacturer, computer_model)
        arch = getArch(row[6])
        values = [ip, host, domain, os_id, manufacturer, arch]
        hostIP = getHostIP(ip)
        print "hostIP: " + str(hostIP)
        if hostIP == 0:
            print values
            # insert values in DB
            cursor.execute(
                """
                INSERT INTO dbo.hosts (ip, host, domain, os_id, manufacturer, arch_id)
                VALUES (?, ?, ?, ?, ?, ?)
                """, values)
            cursor.commit()
# return host IP ID
def getHostIP(hostIP) :
    cursor.execute("SELECT id FROM mytable WHERE ip = ?", [hostIP])
    row = cursor.fetchone()
    if row is not None :
        return row[0]
    return 0
# return ID of Computer Manufacturer
def getComputerManufacturer(manufacturer, computer_model) :
    cursor.execute("SELECT id FROM manufacturer WHERE manufacturer = ? AND computer_model = ?", [manufacturer, computer_model])
    row = cursor.fetchone()
    if row is not None:
        return row[0]
    else :
        return setComputerManufacturer(manufacturer, computer_model)
If I comment out the cursor.execute and cursor.commit lines, print values correctly shows the data; otherwise it only shows the same CSV line.
Can you give me a little help?
Thanks
I've created a database using my SQL/Python programming knowledge. However, I want to know how to clear all the data in a given column.
The code I tried to use is below:
#Creating the Table
import sqlite3 as lite
import sys
con = lite.connect('Test.db')
with con:
    cur = con.cursor()
    cur.execute('SELECT SQLITE_VERSION()')
    data = cur.fetchone()
    print("SQLite version: %s" % data)
#Adding data
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Users(User_Id INTEGER PRIMARY KEY, Username STRING, Password STRING, Acc_Type STRING, First_Name STRING, Surname STRING, Class STRING, FullName STRING)")
    cur.execute("INSERT INTO Users VALUES(1, 'Admin', 'PassWord567', 'Admin', '', 'Admin', 'None', 'Admin')")
    cur.execute("INSERT INTO Users VALUES(2, 'HamzahA12', 'password', 'Student', 'Hamzah', 'Akhtar', '13E2', 'Hamzah Akhtar')")
#Clearing a Column
column_length = []
with con:
    cur = con.cursor()
    cur.execute("SELECT Username FROM Users")
    rows = cur.fetchall()
    for row in rows:
        row = str(row)
        column_length.append(row)
length = 1
for item in column_length:
    length = str(length)
    with con:
        cur = con.cursor()
        cur.execute("DELETE FROM Users WHERE User_Id = '"+length+"'")
    length = int(length)
    length = length+1
When I run the code, it clears the whole table rather than the column. I understand why it does that, but I can't find a way around it!
You don't need the two loops; you can change all column values with a single statement:
with con:  # the with-block commits the change when it exits
    con.execute("UPDATE Users SET Username = NULL")
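You need to change the DELETE line inside your loop to an UPDATE, so that it clears the column for that row instead of deleting the whole row:
cur.execute("UPDATE Users SET YourColumn = NULL WHERE User_Id = ?", (length,))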
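Here is a small self-contained check of the single-UPDATE approach, using an in-memory database and a trimmed-down copy of the Users table. One deliberate change from the question: the columns are declared TEXT rather than STRING, because SQLite gives a declared type of STRING numeric affinity, which can silently convert text that looks like a number (such as the Class value '13E2') into a numeric value. The UPDATE clears Username in every row while leaving the rows and the other columns in place:

```python
import sqlite3

con = sqlite3.connect(':memory:')
with con:  # the with-block commits on success
    con.execute('CREATE TABLE Users (User_Id INTEGER PRIMARY KEY, Username TEXT, Password TEXT)')
    con.executemany('INSERT INTO Users VALUES (?, ?, ?)',
                    [(1, 'Admin', 'PassWord567'), (2, 'HamzahA12', 'password')])

with con:
    con.execute('UPDATE Users SET Username = NULL')  # clears one column in every row

rows = con.execute('SELECT User_Id, Username, Password FROM Users ORDER BY User_Id').fetchall()
print(rows)  # [(1, None, 'PassWord567'), (2, None, 'password')]
```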