Get the most common word in a MySQL table using Python - python

I have a table full of movie genres, like this:
id | genre
---+----------------------------
1 | Drama, Romance, War
2 | Drama, Musical, Romance
3 | Adventure, Biography, Drama
I'm looking for a way to get the most common word in the whole genre column and store it in a variable for further steps in Python.
I'm new to Python so I really don't know how to do it. Currently, I have these lines to connect to the database but don't know the way to get the most common word mentioned above.
conn = mysql.connect()
cursor = conn.cursor()
most_common_word = cursor.execute()
cursor.close()
conn.close()

First you need to get a list of the words in each row, i.e. create another table like
genre_words(genre_id bigint, word varchar(50))
For clues on how to do that, you may check this question:
SQL split values to multiple rows
You can do that as a temporary table if you wish, or use a transaction and roll back. Which one to choose depends on your data size and on the machine the DB is running on.
After that, the query will be really simple:
select count(*) as c, word from genre_words group by word order by count(*) desc limit 1;
You can also do it in Python, but then it is not a MySQL question at all. You need to read the table and build a simple mapping of word to counter: if a word is new, add it; if it already exists, increase its counter.
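The normalized-table approach described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL so that it runs without a server, the source table name movies is hypothetical, and the splitting into genre_words is done in Python rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table holding the comma-separated genre strings.
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, genre TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        (1, "Drama, Romance, War"),
        (2, "Drama, Musical, Romance"),
        (3, "Adventure, Biography, Drama"),
    ],
)

# One row per (movie, word), as in the genre_words table suggested above.
conn.execute("CREATE TABLE genre_words (genre_id INTEGER, word TEXT)")
for genre_id, genre in conn.execute("SELECT id, genre FROM movies").fetchall():
    conn.executemany(
        "INSERT INTO genre_words VALUES (?, ?)",
        [(genre_id, word.strip()) for word in genre.split(",")],
    )

# The grouping query from the answer, run against the split table.
count, word = conn.execute(
    "SELECT count(*) AS c, word FROM genre_words "
    "GROUP BY word ORDER BY c DESC LIMIT 1"
).fetchone()
print(word, count)
conn.close()
```

With a MySQL connector the query itself stays the same; only the connection setup and the placeholder style (%s instead of ?) would differ.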

from collections import Counter
# Connect to database and get rows from table
rows = ...
# Create a list to hold all of the genres
genres = []
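# Loop through each row and split the genre string by the comma character
# to create a list of individual genres; strip the space that follows each
# comma so that ' Drama' and 'Drama' are counted as the same genre
for row in rows:
    genre_list = [genre.strip() for genre in row['genre'].split(',')]
    genres.extend(genre_list)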
# Use a Counter to count the number of occurrences of each genre
genre_counts = Counter(genres)
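# Get the most common genre; most_common(1) returns a list holding one
# (genre, count) tuple, so unpack it to get the word itself
most_common_genre, count = genre_counts.most_common(1)[0]
# Print the most common genre
print(most_common_genre)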

Related

SQL query of Concatenating Client last names

I'm trying to create an SQL query that takes records from a File table and a Customer table. A file can have multiple customers. I want to show only one record per File.id and concatenate the clients' last names in alphabetical order if the names are different, or show only one name if they are the same.
Below is a picture of the relationship.
[image: Table Relationship]
The results from my query currently look like this.
[image not included]
I would like the query results to look like this.
File ID | Name
--------+----------
1 | Dick Dipe
2 | Bill
3 | Lola
Originally I had tried doing a subquery but I had issues that there were multiple results and it couldn't list more than one. If I could do a loop and add to an array, I feel like that would work.
If I were to do it in Python, I would write this but when I try to translate that into SQL, I get errors that either the subquery can only display one result or the second name under file two gets cut off.
clients = ['Dick','Dipe','Bill','Lola', 'Lola']
files = [1,2,3]
fileDetails = [[1,0],[1,1],[2,2],[3,3],[3,4]]
file_clients = {}
for file_id, client_index in fileDetails:
    if file_id not in file_clients:
        file_clients[file_id] = []
    client_name = clients[client_index]
    file_clients[file_id].append(client_name)
for file_id, client_names in file_clients.items():
    client_names = list(dict.fromkeys(client_names))
    client_names_string = " ".join(client_names)
    print(f"File {file_id}: {client_names_string}")

How do I get only the value of a column identified by a unique userid in MySQL?

I'm currently writing an rpg game in python that uses a mysql database to store info on players. However, I've come across a problem.
Sample Code of How Database has been Set Up:
playerinfo Table
userID | money | xp |
1 | 200 | 20 |
2 | 100 | 10 |
I'm trying to select the amount of money with only the value. My select query right now is
SELECT money FROM playerinfo WHERE id = 1
The full function for selecting the info is
def get_money_stats(user_id):
    global monresult
    remove_characters = ["{", "}", "'"]
    try:
        with connection.cursor() as cursor:
            monsql = "SELECT money FROM players WHERE userid = %s"
            value = user_id
            cursor.execute(monsql, value)
            monresult = str(cursor.fetchone())
    except Exception as e:
        print(f"An error Occurred> {e}")
CURRENT OUTPUT:
{'money': 200}
DESIRED OUTPUT:
200
Basically, all I want to select is the INT/DATA from the player's row (identified by unique userid). How do I do that? The only solution I have is to replace the characters with something else but I don't really want to do that as it's incredibly inconvenient and messy. Is there another way to reformat/select the data?
It seems that fetching one row gives you a dictionary mapping the selected columns to their values, which seems like the correct behaviour to me. You should simply index the dictionary with the column that you want to retrieve:
monresult = cursor.fetchone()['money']
If you don't want to name the column again (which you should), you could get the values of the dictionary as a list and retrieve the first one:
monresult = list(cursor.fetchone().values())[0]
I do not recommend the last approach because it depends heavily on the current shape of the query and may have to change if the query is changed.
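One caveat with indexing the result directly: under the Python DB-API, fetchone() returns None when no row matches, so indexing it with ['money'] would raise a TypeError for an unknown userid. A minimal sketch of a guard, assuming a dictionary-style cursor as in the question; the helper name money_from_row is hypothetical, and plain dicts stand in for real query results so the snippet runs without a database:

```python
def money_from_row(row):
    """Return the 'money' value from a dict-style row, or None if no row matched."""
    if row is None:  # fetchone() found no matching player
        return None
    return row["money"]


# Plain dicts stand in for what a dictionary cursor's fetchone() would return.
found = money_from_row({"money": 200})
not_found = money_from_row(None)
print(found, not_found)
```

Inside get_money_stats this would be called as money_from_row(cursor.fetchone()), which also removes the need for the str() conversion and character stripping.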

Update the last entered value from a selection of values in a database with Python and MySQL

Okay, so I have a table with a student ID column, and the student ID is used as the identifier when editing a row. But if the same student borrows a book twice, every row for that student gets edited, which I don't want. I want only the most recently entered row for that student ID to be edited. Using a serial number (Sl.No) is not a solution here because it is too complicated in practice. I am using the Python connector. Please help :) Thanks in advance.
Code I use right now:
con = mysql.connect(host='localhost', user='root',
                    password='monkey123', database='BOOK')
c = con.cursor()
c.execute(
    f"UPDATE library set `status`='Returned',`date returned`='{str(cal.selection_get())}' WHERE `STUDENT ID`='{e_sch.get()}';")
c.execute('commit')
con.close()
messagebox.showinfo(
    'Success', 'Book has been returned successfully')
If I followed you correctly, you want to update just one record that matches the WHERE condition. For this to be done reliably, you need a column that defines the ordering of the records. It could be a date, an incrementing id, or something similar. I assume that such a column exists in your table and is called ordering_column.
A simple option is to use ORDER BY and LIMIT in the UPDATE statement, like so:
sql = """
    UPDATE library
    SET `status` = 'Returned', `date returned` = %s
    WHERE `STUDENT ID` = %s
    ORDER BY ordering_column DESC
    LIMIT 1
"""
c = con.cursor()
c.execute(sql, (str(cal.selection_get()), e_sch.get()))
Note that I modified your code so that input values are passed as parameters rather than concatenated into the query string. This is an important change that makes your code safer and more efficient.
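The idea of touching only the newest matching row, with parameters instead of string concatenation, can be tried out without a MySQL server. The sketch below is an assumption-laden stand-in, not the answer's exact statement: it uses Python's built-in sqlite3 module, a simplified library table with a hypothetical auto-incrementing id as the ordering column, and a subquery to pick the newest row, because UPDATE with ORDER BY and LIMIT is only available in SQLite builds compiled with an optional flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: id plays the role of ordering_column.
conn.execute(
    "CREATE TABLE library ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, student_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO library (student_id, status) VALUES (?, ?)",
    [("S1", "Lent"), ("S2", "Lent"), ("S1", "Lent")],
)

# Update only the most recently inserted row for the given student.
# The student id is passed as a parameter, never formatted into the SQL.
conn.execute(
    """
    UPDATE library
    SET status = 'Returned'
    WHERE id = (
        SELECT id FROM library
        WHERE student_id = ?
        ORDER BY id DESC
        LIMIT 1
    )
    """,
    ("S1",),
)
conn.commit()

rows = conn.execute(
    "SELECT id, student_id, status FROM library ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

Only the second loan of student S1 changes; the earlier loan and the other student's row keep their status.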

Comparing multiple tables in Sqlite 3 using python

I am quite new to SQLite3 as well as Python; I'm a complete beginner in SQLite and am learning as I go for my project. I am working on a project where I have one database with about 20 tables in it. One table is for user input and the other tables are pre-loaded with values. How can I compare the values in a pre-loaded table with those in the user table and find which ones match? For example:
Users Table:
Barcode: Item:
1234 milk
4321 cheese
5678 butter
8765 water
9876 sugar
Pre-Loaded Table:
Barcode: Availability:
1234 1
5678 1
9876 1
1111 1
Now, I want to be able to compare each row in the Pre-Loaded Table to each row in the Users Table. They both have the Barcode column in common to be able to compare. As a result, during the query process, it should check each row:
1234 - milk - 1 (those columns are equal )
5678 - butter - 1 ( those columns are equal)
9876 - sugar - 1 (those columns are equal)
1111 - - 1 (this barcode does not exist in the Users Table)
So when a Barcode, in this case 1111, doesn't exist in the Users Table, the code should print: You don't have all the items for the Pre-Loaded Table. How can I get the code to do this?
So far I have this (this code does work, by the way):
import sqlite3 as sq
connect = sq.connect('Food_Data.db')
con = connect.cursor()
sql = ("SELECT Users_Food.Barcode, Users_Food.Item, Recipe1.Ham_Swiss_Omelet FROM Users_Food INNER JOIN Recipe1 ON Users_Food.Barcode = Recipe1.Barcode WHERE Recipe1.Ham_Swiss_Omelet = '1'")
con.execute(sql)
data = con.fetchall()
print("You can make: Ham Swiss Omelet")
formatted_row = '{:<10} {:<9} {:>9} '
print(formatted_row.format("Barcode", "Ingredients", "Availability"))
for row in data:
    print(formatted_row.format(*row))
    #print (row[:])
#connect.commit()
It prints:
You can make: Ham Swiss Omelet
Barcode Ingredients Availability
9130849874 butter 1
2870896881 eggs 1
5501066727 water 1
1765023029 salt 1
9118188735 pepper 1
4087256674 ham 1
3009527296 cheese 1
The SQLite code:
sql = ("SELECT Users_Food.Barcode, Users_Food.Item, Recipe1.Ham_Swiss_Omelet FROM Users_Food INNER JOIN Recipe1 ON Users_Food.Barcode = Recipe1.Barcode WHERE Recipe1.Ham_Swiss_Omelet = '1'")
It joins the two tables on the Barcode they have in common and returns the corresponding food names and availability. However, if one of the barcode values is not present in the Pre-Loaded table, how can I detect that it is missing when I compare, while still displaying what the two tables have in common? It is like checking to see if the tables are identical.
Perhaps try your luck with LEFT JOIN and a CASE statement.
From sqlite doc
If the join-operator is a "LEFT JOIN" or "LEFT OUTER JOIN", then after
the ON or USING filtering clauses have been applied, an extra row is
added to the output for each row in the original left-hand input
dataset that corresponds to no rows at all in the composite dataset
(if any).
You need the Recipe1 table to be the left-hand table, because you need to select every row in that table. All columns from Users_Food will be null in the extra row. The sample query adds another column "status", which you can use in the python. With a little rearranging:
SELECT Users_Food.Barcode, Users_Food.Item, Recipe1.Ham_Swiss_Omelet,
    CASE WHEN Users_Food.Barcode IS NULL THEN 'You cannot make this recipe' ELSE ' ' END AS status
FROM Recipe1
LEFT JOIN Users_Food ON Users_Food.Barcode = Recipe1.Barcode
WHERE Recipe1.Ham_Swiss_Omelet = '1'
In python you might not want to print("You can make: Ham Swiss Omelet") since you won't know whether that is true until you fetch all the returned rows.
After you get the SQL to return the rows that you want, you can play around with the python to get the desired output.
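As a rough sketch of how the Python side could use the LEFT JOIN result, the snippet below builds the two sample tables from the question in an in-memory database and collects the recipe barcodes that have no match in Users_Food. The table contents are the question's small example, the variable name missing is hypothetical, and Recipe1.Barcode is selected instead of Users_Food.Barcode so that the missing barcode itself is visible (the Users_Food columns are NULL on unmatched rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users_Food (Barcode TEXT, Item TEXT)")
cur.execute("CREATE TABLE Recipe1 (Barcode TEXT, Ham_Swiss_Omelet TEXT)")
cur.executemany(
    "INSERT INTO Users_Food VALUES (?, ?)",
    [("1234", "milk"), ("4321", "cheese"), ("5678", "butter"),
     ("8765", "water"), ("9876", "sugar")],
)
cur.executemany(
    "INSERT INTO Recipe1 VALUES (?, ?)",
    [("1234", "1"), ("5678", "1"), ("9876", "1"), ("1111", "1")],
)

# Recipe1 is the left-hand table, so every recipe row is returned;
# Item is NULL (None in Python) when the user does not have that barcode.
cur.execute(
    """
    SELECT Recipe1.Barcode, Users_Food.Item, Recipe1.Ham_Swiss_Omelet
    FROM Recipe1
    LEFT JOIN Users_Food ON Users_Food.Barcode = Recipe1.Barcode
    WHERE Recipe1.Ham_Swiss_Omelet = '1'
    ORDER BY Recipe1.Barcode
    """
)
rows = cur.fetchall()
missing = [barcode for barcode, item, _ in rows if item is None]

# Decide what to print only after all rows have been fetched.
if missing:
    print("You don't have all the items. Missing barcodes:", missing)
else:
    print("You can make: Ham Swiss Omelet")
for barcode, item, availability in rows:
    print(barcode, item or "-", availability)
conn.close()
```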

Storing a List into Python Sqlite3

I am trying to scrape form field IDs using Beautiful Soup like this
for link in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
    if link.has_key('id'):
        print link['id']
Let us assume that it returns something like
username
email
password
passwordagain
terms
button_register
I would like to write this into Sqlite3 DB.
What I will be doing down the line in my application is to use these form field IDs and maybe try to do a POST. The problem is that there are plenty of sites like this whose form field IDs I have scraped. So the relation is like this...
Domain1 - First list of Form Fields for this Domain1
Domain2 - Second list of Form Fields for this Domain2
.. and so on
What I am unsure here is... How should I design my column for this kind of purpose? Will it be OK if I just create a table with two columns - say
COL 1 - Domain URL (as TEXT)
COL 2 - List of Form Field IDs (as TEXT)
One thing to remember is that down the line in my application I will need to do something like this...
Pseudocode
If Domain is "http://somedomain.com":
    For every item in the COL2 (which is a list of form field ids):
        Assign some set of values to each of the form fields & then make a POST request
Can anyone guide me, please?
EDITed on 22/07/2011 - Is My Below Database Design Correct?
I have decided to have a solution like this. What do you guys think?
I will be having three tables like below
Table 1
Key Column (Auto Generated Integer) - Primary Key
Domain as TEXT
Sample Data would be something like:
1 http://url1.com
2 http://url2.com
3 http://url3.com
Table 2
Domain (Here I will be using the Key Number from Table 1)
RegLink - This will have the registration link (as TEXT)
Form Fields (as Text)
Sample Data would be something like:
1 http://url1.com/register field1
1 http://url1.com/register field2
1 http://url1.com/register field3
2 http://url2.com/register field1
2 http://url2.com/register field2
2 http://url2.com/register field3
3 http://url3.com/register field1
3 http://url3.com/register field2
3 http://url3.com/register field3
Table 3
Domain (Here I will be using the Key Number from Table 1)
Status (as TEXT)
User (as TEXT)
Pass (as TEXT)
Sample Data would be something like:
1 Pass user1 pass1
2 Fail user2 pass2
3 Pass user3 pass3
Do you think this table design is good? Or are there any improvements that can be made?
There is a normalization problem in your table.
Using 2 tables with
TABLE domains
int id primary key
text name
TABLE field_ids
int id primary key
int domain_id foreign key ref domains
text value
is a better solution.
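The two-table layout above might look like this with Python's sqlite3 module. This is a sketch under assumptions rather than a definitive schema: it is written for Python 3, the helper names save_fields and load_fields are hypothetical, and the UNIQUE and NOT NULL constraints are additions that the outline above does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)
conn.execute(
    "CREATE TABLE field_ids ("
    "id INTEGER PRIMARY KEY, "
    "domain_id INTEGER NOT NULL REFERENCES domains(id), "
    "value TEXT NOT NULL)"
)


def save_fields(conn, domain, fields):
    """Insert one domain row and one field_ids row per scraped form field id."""
    cur = conn.execute("INSERT INTO domains (name) VALUES (?)", (domain,))
    domain_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO field_ids (domain_id, value) VALUES (?, ?)",
        [(domain_id, field) for field in fields],
    )
    conn.commit()


def load_fields(conn, domain):
    """Return the form field ids stored for a domain, in insertion order."""
    rows = conn.execute(
        "SELECT f.value FROM field_ids AS f "
        "JOIN domains AS d ON d.id = f.domain_id "
        "WHERE d.name = ? ORDER BY f.id",
        (domain,),
    )
    return [value for (value,) in rows]


save_fields(conn, "http://somedomain.com", ["username", "email", "password"])
fields = load_fields(conn, "http://somedomain.com")
print(fields)
```

The list returned by load_fields is what the pseudocode in the question would loop over before making the POST request.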
Proper database design would suggest you have a table of URLs, and a table of fields, each referenced to a URL record. But depending on what you want to do with them, you could pack lists into a single column. See the docs for how to go about that.
Is sqlite a requirement? It might not be the best way to store the data. E.g. if you need random-access lookups by URL, the shelve module might be a better bet. If you just need to record them and iterate over the sites, it might be simpler to store as CSV.
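For comparison, a minimal sketch of the shelve alternative mentioned above, assuming Python 3. The file name form_fields is hypothetical, and a temporary directory is used only so the example cleans up after itself; a real application would keep the shelf file in a fixed location.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "form_fields")

    # Store the list of form field ids under the domain URL as the key.
    with shelve.open(path) as db:
        db["http://somedomain.com"] = ["username", "email", "password"]

    # Reopen the shelf later and look the list up directly by URL.
    with shelve.open(path) as db:
        fields = db["http://somedomain.com"]

print(fields)
```

A shelf behaves like a persistent dictionary, so a lookup by domain needs no query, but it offers none of SQL's filtering or joins.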
Try this to get the ids:
ids = (link['id'] for link in
BeautifulSoup(content, parseOnlyThese=SoupStrainer('input'))
if link.has_key('id'))
And this should show you how to save them, load them, and do something to each. This uses a single table and just inserts one row for each field for each domain. It's the simplest solution, and perfectly adequate for a relatively small number of rows of data.
from itertools import izip, repeat
import sqlite3
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table domains
(domain text, linkid text)''')
domain_to_insert = 'domain_name'
ids = ['id1', 'id2']
c.executemany("""insert into domains
values (?, ?)""", izip(repeat(domain_to_insert), ids))
conn.commit()
domain_to_select = 'domain_name'
c.execute("""select * from domains where domain=?""", (domain_to_select,))
# this is just an example
def some_function_of_row(row):
    return row[1] + ' value'
fields = dict((row[1], some_function_of_row(row)) for row in c)
print fields
c.close()
