Storing a List into Python Sqlite3 - python

I am trying to scrape form field IDs using Beautiful Soup like this:

for link in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
    if link.has_key('id'):
        print link['id']
Let us assume that it returns something like:
username
email
password
passwordagain
terms
button_register
I would like to write this into an SQLite3 DB.
What I will be doing down the line in my application is use these form fields' IDs and perhaps try to do a POST. The problem is that there are plenty of sites like this whose form field IDs I have scraped. So the relation is like this:
Domain1 - First list of Form Fields for this Domain1
Domain2 - Second list of Form Fields for this Domain2
.. and so on
What I am unsure about here is: how should I design my columns for this kind of purpose? Will it be OK if I just create a table with two columns, say
COL 1 - Domain URL (as TEXT)
COL 2 - List of Form Field IDs (as TEXT)
One thing to remember is that down the line in my application I will need to do something like this:

Pseudocode:

If Domain is "http://somedomain.com":
    For every item in COL 2 (which is a list of form field IDs):
        Assign some set of values to each of the form fields & then make a POST request

Can anyone guide me, please?
EDITED on 22/07/2011 - Is my database design below correct?
I have decided to have a solution like this. What do you guys think?
I will be having three tables like below
Table 1
Key Column (Auto Generated Integer) - Primary Key
Domain as TEXT
Sample Data would be something like:
1 http://url1.com
2 http://url2.com
3 http://url3.com
Table 2
Domain (Here I will be using the Key Number from Table 1)
RegLink - This will have the registration link (as TEXT)
Form Fields (as Text)
Sample Data would be something like:
1 http://url1.com/register field1
1 http://url1.com/register field2
1 http://url1.com/register field3
2 http://url2.com/register field1
2 http://url2.com/register field2
2 http://url2.com/register field3
3 http://url3.com/register field1
3 http://url3.com/register field2
3 http://url3.com/register field3
Table 3
Domain (Here I will be using the Key Number from Table 1)
Status (as TEXT)
User (as TEXT)
Pass (as TEXT)
Sample Data would be something like:
1 Pass user1 pass1
2 Fail user2 pass2
3 Pass user3 pass3
Do you think this table design is good? Or are there any improvements that can be made?

There is a normalization problem in your table.
Using 2 tables with

TABLE domains
    int id primary key
    text name

TABLE field_ids
    int id primary key
    int domain_id foreign key ref domains
    text value

is a better solution.
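For illustration, here is a minimal sqlite3 sketch of that two-table layout; the table and column names follow the answer, while the database file, domain, and field values are made-up examples:

import sqlite3

conn = sqlite3.connect('fields.db')  # hypothetical database file
c = conn.cursor()

# Two normalized tables: one row per domain, one row per (domain, field id) pair.
c.execute("""create table if not exists domains
             (id integer primary key autoincrement, name text unique)""")
c.execute("""create table if not exists field_ids
             (id integer primary key autoincrement,
              domain_id integer references domains(id),
              value text)""")

domain = 'http://somedomain.com'
scraped_ids = ['username', 'email', 'password']

c.execute("insert into domains (name) values (?)", (domain,))
domain_id = c.lastrowid
c.executemany("insert into field_ids (domain_id, value) values (?, ?)",
              [(domain_id, field) for field in scraped_ids])
conn.commit()

# Later: fetch the field ids for one domain before building the POST request.
c.execute("""select f.value from field_ids f
             join domains d on d.id = f.domain_id
             where d.name = ?""", (domain,))
field_values = [row[0] for row in c.fetchall()]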

Proper database design would suggest you have a table of URLs, and a table of fields, each referenced to a URL record. But depending on what you want to do with them, you could pack lists into a single column. See the docs for how to go about that.
Is sqlite a requirement? It might not be the best way to store the data. E.g. if you need random-access lookups by URL, the shelve module might be a better bet. If you just need to record them and iterate over the sites, it might be simpler to store as CSV.
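As a rough sketch of the shelve idea (the file name and the sample data below are made up; shelve simply pickles the whole list of field IDs under the URL key):

import shelve

# Persist each domain's list of field IDs under the URL as the key.
db = shelve.open('form_fields.shelve')  # hypothetical file name
db['http://somedomain.com'] = ['username', 'email', 'password']
db.close()

# Later: random-access lookup by URL.
db = shelve.open('form_fields.shelve')
fields = db.get('http://somedomain.com', [])
# fields is now ['username', 'email', 'password']; iterate over it to
# assign values and build the POST payload.
db.close()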

Try this to get the ids:

ids = (link['id'] for link in
       BeautifulSoup(content, parseOnlyThese=SoupStrainer('input'))
       if link.has_key('id'))
And this should show you how to save them, load them, and do something to each. This uses a single table and just inserts one row for each field for each domain. It's the simplest solution, and perfectly adequate for a relatively small number of rows of data.
from itertools import izip, repeat
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()

c.execute('''create table domains
             (domain text, linkid text)''')

domain_to_insert = 'domain_name'
ids = ['id1', 'id2']
c.executemany("""insert into domains
                 values (?, ?)""", izip(repeat(domain_to_insert), ids))
conn.commit()

domain_to_select = 'domain_name'
c.execute("""select * from domains where domain=?""", (domain_to_select,))

# this is just an example
def some_function_of_row(row):
    return row[1] + ' value'

fields = dict((row[1], some_function_of_row(row)) for row in c)
print fields

c.close()

Related

SQL query of Concatenating Client last names

I'm trying to create an SQL query that takes records from a File table and a Customer table. A file can have multiple customers. I want to show only one record per File.id and concatenate the last names in alphabetical order of the clients if the names are different, or only show one if they are the same.
Below is a picture of the relationship.

[Image: table relationship diagram]

The results from my query currently look like this:

[Image: current query results]
I would like the query to look like this:

File ID | Name
--------+----------
1       | Dick Dipe
2       | Bill
3       | Lola
Originally I tried doing a subquery, but I had the issue that there were multiple results and it couldn't list more than one. If I could do a loop and add to an array, I feel like that would work.
If I were to do it in Python, I would write the following, but when I try to translate that into SQL, I get errors that either the subquery can only display one result or the second name under file two gets cut off.
clients = ['Dick', 'Dipe', 'Bill', 'Lola', 'Lola']
files = [1, 2, 3]
fileDetails = [[1, 0], [1, 1], [2, 2], [3, 3], [3, 4]]

file_clients = {}
for file_id, client_index in fileDetails:
    if file_id not in file_clients:
        file_clients[file_id] = []
    client_name = clients[client_index]
    file_clients[file_id].append(client_name)

for file_id, client_names in file_clients.items():
    client_names = list(dict.fromkeys(client_names))
    client_names_string = " ".join(client_names)
    print(f"File {file_id}: {client_names_string}")

Get the most common word in a MySQL table using Python

I have a table full of movie genres, like this:
id | genre
---+----------------------------
1 | Drama, Romance, War
2 | Drama, Musical, Romance
3 | Adventure, Biography, Drama
I'm looking for a way to get the most common word in the whole genre column and return it to a variable for further steps in Python.
I'm new to Python so I really don't know how to do it. Currently, I have these lines to connect to the database, but I don't know how to get the most common word mentioned above.
conn = mysql.connect()
cursor = conn.cursor()
most_common_word = cursor.execute()
cursor.close()
conn.close()
First you need to get the list of words in the genre column, i.e. create another table like

genre_words(genre_id bigint, word varchar(50))

For clues on how to do that you may check this question:
SQL split values to multiple rows
You can do that as a temporary table if you wish, or use a transaction and roll back. Which one to choose depends on your data size and the machine the DB is running on.
After that the query is really simple:

select count(*) as c, word from genre_words group by word order by count(*) desc limit 1;

You can also do it in Python, but then it is not really a MySQL question at all: read the table and build a simple word-to-counter mapping, adding new words and incrementing the counter for existing ones.
from collections import Counter

# Connect to database and get rows from table
rows = ...

# Create a list to hold all of the genres
genres = []

# Loop through each row and split the genre string by the comma character
# to create a list of individual genres (strip spaces so ' Romance' and
# 'Romance' are counted as the same genre)
for row in rows:
    genre_list = [genre.strip() for genre in row['genre'].split(',')]
    genres.extend(genre_list)

# Use a Counter to count the number of occurrences of each genre
genre_counts = Counter(genres)

# Get the most common genre (a one-element list of (genre, count) tuples)
most_common_genre = genre_counts.most_common(1)

# Print the most common genre
print(most_common_genre)
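If you go the SQL route instead, here is a rough sketch of returning the result to a Python variable, assuming the genre_words table has already been populated by splitting the genre column (the connection object is obtained exactly as in the question's snippet):

# conn/cursor set up as in the question
conn = mysql.connect()
cursor = conn.cursor()
cursor.execute(
    "select word, count(*) as c from genre_words "
    "group by word order by c desc limit 1"
)
most_common_word, occurrences = cursor.fetchone()
cursor.close()
conn.close()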

Search SQL request with two tables on PostgreSQL. SQLAlchemy. Python

I need help writing a query in SQL or SQLAlchemy.
The first table is named Rows:

sid       | unit_sid
----------+-----------
ROW_UUID1 | UNIT_UUID1
ROW_UUID2 | UNIT_UUID1
ROW_UUID3 | UNIT_UUID

The second table is named Records:

row_sid (== sid from Rows) | item_sid   | content (str)
---------------------------+------------+---------------
ROW_UUID1                  | ITEM_UUID1 | Description 1
ROW_UUID1                  | ITEM_UUID2 | Description 1
ROW_UUID2                  | ITEM_UUID1 | Description 3
ROW_UUID2                  | ITEM_UUID2 | Description 2
ROW_UUID3                  | ITEM_UUID1 | Description 5
ROW_UUID3                  | ITEM_UUID2 | Description 1
I need an example of an SQL query where I can search for several content values for different item_sids.
For example, I need all ROWS where

item_sid == ITEM_UUID1 and content == Description 1
item_sid == ITEM_UUID2 and content == Description 1

A query like the one below will not work for me, because I need to search on two item_sids at the same time in order to receive unique ROWS:
select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
       content like '%TEXT%')
  and (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
       content like '%TEXT2%')
group by row_sid
Solved it like this:

select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
       content like '%TEXT%')
   or (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
       content like '%TEXT2%')
group by row_sid having count(row_sid) = 2
But maybe there is a more elegant solution? I want to query a varying number of item_sids (2-5) at the same time.
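One way to generalize this to 2-5 pairs is to build the OR conditions dynamically and compare the HAVING count against the number of pairs. A rough SQLAlchemy sketch (1.4+/2.0 style), assuming a reflected record Table with row_sid, item_sid, and content columns:

from sqlalchemy import and_, func, or_, select

def rows_matching(record, pairs):
    """pairs is a list of (item_sid, content_pattern) tuples."""
    conditions = [
        and_(record.c.item_sid == item_sid,
             record.c.content.like(pattern))
        for item_sid, pattern in pairs
    ]
    return (
        select(record.c.row_sid)
        .where(or_(*conditions))
        .group_by(record.c.row_sid)
        # a row qualifies only if it matched every requested pair
        .having(func.count(record.c.row_sid) == len(pairs))
    )

# Usage (hypothetical values):
# query = rows_matching(record, [
#     ('877aeeb4-c68e-4942-b259-288e7aa3c04b', '%TEXT%'),
#     ('cc22f239-db6c-4041-92c6-8705cb621525', '%TEXT2%'),
# ])
# result = session.execute(query).scalars().all()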

Update the last entered value from a selection of values in a database with Python, MySQL

Okay, so I have a table that has a student ID column, and the student ID is used as the identifier when editing a row. But what if the same student lends a book twice? Then all of that student's rows will be edited, which I don't want. I want only the last entered row for that student ID to be edited, and using a Sl.No is not a solution here because it's practically complicated. I am using the Python connector. Please help :) Thanks in advance.
The code I use right now:

con = mysql.connect(host='localhost', user='root',
                    password='monkey123', database='BOOK')
c = con.cursor()
c.execute(
    f"UPDATE library set `status`='Returned',`date returned`='{str(cal.selection_get())}' WHERE `STUDENT ID`='{e_sch.get()}';")
c.execute('commit')
con.close()
messagebox.showinfo(
    'Success', 'Book has been returned successfully')
If I followed you correctly, you want to update just one record that matches the WHERE condition. For this to be done reliably, you need a column that defines the ordering of the records. It could be a date, an incrementing id, or something else. I assume that such a column exists in your table and is called ordering_column.
A simple option is to use ORDER BY and LIMIT in the UPDATE statement, like so:
sql = """
UPDATE library
SET status = 'Returned', date returned = %s
WHERE student_id = %s
ORDER BY ordering_column DESC
LIMIT 1
"""
c = con.cursor()
c.execute(sql, (str(cal.selection_get()), e_sch.get(), )
Note that I modified your code so input values are given as parameters rather than concatenated into the query string. This is an important change that makes your code safer and more efficient.

Postgresql: Insert from huge csv file, collect the ids and respect unique constraints

In a postgresql database:
class Persons(models.Model):
    person_name = models.CharField(max_length=10, unique=True)
The persons.csv file, contains 1 million names.
$cat persons.csv
Name-1
Name-2
...
Name-1000000
I want to:
Create the names that do not already exist
Query the database and fetch the id for each name contained in the csv file.
My approach:
Use the COPY command or the django-postgres-copy application that implements it.
Also take advantage of the new Postgresql-9.5+ upsert feature.
Now, all the names in the csv file are also in the database.
I need to get their ids from the database, either in memory or in another csv file, in an efficient way:
Use Q objects:

list_of_million_q = <iterate csv and append Qs>
million_names = Names.objects.filter(list_of_million_q)

or
Use __in to filter based on a list of names:

list_of_million_names = <iterate csv and append strings>
million_names = Names.objects.filter(
    person_name__in=list_of_million_names
)
or
?
I do not feel that any of the above approaches for fetching the ids is efficient.
Update
There is a third option, along the lines of this post, which should be a great solution combining all of the above.
Something like:
SELECT * FROM persons;
Make a name: id dictionary out of the names received from the database:
db_dict = {'Harry': 1, 'Bob': 2, ...}
Query the dictionary:
ids = []
for name in list_of_million_names:
    if name in db_dict:
        ids.append(db_dict[name])
This way you're using the quick dictionary indexing as opposed to the slower if x in list approach.
But the only way to really know for sure is to benchmark these 3 approaches.
This post describes how to use RETURNING with ON CONFLICT so that, while inserting the contents of the csv file into the database, the ids are saved in another table, both when the insertion was successful and when, due to the unique constraint, the insertion was omitted.
I have tested it in SQLFiddle, where I used a setup that resembles the one used for the COPY command, which inserts into the database straight from a csv file while respecting the unique constraint.
The schema:
CREATE TABLE IF NOT EXISTS label (
    id serial PRIMARY KEY,
    label_name varchar(200) NOT NULL UNIQUE
);

INSERT INTO label (label_name) VALUES
    ('Name-1'),
    ('Name-2');

CREATE TABLE IF NOT EXISTS ids (
    id serial PRIMARY KEY,
    label_ids varchar(12) NOT NULL
);
The script:
CREATE TEMP TABLE tmp_table
    (LIKE label INCLUDING DEFAULTS)
    ON COMMIT DROP;

INSERT INTO tmp_table (label_name) VALUES
    ('Name-2'),
    ('Name-3');

WITH ins AS (
    INSERT INTO label
    SELECT *
    FROM tmp_table
    ON CONFLICT (label_name) DO NOTHING
    RETURNING id
)
INSERT INTO ids (label_ids)
SELECT id FROM ins
UNION ALL
SELECT l.id
FROM tmp_table
JOIN label l USING (label_name);
The output:
SELECT * FROM ids;
SELECT * FROM label;
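Putting the pieces together from Python, here is a rough psycopg2 sketch of the same pattern: COPY the csv into the temp table, run the ON CONFLICT insert, and collect the (name, id) pairs into a dictionary instead of a second table. The connection parameters and file name are placeholders:

import psycopg2

conn = psycopg2.connect(dbname='mydb', user='me')  # hypothetical connection
cur = conn.cursor()

# Stage the csv (one name per line) into the temp table with COPY.
cur.execute("""
    CREATE TEMP TABLE tmp_table
        (LIKE label INCLUDING DEFAULTS)
        ON COMMIT DROP;
""")
with open('persons.csv') as f:
    cur.copy_expert(
        "COPY tmp_table (label_name) FROM STDIN WITH (FORMAT csv)", f)

# Newly inserted ids come back from the CTE; ids of names that already
# existed come from the join (the join does not see the CTE's new rows,
# so nothing is returned twice).
cur.execute("""
    WITH ins AS (
        INSERT INTO label (label_name)
        SELECT label_name FROM tmp_table
        ON CONFLICT (label_name) DO NOTHING
        RETURNING id, label_name
    )
    SELECT id, label_name FROM ins
    UNION ALL
    SELECT l.id, l.label_name
    FROM tmp_table
    JOIN label l USING (label_name);
""")
name_to_id = {name: label_id for label_id, name in cur.fetchall()}
conn.commit()
cur.close()
conn.close()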
