SQL/Python (Django) - Join each row to entire table - python

I'm currently creating an application which maps people's skills against various technologies.
I have 3 tables:
Employees: Name, Department
Skill: Skill name
Results: Name (FK), Skill (FK), Skill level
I wish to see every single employee with each skill listed in a table. I believe the correct procedure to retrieve this information would be some sort of for loop that selects the info from the 3 tables? The alternative is adding rows to the Results table each time an employee or skill is added (although this doesn't seem like correct logic to me).

I think this is correct logic, since you have to keep the skill level for each employee.
Let's say you have created three models: Employee, Skill and Result.
To get the skills of the employee with id = 37:
emp = Employee.objects.get(pk=37)
# Build a list of (skill, level) tuples for this employee.
# Assuming Result.skill is a ForeignKey, x.skill is already a Skill instance.
skill_level_array = [(x.skill, x.level) for x in Result.objects.filter(employee=emp)]
To get the skills for all employees:
all_emp = Employee.objects.all()
grand_array = {}
for emp in all_emp:
    skill_level_array = [(x.skill, x.level) for x in Result.objects.filter(employee=emp)]
    grand_array[emp] = skill_level_array
Now grand_array is a dictionary with each employee as key and a list of (skill, level) tuples as value.
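The question asks for every employee against every skill, including pairs that have no row in Results yet. A minimal sketch of that grid in plain Python follows; the sample names, skills, levels and the skill_grid helper are hypothetical stand-ins for the three tables, not part of the original models.

```python
from itertools import product

# Hypothetical sample data standing in for the three tables.
employees = ["Ann", "Bob"]
skills = ["Python", "SQL"]
# (employee, skill) -> level; only pairs that were actually rated are stored.
results = {("Ann", "Python"): 3, ("Ann", "SQL"): 1, ("Bob", "SQL"): 2}


def skill_grid(employees, skills, results, default=None):
    """Return one row per employee holding a level (or default) for every skill."""
    grid = {emp: {} for emp in employees}
    for emp, skill in product(employees, skills):
        grid[emp][skill] = results.get((emp, skill), default)
    return grid


grid = skill_grid(employees, skills, results)
for emp, row in grid.items():
    print(emp, row)
```

With this shape, unrated pairs show up with the default value, so rows do not have to be added to Results every time an employee or skill is created.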

Related

SQL query of Concatenating Client last names

I'm trying to create an SQL query that takes records from a File table and a Customer table. A file can have multiple customers. I want to show only one record per File.id and concatenate the clients' last names in alphabetical order if the names are different, or show only one name if they are the same.
Below is a picture of the relationship.
[Image: table relationship]
The results from my query currently look like this.
[Image: current query results]
I would like the query results to look like this.
File ID  Name
1        Dick Dipe
2        Bill
3        Lola
Originally I had tried doing a subquery but I had issues that there were multiple results and it couldn't list more than one. If I could do a loop and add to an array, I feel like that would work.
If I were to do it in Python, I would write this but when I try to translate that into SQL, I get errors that either the subquery can only display one result or the second name under file two gets cut off.
clients = ['Dick','Dipe','Bill','Lola', 'Lola']
files = [1,2,3]
fileDetails = [[1,0],[1,1],[2,2],[3,3],[3,4]]
file_clients = {}
for file_id, client_index in fileDetails:
    if file_id not in file_clients:
        file_clients[file_id] = []
    client_name = clients[client_index]
    file_clients[file_id].append(client_name)
for file_id, client_names in file_clients.items():
    client_names = list(dict.fromkeys(client_names))
    client_names_string = " ".join(client_names)
    print(f"File {file_id}: {client_names_string}")
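One way to sidestep the "subquery returns more than one row" problem is to let SQL only deduplicate and sort, and do the concatenation in Python. The sketch below uses an in-memory SQLite database as a stand-in; the file_client table and its columns are hypothetical, since the real schema is only shown in the missing picture.

```python
import sqlite3
from itertools import groupby

# Hypothetical, simplified schema: one row per (file, client last name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_client (file_id INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO file_client VALUES (?, ?)",
    [(1, "Dipe"), (1, "Dick"), (2, "Bill"), (3, "Lola"), (3, "Lola")],
)

# DISTINCT drops repeated names per file; ORDER BY sorts names alphabetically
# and keeps each file's rows adjacent, which groupby requires.
rows = conn.execute(
    "SELECT DISTINCT file_id, last_name FROM file_client "
    "ORDER BY file_id, last_name"
).fetchall()

names_by_file = {
    file_id: " ".join(name for _, name in group)
    for file_id, group in groupby(rows, key=lambda row: row[0])
}
for file_id, names in names_by_file.items():
    print(file_id, names)
```

Many databases also offer a string aggregation function (for example STRING_AGG or GROUP_CONCAT) that can do the joining inside the query; which one applies depends on the database in use.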

Postgres/Python subfield sanity check

I am struggling with a sanity check. I have a PostgreSQL database, which I browse in DBeaver, and a table named Album. I am trying to automate a sanity check. In the column ArtistId there is a small click-in window on the right; once I click on it I get a new window with ArtistId and Name. For every distinct ArtistId I wish to check whether the name corresponds to that ArtistId. So far I have managed to load the table (Album) into a data frame, but I cannot reach the sub-field. Thank you in advance for any help.
Problem 1: I need to access the sub-field (Database window picture).
Problem 2: I need to check for every row of the sub-field whether the artist name is 'Iron Maiden' or the corresponding artist name based on ArtistId that the user provided.
My code retrieves the entries in the database.
def get_entries(artistID):
    artistID = int(input("Enter an artist_id from the available in postgresql :"))
    df = pd.read_sql("SELECT * FROM Album WHERE ArtistId = %s",connection, params=(artistID,) )
    return df
For problem 1) I found the following code SELECT x.* FROM Artist x WHERE x.ArtistId = 4 which can give me access to the sub-field.
[Images: database, database window, database ER diagram]
The solution to Problem 1 was a simple query:
SELECT *
FROM Album a
INNER JOIN Artist b ON a.ArtistId = b.ArtistId
WHERE a.ArtistId = 90
The solution to Problem 2 was this line of Python:
# This line returns the rows in my dataframe whose ArtistId differs from the user input
df.loc[df.ArtistId != artistID]
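The two steps can also be combined so that the join and the name check happen in one query. This is only a sketch: it uses an in-memory SQLite database instead of PostgreSQL, the sample rows are made up for illustration, and mismatched_albums is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    INSERT INTO Artist VALUES (90, 'Iron Maiden');
    INSERT INTO Artist VALUES (4, 'Alanis Morissette');
    INSERT INTO Album VALUES (1, 'Powerslave', 90);
    INSERT INTO Album VALUES (2, 'Killers', 90);
    INSERT INTO Album VALUES (3, 'Jagged Little Pill', 4);
""")


def mismatched_albums(conn, artist_id, expected_name):
    """Return albums of artist_id whose joined artist name is not expected_name."""
    query = """
        SELECT a.AlbumId, a.Title, b.Name
        FROM Album a
        JOIN Artist b ON a.ArtistId = b.ArtistId
        WHERE a.ArtistId = ? AND b.Name <> ?
    """
    return conn.execute(query, (artist_id, expected_name)).fetchall()


# An empty list means every album of that artist carries the expected name.
print(mismatched_albums(conn, 90, "Iron Maiden"))
print(mismatched_albums(conn, 90, "Metallica"))
```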

Django: queryset "losing" a value

I have 2 models, one with a list of clients and the other with a list of sales.
My intention is to add sales rank value to the clients queryset.
all_clients = Contactos.objects.values("id", "Vendedor", "codigo", 'Nombre', "NombrePcia", "Localidad", "FechaUltVenta")
sales = Ventas.objects.all()
Once loaded I aggregate all the sales per client summing the subtotal values of their sales and then order the result by their total sales.
sales_client = sales.values('cliente').annotate(
    fact_total=Sum('subtotal'))
client_rank = sales_client.order_by('-fact_total')
Then I set the rank of those clients and store it under the "Rank" key in the same client_rank queryset.
a = 0
for rank in client_rank:
    a = a + 1
    rank['Rank'] = a
Everything fine up to now. When I print the results in the template I get the expected values in the "client_rank" queryset: "client name" + "total sales per client" + "Rank".
{'cliente': '684 DROGUERIA SUR', 'fact_total': Decimal('846470'), 'Rank': 1}
{'cliente': '699 KINE ESTETIC', 'fact_total': Decimal('418160'), 'Rank': 2}
etc....
The problem starts here
First we should take into account that not all the clients in the "all_clients" queryset have actual sales in the "sales" queryset. So I must find which ones do have sales, assign them the "Rank" value, and assign a standard value to the ones who don't.
for subject in all_clients:
    subject_code = str(subject["codigo"])
    try:
        selected_subject = ranking_clientes.get(cliente__icontains=subject_code)
        subject['rank'] = selected_subject['Rank']
    except:
        subject['rank'] = "Some value"
The try always fails because "selected_subject" doesn't seem to have the "Rank" value. If I print "selected_subject" I get the following:
{'cliente': '904 BAHIA BLANCA BASKET', 'fact_total': Decimal('33890')}
Any clues on why I'm losing the "Rank" value? The original "client_rank" queryset still has that value included.
Thanks!
I presume that ranking_clientes is the same as client_rank.
The problem is that .get always runs a new query against the database. This means that any modifications you made to the dictionaries returned by the original query are not applied to the result of the get call.
You would need to iterate through your query results to find the one you need:
selected_subject = next(client for client in ranking_clientes if subject_code in client['cliente'])
Note that this is pretty inefficient if you have a lot of clients. I would rethink your model structure. Alternatively, you could look into using a database function to return the rank directly as part of the original query.
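One way to avoid rescanning the ranking for every client is to build a lookup dictionary once. The sketch below uses plain lists of dicts as stand-ins for the evaluated querysets, and it assumes each "cliente" string starts with the client code (as in '684 DROGUERIA SUR'); that assumption replaces the substring match used in the question.

```python
from decimal import Decimal

# Hypothetical stand-ins for the evaluated querysets (lists of dicts),
# already ordered by fact_total descending.
client_rank = [
    {"cliente": "684 DROGUERIA SUR", "fact_total": Decimal("846470")},
    {"cliente": "699 KINE ESTETIC", "fact_total": Decimal("418160")},
]
all_clients = [{"codigo": 699}, {"codigo": 684}, {"codigo": 111}]

# Walk the ranking once and key each position by the leading client code.
rank_by_code = {}
for position, row in enumerate(client_rank, start=1):
    code = row["cliente"].split()[0]
    rank_by_code[code] = position

# Each client lookup is now a dictionary access instead of a scan or a query.
for subject in all_clients:
    subject["rank"] = rank_by_code.get(str(subject["codigo"]), "Some value")

print(all_clients)
```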

List/Dict structure issue

I'm confused about how to structure a list/dict I need. I have scraped three pieces of info off ESPN: conference, team, and link to the team homepage for future stat scraping.
When the program first runs, I'd like to build a dictionary/list so that one can type in a school and it would print the conference the school is in, OR one could select an entire conference and it would print the corresponding list of schools. The end user doesn't need to know the link associated with each school, but it is important that the correct link is associated with the correct school so that future stats from that specific school can be scraped.
For example, the info scraped is:
SEC, UGA, www.linka.com
ACC, FSU, www.linkb.com
etc...
I know I could create a list of dictionaries like:
sec_list=[{UGA: www.linka.com, Alabama: www.linkc.com, etc...}]
acc_list=[{FSU: www.linkb.com, etc...}]
The problem is I'd have to create about 26 lists here to hold every conference, which sounds excessive. Is there a way to lump everything into one list but still be able to extract the schools from a specific conference, or search for a school and have the correct conference returned? Of course, the link to the school must also correspond to the correct school.
Python ships with sqlite3 to handle database problems and it has an :memory: mode for in-memory databases. I think it will solve your problem directly and with clear code.
import sqlite3
from pprint import pprint
# Load the data from a list of tuples in the form [(conf, school, link), ...]
# (sample rows taken from the question; replace with the scraped data)
data = [('SEC', 'UGA', 'www.linka.com'), ('SEC', 'Alabama', 'www.linkc.com'), ('ACC', 'FSU', 'www.linkb.com')]
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE Espn (conf text, school text, link text)')
c.execute('CREATE INDEX Cndx ON Espn (conf)')
c.execute('CREATE INDEX Sndx ON Espn (school)')
c.executemany('INSERT INTO Espn VALUES (?, ?, ?)', data)
conn.commit()
# Run queries (values are passed as parameters rather than quoted inline)
pprint(c.execute('SELECT * FROM Espn WHERE conf = ?', ('SEC',)).fetchall())
pprint(c.execute('SELECT * FROM Espn WHERE school = ?', ('Alabama',)).fetchall())
In-memory databases are so easy to create and query that they are often the easiest solution when you need multiple lookup keys or want to run analytics on relational data. Trying to use dicts and lists for this kind of work just makes the problem unnecessarily complicated.
It's true you can do this with a list of dictionaries, but you might find it easier to be able to look up information with named fields. In that case, I'd recommend storing your scraped data in a Pandas DataFrame.
You want it so that "one can type in a school and it would print the conference the school is in OR one could select an entire conference and it would print the corresponding list of schools".
Here's an example of what that would look like, using Pandas and a couple of convenience functions.
First, some example data:
confs = ['ACC','Big10','BigEast','BigSouth','SEC',
         'ACC','Big10','BigEast','BigSouth','SEC']
teams = ['school{}'.format(x) for x in range(10)]
links = ['www.{}.com'.format(x) for x in range(10)]
# list() so the tuples can be printed and reused; a bare zip is a one-shot iterator in Python 3
scrape = list(zip(confs,teams,links))
[('ACC', 'school0', 'www.0.com'),
('Big10', 'school1', 'www.1.com'),
('BigEast', 'school2', 'www.2.com'),
('BigSouth', 'school3', 'www.3.com'),
('SEC', 'school4', 'www.4.com'),
('ACC', 'school5', 'www.5.com'),
('Big10', 'school6', 'www.6.com'),
('BigEast', 'school7', 'www.7.com'),
('BigSouth', 'school8', 'www.8.com'),
('SEC', 'school9', 'www.9.com')]
Now convert to DataFrame:
import pandas as pd
df = pd.DataFrame.from_records(scrape, columns=['conf','school','link'])
       conf   school       link
0       ACC  school0  www.0.com
1     Big10  school1  www.1.com
2   BigEast  school2  www.2.com
3  BigSouth  school3  www.3.com
4       SEC  school4  www.4.com
5       ACC  school5  www.5.com
6     Big10  school6  www.6.com
7   BigEast  school7  www.7.com
8  BigSouth  school8  www.8.com
9       SEC  school9  www.9.com
Type in school, get conference:
def get_conf(df, school):
    return df.loc[df.school==school, 'conf'].values
get_conf(df, school = 'school1')
['Big10']
Type in conference, get schools:
def get_schools(df, conf):
    return df.loc[df.conf==conf, 'school'].values
get_schools(df, conf = 'Big10')
['school1' 'school6']
It's unclear from your question whether you also want the links associated with schools returned when searching by conference. If so, just update get_schools() to:
def get_schools(df, conf):
    return df.loc[df.conf==conf, ['school','link']].values
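For comparison, the plain-dictionary route needs only two indexes rather than one list per conference. This is a minimal sketch using the sample rows from the question; the names scraped, school_info and conference_schools are illustrative and not from either answer.

```python
from collections import defaultdict

# Sample rows in the (conference, school, link) shape described in the question.
scraped = [
    ("SEC", "UGA", "www.linka.com"),
    ("SEC", "Alabama", "www.linkc.com"),
    ("ACC", "FSU", "www.linkb.com"),
]

school_info = {}                         # school -> (conference, link)
conference_schools = defaultdict(list)   # conference -> [schools]
for conf, school, link in scraped:
    school_info[school] = (conf, link)
    conference_schools[conf].append(school)

# Look up a school's conference, then list every school in a conference.
print(school_info["UGA"][0])
print(conference_schools["SEC"])
```

Because each school's link lives in school_info, listing a conference and then looking up each school keeps the link tied to the right school.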

How to add users to Model from group in Django?

I have a Company that has juniors and seniors. I would like to add users by adding groups instead of individually. Imagine I have Group 1, made of 3 seniors, instead of adding those 3 individually, I'd like to be able to just add Group 1, and have the 3 seniors automatically added to the list of seniors. I'm a little stuck in my current implementation:
class Company(django.model):
    juniors = m2m(User)
    seniors = m2m(User)
    junior_groups = m2m(Group)
    senior_groups = m2m(Group)

# currently, I use this signal to add users from a group when a group is added to company
def group_changed(sender, **kwargs):
    if kwargs['action'] != 'post_add':
        return None
    co = kwargs['instance']
    group_id = kwargs['pk_set'].pop()
    juniors = MyGroup.objects.get(pk=group_id).user_set.all()
    co.juniors = co.juniors.all() | juniors
    co.save()

m2m_changed.connect(...)
The main problem is this looks messy and I have to repeat it for seniors, and potentially other types of users as well.
Is there a more straightforward way to do what I'm trying to do?
Thanks in advance!
Are you trying to optimize and avoid using the group object in your queries?
If you are OK with a small join query, you could use this syntax to get the juniors in the company with id = COMP_ID. This way you don't need to handle the users directly and copy them all the time.
juniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Junior)
seniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Senior)
This assumes that:
you add related_name "groups" to your m2m relation between groups and users
your groups have a type which you manage
you called your foreign-key field 'company' on your Group model
This query can be added as a property on the Company model, so it gives the same programmatic peace of mind.
