mysql statement to count sends - python

I have this MySQL table:
id | sender | file_id
---+--------+-------------------
 1 | A      | 376482734627836
 2 | B      | 67387648327648726
 3 | B      | 8734682346287346
 4 | A      | 78623186347812
 5 | A      | 278618762378
 6 | C      | 287628681682
 7 | A      | 8389479247
I'm not good at writing SQL statements, but I'd like to output the SENDER who has the most entries, like:
print("the user: "+user+" is the winner with "+sends+" sends!")
What would be the simplest way to do that?

In MySQL you could use:
select sender
from my_table
group by sender
order by count(*) desc limit 1
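A minimal Python sketch of the full round trip, assuming the mysql.connector driver, placeholder credentials, and a table named my_table, with the query extended to also return the count so it can be printed:

import mysql.connector

# connect with your own credentials (placeholders here)
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")
cur = conn.cursor()

# count entries per sender and keep only the top one
cur.execute("""
    SELECT sender, COUNT(*) AS sends
    FROM my_table
    GROUP BY sender
    ORDER BY sends DESC
    LIMIT 1
""")
user, sends = cur.fetchone()
print("the user: " + user + " is the winner with " + str(sends) + " sends!")

cur.close()
conn.close()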

Related

Sqlalchemy many to one array response

I'm working with SQLAlchemy and Flask. I have a content table like:
+----+---------------+------------------+
| id | title         | description      |
+----+---------------+------------------+
|  1 | example       | my content       |
|  2 | another piece | my other content |
+----+---------------+------------------+
And a status table like this:
+----+------------+-------------+----------+
| id | content_id | status type | date     |
+----+------------+-------------+----------+
|  1 |          1 | written     | 1/5/2020 |
|  2 |          1 | edited      | 1/7/2020 |
+----+------------+-------------+----------+
I want to be able to query the database and get a content row with all of its statuses in one row, instead of having multiple rows with the content repeated. For example, I want:
+----+---------+-------------+----------+
| id | title   | description | statuses |
+----+---------+-------------+----------+
|  1 | example | my content  | [1,2]    |
+----+---------+-------------+----------+
Is there a way to do this with sqlalchemy?
You can use this query to fetch the result:
SELECT b.*,
       (SELECT GROUP_CONCAT(id) FROM status_table
        WHERE content_id = b.id) AS statuses
FROM status_table a JOIN content_table b
     ON a.content_id = b.id
GROUP BY a.content_id;
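If you would rather stay in SQLAlchemy, here is a minimal sketch of the same idea using func.group_concat (MySQL-specific), assuming Flask-SQLAlchemy models named Content and Status mapped to the two tables (the model and column names are assumptions):

from sqlalchemy import func

# Content(id, title, description) and Status(id, content_id, status_type, date)
# are hypothetical mapped classes for the two tables above.
rows = (
    db.session.query(
        Content.id,
        Content.title,
        Content.description,
        func.group_concat(Status.id).label("statuses"),  # yields "1,2" as a string
    )
    .join(Status, Status.content_id == Content.id)
    .group_by(Content.id)
    .all()
)
for row in rows:
    print(row.id, row.title, row.description, row.statuses)

Note that GROUP_CONCAT returns a comma-separated string rather than a real list, so split it in Python if you need [1, 2].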

sqlalchemy orm - change column in a table depending on another table

I have 3 tables.
table 1
| id | name |
|:--:|:----:|
| 1  | name |
table 2
| id | name | status |
|:--:|:----:|:------:|
| 1  | name | True   |
table 3
| id_table1 | id_table2 | datetime   | status_table2 |
|:---------:|:---------:|:----------:|:-------------:|
| 1         | 1         | 01/11/2011 | True          |
How can I change the status in table 2 when I create a link in table 3, using the SQLAlchemy ORM in Python? The status must be changed when a link in table 3 is created, and changed again when the link is deleted. Does anyone have any cool and simple ideas?
Solved the problem by using ORM events.
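A minimal sketch of that approach, assuming mapped classes named Table2 (with a boolean status column) and Link for table 3 (with an id_table2 foreign key); all names here are assumptions:

from sqlalchemy import event

@event.listens_for(Link, "after_insert")
def set_status_on_link(mapper, connection, target):
    # when a link row is inserted, flip status on the referenced table 2 row
    connection.execute(
        Table2.__table__.update()
        .where(Table2.__table__.c.id == target.id_table2)
        .values(status=True)
    )

@event.listens_for(Link, "after_delete")
def clear_status_on_unlink(mapper, connection, target):
    # when the link row is deleted, flip status back
    connection.execute(
        Table2.__table__.update()
        .where(Table2.__table__.c.id == target.id_table2)
        .values(status=False)
    )

The listeners use the low-level connection rather than the session, which is what the SQLAlchemy docs recommend inside mapper events; note that after_delete only fires for session.delete(), not for bulk delete queries.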

Find top x by count from MySQL in Python?

I have a csv file like this:
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@gmail.com, 01-05-2014
I am reading the above CSV file, extracting the domain name, and counting email addresses by domain name and date. All of this needs to be inserted into a MySQL table called domains, which I am able to do successfully.
Problem statement: now I need to use the same table to report the top 50 domains by count, sorted by the percentage growth of the last 30 days compared to the total. This is the part I am not able to figure out.
Below is the code with which I can successfully insert into the MySQL database, but I don't understand how to do the reporting task described above.
#!/usr/bin/python
import fileinput
import csv
import os
import sys
import time
import MySQLdb
from collections import defaultdict, Counter

domain_counts = defaultdict(Counter)

# ======================== Defined Functions ======================
def get_file_path(filename):
    currentdirpath = os.getcwd()  # get current working directory path
    filepath = os.path.join(currentdirpath, filename)
    return filepath

# ===========================================================
def read_CSV(filepath):
    with open(filepath) as f:
        reader = csv.reader(f)
        for row in reader:
            domain_counts[row[0].split('@')[1].strip()][row[1]] += 1

    db = MySQLdb.connect(host="localhost",     # your host, usually localhost
                         user="root",          # your username
                         passwd="abcdef1234",  # your password
                         db="test")            # name of the database
    cur = db.cursor()
    q = """INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES(%s, %s, STR_TO_DATE(%s, '%%d-%%m-%%Y'))"""
    for domain, data in domain_counts.iteritems():
        for email_date, email_count in data.iteritems():
            cur.execute(q, (domain, email_count, email_date))
    db.commit()

# ======================= main program =======================================
path = get_file_path('emails.csv')
read_CSV(path)  # read the input file
What is the right way to do the reporting task using the domains table?
Update:
Here is my domains table:
mysql> describe domains;
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| domain_name   | varchar(20) | NO   |     | NULL    |                |
| cnt           | int(11)     | YES  |     | NULL    |                |
| date_of_entry | date        | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
And here is the data I have in it:
mysql> select * from domains;
+----+---------------+-----+---------------+
| id | domain_name   | cnt | date_of_entry |
+----+---------------+-----+---------------+
|  1 | wawa.com      |   2 | 2014-04-30    |
|  2 | wawa.com      |   2 | 2014-05-01    |
|  3 | wawa.com      |   3 | 2014-05-31    |
|  4 | uwaterloo.ca  |   4 | 2014-04-30    |
|  5 | uwaterloo.ca  |   3 | 2014-05-01    |
|  6 | uwaterloo.ca  |   1 | 2014-05-31    |
|  7 | anonymous.com |   2 | 2014-04-30    |
|  8 | anonymous.com |   4 | 2014-05-01    |
|  9 | anonymous.com |   8 | 2014-05-31    |
| 10 | hotmail.com   |   4 | 2014-04-30    |
| 11 | hotmail.com   |   1 | 2014-05-01    |
| 12 | hotmail.com   |   3 | 2014-05-31    |
| 13 | gmail.com     |   6 | 2014-04-30    |
| 14 | gmail.com     |   4 | 2014-05-01    |
| 15 | gmail.com     |   8 | 2014-05-31    |
+----+---------------+-----+---------------+
The report you need can be built in SQL on the MySQL side, with Python used to run the query, fetch the result set, and print out the results.
Consider the following aggregate query with a subquery and derived table, which follows the percentage growth formula:
((this month domain total cnt) - (last month domain total cnt))
/ (last month all domains total cnt)
SQL
SELECT domain_name, pct_growth
FROM (
    SELECT t1.domain_name,
           # SUM OF SPECIFIC DOMAIN'S CNT BETWEEN TODAY AND 30 DAYS AGO
           (SUM(CASE WHEN t1.date_of_entry >= (CURRENT_DATE - INTERVAL 30 DAY)
                     THEN t1.cnt ELSE 0 END)
            -
            # SUM OF SPECIFIC DOMAIN'S CNT AS OF 30 DAYS AGO
            SUM(CASE WHEN t1.date_of_entry < (CURRENT_DATE - INTERVAL 30 DAY)
                     THEN t1.cnt ELSE 0 END)
           ) /
           # SUM OF ALL DOMAINS' CNT AS OF 30 DAYS AGO
           (SELECT SUM(t2.cnt) FROM domains t2
            WHERE t2.date_of_entry < (CURRENT_DATE - INTERVAL 30 DAY))
           AS pct_growth
    FROM domains t1
    GROUP BY t1.domain_name
) AS derivedTable
ORDER BY pct_growth DESC
LIMIT 50;
Python
cur = db.cursor()
sql = "SELECT * FROM ..."  # SEE ABOVE
cur.execute(sql)
for row in cur.fetchall():
    print(row)
If I understand correctly, you just need the ratio of the past thirty days to the total count. You can get this using conditional aggregation. So, assuming that cnt is always greater than 0:
select d.domain_name,
sum(cnt) as CntTotal,
sum(case when date_of_entry >= date_sub(now(), interval 1 month) then cnt else 0 end) as Cnt30Days,
(sum(case when date_of_entry >= date_sub(now(), interval 1 month) then cnt else 0 end) / sum(cnt)) as Ratio30Days
from domains d
group by d.domain_name
order by Ratio30Days desc;
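For the "top 50" part, either query can be run from Python over the same MySQLdb connection the insert code already uses; here is a minimal sketch using the conditional-aggregation query above, with LIMIT 50 added to cap the output:

cur = db.cursor()
cur.execute("""
    SELECT domain_name,
           SUM(cnt) AS cnt_total,
           SUM(CASE WHEN date_of_entry >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
                    THEN cnt ELSE 0 END) / SUM(cnt) AS ratio_30_days
    FROM domains
    GROUP BY domain_name
    ORDER BY ratio_30_days DESC
    LIMIT 50
""")
for row in cur.fetchall():
    print(row)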

How to substitute a list in python/mysql

This is using the Python mysql.connector driver from MySQL.
I'm wanting to write an update query where id is in a list, e.g.
UPDATE tbl SET thing=1 WHERE id IN (1,2,3,4,5);
If I was placeholding single elements, I would write:
qry = ("UPDATE tbl SET thing=1 WHERE id=%s")
cur.execute (qry,(var,))
I don't know how long my list will be each time, so I can't hard-code %s, %s, %s, ... etc. I could ",".join(list) and just build a raw query string each time, but that feels like a hack.
Is there a preferred way to do something like this? This might be a wider question about using placeholders in queries in general but I'm not sure.
If your query requires IN then use IN; there is no need to replace it with =. For example:
mysql> select * from foo;
+------+-------+
| id   | thing |
+------+-------+
|    1 |     5 |
|    2 |     5 |
|    3 |     5 |
|    4 |     5 |
|    5 |     5 |
+------+-------+
5 rows in set (0.00 sec)
>>> cur = conn.cursor()
>>> my_ids
[1, 3, 5]
>>> sql
'UPDATE foo SET thing=1 WHERE id IN %s'
>>> cur.execute(sql, (my_ids,))
3L
>>> conn.commit()
Then all rows will be updated:
mysql> select * from foo;
+------+-------+
| id   | thing |
+------+-------+
|    1 |     1 |
|    2 |     5 |
|    3 |     1 |
|    4 |     5 |
|    5 |     1 |
+------+-------+
5 rows in set (0.00 sec)
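One caveat: expanding a Python list for a single %s placeholder is driver-dependent, and mysql.connector (which the question uses) generally does not do it. A commonly used, still-parameterized alternative is to build one placeholder per element; a minimal sketch, assuming cur and conn are an open mysql.connector cursor and connection:

ids = [1, 3, 5]
placeholders = ", ".join(["%s"] * len(ids))              # "%s, %s, %s"
qry = "UPDATE tbl SET thing=1 WHERE id IN ({})".format(placeholders)
cur.execute(qry, ids)   # values are still sent as parameters, not string-interpolated
conn.commit()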

BigQuery streaming insertAll appears to lose data - why?

I'm trying to use the streaming insert_all method to insert data into a table using the google-api-client gem in Ruby.
So I start by creating a new table in BigQuery (read and write privileges are correct) with the following contents:
+-----+-----------+-------------+
| Row | person_id | person_name |
+-----+-----------+-------------+
|   1 |         1 | ABCD        |
|   2 |         2 | EFGH        |
|   3 |         3 | IJKL        |
+-----+-----------+-------------+
This is my code in Ruby (I discovered earlier today that tabledata.insert_all is the Ruby name for tabledata.insertAll; the Google docs / example need updating):
def streaming_insert_data_in_table(table, dataset=DATASET)
  body = {"rows"=>[
    {"json"=> {"person_id"=>10,"person_name"=>"george"}},
    {"json"=> {"person_id"=>11,"person_name"=>"washington"}}
  ]}
  result = @client.execute(
    :api_method=> @bigquery.tabledata.insert_all,
    :parameters=> {
      :projectId=> @project_id.to_s,
      :datasetId=> dataset,
      :tableId=> table},
    :body_object=> body
  )
  puts result.body
end
So I run my code the first time and all appears fine. I see this in the table in BigQuery:
+-----+-----------+-------------+
| Row | person_id | person_name |
+-----+-----------+-------------+
|   1 |         1 | ABCD        |
|   2 |         2 | EFGH        |
|   3 |         3 | IJKL        |
|   4 |        10 | george      |
|   5 |        11 | washington  |
+-----+-----------+-------------+
Then I change the data in the method to:
body = {"rows"=>[
  {"json"=> {"person_id"=>5,"person_name"=>"john"}},
  {"json"=> {"person_id"=>6,"person_name"=>"kennedy"}}
]}
Run the method and get this in Bigquery:
+-----+-----------+-------------+
| Row | person_id | person_name |
+-----+-----------+-------------+
|   1 |         1 | ABCD        |
|   2 |         2 | EFGH        |
|   3 |         3 | IJKL        |
|   4 |        10 | george      |
|   5 |         6 | kennedy     |
+-----+-----------+-------------+
So, what gives? I've lost data (ids 11 and 5 have vanished), and the responses for the requests do not contain errors either.
Could someone tell me if I'm doing something incorrectly, or why this is happening?
Any help is much appreciated.
Thanks and have a great day.
Discovered that this appears to be something to do with the UI (the row count doesn't populate for a while, and trying to extract the data in the table results in the error "Unexpected. Please try again."). However, the data is actually stored and can be queried. Thanks for the help, Jordan.
