I have a table in a SQLite database as below:
+-------+------------+---------------+-------+------------+
| ROWID | student_id | qualification | grade | date_stamp |
+-------+------------+---------------+-------+------------+
| 1 | 000001 | Mathematics | A | 2022-04-01 |
| 2 | 000002 | NULL | NULL | 2022-03-01 |
| 3 | 000003 | Physics | B | 2022-03-01 |
| 4 | 000003 | NULL | NULL | 2022-02-01 |
+-------+------------+---------------+-------+------------+
It is a table of student exam results. If a student has a qualification in a subject, it appears in the table as in ROW #1. If a student has no qualifications, the student appears in the table as in ROW #2.
ROW #3 and #4 refer to a student (id 000003) who previously had no qualifications in the database but now has a B in Physics. I need to delete ROW #4 because this student now has a qualification and the NULL values are no longer appropriate. ROW #2 for student 000002 should be unaffected.
The date_stamp column just shows when that record was last updated.
Appreciate any help, thanks in advance.
You may try doing a delete with exists logic:
DELETE
FROM yourTable
WHERE qualification IS NULL AND
EXISTS (
SELECT 1
FROM yourTable t
WHERE t.student_id = yourTable.student_id AND
t.qualification IS NOT NULL AND
t.date_stamp > yourTable.date_stamp
);
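To check the statement against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the placeholder table name yourTable are assumptions for illustration; swap in your real database file and table name.

```python
import sqlite3

# Assumption: an in-memory database stands in for the real SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(student_id TEXT, qualification TEXT, grade TEXT, date_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        ("000001", "Mathematics", "A", "2022-04-01"),
        ("000002", None, None, "2022-03-01"),
        ("000003", "Physics", "B", "2022-03-01"),
        ("000003", None, None, "2022-02-01"),
    ],
)

# Delete a NULL row only when the same student has a newer non-NULL row.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE qualification IS NULL
      AND EXISTS (
          SELECT 1
          FROM yourTable t
          WHERE t.student_id = yourTable.student_id
            AND t.qualification IS NOT NULL
            AND t.date_stamp > yourTable.date_stamp
      )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT student_id, qualification, grade, date_stamp "
    "FROM yourTable ORDER BY rowid"
).fetchall()
print(remaining)
```

Running the same WHERE clause as a SELECT first is a cheap way to preview which rows would be removed before deleting anything.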
The dataset contains data about COVID-19 patients. It is available in both Excel and CSV formats and contains several variables and over 7,000 records (rows), which makes the problem very hard and time-consuming to solve manually. Below are the 4 most important variables (columns) needed to solve the problem: 1: id, identifying each record (row); 2: day_at_hosp, for each day a patient remained admitted at the hospital; 3: sex of the patient; 4: death, whether the patient eventually died or survived.
I want to create a new variable total_days_at_hosp which should contain the total number of days a patient remained admitted at the hospital.
Old Table:
_______________________________________
| id | day_at_hosp | sex | death |
|_______|_____________|________|________|
| 1 | 0 | male | no |
| 2 | 1 | | |
| 3 | 2 | | |
| 4 | 0 | female | no |
| 5 | 1 | | |
| 6 | 0 | male | no |
| 7 | 0 | female | no |
| 8 | 0 | male | no |
| 9 | 1 | | |
| 10 | 2 | | |
| 11 | 3 | | |
| 12 | 4 | | |
| ... | ... | ... | ... |
| 7882 | 0 | female | no |
| 7883 | 1 | | |
|_______|_____________|________|________|
New Table:
I want to convert the table above into the table below:
____________________________________________
| id |total_days_at_hosp| sex | death |
|_______|__________________|________|________|
| 1 | 3 | male | no |
| 4 | 2 | female | no |
| 6 | 1 | male | no |
| 7 | 1 | female | no |
| 8 | 5 | male | no |
| ... | ... | ... | ... |
| 2565 | 2 | female | no |
|_______|__________________|________|________|
NOTE: the id column numbers every record entered, and multiple records were entered for each patient depending on how long the patient remained admitted at the hospital. The day_at_hosp variable counts days: 0 = initial day at hospital, 1 = second day at hospital, ..., n = last day at hospital.
The record (row) where day_at_hosp is 0 holds the patient's entries in all the other columns. If day_at_hosp is not 0 (say 1, 2, 3, ..., 5), the record belongs to the patient right above it, and all the other variables (columns) are left blank.
However, the dataset I need should look like the New Table above.
It should include a new variable (column) called total_days_at_hosp generated from the variable (column) day_at_hosp. The new variable total_days_at_hosp is more useful in the statistical tests to be conducted and will replace day_at_hosp, so that all blank rows can be deleted.
To move from old table to new table the needed program should do the following:
day_at_hosp ===> total_days_at_hosp
0
1 ---> 3
2
-------------------------------------
0 ---> 2
1
-------------------------------------
0 ---> 1
-------------------------------------
0 ---> 1
-------------------------------------
0
1
2 ---> 5
3
4
-------------------------------------
...
-------------------------------------
0 ---> 2
1
-------------------------------------
How can I achieve this?
Here is another formula option that does not need a dummy value at the end of the Old/New Table.
1] Create the New Table as follows:
Copy and paste all the Old Table data to an unused area
Click "Autofilter"
In the "day_at_hosp" column, select the value 0
Copy and paste the filtered admissions to the New Table, starting at column F
Delete all the 0s in column G
Then,
2] In G2, enter this formula and copy it down:
=IF(F2="","",IF(F3="",MATCH(9^9,A:A)+1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))
Remark: if your ID column contains text values, change the formula to:
=IF(F2="","",IF(F3="",MATCH("zzz",A:A)+1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))
It is apparent that your data are sorted by patient and that your desired table will be much shorter. Accordingly, the starting point for this answer is to apply an AutoFilter to your original data, set the filter criterion to day_at_hosp = 0, and copy the filtered admissions to column F.
After deleting the old column G data, enter the formula below in cell G2 and copy it down:
=INDEX(B:B,MATCH(F3,A:A,0)-1)+1
The formula finds the row of the next admission (F3) in column A, steps back one row to the current patient's last record, and adds 1 to that day_at_hosp value.
To keep the formula simple, enter the same dummy maximum value at the end of both the old and new tables.
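If a script is acceptable instead of spreadsheet formulas, the same collapse can be sketched in a few lines of Python. This is only a sketch, assuming the rows are sorted exactly as in the Old Table and that every patient starts with a day_at_hosp of 0; the function name total_days_per_patient is made up for illustration.

```python
def total_days_per_patient(rows):
    """Collapse day-by-day records into one record per patient.

    Assumes rows are in file order and each patient starts at day_at_hosp == 0.
    """
    patients = []
    for row in rows:
        day = int(row["day_at_hosp"])
        if day == 0:
            # A new admission: keep its id, sex and death values.
            patients.append(
                {
                    "id": row["id"],
                    "total_days_at_hosp": 1,
                    "sex": row["sex"],
                    "death": row["death"],
                }
            )
        elif not patients:
            raise ValueError("first record must have day_at_hosp == 0")
        else:
            # A follow-up day belongs to the patient right above it.
            patients[-1]["total_days_at_hosp"] = day + 1
    return patients


# The first 12 records of the Old Table from the question.
raw = [
    (1, 0, "male", "no"), (2, 1, "", ""), (3, 2, "", ""),
    (4, 0, "female", "no"), (5, 1, "", ""),
    (6, 0, "male", "no"),
    (7, 0, "female", "no"),
    (8, 0, "male", "no"), (9, 1, "", ""), (10, 2, "", ""),
    (11, 3, "", ""), (12, 4, "", ""),
]
old_rows = [dict(zip(("id", "day_at_hosp", "sex", "death"), r)) for r in raw]

result = total_days_per_patient(old_rows)
for patient in result:
    print(patient)
```

For the CSV version of the file, the rows could come from csv.DictReader, since the function only needs dictionaries with these four keys.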
I have to update the database from CSV files. Suppose the database table looks like this:
The CSV file data looks like this:
As you can see, in the CSV file some data has been modified and some new records have been added. What I am supposed to do is update only the data that was modified and insert the new records.
In Table2 the first record of col2 is modified. I need to update only that first record of col2 (i.e., AA), not all the records of col2.
I could do this by hardcoding, but I don't want to because I need to do this for 2000 tables.
Can anyone suggest steps to approach my goal?
Here is my code snippet:
df = pd.read_csv('F:\\filename.csv', sep=",", header=0, dtype=str)
sql_query2 = engine.execute('''
SELECT
*
FROM ttcmcs023111temp
''')
df2 = pd.DataFrame(sql_query2)
df.update(df2)
Since I do not have data similar to yours, I used my own DB.
The schema of my books table is as follows:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | NO | | NULL | |
| author | char(30) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
And the table looks like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Dan Brown | ### Added wrong entry to prove
+----+--------------------+------------------+ ### my point
So, my approach is to use Python to create a new temporary table from the CSV, with the same schema as the books table.
The code I used is as follows:
sql_query = sqlalchemy.text("CREATE TABLE temp (id int primary key, name varchar(30) not null, author varchar(30) not null)")
result = db_connection.execute(sql_query)
csv_df.to_sql('temp', con = db_connection, index = False, if_exists = 'append')
Which creates a table like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Wilbur Smith |
+----+--------------------+------------------+
Now you just need an UPDATE with an INNER JOIN in MySQL to update the values in your original table (in my case, 'books').
Here's how you'll do this:
statement = '''update books b
inner join temp t
on t.id = b.id
set b.name = t.name,
b.author = t.author;
'''
db_connection.execute(statement)
This query will update the values in table books from the table temp that I've created using the CSV.
You can destroy the temp table after updating the values.
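If you would rather see which rows actually changed before touching the database, the comparison can also be done in plain Python. This is a rough sketch, not part of the approach above: the helper name diff_rows is made up, rows are assumed to be dictionaries keyed by a primary key column, and the sixth CSV row is a hypothetical new record added only to show the insert case.

```python
def diff_rows(db_rows, csv_rows, key="id"):
    """Split CSV rows into those that changed and those that are new."""
    existing = {row[key]: row for row in db_rows}
    to_update, to_insert = [], []
    for row in csv_rows:
        current = existing.get(row[key])
        if current is None:
            to_insert.append(row)   # key not in the table yet
        elif current != row:
            to_update.append(row)   # same key, at least one value differs
    return to_update, to_insert


columns = ("id", "name", "author")
db_rows = [dict(zip(columns, r)) for r in [
    (1, "Origin", "Dan Brown"),
    (2, "River God", "Wilbur Smith"),
    (3, "Chromosome 6", "Robin Cook"),
    (4, "Where Eagles Dare", "Alistair Maclean"),
    (5, "The Seventh Scroll", "Dan Brown"),
]]
csv_rows = [dict(zip(columns, r)) for r in [
    (1, "Origin", "Dan Brown"),
    (2, "River God", "Wilbur Smith"),
    (3, "Chromosome 6", "Robin Cook"),
    (4, "Where Eagles Dare", "Alistair Maclean"),
    (5, "The Seventh Scroll", "Wilbur Smith"),
    (6, "Example Title", "Example Author"),  # hypothetical new record
]]

to_update, to_insert = diff_rows(db_rows, csv_rows)
print(to_update)
print(to_insert)
```

Because the helper only depends on the key column name, the same function could be reused across many tables without hardcoding their other columns.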
I have a table A with the following columns:
id UUID
str_identifier TEXT
num FLOAT
and a table B with similar columns:
str_identifier TEXT
num FLOAT
entry_date TIMESTAMP
I want to construct a sqlalchemy query that does the following:
finds entries in table B that do not yet exist in table A, and inserts them
finds entries in table B that do exist in table A but have a different value for the num column, and updates them
The catch is that table B has the entry_date column, and as a result can have multiple entries with the same str_identifier but different entry dates. So I always want to perform this insert/update query using the latest entry for a given str_identifier (if it has multiple entries in table B).
For example, if before the query runs tables A and B are:
[A]
| id | str_identifier | num |
|-----|-----------------|-------|
| 1 | str_id_1 | 25 |
[B]
| str_identifier | num | entry_date |
|----------------|-----|------------|
| str_id_1 | 89 | 2020-07-20 |
| str_id_1 | 25 | 2020-06-20 |
| str_id_1 | 50 | 2020-05-20 |
| str_id_2 | 45 | 2020-05-20 |
After the update query, table A should look like:
[A]
| id | str_identifier | num |
|-----|-----------------|-----|
| 1 | str_id_1 | 89 |
| 2 | str_id_2 | 45 |
The query I've constructed so far should detect differences, but will adding order_by(B.entry_date.desc()) ensure that I only do the exists comparisons against the latest entry for each str_identifier?
My Current Query
query = (
    select([B.str_identifier, B.num])
    .select_from(
        join(B, A, onclause=B.str_identifier == A.str_identifier, isouter=True)
    )
    .where(
        and_(
            ~exists().where(
                and_(
                    B.str_identifier == A.str_identifier,
                    B.num == A.num,
                    # IS NOT NULL; a NOT IN (NULL) test never evaluates to true
                    B.num.isnot(None),
                )
            )
        )
    )
)
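Note that an ORDER BY only sorts the result; it does not remove the older rows, so on its own it cannot restrict the comparison to the latest entry. One way to get that behaviour is to first reduce table B to its newest row per str_identifier and then compare against A. Below is a minimal sketch of that idea using the built-in sqlite3 module and plain SQL rather than SQLAlchemy. The INTEGER id (instead of a UUID) and the in-memory database are simplifications assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumption: INTEGER id instead of UUID, to keep the sketch short.
    CREATE TABLE A (id INTEGER PRIMARY KEY, str_identifier TEXT UNIQUE, num REAL);
    CREATE TABLE B (str_identifier TEXT, num REAL, entry_date TEXT);
    INSERT INTO A (str_identifier, num) VALUES ('str_id_1', 25);
    INSERT INTO B VALUES
        ('str_id_1', 89, '2020-07-20'),
        ('str_id_1', 25, '2020-06-20'),
        ('str_id_1', 50, '2020-05-20'),
        ('str_id_2', 45, '2020-05-20');
    """
)

# Keep only the newest row of B for each str_identifier.
latest = conn.execute(
    """
    SELECT b.str_identifier, b.num
    FROM B AS b
    JOIN (
        SELECT str_identifier, MAX(entry_date) AS max_date
        FROM B
        GROUP BY str_identifier
    ) AS m
      ON m.str_identifier = b.str_identifier
     AND m.max_date = b.entry_date
    ORDER BY b.str_identifier
    """
).fetchall()

for str_identifier, num in latest:
    row = conn.execute(
        "SELECT num FROM A WHERE str_identifier = ?", (str_identifier,)
    ).fetchone()
    if row is None:
        # Not in A yet: insert it.
        conn.execute(
            "INSERT INTO A (str_identifier, num) VALUES (?, ?)",
            (str_identifier, num),
        )
    elif row[0] != num:
        # In A with a different num: update it.
        conn.execute(
            "UPDATE A SET num = ? WHERE str_identifier = ?",
            (num, str_identifier),
        )
conn.commit()

result = conn.execute("SELECT id, str_identifier, num FROM A ORDER BY id").fetchall()
print(result)
```

The GROUP BY subquery with MAX(entry_date) can be expressed in SQLAlchemy as a subquery joined back to B, which would then replace B in the exists comparison.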
Given a pandas dataframe df which looks like following:
p_id | sales | salesperson | year
1 | 10,000| None | 2017
2 | 15,000| None | 2016
5 | 7,000 | None | 2014
5 | 3,000 | None | 2015
There exists an SQL table, persons, which looks like the following:
p_id | p_name | from_year | to_year
1 | Brian Griffin | 2017 | Null
2 | Quagmire | 2016 | Null
5 | Cleveland | 2014 | 2015
5 | Lois Griffin | 2015 | Null
I'm trying to populate the missing data in my dataframe from the SQL table.
A p_id can be reused as long as it's used by 1 person at a time.
What I've done is the following:
import sqlalchemy
from sqlalchemy import and_, or_

def fetch_name(pid, year):
    # Reflect the persons table and look up who held this p_id in the given year.
    meta = sqlalchemy.MetaData()
    persons = sqlalchemy.Table('persons', meta, autoload=True, autoload_with=data_engine)
    stmt = sqlalchemy.select([persons.c.p_name]).where(
        and_(persons.c.p_id == pid,
             year >= persons.c.from_year,
             or_(year < persons.c.to_year, persons.c.to_year.is_(None))))
    name = data_engine.execute(stmt).scalar()
    return name

# One query per dataframe row
for index, row in df.iterrows():
    df.at[index, 'salesperson'] = fetch_name(row['p_id'], row['year'])
This works fine but it's very slow. For a dataframe of 30,000 rows, it takes about 20 minutes to map and populate the missing data.
Would there be a better way to achieve the same result?
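Most of the time is likely spent reflecting the table and running one query per row. One possible direction is to read the persons table once and do the year-range matching in memory. The sketch below uses plain Python with the sample data from the question; the names build_lookup and fetch_name_local are made up for illustration, and the person rows are assumed to have been fetched already with a single SELECT.

```python
from collections import defaultdict


def build_lookup(person_rows):
    """Index (p_id, p_name, from_year, to_year) rows by p_id."""
    lookup = defaultdict(list)
    for p_id, p_name, from_year, to_year in person_rows:
        lookup[p_id].append((from_year, to_year, p_name))
    return lookup


def fetch_name_local(lookup, p_id, year):
    """Return the person who held p_id in the given year, or None."""
    for from_year, to_year, p_name in lookup.get(p_id, []):
        # Same rule as the SQL: from_year <= year < to_year, open-ended if NULL.
        if from_year <= year and (to_year is None or year < to_year):
            return p_name
    return None


# Sample data from the question; in practice this comes from one SELECT.
person_rows = [
    (1, "Brian Griffin", 2017, None),
    (2, "Quagmire", 2016, None),
    (5, "Cleveland", 2014, 2015),
    (5, "Lois Griffin", 2015, None),
]
lookup = build_lookup(person_rows)

sales = [(1, 2017), (2, 2016), (5, 2014), (5, 2015)]  # (p_id, year) pairs
names = [fetch_name_local(lookup, p_id, year) for p_id, year in sales]
print(names)
```

With the dataframe, the (p_id, year) pairs would come from zipping df['p_id'] and df['year'], and the resulting list can be assigned to the salesperson column in one step.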
I have 3 tables
table 1
| id | name |
|:---:|:----:|
| 1 | name |
table 2
| id | name | status |
|:---:|:----:|:------:|
| 1 | name | True |
table 3
| id_table1 | id_table2 | datetime | status_table2 |
|:----------:|----------:|:--------:|:-------------:|
| 1 | 1 |01/11/2011| True |
How can I change the status in table 2 when I create a link in table 3, using the SQLAlchemy ORM in Python? The status must change when a link in table 3 is created and also when the link is deleted. Does anyone have a simple idea?
I solved the problem by using ORM events.
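The post does not say which events were used. As one possible reading, here is a minimal sketch that assumes mapper-level after_insert and after_delete listeners on the link table, SQLAlchemy 1.4 or newer, and an in-memory SQLite database. The class names Table2 and Link and the reduced set of columns are assumptions for illustration.

```python
from sqlalchemy import (
    Boolean, Column, ForeignKey, Integer, String, create_engine, event, update,
)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Table2(Base):
    __tablename__ = "table2"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    status = Column(Boolean, default=False)


class Link(Base):  # plays the role of table 3 (only the key columns are kept)
    __tablename__ = "table3"
    id_table1 = Column(Integer, primary_key=True)
    id_table2 = Column(Integer, ForeignKey("table2.id"), primary_key=True)


def set_status(connection, table2_id, value):
    # Use the connection passed to the event, not the session.
    table2 = Table2.__table__
    connection.execute(
        update(table2).where(table2.c.id == table2_id).values(status=value)
    )


@event.listens_for(Link, "after_insert")
def link_created(mapper, connection, target):
    set_status(connection, target.id_table2, True)


@event.listens_for(Link, "after_delete")
def link_deleted(mapper, connection, target):
    set_status(connection, target.id_table2, False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Table2(id=1, name="name"))
    session.commit()

    link = Link(id_table1=1, id_table2=1)
    session.add(link)
    session.commit()
    status_after_create = session.get(Table2, 1).status

    session.delete(link)
    session.commit()
    status_after_delete = session.get(Table2, 1).status

print(status_after_create, status_after_delete)
```

The listeners write through the flush's connection, so an already loaded Table2 object only shows the new status after it is expired or refreshed; here the commit takes care of that.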