Update a database table from a CSV file using Python

I have to update a database from CSV files. Suppose the database table looks like this:
The CSV file data looks like this:
As you can see, some of the data in the CSV file has been modified and some new records have been added. I need to update only the data that was modified and insert the records that are new.
In Table2, the first record of col2 is modified. I need to update only that first record of col2 (i.e., AA), not every record in col2.
I could do this by hardcoding, but I don't want to, because I need to do this for 2000 tables.
Can anyone suggest steps to approach this?
Here is my code snippet:
df = pd.read_csv('F:\\filename.csv', sep=",", header=0, dtype=str)
sql_query2 = engine.execute('''
SELECT
*
FROM ttcmcs023111temp
''')
df2 = pd.DataFrame(sql_query2)
df.update(df2)

Since I don't have data similar to yours, I used my own database.
The schema of my books table is as follows:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | NO | | NULL | |
| author | char(30) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
And the table looks like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Dan Brown |
+----+--------------------+------------------+
(Row 5 is a deliberately wrong entry, added to prove my point.)
My approach is to use Python to load the CSV into a new temporary table that has the same schema as the books table.
The code I used is as follows:
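import sqlalchemy  # csv_df is the CSV loaded with pd.read_csv; db_connection is an open SQLAlchemy connection
sql_query = sqlalchemy.text("CREATE TABLE temp (id int primary key, name varchar(30) not null, author varchar(30) not null)")
result = db_connection.execute(sql_query)
# Append the CSV rows to the newly created table
csv_df.to_sql('temp', con=db_connection, index=False, if_exists='append')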
Which creates a table like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Wilbur Smith |
+----+--------------------+------------------+
Now you just need a MySQL UPDATE with an INNER JOIN to copy the values from the temp table into your original table (in my case, books).
Here's how to do it:
statement = sqlalchemy.text('''update books b
    inner join temp t
    on t.id = b.id
    set b.name = t.name,
        b.author = t.author;
''')
db_connection.execute(statement)
This query will update the values in table books from the table temp that I've created using the CSV.
You can drop the temp table after updating the values.
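The UPDATE above changes existing rows but does not insert rows that appear only in the CSV, which the question also asks for. Here is a minimal sketch of both steps. It is written against SQLite through Python's built-in sqlite3 module so that it runs without a MySQL server. The function name sync_from_staging and its parameters are hypothetical, and the SQL uses correlated subqueries and SQLite's IS NOT comparison instead of MySQL's UPDATE ... INNER JOIN, so treat it as an illustration of the idea rather than a drop-in for MySQL.

```python
import sqlite3


def sync_from_staging(conn, target, staging, key, columns):
    """Copy changed and new rows from `staging` into `target`.

    Table and column names are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    Returns (rows_updated, rows_inserted).
    """
    set_clause = ", ".join(
        f"{c} = (SELECT s.{c} FROM {staging} s WHERE s.{key} = {target}.{key})"
        for c in columns
    )
    # IS NOT is SQLite's NULL-safe "differs from" comparison.
    differs = " OR ".join(f"s.{c} IS NOT {target}.{c}" for c in columns)
    col_list = ", ".join([key, *columns])

    cur = conn.cursor()
    # Step 1: update only the rows whose values actually changed.
    cur.execute(
        f"UPDATE {target} SET {set_clause} "
        f"WHERE EXISTS (SELECT 1 FROM {staging} s "
        f"WHERE s.{key} = {target}.{key} AND ({differs}))"
    )
    updated = cur.rowcount
    # Step 2: insert the rows that exist only in the staging table.
    cur.execute(
        f"INSERT INTO {target} ({col_list}) "
        f"SELECT {col_list} FROM {staging} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target} b WHERE b.{key} = s.{key})"
    )
    inserted = cur.rowcount
    conn.commit()
    return updated, inserted
```

For the books example, a call would look like sync_from_staging(conn, "books", "books_staging", "id", ["name", "author"]), where books_staging is an assumed name for the table loaded from the CSV. Because the table name, key, and column list are parameters, the same function could be looped over many tables instead of hardcoding each one.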

Related

How do I delete these rows in a SQLite database?

I have a table in a SQLite database as below:
+-------+------------+---------------+-------+------------+
| ROWID | student_id | qualification | grade | date_stamp |
+-------+------------+---------------+-------+------------+
| 1 | 000001 | Mathematics | A | 2022-04-01 |
| 2 | 000002 | NULL | NULL | 2022-03-01 |
| 3 | 000003 | Physics | B | 2022-03-01 |
| 4 | 000003 | NULL | NULL | 2022-02-01 |
+-------+------------+---------------+-------+------------+
It is a table of student exam results. If a student has a qualification in a subject, it appears in the table like ROW #1. If a student has no qualifications, it appears like ROW #2.
ROW #3 and #4 refer to a student (id 000003) who previously had no qualifications in the database but now has a B in Physics. I need to delete ROW #4 because this student now has a qualification and the NULL values are no longer appropriate. ROW #2 for student 000002 should be unaffected.
The date_stamp column just shows when that record was last updated.
Appreciate any help, thanks in advance.
You may try doing a delete with exists logic:
DELETE
FROM yourTable
WHERE qualification IS NULL AND
      EXISTS (
          SELECT 1
          FROM yourTable t
          WHERE t.student_id = yourTable.student_id AND
                t.qualification IS NOT NULL AND
                t.date_stamp > yourTable.date_stamp
      );
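To check the statement against the sample data, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The helper names build_demo_db and remaining_rowids are made up for this illustration. The table is created without an explicit id column, so SQLite's implicit rowid plays the role of the ROWID column in the question. The date comparison relies on the dates being stored as ISO-formatted text, which sorts in chronological order.

```python
import sqlite3

DELETE_STALE_NULLS = """
DELETE
FROM yourTable
WHERE qualification IS NULL AND
      EXISTS (
          SELECT 1
          FROM yourTable t
          WHERE t.student_id = yourTable.student_id AND
                t.qualification IS NOT NULL AND
                t.date_stamp > yourTable.date_stamp
      )
"""


def build_demo_db():
    """Create an in-memory table holding the four rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable ("
        "student_id TEXT, qualification TEXT, grade TEXT, date_stamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
        [
            ("000001", "Mathematics", "A", "2022-04-01"),
            ("000002", None, None, "2022-03-01"),
            ("000003", "Physics", "B", "2022-03-01"),
            ("000003", None, None, "2022-02-01"),
        ],
    )
    return conn


def remaining_rowids(conn):
    """Run the delete and return the rowids that are still present."""
    conn.execute(DELETE_STALE_NULLS)
    conn.commit()
    return [row[0] for row in conn.execute("SELECT rowid FROM yourTable ORDER BY rowid")]


if __name__ == "__main__":
    print(remaining_rowids(build_demo_db()))  # [1, 2, 3]
```

Only row 4 is removed. Row 2 stays because student 000002 has no later row with a qualification.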

SQLAlchemy many-to-one array response

I'm working with SQLAlchemy and Flask. I have a content table like:
--------------------------------------------
| id | title | description |
--------------------------------------------
| 1 | example | my content |
| 2 | another piece| my other content|
--------------------------------------------
And a status table like this:
--------------------------------------------------------
| id | content_id | status type | date |
--------------------------------------------------------
| 1 | 1 | written | 1/5/2020 |
| 2 | 1 | edited | 1/7/2020 |
--------------------------------------------------------
I want to be able to query the db and get each piece of content with all of its statuses in one row, instead of having multiple rows with the content repeated. For example, I want:
----------------------------------------------------------
| id | title | description | status's |
----------------------------------------------------------
| 1 | example | my content | [1,2] |
----------------------------------------------------------
Is there a way to do this with sqlalchemy?
You can use this query for fetching your answer:
SELECT b.*,
       (SELECT GROUP_CONCAT(id) FROM status_table
        WHERE content_id = b.id) AS `status's`
FROM status_table a JOIN content_table b
     ON a.content_id = b.id
GROUP BY a.content_id;
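GROUP_CONCAT returns the ids as one comma-separated string, so getting an actual list such as [1, 2] takes one more step on the Python side. The sketch below shows that step with the built-in sqlite3 module standing in for the real database, since SQLite also provides GROUP_CONCAT. It does not use SQLAlchemy, the function name contents_with_statuses is hypothetical, and it swaps the subquery for a LEFT JOIN so that content without any status still appears with an empty list.

```python
import sqlite3


def contents_with_statuses(conn):
    """Return (id, title, description, [status ids]) for every content row."""
    rows = conn.execute(
        """
        SELECT c.id, c.title, c.description, GROUP_CONCAT(s.id) AS status_ids
        FROM content_table c
        LEFT JOIN status_table s ON s.content_id = c.id
        GROUP BY c.id, c.title, c.description
        ORDER BY c.id
        """
    ).fetchall()
    result = []
    for content_id, title, description, status_ids in rows:
        # status_ids is a string like "1,2", or None when there are no statuses.
        ids = sorted(int(x) for x in status_ids.split(",")) if status_ids else []
        result.append((content_id, title, description, ids))
    return result
```

The ids are sorted in Python because the order of values inside GROUP_CONCAT is not guaranteed unless the query specifies one.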

Pandas + MySQL On Duplicate Key Update Broken

My ON DUPLICATE KEY UPDATE clause stopped updating and I'm not sure why.
Below is my code to create a temporary table via Pandas:
import pandas as pd
import sqlalchemy

# Connect to the MySQL database
engine = sqlalchemy.create_engine('mysql://username:@localhost/db?charset=utf8')
conn = engine.connect()
# Create df and write to temp table
df = pd.DataFrame(item_bank, columns=['email', 'id', 'mbid', 'artist', 'track', 'plays', 'track_count'])
df.to_sql(con=conn, name='temp', if_exists='replace', index=False)
It successfully creates a MySQL table with all of the data types as 'Text' except for user_tracks which writes as a bigint(20).
I then run this, but the table does not update. It is especially strange to me because I have many scripts that use a similar method, and the only thing I remember changing was to stop updating the other static columns.
mysql_statement = """
INSERT INTO pickaresk.permanent
(email, id, mbid, artist, track, plays, track_count)
SELECT * FROM temp
ON DUPLICATE KEY UPDATE
plays=temp.plays,
track_count=temp.track_count,
lastfm_last_update=NOW()
;
"""
conn.execute(mysql_statement)
conn.close()
The schema of the permanent table's columns that are being updated is shown below. The composite unique key is the combination of id and email. I also confirmed that there are duplicate keys in both tables.
+-------------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
| id | varchar(255) | NO | | NULL | |
| email | varchar(120) | NO | MUL | NULL | |
| mbid | varchar(120) | YES | | NULL | |
| artist | varchar(250) | YES | | NULL | |
| track | varchar(250) | YES | | NULL | |
| plays | float | YES | | NULL | |
| track_count | int(11) | YES | | NULL | |
| lastfm_last_update | datetime | YES | | NULL | |
+-------------------------------+--------------+------+-----+---------+-------+

Why is Django order_by so slow in a ManyToMany query?

I have a ManyToMany field. Like this:
class Tag(models.Model):
    books = models.ManyToManyField('book.Book', related_name='vtags', through=TagBook)

class Book(models.Model):
    nump = models.IntegerField(default=0, db_index=True)
I have around 450,000 books, and some tags are related to around 60,000 books. When I run a query like:
tag.books.order_by('nump')[1:11]
It gets extremely slow, taking 3-4 minutes.
But if I remove order_by, the query runs at normal speed.
The raw sql for the order_by version looks like this:
'SELECT `book_book`.`id`, ... `book_book`.`price`, `book_book`.`nump`,
FROM `book_book` INNER JOIN `book_tagbook` ON (`book_book`.`id` =
`book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1 ORDER BY
`book_book`.`nump` ASC LIMIT 11 OFFSET 1'
Do you have any idea what causes this and how I could fix it? Thanks.
---EDIT---
Checked the previous raw query in MySQL as @bouke suggested:
SELECT `book_book`.`id`, `book_book`.`title`, ... `book_book`.`nump`,
`book_book`.`raw_data` FROM `book_book` INNER JOIN `book_tagbook` ON
(`book_book`.`id` = `book_tagbook`.`book_id`) WHERE `book_tagbook`.`tag_id` = 1
ORDER BY `book_book`.`nump` ASC LIMIT 11 OFFSET 1;
11 rows in set (4 min 2.79 sec)
Then I used EXPLAIN to find out why:
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
| 1 | SIMPLE | book_tagbook | ref | book_tagbook_3747b463,book_tagbook_752eb95b | book_tagbook_3747b463 | 4 | const | 116394 | Using temporary; Using filesort |
| 1 | SIMPLE | book_book | eq_ref | PRIMARY | PRIMARY | 4 | legend.book_tagbook.book_id | 1 | |
+----+-------------+--------------+--------+---------------------------------------------+-----------------------+---------+-----------------------------+--------+---------------------------------+
2 rows in set (0.10 sec)
And for the table book_book:
mysql> explain book_book;
+----------------+----------------+------+-----+-----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------+------+-----+-----------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(200) | YES | | NULL | |
| href | varchar(200) | NO | UNI | NULL | |
..... skip some part.............
| nump | int(11) | NO | MUL | 0 | |
| raw_data | varchar(10000) | YES | | NULL | |
+----------------+----------------+------+-----+-----------+----------------+
24 rows in set (0.00 sec)

Insert Data to multiple MySQL database tables from .csv file

I have a database with tables: person, player, coach, and team. All the tables have an auto-increment id field as the primary key. Person has id, firstname, lastname. Player and coach both have the id field, as well as person_id and team_id as foreign keys to tie them to a team.id or person.id field in the other tables.
Now in order to fully populate these tables, I have several csv files with the list of names of the players in each team. Can I write a bash or python script to take this data and input not only the names to the person table, but also have the proper person and team id values put into the player table?
If the question isn't clear just ask and I'll do what I can to clarify. Thanks.
mysql> describe person;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| firstname | varchar(30) | NO | | NULL | |
| lastname | varchar(30) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
mysql> describe player;
+-----------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| person_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | MUL | NULL | |
+-----------+---------+------+-----+---------+----------------+
mysql> describe team;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| teamname | varchar(25) | NO | | NULL | |
| location | varchar(40) | NO | | NULL | |
| city | varchar(25) | NO | | NULL | |
| state | varchar(2) | NO | | NULL | |
| venue | varchar(35) | NO | | NULL | |
| league_id | int(11) | NO | MUL | NULL | |
+-----------+-------------+------+-----+---------+----------------+
And here is an example of the csv file content:
(AL-Central-Indians.csv)
Fausto,Carmona
Carlos,Carrasco
Kelvin,De La Cruz
Chad,Durbin
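You can load the names directly using the mysql command as follows:
load data local infile 'AL-Central-Indians.csv' into table person
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(firstname, lastname)
Because the CSV holds first and last names, this fills the person table; the player rows still need their person_id and team_id set in a separate step.
I got that from here, although that page also deals with exporting Excel into CSV first.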
Using an ORM might be overkill for your purpose, but it makes life easier if you need to do real work with the data. It will require you to install some software, but if you are willing to learn some new stuff, you will probably gain a lot down the road. Luckily, it is not very hard to start using one, for example Django:
1. Download and install Django.
2. Create a new project using django-admin startproject myproject
3. Create a new app: ./manage.py startapp myapp
4. Change the database connection parameters in settings.py
5. ./manage.py inspectdb should create the models for you. Use ./manage.py inspectdb > myapp/models.py to save them.
6. Execute export DJANGO_SETTINGS_MODULE=settings to allow you to use Django from command line scripts.
Now you can create an import_players.py script in this fashion:
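from myapp.models import Player, Person, Team

# Field names assume inspectdb mapped person_id/team_id to ForeignKey fields named person and team
for my_file in my_files:  # TODO: Iterate through your files
    team = Team.objects.get(teamname=my_team_name)  # TODO: look up the db record for this file's team
    for line in lines_in_my_file:  # TODO: Iterate through lines in your file
        firstname, lastname = line.strip().split(',')
        person = Person.objects.create(firstname=firstname, lastname=lastname)  # creates a db record for a person
        player = Player.objects.create(person=person, team=team)  # creates a db record for a player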
See this to learn how to work with models: https://docs.djangoproject.com/en/dev/topics/db/models/
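If an ORM is more than you need, the same two-step insert can be done with plain SQL from Python: insert the person, read back the generated id, then insert the player row that links the person to a team. The sketch below uses the built-in sqlite3 module as a stand-in so it runs anywhere. The function name import_roster is hypothetical, and it assumes the team row already exists and that you know its id. Most MySQL drivers for Python expose a similar lastrowid attribute on the cursor, but the parameter placeholder style differs between drivers, so check yours.

```python
import csv
import io
import sqlite3


def import_roster(conn, team_id, csv_text):
    """Insert each 'firstname,lastname' row as a person, then link it to the team.

    Returns the number of players added.
    """
    cur = conn.cursor()
    count = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        firstname, lastname = (field.strip() for field in row)
        cur.execute(
            "INSERT INTO person (firstname, lastname) VALUES (?, ?)",
            (firstname, lastname),
        )
        person_id = cur.lastrowid  # auto-increment id of the row just inserted
        cur.execute(
            "INSERT INTO player (person_id, team_id) VALUES (?, ?)",
            (person_id, team_id),
        )
        count += 1
    conn.commit()
    return count
```

With a real file you would pass its contents, for example import_roster(conn, team_id, open('AL-Central-Indians.csv').read()), once per team file.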
