I have a text file with around 6 million entries, and I have to check against a table to verify that ALL of the rows in the file are present in the table. For that purpose I want to use SELECT ... IN. Is it OK to convert all of them into a single query and run it?
I am using MySQL.
You can create a temporary table in the database, insert the values into that table, and then perform the IN operation as shown below:
SELECT field
FROM table
WHERE value IN (SELECT somevalue FROM sometable)
Thanks
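The same pattern can be sketched in Python. The sketch below uses the stdlib sqlite3 module as a stand-in for MySQL, and the table and column names (lookup, entries, value) are illustrative, not from the question; the idea that carries over is to stage the file's values in a temporary table instead of inlining millions of literals into one IN (...) list.

```python
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup (value TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO lookup VALUES (?)", [("a",), ("b",), ("c",)])

# Stage the file's values in a temporary table rather than building
# one enormous IN ('v1', 'v2', ...) literal list.
conn.execute("CREATE TEMPORARY TABLE entries (value TEXT)")
file_values = ["a", "b", "x"]  # would be read from the 6M-line text file
conn.executemany("INSERT INTO entries VALUES (?)", [(v,) for v in file_values])

# Values from the file that are present in the table:
found = [row[0] for row in conn.execute(
    "SELECT value FROM lookup WHERE value IN (SELECT value FROM entries)")]

# Values from the file that are missing from the table:
missing = [row[0] for row in conn.execute(
    "SELECT value FROM entries WHERE value NOT IN (SELECT value FROM lookup)")]
```

With 6 million values, loading the staging table in batches (executemany over chunks, or MySQL's LOAD DATA INFILE) is far faster than one giant literal list, and it keeps the query itself short.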
I am quite new to Python. I have a table that I want to update daily. I get a CSV file with a large amount of data, about 15000 entries. Each row from the CSV file has to be inserted into my table. But if a specific value from the file matches the primary key of any of the rows, then I want to delete that row from the table and instead insert the corresponding row from the CSV file. So for example, if my CSV file is like this:
001|test1|test11|test111
002|test2|test22|test222
003|test3|test33|test333
and my table has a row whose primary key column value is 002, then that row should be deleted and the corresponding row from the file inserted.
I have no idea how many rows with values matching the primary key I could get in that CSV each day. I know this can be done with a MERGE query, but I am not sure whether it will take longer than any other method, and it would also require me to create a temp table and truncate it every time. The same applies if I use WHERE EXISTS: I would need a temp table.
What is the most efficient way to do this task?
I am using Python 2.7.5 and SQL Server 2017
I think using a MERGE statement is the optimal solution. Create a stage table matching your target table, truncate it, and insert the CSV into the stage table. If your SQL Server instance has access to the file you can use BULK INSERT or OPENROWSET to load it; otherwise use Python. To move the staged data into the target table, use a MERGE statement.
If your table has the column names Id, Col1, Col2, Col3, then something like this:
MERGE INTO dbo.MyTable AS TargetTable
USING
(
    SELECT Id, Col1, Col2, Col3
    FROM dbo.stage_MyTable
) AS SourceTable
ON TargetTable.Id = SourceTable.Id
WHEN MATCHED THEN UPDATE SET
    Col1 = SourceTable.Col1,
    Col2 = SourceTable.Col2,
    Col3 = SourceTable.Col3
WHEN NOT MATCHED BY TARGET THEN INSERT
    (Id, Col1, Col2, Col3)
VALUES
    (SourceTable.Id, SourceTable.Col1, SourceTable.Col2, SourceTable.Col3)
;
The benefit of this approach is that the query is executed as a single transaction, so if there are duplicate rows or similar problems, the table will be rolled back to its previous state.
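If the instance cannot read the file, a Python loader for the stage table might look like the sketch below. It uses the stdlib sqlite3 module as a stand-in for a pyodbc connection to SQL Server (the stage table name follows the answer, the pipe-delimited layout follows the question); the truncate-then-bulk-insert flow is the part that carries over.

```python
import csv
import io
import sqlite3

# sqlite3 stands in for a pyodbc connection to SQL Server here;
# pyodbc also uses '?' placeholders, so the calls look the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stage_MyTable (Id TEXT, Col1 TEXT, Col2 TEXT, Col3 TEXT)")

# Pipe-delimited input, as shown in the question.
csv_text = "001|test1|test11|test111\n002|test2|test22|test222\n"
rows = list(csv.reader(io.StringIO(csv_text), delimiter="|"))

conn.execute("DELETE FROM stage_MyTable")  # "truncate" the stage table
conn.executemany("INSERT INTO stage_MyTable VALUES (?, ?, ?, ?)", rows)
conn.commit()

staged = conn.execute("SELECT COUNT(*) FROM stage_MyTable").fetchone()[0]
```

After loading, the MERGE runs server-side in one transaction, so the Python side never has to decide row by row between update and insert.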
I have looked at the documentation on the available statements, but I have not seen any statement that would enable me to move deleted rows to another table.
Here is a snippet of my SQL code:
CREATE TABLE %s;
INSERT INTO rm.table_access (%s) VALUES (%s);
DELETE FROM rm.table_access
Where (%s) LIKE 'HEARTBEAT' AND -7 AND -077 AND -77
OUTPUT Deleted.(%s) INTO test_tables;
Any ideas how to approach this? Is it even possible?
No, this is not possible in BigQuery: it has no implementation of the kind of virtual tables for modified and deleted rows that some traditional RDBMSs have.
To implement something similar, you would need to select the rows to be deleted into a new table FIRST (using a CREATE TABLE AS SELECT statement), then delete them from the main table.
In your case, I would suggest that you create a new table with the appropriate filters for each. You can use CREATE TABLE dataset.newtable AS SELECT x FROM T; see the BigQuery documentation for details. Thus, your syntax would be:
CREATE TABLE `your_new_table` AS SELECT *
FROM `source_table`
WHERE VALUES LIKE 'HEARTBEAT' OR VALUES LIKE '-7' OR VALUES LIKE '-77'
I used single quotes in the WHERE clause because I assumed that the field you are filtering on is a string. Another option would be the function REGEXP_CONTAINS(); see the BigQuery documentation. Your filter would be simplified as follows:
WHERE REGEXP_CONTAINS(VALUES, "HEARTBEAT|STRING|STRING")
Notice that the value you compare with this method must be a string, so make sure of that conversion beforehand; you can use the CAST() function.
In addition, if you want to delete rows from your source table, you can use DELETE. The syntax is as follows:
DELETE FROM `source_table` WHERE field_1 = 'HEARTBEAT'
Notice that you would be deleting rows directly from your source table.
I hope it helps.
UPDATE
Creating a new table with the rows you desire and another table with the "deleted" rows.
#Table with the rows which match the filter and will be "deleted"
#Notice that you have to provide the path to you table
#`project.dataset.your_new_table`
CREATE TABLE `your_new_table` AS
SELECT field1, field2 #select the columns you want
FROM `source_table`
WHERE field1 LIKE 'HEARTBEAT' OR field1 LIKE '-7' OR field1 LIKE '-77'
Now, get the rows which did not pass through the filter in the first step. They will compose the table with the desired rows, as below:
CREATE TABLE `table_desired_rows` AS
SELECT field1, field2 #select the columns you want
FROM `source_table`
WHERE field1 NOT LIKE 'HEARTBEAT'
AND field1 NOT LIKE '-7'
AND field1 NOT LIKE '-77'
Now you have your source table with raw data, another table with the desired rows and a table with the rows you ignored.
Second option:
If you do not need the raw data, you can modify the source table directly. First create a table with the ignored rows, then delete those rows from your source data.
#creating the table with the rows which will be deleted
#notice that you create a new table with these rows
CREATE TABLE `table_ignored_rows` AS
SELECT field1, field2 #select the columns you want
FROM `source_table`
WHERE field1 LIKE 'HEARTBEAT'
OR field1 LIKE '-7'
OR field1 LIKE '-77';
#now deleting the rows from the source table
DELETE FROM `source_table` WHERE field1 LIKE 'HEARTBEAT'
OR field1 LIKE '-7'
OR field1 LIKE '-77';
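The second option (archive, then delete) can be exercised locally. Below is a sketch with the stdlib sqlite3 module standing in for BigQuery, using a hypothetical one-column source_table; note that the per-value matches are combined with OR, since one field value cannot equal several different literals at once.

```python
import sqlite3

# sqlite3 stands in for BigQuery; the one-column schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (field1 TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?)",
                 [("HEARTBEAT",), ("keep-me",), ("-7",)])

# 1. Copy the rows that match the filter into an archive table.
conn.execute("""CREATE TABLE table_ignored_rows AS
                SELECT field1 FROM source_table
                WHERE field1 LIKE 'HEARTBEAT' OR field1 LIKE '-7'""")

# 2. Delete the same rows from the source table.
conn.execute("""DELETE FROM source_table
                WHERE field1 LIKE 'HEARTBEAT' OR field1 LIKE '-7'""")

remaining = [r[0] for r in conn.execute("SELECT field1 FROM source_table")]
archived = [r[0] for r in conn.execute("SELECT field1 FROM table_ignored_rows")]
```

Running the two statements back to back is not atomic in BigQuery, so if other jobs write to the table concurrently, the archive and the delete can see different rows.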
I need to insert into three columns of a table in MySQL at a time. The first two columns are filled by selecting data from other tables with a SELECT statement, while the third column needs to be inserted directly and doesn't need any SELECT. I don't know the syntax for this in MySQL. pos is an array and I need to insert it simultaneously.
Here is my SQL command in Python:
sql = ("insert into quranic_index_2(quran_wordid, translationid, pos) "
       "select quranic_words.wordid, quran_english_translations.translationid "
       "from quranic_words, quran_english_translations "
       "where quranic_words.lemma=%s "
       "and quran_english_translations.verse_no=%s "
       "and quran_english_translations.translatorid=%s, values(%s)")
data = (l, words[2], var1, words[i+1])
r = cursor.execute(sql, data)
data passes in the variables in which all the values are stored; words[i+1] holds the value for pos.
Try using the sample query below:
INSERT INTO table_name (field_1, field_2, field_3) VALUES
('value_1', (SELECT value_2 FROM user_table), 'value_3')
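For the original question, another way to attach the constant is to select it as a bound parameter inside the INSERT ... SELECT itself, so no VALUES clause is needed at all. A minimal sketch with the stdlib sqlite3 module (the two-table schema here is a simplified stand-in for the question's tables; with MySQLdb the placeholders would be %s rather than ?):

```python
import sqlite3

# Simplified stand-in schema for the question's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (wordid INTEGER, lemma TEXT)")
conn.execute("CREATE TABLE target (wordid INTEGER, pos TEXT)")
conn.execute("INSERT INTO words VALUES (1, 'kitab')")

# The constant column (pos) is supplied as a bound parameter in the
# SELECT list, right next to the looked-up column.
sql = """INSERT INTO target (wordid, pos)
         SELECT wordid, ? FROM words WHERE lemma = ?"""
conn.execute(sql, ("noun", "kitab"))

inserted = conn.execute("SELECT wordid, pos FROM target").fetchone()
```

This keeps one statement per row, so the loop over the pos array just calls execute with a different parameter tuple each time.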
I have a question about SQL, especially SQLite3. I have two tables, let's name them main_table and temp_table. These tables are based on the same relational schema so they have the same columns but different rows (values).
Now what I want to do:
For each row of the main_table I want to replace it if there is a row in a temp_table with the same ID. Otherwise I want to keep the old row in the table.
I was thinking about using some joins, but that does not provide what I want.
Could you give me some advice?
EDIT: ADDITIONAL INFO:
I would like to avoid writing out all the columns, because those tables contain tens of attributes, and since I have to update all the columns it shouldn't be necessary to list every one of them.
If the tables have the same structure, you can simply use SELECT *:
BEGIN;
DELETE FROM main_table
WHERE id IN (SELECT id
FROM temp_table);
INSERT INTO main_table
SELECT * FROM temp_table;
COMMIT;
(This will also add any new rows in temp_table that did not previously exist in main_table.)
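The statements above can be run as-is from Python's stdlib sqlite3 module; the schema below is a hypothetical two-column version of the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE temp_table (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO main_table VALUES (1, 'old1'), (2, 'old2');
    INSERT INTO temp_table VALUES (2, 'new2'), (3, 'new3');

    -- the delete-then-insert, wrapped in one transaction
    BEGIN;
    DELETE FROM main_table
    WHERE id IN (SELECT id FROM temp_table);
    INSERT INTO main_table
    SELECT * FROM temp_table;
    COMMIT;
""")
rows = conn.execute("SELECT id, val FROM main_table ORDER BY id").fetchall()
```

Row 1 survives untouched, row 2 is replaced by the temp_table version, and row 3 is newly added. When a suitable primary key exists, SQLite's INSERT OR REPLACE (or the newer ON CONFLICT upsert syntax) achieves the same in a single statement.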
You have 2 approaches:
Update the current rows inside main_table with data from temp_table, relating the rows by ID.
Add a column to temp_table to mark all rows that have to be transferred to main_table, or add an additional table to store the IDs that have to be transferred. Then delete all rows that have to be transferred from main_table and insert the corresponding rows from temp_table, using the mark column or the new table.
I'm trying to write a function for removing columns in SQLite (because sometimes I might want to delete columns which are too old).
From SQLite FAQ:
SQLite has limited ALTER TABLE support
that you can use to add a column to
the end of a table or to change the
name of a table. If you want to make
more complex changes in the structure
of a table, you will have to recreate
the table. You can save existing data
to a temporary table, drop the old
table, create the new table, then copy
the data back in from the temporary
table.
For example, suppose you have a table
named "t1" with columns names "a",
"b", and "c" and that you want to
delete column "c" from this table. The
following steps illustrate how this
could be done:
BEGIN TRANSACTION;
CREATE TEMPORARY TABLE t1_backup(a,b);
INSERT INTO t1_backup SELECT a,b FROM t1;
DROP TABLE t1;
CREATE TABLE t1(a,b);
INSERT INTO t1 SELECT a,b FROM t1_backup;
DROP TABLE t1_backup;
COMMIT;
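The FAQ's recipe runs unchanged under Python's stdlib sqlite3 module; the sketch below just wraps it with a small setup so the effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- setup: the FAQ's example table, with a column "c" to drop
    CREATE TABLE t1 (a, b, c);
    INSERT INTO t1 VALUES (1, 2, 3);

    -- the FAQ's recipe, verbatim
    BEGIN TRANSACTION;
    CREATE TEMPORARY TABLE t1_backup(a,b);
    INSERT INTO t1_backup SELECT a,b FROM t1;
    DROP TABLE t1;
    CREATE TABLE t1(a,b);
    INSERT INTO t1 SELECT a,b FROM t1_backup;
    DROP TABLE t1_backup;
    COMMIT;
""")
cur = conn.execute("SELECT * FROM t1")
columns = [d[0] for d in cur.description]
data = cur.fetchone()
```

Note that SQLite 3.35.0 (2021) added ALTER TABLE ... DROP COLUMN, so on recent versions the recreate dance is only needed for changes DROP COLUMN still disallows, such as removing indexed or constrained columns.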