How to create a table from another table with GridDB? - python

I have a GridDB container that stores my data. I want to copy the table while excluding a few columns. The function I need should extract all columns whose names match a given keyword and then create a new table from them. It must always include the first column, *id, because it is needed in every table.
For example, in the table given below:
| employee_id | department_id | employee_first_name | employee_last_name | employee_gender |
|-------------|---------------|---------------------|---------------------|-----------------|
| 1 | 1 | John | Matthew | M |
| 2 | 1 | Alexandra | Philips | F |
| 3 | 2 | Hen | Lotte | M |
Suppose I need to get the first column and every other column whose name starts with "employee". How can I do this with a Python function?
I am using the GridDB Python client on an Ubuntu machine, and I have already loaded the database.csv file into the container. Thanks in advance for your help!
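The column selection itself can be sketched in plain Python, independent of the GridDB client. The helper name `select_columns` and the in-memory header/row layout are assumptions for illustration. Reading the rows from the source container and writing the result into a new container with the GridDB Python client is not shown here.

```python
def select_columns(header, rows, keyword):
    """Keep the first column plus every other column whose name starts with `keyword`."""
    # Index 0 is always kept; the remaining indexes are kept only on a prefix match.
    keep = [0] + [
        i for i, name in enumerate(header)
        if i > 0 and name.startswith(keyword)
    ]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Sample data copied from the table in the question.
header = ["employee_id", "department_id", "employee_first_name",
          "employee_last_name", "employee_gender"]
rows = [
    [1, 1, "John", "Matthew", "M"],
    [2, 1, "Alexandra", "Philips", "F"],
    [3, 2, "Hen", "Lotte", "M"],
]

new_header, new_rows = select_columns(header, rows, "employee")
print(new_header)
```

The returned header and rows could then serve as the schema and data for the new container.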

Related

Update with CSV file using Python

I have to update a database from CSV files. Suppose the database table looks like this:
The CSV file data looks like this:
As you can see, some data in the CSV file has been modified and some new records have been added. What I am supposed to do is update only the data that was modified and insert the records that were added.
In Table2, the first record of col2 is modified. I need to update only the first record of col2 (i.e., AA), not every record in col2.
I could do this by hardcoding, but I don't want to because I need to do this for 2000 tables.
Can anyone suggest steps to approach this?
Here is my code snippet:
df = pd.read_csv('F:\\filename.csv', sep=",", header=0, dtype=str)
sql_query2 = engine.execute('''
SELECT
*
FROM ttcmcs023111temp
''')
df2 = pd.DataFrame(sql_query2)
df.update(df2)
Since I do not have data similar to yours, I used my own database.
The schema of my books table is as follows:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | NO | | NULL | |
| author | char(30) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
And the table looks like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Dan Brown | ### Added wrong entry to prove
+----+--------------------+------------------+ ### my point
So, my approach is to use Python to create a new temporary table from the CSV, with the same schema as the books table.
The code I used is as follows:
sql_query = sqlalchemy.text("CREATE TABLE temp (id int primary key, name varchar(30) not null, author varchar(30) not null)")
result = db_connection.execute(sql_query)
csv_df.to_sql('temp', con = db_connection, index = False, if_exists = 'append')
Which creates a table like this:
+----+--------------------+------------------+
| id | name | author |
+----+--------------------+------------------+
| 1 | Origin | Dan Brown |
| 2 | River God | Wilbur Smith |
| 3 | Chromosome 6 | Robin Cook |
| 4 | Where Eagles Dare | Alistair Maclean |
| 5 | The Seventh Scroll | Wilbur Smith |
+----+--------------------+------------------+
Now you just need a MySQL UPDATE with an INNER JOIN to copy the values you want into your original table (in my case, 'books').
Here's how to do this:
# Wrap the SQL in sqlalchemy.text(), as in the CREATE TABLE call above
statement = sqlalchemy.text('''
    UPDATE books b
    INNER JOIN temp t ON t.id = b.id
    SET b.name = t.name,
        b.author = t.author
''')
db_connection.execute(statement)
This query updates the values in the books table from the temp table that I created from the CSV.
You can drop the temp table after updating the values.
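The whole staging-table flow can be tried end to end with the standard-library `sqlite3` module. This is only a sketch under stated assumptions: SQLite stands in for MySQL, so the UPDATE uses correlated subqueries instead of MySQL's UPDATE ... INNER JOIN, the staging table is called `books_staging` (a hypothetical name standing in for `temp`), and the row with id 6 is an invented new record. It also inserts rows that exist only in the staging table, which the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT NOT NULL, author TEXT NOT NULL);
    CREATE TABLE books_staging (id INTEGER PRIMARY KEY, name TEXT NOT NULL, author TEXT NOT NULL);
    INSERT INTO books VALUES (4, 'Where Eagles Dare', 'Alistair Maclean'),
                             (5, 'The Seventh Scroll', 'Dan Brown');
    INSERT INTO books_staging VALUES (4, 'Where Eagles Dare', 'Alistair Maclean'),
                                     (5, 'The Seventh Scroll', 'Wilbur Smith'),
                                     (6, 'Coma', 'Robin Cook');
""")

# Update only the rows that exist in both tables and actually differ.
conn.execute("""
    UPDATE books
    SET name   = (SELECT s.name   FROM books_staging s WHERE s.id = books.id),
        author = (SELECT s.author FROM books_staging s WHERE s.id = books.id)
    WHERE EXISTS (
        SELECT 1 FROM books_staging s
        WHERE s.id = books.id
          AND (s.name <> books.name OR s.author <> books.author)
    )
""")

# Insert the rows that are present only in the staging table.
conn.execute("""
    INSERT INTO books (id, name, author)
    SELECT s.id, s.name, s.author
    FROM books_staging s
    WHERE s.id NOT IN (SELECT id FROM books)
""")

conn.execute("DROP TABLE books_staging")
conn.commit()

result = conn.execute("SELECT id, name, author FROM books ORDER BY id").fetchall()
print(result)
```

Because nothing in these statements is specific to one table's data, the same pattern could be generated per table when many tables need the same treatment.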

Link lists that share common elements

I have an issue similar to this one, with a few differences and complications.
I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common, and do so conditionally based on attributes of the groups.
The source data looks like this:
+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A | Type 1 | 1 |
| A | Type 1 | 2 |
| B | Type 1 | 2 |
| B | Type 1 | 3 |
| C | Type 1 | 3 |
| C | Type 1 | 4 |
| D | Type 2 | 4 |
| D | Type 2 | 5 |
+----------+------------+-----------+
Desired output is this:
+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A | B |
| B | C |
+----------+-----------------+
A is linked to B because they share member 2.
B is linked to C because they share member 3.
C is not linked to D: they have a member in common but are of different types.
The number of shared members doesn't matter for my purposes; a single member in common means the groups are linked.
The output is being used as the edges of a graph, so if the output is a graph that fits the rules, that's fine.
The source dataset is large (hundreds of millions of rows), so performance is a consideration.
This poses a similar question; however, I'm new to Python and can't figure out how to get the source data to a point where I can use the answer, or how to work in the additional requirement that the group types match.
Try something like this:
df1=df.groupby(['Group Type','Member ID'])['Group ID'].apply(','.join).reset_index()
df2=df1[df1['Group ID'].str.contains(",")]
This might not handle the case of cyclic grouping.
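To get from the grouped result to the edge list in the question, one option is to bucket group IDs by (group type, member ID) and emit every pair inside a bucket. Below is a minimal in-memory sketch using only the standard library; the variable names are illustrative. It is meant to show the logic, not to handle hundreds of millions of rows, where the same idea would more likely be expressed as a self-join on type and member in a database or a distributed dataframe.

```python
from collections import defaultdict
from itertools import combinations

# (Group ID, Group Type, Member ID) rows from the question.
rows = [
    ("A", "Type 1", 1), ("A", "Type 1", 2),
    ("B", "Type 1", 2), ("B", "Type 1", 3),
    ("C", "Type 1", 3), ("C", "Type 1", 4),
    ("D", "Type 2", 4), ("D", "Type 2", 5),
]

# Groups that land in the same bucket share that member and have the same type.
buckets = defaultdict(set)
for group_id, group_type, member_id in rows:
    buckets[(group_type, member_id)].add(group_id)

# Every pair of groups inside one bucket becomes one undirected edge.
# Using a set means groups sharing several members still yield a single edge.
edges = set()
for groups in buckets.values():
    edges.update(combinations(sorted(groups), 2))

print(sorted(edges))  # [('A', 'B'), ('B', 'C')]
```

C and D never share a bucket because the bucket key includes the group type.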

Filter SQL elements with adjacent ID

I don't really know how to properly state this question in the title.
Suppose I have a table Word like the following:
| id | text |
| --- | --- |
| 0 | Hello |
| 1 | Adam |
| 2 | Hello |
| 3 | Max |
| 4 | foo |
| 5 | bar |
Is it possible to query this table based on text and receive the objects whose primary key (id) is exactly one off?
So, if I do
Word.objects.filter(text='Hello')
I get a QuerySet containing the rows
| id | text |
| --- | --- |
| 0 | Hello |
| 2 | Hello |
but I want the rows
| id | text |
| --- | --- |
| 1 | Adam |
| 3 | Max |
I guess I could do
word_ids = Word.objects.filter(text='Hello').values_list('id', flat=True)
word_ids = [w_id + 1 for w_id in word_ids] # or use a numpy array for this
Word.objects.filter(id__in=word_ids)
but that doesn't seem overly efficient. Is there a straight SQL way to do this in one call? Preferably directly using Django's QuerySets?
EDIT: The idea is that in fact I want to filter those words that are in the second QuerySet. Something like:
Word.objects.filter(text__of__previous__word='Hello', text='Max')
In plain Postgres you could use the lag window function (https://www.postgresql.org/docs/current/static/functions-window.html)
SELECT
    id,
    name
FROM (
    SELECT
        *,
        lag(name) OVER (ORDER BY id) AS prev_name
    FROM test
) s
WHERE prev_name = 'Hello'
The lag function adds a column with the text of the previous row. So you can filter by this text in a subquery.
demo:db<>fiddle
I am not very familiar with Django, but according to the documentation, support for window functions was added in version 2.0.
If by "1 off" you mean that the difference is exactly 1, then you can do:
select w.*
from words w
where w.id in (select w2.id + 1 from words w2 where w2.text = 'Hello');
lag() is also a very reasonable solution. This seems like a direct interpretation of your question. If you have gaps (and the intention is + 1), then lag() is a bit trickier.
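The `id + 1` subquery can be checked quickly with the standard-library `sqlite3` module. This is a sketch that assumes a table named `words` holding the sample rows from the question; SQLite stands in for the real database here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO words (id, text) VALUES (?, ?)",
    [(0, "Hello"), (1, "Adam"), (2, "Hello"), (3, "Max"), (4, "foo"), (5, "bar")],
)

# Select the rows whose id is exactly one greater than the id of a 'Hello' row.
following = conn.execute(
    """
    SELECT w.id, w.text
    FROM words w
    WHERE w.id IN (SELECT w2.id + 1 FROM words w2 WHERE w2.text = ?)
    ORDER BY w.id
    """,
    ("Hello",),
).fetchall()

print(following)  # [(1, 'Adam'), (3, 'Max')]
```

This runs as a single statement, so it avoids the round trip of fetching the ids into Python first.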

SQL/SQLAlchemy: Querying all objects in a dependency tree

I have a table with a self-referential, asymmetric many-to-many relationship of dependencies between objects. I use that relationship to create a dependency tree between objects.
Given a set of object IDs, I would like to fetch all objects that are somewhere in that dependency tree.
Here's an example objects table:
+----+------+
| ID | Name |
+----+------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+----+------+
And a table of relationships:
+------------+-----------+
| Dependancy | Dependant |
+------------+-----------+
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
+------------+-----------+
Showing A (ID: 1) depends on both B(2) and D(4), and that B(2) depends on C(3).
Now, I would like to construct a single SQL query that, given {1} as a set with a single ID, will return the four objects in A's dependency tree: A, B, D and C.
Alternatively, using one query to fetch all needed object IDs and another to fetch their actual data is also acceptable.
This should work regardless of the number of levels in the dependency tree.
I'll be happy with either an SQLAlchemy example or plain SQL for a PostgreSQL 10 database (which I'll see how to implement with SQLAlchemy later on).
Thanks!
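One possible approach is a recursive common table expression (WITH RECURSIVE), which PostgreSQL supports. The sketch below runs the query through the standard-library `sqlite3` module so it is self-contained; the table names `objects` and `relationships` and the lowercase column names are assumptions based on the tables in the question, and the query has not been tested against PostgreSQL 10 itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationships (dependency INTEGER, dependant INTEGER);
    INSERT INTO objects VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
    INSERT INTO relationships VALUES (2, 1), (3, 2), (4, 1);
""")

start_ids = [1]
placeholders = ", ".join("?" for _ in start_ids)

# The anchor part selects the starting objects; the recursive part repeatedly
# adds whatever the objects already in the tree depend on. UNION (not UNION ALL)
# removes duplicates, so the recursion also stops if the data contains a cycle.
query = f"""
    WITH RECURSIVE tree(id) AS (
        SELECT id FROM objects WHERE id IN ({placeholders})
        UNION
        SELECT r.dependency
        FROM relationships r
        JOIN tree t ON r.dependant = t.id
    )
    SELECT o.id, o.name
    FROM objects o
    JOIN tree ON tree.id = o.id
    ORDER BY o.id
"""
tree_rows = conn.execute(query, start_ids).fetchall()
print(tree_rows)  # [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D')]
```

E is not returned because nothing in A's tree depends on it, and the depth of the tree does not need to be known in advance.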

Proper way to store ordered set of strings in database

I have an XML file that I need to save in a MySQL database. It has child elements that can occur from one to an unbounded number of times. Are there any constraints I can use in the SQLAlchemy ORM, or do I have to maintain the order in the application?
The table should look like:
+------+-----------+-------+-----------+
| id | name | part | parent_id |
+------+-----------+-------+-----------+
| 1 | foo | 1 | 123 |
+------+-----------+-------+-----------+
| 2 | bar | 2 | 123 |
+------+-----------+-------+-----------+
| 3 | baz | 1 | 345 |
+------+-----------+-------+-----------+
In other words, what is a proper way to add explicit ordering to a many-to-many relationship?
Any ordering needs to be done in code. Once rows are inserted into a table and selected from it, their order is not guaranteed. So you also have to apply an order on retrieval; adding ORDER BY to the SQL is the handiest way to do that.
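A small sketch of that advice, using the standard-library `sqlite3` module in place of MySQL: the application assigns `part` from each element's position, and the query restores the order with ORDER BY. The table name `child` and the UNIQUE (parent_id, part) constraint are assumptions added for illustration; the constraint only prevents two children of one parent from claiming the same position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY, name TEXT, part INTEGER, parent_id INTEGER,"
    " UNIQUE (parent_id, part))"
)

# The application derives `part` from the element's position under its parent.
children = {123: ["foo", "bar"], 345: ["baz"]}
for parent_id, names in children.items():
    conn.executemany(
        "INSERT INTO child (name, part, parent_id) VALUES (?, ?, ?)",
        [(name, position, parent_id) for position, name in enumerate(names, start=1)],
    )

# Without ORDER BY the row order is not guaranteed, so always sort on retrieval.
ordered = conn.execute(
    "SELECT name FROM child WHERE parent_id = ? ORDER BY part", (123,)
).fetchall()
print(ordered)  # [('foo',), ('bar',)]
```

On the ORM side, SQLAlchemy provides an `ordering_list` helper in `sqlalchemy.ext.orderinglist` that can keep such a position column in sync with a relationship's list order, which may be worth a look.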
