I have an SQLite database that I need to query from Python.
The database only has two columns, "key" and "Value", and the Value column contains a dictionary with multiple values. What I want to do is write a query that uses some of those known dictionary keys as column headers, with the corresponding data under each column.
Is it possible to do that all in a query, or will I have to process the dictionary in python afterwards?
Example data (values obviously have been changed) that I want to query.
key | Value
/auth/user_data/fb_me_user | {"uid":"100008112345597","first_name":"Tim","last_name":"Robins","name":"Tim Robins","emails":["t.robins#gmail.com"]}
There are lots of other key / value combinations, but this is one of the ones I am interested in.
I would like to query this to produce the following;
UID | Name | Email
100008112345597 | Tim Robins | t.robins#gmail.com
Is that possible just in a query?
Thanks
After querying, you get the value shown below. From it, you can pull out the fields you need:
value = '''{"uid":"100008112345597","first_name":"Tim","last_name":"Robins","name":"Tim Robins","emails":["t.robins#gmail.com"]}'''

import ast

# Parse the stored string into a Python dict, then pick out the fields you want.
details = ast.literal_eval(value)
print(details['uid'], details['name'], ','.join(details['emails']))
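For completeness, if your SQLite build includes the JSON1 extension, the extraction can also be done inside the query itself with json_extract(). A minimal sketch (assuming the table is named user_data and that each Value is valid JSON; adjust names to match your schema):

import sqlite3

conn = sqlite3.connect("mydb.sqlite")  # hypothetical file name
rows = conn.execute("""
    SELECT json_extract(Value, '$.uid')       AS UID,
           json_extract(Value, '$.name')      AS Name,
           json_extract(Value, '$.emails[0]') AS Email
    FROM user_data
    WHERE key = '/auth/user_data/fb_me_user'
""").fetchall()

for uid, name, email in rows:
    print(uid, name, email)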
I'm brand new to Python and to updating tables with SQL. I would like to ask how to update a certain group of values in a single column using SQL. Please see the example below:
id
123
999991234
235
789
200
999993456
I need to add the missing prefix '99999' to the records that don't already have it. The id column has an integer data type by default. I've tried the SQL statement below, but I get a conflict between data types, even though I've tried it with a CAST:
update tablename
set id = concat('99999', cast(id as string))
where id not like '99999%';
To be able to use the LIKE operator and the CONCAT() function, the column data type should be STRING or BYTES. In this case, you need to cast the column in the WHERE clause condition as well as in the value assigned by the SET clause.
Using your sample data, run this update script:
UPDATE mydataset.my_table
SET id = CAST(CONCAT('99999', CAST(id AS STRING)) AS INTEGER)
WHERE CAST(id as STRING) NOT LIKE '99999%'
Result: the rows were updated successfully and the table ended up with this data:
id
99999123
999991234
99999235
99999789
99999200
999993456
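Since the question mentions being new to Python: if this table lives in BigQuery (as the answer's syntax suggests), the same update can also be run from Python with the google-cloud-bigquery client. A rough sketch, assuming your project and credentials are already configured:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
    UPDATE mydataset.my_table
    SET id = CAST(CONCAT('99999', CAST(id AS STRING)) AS INTEGER)
    WHERE CAST(id AS STRING) NOT LIKE '99999%'
"""

job = client.query(sql)   # starts the DML job
job.result()              # waits for it to finish
print(job.num_dml_affected_rows, "rows updated")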
I have a very large Postgres table with millions of rows. One of the columns is called data and is of type JSONB with nested JSON (but thankfully no sub-arrays). The "schema" for the JSON is mostly consistent, but has evolved a bit over time, gaining and losing new keys and nested keys.
I'd like a process by which I can normalize the column into a new table, and which is as simple a process as possible.
For example, if the table looked like:
id | data
---+----------------------------------------------
1| {"hi": "mom", "age": 43}
2| {"bye": "dad", "age": 41}
it should create and populate a new table such as
id | data.hi | data.age | data.bye
---+----------------------------------------------
1| mom | 43 | NULL
2| NULL | 41 | dad
(Note: the column names aren't crucial.)
In theory, I could do the following:
Select the column into a Pandas DataFrame and run a json_normalize on it
Infer the schema as the superset of the derived columns in step 1
Create a Postgres table with the schema of step 2 and insert (to_sql is an easy way to achieve this)
This doesn't seem too bad, but recall that the table is very large, so we should assume it cannot be loaded into a single DataFrame. If we try the next best thing (batching the above steps), we'll run into the problem that the schema has changed slightly between batches.
Is there a better way to solve this problem than my approach? A "perfect" solution would be "pure SQL" and not involve any Python at all. But I'm not looking for perfection here, just an automatic and robust process that doesn't require human intervention.
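For concreteness, the batched version of that plan might look like the sketch below (assuming SQLAlchemy/pandas, a source table named mytable and a target mytable_flat). It makes two passes over the data so the full column set is known before any batch is written, which is one way around the schema-drift problem:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@localhost/mydb")  # hypothetical DSN

# Pass 1: stream the JSONB column in chunks and collect the superset of keys.
columns = set()
for chunk in pd.read_sql("SELECT id, data FROM mytable", engine, chunksize=50000):
    flat = pd.json_normalize(chunk["data"].tolist(), sep="_")
    columns.update(flat.columns)
columns = sorted(columns)

# Pass 2: re-stream, reindex every batch to the full column set, and append.
for chunk in pd.read_sql("SELECT id, data FROM mytable", engine, chunksize=50000):
    flat = pd.json_normalize(chunk["data"].tolist(), sep="_").reindex(columns=columns)
    flat.insert(0, "id", chunk["id"].values)
    flat.to_sql("mytable_flat", engine, if_exists="append", index=False)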
You can try to create a new table via the CREATE TABLE AS statement.
CREATE TABLE newtable AS
SELECT
    id,
    (data->>'hi')::text AS data_hi,
    (data->>'bye')::text AS data_bye,
    (data->'age')::int AS data_age
FROM mytable;
If the JSON structure is unknown, all keys and data types can be selected like this:
SELECT DISTINCT
    jsonb_object_keys(data) AS col_name,
    jsonb_typeof(data -> jsonb_object_keys(data)) AS col_type
FROM mytable;
Output:
col_name col_type
--------------------
bye string
hi string
age number
For a nested structure
id data
---------
3 {"age": 33, "foo": {"bar": true}}
you can use a recursive query:
WITH RECURSIVE cte AS (
    SELECT
        jsonb_object_keys(data) AS col_name,
        jsonb_object_keys(data) AS col_path,
        jsonb_typeof(data -> jsonb_object_keys(data)) AS col_type,
        data
    FROM mytable
    UNION ALL
    SELECT
        jsonb_object_keys(data -> col_name) AS col_name,
        col_path || '_' || jsonb_object_keys(data -> col_name) AS col_path,
        jsonb_typeof(data -> col_name -> jsonb_object_keys(data -> col_name)) AS col_type,
        data -> cte.col_name AS data
    FROM cte
    WHERE col_type = 'object'
)
SELECT DISTINCT col_path AS col_name, col_type
FROM cte
WHERE col_type <> 'object';
Output:
col_name col_type
--------------------
age number
foo_bar boolean
Next, you need to build a list of columns for the SELECT clause based on this data for use in the CREATE TABLE AS statement, as shown above.
The following fiddle has a helper that generates the entire SQL:
db<>fiddle
Note that all numeric types, including fractional ones, will be designated as number type and require correction.
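If you'd rather have Python generate that statement for you instead of using the fiddle helper, here is a rough sketch with psycopg2 (handling top-level keys only; nested keys would need the recursive query above plus some path handling). It maps number to numeric, which also covers the fractional values mentioned in the note above:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
cur = conn.cursor()

# Reuse the key/type discovery query from above.
cur.execute("""
    SELECT DISTINCT
        jsonb_object_keys(data) AS col_name,
        jsonb_typeof(data -> jsonb_object_keys(data)) AS col_type
    FROM mytable
""")

# Map jsonb_typeof output to a Postgres column type; everything else becomes text.
pg_type = {"string": "text", "number": "numeric", "boolean": "boolean"}

cols = [
    "(data->>'{0}')::{1} AS data_{0}".format(name, pg_type.get(jtype, "text"))
    for name, jtype in cur.fetchall()
]

cur.execute("CREATE TABLE newtable AS SELECT id, {} FROM mytable".format(", ".join(cols)))
conn.commit()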
I have a DataFrame of AWS usage/billing line items, and a dictionary (ec2info) with the EC2 instance details.
Now I want to add a new column, 'Instance Name', and populate it on the condition that the instance ID from the dictionary appears in the 'ResourceId' column; the value filled in for each matching entry should depend on what is in the Name field of the dictionary for that instance ID.
Finally, I want to create separate DataFrames for my specific use cases, e.g. to get only Box-Usage results. Something like this:
box_usage = df[df['lineItem/UsageType'].str.contains('BoxUsage')]
print(box_usage.groupby('Instance Name')['lineItem/BlendedCost'].sum())
The new column values are not lining up with their respective ResourceId as I'd like; they are instead being filled in sequentially.
I have tried a bunch of things, including what I mentioned in the code above, but no result yet. Any help?
After struggling through several options, I used the .apply() approach and it did the trick:
df.insert(loc=17, column='Instance_Name', value='Other')

def update_col(x):
    # Look the ResourceId up in the ec2info dict and classify it by its Name.
    for key, val in ec2info.items():
        if x == key:
            if 'MyAgg' in val['Name'] or 'MyAgg-AutoScalingGroup' in val['Name']:
                return 'SharkAggregator'
            if 'MyColl AS Group' in val['Name'] or 'MyCollector-AutoScalingGroup' in val['Name']:
                return 'SharkCollector'
            if 'MyMetric AS Group' in val['Name'] or 'MyMetric-AutoScalingGroup' in val['Name']:
                return 'Metric'

df['Instance_Name'] = df.ResourceId.apply(update_col)
df['Instance_Name'] = df['Instance_Name'].fillna('Other')
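As a side note (not part of the original answer), the same result can be achieved without looping over the dictionary for every row, by precomputing an id-to-name lookup once and mapping it in a single step:

def classify(name):
    # Same name-matching rules as update_col above.
    if 'MyAgg' in name or 'MyAgg-AutoScalingGroup' in name:
        return 'SharkAggregator'
    if 'MyColl AS Group' in name or 'MyCollector-AutoScalingGroup' in name:
        return 'SharkCollector'
    if 'MyMetric AS Group' in name or 'MyMetric-AutoScalingGroup' in name:
        return 'Metric'
    return 'Other'

lookup = {key: classify(val['Name']) for key, val in ec2info.items()}
df['Instance_Name'] = df['ResourceId'].map(lookup).fillna('Other')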
Imagine one has two SQL tables
objects_stock
id | number
and
objects_prop
id | obj_id | color | weight
that should be joined on objects_stock.id=objects_prop.obj_id, hence the plain SQL-query reads
select * from objects_prop join objects_stock on objects_stock.id = objects_prop.obj_id;
How can this query be performed with SQLAlchemy such that all returned columns of this join are accessible?
When I execute
query = session.query(ObjectsStock).join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
results = query.all()
where ObjectsStock and ObjectsProp are the appropriate mapped classes, the list results contains only objects of type ObjectsStock. Why is that? What would be the correct SQLAlchemy query to get access to all fields corresponding to the columns of both tables?
Just in case someone encounters a similar problem: the best way I have found so far is listing the columns to fetch explicitly:
query = session.query(ObjectsStock.id, ObjectsStock.number, ObjectsProp.color, ObjectsProp.weight).\
select_from(ObjectsStock).join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
results = query.all()
Then one can iterate over the results and access the properties by their original column names, e.g.
for r in results:
    print(r.id, r.color, r.number)
A shorter way of achieving the result of ctenar's answer above is by unpacking the columns using the star operator:
query = (
session
.query(*ObjectsStock.__table__.columns, *ObjectsProp.__table__.columns)
.select_from(ObjectsStock)
.join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
)
results = query.all()
This is useful if your tables have many columns.
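Another option worth knowing (a sketch, not from the answers above): query both mapped classes, which makes SQLAlchemy return (ObjectsStock, ObjectsProp) pairs per row, so every column of both tables stays accessible through its own object:

query = (
    session
    .query(ObjectsStock, ObjectsProp)
    .join(ObjectsProp, ObjectsStock.id == ObjectsProp.obj_id)
)
for stock, prop in query.all():
    print(stock.id, stock.number, prop.color, prop.weight)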
I want to find all rows where a certain value is present inside the column's list value.
So imagine I have a dataframe set up like this:
| placeID | users |
------------------------------------------------
| 134986| [U1030, U1017, U1123, U1044...] |
| 133986| [U1034, U1011, U1133, U1044...] |
| 134886| [U1031, U1015, U1133, U1044...] |
| 134976| [U1130, U1016, U1133, U1044...] |
How can I get all rows where 'U1030' exists in the users column?
Or... is the real problem that I should not have my data arranged like this, and I should instead explode that column to have a row for each user?
What's the right way to approach this?
The way you have stored data looks fine to me. You do not need to change the format of storing data.
Try this (note: .str.contains works when the users column stores strings; if the column holds actual Python lists, use the map/apply approach in the next answer):
df1 = df[df['users'].str.contains("U1030")]
print(df1)
This will give you all the rows containing specified user in df format.
When you want to check whether a value exists inside a column whose values are lists, it's helpful to use the map function.
Implemented as below with an inline lambda, each list stored in the 'users' column is bound to u, and userID is checked against it...
Really the answer is pretty straightforward when you look at the code below:
# userID and cuisine are assumed to have been defined earlier,
# e.g. userID = 'U1030'.

# user_filter filters the dataframe to all the rows where
# 'userID' is NOT in the 'users' column (the value of which
# is a list type); drop the `not` to keep only the rows where
# the user IS present, as asked above.
user_filter = df['users'].map(lambda u: userID not in u)

# cuisine_filter filters the dataframe to only the rows
# where 'cuisine' exists in the 'cuisines' column (the value
# of which is a list type)
cuisine_filter = df['cuisines'].map(lambda c: cuisine in c)

# Display the result, applying both filters
df[user_filter & cuisine_filter]
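For completeness, a self-contained sketch using the sample data from the question, keeping the rows where the user IS present:

import pandas as pd

df = pd.DataFrame({
    "placeID": [134986, 133986, 134886, 134976],
    "users": [["U1030", "U1017", "U1123", "U1044"],
              ["U1034", "U1011", "U1133", "U1044"],
              ["U1031", "U1015", "U1133", "U1044"],
              ["U1130", "U1016", "U1133", "U1044"]],
})

# Keep rows whose users list contains 'U1030'.
matches = df[df["users"].map(lambda users: "U1030" in users)]
print(matches)   # only placeID 134986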