I got a JSON feed like this:
{'OUT': '232827###Price;1=9.8=0;2=4.1=1;3=7.9=0;#Needs;1=3.0=0;2=1.4=1;3=3.5=0;'}
It is a very strange format:
No keys
"=" used as the separator
Two sets of data in one JSON array
A prefix at the beginning (I am not sure whether "OUT" is the prefix, maybe?)
I want output like this, with the keys indicated:
The first JSON:
{
    "No": 1,
    "Price": 9.8,
    "status": 0
},
{
    "No": 2,
    "Price": 4.1,
    "status": 1
},
{
    "No": 3,
    "Price": 7.9,
    "status": 0
}
The second JSON (after "#Needs"):
{
    "No": 1,
    "Needs": 3.0,
    "status": 0
},
{
    "No": 2,
    "Needs": 1.4,
    "status": 1
},
{
    "No": 3,
    "Needs": 3.5,
    "status": 0
}
A regex will handle that nicely. Pair each part with its key so that the first set is labelled "Price" and the second "Needs":
import re

value = {'OUT': '232827###Price;1=9.8=0;2=4.1=1;3=7.9=0;#Needs;1=3.0=0;2=1.4=1;3=3.5=0;'}

result = []
# Splitting on "Needs" yields the Price part first, then the Needs part
for key, part in zip(("Price", "Needs"), value['OUT'].split("Needs")):
    subresult = []
    # Each record has the form No=value=status
    for a, b, c in re.findall(r"(\d+)=([\d.]+)=(\d+)", part):
        subresult.append({
            "No": int(a),
            key: float(b),
            "status": int(c)
        })
    result.append(subresult)
print(result)
[[{'No': 1, 'Price': 9.8, 'status': 0}, {'No': 2, 'Price': 4.1, 'status': 1}, {'No': 3, 'Price': 7.9, 'status': 0}],
[{'No': 1, 'Needs': 3.0, 'status': 0}, {'No': 2, 'Needs': 1.4, 'status': 1}, {'No': 3, 'Needs': 3.5, 'status': 0}]]
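If the section names are not known in advance, they can be read from the feed itself. The following is a minimal sketch, assuming every section has the form "#Name;" followed by "No=value=status;" records. parse_feed is a hypothetical helper name, not part of any library.

```python
import re

def parse_feed(raw):
    """Parse '...#Name;no=value=status;...' sections into {name: [records]}."""
    sections = {}
    # A section starts at '#Name;' and runs until the next '#' or the end
    for name, body in re.findall(r"#(\w+);([^#]*)", raw):
        sections[name] = [
            {"No": int(no), name: float(val), "status": int(status)}
            for no, val, status in re.findall(r"(\d+)=([\d.]+)=(\d+)", body)
        ]
    return sections

feed = {'OUT': '232827###Price;1=9.8=0;2=4.1=1;3=7.9=0;#Needs;1=3.0=0;2=1.4=1;3=3.5=0;'}
parsed = parse_feed(feed['OUT'])
print(parsed['Price'][0])  # {'No': 1, 'Price': 9.8, 'status': 0}
```

Returning a dict keyed by section name means a third section in the feed would be picked up without changing the code.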
Hi, I have an array of objects like the one below:
"output": [
    {
        'id': 1,
        'items': [
            {
                'id': '1',
                'data': {
                    'id': 3,
                }
            },
            {
                'id': '2',
                'data': {
                    'id': 4,
                }
            }
        ]
    },
    {
        'id': 2,
        'items': [
            {
                'id': '3',
                'data': {
                    'id': 5,
                }
            },
        ]
    },
]
I want to retrieve the id property of every entry in each items array and put them in one array, so the expected output is ['1','2','3'].
The code below works in JavaScript:
arr_obj.map(obj=>obj.items.map(item=>item.id)).flat()
How can I do the same in Python and Django? Could someone help me with this? I am new to Python and Django, thanks.
Edit:
An example of how the data looks when logged to the console:
output '[{'id': 1,'items': [{'id': 14, 'data': {'id': 1,}],}]'
You can use a list comprehension:
>>> [i['id'] for d in data for i in d['items']]
['1', '2', '3']
where data is the list of dictionaries.
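As a self-contained sketch of the same idea, here is the comprehension applied to the sample data from the question, next to an itertools variant that mirrors the JavaScript map(...).flat() more literally. The variable names are only illustrative. If the data reaches a Django view as a JSON string, it would need json.loads() first.

```python
from itertools import chain

data = [
    {'id': 1, 'items': [{'id': '1', 'data': {'id': 3}}, {'id': '2', 'data': {'id': 4}}]},
    {'id': 2, 'items': [{'id': '3', 'data': {'id': 5}}]},
]

# Nested comprehension: outer loop over the dicts, inner loop over their items
ids = [item['id'] for d in data for item in d['items']]

# Closer to map(...).flat(): build the inner lists, then flatten one level
ids_flat = list(chain.from_iterable([item['id'] for item in d['items']] for d in data))

print(ids)  # ['1', '2', '3']
```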
I'm learning Python and some web scraping techniques.
I made a request to a website and I want to get a value from this confusing dict:
newDict = {'actions': [{'account_action_seq': 3186, 'action_trace': {'account_ram_deltas': [], 'act': {'account': 'test', 'authorization': [{'actor': 'test', 'permission': 'xfer'}], 'data': {'from': 'm.federation', 'memo': 'test', 'quantity': '0.0442', 'to': 'test'}, 'hex_data': 'test', 'name': 'transfer'}, 'action_ordinal': 5, 'block_num': 117988314, 'block_time': 'test', 'closest_unnotified_ancestor_action_ordinal': 2, 'context_free': False, 'creator_action_ordinal': 2, 'elapsed': 2, 'producer_block_id': 'test', 'receipt': {'abi_sequence': 4, 'act_digest': 'test', 'auth_sequence': [['m.federation', 2]], 'code_sequence': 5, 'global_sequence': 4798388072, 'receiver': 'test', 'recv_sequence': 1514}, 'receiver': 'pvwbq.wam', 'trx_id': '3'}, 'block_num': 117988314, 'block_time': '2021-05-08T00:56:14.000', 'global_action_seq': 4798388072, 'irreversible': True}], 'head_block_num': 117989564, 'last_irreversible_block': 117989233}
I want to print the value of 'quantity', which is 0.0442, but I don't know how to get to it.
Update:
act = conteudo.json()
act_list = act['actions']
act_trace = act_list[0]['action_trace']
act_act = act_trace['act']
act_data = act_act['data']
print(act_data['quantity'])
I reached the value with this code, but I don't know if it's the best way. Could you please take a look?
A way to solve problems like this is to pretty-print the data in order to understand its layout. Once you've done that, it's usually fairly easy to determine how to access the value(s) you want.
I usually use json.dumps() or pprint.pprint() to do this. In this case I used the former:
import json
newDict = {'actions': [{'account_action_seq': 3186, 'action_trace': {'account_ram_deltas': [], 'act': {'account': 'test', 'authorization': [{'actor': 'test', 'permission': 'xfer'}], 'data': {'from': 'm.federation', 'memo': 'test', 'quantity': '0.0442', 'to': 'test'}, 'hex_data': 'test', 'name': 'transfer'}, 'action_ordinal': 5, 'block_num': 117988314, 'block_time': 'test', 'closest_unnotified_ancestor_action_ordinal': 2, 'context_free': False, 'creator_action_ordinal': 2, 'elapsed': 2, 'producer_block_id': 'test', 'receipt': {'abi_sequence': 4, 'act_digest': 'test', 'auth_sequence': [['m.federation', 2]], 'code_sequence': 5, 'global_sequence': 4798388072, 'receiver': 'test', 'recv_sequence': 1514}, 'receiver': 'pvwbq.wam', 'trx_id': '3'}, 'block_num': 117988314, 'block_time': '2021-05-08T00:56:14.000', 'global_action_seq': 4798388072, 'irreversible': True}], 'head_block_num': 117989564, 'last_irreversible_block': 117989233}
print(json.dumps(newDict, indent=4))
Results:
{
    "actions": [
        {
            "account_action_seq": 3186,
            "action_trace": {
                "account_ram_deltas": [],
                "act": {
                    "account": "test",
                    "authorization": [
                        {
                            "actor": "test",
                            "permission": "xfer"
                        }
                    ],
                    "data": {
                        "from": "m.federation",
                        "memo": "test",
                        "quantity": "0.0442",  # <- BINGO!
                        "to": "test"
                    },
                    "hex_data": "test",
                    "name": "transfer"
                },
                "action_ordinal": 5,
                "block_num": 117988314,
                "block_time": "test",
                "closest_unnotified_ancestor_action_ordinal": 2,
                "context_free": false,
                "creator_action_ordinal": 2,
                "elapsed": 2,
                "producer_block_id": "test",
                "receipt": {
                    "abi_sequence": 4,
                    "act_digest": "test",
                    "auth_sequence": [
                        [
                            "m.federation",
                            2
                        ]
                    ],
                    "code_sequence": 5,
                    "global_sequence": 4798388072,
                    "receiver": "test",
                    "recv_sequence": 1514
                },
                "receiver": "pvwbq.wam",
                "trx_id": "3"
            },
            "block_num": 117988314,
            "block_time": "2021-05-08T00:56:14.000",
            "global_action_seq": 4798388072,
            "irreversible": true
        }
    ],
    "head_block_num": 117989564,
    "last_irreversible_block": 117989233
}
With that information, I was able to come up with this:
quantity = newDict["actions"][0]["action_trace"]["act"]["data"]["quantity"]
print(quantity) # -> 0.0442
Note that quantity is a string, not a numeric value.
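Because quantity is a string, it has to be converted before doing arithmetic on it. Here is a small sketch of that, together with a defensive way to walk the nesting when some keys may be missing. The dig helper is hypothetical, and the trimmed response dict stands in for the full data shown above.

```python
from decimal import Decimal

def dig(obj, *path, default=None):
    """Follow keys/indexes through nested dicts and lists; return default if a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

# Trimmed stand-in for the full response
response = {'actions': [{'action_trace': {'act': {'data': {'quantity': '0.0442'}}}}]}

raw = dig(response, 'actions', 0, 'action_trace', 'act', 'data', 'quantity')
# Decimal keeps the exact digits; float(raw) would also work if rounding is acceptable
quantity = Decimal(raw) if raw is not None else None
print(quantity)  # 0.0442
```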
I have source_json data looking like this:
{
    'ID': {
        '0': 8573273,
        '1': 8573277
    },
    'prediction': {
        '0': 4.411029362081518,
        '1': 4.411029362081518
    },
    'feature': {
        '0': 0,
        '1': 0
    }
}
But I need it to be in this form:
[
    {
        'ID': 8573273,
        'prediction': 4.411029362081518,
        'feature': 0
    },
    {
        'ID': 8573277,
        'prediction': 4.411029362081518,
        'feature': 0
    }
]
I convert the first form to a pandas DataFrame and then convert that to the desired JSON:
import pandas as pd

t = pd.DataFrame(source_json)
proper_json = t.to_dict(orient='records')
The question is: is there a proper way to do this without creating an intermediate DataFrame?
I prefer the traditional way:
all_keys = list(source_json.keys())
all_indices = list(source_json[all_keys[0]].keys())
transformed_json = [{_k: source_json[_k][_idx] for _k in all_keys} for _idx in all_indices]
You can try a list comprehension (here d is the source dict):
[{'ID': i, 'prediction': j, 'feature': k}
 for i, j, k in zip(*[v.values() for v in d.values()])]
Output:
[{'ID': 8573273, 'prediction': 4.411029362081518, 'feature': 0},
{'ID': 8573277, 'prediction': 4.411029362081518, 'feature': 0}]
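The same transposition can be written without hard-coding the column names. This is a minimal sketch, assuming every column holds the same index keys in the same order, as in the sample data:

```python
source_json = {
    'ID': {'0': 8573273, '1': 8573277},
    'prediction': {'0': 4.411029362081518, '1': 4.411029362081518},
    'feature': {'0': 0, '1': 0},
}

# zip(*columns) groups the n-th value of every column into one row;
# zipping each row with the column names rebuilds one record per row.
records = [
    dict(zip(source_json, row))
    for row in zip(*(col.values() for col in source_json.values()))
]
print(records)
```

If the columns might not share the same keys, looking values up by index key (as in the "traditional" version above) is the safer choice.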
Currently I've indexed my MongoDB collection into Elasticsearch running in a Docker container. I am able to query a document by its exact name, but Elasticsearch is unable to match the query if it is only part of the name. Here is an example:
>>> es = Elasticsearch('0.0.0.0:9200')
>>> es.indices.get_alias('*')
{'mongodb_meta': {'aliases': {}}, 'sigstore': {'aliases': {}}, 'my-index': {'aliases': {}}}
>>> x = es.search(index='sigstore', body={'query': {'match': {'name': 'KEGG_GLYCOLYSIS_GLUCONEOGENESIS'}}})
>>> x
{'took': 198, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 8.062855, 'hits': [{'_index': 'sigstore', '_type': 'sigs', '_id': '5d66c23228144432307c2c49', '_score': 8.062855, '_source': {'id': 1, 'name': 'KEGG_GLYCOLYSIS_GLUCONEOGENESIS', 'description': 'http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_GLYCOLYSIS_GLUCONEOGENESIS', 'members': ['ACSS2', 'GCK', 'PGK2', 'PGK1', 'PDHB', 'PDHA1', 'PDHA2', 'PGM2', 'TPI1', 'ACSS1', 'FBP1', 'ADH1B', 'HK2', 'ADH1C', 'HK1', 'HK3', 'ADH4', 'PGAM2', 'ADH5', 'PGAM1', 'ADH1A', 'ALDOC', 'ALDH7A1', 'LDHAL6B', 'PKLR', 'LDHAL6A', 'ENO1', 'PKM2', 'PFKP', 'BPGM', 'PCK2', 'PCK1', 'ALDH1B1', 'ALDH2', 'ALDH3A1', 'AKR1A1', 'FBP2', 'PFKM', 'PFKL', 'LDHC', 'GAPDH', 'ENO3', 'ENO2', 'PGAM4', 'ADH7', 'ADH6', 'LDHB', 'ALDH1A3', 'ALDH3B1', 'ALDH3B2', 'ALDH9A1', 'ALDH3A2', 'GALM', 'ALDOA', 'DLD', 'DLAT', 'ALDOB', 'G6PC2', 'LDHA', 'G6PC', 'PGM1', 'GPI'], 'user': 'naji.taleb#medimmune.com', 'type': 'public', 'level1': 'test', 'level2': 'test2', 'time': '08-28-2019 14:03:29 EDT-0400', 'source': 'File', 'mapped': [''], 'notmapped': [''], 'organism': 'human'}}]}}
When using the full name of the document, elasticsearch is able to successfully query it. But this is what happens when I attempt to search part of the name or use a wildcard:
>>> x = es.search(index='sigstore', body={'query': {'match': {'name': 'KEGG'}}})
>>> x
{'took': 17, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}
>>> x = es.search(index='sigstore', body={'query': {'match': {'name': 'KEGG*'}}})
>>> x
{'took': 3, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}
In addition to the default index settings I also tried making an index that allows the use of the nGram tokenizer to enable me to do partial search, but that also didn't work. These are the settings I used for that index:
{
    "sigstore": {
        "aliases": {},
        "mappings": {},
        "settings": {
            "index": {
                "max_ngram_diff": "99",
                "number_of_shards": "1",
                "provided_name": "sigstore",
                "creation_date": "1579200699718",
                "analysis": {
                    "filter": {
                        "substring": {
                            "type": "nGram",
                            "min_gram": "1",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "str_index_analyzer": {
                            "filter": [
                                "lowercase",
                                "substring"
                            ],
                            "tokenizer": "keyword"
                        },
                        "str_search_analyzer": {
                            "filter": [
                                "lowercase"
                            ],
                            "tokenizer": "keyword"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "3nf915U6T9maLdSiJozvGA",
                "version": {
                    "created": "7050199"
                }
            }
        }
    }
}
and this is the corresponding python command that created it:
es.indices.create(index='sigstore',body={"mappings": {},"settings": { 'index': { "analysis": {"analyzer": {"str_search_analyzer": {"tokenizer": "keyword","filter": ["lowercase"]},"str_index_analyzer": {"tokenizer": "keyword","filter": ["lowercase", "substring"]}},"filter": {"substring": {"type": "nGram","min_gram": 1,"max_gram": 20}}}},'max_ngram_diff': '99'}})
I use mongo-connector as the pipeline between my mongoDB collection and elasticsearch. This is the command I use to start it:
mongo-connector -m mongodb://username:password@xx.xx.xxx.xx:27017/?authSource=admin -t elasticsearch:9200 -d elastic2_doc_manager -n sigstore.sigs
I'm unsure as to why my elasticsearch is unable to get a partial match, and wondering if there is some setting I'm missing or if there's some crucial mistake I've made somewhere. Thanks for reading.
Versions
MongoDB 4.0.10
elasticsearch==7.1.0
elastic2-doc-manager[elastic5]
Updated after checking your gist:
You need to apply the mapping to your field as described in the documentation (see the first link I shared in the comments).
You need to do this after applying the settings to your index; according to the gist, that is at line 11.
Something like:
PUT /your_index/_mapping
{
    "properties": {
        "name": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {
                "str_search_analyzer": {
                    "type": "text",
                    "analyzer": "str_search_analyzer"
                }
            }
        }
    }
}
After you set the mapping, you need to apply it to your existing documents using update_by_query:
https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-update-by-query.html
You can then continue to run term searches on the field name, since it is indexed with a keyword mapping (exact match), and search the sub-field name.str_search_analyzer with part of the word.
your_keyword = 'KEGG_GLYCOLYSIS_GLUCONEOGENESIS'  # or part of the name, e.g. 'KEGG*'
x = es.search(index='sigstore', body={'query': {'bool': {'should': [
    {'term': {'name': your_keyword}},
    {'match': {'name.str_search_analyzer': your_keyword}},
]}}})
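To see why an n-gram filter makes partial matches possible once a field actually uses it, here is a rough pure-Python imitation of the analysis chain from the question's settings (keyword tokenizer, lowercase, n-grams of length 1 to 20). It only illustrates the idea and is not how Elasticsearch implements it; ngrams is a hypothetical helper.

```python
def ngrams(text, min_gram=1, max_gram=20):
    """Return all substrings of text with length between min_gram and max_gram."""
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }

name = 'KEGG_GLYCOLYSIS_GLUCONEOGENESIS'
# Index side: whole string as one token, lowercased, then split into n-grams
indexed_terms = ngrams(name.lower())
# Search side: whole string as one token, lowercased, no n-grams
query_term = 'KEGG'.lower()

print(query_term in indexed_terms)    # True
print(name.lower() in indexed_terms)  # False: the full name is longer than max_gram
```

The analyzers defined in the index settings have no effect until a field mapping references them, which is why the mapping step above matters.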
My Python script connects to an API and gets some JSON.
I've been trying out prettyprint, parse, loads, dumps but I haven't figured them out yet...
Right now, when I do print(request.json()) I get this:
{'info': {'status': 'OK', 'time': {'seconds': 0.050006151199341, 'human': '50 milliseconds'}},
'datalist': {'total': 1, 'count': 1, 'offset': 0, 'limit': 3, 'next': 1, 'hidden': 0, 'loaded': True, 'list': [
{'id': 27862209, 'name': 'Fate/Grand Order', 'package': 'com.xiaomeng.fategrandorder',
'uname': 'komoe-game-fate-go', 'size': 49527668,
'icon': 'http://pool.img.xxxxx.com/msi8/9b58a48638b480c17135a10810374bd6_icon.png',
'graphic': 'http://pool.img.xxxxx.com/msi8/3a240b50ac37a9824b9ac99f1daab8c8_fgraphic_705x345.jpg',
'added': '2017-05-20 10:54:53', 'modified': '2017-05-20 10:54:53', 'updated': '2018-02-12 12:35:51',
'uptype': 'regular', 'store': {'id': 750918, 'name': 'msi8',
'avatar': 'http://pool.img.xxxxx.com/msi8/c61a8cfe9f68bfcfb71ef59b46a8ae5d_ravatar.png',
'appearance': {'theme': 'grey',
'description': '❤️ Welcome To Msi8 Store & My Store Will Mostly Be Specialized in Games With OBB File Extension. I Hope You Find What You Are Looking For Here ❤️'},
'stats': {'apps': 20776, 'subscribers': 96868, 'downloads': 25958359}},
'file': {'vername': '1.14.5', 'vercode': 52, 'md5sum': 'xxxxx', 'filesize': 49527668,
'path': 'http://pool.apk.xxxxx.com/msi8/com-xiaomeng-fategrandorder-52-27862209-32a264b031d6933514970c43dea4191f.apk',
'path_alt': 'http://pool.apk.xxxxx.com/msi8/alt/Y29tLXhpYW9tZW5nLWZhdGVncmFuZG9yZGVyLTUyLTI3ODYyMjA5LTMyYTI2NGIwMzFkNjkzMzUxNDk3MGM0M2RlYTQxOTFm.apk',
'malware': {'rank': 'UNKNOWN'}},
'stats': {'downloads': 432, 'pdownloads': 452, 'rating': {'avg': 0, 'total': 0},
'prating': {'avg': 0, 'total': 0}}, 'has_versions': False, 'obb': None,
'xxxxx': {'advertising': False, 'billing': False}}]}}
But I want it to look like this:
>>> import json
>>> a={"some":"json", "a":{"b":[1,2,3,4]}}
>>> print(json.dumps(a, indent=4, sort_keys=True))
{
    "a": {
        "b": [
            1,
            2,
            3,
            4
        ]
    },
    "some": "json"
}
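The same json.dumps() call should work on the API response, because request.json() already returns ordinary Python dicts and lists. A minimal sketch, using a trimmed stand-in for the response so that it runs on its own:

```python
import json

# Trimmed stand-in for the value returned by request.json()
data = {
    'info': {'status': 'OK', 'time': {'seconds': 0.050006151199341, 'human': '50 milliseconds'}},
    'datalist': {'total': 1, 'list': [{'id': 27862209, 'name': 'Fate/Grand Order', 'obb': None}]},
}

# indent adds the line breaks and nesting; ensure_ascii=False keeps
# non-ASCII text (such as the emoji in the store description) readable
pretty = json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)
print(pretty)
```

Note that the output is JSON rather than Python syntax, so None is printed as null and False as false.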