parsing a line into a dictionary through a list comprehension - python

I am trying to slice different parts of a line into a list of dictionaries using a list comprehension. The code below doesn't work, but it illustrates what I am trying to do. Any help would be much appreciated!
Thanks
def getDataElements(self):
    return [x for x for line in self.data: {"Number": line[0:9],
        "FullName": line[9:27].rstrip(),
        "LastName": line[27:63].rstrip(),
        "Area": line[63:65].rstrip(),
        "City": line[65:90].rstrip(),
        "Status": line[91],
        "Status2": line[92],
        "Status3": line[93]]

Your intent is fairly clear, but in a list comprehension the expression (here, the dictionary) has to come first, before the for clause. If I understand what you want, the following should work:
return [{"Number": line[0:9],
         "FullName": line[9:27].rstrip(),
         "LastName": line[27:63].rstrip(),
         "Area": line[63:65].rstrip(),
         "City": line[65:90].rstrip(),
         "Status": line[91],
         "Status2": line[92],
         "Status3": line[93]} for line in self.data]
I ignored the x for x part, since x is never used. If it was meant to express an extra level of nesting, let me know and explain in a bit more detail.

There are instances where list comprehensions are good, but this is not one of them. Just use a loop and a generator:
for line in self.data:
    yield {
        "Number": line[0:9],
        "FullName": line[9:27].rstrip(),
        "LastName": line[27:63].rstrip(),
        "Area": line[63:65].rstrip(),
        "City": line[65:90].rstrip(),
        "Status": line[91],
        "Status2": line[92],
        "Status3": line[93]
    }
If you absolutely need to return a list, pass the output through list():
output_list = list(self.getDataElements())
If you're not comfortable with that, there's always the append-to-a-list way:
people = []
for line in self.data:
    people.append({
        "Number": line[0:9],
        "FullName": line[9:27].rstrip(),
        "LastName": line[27:63].rstrip(),
        "Area": line[63:65].rstrip(),
        "City": line[65:90].rstrip(),
        "Status": line[91],
        "Status2": line[92],
        "Status3": line[93]
    })
return people

First write a function that parses a line and returns the corresponding dict:
def parseDataLine(self, line):
    return { ... }  # Same as your parsing code.
The rest of your code would be like this:
def getDataElements(self):
    return [self.parseDataLine(line) for line in self.data]
This type of approach keeps everything very readable and simple.
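The helper-function approach can be sketched as a standalone, runnable example. The slice offsets come from the question; the STRIPPED_FIELDS table, the module-level function names, and the sample record are assumptions made purely for illustration.

```python
# Field name -> (start, end) slice offsets, copied from the question.
# These are the fields the question strips of trailing padding.
STRIPPED_FIELDS = {
    "FullName": (9, 27),
    "LastName": (27, 63),
    "Area": (63, 65),
    "City": (65, 90),
}

def parse_data_line(line):
    """Slice one fixed-width line into a dict (hypothetical helper)."""
    record = {"Number": line[0:9]}
    for name, (start, end) in STRIPPED_FIELDS.items():
        record[name] = line[start:end].rstrip()
    # The three status flags are single characters at fixed positions.
    record["Status"] = line[91]
    record["Status2"] = line[92]
    record["Status3"] = line[93]
    return record

def get_data_elements(lines):
    return [parse_data_line(line) for line in lines]

# A made-up sample line, padded to the widths the offsets imply.
sample = (
    "123456789"
    + "Jane".ljust(18)
    + "Doe".ljust(36)
    + "NY"
    + "New York".ljust(25)
    + " "        # position 90 is not read by the question's layout
    + "ABC"      # positions 91, 92, 93
)
records = get_data_elements([sample])
print(records[0])
```

Keeping the offsets in one table means a layout change touches a single place, while the comprehension in get_data_elements stays a one-liner.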


Python parse JSON file

{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information",
"Microsoft.AspNetCore": "Warning",
"System.Net.Http.HttpClient.Default.ClientHandler": "Warning",
"System.Net.Http.HttpClient.Default.LogicalHandler": "Warning"
}
},
"AllowedHosts": "*",
"AutomaticTransferOptions": {
"DateOffsetForDirectoriesInDays": -1,
"DateOffsetForPortfoliosInDays": -3,
"Clause": {
"Item1": "1"
}
},
"Authentication": {
"ApiKeys": [
{
"Key": "AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6",
"OwnerName": "Cron job",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestManager"
}
]
},
{
"Key": "B11D4F27-483A-4234-8EC7-CA121712D5BE",
"OwnerName": "Test admin",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestAdmin"
},
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
},
{
"Key": "EBF98F2E-555E-4E66-9D77-5667E0AA1B54",
"OwnerName": "Test manager",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
}
],
"LDAP": {
"Domain": "domain.local",
"MachineAccountName": "Soft13",
"MachineAccountPassword": "vixuUEY7884*",
"EnableLdapClaimResolution": true
}
},
"Authorization": {
"Permissions": {
"Roles": [
{
"Role": "TestAdmin",
"Permissions": [
"transfers.create",
"bindings.create"
]
},
{
"Role": "TestManager",
"Permissions": [
"transfers.create"
]
}
]
}
}
}
I have the JSON above and need to flatten its keys into output like this:
Logging__LogLevel__Default
Authentication__ApiKeys__0__Claims__0__Type
It mostly works, but the output always contains some unexpected strings:
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Authorization__Permissions__Roles__1
Why does my code add incomplete strings like
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__1
And why doesn't it print every value from
Authorization__Permissions__Roles__0__Permissions__*
and from
Authorization__Permissions__Roles__1__Permissions__*
I have this code in python3:
def checkdepth(sub_key, variable):
    delmt = '__'
    for item in sub_key:
        try:
            if isinstance(sub_key[item], dict):
                sub_variable = variable + delmt + item
                checkdepth(sub_key[item], sub_variable)
        except TypeError:
            continue
        if isinstance(sub_key[item], list):
            sub_variable = variable + delmt + item
            for it in sub_key[item]:
                sub_variable = variable + delmt + item + delmt + str(sub_key[item].index(it))
                checkdepth(it, sub_variable)
            print(sub_variable)
        if isinstance(sub_key[item], int) or isinstance(sub_key[item], str):
            sub_variable = variable + delmt + item
            print (sub_variable)
for key in data:
    if type(data[key]) is str:
        print(key + '=' +str(data[key]))
    else:
        variable = key
        checkdepth(data[key], variable)
I know that the problem is in the block where I process the list data type, but I don't know exactly where it is.
Use a recursive generator:
import json
with open('input.json') as f:
    data = json.load(f)
def strkeys(data):
    if isinstance(data, dict):
        for k, v in data.items():
            for item in strkeys(v):
                yield f'{k}__{item}' if item else k
    elif isinstance(data, list):
        for i, v in enumerate(data):
            for item in strkeys(v):
                yield f'{i}__{item}' if item else str(i)
    else:
        yield None  # termination condition, not a list or dict
for s in strkeys(data):
    print(s)
Output:
Logging__LogLevel__Default
Logging__LogLevel__Microsoft
Logging__LogLevel__Microsoft.Hosting.Lifetime
Logging__LogLevel__Microsoft.AspNetCore
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler
AllowedHosts
AutomaticTransferOptions__DateOffsetForDirectoriesInDays
AutomaticTransferOptions__DateOffsetForPortfoliosInDays
AutomaticTransferOptions__Clause__Item1
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__1__Key
Authentication__ApiKeys__1__OwnerName
Authentication__ApiKeys__1__Claims__0__Type
Authentication__ApiKeys__1__Claims__0__Value
Authentication__ApiKeys__1__Claims__1__Type
Authentication__ApiKeys__1__Claims__1__Value
Authentication__ApiKeys__2__Key
Authentication__ApiKeys__2__OwnerName
Authentication__ApiKeys__2__Claims__0__Type
Authentication__ApiKeys__2__Claims__0__Value
Authentication__LDAP__Domain
Authentication__LDAP__MachineAccountName
Authentication__LDAP__MachineAccountPassword
Authentication__LDAP__EnableLdapClaimResolution
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__0
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Using flatten_json (https://github.com/amirziai/flatten), this can be converted to a pandas DataFrame, though it's not clear whether that's what you want. Once converted, you can use df.iloc[0] to see the value behind each column (i.e. the value for that key).
Note: you need to pass a list, so I just wrapped your JSON above in [].
# https://github.com/amirziai/flatten
import pandas as pd
from flatten_json import flatten
dic = data  # your JSON from above, parsed into a dict
dic = [dic]  # put it in a list
dic_flattened = (flatten(d, '__') for d in dic)  # add your delimiter
df = pd.DataFrame(dic_flattened)
df.iloc[0]
Logging__LogLevel__Default Information
Logging__LogLevel__Microsoft Warning
Logging__LogLevel__Microsoft.Hosting.Lifetime Information
Logging__LogLevel__Microsoft.AspNetCore Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler Warning
AllowedHosts *
AutomaticTransferOptions__DateOffsetForDirectoriesInDays -1
AutomaticTransferOptions__DateOffsetForPortfoliosInDays -3
AutomaticTransferOptions__Clause__Item1 1
Authentication__ApiKeys__0__Key AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6
Authentication__ApiKeys__0__OwnerName Cron job
Authentication__ApiKeys__0__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__0__Claims__0__Value StressTestManager
Authentication__ApiKeys__1__Key B11D4F27-483A-4234-8EC7-CA121712D5BE
Authentication__ApiKeys__1__OwnerName Test admin
Authentication__ApiKeys__1__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__0__Value StressTestAdmin
Authentication__ApiKeys__1__Claims__1__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__1__Value TestManager
Authentication__ApiKeys__2__Key EBF98F2E-555E-4E66-9D77-5667E0AA1B54
Authentication__ApiKeys__2__OwnerName Test manager
Authentication__ApiKeys__2__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__2__Claims__0__Value TestManager
Authentication__LDAP__Domain domain.local
Authentication__LDAP__MachineAccountName Soft13
Authentication__LDAP__MachineAccountPassword vixuUEY7884*
Authentication__LDAP__EnableLdapClaimResolution true
Authorization__Permissions__Roles__0__Role TestAdmin
Authorization__Permissions__Roles__0__Permissions__0 transfers.create
Authorization__Permissions__Roles__0__Permissions__1 bindings.create
Authorization__Permissions__Roles__1__Role TestManager
Authorization__Permissions__Roles__1__Permissions__0 transfers.create
OK, I looked at your code and it's hard to follow: the purpose of your variable and function names is not easy to understand. That's fine, because everyone has to learn best practices and all the little tips and tricks in Python, so hopefully I can help you out.
You have a recursive-ish function, which is definitely the best way to handle a situation like this. However, your code is part recursive and part not. If you go recursive to solve a problem, you have to go 100% recursive.
Also, the only time you should print in a recursive function is for debugging. A recursive function should have an object that is passed down through the calls, gets appended to or altered, and is then passed back once the recursion reaches its end.
When you get a problem like this, think about which data you actually need or care about. In this problem we don't care about the values stored in the object, only about the keys. So we should write code that doesn't even look at a value except to determine its type.
Here is some code I wrote that should do what you want. Note that because the function is purely recursive, the code is small. The function also passes a list around, adds to it, and returns it at the end so that we can use it for whatever we need. If you have questions, just comment on this question and I'll answer as best I can.
def convert_to_delimited_keys(obj, parent_key='', delimiter='__', keys_list=None):
    if keys_list is None: keys_list = []
    if isinstance(obj, dict):
        for k in obj:
            convert_to_delimited_keys(obj[k], delimiter.join((parent_key, str(k))), delimiter, keys_list)
    elif isinstance(obj, list):
        for i, _ in enumerate(obj):
            convert_to_delimited_keys(obj[i], delimiter.join((parent_key, str(i))), delimiter, keys_list)
    else:
        # Append to list, but remove the leading delimiter due to string.join
        keys_list.append(parent_key[len(delimiter):])
    return keys_list
for item in convert_to_delimited_keys(data):
    print(item)
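If the values are needed alongside the keys, the same recursion can yield pairs instead. This is a minimal sketch, assuming a hypothetical flatten_items generator and a small made-up sample dictionary. It uses enumerate rather than list.index, because index returns the position of the first equal element and so produces wrong paths when a list holds duplicates.

```python
def flatten_items(obj, prefix="", delimiter="__"):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        # Leaf value: the accumulated prefix is the full path.
        yield prefix, obj
        return
    for key, value in children:
        path = f"{prefix}{delimiter}{key}" if prefix else str(key)
        yield from flatten_items(value, path, delimiter)

# Made-up sample; note the duplicate entries in the Permissions list.
sample = {
    "Logging": {"LogLevel": {"Default": "Information"}},
    "Roles": [{"Permissions": ["transfers.create", "transfers.create"]}],
}
flat = dict(flatten_items(sample))
for path, value in flat.items():
    print(f"{path}={value}")
```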

How to search for values in Redis with redis for python

I'm currently trying to search for entries inside my Redis that match a specific value through an HTTP API the same way you'd do with a regular DB (eg: http://localhost:8000/api?name=John&age=20) with the redis python library (https://pypi.org/project/redis/).
The code I have thus far returns the whole hash, converts each entry into a JSON and adds it to a list
import json
import redis
import os
r = redis.StrictRedis(host=os.environ['redis_url'], port=os.environ['redis_port'], password=os.environ['redis_pass'], ssl=True)
result = r.hgetall('Directory')
dic_list = []
for key in result.keys():
    dic_list.append(json.loads(result[key].decode('utf-8')))
return dic_list
I know that I can get the value of a specific key with
r.hget('Directory', 'key_I_want')
However, each key holds a whole JSON document, so for example these would be key/value pairs inside the Directory hash:
"1": {
"name": "James",
"age": "22",
"favorite_color":"Green"
},
"2":{
"name":"John",
"age": "20",
"favorite_color": "red"
},
"3":{
"name":"Jim",
"age": "30",
"favorite_color": "yellow"
}
So I know
r.hget('Directory', '1')
would return
{
"name": "James",
"age": "22",
"favorite_color":"Green"
}
But what I really want is to look for every JSON that has specific values, not just to get the values of each key inside the hash, is there any way to actually do that?
Based on your question, you are probably looking for a value within result[key]. Assuming that value is stored in val, decode each entry first and then check its values:
for key in result.keys():
    entry = json.loads(result[key].decode('utf-8'))
    if val in entry.values():
        dic_list.append(entry)
For example, if
val = "James"
you will get every entry that has James as one of its values.
You can adapt this to match whatever you need.
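To match on several named fields at once, as in the name=John&age=20 query from the question, the check can compare specific keys instead of scanning all values. A sketch under assumptions: find_matching is a hypothetical helper, and fake_result stands in for what r.hgetall('Directory') returns by default (bytes keys mapped to bytes values). It still scans the whole hash on the client side.

```python
import json

def find_matching(raw_hash, **criteria):
    """Return decoded entries whose fields equal every given criterion."""
    matches = []
    for raw_value in raw_hash.values():
        entry = json.loads(raw_value.decode("utf-8"))
        if all(entry.get(field) == wanted for field, wanted in criteria.items()):
            matches.append(entry)
    return matches

# Stand-in for r.hgetall('Directory'); no Redis server is needed to run this.
fake_result = {
    b"1": b'{"name": "James", "age": "22", "favorite_color": "Green"}',
    b"2": b'{"name": "John", "age": "20", "favorite_color": "red"}',
    b"3": b'{"name": "Jim", "age": "30", "favorite_color": "yellow"}',
}
print(find_matching(fake_result, name="John", age="20"))
```

The ages are compared as strings because that is how they are stored in the sample data.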

Accessing Nested Dict from JSON

I'm using requests and JSON to pull some data from an API, and I'm struggling with using a nested dict.
Here is the JSON data:
{"data": [
{
"ContactId": "123",
"EmailAddress": "abc#xyz.com",
"FirstName": null,
"LastName": null,
"ClickDate": "6/6/1966",
"Clicks": "5",
"IPAddress": "1.1.1.1.1",
"UserAgent": "IE8.0",
"UniqueLinksClicked": [
{
"LinkURL": "http://link1.com",
"LinkURL": "http://link2.com",
"LinkURL": "http://link3.com"
}
]
}
]}
I'm able to access all of the ContactID and other 1st level stuff fine, but I can't figure out how to traverse the "LinkURL" stuff.
Here is my python...
result = requests.get(requesturl, headers=headers)
jdata = json.loads(result.content)
for result in jdata["data"]:
    contactID = str([(result["ContactId"])])
    for result in jdata["data"]["UniqueLinksClicked"]: #I'm doing this wrong, but I'm not sure how.
        print(ContactID + " " + str([(result["LinkURL"])]))
The line marked with a comment above generates a TypeError indicating it's a list, where I expected it to be a dict:
list indices must be integers or slices, not str
If instead I drop the ["data"] dereference and try to access "UniqueLinksClicked" on jdata:
for link in jdata["UniqueLinksClicked"]:
I get a key error because the ["UniqueLinksClicked"] is an item inside of the ["data"] dict.
How do I do this correctly?
You can iterate over the links in a nested loop. Do not use the same variable name result in two nested loops! Use a different variable name in the inner loop.
    for link in result["UniqueLinksClicked"]:
        print(contactID, link["LinkURL"])
(Moved from question.)
[OP] was confused about the variable naming in the for variable1 in variable2["dict"]: portion. After some help from Håken Lid, [they] figured it out.
It should look like this...
for item in jdata["data"]:
    contactID = str([(item["ContactId"])])
    print(contactID)
    for link in item["UniqueLinksClicked"]:
        print(link["LinkURL"])
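A self-contained sketch of the same traversal, run against an inline JSON string instead of an API response. The sample is an assumption: each link is placed in its own object, because Python's json module collapses a JSON object that repeats the LinkURL key to the last value only.

```python
import json

# Hypothetical, trimmed-down payload with one object per link.
raw = """
{"data": [
  {"ContactId": "123",
   "UniqueLinksClicked": [
     {"LinkURL": "http://link1.com"},
     {"LinkURL": "http://link2.com"}
   ]}
]}
"""
jdata = json.loads(raw)

pairs = []
for item in jdata["data"]:                    # "data" is a list of contact dicts
    contact_id = item["ContactId"]
    for link in item["UniqueLinksClicked"]:   # also a list, so iterate over it
        pairs.append((contact_id, link["LinkURL"]))
        print(contact_id, link["LinkURL"])

# Repeated keys inside one object: only the last one survives parsing.
duplicate = json.loads('{"LinkURL": "http://link1.com", "LinkURL": "http://link3.com"}')
print(duplicate)
```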

List Indices in json in Python

I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["Summary"]:
    print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
The example you posted is not valid JSON (no commas after object fields), so it's hard to dig in much. If it's straight from the web service, something's messed up. Assuming you fix it with proper commas, note two things. First, the file must be opened for reading ('r'): opening it with 'w' truncates it, so json.load has nothing to read. Second, the "Summary" key sits inside the "Results" object, which in turn sits inside "RecordResponse", so you'd need to change your loop to
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["RecordResponse"]["Results"]["Summary"]:
    print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass the names along so non-default arguments survive the recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
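Both approaches can be tried on a repaired copy of the sample. This is a sketch under assumptions: the quoted "blah" value, the closing brackets, and the find_fields name are filled in for illustration only. Unlike the function above, this variant also descends into lists.

```python
import json

# Repaired, hypothetical version of the sample from the question.
raw = """
{"RecordResponse": {
  "Id": "blah",
  "Status": {"state": "complete", "datetime": "2016-01-01 01:00"},
  "Results": {
    "resultNumber": "500",
    "Summary": [
      {"Type": "blah", "Size": "10000000000",
       "OtherStuff": {"valueOne": "first", "valueTwo": "second"},
       "fieldIWant": "value i want is here"}
    ]
  }
}}
"""
json_obj = json.loads(raw)

# Direct access: follow every level of nesting, then loop over the list.
summary = json_obj["RecordResponse"]["Results"]["Summary"]
direct = [entry["fieldIWant"] for entry in summary]

def find_fields(obj, keyname="Summary", fieldname="fieldIWant"):
    """Search nested dicts and lists for keyname; collect fieldname from its entries."""
    if isinstance(obj, dict):
        if keyname in obj:
            return [d[fieldname] for d in obj[keyname]]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_fields(child, keyname, fieldname)
        if found:
            return found
    return None

print(direct)
print(find_fields(json_obj))
```

Because "Summary" is a list, no manual index is needed: the loop (or comprehension) visits each element in turn.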

python - Filter JSON to a new JSON object

I have a json object which I would like to filter for misspelled key name. So for the example below, I would like to have a json object without the misspelled test_name key. What is the easiest way to do this?
json_data = """{
"my_test": [{
"group_name": "group-A",
"results": [{
"test_name": "test1",
"time": "8.556",
"status": "pass"
}, {
"test_name": "test2",
"time": "45.909",
"status": "pass"
}, {
"test_nameZASSD": "test3",
"time": "9.383",
"status": "fail"
}]
}]
}"""
This is an online test, and it looks like I'm not allowed to use jsonSchema.
So far my code looks like this:
if 'test_suites' in data:
    for suites in data["test_suites"]:
        if 'results' in suites and 'suite_name' in suites:
            for result in suites["results"]:
                if 'test_name' not in result or 'time' not in result or 'status' not in result:
                    result.clear()
                else:
                    ....
        else:
            print("Check 'suite_name' and/or 'results'")
else:
    print("Check 'test_suites'")
It kind of works, but result.clear() leaves an empty {}, which gets annoying later. What can I do here?
It looks like your data have a consistent schema. So I would try using json schema to solve your problem. With that you can set up a schema and only allow objects with certain key names.
If you just want to check if a certain key is in the dictionary and make sure that you only get the ones that are according to spec you could do something like this:
passed = []
for item in result:
    if 'test_name' in item.keys():
        passed.append(item)
But if you have a lot of different keys you need to check for it will become unwieldy. So for bigger projects I would say that json schema is the way to go.
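Without a schema library, one way to avoid the leftover empty {} is to rebuild each results list rather than clearing entries in place. A minimal sketch, assuming the key names from the sample data (my_test, group_name) rather than those in the question's code, and a hypothetical REQUIRED_KEYS set:

```python
import json

REQUIRED_KEYS = {"test_name", "time", "status"}  # assumed set of mandatory keys

json_data = """{
  "my_test": [{
    "group_name": "group-A",
    "results": [
      {"test_name": "test1", "time": "8.556", "status": "pass"},
      {"test_name": "test2", "time": "45.909", "status": "pass"},
      {"test_nameZASSD": "test3", "time": "9.383", "status": "fail"}
    ]
  }]
}"""

data = json.loads(json_data)
for group in data.get("my_test", []):
    # Keep only the results that carry every required key; malformed
    # entries are dropped entirely instead of being emptied.
    group["results"] = [
        result for result in group.get("results", [])
        if REQUIRED_KEYS.issubset(result)
    ]

filtered_json = json.dumps(data, indent=2)
print(filtered_json)
```

Checking against a set keeps the condition short even when many keys are required.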
