Transform text into key value pairs - python

Given text like this:
"$key\n
some value\n
$another_key\n
another longer\n
value\n"
# continue this trend with keys on lines marked by special symbol in this case $
# with corresponding value on the following lines until we hit another key
What would be a nice and terse way to transform that into lists like this?
keys = ["$key", "$another_key"]
values = ["some value", "another longervalue"]

You can use the $ at the start of a line to identify a new key: append it to the keys list and append a new empty string to the values list. Every line that doesn't start with a $ must belong to the current key, so concatenate it onto the last element of values. A new empty values element is created only when a new key is read.
data = "$key\nsome value\n$another_key\nanother longer\nvalue\n"
keys = []
values = []
for line in data.split('\n'):
    if line.startswith('$'):
        keys.append(line)
        values.append("")
    else:
        values[-1] += line
print(keys, values)
Output
['$key', '$another_key'] ['some value', 'another longervalue']

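text = "$key\nsome value\n$another_key\nanother longer\nvalue"
key = []
value = []
for i in text.split('\n'):
    if i.startswith('$'):
        key.append(i)
    else:
        # Note: every non-key line becomes its own entry,
        # so multi-line values are not joined together here.
        value.append(i)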

Try this ( assuming text is the variable with text values in it ):
all_text = text.split('\n')
val_tup = [ (val, all_text[i+1]) for i,val in enumerate(all_text) if val.startswith('$') ]
keys, values = [ val[0] for val in val_tup ], [ val[1] for val in val_tup ]

Maybe like this?
from collections import defaultdict
text = """$key\n
some value\n
$another_key\n
another longer\n
value\n"""
def get_pairs(s):
    pairs=defaultdict(str)
    current_key=None
    for line in s.splitlines():
        if line.startswith('$'):
            current_key=line
            continue
        pairs[current_key]+=line
    return pairs
Call pairs = get_pairs(text), then get the values you need with pairs.keys() and pairs.values().
BTW, the explicit \n is redundant: a '''string''' already contains a newline at each line break.
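The remark about the redundant \n can be checked directly. The following is only a small sketch; the variable names explicit and plain are made up for illustration and are not part of the answer above.

```python
# Triple-quoted string with explicit \n escapes AND real line breaks:
explicit = """$key\n
some value\n"""

# Triple-quoted string relying on the real line breaks only:
plain = """$key
some value
"""

print(repr(explicit))  # each line break is doubled
print(repr(plain))     # one newline per line

# Dropping the empty lines makes both forms split into the same lines.
non_empty = [line for line in explicit.splitlines() if line]
print(non_empty == plain.splitlines())
```

This is why the answers that index into each line (i[0]) have to cope with empty strings when the text is written with both the escape and a real line break.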

s = """$key\n
some value\n
$another_key\n
another longer\n
value\n"""
s = s.split("\n")[:-1] # we know the string ends with \n
keys, values = [], []
for i in s:
    if not i:
        continue # skip the empty lines produced by the doubled newlines
    if i[0] == "$" and i[1:2] != "$":
        keys.append(i)
    else:
        values.append(i)

Related

Convert list of lists with strings into list of dictionaries

I'm trying to convert a list of lists with strings like:
[
["amenity=language_school"],
["amenity=sport_school,place=rural", "amenity=sport_school,place=urban"],
["amenity=middle_school,place=city", "amenity=high_school,place=city"]
]
Some lists can have multiple string elements, and some of the string elements can have multiple key=value pairs separated by a comma, like "amenity=high_school,place=city".
My goal is to get a list of dicts in which a key holds a list whenever several different values occur for that key. Like this:
[
{"amenity":"language_school"},
{"amenity":"sport_school", "place":["rural","urban"]},
{"amenity":["middle_school", "high_school"], "place":"city"}
]
This code should work for you. If you want any list with just one member to be converted to a plain string, one more line of code is needed.
Good wishes
output_list = []
for each_row in [
    ["amenity=language_school"],
    ["amenity=sport_school,place=rural", "amenity=sport_school,place=urban"],
    ["amenity=middle_school,place=city", "amenity=high_school,place=city"]
]:
    output_list.append(dict())
    for each_element in each_row:
        for each_def in each_element.split(','):
            key, value = each_def.split('=')
            if key in output_list[-1]:
                if value not in output_list[-1][key]:
                    output_list[-1][key].append(value)
            else:
                output_list[-1][key] = [value]
print(output_list)
The output:
[{'amenity': ['language_school']}, {'amenity': ['sport_school'], 'place': ['rural', 'urban']}, {'amenity': ['middle_school', 'high_school'], 'place': ['city']}]
And this is an alternative way with the same output:
output_list = []
for each_row in [
    ["amenity=language_school"],
    ["amenity=sport_school,place=rural", "amenity=sport_school,place=urban"],
    ["amenity=middle_school,place=city", "amenity=high_school,place=city"]
]:
    output_list.append(dict())
    for each_element in each_row:
        for each_def in each_element.split(','):
            key, value = each_def.split('=')
            content = output_list[-1].get(key, [])
            output_list[-1][key] = content + ([value] if value not in content else [])
print(output_list)
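The answer mentions that one extra line is needed to turn single-member lists into plain strings, but does not show it. One possible way to do that is sketched below; the helper name merge_row is hypothetical, and the collapse step is an assumption about what the asker wants, based on the desired output in the question.

```python
rows = [
    ["amenity=language_school"],
    ["amenity=sport_school,place=rural", "amenity=sport_school,place=urban"],
    ["amenity=middle_school,place=city", "amenity=high_school,place=city"],
]

def merge_row(row):
    """Merge the key=value pairs of one row into a single dict."""
    merged = {}
    for element in row:
        for pair in element.split(","):
            key, value = pair.split("=", 1)
            seen = merged.setdefault(key, [])
            if value not in seen:  # keep each distinct value once
                seen.append(value)
    # Collapse lists that ended up with a single member into a plain string.
    return {k: v[0] if len(v) == 1 else v for k, v in merged.items()}

result = [merge_row(row) for row in rows]
print(result)
```

Keeping every value in a list until the end, and collapsing only in the final dict comprehension, avoids having to check inside the loop whether a key currently holds a string or a list.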

how to make a dictionary of inputs

these are the inputs:
name:SignalsAndSystems genre:engineering author:Oppenheim
name:calculus genre:mathematics author:Thomas
name:DigitalSignalProcessing genre:engineering author:Oppenheim
I tried to make a dictionary from each line by splitting each item on ":" (for example, name:SignalsAndSystems).
This is my code, but it only makes a dictionary from the first line of the inputs.
lst_inps = []
for i in range(2):
    inp = input()
    inp = inp.split(" ")
    for item in inp:
        attribute, value = item.split(":")
        dict.update({attribute: value})
    lst_inps.append(dict)
the answer that I'm looking for is:
[
{"name":"SignalsAndSystems", "genre":"engineering", "author":"Oppenheim"} ,
{"name":"calculus", "genre":"mathematics", "author":"Thomas"} ,
{"name":"DigitalSignalProcessing", "genre":"engineering", "author":"Oppenheim"}
]
You aren't creating a dictionary in your for loop. You need to create a dictionary, then update it with your new key value pairs, before appending it to your list.
lst_inps = []
for i in range(3):
    new_dict = dict() # create the dictionary here
    inp = input()
    inp = inp.split(" ")
    for item in inp:
        attribute, value = item.split(":")
        new_dict.update({attribute: value}) # add your key value pairs to the dictionary
    lst_inps.append(new_dict) # append your new dictionary to the list
print(lst_inps)
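The same idea can be written without input(), which makes it easier to try out. This is only a sketch: the function name parse_line and the hard-coded lines list are assumptions standing in for the interactive input used above.

```python
def parse_line(line):
    """Turn 'name:calculus genre:mathematics' into a dict."""
    # Each item is split once on ':' into an [attribute, value] pair,
    # and dict() builds a fresh dictionary from those pairs.
    return dict(item.split(":", 1) for item in line.split())

lines = [
    "name:SignalsAndSystems genre:engineering author:Oppenheim",
    "name:calculus genre:mathematics author:Thomas",
    "name:DigitalSignalProcessing genre:engineering author:Oppenheim",
]

books = [parse_line(line) for line in lines]
print(books)
```

Because parse_line returns a new dictionary on every call, each line gets its own dict, which is the point the answer above makes.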

How can I get array values from key in json file with python?

Here is my JSON file:
{"name": "Nghia", "name2": ["Bao", "Tam"]}
My Python code:
file = open(jsonfile, 'r')
data = json.load(file)
key = list(data.keys())
value = list(data.values())
print(key[value.index('Nghia')])
Output: name
But the issue is that I can't use key[value.index('Bao')] or key[value.index('Tam')] to get name2.
The issue is that you're trying to match a string to a list, which will of course not match. If you must keep the structure of your data, you need to explicitly check for both strings matching or the list containing the string. For example:
data = {"name": "Nghia", "name2": ["Bao", "Tam"]}
search_term = "Bao"
for k, v in data.items():
    if v == search_term or search_term in v:
        print("Found in " + k)
    else:
        print("Not found in " + k)
which will output
Not found in name
Found in name2
In simple terms, the index() method finds the given element in a list and returns its position.
So here,
key = list(data.keys()) # ['name', 'name2']
value = list(data.values()) # ['Nghia', ['Bao', 'Tam']]
Now this code
print(key[value.index('Nghia')])
finds 'Nghia' in the list ['Nghia', ['Bao', 'Tam']], gets the index of that element, and prints the key at that index.
As you can see, the second element of the list ['Nghia', ['Bao', 'Tam']] is itself a list, ['Bao', 'Tam'].
So in order to find the index of that element in the value list, you have to use this:
print(key[value.index(['Bao', 'Tam'])])
Here is a function that returns the index of an element whether the list entry is a list or a string.
The function takes two arguments, value and item:
value is a list
item is the element whose index needs to be found
The function returns -1 if the item is not found in the list.
def findIndex(value,item):
    for elementIndex in range(0, len(value)):
        if type(value[elementIndex]) is list:
            for itemElemIndex in range(0 ,len(value[elementIndex])):
                if value[elementIndex][itemElemIndex] == item:
                    return elementIndex
        else:
            if value[elementIndex] == item:
                return elementIndex
    return -1
print(key[findIndex(value,"Tam")])
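One caveat with the function above: when nothing is found it returns -1, and key[-1] silently picks the last key. A possible variant that works on the dictionary directly and returns None for a miss is sketched here; the name find_key is hypothetical and not from the answers above.

```python
def find_key(data, term):
    """Return the first key whose value equals term or is a list containing it."""
    for key, value in data.items():
        if isinstance(value, list):
            if term in value:
                return key
        elif value == term:
            return key
    return None  # explicit "not found" instead of -1

data = {"name": "Nghia", "name2": ["Bao", "Tam"]}
print(find_key(data, "Nghia"))
print(find_key(data, "Tam"))
print(find_key(data, "missing"))
```

Checking isinstance(value, list) first also avoids the substring match that "in" performs when the value is a string.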

Using a list of lists as a lookup table and updating a value in new list of lists

I have an application that creates a list of lists. The second element in each list needs to be assigned using a lookup list, which also consists of a list of lists.
I have used the "all" function to match the values in the list. If the list value exists in the lookup list, it should update the second position element in the new list. However, this is not the case. The == comparison yields a False match for all elements, even though they all exist in both lists.
I have also tried various combinations of index finding commands but they are not able to unpack the values of each list.
My code is below. The goal is to replace the "xxx" values in the newData with the numbers in the lookupList.
lookupList= [['Garry','34'],['Simon', '24'] ,['Louise','13'] ]
newData = [['Louise','xxx'],['Garry', 'xxx'] ,['Simon','xxx'] ]
#Matching values
for i in newData:
    if (all(i[0] == elem[0] for elem in lookupList)):
        i[1] = elem[1]
You can't do what you want with all(), because elem is not available outside of the generator expression. Besides, all() is only true when the name matches every entry in lookupList, which never happens here.
Instead of using a list, use a dictionary to store the lookupList:
lookupDict = dict(lookupList)
and looking up matches is a simple constant-time (fast) lookup:
for entry in newData:
    if entry[0] in lookupDict:
        entry[1] = lookupDict[entry[0]]
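Put together as a runnable sketch, the dictionary approach could look like this. The snake_case names and the extra 'Peter' record (with no match in the lookup table) are assumptions added for illustration.

```python
lookup_list = [["Garry", "34"], ["Simon", "24"], ["Louise", "13"]]
new_data = [["Louise", "xxx"], ["Garry", "xxx"], ["Peter", "xxx"], ["Simon", "xxx"]]

# Build the name -> number mapping once.
lookup = dict(lookup_list)

for entry in new_data:
    # dict.get falls back to the current value when the name is unknown.
    entry[1] = lookup.get(entry[0], entry[1])

print(new_data)
```

Records without a match simply keep their placeholder, so no exception handling is needed.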
You should use dictionaries instead, like this:
lookupList, newData = {}, {} # two separate dictionaries, not one shared object
old_lookupList = [['Garry','34'],['Simon', '24'] ,['Louise','13'] ]
old_newData = [['Louise','xxx'],['Garry', 'xxx'] ,['Simon','xxx'] ]
#convert into dictionary
for e in old_newData: newData[e[0]] = e[1]
for e in old_lookupList: lookupList[e[0]] = e[1]
#Matching values
for key in lookupList:
    if key in newData.keys():
        newData[key]=lookupList[key]
#convert into list
output_list = []
for x in newData:
    output_list.append([x, newData[x]])
I like the following code since it can be tweaked and used in different ways:
lookupList= [ ['Garry', '34'],['Simon', '24'] ,['Louise', '13'] ]
newData = [ ['Louise', 'xxx'],['Garry', 'xxx'], ['Peter', 'xxx'] ,['Simon', 'xxx'] ]
#Matching values
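for R in newData:
    # The range deliberately runs one past the end of lookupList, so a
    # record with no match triggers an IndexError on the last iteration.
    for i in range(0, len(lookupList) + 1):
        try:
            if lookupList[i][0] == R[0]:
                R[1] = lookupList[i][1]
                break
        except IndexError:
            print('Lookup fail on record:', R)
print(newData)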

Python: Sum entries in list of tuples entries with case sensitive keys?

I have a list of tuples holding hashtags and frequencies for example:
[('#Example', 92002),
('#example', 65544)]
I want to sum entries which have the same string as the first entry in the tuple (ignoring case), keeping the spelling of the entry with the highest value in the second entry. The above would be transformed to:
[('#Example', 157546)]
I've tried this so far:
import operator
res = []
for hashtag in hashtag_freq_list:
    if hashtag[0].lower() not in [res_entry[0].lower() for res_entry in res]:
        entries = [entry for entry in hashtag_freq_list if hashtag[0].lower() == entry[0].lower()]
        k = max(entries,key=operator.itemgetter(1))[0]
        v = sum([entry[1] for entry in entries])
        res.append((k,v))
I was just wondering if this could be approached in a more elegant way?
I would use a dictionary:
data = [('#example', 65544),('#Example', 92002)]
hashtable = {}
for i in data:
    # See if this thing exists regardless of casing
    if i[0].lower() not in hashtable:
        # Create a dictionary
        hashtable[i[0].lower()] = {
            'meta':'',
            'value':[]
        }
        # Copy the relevant information
        hashtable[i[0].lower()]['value'].append(i[1])
        hashtable[i[0].lower()]['meta'] = i[0]
    # If the value exists
    else:
        # Check if the number it holds is the max against
        # what was collected so far. If so, change meta
        if i[1] > max(hashtable[i[0].lower()]['value']):
            hashtable[i[0].lower()]['meta'] = i[0]
        # Append the value regardless
        hashtable[i[0].lower()]['value'].append(i[1])
# For output purposes
myList = []
# Build the tuples
for node in hashtable:
    myList.append((hashtable[node]['meta'],sum(hashtable[node]['value'])))
# Voila!
print(myList)
# [('#Example', 157546)]
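A more compact variant of the same dictionary idea is sketched below. It keeps only the best spelling, its count, and a running total per lowercase tag instead of storing every value. The function name merge_hashtags is hypothetical.

```python
def merge_hashtags(pairs):
    """Sum counts of tags that differ only by case, keeping the most frequent spelling."""
    totals = {}  # lowercase tag -> [best_spelling, best_count, running_total]
    for tag, count in pairs:
        entry = totals.setdefault(tag.lower(), [tag, count, 0])
        if count > entry[1]:
            # This spelling has a higher individual count, so it wins.
            entry[0], entry[1] = tag, count
        entry[2] += count
    return [(spelling, total) for spelling, _, total in totals.values()]

data = [("#example", 65544), ("#Example", 92002)]
print(merge_hashtags(data))  # [('#Example', 157546)]
```

This makes a single pass over the input, whereas the version in the question rescans the whole list for every tag it has not seen yet.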
