String matching and storing within a dictionary - python

I'm using pattern matching to collect the street addresses that belong to each postcode, and storing those addresses as the values of a dictionary keyed by postcode. Here is what I have tried:
import pandas as pd
test = pd.DataFrame(['SR2', 'SA1', 'M16', 'KY6', 'SR6'], columns=(['postcode']))
street = pd.DataFrame(['UnnamedRoad,LlandeiloSA196UA,UK', '8NewRd,LlandeiloSA196DB,UK','1RomanRd,Banwen,NeathSA109LH,UK', 'UnnamedRoad,LlangadogSA199UN,UK', '48ColeAve,ChadwellStMary,GraysRM164JQ,UK', '37WellingtonRd,NorthWealdBassett,EppingCM166JY,UK'], columns=(['address']))
dictframe = {}
for i in test['postcode']:
    dictframe[i] = list()
    for k in range(0, len(test), 1):
        dictframe[i].append(list(filter(lambda x: test['postcode'][k] in x, street['address'])))
However, this stores the results for every postcode under each key. I want each key to hold only the addresses that match that key, with an empty list when nothing matches. Here's the output I get:
{'SR2': [[],
['UnnamedRoad,LlandeiloSA196UA,UK',
'8NewRd,LlandeiloSA196DB,UK',
'1RomanRd,Banwen,NeathSA109LH,UK',
'UnnamedRoad,LlangadogSA199UN,UK'],
['48ColeAve,ChadwellStMary,GraysRM164JQ,UK',
'37WellingtonRd,NorthWealdBassett,EppingCM166JY,UK'],
[],
[]],
..
..
..
Expected output:
{'SR2': [],
'SA1': ['UnnamedRoad,LlandeiloSA196UA,UK',
'8NewRd,LlandeiloSA196DB,UK',
'1RomanRd,Banwen,NeathSA109LH,UK',
'UnnamedRoad,LlangadogSA199UN,UK']
...
...
}

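Corrected code: the inner for loop is not required, and the string match should use the index of the current entry in test['postcode'] (see Python's enumerate). Note that each value is still wrapped in an outer list, because the filtered list is appended to an empty list.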
import pandas as pd
test = pd.DataFrame(['SR2', 'SA1', 'M16', 'KY6', 'SR6'], columns=(['postcode']))
street = pd.DataFrame(['UnnamedRoad,LlandeiloSA196UA,UK', '8NewRd,LlandeiloSA196DB,UK','1RomanRd,Banwen,NeathSA109LH,UK', 'UnnamedRoad,LlangadogSA199UN,UK', '48ColeAve,ChadwellStMary,GraysRM164JQ,UK', '37WellingtonRd,NorthWealdBassett,EppingCM166JY,UK'], columns=(['address']))
dictframe = {}
for index, i in enumerate(test['postcode']):
    dictframe[i] = list()
    #for k in range(0, len(street), 1):
    dictframe[i].append(list(filter(lambda x: test['postcode'][index] in x, street['address'])))
Output-
{'KY6': [[]],
'M16': [['48ColeAve,ChadwellStMary,GraysRM164JQ,UK',
'37WellingtonRd,NorthWealdBassett,EppingCM166JY,UK']],
'SA1': [['UnnamedRoad,LlandeiloSA196UA,UK',
'8NewRd,LlandeiloSA196DB,UK',
'1RomanRd,Banwen,NeathSA109LH,UK',
'UnnamedRoad,LlangadogSA199UN,UK']],
'SR2': [[]],
'SR6': [[]]}
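The output above still has one extra level of nesting compared with the expected output in the question. A minimal sketch of a flat variant follows, assuming plain substring matching is what is wanted. The helper name match_addresses is made up for illustration, and plain lists stand in for the DataFrame columns (any iterable of strings, such as test['postcode'], should work the same way).

```python
def match_addresses(postcodes, addresses):
    """Map each postcode fragment to the addresses that contain it."""
    addresses = list(addresses)  # allow any iterable, reused for every key
    return {
        code: [addr for addr in addresses if code in addr]
        for code in postcodes
    }


postcodes = ['SR2', 'SA1', 'M16', 'KY6', 'SR6']
addresses = [
    'UnnamedRoad,LlandeiloSA196UA,UK',
    '8NewRd,LlandeiloSA196DB,UK',
    '1RomanRd,Banwen,NeathSA109LH,UK',
    'UnnamedRoad,LlangadogSA199UN,UK',
    '48ColeAve,ChadwellStMary,GraysRM164JQ,UK',
    '37WellingtonRd,NorthWealdBassett,EppingCM166JY,UK',
]

dictframe = match_addresses(postcodes, addresses)
print(dictframe['SR2'])  # no address contains 'SR2'
print(dictframe['SA1'])
```

Because the test is a substring check, 'M16' also matches inside 'RM164JQ' and 'CM166JY', exactly as in the output above. If that is not intended, a stricter pattern would be needed.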

Related

python edit tuple duplicates in a list

My goal is this: while looping over a list, I would like to check for duplicates and, if there are any, append a number to them. See the following example.
My list, as an example:
[('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
In a loop, I would like to edit those duplicates so that instead of the second microsoft I get microsoft1 (with three microsoft entries, the third would get microsoft2).
With this I can find the duplicates, but I don't know how to edit them directly in the list:
list = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
names = []
double = []
for u in list[1:]:
    names.append(u[1])
list_size = len(names)
for i in range(list_size):
    k = i + 1
    for j in range(k, list_size):
        if names[i] == names[j] and names[i] not in double:
            double.append(names[i])
This is one approach, using collections.defaultdict to count how often each value has been seen so far.
Ex:
from collections import defaultdict
lst = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
seen = defaultdict(int)  # value -> number of earlier occurrences
result = []
for k, v in lst:
    if seen[v]:
        # already seen: append the running count as a suffix
        result.append((k, "{}_{}".format(v, seen[v])))
    else:
        result.append((k,v))
    seen[v] += 1
print(result)
Output:
[('name', 'company'),
('someguy', 'microsoft'),
('anotherguy', 'microsoft_1'),
('thirdguy', 'amazon')]
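If the suffix should be appended without an underscore, as in the question (microsoft1, microsoft2), a small variation could look like the following. It is only a sketch: the function name number_duplicates is hypothetical, and the extra 'fourthguy' entry was added here just to show the third occurrence.

```python
from collections import Counter


def number_duplicates(rows):
    """Return a new list where repeated second elements get a running suffix."""
    seen = Counter()  # value -> number of earlier occurrences
    result = []
    for name, company in rows:
        count = seen[company]
        # The first occurrence keeps its value; later ones get 1, 2, ...
        label = company if count == 0 else "{}{}".format(company, count)
        result.append((name, label))
        seen[company] += 1
    return result


lst = [('name', 'company'), ('someguy', 'microsoft'),
       ('anotherguy', 'microsoft'), ('thirdguy', 'amazon'),
       ('fourthguy', 'microsoft')]  # 'fourthguy' is an added example row
result = number_duplicates(lst)
print(result)
```

Building a new list rather than editing in place sidesteps the fact that tuples are immutable; the original list can simply be rebound to the result.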

defining multiple variables to an empty list in a loop

I am trying to create and assign 10 variables, differentiated only by their index, all as empty lists within a for loop.
The ideal output would be agent_1 = [], agent_2 = [], ..., agent_n = []
I know I could write this all out, but I thought I should be able to create a simple loop. The main issue is assigning the empty list on each iteration:
for i in range(1,10):
    agent_ + i = []
Why not use a dict whose keys are the names agent_i?
dic = {}
# range(1, 10) creates agent_1 .. agent_9; use range(1, 11) for ten entries
for i in range(1, 10):
    dic["agent_" + str(i)] = []
# access an entry directly by its key
print(dic["agent_1"])
# iterate over the dictionary
for key, value in dic.items():
    print(key, value)
This is a horrible idea. I will let the code speak for itself:
n = 10
for i in range(n):
    globals()['agent_%d' % (i + 1)] = []
a = {}
for i in range(10):
    ab = "{}_{}".format("agent", i)
    a[ab] = []
print(a)
# Output
{'agent_0': [], 'agent_1': [], 'agent_2': [], 'agent_3': [], 'agent_4': [], 'agent_5': [], 'agent_6': [], 'agent_7': [], 'agent_8': [], 'agent_9': []}
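The dictionary answers above can be condensed into a comprehension, and when the names themselves do not matter a list of lists may be enough. The sketch below shows both, along with one pitfall worth knowing: multiplying a list of lists repeats the same inner list instead of creating new ones. All variable names here are illustrative.

```python
# Ten independent empty lists, keyed agent_1 .. agent_10.
agents = {"agent_{}".format(i): [] for i in range(1, 11)}
agents["agent_3"].append("x")

# Ten independent empty lists, addressed by position instead of name.
agent_lists = [[] for _ in range(10)]
agent_lists[0].append("y")

# Pitfall: this repeats ONE list three times, so all entries change together.
shared = [[]] * 3
shared[0].append("z")

print(agents["agent_3"], agents["agent_1"])
print(agent_lists[0], agent_lists[1])
print(shared)
```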

Renaming points in python

I have a collection of new points i, j, k, l with the coordinates (1953.2343076828638, 730.0513627132909), (1069.4232335022705, 5882.057343563125), (2212.5767664977293, 3335.942656436875), (4386.765692317136, 1318.948637286709).
I'm trying to give these points the names s1, s2, s3, s4.
I also want to create two separate lists: one with just the point names, [s1,s2,s3,s4], and the other with each point name and its coordinate, as [s1:(1953.2343076828638, 730.0513627132909),(1069.4232335022705, 5882.057343563125)...]
I have the following code for creating random points.
import random

n = 10
#print(n)
#for k in n:
V = []
V=range(n)
#print("vertices",V)
# Create n random points
random.seed()
pos = {i:(random.randint(0,4000),random.randint(0,5000)) for i in V}
#print("pos =", pos)
points = []
positions = []
for i in pos:
    points.append(pos[i])
    positions.append(i)
    positions.append(pos[i])
Suppose I form a new list L from two existing points, 4 and 7. Then L = [4,7].
When I type L[0] in the console it gives me 4, and pos[L[0]] gives me its coordinates.
But with my new list K = [i,j,k,l], typing K[0] in the console gives me the coordinate, not the name.
I need to add the points in K to the same pos dictionary defined above, with their coordinates but under different names. Can someone please help me with this?
To access name and coordinates by index, use a list of tuples. Note that you need to name these explicitly. You should preferably avoid this step by using a list of tuples to store your name-coordinate pairs from the beginning.
To access by name, use a dictionary.
i, j, k, l = (1953.2343076828638, 730.0513627132909),\
             (1069.4232335022705, 5882.057343563125),\
             (2212.5767664977293, 3335.942656436875),\
             (4386.765692317136, 1318.948637286709)
K = [(name, var) for name, var in zip('ijkl', (i, j, k, l))]
## ACCESS BY INDEX
name_coord = K[0]  # ('i', (1953.2343076828638, 730.0513627132909))
name = K[0][0]     # 'i'
coord = K[0][1]    # (1953.2343076828638, 730.0513627132909)
## ACCESS BY NAME
d = dict(K)
coord = d['i']     # (1953.2343076828638, 730.0513627132909)
Building on jpp's answer, I would go for a namedtuple. Dictionaries usually tend to take more space than tuples.
from collections import namedtuple
coord = namedtuple('Coordinate', 'name coord')
i, j, k, l = (1953.2343076828638, 730.0513627132909),\
             (1069.4232335022705, 5882.057343563125),\
             (2212.5767664977293, 3335.942656436875),\
             (4386.765692317136, 1318.948637286709)
K = [coord(name, var) for name, var in zip('ijkl', (i, j, k, l))]
It allows you to do then:
c = K[0]
print(c.name)
print(c.coord)
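Neither answer uses the names s1 to s4 that the question asks for, or adds the new points to pos. A rough sketch of one way to do both is shown here, assuming pos is an ordinary dictionary as in the question; the small pos used below is only a stand-in for the randomly generated one, and named_points is an illustrative name.

```python
# The four new points from the question.
new_points = [
    (1953.2343076828638, 730.0513627132909),
    (1069.4232335022705, 5882.057343563125),
    (2212.5767664977293, 3335.942656436875),
    (4386.765692317136, 1318.948637286709),
]

# Stand-in for the question's randomly generated pos dictionary.
pos = {0: (120, 4500), 1: (3100, 250)}

# Generate the names s1, s2, ... and pair each with its coordinate.
names = ["s{}".format(n) for n in range(1, len(new_points) + 1)]
named_points = dict(zip(names, new_points))

# Add the named points to the existing dictionary.
pos.update(named_points)

print(names)
print(pos["s1"])
```

With this layout, a list such as K = ['s1', 's2', 's3', 's4'] holds names, K[0] gives the name, and pos[K[0]] gives the coordinate, mirroring how L = [4, 7] behaves in the question.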

List of dicts: Getting list of matching dictionary based on id

I'm trying to get the matching IDs and store the data into one list. I have a list of dictionaries:
list = [
    {'id':'123','name':'Jason','location': 'McHale'},
    {'id':'432','name':'Tom','location': 'Sydney'},
    {'id':'123','name':'Jason','location':'Tompson Hall'}
]
Expected output would be something like
# {'id':'123','name':'Jason','location': ['McHale', 'Tompson Hall']},
# {'id':'432','name':'Tom','location': 'Sydney'},
How can I get matching data based on dict ID value? I've tried:
for item in mylist:
    list2 = []
    row = any(list['id'] == list.id for id in list)
    list2.append(row)
This doesn't work (it throws: TypeError: tuple indices must be integers or slices, not str). How can I get all items with the same ID and store into one dict?
First, you're iterating through the list of dictionaries in your for loop, but never referencing the current dictionary, which is stored in item. I think when you wrote list['id'] you meant item['id'].
Second, any() returns a boolean (True or False), which isn't what you want. Instead, maybe try row = [dic for dic in list if dic['id'] == item['id']]
Third, if you define list2 within your for loop, it is reset on every iteration. Move list2 = [] before the for loop.
That should give you a good start. Remember that row is just a list of all dictionaries that have the same id.
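Put together, the three suggestions above might look like the sketch below. It uses mylist rather than list as the variable name (an assumption about what the question intended) and only collects the matching rows; it does not merge them yet.

```python
mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

list2 = []  # defined once, before the loop, so results accumulate
for item in mylist:
    # every dictionary whose id equals the current item's id
    row = [dic for dic in mylist if dic['id'] == item['id']]
    list2.append(row)

print(len(list2))
print(list2[1])
```

Each id that occurs more than once produces a repeated row (here the '123' group appears twice), so a merging step still has to deduplicate.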
I would use kdopen's approach along with a merging method after converting the dictionary entries I expect to become lists into lists. Of course if you want to avoid redundancy then make them sets.
mylist = [
    {'id':'123','name':['Jason'],'location': ['McHale']},
    {'id':'432','name':['Tom'],'location': ['Sydney']},
    {'id':'123','name':['Jason'],'location':['Tompson Hall']}
]
def merge(mylist,ID):
    matches = [d for d in mylist if d['id']== ID]
    shell = {'id':ID,'name':[],'location':[]}
    for m in matches:
        shell['name']+=m['name']
        shell['location']+=m['location']
        mylist.remove(m)
    mylist.append(shell)
    return mylist
updated_list = merge(mylist,'123')
Given this input
mylist = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
You can just extract it with a comprehension
matched = [d for d in mylist if d['id'] == '123']
Then you want to merge the locations. Assuming matched is not empty
final = matched[0]
final['location'] = [d['location'] for d in matched]
Here it is in the interpreter
In [1]: mylist = [
   ...:     {'id':'123','name':'Jason','location': 'McHale'},
   ...:     {'id':'432','name':'Tom','location': 'Sydney'},
   ...:     {'id':'123','name':'Jason','location':'Tompson Hall'}
   ...: ]
In [2]: matched = [d for d in mylist if d['id'] == '123']
In [3]: final=matched[0]
In [4]: final['location'] = [d['location'] for d in matched]
In [5]: final
Out[5]: {'id': '123', 'location': ['McHale', 'Tompson Hall'], 'name': 'Jason'}
Obviously, you'd want to replace '123' with a variable holding the desired id value.
Wrapping it all up in a function:
def merge_all(df):
    ids = {d['id'] for d in df}  # unique ids (a set, so order is arbitrary)
    result = []
    for id in ids:
        matches = [d for d in df if d['id'] == id]
        combined = matches[0]  # reuses (and modifies) the first matching dict
        combined['location'] = [d['location'] for d in matches]
        result.append(combined)
    return result
Also, please don't use list as a variable name. It shadows the builtin list class.
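As an alternative to scanning the list once per id, the grouping can be done in a single pass. This is a sketch rather than a drop-in replacement: the name merge_by_id is made up, it assumes records with the same id share the same name, and it always stores location as a list (even for a single entry), which differs slightly from the expected output in the question. Unlike merge_all above, it leaves the input dictionaries untouched.

```python
def merge_by_id(records):
    """Group records by 'id', collecting every location into a list."""
    merged = {}  # id -> merged record; dicts keep insertion order
    for rec in records:
        entry = merged.setdefault(
            rec['id'],
            {'id': rec['id'], 'name': rec['name'], 'location': []},
        )
        entry['location'].append(rec['location'])
    return list(merged.values())


mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

result = merge_by_id(mylist)
print(result)
```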

Create a list dynamically and store all the values matching with current value in python 3.x

I have a text file whose data is generated dynamically, like:
1000L 00V
2000L -10V
3500L -15V
1250L -05V
1000L -05V
2000L -05V
6000L -10V
1010L 00V
and so on...
The numbers before V can vary from -160 to +160.
I want to create lists dynamically (without using a dictionary) and store the values in them according to the matching number before V.
In this case I want to create the following set of lists:
00 = ["1000", "1010"]
-10 = ["2000", "6000"]
-15 = ["3500"]
-05 = ["1250", "1000", "2000"]
Tried code:
if name.split()[1] != "":
    gain_value = name.split()[1]
    gain_value = int(gain_value.replace("V", ""))
    if gain_value not in gain_list:
        gain_list.append(gain_value)
        gain_length = len(gain_list)
        print(gain_length)
        g['gain_{0}'.format(gain_length)] = []
        'gain_{0}'.format(gain_length).append(L_value)
    else:
        index_value = gain_list.index(gain_value)
        g[index_value].append(L_value)
for x in range(0, len(gain_list)):
    print(str(gain_list[x]) + "=" + 'gain_{0}'.format(x))
But the above code doesn't work: I get an error at 'gain_{0}'.format(gain_length).append(L_value), and I am unsure how to print the lists dynamically once they are created, as in my required output.
I can't use a dictionary here because I want to pass the lists dynamically as input to the pygal module, like this:
for x in range(0, gain_length):
    bar_chart.x_labels = k_list
    bar_chart.add(str(gain_length[x]),'gain_{0}'.format(x))
Here I can only add the values from a list, not from a dictionary.
You can use collections.defaultdict:
import collections
my_dict = collections.defaultdict(list)
with open('your_file') as f:
    for x in f:
        x = x.strip().split()
        if len(x) != 2:
            continue  # skip blank or malformed lines
        # drop the trailing 'V' from the key and the trailing 'L' from the value
        my_dict[x[1][:-1]].append(x[0][:-1])
Output:
defaultdict(<class 'list'>, {'00': ['1000', '1010'],
                             '-10': ['2000', '6000'],
                             '-15': ['3500'],
                             '-05': ['1250', '1000', '2000']})
For your desired output:
for x, y in my_dict.items():
    print("{} = {}".format(x, y))
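On the concern about not being able to use a dictionary: each value stored in the defaultdict is an ordinary list, so it can be handed to anything that expects a list, for example as the second argument of the question's bar_chart.add call. The sketch below shows the grouping on an in-memory copy of the sample data (a stand-in for reading the file) and simply prints each label with its list; no pygal calls are made.

```python
from collections import defaultdict

# In-memory stand-in for the lines of the text file in the question.
lines = [
    "1000L 00V", "2000L -10V", "3500L -15V", "1250L -05V",
    "1000L -05V", "2000L -05V", "6000L -10V", "1010L 00V",
]

groups = defaultdict(list)
for line in lines:
    parts = line.split()
    if len(parts) != 2:
        continue  # skip blank or malformed lines
    level, gain = parts
    # strip the unit letters: "1000L" -> "1000", "-10V" -> "-10"
    groups[gain.rstrip("V")].append(level.rstrip("L"))

for gain, values in groups.items():
    # values is a plain list and can be passed on as-is
    print("{} = {}".format(gain, values))
```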
