I have this data.
CITY1 CITY2
A B
A C
A D
B C
B D
C D
How can I create a dictionary like this from the above data?
x={A:[B,C,D],
B:[A,C,D],
C:[A,B,D],
D:[A,B,C]
}
Thanks
Is it in a CSV? From the data you provide, it looks like you are building an undirected graph. Assuming the data is in some row-like format that you can loop through (i.e. row[0] is the CITY1 value and row[1] is the CITY2 value):
from collections import defaultdict
def make_graph(data):
    graph = defaultdict(set)
    for a, b in data:
        graph[a].add(b)
        graph[b].add(a)  # delete this line if you want a directed graph
    return graph
data = [
    ['A', 'B'],
    ['C', 'D'],
    ['A', 'C']
]
print(make_graph(data))
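If the data does live in a delimited text file, the rows can be fed straight into make_graph. The following is only a sketch: the in-memory `raw` text stands in for the real file, and the single-space delimiter and the header row are assumptions based on how the data is shown in the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real file: space-separated, with a header row.
raw = io.StringIO("CITY1 CITY2\nA B\nA C\nA D\nB C\nB D\nC D\n")

def make_graph(pairs):
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)  # undirected: record the edge in both directions
    return graph

reader = csv.reader(raw, delimiter=" ")
next(reader)  # skip the "CITY1 CITY2" header
graph = make_graph(reader)

# Convert the sets to sorted lists to match the shape asked for in the question.
result = {city: sorted(neighbours) for city, neighbours in graph.items()}
print(result)
```

Sorting is only there to get a stable order out of the sets; drop it if the order of neighbours does not matter.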
I was trying to do it without any library import.
I made a simple dictionary first.
x = {'A': ['B', 'C', 'D'], 'B': ['C', 'D'], 'C': ['D']}
# Iterate over a snapshot of the items, because new keys are added to x inside the loop
for i, j in list(x.items()):
    for p in j:
        if p not in x:
            x[p] = []
        if p in x[i] and i not in x[p]:
            x[p].append(i)
print(x)
{'A': ['B', 'C', 'D'], 'B': ['C', 'D', 'A'], 'C': ['D', 'A', 'B'], 'D': ['A', 'B', 'C']}
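Another import-free option is to build the dictionary directly from the pairs with dict.setdefault, which skips the intermediate dictionary. This is a minimal sketch that assumes the rows are already available as a list of two-element tuples; the name `pairs` is only illustrative.

```python
# Assumed input: the CITY1/CITY2 rows from the question as tuples.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

x = {}
for a, b in pairs:
    # setdefault returns the existing list for the key, or stores and returns a new one
    x.setdefault(a, []).append(b)
    x.setdefault(b, []).append(a)

print(x)
```

Each pair is visited once, so the work grows linearly with the number of rows.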
Related
I have a data frame like this:
2pair counts
'A','B','C','D' 5
'A','B','K','D' 3
'A','B','P','R' 2
'O','Y','C','D' 1
'O','Y','CL','lD' 4
I want to build a nested list grouped by the first two elements of each row. Each group starts with those first two letters, followed by one entry per row holding the remaining two letters and the value from the counts column. For example, for the above data the result should be:
[
[
['A','B'],
[['C','D'],5],
[['K','D'],3],
[['P','R'],2]
],
[
['O','Y'],
[['C','D'],1],
[['CL','lD'],4]
]
]
The following code does exactly what I want, but it is too slow. How can I make it faster?
pairs=[]
trans=[]
for i in range(df3.shape[0]):
    if df3['2pair'].values[i].split(',')[:2] not in trans:
        trans.append(df3['2pair'].values[i].split(',')[:2])
        sub=[]
        sub.append(df3['2pair'].values[i].split(',')[:2])
        for j in range(df3.shape[0]):
            if df3['2pair'].values[i].split(',')[:2]==df3['2pair'].values[j].split(',')[:2]:
                sub.append([df3['2pair'].values[j].split(',')[2:],df3['counts'].values[j]])
        pairs.append(sub)
Here's one way using str.split to split the strings in 2pair column; then use groupby.apply + to_dict to create the lists:
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]
out = [[[*k]] + v for k, v in (df.groupby('head')[['tail', 'counts']]
                                 .apply(lambda x: x.to_numpy().tolist()).to_dict()
                                 .items())]
Output:
[[['A', 'B'], [['C', 'D'], 5], [['K', 'D'], 3], [['P', 'R'], 2]],
[['O', 'Y'], [['C', 'D'], 1], [['CL', 'lD'], 4]]]
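For comparison, the same grouping can be done in a single pass in plain Python. The nested loop in the question compares every row with every other row, which is quadratic in the number of rows; collecting the tails in a dict keyed by the first two letters avoids that. This is a sketch under assumptions: `rows` is a hypothetical list of (string, count) tuples, which could be produced with something like zip(df3['2pair'], df3['counts']), and the strings are assumed to be plain comma-separated letters without quotes.

```python
# Assumed input: one (comma-separated string, count) tuple per data frame row.
rows = [
    ("A,B,C,D", 5),
    ("A,B,K,D", 3),
    ("A,B,P,R", 2),
    ("O,Y,C,D", 1),
    ("O,Y,CL,lD", 4),
]

groups = {}
for text, count in rows:
    parts = text.split(",")
    head = tuple(parts[:2])  # tuples are hashable, so they can serve as dict keys
    groups.setdefault(head, []).append([parts[2:], count])

# Put the head in front of its collected tails, in first-seen order.
pairs = [[list(head)] + tails for head, tails in groups.items()]
print(pairs)
```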
I have a Pandas Dataframe :
A || B || C
x1 x [x,y]
x2 a [b,c,d]
and I am trying to make a dictionary that looks like:
{x1: {B: x, C: [x,y]}, x2: {B: a, C: [b,c,d]}}
I have tried the to_dict function, but that converts the entire dataframe into a dictionary. I am not sure how to iterate over the first column and make it the key, with the rest of each row as a dictionary for that key's value.
Try:
x = df.set_index("A").to_dict("index")
print(x)
Prints:
{'x1': {'B': 'x', 'C': ['x', 'y']}, 'x2': {'B': 'a', 'C': ['b', 'c', 'd']}}
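To see what that call is doing, here is a rough plain-Python equivalent. It is only a sketch: it assumes the rows are available as a list of dicts (for example from df.to_dict("records")), and the names `records` and `nested` are illustrative.

```python
# Assumed input: one dict per data frame row.
records = [
    {"A": "x1", "B": "x", "C": ["x", "y"]},
    {"A": "x2", "B": "a", "C": ["b", "c", "d"]},
]

# Use column A as the outer key and keep the remaining columns as the inner dict.
nested = {
    row["A"]: {col: val for col, val in row.items() if col != "A"}
    for row in records
}
print(nested)
```

As with any dict, the values in column A need to be unique, otherwise later rows replace earlier ones.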
I have a Pandas DataFrame with a large number of unique values. I would like to group these values with a more general column. By doing so I expect to add hierarchies to my data and thus make analysis easier.
One thing that worked was to copy the column and replace the values as follows:
data.loc[data['new_col'].str.contains('string0|string1'), 'new_col']\
= 'substitution'
However, I am trying to find a way to reproduce this easily without adding a condition for each entry.
I also tried the following, without success:
dict.items()
pd.DataFrame.replace()
I would like to hear your advice on how to approach this.
import pandas as pd
# My DataFrame looks similar to this:
>>> df = pd.DataFrame({'A': ['a', 'w', 'c', 'd', 'z']})
# The dictionary where I store the generalization:
>>> subs = {'g1': ['a', 'b', 'c', 'd'],
... 'g2': ['w', 'x', 'y', 'z']}
# The result I am after:
>>> df
A H
0 a g1
1 w g2
2 c g1
3 d g1
4 z g2
Create a new dict by inverting the original one, so that each element of a value list maps to its key. Then map df.A through the swapped dict.
swap_dict = {x: k for k, v in d.items() for x in v}
Out[1054]:
{'a': 's1',
'b': 's1',
'c': 's1',
'd': 's1',
'w': 's2',
'x': 's2',
'y': 's2',
'z': 's2'}
df['H'] = df.A.map(swap_dict)
Out[1058]:
A H
0 a s1
1 w s2
2 c s1
3 d s1
4 z s2
Note: the output above shows the keys of the dict I tested with (s1, s2) as the values of H instead of g1, g2, ..., because I think they are enough to identify each group of values. If you run the same code on your own dict, H will contain g1, g2, .... If that does not work for you, just let me know.
I also named your dict d in my code.
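The swap step can be checked without pandas. This is a small sketch using the subs dict from the question; `values` is a plain list standing in for column A, and `labels` is a hypothetical name for the result.

```python
subs = {"g1": ["a", "b", "c", "d"],
        "g2": ["w", "x", "y", "z"]}
values = ["a", "w", "c", "d", "z"]  # stand-in for df['A']

# Invert the dict: every list element points back to its group key.
swap_dict = {member: group for group, members in subs.items() for member in members}

# dict.get returns None for values that belong to no group.
labels = [swap_dict.get(v) for v in values]
print(labels)
```

With a data frame, df['H'] = df['A'].map(swap_dict) does the same lookup; values missing from the dict come out as NaN there.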
I have a dictionary and want to remove certain values in bad_list from its value list, and return the remainder. Here is the code:
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d','e']
ad = {k:d[k].remove(i) for k in d.keys() for sublist in d[k] for i in sublist if i in bad_list}
print 'd =', d
print 'ad =', ad
Unfortunately, that changes the values in d permanently and returns None for the values in ad:
d = {1: ['a', 'c'], 2: ['b'], 5: []}
ad = {1: None, 5: None}
How can I get a dictionary that looks like this:
new_dict = {1: ['a','c'], 2:['b']}
without looping through? I have a much larger dictionary to deal with, and I'd like to do it in the most efficient way.
There is no way to do it without a loop:
d = dict((key, [x for x in value if x not in bad_list]) for key, value in d.items())
or with filter:
d = dict((key, list(filter(lambda x: x not in bad_list, d[key]))) for key in d)
UPDATE
To exclude empty values (going through a set does not preserve the original order of each list):
d = dict((key, list(x)) for key in d for x in [set(d[key]).difference(bad_list)] if x)
Well, you could just use a comprehension. This one-liner works, though I find it ugly.
ad = {k:v for k,v in {k:[i for i in v if i not in bad_list] for k,v in d.items()}.items() if v}
I would rather use a for loop.
ad2 = dict()
for k,v in d.items():
    _data_ = [item for item in v if item not in bad_list]
    if _data_:
        ad2[k]=_data_
Output:
print('d =', d)
print('ad =', ad)
print('ad2=', ad2)
>d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
>ad = {1: ['a', 'c'], 2: ['b']}
>ad2= {1: ['a', 'c'], 2: ['b']}
The following code written in Python 3.5 appears to do as requested in your question. Minimal change should be required for it to work with Python 2.x instead. Just use print statements instead of functions.
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']
ad = {a: b for a, b in ((a, [c for c in b if c not in bad_list]) for a, b in d.items()) if b}
print('d =', d)
print('ad =', ad)
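Since the question mentions a much larger dictionary, one cheap improvement on any of the versions above is to turn bad_list into a set first, so that each membership test no longer scans the whole list. A minimal sketch, with `bad` and `kept` as illustrative names:

```python
d = {1: ['a', 'c', 'd'], 2: ['b'], 5: ['e']}
bad_list = ['d', 'e']

bad = set(bad_list)  # set membership is O(1) on average, list membership is O(n)

ad = {}
for key, values in d.items():
    kept = [v for v in values if v not in bad]  # builds a new list, d is left untouched
    if kept:  # skip keys whose list ends up empty
        ad[key] = kept

print('d =', d)
print('ad =', ad)
```

The loop over the dictionary is still there; the gain only shows when bad_list itself is long.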
I have a huge file (with around 200k inputs). The inputs are in the form:
A B C D
B E F
C A B D
D
I am reading this file and storing it in a list as follows:
text = f.read().split('\n')
This splits the file at every newline, so text looks like this:
['A B C D', 'B E F', 'C A B D', 'D']
I have to now store these values in a dictionary where the key values are the first element from each list. i.e the keys will be A, B, C, D.
I am finding it difficult to enter the values as the remaining elements of the list. i.e the dictionary should look like:
{A: [B C D]; B: [E F]; C: [A B D]; D: []}
I have done the following:
inlinkDict = {}
for doc in text:
    adoc = doc.split(' ')
    docid = adoc[0]
    inlinkDict[docid] = inlinkDict.get(docid,0) + {I do not understand what to put in here}
Please help me work out how to add the values to my dictionary. The value should be 0 if the line has no elements other than the key itself, as for D in the example.
A dictionary comprehension makes short work of this task:
>>> s = [['A','B','C','D'], ['B','E','F'], ['C','A','B','D'], ['D']]
>>> {t[0]:t[1:] for t in s}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
Try using a slice:
inlinkDict[docid] = adoc[1:]
This will give you an empty list instead of a 0 for the case where only the key value is on the line. To get a 0 instead, use an or (which always returns one of the operands):
inlinkDict[docid] = adoc[1:] or 0
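To make the behaviour of `or` concrete, here is a short sketch run on two of the sample lines. The `lines` list is an assumption standing in for the `text` list from the question.

```python
lines = ["A B C D", "D"]  # stand-in for the text list read from the file

inlinkDict = {}
for doc in lines:
    adoc = doc.split()
    docid = adoc[0]
    # A non-empty slice is truthy and is returned as-is;
    # an empty slice is falsy, so `or` falls through to 0.
    inlinkDict[docid] = adoc[1:] or 0

print(inlinkDict)
```

Mixing lists and 0 as values means later code has to check the type before using a value, so keeping the empty list is often the simpler choice.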
Easier way with a dict comprehension:
>>> with open('/tmp/spam.txt') as f:
...     data = [line.split() for line in f]
...
>>> {d[0]: d[1:] for d in data}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
>>> {d[0]: ' '.join(d[1:]) if d[1:] else 0 for d in data}
{'A': 'B C D', 'C': 'A B D', 'B': 'E F', 'D': 0}
Note: dict keys must be unique, so if you have, say, two lines beginning with 'C', the value from the first one will be overwritten by the second.
The accepted answer is correct, except that it reads the entire file into memory (may not be desirable if you have a large file), and it will overwrite duplicate keys.
An alternative approach using defaultdict, which is available from Python 2.5, solves both problems:
from collections import defaultdict
d = defaultdict(list)
with open('/tmp/spam.txt') as f:
    for line in f:
        parts = line.strip().split()
        d[parts[0]] += parts[1:]
Input:
A B C D
B E F
C A B D
D
C H I J
Result:
>>> d = defaultdict(list)
>>> with open('/tmp/spam.txt') as f:
...     for line in f:
...         parts = line.strip().split()
...         d[parts[0]] += parts[1:]
...
>>> d['C']
['A', 'B', 'D', 'H', 'I', 'J']
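One thing to watch with real files is blank lines: split() returns an empty list for them, and parts[0] then raises an IndexError. The sketch below adds a guard for that. It uses an in-memory io.StringIO object as a stand-in for the file (an assumption, so that the example runs anywhere), with a blank line added to the sample input on purpose.

```python
import io
from collections import defaultdict

# Stand-in for open('/tmp/spam.txt'); note the blank third line.
sample = io.StringIO("A B C D\nB E F\n\nC A B D\nD\nC H I J\n")

d = defaultdict(list)
for line in sample:  # iterates line by line, like a real file object
    parts = line.split()
    if not parts:  # skip blank lines instead of failing on parts[0]
        continue
    d[parts[0]] += parts[1:]  # extends the existing list for repeated keys

result = dict(d)
print(result)
```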