I'm receiving values as strings, separated by commas.
Example:
alpha, gane, delta
delta, opsirom, nado
I want to obtain a sorted list/set of unique values. I'm trying to use a set for uniqueness:
app = set()
for r in result:
    app = app | set(r.split[","])
but I get the following error:
TypeError: 'builtin_function_or_method' object is not subscriptable
If I understand your input correctly, I would use a mix of split and replace, plus set for uniqueness as you stated:
value_1 = "alpha, gane, delta, alpha"
aux_1 = value_1.replace(" ","").split(",")
a = list(set(aux_1))
print(a)
#Another list formatted as string arrives:
value_2 = "alpha, beta, omega, beta"
aux_2 = value_2.replace(" ","").split(",")
#Option 1:
a += list(set(aux_2))
a = list(set(a))
print(a)
#Option 2:
for i in aux_2:
    if i not in a:
        a.append(i)
print(a)
Output for both cases:
['delta', 'gane', 'omega', 'beta', 'alpha']
After you receive another string, you can add its values to the full list (a in this case) and apply set() again to eliminate further duplicates. Alternatively, check each value in the new string against the full list and append it only if it is not already there.
You can also use the code below; dict.fromkeys removes duplicates while keeping first-seen order:
splited_inputs = [s.strip() for s in inputs.split(',')]
unique_values = list(dict.fromkeys(splited_inputs))
Try this:
s = "alpha, gane, delta, delta, opsirom, nado"
unique_values = list(set(s.rsplit(', ')))
print(unique_values)
Output (set order may vary):
['opsirom', 'delta', 'alpha', 'gane', 'nado']
You are not too far off. The immediate problem was the use of [] instead of () for the split function call.
In [151]: alist = """alpha, gane, delta
     ...: delta, opsirom, nado""".splitlines()
In [152]: alist
Out[152]: ['alpha, gane, delta', 'delta, opsirom, nado']
In [153]: aset = set()
In [154]: for astr in alist:
     ...:     aset |= set(astr.split(', '))
     ...:
In [155]: aset
Out[155]: {'alpha', 'delta', 'gane', 'nado', 'opsirom'}
Using | to join sets is fine; I used the |= version. The split delimiter needed to be tweaked to avoid having both 'delta' and ' delta' in the result. Otherwise you might need to apply strip to each string. @Victor got this part right.
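Since the question also asks for the result to be sorted, the pieces above can be combined into one helper. This is only a minimal sketch: the name unique_sorted is hypothetical, and it assumes every input is a comma-separated string that may carry stray spaces around the values.

```python
def unique_sorted(lines):
    """Collect the distinct comma-separated values from all lines, sorted."""
    seen = set()
    for line in lines:
        # strip() removes the space left after each comma; empty pieces are skipped
        seen.update(part.strip() for part in line.split(",") if part.strip())
    return sorted(seen)


result = ["alpha, gane, delta", "delta, opsirom, nado"]
print(unique_sorted(result))
# ['alpha', 'delta', 'gane', 'nado', 'opsirom']
```

Stripping each piece, rather than splitting on ', ', also copes with input that has no space (or several spaces) after the comma.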
Related
I have a list of names and prices. This is the given list:
alist = [["Chanel-1000, Dior-2000, Prada"],["Chloe-200,Givenchy-400,LV-600"],["Bag-1,Bagg-2,Baggg-3"]]
I want to get the minimum value of each element.
This should be the output:
chanel-1000,chloe-200,bag-1
Check out this code:
1. If you want the output based on the integer tag of each name, use this method:
alist = [["Chanel-1000, Dior-2000, Prada-500"],
         ["Chloe-200,Givenchy-400,LV-600"], ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [
    min(map(str.strip, x[0].split(',')),
        key=lambda i: int(str.strip(i).split('-')[-1])) for x in alist
]
print(alist_min)
OUTPUT :
['Prada-500', 'Chloe-200', 'Bag-1']
2. If you want the output based on the name tag of each name, use this method:
alist = [
    ["Chanel-1000, Dior-2000, Prada"],
    ["Chloe-200,Givenchy-400,LV-600"],
    ["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [min(map(str.strip, x[0].split(','))) for x in alist]
print(alist_min)
Now you can use this alist_min object to get the required output.
Use min() along with a list comprehension:
import re
alist = [["Chanel-1000, Dior-2000, Prada"],["Chloe-200,Givenchy-400,LV-600"],["Bag-1,Bagg-2,Baggg-3"]]
alist_min = [min(re.split(r',\s*', x[0])) for x in alist]
print(alist_min) # ['Chanel-1000', 'Chloe-200', 'Bag-1']
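The list in the question contains "Prada" with no price, which would make int() fail in the price-based approach. One possible way around that, sketched below, is to ignore entries without a numeric suffix. The helper names price_of and cheapest_per_row are hypothetical, and the sketch assumes every row has at least one priced entry (otherwise min() raises ValueError).

```python
def price_of(item):
    """Return the integer after the last '-', or None if there is no such number."""
    _name, sep, tail = item.rpartition("-")
    return int(tail) if sep and tail.isdigit() else None


def cheapest_per_row(rows):
    """For each one-string row, return the priced entry with the lowest price."""
    result = []
    for row in rows:
        items = [s.strip() for s in row[0].split(",")]
        # Entries such as 'Prada' carry no price and are left out of the comparison.
        priced = [s for s in items if price_of(s) is not None]
        result.append(min(priced, key=price_of))
    return result


alist = [["Chanel-1000, Dior-2000, Prada"],
         ["Chloe-200,Givenchy-400,LV-600"],
         ["Bag-1,Bagg-2,Baggg-3"]]
print(cheapest_per_row(alist))
# ['Chanel-1000', 'Chloe-200', 'Bag-1']
```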
I want to write code that sorts arr based on the customAl order without using the sorted function.
customAl = [dshbanfmg]
arr = [bba,abb,baa,mggfba,mffgh......]
Pseudocode:
def sortCA(arr, customAl):
    dt = {}
    # generate dt order based on customAl
    # look up and sort arr
    return result
newArr = [bba,baa,abb,mffgh,mggfba......]
I know there's a similar question, but the answer is wrapped in the sorted function, which I don't wish to use. Does anyone have a better solution than sorted, or than a dictionary, which takes extra space?
Sorting string values according to a custom alphabet in Python
In my opinion, programming is about trade-offs; it depends on which part you care about most.
Specifically, in this scenario, you can choose to trade time for space by str.index, or you can trade space for time with an extra index dict:
customAl = 'dshbanfmg'
arr = ['bba', 'abb', 'baa', 'mggfba', 'mffgh']
# trade time for space
# no extra space, but O(n) per index lookup
def sortCA1(arr, customAl):
    return sorted(arr, key=lambda x: [customAl.index(c) for c in x])
# trade space for time
# extra space O(n), but O(1) per index lookup
def sortCA2(arr, customAl):
    dt = {c: i for i, c in enumerate(customAl)}
    return sorted(arr, key=lambda x: [dt[c] for c in x])
# output: ['bba', 'baa', 'abb', 'mffgh', 'mggfba']
Here is a version that does not use the sorted function. We can use buckets based on the custom alphabet order: split arr by the first character, and if a bucket has multiple elements, split it by the second character recursively, and so on. It is a kind of radix sort.
One thing to mention: the strings have different lengths, so we add an extra bucket (index 0) for strings that have no character at the current position.
def sortCA3(arr, customAl):
    dt = {c: i + 1 for i, c in enumerate(customAl)}  # keep 0 for none bucket
    def bucket_sort(arr, start):
        buckets = [[] for _ in range(len(customAl) + 1)]
        for s in arr:
            if start < len(s):
                buckets[dt[s[start]]].append(s)
            else:
                buckets[0].append(s)
        # strings with no char at `start` are identical duplicates; emit them
        # directly, since recursing on them would never terminate
        new_arr = list(buckets[0])
        for bucket in buckets[1:]:
            if len(bucket) == 1:
                new_arr += bucket
            elif len(bucket) > 1:
                new_arr += bucket_sort(bucket, start+1)
        return new_arr
    return bucket_sort(arr, 0)
Test and output:
customAl = 'dshbanfmg'
arr = ['bba', 'bb', 'abb', 'baa', 'mggfba', 'mffgh'] # add `bb` for test
print(sortCA3(arr, customAl))
# ['bb', 'bba', 'baa', 'abb', 'mffgh', 'mggfba']
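If the bucket approach feels heavy, another option that also avoids sorted is a hand-written merge sort over precomputed rank lists. The sketch below is an illustration rather than a tuned implementation: merge_sort_custom is a hypothetical name, and it assumes every character of every word appears in the custom alphabet (a missing character raises KeyError).

```python
def merge_sort_custom(words, alphabet):
    """Sort words by a custom alphabet using merge sort (no sorted() call)."""
    rank = {c: i for i, c in enumerate(alphabet)}
    # Pair each word with its list of ranks; lists compare lexicographically,
    # and a proper prefix compares as smaller than the longer list.
    keyed = [([rank[c] for c in w], w) for w in words]

    def sort(items):
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left, right = sort(items[:mid]), sort(items[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            # <= keeps equal keys in their original order (stable sort)
            if left[i][0] <= right[j][0]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return [w for _key, w in sort(keyed)]


customAl = 'dshbanfmg'
arr = ['bba', 'bb', 'abb', 'baa', 'mggfba', 'mffgh']
print(merge_sort_custom(arr, customAl))
# ['bb', 'bba', 'baa', 'abb', 'mffgh', 'mggfba']
```

This still uses a dictionary for the ranks, so it trades a little space for O(1) lookups; duplicates in the input are handled without special cases.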
I have defined a function that takes in a list like this
arr = ['C','D','E','I','M']
I have another function that produces a similar kind of list, the function is:
def tree_count(arr):
    feat = ['2','2','2','2','0']
    feat_2 = []
    dictionary = dict(zip(arr, feat))
    print('dic',dictionary)
    feat_2.append([k for k,v in dictionary.items() if v=='2'])
    newarr = str(feat_2)[1:-1]
    print(newarr)
This outputs the correct result that I want, i.e:
['C','D','E','I']
But the problem is that when I use this list in another function, its values should be read as C, D, E, I. Instead, when I print them, the bracket [ and the quote ' are included in the result:
for i in newarr:
    print(i)
The printed result is: [ ' C ' and so on, one character per line. I want to get rid of [ and '. How do I solve this?
For some reason you are using str() on the list; this is what causes the square brackets and quotes to appear in the printed output.
See if the following methods suit you:
print(arr) # ['C','D','E','I'] - the array itself
print(str(arr)) # "['C', 'D', 'E', 'I']" - the array as string literal
print(''.join(arr)) # 'CDEI' - array contents as string with no spaces
print(' '.join(arr)) # 'C D E I' - array contents as string with spaces
Make your function return the list of matching keys rather than just printing it:
def tree_count(arr):
    feat = ['2','2','2','2','0']
    dictionary = dict(zip(arr, feat))
    dictionary = [k for k in dictionary if dictionary[k] == '2']
    return dictionary
For instance,
$ results = tree_count(['C','D','E','I','M'])
$ print(results)
['I', 'C', 'D', 'E']
Pretty-printing is then fairly straightforward:
$ print("\n".join(results))
I
C
D
E
... or, if you just want them separated by commas:
$ print(", ".join(results))
I, C, D, E
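To make the difference concrete, here is a small sketch that keeps the result as a real list and only formats it at print time. The function name selected_features and the wanted parameter are hypothetical; the flag values are the ones hard-coded in the question.

```python
def selected_features(names, flags, wanted="2"):
    """Return the names whose flag equals `wanted`, as a real list."""
    return [name for name, flag in zip(names, flags) if flag == wanted]


arr = ['C', 'D', 'E', 'I', 'M']
feat = ['2', '2', '2', '2', '0']
newarr = selected_features(arr, feat)

for item in newarr:       # iterates over list elements, not characters
    print(item)
print(", ".join(newarr))  # C, D, E, I

# By contrast, str(newarr) is one string, so looping over it yields
# single characters such as '[' and "'".
print(list(str(newarr))[:3])  # ['[', "'", 'C']
```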
I'm trying to get the matching IDs and store the data into one list. I have a list of dictionaries:
list = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
Expected output would be something like
# {'id':'123','name':'Jason','location': ['McHale', 'Tompson Hall']},
# {'id':'432','name':'Tom','location': 'Sydney'},
How can I get matching data based on dict ID value? I've tried:
for item in mylist:
    list2 = []
    row = any(list['id'] == list.id for id in list)
    list2.append(row)
This doesn't work (it throws: TypeError: tuple indices must be integers or slices, not str). How can I get all items with the same ID and store into one dict?
First, you're iterating through the list of dictionaries in your for loop, but never referencing the current dictionary, which you're storing in item. I think when you wrote list['id'] you meant item['id'].
Second, any() returns a boolean (true or false), which isn't what you want. Instead, maybe try row = [dic for dic in list if dic['id'] == item['id']]
Third, if you define list2 within your for loop, it will go away every iteration. Move list2 = [] before the for loop.
That should give you a good start. Remember that row is just a list of all dictionaries that have the same id.
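Putting the three points together, the loop could look roughly like the sketch below. The name groups stands in for list2 and is only illustrative; note that each id's group is collected once per matching item, so duplicates of the same group appear in the result.

```python
mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

groups = []  # defined once, before the loop, so it survives every iteration
for item in mylist:
    # all dictionaries sharing the current item's id (a list, not a boolean)
    row = [dic for dic in mylist if dic['id'] == item['id']]
    groups.append(row)

print([len(row) for row in groups])  # [2, 1, 2]
```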
I would use kdopen's approach along with a merging method after converting the dictionary entries I expect to become lists into lists. Of course if you want to avoid redundancy then make them sets.
mylist = [
{'id':'123','name':['Jason'],'location': ['McHale']},
{'id':'432','name':['Tom'],'location': ['Sydney']},
{'id':'123','name':['Jason'],'location':['Tompson Hall']}
]
def merge(mylist,ID):
    matches = [d for d in mylist if d['id']== ID]
    shell = {'id':ID,'name':[],'location':[]}
    for m in matches:
        shell['name']+=m['name']
        shell['location']+=m['location']
        mylist.remove(m)
    mylist.append(shell)
    return mylist
updated_list = merge(mylist,'123')
Given this input
mylist = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
You can just extract it with a comprehension
matched = [d for d in mylist if d['id'] == '123']
Then you want to merge the locations. Assuming matched is not empty:
final = matched[0]
final['location'] = [d['location'] for d in matched]
Here it is in the interpreter
In [1]: mylist = [
...: {'id':'123','name':'Jason','location': 'McHale'},
...: {'id':'432','name':'Tom','location': 'Sydney'},
...: {'id':'123','name':'Jason','location':'Tompson Hall'}
...: ]
In [2]: matched = [d for d in mylist if d['id'] == '123']
In [3]: final=matched[0]
In [4]: final['location'] = [d['location'] for d in matched]
In [5]: final
Out[5]: {'id': '123', 'location': ['McHale', 'Tompson Hall'], 'name': 'Jason'}
Obviously, you'd want to replace '123' with a variable holding the desired id value.
Wrapping it all up in a function:
def merge_all(df):
    ids = {d['id'] for d in df}
    result = []
    for id in ids:
        matches = [d for d in df if d['id'] == id]
        combined = matches[0]
        combined['location'] = [d['location'] for d in matches]
        result.append(combined)
    return result
Also, please don't use list as a variable name. It shadows the builtin list class.
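As a possible alternative, the grouping can be done in a single pass with a dictionary keyed by id, which also avoids modifying the input dictionaries. This is a sketch under a few assumptions: merge_by_id is a hypothetical name, records with the same id are assumed to share the same name, and every location ends up in a list (even a single one, unlike the expected output in the question).

```python
def merge_by_id(records):
    """Group records by 'id', collecting every location into a list."""
    merged = {}
    for rec in records:
        # Create a fresh entry the first time an id is seen; inputs stay untouched.
        entry = merged.setdefault(
            rec['id'], {'id': rec['id'], 'name': rec['name'], 'location': []}
        )
        entry['location'].append(rec['location'])
    # dicts keep insertion order (Python 3.7+), so ids appear in first-seen order
    return list(merged.values())


mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]
for row in merge_by_id(mylist):
    print(row)
# {'id': '123', 'name': 'Jason', 'location': ['McHale', 'Tompson Hall']}
# {'id': '432', 'name': 'Tom', 'location': ['Sydney']}
```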
I have two lists. The first one contains names, and the second one contains names with corresponding values. The names in the first list are a subset of the names in the second list. The values are either true or false. I want to find the co-occurrences of the names in both lists and count the true values. My code:
def return_indices_of_a(a, b):
    b_set = set(b)
    return [i for i, v in enumerate(a) if v in b_set] #return the co-occurrences
data1 = [line.strip() for line in open("text_files/first_list.txt", 'r')]
ins = open( "text_files/second_list.txt", "r" ) # the "r" is not really needed - default
parseTable = []
for line in ins:
    row = line.rstrip().split(' ') # <- note use of rstrip()
    parseTable.append(row)
new_data = []
indexes = []
for index in range(len(parseTable)):
    new_data.append(parseTable[index][0])
    indexes.append(parseTable[index][1])
in1 = return_indices_of_a(new_data, data1)
I am reading both text files that contain the lists. I found the co-occurrences, and now I want to keep only the in1 indices from parseTable[][1]. Am I doing it right? How can I keep the indices I want? My two lists:
['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion',....
[['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false'], ....
Here's a one liner to get the matches:
matches = [(name, dict(values)[name]) for name in set(names) if name in dict(values)]
and then to get the number of true matches:
len([name for (name, value) in matches if value == 'true'])
Edit
You might want to move dict(values) into a named variable:
value_map = dict(values)
matches = [(name, value_map[name]) for name in set(names) if name in value_map]
There are two ways. One is what Andrey suggests (you may want to convert names to a set); alternatively, convert the second list into a dictionary:
mapping = dict(values)
sum_of_true = sum(mapping.get(n) == 'true' for n in names)
The sum works because each comparison yields a bool, and bool is essentially int in Python (True == 1). Using get avoids a KeyError for names that are missing from the second list.
If you need just the count of true values, use the in operator and a list comprehension:
In [1]: names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
In [2]: values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]
In [3]: sum_of_true = len([v for v in values if v[0] in names and v[1] == "true"])
In [4]: sum_of_true
Out[4]: 1
To also get the indices of the co-occurrences, this one-liner may come in handy:
In [6]: true_indices = [names.index(v[0]) for v in values if v[0] in names and v[1] == "true"]
In [7]: true_indices
Out[7]: [0]
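For long lists, repeated `in names` checks and names.index calls scan the list each time. A set-based variant, sketched below, does the same job with constant-time lookups. The function name true_cooccurrences is hypothetical, and unlike the one-liner above it returns positions in the second list (the name/value pairs), which is what is needed to pick rows out of parseTable.

```python
def true_cooccurrences(names, pairs):
    """Return indices into `pairs` whose name is in `names` and whose flag is 'true'."""
    wanted = set(names)  # O(1) membership tests
    return [i for i, (name, flag) in enumerate(pairs)
            if name in wanted and flag == "true"]


names = ['SITNC', 'porkpackerpete', 'teensHijab', '1DAlert', 'IsmodoFashion']
values = [['SITNC', 'true'], ['1DFAMlLY', 'false'], ['tibi', 'true'], ['1Dneews', 'false']]

hits = true_cooccurrences(names, values)
print(len(hits))                     # 1
print([values[i][0] for i in hits])  # ['SITNC']
```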