Search in a list for characters - python

I have done a large amount of searching but cannot find what I am after. I am using Iron Python.
I have a large list of strings (MyList) that I have extracted, and I would like to find the values that contain any of the keys in the SearchStrings dictionary. The SearchStrings dictionary could hold over 500 items.
MyList = ["123steel","MylistConcrete","Nothinginhere","45","56","steel","CONCRETE"]
SearchStrings = {'concrete' : 'C','CONCRETE' : 'C','Steel' : 'S', 'STEEL' : 'S'}
I need to return the index and the matching code from SearchStrings.
For example, if we find 'MylistConcrete', I will know the index is 1 and can return 'C'.
I hope this makes sense to everyone. Let me know if you need any clarification
Thanks in Advance,
Geoff.

First of all, I'd suggest using str.lower() to make the search case-insensitive. This will make your dictionary smaller and more manageable.
Then you can use map to build a new list of codes in which each position matches the same position in MyList (or alter the original, should you require that).
MyList = ["123steel","MylistConcrete","Nothinginhere","45","56","steel","CONCRETE"]
SearchStrings = {'concrete' : 'C', 'steel' : 'S'}
def check_search_strings(x):
    for k, v in SearchStrings.items():
        if k in x.lower():
            return v
    return None
indexes = list(map(check_search_strings, MyList))
print(indexes)
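Since the question asks for the index as well as the code, the same lookup can be paired with enumerate. This is a minimal sketch, assuming lowercase keys and that the first matching key wins; the name find_matches is made up for illustration.

```python
MyList = ["123steel", "MylistConcrete", "Nothinginhere", "45", "56", "steel", "CONCRETE"]
SearchStrings = {'concrete': 'C', 'steel': 'S'}  # keys assumed to be lowercase

def find_matches(items, search_strings):
    """Return (index, code) for every item that contains one of the keys."""
    matches = []
    for index, item in enumerate(items):
        lowered = item.lower()
        for key, code in search_strings.items():
            if key in lowered:
                matches.append((index, code))
                break  # stop at the first key that matches this item
    return matches

matches = find_matches(MyList, SearchStrings)
print(matches)
```

Items without a match are simply left out of the result, so the list stays short even when MyList is large.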

Iterate over your items in MyList and check for every item (lowercase) if any of the dict's (lowercase) keys is in it. Then replace.
This assumes that you don't have different values for identical words as keys (except for lower- / uppercase difference)
my_list = ["123steel", "MylistConcrete", "Nothinginhere", "45", "56", "steel", "CONCRETE"]
search_strings = {'concrete': 'C', 'CONCRETE': 'C', 'Steel': 'S', 'STEEL': 'S'}
for i in range(len(my_list)):
    for k, v in search_strings.items():
        if k.lower() in my_list[i].lower():
            my_list[i] = v
            break  # avoids completing the loop if first item is found
print(my_list)
The result is
['S', 'C', 'Nothinginhere', '45', '56', 'S', 'C']

for m in MyList:
    for k in SearchStrings:
        if k.lower() in m.lower():
            print 'found', k, 'in', m, 'result', SearchStrings[k]
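With 500 or more search strings, the nested loops above test every key against every item. A different technique, not used in any of the answers, is to compile all keys into one case-insensitive regular expression and search each item once. The sketch below assumes lowercase keys and reports only the first match per item.

```python
import re

MyList = ["123steel", "MylistConcrete", "Nothinginhere", "45", "56", "steel", "CONCRETE"]
SearchStrings = {'concrete': 'C', 'steel': 'S'}  # keys assumed to be lowercase

# Longer keys first, so a longer key is preferred over a shorter key that is its prefix.
keys = sorted(SearchStrings, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)

found = []
for index, item in enumerate(MyList):
    match = pattern.search(item)
    if match:
        # Lowercase the matched text to look it up among the lowercase keys.
        found.append((index, SearchStrings[match.group(0).lower()]))
print(found)
```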


generating list of every combination without duplicates

I would like to generate a list of combinations. I will try to simplify my problem to make it understandable.
We have 3 variables :
x : number of letters
k : number of groups
n : number of letters per group
I would like to use Python to generate a list of every possible combination, without any duplicates, knowing that I don't care about the order of the groups or the order of the letters within a group.
As an example, with x = 4, k = 2, n = 2 :
# we start with 4 letters, we want to make 2 groups of 2 letters
letters = ['A','B','C','D']
# here would be a code that generate the list
# Here is the result that is very simple, only 3 combinations exist.
combos = [ ['AB', 'CD'], ['AC', 'BD'], ['AD', 'BC'] ]
Since I don't care about the order of the groups or of the letters within a group, ['AB', 'CD'] and ['DC', 'BA'] are duplicates.
This is a simplification of my real problem, which has these values: x = 12, k = 4, n = 3. I tried to use some functions from itertools, but with that many letters my computer freezes because there are too many combinations.
Another way of seeing the problem : you have 12 players, you want to make 4 teams of 3 players. What are all the possibilities ?
Could anyone help me to find an optimized solution to generate this list?
There will certainly be more sophisticated/efficient ways of doing this, but here's an approach that works in a reasonable amount of time for your example and should be easy enough to adapt for other cases.
It generates unique teams and unique combinations thereof, as per your specifications.
from itertools import combinations
# this assumes that team_size * team_num == len(players) is a given
team_size = 3
team_num = 4
players = list('ABCDEFGHIJKL')
unique_teams = [set(c) for c in combinations(players, team_size)]
def duplicate_player(combo):
    """Returns True if a player occurs in more than one team"""
    return len(set.union(*combo)) < len(players)
result = (combo for combo in combinations(unique_teams, team_num) if not duplicate_player(combo))
result is a generator that can be iterated or turned into a list with list(result). On kaggle.com, it takes a minute or so to generate the whole list of all possible combinations (a total of 15400, in line with the computations by @beaker and @John Coleman in the comments). Each combination is a tuple of sets, like this:
[({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'I'}, {'J', 'K', 'L'}),
({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'J'}, {'I', 'K', 'L'}),
({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'K'}, {'I', 'J', 'L'}),
...
]
If you want, you can turn each team into a string by calling ''.join(sorted(team)) on it; sorting first matters because sets have no fixed order.
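The total of 15400 can be cross-checked without generating anything: arrange all players in a row, cut the row into teams, then divide out the orderings inside each team and the ordering of the teams themselves. A small sketch, where the function name count_team_splits is made up for illustration:

```python
from math import factorial

def count_team_splits(num_players, team_size):
    """Number of ways to split players into unordered teams of equal size."""
    if num_players % team_size != 0:
        raise ValueError("team_size must divide num_players")
    num_teams = num_players // team_size
    # num_players! orderings, divided by the orderings within each team
    # and by the orderings of the teams themselves.
    return factorial(num_players) // (factorial(team_size) ** num_teams * factorial(num_teams))

print(count_team_splits(4, 2))   # the 4-letter example from the question
print(count_team_splits(12, 3))  # 12 players in 4 teams of 3
```

For the two cases above this gives 3 and 15400, which matches the example in the question and the count reported by the generator approach.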
Another solution (players are numbered 0, 1, ...):
import itertools
def equipartitions(base_count: int, group_size: int):
    if base_count % group_size != 0:
        raise ValueError("group_size must divide base_count")
    # the set removes partitions that were generated more than once
    return set(_equipartitions(frozenset(range(base_count)), group_size))
def _equipartitions(base_set: frozenset, group_size: int):
    if not base_set:
        yield frozenset()
        return
    for combo in itertools.combinations(base_set, group_size):
        for rest in _equipartitions(base_set.difference(frozenset(combo)), group_size):
            yield frozenset({frozenset(combo), *rest})
all_combinations = [
    [tuple(team) for team in combo]
    for combo in equipartitions(12, 3)
]
print(all_combinations)
print(len(all_combinations))
And another:
import itertools
from typing import Collection
def equipartitions(players: Collection, team_size: int):
    if len(players) % team_size != 0:
        raise ValueError("team_size must divide the number of players")
    return _equipartitions(set(players), team_size)
def _equipartitions(players: set, team_size: int):
    if not players:
        yield []
        return
    # fixing one player in the first team avoids generating the same teams in another order
    first_player, *other_players = players
    for other_team_members in itertools.combinations(other_players, team_size-1):
        first_team = {first_player, *other_team_members}
        for other_teams in _equipartitions(set(other_players) - set(first_team), team_size):
            yield [first_team, *other_teams]
all_combinations = [
    {''.join(sorted(team)) for team in combo} for combo in equipartitions(players='ABCDEFGHIJKL', team_size=3)
]
print(all_combinations)
print(len(all_combinations))
Firstly, you can use a list comprehension to give you all of the possible combinations (regardless of the duplicates):
comb = [(a,b) for a in letters for b in letters if a != b]
And, afterwards, you can use the sorted function to sort the tuples. After that, to remove the duplicates, you can convert all of the items to a set and then back to a list.
var = [tuple(sorted(sub)) for sub in comb]
var = list(set(var))
You could use the list comprehension approach, which visits all n(n-1) ordered pairs, or a more verbose loop that visits each unordered pair only once, n(n-1)/2 iterations in total (both are O(n^2), but the loop does half the work):
comb = []
for first_letter_idx, _ in enumerate(letters):
    for sec_letter_idx in range(first_letter_idx + 1, len(letters)):
        comb.append(letters[first_letter_idx] + letters[sec_letter_idx])
print(comb)
comb2 = []
for first_letter_idx, _ in enumerate(comb):
    for sec_letter_idx in range(first_letter_idx + 1, len(comb)):
        if (comb[first_letter_idx][0] not in comb[sec_letter_idx]
                and comb[first_letter_idx][1] not in comb[sec_letter_idx]):
            comb2.append([comb[first_letter_idx], comb[sec_letter_idx]])
print(comb2)
This algorithm needs more work to handle dynamic inputs. Maybe with recursion.
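The recursion hinted at here could look like the sketch below. It assumes the letters are distinct single-character strings; the name group_partitions is hypothetical. The first remaining letter is always placed in the first group, so no grouping is produced twice with its groups in a different order.

```python
from itertools import combinations

def group_partitions(letters, group_size):
    """Yield every way to split letters into unordered groups of group_size."""
    letters = sorted(letters)
    if len(letters) % group_size != 0:
        raise ValueError("group_size must divide the number of letters")
    if not letters:
        yield []
        return
    first, rest = letters[0], letters[1:]
    # Choose the partners of the first letter, then split what is left.
    for partners in combinations(rest, group_size - 1):
        remaining = [x for x in rest if x not in partners]
        for other_groups in group_partitions(remaining, group_size):
            yield [first + "".join(partners)] + other_groups

combos = list(group_partitions("ABCD", 2))
print(combos)
```

Because it is a generator, the 12-letter case can be consumed one grouping at a time instead of building the whole list in memory.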
Use combinations from itertools
from itertools import combinations
x = list(combinations(['A','B','C','D'],2))
t = []
for i in x:
    t.append(i[0]+i[1]) # concatenating the strings and adding in a list
g = []
# for this 4-letter, 2-group case, pair number i is completed by pair number -1-i
for i in range(len(t) // 2):
    g.append([t[i], t[-1 - i]])
print(g)

Python: Create dictionary with list index number as key and list element as value?

All the questions I've seen do the exact opposite of what I want to do:
Say I have a list:
lst = ['a','b','c']
I am looking to make a dictionary where the key is the element number (starting with 1 instead of 0) and the list element is the value. Like this:
{1:'a', 2:'b', 3:'c'}
But for a long list. I've read a little about enumerate() but everything I've seen has used the list element as the key instead.
I found this:
dict = {tuple(key): idx for idx, key in enumerate(lst)}
But that produces:
{'a':1, 'b':2, 'c':3}
... which is the opposite of what I want. And, also in a weird notation that is confusing to someone new to Python.
Advice is much appreciated! Thanks!
enumerate has a start keyword argument so you can count from whatever number you want. Then just pass that to dict
dict(enumerate(lst, start=1))
You could also write a dictionary comprehension
{index: x for index, x in enumerate(lst, start=1)}
By default enumerate starts from 0, but you can change that with its second argument, start. Alternatively, you can add 1 to every index if you want to start from 1 instead of zero:
print({index+1:value for index,value in enumerate(lst)})
output:
{1: 'a', 2: 'b', 3: 'c'}
The dict comprehension above is the same as:
dict_1={}
for index,value in enumerate(lst):
    dict_1[index+1]=value
print(dict_1)
Using Dict Comprehension and enumerate
print({x:y for x,y in enumerate(lst,1)})
{1: 'a', 2: 'b', 3: 'c'}
Using a dict comprehension, zip and range:
print({x:y for x,y in zip(range(1,len(lst)+1),lst)})
{1: 'a', 2: 'b', 3: 'c'}
I think the code below should help.
my_list = ['A', 'B', 'C', 'D']
my_dict = {}
for i in range(len(my_list)):
    my_dict[i + 1] = my_list[i]  # the key is the 1-based position

Breaking up a string by comparing characters/combinations of characters against a dictionary

What I would like to do is something like this:
testdictionary = {"a":1, "b":2, "c":3, "A":4}
list1 = []
list2 = []
keyval = 200
for char in string:
    i = 0
    y = "".join(list1)
    while y in testdictionary:
        list1.append(string[i])
        i +=1
    list2.append(y[:-1])
    testdictionary[y] = keyval
    keyval +=1
    string = string[((len(list1))-1):]
    list1 = []
So for a string "abcacababa" the desired output would be:
['ab', 'ca', 'cab', 'aba']
Or "AAAAA" would be
['A', 'AA', 'AA']
Take abcacababa. Iterating through, we first get a, which is in testdictionary, so we append to list1 again. This time we have ab, which is not in the dictionary, so we add it as a key to testdictionary with a value of 200. Repeating the process, we add ca to testdictionary with a value of 201. Then, since ca has already been added, the next value appended to list2 is cab, and so on.
What I am trying to do is take a string and compare each character against a dictionary. If the character is a key in the dictionary, add another character, and repeat until the result is not in the dictionary; at that point add it to the dictionary and assign a value to it. Keep doing this for the whole string.
There's obviously a lot wrong with this code, and it doesn't work: the i index goes out of range, but I have no idea how to approach this iteration. I also need to add an if statement to ensure the "leftovers" of the string at the end are appended to list2. Any help is appreciated.
I think I get it now, @Boa. I believe this code works for abcacababa at least. As for leftovers, I think it's only possible to have a single 'leftover' key, when the last key is already in the test dictionary, so you just have to check after the loop whether curr_key is not empty:
testdictionary = {"a":1, "b":2, "c":3, "A":4}
word = 'abcacababa'
key_val = 200
curr_key = ''
out_lst = []
let_ind = 0
for let in word:
    curr_key += let
    if curr_key not in testdictionary:
        out_lst.append(curr_key)
        testdictionary[curr_key] = key_val
        key_val += 1
        curr_key = ''
leftover = curr_key
print(out_lst)
print(testdictionary)
Output:
['ab', 'ca', 'cab', 'aba']
{'a': 1, 'A': 4, 'c': 3, 'b': 2, 'aba': 203, 'ca': 201, 'ab': 200, 'cab': 202}
Please let me know if anything is unclear. Also I think your second example with AAAAA should be ['AA', 'AAA'] instead of ['A', 'AA', 'AA']

Handling Dictionary of lists using Python

I have a single dictionary that contains four keys, each key representing a file name, and the values are nested lists, as can be seen below:
{'file1': [[['1', '909238', '.', 'G', 'C', '131', '.', 'DP=11;VDB=3.108943e02;RPB=3.171491e-01;AF1=0.5;AC1=1;DP4=4,1,3,3;MQ=50;FQ=104;PV4=0.55,0.29,1,0.17', 'GT:PL:GQ', '0/1:161,0,131:99'], ['1', '909309', '.', 'T', 'C', '79', '.', 'DP=9;VDB=8.191851e-02;RPB=4.748531e-01;AF1=0.5;AC1=1;DP4=5,0,1,3;MQ=50;FQ=81.7;PV4=0.048,0.12,1,1', 'GT:PL:GQ', '0/1:109,0,120:99']......,'008_NTtrfiltered': [[['1', '949608', '.', 'G', 'A',...}
My question is how to check only the first two elements of each list (for instance "1", "909238") under each key, see whether they are the same, and then write them to a file. The reason is that I want to keep only the values (the first two elements of each list) that are common to all four files (keys).
Thanks a lot in advance
Best.
You can access the keys of the dictionary dictio and make your comparison using:
f = open('file.txt','w')
value_to_check_1 = '1'
value_to_check_2 = '909238'
for k in dictio:
    value_1 = dictio[k][0][0][0]
    value_2 = dictio[k][0][0][1]
    if (( value_1 == value_to_check_1) and (value_2 == value_to_check_2)):
        f.write('What you want to write\n')
f.close()
If you want the check to cover every value of your dictionary dictio, you may want to store the pairs of values from dictio first:
couples = [(dictio[k][0][0][0], dictio[k][0][0][1]) for k in dictio]
Then you can loop over the couples to do your check.
Here is an example you can adapt to your needs (values_to_check stands for your own data, and f must still be open at this point):
for e in values_to_check:
    for c in couples:
        if (float(e[0][0][0]) >= float(c[0]) and float(e[0][0][1]) <= float(c[1])):
            f.write(str(e[0][0][0]) + str(e[0][0][1]) + '\n')
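If the goal is to keep only the pairs that occur under all keys, a set intersection may be a more direct route. The sketch below assumes each value wraps its records in one outer list, as the sample in the question suggests; the function name common_pairs and the shortened sample data are made up for illustration.

```python
def common_pairs(dictio):
    """Return the (first, second) field pairs that occur under every key."""
    per_file = []
    for records in dictio.values():
        # Assumption: each value wraps its records in one outer list,
        # so the actual records are found in records[0].
        per_file.append({(rec[0], rec[1]) for rec in records[0]})
    if not per_file:
        return []
    return sorted(set.intersection(*per_file))

# Shortened, made-up sample in the same shape as the question's data.
dictio = {
    'file1': [[['1', '909238', '.', 'G', 'C'], ['1', '909309', '.', 'T', 'C']]],
    'file2': [[['1', '909238', '.', 'G', 'C'], ['1', '949608', '.', 'G', 'A']]],
}
common = common_pairs(dictio)
lines = ['\t'.join(pair) for pair in common]  # one line per common pair
print(lines)
```

Each entry of lines can then be written to a file followed by a newline.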

Sort a dictionary alphabetically, and print it by frequency

I am running python 2.7.2 on a mac.
I have a simple dictionary:
dictionary= {a,b,c,a,a,b,b,b,b,c,a,w,w,p,r}
I want it to be printed and have the output like this:
Dictionary in alphabetical order:
a 4
b 5
c 2
p 1
r 1
w 2
But what I'm getting is something like this...
a 1
a 1
a 1
a 1
b 1
.
.
.
w 1
This is the code I am using.
new_dict = []
for word in dictionary.keys():
    value = dictionary[word]
    string_val = str(value)
    new_dict.append(word + ": " + string_val)
sorted_dictionary = sorted(new_dict)
for entry in sorted_dictionary:
    print entry
Can you please tell me where is the mistake?
(By the way, I'm not a programmer but a linguist, so please go easy on me.)
What you're using is not a dictionary, it's a set! :)
And sets don't allow duplicates.
What you probably need is not dictionaries, but lists.
A little explanation
Dictionaries have keys, and each unique key has its own value:
my_dict = {1:'a', 2:'b', 3:'c'}
You retrieve values by using the keys:
>>> my_dict [1]
'a'
On the other hand, a list doesn't have keys.
my_list = ['a','b','c']
And you retrieve the values using their index:
>>> my_list[1]
'b'
Keep in mind that indices start counting from zero, not 1.
Solving The Problem
Now, for your problem. First, store the characters as a list:
l = ['a', 'b', 'c', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'a', 'w', 'w', 'p', 'r']
Next, we'll need to know what items are in this list:
items = []
for item in l:
if item not in items:
items.append(item)
This is pretty much equivalent to items = set(l) (the only difference is that the result here is a list). I spelled it out to make things clear; I hope you understand what the code does.
Here is the content of items:
>>> items
['a', 'b', 'c', 'w', 'p', 'r']
With that done, we will use the list's count() method to get the number of occurrences of each character in your list, and the built-in function sorted() to sort the items:
for item in sorted(items): # iterates through the sorted items
    print item, l.count(item)
Result:
a 4
b 5
c 2
p 1
r 1
w 2
Hope this helps!!
Let's start with the obvious, this:
dictionary= {a,b,c,a,a,b,b,b,b,c,a,w,w,p,r}
is not a dictionary. It is a set, and sets do not preserve duplicates. You probably meant to declare that as a list or a tuple.
Now, onto the meat of your problem: you need to implement something to count the items of your collection. Your implementation doesn't really do that. You could roll your own, but really you should use a Counter:
my_list = ['a','b','c','a','a','b','b','b','b','c','a','w','w','p','r']
from collections import Counter
c = Counter(my_list)
c
Out[19]: Counter({'b': 5, 'a': 4, 'c': 2, 'w': 2, 'p': 1, 'r': 1})
Now on to your next problem: dictionaries (of all types, including Counter objects) do not keep their keys in sorted order, and in Python 2.7, which you are using, they have no guaranteed order at all. You need to call sorted on the dict's items(), which gives you a sorted list of tuples, then iterate over that to do your printing.
for k,v in sorted(c.items()):
    print('{}: {}'.format(k,v))
a: 4
b: 5
c: 2
p: 1
r: 1
w: 2
A dictionary looks like {key1: content1, key2: content2, ...}, and each key in a dictionary is unique. In contrast, a = {1,2,3,4,5,5,4,5,6} is a set; when you print it, you will notice that
print a
set([1,2,3,4,5,6])
the duplicates are eliminated.
In your case, a better data structure is a list, which can hold duplicates.
If you want to count how often each element occurs, a better option is collections.Counter, for instance:
import collections as c
cnt = c.Counter()
letters = ['a','b','c','a','a','b','b','b','b','c','a','w','w','p','r']
for item in letters:
    cnt[item]+=1
print cnt
the results would be:
Counter({'b': 5, 'a': 4, 'c': 2, 'w': 2, 'p': 1, 'r': 1})
As you can see, the result behaves like a dictionary.
So by using:
for key in cnt.keys():
    print key, cnt[key]
you can access each key and its count:
a 4
c 2
b 5
p 1
r 1
w 2
You can achieve what you want by modifying this a little bit. Hope this is helpful.
A dictionary cannot be defined as {'a','b'}. Written that way it is a set, which cannot contain duplicates.
If you mean a character, put it in quotes, unless it is a variable that has already been defined.
You can't loop with for word in dictionary.keys(): because dictionary here is not of dictionary type.
If you would like to write the code without importing a helper such as Counter, try this:
letters=['a','b','c','a','a','b','b','b','b','c','a','w','w','p','r']
counts={}
for x in letters:
    if x in counts:
        counts[x]=counts[x]+1
    else:
        counts[x]=1
for k in counts.keys():
    print k, counts[k]
First, a dictionary is an unordered collection (i.e., it has no guaranteed order of its keys).
Second, each dict key must be unique.
Though you could count the frequency of characters using a dict, there's a better solution. The Counter class in Python's collections module is based on a dict and is specifically designed for a task like tallying frequency.
from collections import Counter
letters = ['a', 'b', 'c', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'a', 'w', 'w', 'p', 'r']
cnt = Counter(letters)
print cnt
The contents of the counter are now:
Counter({'b': 5, 'a': 4, 'c': 2, 'w': 2, 'p': 1, 'r': 1})
You can print these conveniently:
for char, freq in sorted(cnt.items()):
    print char, freq
which gives:
a 4
b 5
c 2
p 1
r 1
w 2
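The title also mentions printing by frequency. One way to get that order, sketched here with the same letters, is to sort the items by descending count and to break ties alphabetically so the output is the same on every run; the variable name by_frequency is made up for illustration.

```python
from collections import Counter

letters = ['a', 'b', 'c', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'a', 'w', 'w', 'p', 'r']
cnt = Counter(letters)

# Highest count first; letters with equal counts are ordered alphabetically.
by_frequency = sorted(cnt.items(), key=lambda pair: (-pair[1], pair[0]))
for char, freq in by_frequency:
    print("{} {}".format(char, freq))
```

This prints b first with 5, then a with 4, followed by the letters that occur twice and finally those that occur once.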
