Getting the maximum value from dictionary - python

I'm facing a problem with this. I have 10,000 rows in my dictionary, and this is one of the rows:
Example: A (8) C (4) G (48419) T (2) when printed out
I'd like to get 'G' as an answer, since it has the highest value.
I'm currently using Python 2.4 and I have no idea how to solve this as I'm quite new to Python.
Thanks a lot for any help given :)

Here's a solution that
uses a regexp to scan all occurrences of an uppercase letter followed by a number in brackets
transforms the string pairs from the regexp with a generator expression into (value,key) tuples
returns the key from the tuple that has the highest value
I also added a main function so that the script can be used as a command line tool to read all lines from one file and then write the key with the highest value for each line to an output file. The program uses iterators, so it is memory efficient no matter how large the input file is.
import re

KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    return max((int(v),k) for k,v in KEYVAL.findall(row))[1]

def max_item_lines(fh):
    for row in fh:
        yield "%s\n" % max_item(row)

def process_file(infilename, outfilename):
    infile = open(infilename)
    max_items = max_item_lines(infile)
    outfile = open(outfilename, "w")
    outfile.writelines(max_items)
    outfile.close()

if __name__ == '__main__':
    import sys
    infilename, outfilename = sys.argv[1:]
    process_file(infilename, outfilename)
For a single row, you can call:
>>> max_item("A (8) C (4) G (48419) T (2)")
'G'
And to process a complete file:
>>> process_file("inputfile.txt", "outputfile.txt")
If you want an actual Python list of every row's maximum value, then you can use:
>>> map(max_item, open("inputfile.txt"))
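On Python 3, the same streaming idea could be sketched roughly as follows. This is only a sketch under some assumptions: process_lines is a hypothetical helper name, and skipping blank lines is an addition that the original script does not make.

```python
import re

KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    # Raises ValueError if the row contains no "X (n)" pairs at all.
    return max((int(v), k) for k, v in KEYVAL.findall(row))[1]

def process_lines(lines):
    # Lazily yield the winning key for each non-blank line.
    # Skipping blank lines is an assumption made for this sketch.
    for row in lines:
        if row.strip():
            yield max_item(row)

def process_file(infilename, outfilename):
    # The with statement closes both files, even if an error occurs.
    with open(infilename) as infile, open(outfilename, "w") as outfile:
        for key in process_lines(infile):
            outfile.write(key + "\n")
```

Because process_lines is a generator, only one line is held in memory at a time, just like in the original version.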

max(d.itervalues())
In Python 2 this avoids the intermediate list that d.values() builds, because itervalues() returns an iterator. Note that it gives you the highest value, not its key.
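If the row is already stored as a dictionary, a minimal Python 3 sketch for getting both the highest value and its key might look like this. The dictionary d below is just the example row from the question, typed in by hand.

```python
# Example row from the question, assumed to be stored as a dict already.
d = {"A": 8, "C": 4, "G": 48419, "T": 2}

# Highest value only (in Python 3, values() is already a lazy view).
largest_value = max(d.values())

# Key that owns the highest value: compare keys by their value.
largest_key = max(d, key=d.get)

print(largest_key, largest_value)
```

If two keys share the highest value, max returns the first one it encounters.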

Try the following:
st = "A (8) C (4) G (48419) T (2)" # your start string
a=st.split(")")
b=[x.replace("(","").strip() for x in a if x!=""]
c=[x.split(" ") for x in b]
d=[(int(x[1]),x[0]) for x in c]
max(d) # (48419, 'G'); use max(d)[1] to get just the letter

Use regular expressions to split the line. Then for all the matched groups, you have to convert the matched strings to numbers, get the maximum, and figure out the corresponding letter.
import re
r = re.compile('A \((\d+)\) C \((\d+)\) G \((\d+)\) T \((\d+)\)')
for line in my_file:
    m = r.match(line)
    if not m:
        continue # or complain about invalid line
    value, n = max((int(value), n) for (n, value) in enumerate(m.groups()))
    print "ACGT"[n], value

row = "A (8) C (4) G (48419) T (2)"
lst = row.replace("(",'').replace(")",'').split() # ['A', '8', 'C', '4', 'G', '48419', 'T', '2']
dd = dict(zip(lst[0::2],map(int,lst[1::2]))) # {'A': 8, 'C': 4, 'T': 2, 'G': 48419}
max(map(lambda k:[dd[k],k], dd))[1] # 'G'

Related

How to print different character of repeated words in python?

My data in a text file PDBs.txt looks like this:
150L_A
150L_B
150L_C
150L_D
16GS_A
16GS_B
17GS_A
17GS_B
The end result needed is:
"First chain of 150L is A and second is B and third is C and fourth is D"
"First chain of 16GS is A and second is B"
etc.
in chains.txt output file.
Thank you for your help.
You could achieve this by first reading the file and extracting the PDB and chain labels to a dictionary mapping the PDB ID to a list of chain labels, here called results. Then, you can write the "chains.txt" file line by line by iterating through these results and constructing the output lines you indicated:
from collections import defaultdict

results = defaultdict(list)
with open("PDBs.txt") as fh:
    for line in fh:
        line = line.strip()
        if line:
            pdb, chain = line.split("_")
            results[pdb].append(chain)

# Note that you would need to extend this if more than 4 chains are possible
prefix = {2: "second", 3: "third", 4: "fourth"}
with open("chains.txt", "w") as fh:
    for pdb, chains in results.items():
        fh.write(f"First chain of {pdb} is {chains[0]}")
        for ii, chain in enumerate(chains[1:], start=1):
            fh.write(f" and {prefix[ii + 1]} is {chain}")
        fh.write("\n")
Content of "chains.txt":
First chain of 150L is A and second is B and third is C and fourth is D
First chain of 16GS is A and second is B
First chain of 17GS is A and second is B
First chain of 18GS is A and second is B
First chain of 19GS is A and second is B
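If more than four chains can occur, one possible way to extend the ordinal lookup is sketched below. The names ORDINALS, ordinal and describe are hypothetical, and the fallback wording "number N" for positions beyond the word list is an assumption, not something the question asked for.

```python
# Ordinal words for the first few positions; extend the list as needed.
ORDINALS = ["First", "second", "third", "fourth", "fifth", "sixth"]

def ordinal(position):
    # position is zero-based; fall back to "number N" past the word list.
    if position < len(ORDINALS):
        return ORDINALS[position]
    return f"number {position + 1}"

def describe(pdb, chains):
    # Build one output line for a PDB ID and its list of chain labels.
    parts = [f"{ordinal(0)} chain of {pdb} is {chains[0]}"]
    parts += [f"{ordinal(i)} is {c}" for i, c in enumerate(chains[1:], start=1)]
    return " and ".join(parts)

print(describe("150L", ["A", "B", "C", "D"]))
```

Each line written to chains.txt would then be describe(pdb, chains) followed by a newline.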
You can achieve that simply with split operations and a loop.
First split your data on whitespace to get the separate chunks as a list. Each chunk consists of a key and a value, separated by an underscore. Iterate over all chunks and split each of them into the key and the value. Then build a Python dictionary that maps each key to a list of all its values.
data = "150L_A 150L_B 150L_C 150L_D 16GS_A 16GS_B 17GS_A 17GS_B 18GS_A 18GS_B 19GS_A 19GS_B"
chunks = data.split()
result = {}
for chunk in chunks:
    (key, value) = chunk.split('_')
    if key not in result:
        result[key] = []
    result[key].append(value)
print(result)
# {'150L': ['A', 'B', 'C', 'D'], '16GS': ['A', 'B'], '17GS': ['A', 'B'], '18GS': ['A', 'B'], '19GS': ['A', 'B']}
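The same grouping can be written a little more compactly with dict.setdefault. This is only a sketch: group_chains is a hypothetical name, and it assumes every non-empty line contains exactly one underscore.

```python
def group_chains(lines):
    # Map each PDB ID to the list of its chain labels, in input order.
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        key, value = line.split("_")
        # setdefault inserts an empty list the first time a key is seen.
        result.setdefault(key, []).append(value)
    return result

print(group_chains(["150L_A", "150L_B", "16GS_A"]))
```

Since it accepts any iterable of strings, an open file object can be passed in directly.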

Is there Python code for finding the combinations of a row where only some columns are variable?

I have a row of string data like:
A, B, C/D , E/F, J , K
I want to find the non-repeating combinations of the whole row, where only the columns containing "/" are varied (Python 3.x).
Output:
A,B,C,E,J,K
A,B,D,E,J,K
A,B,C,F,J,K
A,B,D,F,J,K
Single items are constant columns. It is preferable if the code works with N columns and m items within each variable column. Please help.
I could not use any of python builtin methods directly to solve this.
You mean something like this?
from itertools import product
s = ['A', 'B', 'C/D', 'E/F', 'J', 'K']
s = [x.split('/') for x in s]
for p in product(*s):
    print(''.join(p))
Output
ABCEJK
ABCFJK
ABDEJK
ABDFJK
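To get the comma-separated rows from the question, and to start from the raw string including its stray spaces, the product approach could be adapted along these lines. expand_row is a hypothetical helper name. Note that itertools.product varies the last variable column fastest, so the rows come out in a different order than in the question.

```python
from itertools import product

def expand_row(row):
    # Split the row into columns, then each column into its alternatives,
    # stripping the whitespace around every item.
    columns = [[item.strip() for item in cell.split("/")]
               for cell in row.split(",")]
    # product picks one alternative per column, for every combination.
    return [",".join(combo) for combo in product(*columns)]

for line in expand_row("A, B, C/D , E/F, J , K"):
    print(line)
```

This works for any number of columns and any number of alternatives per column, since each column simply becomes one argument to product.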
Here is the code to compute the answer recursively:
def _compute_result(arr, i, prefix, results):
    if i == len(arr):
        results.append(','.join(prefix))
        return
    for j in arr[i].split('/'):
        _compute_result(arr, i + 1, prefix + [j], results)
string_array = input().strip().split(' ')
results = []
_compute_result(string_array, 0, [], results)
print(' '.join(results))
O/P:
A,B,C,E,J,K A,B,C,F,J,K A,B,D,E,J,K A,B,D,F,J,K

How to count elements on each position in lists

I have a lot of lists like:
SI821lzc1n4
MCap1kr01lv
All of them have the same length. I need to count how many times each symbol appears on each position. Example:
abcd
a5c1
b51d
Here it'll be a5cd
One way is to use zip to associate characters in the same position. We can then send all of the characters from each position to a Counter, then use Counter.most_common to get the most common character
from collections import Counter
l = ['abcd', 'a5c1', 'b51d']
print(''.join([Counter(z).most_common(1)[0][0] for z in zip(*l)]))
# a5cd
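If the actual per-position counts are needed rather than only the winner, the same zip idea gives them directly. A small sketch, using the three example strings from the question (the variable names are made up):

```python
from collections import Counter

rows = ['abcd', 'a5c1', 'b51d']

# zip(*rows) yields one tuple per position; Counter tallies its characters.
position_counts = [Counter(column) for column in zip(*rows)]

for index, counts in enumerate(position_counts):
    print(index, dict(counts))
```

position_counts[0]['a'] is then the number of times 'a' appears in the first position.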
from statistics import mode
[mode([x[i] for x in y]) for i in range(len(y[0]))]
where y is your list.
This needs Python 3.4 and up, where the statistics module was added.
You could use a combination of zip and Counter
a = ("abcd")
b = ("a5c1")
c = ("b51d")
from collections import Counter
zippedList = list(zip(a,b,c))
print("zipped: {}".format(zippedList))
final = ""
for x in zippedList:
    countLetters = Counter(x)
    print(countLetters)
    final += countLetters.most_common(1)[0][0]
print("output: {}".format(final))
output:
zipped: [('a', 'a', 'b'), ('b', '5', '5'), ('c', 'c', '1'), ('d', '1', 'd')]
Counter({'a': 2, 'b': 1})
Counter({'5': 2, 'b': 1})
Counter({'c': 2, '1': 1})
Counter({'d': 2, '1': 1})
output: a5cd
This all depends on where your list is. Is your list coming from another file or is it an actual array? At the end of the day, the best way to do this simply is going to be to use a dictionary and a for loop.
lines = ['abcd', 'a5c1', 'b51d'] # your list of equal-length strings
new_dict = {}
for line in lines:
    for i in range(len(line)):
        if i in new_dict:
            new_dict[i].append(line[i])
        else:
            new_dict[i] = [line[i]]
After that, I'm assuming that you'd like to output the most common character at each of the four positions. For that I'd recommend importing statistics and using the mode function:
from statistics import mode
new_line = ""
for key in sorted(new_dict):
    x = mode(new_dict[key])
    new_line = new_line + x
However, your question is quite vague, please elaborate more next time.
P.s. I'm a newbie so all you experienced programmers plz don't hate :)
I would use a combination of defaultdict, enumerate, and Counter:
>>> from collections import Counter, defaultdict
>>> data = '''abcd
a5c1
b51d
'''
>>> poscount = defaultdict(Counter)
>>> for line in data.split():
        for i, character in enumerate(line):
            poscount[i][character] += 1
>>> ''.join([poscount[i].most_common(1)[0][0] for i in sorted(poscount)])
'a5cd'
Here's how it works:
The defaultdict() creates new entries when it sees a new key.
The enumerate() function returns both the character and its position in the line.
The Counter counts the occurrences of individual characters.
Combining the three makes a defaultdict whose keys are the column positions and whose values are character counters. That gives you one character counter per column.
The most_common() method returns the highest frequency (character, count) pair for that counter.
The [0][0] extracts the character from the list of (character, count) tuples.
The str.join() method combines the results back together.
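The steps above can be wrapped in a reusable function, roughly like this. most_common_per_position is a hypothetical name, and the function assumes it receives an iterable of equal-length strings.

```python
from collections import Counter, defaultdict

def most_common_per_position(lines):
    # One Counter per column position, created on first use.
    poscount = defaultdict(Counter)
    for line in lines:
        for i, character in enumerate(line):
            poscount[i][character] += 1
    # Take the top character of each column, in column order.
    return ''.join(poscount[i].most_common(1)[0][0] for i in sorted(poscount))

print(most_common_per_position(['abcd', 'a5c1', 'b51d']))
```

For the three example strings this gives 'a5cd', the same result as the interactive session.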

Read python list correctly

I have defined a function that takes in a list like this
arr = ['C','D','E','I','M']
I have another function that produces a similar kind of list, the function is:
def tree_count(arr):
    feat = ['2','2','2','2','0']
    feat_2 = []
    dictionary = dict(zip(arr, feat))
    print('dic',dictionary)
    feat_2.append([k for k,v in dictionary.items() if v=='2'])
    newarr = str(feat_2)[1:-1]
    print(newarr)
This outputs the correct result that I want, i.e:
['C','D','E','I']
But the problem is, when I use this list in another function, its values should be read as C,D,E,I . But instead when I print this, the bracket [ and ' are included as result:
for i in newarr:
    print(i)
The printed result is : [ ' C ', and so on for each line. I want to get rid of [ '. How do I solve this?
For some reason you are using str() on the array; this is what causes the square brackets from the array to appear in the printed output.
See if the following methods suit you:
print(arr) # ['C','D','E','I'] - the array itself
print(str(arr)) # "['C', 'D', 'E', 'I']" - the array as string literal
print(''.join(arr)) # 'CDEI' - array contents as string with no spaces
print(' '.join(arr)) # 'C D E I' - array contents as string with spaces
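To see why the loop in the question prints brackets and quotes, it may help to compare iterating over the list with iterating over its string form. A small sketch (the variable names are made up):

```python
letters = ['C', 'D', 'E', 'I']

# Iterating over the list yields its elements.
from_list = [item for item in letters]

# str() turns the whole list into one string, so iterating over it
# yields single characters, including '[', quotes, commas and spaces.
as_text = str(letters)
from_text = [char for char in as_text]

print(from_list)
print(from_text[:4])
```

So the fix is to keep the list as a list and only convert to a string (for example with join) at the moment of printing.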
Make your function return the list of matching keys rather than just printing it:
def tree_count(arr):
    feat = ['2','2','2','2','0']
    dictionary = dict(zip(arr, feat))
    dictionary = [k for k in dictionary if dictionary[k] == '2']
    return dictionary
For instance,
$ results = tree_count(['C','D','E','I','M'])
$ print(results)
['I', 'C', 'D', 'E']
Pretty-printing is then fairly straightforward:
$ print("\n".join(results))
I
C
D
E
... or if you just want ,:
$ print(", ".join(results))
I, C, D, E

Python - Appending and Sorting a List

I'm working on code that takes an argument (i, w or f) from the command line. Then, using input, I want to read a list of integers, floats or words and process it.
The user will enter 'f' on the command line and then input a list of floating-point numbers, which are appended to an empty list. The program will then sort the list of floats and print the result.
I want to do something similar for words and integers.
If the input is a list of words, the output will be the words in alphabetical order. If the input is a list of integers, the output will be the list in reverse order.
This is the code that I have so far, but right now some of the input values are just being appended to the empty list. What am I missing that prevents the code from executing properly?
For example, the program is started with the program name and 'w' for words:
$ test.py w
>>> abc ABC def DEF
[ABC, DEF,abc,def] # list by length, alphabetizing words
code
import sys, re
script, options = sys.argv[0], sys.argv[1:]
a = []
for line in options:
    if re.search('f',line): # 'f' in the command line
        a.append(input())
        a.join(sorted(a)) # sort floating point ascending
        print (a)
    elif re.search('w', line):
        a.append.sort(key=len, reverse=True) # print list in alphabetize order
        print(a)
    else: re.search('i', line)
    a.append(input())
    ''.join(a)[::-1] # print list in reverse order
    print (a)
Try this:
import sys
option, values = sys.argv[1], sys.argv[2:]
tmp = {
    'i': lambda v: map(int, v),
    'w': lambda v: map(str, v),
    'f': lambda v: map(float, v)
}
print(sorted(tmp[option](values)))
Output:
shell$ python my.py f 1.0 2.0 -1.0
[-1.0, 1.0, 2.0]
shell$
shell$ python my.py w aa bb cc
['aa', 'bb', 'cc']
shell$
shell$ python my.py i 10 20 30
[10, 20, 30]
shell$
You'll have to add necessary error handling. For e.g,
>>> float('aa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: aa
>>>
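One possible shape for that error handling is sketched below. sort_values is a hypothetical helper, and the error messages are made up. Like the snippet above, it sorts every kind of input in ascending order.

```python
def sort_values(option, values):
    # Map each command line option to the conversion it stands for.
    converters = {"i": int, "w": str, "f": float}
    if option not in converters:
        raise ValueError(f"unknown option {option!r}; expected i, w or f")
    convert = converters[option]
    try:
        converted = [convert(v) for v in values]
    except ValueError as exc:
        # int() and float() raise ValueError for text they cannot parse.
        raise ValueError(f"bad input for option {option!r}: {exc}") from None
    return sorted(converted)

print(sort_values("f", ["1.0", "2.0", "-1.0"]))
```

In the script itself, the call would be wrapped in try/except ValueError so that a readable message is printed instead of a traceback.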
