Python 3 k-mers count errors

I have to solve a problem (the problem and my code are given below), and I get an error when I run the program:
Traceback (most recent call last):
File "C:/Python33/exercise1.py", line 9, in <module>
for n in range[len(string) - k]:
TypeError: unsupported operand type(s) for -: 'int' and 'list'
Frequent Words Problem: Find the most frequent k-mers in a string.
Input: A string Text and an integer k.
Output: All most frequent k-mers in Text.
My code is:
k=open("dataset_3_6.txt").readlines()
import string
string = str()
for k in string:
    k = int(k)
kmer_count = {}
for n in range(len(string) - k):
    c = string[n:n+k]
    kmer_count[c] = kmer_count[c] + 1 if kmer_count.has_key(c) else 1
max = 0
max_kmer = []
for k,v in kmer_count.iteritems():
    if v > max:
        max_kmer = [k]
        max = v
    elif v == max:
        max_kmer += [k]
print("max_kmer")

There are a huge number of problems right at the top:
import string
Why are you importing this module? Are you planning to use it?
string = str()
This hides the string module so you can never use it again, and creates a default (that is, empty) string (which is easier to do with string = '').
for k in string:
    k = int(k)
Since string is an empty string, this loops zero times.
Even if it weren't empty, k = int(k) wouldn't have any effect; you're just rebinding the loop variable over and over again. If you wanted to replace the string string with a list of numbers, where each number is the integer value of the corresponding character in the string, you'd need to build a list, either by creating an empty list and calling append, or by using a list comprehension (e.g., string = [int(k) for k in string]). I have no idea whether that's what you're actually trying to do here.
Anyway, if string weren't empty, after the loop, k would be the int value of the last character in the string. But since it is empty, the loop never runs and k is still the result of calling open("dataset_3_6.txt").readlines() earlier. That is, it's a list of lines. So, in this:
for n in range(len(string) - k):
You're trying to subtract that list of lines from the number 0. (Remember, you set string to an empty string, so its len is 0.)
I have no idea what you expected this to do.
Part of the confusion is that you have meaningless variable names, and you reuse the same names to refer to different things over and over. First k is the list of lines. Then it's a loop variable. Then… you apparently intended it to be something else, but I don't know what. Then, it's each key in a dictionary.
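For comparison, here is a minimal sketch of what the question appears to be aiming for. The function name most_frequent_kmers is hypothetical, and the file layout in the commented-out lines (text on the first line, k on the second) is an assumption, since the question does not show the file. Note that a string of length n has n - k + 1 windows of width k, so the range needs a "+ 1".

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return all k-mers that occur most often in text, sorted alphabetically."""
    # There are len(text) - k + 1 windows of width k, hence the "+ 1".
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, count in counts.items() if count == best)


# Assumed file layout (not shown in the question): text on line 1, k on line 2.
# with open("dataset_3_6.txt") as handle:
#     text, k = handle.read().split()
# print(" ".join(most_frequent_kmers(text, int(k))))

print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```

Each name here refers to one thing only (text, k, counts, best), which avoids the rebinding problem described above.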

Related

Python - loop through the characters of a string until length of string reaches length of another string

I'm trying to write a function that takes two strings (message and keyword) and, where the latter is shorter than the former, loops over the characters in it until both strings are the same length.
E.g. with message = "hello" and keyword = "dog", my intended output is "dogdo", so it cycles through the characters of the keyword until it reaches the length of the message.
Here is my attempted code, which repeats the entire string rather than each individual character, i.e. with message = "hell" and keyword = "do", the output is "dodododo" instead of "dodo".
When len(keyword) is not a divisor of len(message), I have tried to compose my output of the repeated keyword plus the remainder. So for message = "hello" and keyword = "dog", the intended output is "dogdo", but the output I get is "dogggdogggdogggdogggdoggg".
I know the way I'm looping this is wrong, and I would really appreciate it if somebody could let me know why this is the case and how to loop over each character rather than the whole string.
if len(keyword) < len(message):
    if len(message) % len(keyword) ==0:
        for x in range(0, len(message)):
            for char in keyword:
                keys += char
    else:
        for x in range(0, len(message)):
            for char in keyword:
                keys += char
            remainder = len(message)%len(keyword)
            for x in range(0, remainder):
                keys+= char

You can do some itertools magic or just write your own trivial generator as follows:
def func(message, keyword):
    def gc(s):
        while True:
            for e in s:
                yield e
    g = gc(keyword)
    for _ in range(len(message)-len(keyword)):
        keyword += next(g)
    return keyword
print(func('hello', 'dog'))
Output:
dogdo

This doesn't require any loops at all:
(keyword * ((len(message+keyword)-1)//len(keyword)))[0:len(message)]
That is, make a string of enough copies of keyword to be at least as big as message, then only take a long enough prefix of that.
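The expression above can be unpacked into a small function to make the arithmetic visible. This is only a sketch: the name repeat_to_length and the empty-keyword check are additions for illustration, not part of the original answer. The key step is the ceiling division, which computes how many whole copies of keyword are needed to cover the target length.

```python
def repeat_to_length(keyword, length):
    """Repeat keyword cyclically and cut the result to exactly `length` characters."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    # Ceiling division: the number of whole copies needed to cover `length`.
    copies = (length + len(keyword) - 1) // len(keyword)
    return (keyword * copies)[:length]


print(repeat_to_length("dog", len("hello")))  # dogdo
print(repeat_to_length("do", len("hell")))    # dodo
```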

You can use zip and itertools.cycle to get the effect you are looking for.
zip will iterate until the shortest iterable is exhausted and cycle will keep iterating around forever:
from itertools import cycle
message="hello"
keyword="dog"
output = ''.join([x[1] for x in zip(message, cycle(keyword))])
print(output)
Output:
dogdo

If I've understood your request correctly, I think something like the below is what you are looking for.
The concept here is that we have a function that does the work for us that takes as input the keyword and how long you want the string to be.
It then loops until the string has become long enough, adding 1 character at a time.
def lengthen(keyword, length):
    output = keyword
    loop_no = 0
    while len(output) < length:
        char_to_add = keyword[loop_no % len(keyword)]
        output += char_to_add
        loop_no += 1
    return output
longer_keyword_string = lengthen(keyword, len(message))

Python for iteration with previous variable

The function is meant to do the following: "get n (a non-negative integer) copies of the first 2 characters of a given string. Return n copies of the whole string if its length is less than 2."
Can anyone tell me what substr does on line 12?
I get how it works on line 8 (when the string is longer than 2), but it loses me on line 12, where the string is shorter than 2.
def substring_copy(str, n):
    """
    Method 2
    """
    f_lenght = 2
    if f_lenght > len(str): # If strings length is larger than 2
        f_lenght = len(str) # Length of string will be len(str)
    substr = str[:f_lenght] # substr = str[:2] (indices 0 and 1)
    # If length is shorter than 2
    result = ""
    for i in range(n):
        result = result + substr
    return result
print ("\nMethod 2:")
print(substring_copy('abcdef', 2))
print(substring_copy('p', 3));
If the length of p is 1, then isn't it the case that substr isn't that important and the for loop will simply run 3 times (because of the 3 passed in the last line of code)?
Thanks in advance!

I think I got it, substr is important for if there are more than 2 characters in the string. When there are less than 2, substr could have a value of 200; the p string would still be just one p and that would concatenate n times (3 times in this example).

So, as you inferred, substr is just the first 2 characters (or fewer) of the original string, which is the whole input if it is already shorter than 2 characters.
It's your "duplication target", basically.
Though I do want to point out that the entire thing is rather bad style: it is over-complicated and doesn't make good use of Python:
str is the Python string type (which also acts as a conversion function). It's a builtin, and shadowing builtins is a bad idea. Naming a variable str is sometimes justifiable, but not here.
Python slicing is "bounded" to what it's slicing, so e.g. "ab"[:5] will return "ab". Unlike regular indexing, it does not care that the input is too short. This means the entire mess with f_lenght is unnecessary; you can just write
substr = s[:2]
Python strings have an override for multiplication: s * n repeats the string n times. This also works with lists (though that is more risky because of mutability and reference semantics).
So the entire function could just be:
def substring_copy(s, n):
    return s[:2] * n
The prompt is also not great because of the ambiguity of the word "character", but I guess we can let that slide.
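The points about slicing and multiplication can be checked in a few lines. This is an illustrative sketch; the variable names short, grid and safe_grid are made up for the example. The list case shows why repeating a list is riskier than repeating a string: the copies are references to the same inner object.

```python
# Slices are clamped to the sequence length instead of raising IndexError.
short = "p"
print(short[:2])        # p

# Multiplying a string builds a new string containing n copies.
print("ab" * 3)         # ababab

# With lists, the repeated entries are references to the same inner object.
grid = [[0, 0]] * 3
grid[0][0] = 1
print(grid)             # [[1, 0], [1, 0], [1, 0]]

# A comprehension creates independent inner lists instead.
safe_grid = [[0, 0] for _ in range(3)]
safe_grid[0][0] = 1
print(safe_grid)        # [[1, 0], [0, 0], [0, 0]]
```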

I'm unable to figure out what test cases am I failing here

I need to find the most frequently occurring character in a string of lowercase letters a-z, i.e. there are 26 possible character types.
Even though the output is correct, I'm still failing. What am I doing wrong?
These are the conditions:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
Input Format
The first line of input consists of number of test cases, T.
The second line of each test case consists of a string representing the type of each individual characters.
Constraints
1<= T <=10
1<= |string| <=100000
Output Format
For each test case, print the required output in a separate line.
Sample TestCase 1
Input
2
gqtrawq
fnaxtyyzz
Output
q
y
Explanation
Test Case 1: There are 2 q, which is the maximum, while all the rest occur only once.
Test Case 2: There are 2 y and 2 z types. Since the maximum value is the same, the type with the lesser ASCII value is considered as output. Therefore, y is the correct type.
def testcase(str1):
    ASCII_SIZE = 256
    ctr = [0] * ASCII_SIZE
    max = -1
    ch = ''
    for i in str1:
        ctr[ord(i)]+=1;
    for i in str1:
        if max < ctr[ord(i)]:
            max = ctr[ord(i)]
            ch = i
    return ch
print(testcase("gqtrawq"))
print(testcase("fnaxtyyzz"))
I get the correct output for the sample input, but I'm still failing the test cases.

Note the note:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
But with your code, you return the character with highest count that appears first in the string. In case of ties, take the character itself into account in the comparison:
for i in str1:
    if max < ctr[ord(i)] or max == ctr[ord(i)] and i < ch:
        max = ctr[ord(i)]
        ch = i
Or shorter (but not necessarily clearer) comparing tuples of (count, char):
    if (max, i) < (ctr[ord(i)], ch):
(Note that this is comparing (old_cnt, new_char) < (new_cnt, old_chr)!)
Alternatively, you could also iterate the characters in the string in sorted order:
for i in sorted(str1):
    if max < ctr[ord(i)]:
        ...
Having said that, you could simplify/improve your code by counting the characters directly instead of their ord (using a dict instead of list), and using the max function with an appropriate key function to get the most common character.
def testcase(str1):
    ctr = {c: 0 for c in str1}
    for c in str1:
        ctr[c] += 1
    return max(sorted(set(str1)), key=ctr.get)
You could also use collections.Counter, and most_common, but where's the fun in that?
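For completeness, a sketch of the collections.Counter route. most_common alone is not enough for this task, because it orders elements with equal counts by first occurrence rather than by character, so the tie-break still has to be spelled out. The function name most_common_char is hypothetical, and the input is assumed to be non-empty, as the constraints state.

```python
from collections import Counter


def most_common_char(text):
    """Return the most frequent character; ties go to the smaller character."""
    counts = Counter(text)
    # Sort key: highest count first (negated), then smallest character.
    return min(counts, key=lambda ch: (-counts[ch], ch))


print(most_common_char("gqtrawq"))    # q
print(most_common_char("fnaxtyyzz"))  # y
```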

What should be the output for this - print(testcase("fanaxtyfzyz"))?
IMO the output should be 'a' but your program writes 'f'.
The reason is you are iterating through the characters of the input string,
for i in str1: #Iterating through the values 'f','a','n','a','x','t',...
    #first count of 'f' is considered.
    #count of 'f' occurs first, count of 'a' not considered.
    if max < ctr[ord(i)]:
        max = ctr[ord(i)]
        ch = i
Instead, you should iterate through the values of ctr. Or sort the input string and do the same.

TypeError: 'int' object is not iterable

I get an error when I execute this code:
for i in len(str_list):
TypeError: 'int' object is not iterable
How would I fix it? (Python 3)
def str_avg(str):
    str_list=str.split()
    str_sum=0
    for i in len(str_list):
        str_sum += len(str_list[i])
    return str_sum/i

You are trying to loop over an integer; len() returns one.
If you must produce a loop over a sequence of integers, use a range() object:
for i in range(len(str_list)):
    # ...
By passing in the len(str_list) result to range(), you get a sequence from zero to the length of str_list, minus one (as the end value is not included).
Note that now your i value will be the incorrect value to use to calculate an average, because it is one smaller than the actual list length! You want to divide by len(str_list):
return str_sum / len(str_list)
However, there is no need to do this in Python. You can loop over the elements of the list itself. That removes the need to create an index first:
for elem in str_list:
    str_sum += len(elem)
return str_sum / len(str_list)
All this can be expressed in one line with the sum() function, by the way:
def str_avg(s):
    str_list = s.split()
    return sum(len(w) for w in str_list) / len(str_list)
I replaced the name str with s; better not mask the built-in type name, that could lead to confusing errors later on.
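One case the versions above do not cover is input with no words at all, where len(str_list) is 0 and the division raises ZeroDivisionError. A minimal sketch of a guard follows; returning 0.0 for that case is an assumption made for illustration, and raising an error may suit your use better.

```python
def str_avg(s):
    """Average word length of s, or 0.0 when s contains no words."""
    words = s.split()
    if not words:
        # Assumed behavior: avoid ZeroDivisionError for empty or blank input.
        return 0.0
    return sum(len(w) for w in words) / len(words)


print(str_avg("the quick brown fox"))  # 4.0
print(str_avg("   "))                  # 0.0
```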

A for loop needs an iterable to loop over, such as the list [1, 2, 3] (which contains 3 elements).
The len function returns a single integer: the length of the object you passed to it.
To iterate as many times as the length of an object, pass the result of len to range. This creates an iterable that lets you iterate as many times as the object is long.
So do something like
for i in range(len(str_list)):
unless you want to go through the items of the list rather than its indices. In that case you can iterate directly with
for i in str_list:

def str_avg(str):
    str_list = str.split()
    str_sum = len(''.join(str_list)) # get the total number of characters in str
    str_count = len(str_list) # get the total words
    return (str_sum / str_count)

A for loop needs an iterable object. len(str_list) is an integer, and an integer is not iterable.
Solution: use the range() function.
for i in range(len(str_list)):

Why does this give me an IndexError?

I have the following code that opens a csv, and appends all the values to a list. I then remove all the values that do not start with '2'. However, on the line if lst[k][0] != '2':, it raises an error:
Traceback (most recent call last):
File "historical_tempo1.py", line 23, in <module>
if lst[k][0] != '2':
IndexError: list index out of range
Here is the code:
y = open('today.csv')
lst = []
for k in y:
    lst.append(k)
lst = ' '.join(lst).split()
for k in range(0, len(lst)-1):
    if lst[k][0] != '2':
        lst[k:k+1] = ''
Here is the first bit of content from the csv file:
Date,Time,PM2.5 Mass concentration(ug/m3),Status
3/15/2014,4:49:13 PM,START
2014/03/15,16:49,0.5,0
3/15/2014,4:49:45 PM,START
2014/03/15,16:50,5.3,0
2014/03/15,16:51,5.1,0
2014/03/15,16:52,5.0,0
2014/03/15,16:53,5.0,0
2014/03/15,16:54,5.4,0
2014/03/15,16:55,6.4,0
2014/03/15,16:56,6.4,0
2014/03/15,16:57,5.0,0
2014/03/15,16:58,5.2,0
2014/03/15,16:59,5.2,0
3/15/2014,5:03:48 PM,START
2014/03/15,17:04,4.8,0
2014/03/15,17:05,4.9,0
2014/03/15,17:06,4.9,0
2014/03/15,17:07,5.1,0
2014/03/15,17:08,4.6,0
2014/03/15,17:09,4.9,0
2014/03/15,17:10,4.4,0
2014/03/15,17:11,5.7,0
2014/03/15,17:12,4.4,0
2014/03/15,17:13,4.0,0
2014/03/15,17:14,4.6,0
2014/03/15,17:15,4.7,0
2014/03/15,17:16,4.8,0
2014/03/15,17:17,4.5,0
2014/03/15,17:18,4.4,0
2014/03/15,17:19,4.5,0
2014/03/15,17:20,4.8,0
2014/03/15,17:21,4.6,0
2014/03/15,17:22,5.1,0
2014/03/15,17:23,4.2,0
2014/03/15,17:24,4.6,0
2014/03/15,17:25,4.5,0
2014/03/15,17:26,4.4,0

Why do you get an IndexError? Because when you write lst[k:k+1] = '', you remove the element at index k from your list, which makes the list one element shorter. Your loop still runs up to the old len(lst), so the index variable k eventually goes past the end of the list.
How can you fix this? Loop over a copy and delete from the original using lst.remove().
The following code loops over the copy.
for s in lst[:]:
    if s[0] != '2':
        lst.remove(s)
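To see the failure in isolation, here is a small sketch with made-up data (the list items is hypothetical). It shows both effects of deleting while looping over a fixed index range: an element is skipped, and the index eventually runs past the end of the shortened list.

```python
items = ["2a", "x", "y", "2b"]
try:
    for k in range(len(items)):      # range is fixed at the original length, 4
        if items[k][0] != "2":
            items[k:k + 1] = []      # deletes items[k]; the list gets shorter
except IndexError as exc:
    print("IndexError:", exc)        # IndexError: list index out of range

# "y" slid into index 1 after the deletion and was never examined.
print(items)                         # ['2a', 'y', '2b']
```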

The expression lst[k][0] raises an IndexError, which means that either:
# (1) this expression raises it
x = lst[k]
# or (2) this expression raises it
x[0]
If (1) raises it, it means len(lst) <= k, i.e. there are fewer items than you expect.
If (2) raises it, it means x is an empty string, which means you can't access its item at index 0.
Either way, instead of guessing, use pdb. Run your program using pdb, and at the point your script aborts, examine the values of lst, k, lst[k], and lst[k][0].

Basically, your list, 'lst', starts out at length 43. The slice assignment lst[k:k+1] = '' doesn't replace the value at that index with ''; it removes that entry from the list. If you assigned to lst[k:k+5], you would remove five entries. Try it in the interpreter.
I'd recommend you don't remove entries from the list you are iterating over. It shrinks as you go, which means the index goes out of range and you get an IndexError. If you need to drop the lines that don't begin with "2", store the values you want in another list instead.
List comprehensions work great in this case...
mynewlist = [x for x in lst if x[0] == '2']
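If the goal is to keep whole data rows rather than whitespace-separated tokens (an assumption; the question's code splits on whitespace), the same filtering idea can be combined with the standard csv module. The sketch below uses io.StringIO with a few lines copied from the question's sample as a stand-in for today.csv.

```python
import csv
import io

# Stand-in for today.csv, copied from the sample shown in the question.
sample = io.StringIO(
    "Date,Time,PM2.5 Mass concentration(ug/m3),Status\n"
    "3/15/2014,4:49:13 PM,START\n"
    "2014/03/15,16:49,0.5,0\n"
    "3/15/2014,4:49:45 PM,START\n"
    "2014/03/15,16:50,5.3,0\n"
)

# Keep only non-empty rows whose first field starts with "2".
rows = [row for row in csv.reader(sample) if row and row[0].startswith("2")]
print(rows)
```

With a real file, open('today.csv', newline='') would take the place of the StringIO object.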
