I want to write a program or function that compresses ASCII art from a text file using RLE with a two-digit run length. It counts each run of identical characters and writes the count as two digits followed by the character, like so: 04662312x52c02z01 03a (just an example). So each group is two digits followed by one character.
from collections import OrderedDict
def runLengthEncoding(input):
    counts = OrderedDict.fromkeys(input, 0)
    for ch in input:
        counts[ch] += 1
    output = ''
    for key, value in counts.items():
        output = output + key + str(value)
    return output
I've tried this code, but it doesn't work when the art contains digit characters (it reads 53405211c as "53405211", "c" instead of "53", "4", "05", "2", "11", "c").
I'd appreciate it if anyone could simplify this.
I don't really see how your code is supposed to split the string into groups of 2+1 characters, but anyway, using a dict or OrderedDict would not work, as even if ordered, a dict can hold any key at most once, i.e. it could not represent an encoded string like 01a01b01a. Instead, you should create a list of tuples, and you can do so using just string slicing and a range with step=3:
def runLengthEncoding(s):
    return [(int(s[i:i+2]), s[i+2]) for i in range(0, len(s), 3)]
>>> runLengthEncoding("04662312x52c02z01 03a")
[(4, '6'), (62, '3'), (12, 'x'), (52, 'c'), (2, 'z'), (1, ' '), (3, 'a')]
It is not really clear from your question whether the function is supposed to encode or decode the strings, but judging from your final sentence, I assume you want to decode them. The other direction can easily be done with itertools.groupby and str.join:
# RLE -> Text
s = "04662312x52c02z01 03a"
pairs = [(int(s[i:i+2]), s[i+2]) for i in range(0, len(s), 3)]
# [(4, '6'), (62, '3'), (12, 'x'), (52, 'c'), (2, 'z'), (1, ' '), (3, 'a')]
text = ''.join(n * c for n, c in pairs)
# '666633333333333333333333333333333333333333333333333333333333333333xxxxxxxxxxxxcccccccccccccccccccccccccccccccccccccccccccccccccccczz aaa'
# Text -> RLE
from itertools import groupby
pairs = [(len(list(g)), k) for k, g in groupby(text)]
# [(4, '6'), (62, '3'), (12, 'x'), (52, 'c'), (2, 'z'), (1, ' '), (3, 'a')]
s = ''.join("%02d%s" % (n, c) for n, c in pairs)
# '04662312x52c02z01 03a'
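One gap worth flagging: with a fixed two-digit count, a run longer than 99 characters (plausible in wide ASCII art) cannot be written as a single group, and "%02d" would silently produce three digits. Below is a minimal sketch, assuming long runs should simply be split into several groups of at most 99. The names rle_encode, rle_decode and MAX_RUN are hypothetical, not from the answer above. Newlines are treated as ordinary characters, so a whole file can be encoded in one go.

```python
from itertools import groupby

MAX_RUN = 99  # largest count that fits in two decimal digits


def rle_encode(text):
    """Encode text as <2-digit count><char> groups, splitting runs longer than 99."""
    parts = []
    for ch, group in groupby(text):
        n = len(list(group))
        while n > 0:
            chunk = min(n, MAX_RUN)
            parts.append("%02d%s" % (chunk, ch))
            n -= chunk
    return "".join(parts)


def rle_decode(s):
    """Decode a string of <2-digit count><char> groups back into text."""
    if len(s) % 3 != 0:
        raise ValueError("encoded length must be a multiple of 3")
    return "".join(int(s[i:i+2]) * s[i+2] for i in range(0, len(s), 3))


print(rle_encode("x" * 150))  # 99x51x
```

For a file, something like rle_encode(open("art.txt").read()) would work, where art.txt is only a placeholder name.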
I want to find the most occurring substring in a CSV row either by itself, or by using a list of keywords for lookup.
Using the responses below, I've found a way to get the top 5 most frequent words in each row of a CSV file in Python, but that doesn't serve my purpose. It gives me results like:
[(' Trojan.PowerShell.LNK.Gen.2', 3),
(' Suspicious ZIP!lnk', 2),
(' HEUR:Trojan-Downloader.WinLNK.Powedon.a', 2),
(' TROJ_FR.8D496570', 2),
('Trojan.PowerShell.LNK.Gen.2', 1),
(' Trojan.PowerShell.LNK.Gen.2 (B)', 1),
(' Win32.Trojan-downloader.Powedon.Lrsa', 1),
(' PowerShell.DownLoader.466', 1),
(' malware (ai score=86)', 1),
(' Probably LNKScript', 1),
(' virus.lnk.powershell.a', 1),
(' Troj/LnkPS-A', 1),
(' Trojan.LNK', 1)]
Whereas, I would want something like 'Trojan', 'Downloader', 'Powershell' ... as the top results.
The matching words can be a substring of a value (cell) in the CSV or a combination of two or more words. Can someone help me do this, either with a keyword list or without one?
Thanks!
Let my_values = ['A', 'B', 'C', 'A', 'Z', 'Z', 'X', 'A', 'X', 'H', 'D', 'A', 'S', 'A', 'Z'] be the list of words you want to count.
Now create a dictionary that will store the number of occurrences of each word:
count_dict = {}
Populate the dictionary:
for i in my_values:
    if i not in count_dict:  # first occurrence of this value
        count_dict[i] = 1
    else:
        count_dict[i] = count_dict[i] + 1  # seen before, so increment its count
Now sort the dictionary items by their number of occurrences:
import operator
sorted_items = sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True)
Now you have your expected results!
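For comparison, the counting and sorting steps above can also be done with collections.Counter, whose most_common() method returns (value, count) pairs from most to least frequent. Here is a minimal sketch using the same my_values list:

```python
from collections import Counter

my_values = ['A', 'B', 'C', 'A', 'Z', 'Z', 'X', 'A', 'X', 'H', 'D', 'A', 'S', 'A', 'Z']

count_dict = Counter(my_values)          # value -> number of occurrences
sorted_items = count_dict.most_common()  # sorted by count, highest first
print(sorted_items[:3])  # [('A', 5), ('Z', 3), ('X', 2)]
```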
The 3 most frequent values are:
print(sorted_items[:3])
Output:
[('A', 5), ('Z', 3), ('X', 2)]
The 2 most frequent values are:
print(sorted_items[:2])
Output:
[('A', 5), ('Z', 3)]
and so on.
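Counting whole cells, as above, is what produced the question's original output. To get 'Trojan', 'Downloader', 'PowerShell' as results, each cell has to be split into words first. Below is a minimal sketch, assuming a "word" is a run of letters (so separators like '.', '-', ':' and digits are dropped) and that case should be ignored. top_tokens and keyword_hits are hypothetical helper names. keyword_hits covers the keyword-list variant by case-insensitive substring matching, so a keyword like 'LNK' also matches 'WinLNK'.

```python
import re
from collections import Counter


def top_tokens(cells, n=5):
    """Split every cell into lowercase letter-only words and return the n most common."""
    counts = Counter()
    for cell in cells:
        counts.update(re.findall(r"[a-z]+", cell.lower()))
    return counts.most_common(n)


def keyword_hits(cells, keywords):
    """Count how many cells contain each keyword as a case-insensitive substring."""
    counts = Counter()
    for cell in cells:
        lowered = cell.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1
    return counts.most_common()


cells = [
    ' Trojan.PowerShell.LNK.Gen.2',
    ' HEUR:Trojan-Downloader.WinLNK.Powedon.a',
    ' PowerShell.DownLoader.466',
]
print(top_tokens(cells, n=3))
print(keyword_hits(cells, ['Trojan', 'Downloader', 'PowerShell']))
```

With the csv module, cells would be one row as returned by csv.reader.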
I'm new to programming and am having some trouble with this exercise. The goal is to write a function that returns a list of matching items.
Items are defined by a tuple with a letter and a number and we consider item 1 to match item 2 if:
Both their letters are vowels (aeiou), or both are consonants
AND
The sum of their numbers is a multiple of 3
NOTE: The return list should not include duplicate matches --> (1,2) contains the same information as (2,1), the output list should only contain one of them.
Here's an example:
Input: [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f', 6)]
Output: [(0, 4), (1, 2), (3, 5)]
Any help would be much appreciated!
from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
vowels = 'aeiou'
matched = [(i[0],j[0]) for (i,j) in combinations(enumerate(lst),2) if (i[1][0] in vowels) == (j[1][0] in vowels) and ((i[1][1] + j[1][1]) % 3 == 0)]
print(matched)
Sorry, I don't have enough reputation to comment yet; I'll edit/update once I can.
I'm a little confused about the question: what is the purpose of the letters? Should we use their position in the alphabet as their value, i.e. a=0, b=1?
What are we comparing each tuple to?
Thanks
You can use itertools.combinations with enumerate to iterate over all pairs and output their indices. combinations yields each unordered pair only once, in index order, so you will never get both (1, 2) and (2, 1).
from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
def checker(lst):
    vowels = set('aeiou')
    for (idx_i, i), (idx_j, j) in combinations(enumerate(lst), 2):
        if ((i[0] in vowels) == (j[0] in vowels)) and ((i[1] + j[1]) % 3 == 0):
            yield idx_i, idx_j
res = list(checker(lst))
# [(0, 4), (1, 2), (3, 5)]
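For readers new to itertools, the same pairing can be written with two plain loops. Starting the inner index at i + 1 is what keeps (1, 2) and (2, 1) from both appearing. Here is a minimal sketch equivalent to checker above; the name matching_pairs is hypothetical:

```python
def matching_pairs(items):
    """Return index pairs (i, j) with i < j whose letters are both vowels or both
    consonants and whose numbers sum to a multiple of 3."""
    vowels = set('aeiou')
    result = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):  # j > i, so each pair is visited once
            (letter_i, num_i), (letter_j, num_j) = items[i], items[j]
            same_kind = (letter_i in vowels) == (letter_j in vowels)
            if same_kind and (num_i + num_j) % 3 == 0:
                result.append((i, j))
    return result


print(matching_pairs([('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f', 6)]))
# [(0, 4), (1, 2), (3, 5)]
```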
I'm trying to create a program without importing anything. The program lets the user input a passage, then prints how many A's there are in the message, how many B's, etc.
So it works...it's just VERY long. I'm new to coding, and I know that there is a way to simplify the code below with def but I'm not really sure how. Can anyone help?
You don't need any functions, but you can definitely shorten it:
A string can be iterated over as a sequence of characters.
You can use the index method to find the position of a letter in the alphabet.
You can iterate over a zipped list of pairs from the alphabet and the counter list to produce the output.
Use if letter in alphabet as a guard so that only valid letters are counted, instead of hard-coding a check for each letter. That way you can even extend the alphabet (note that the counter list is sized to the length of the alphabet).
Here is a suggestion:
message = input('what is your message? ').upper()
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
counter = [0] * len(alphabet)
for letter in message:
    if letter in alphabet:
        counter[alphabet.index(letter)] += 1
for letter, count in zip(alphabet, counter):
    print(letter, ':', count)
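Since the question asks how def could help, the same idea can be wrapped in functions, still without any imports. Below is a minimal sketch; count_letters and print_counts are hypothetical names. dict.fromkeys is used so that letters that never appear still report 0, and because dicts keep insertion order from Python 3.7 on, the letters print from A to Z.

```python
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def count_letters(message, alphabet=ALPHABET):
    """Return a dict mapping each letter of alphabet to its count in message."""
    counts = dict.fromkeys(alphabet, 0)  # every letter starts at zero
    for letter in message.upper():
        if letter in counts:  # skip spaces, digits and punctuation
            counts[letter] += 1
    return counts


def print_counts(counts):
    """Print one 'letter : count' line per letter."""
    for letter, count in counts.items():
        print(letter, ':', count)
```

Usage: print_counts(count_letters(input('what is your message? '))).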
This can also be done in a single statement, making use of:
the count method of str, which returns the number of occurrences of a substring in a string
the chr function, which converts an integer code point to a character: chr(65) gives 'A', chr(66) gives 'B', ...
the join method of str, which concatenates the strings in a list
The result looks like
message = input('what is your message? ').upper()
print('\n'.join([chr(65+i)+':'+str(message.count(chr(65+i))) for i in range(26)]))
For a very short and elegant solution, use the Counter class from the collections module:
from collections import Counter
message=raw_input("what is your message?")
message=message.upper()
c = Counter(message)
print c.most_common()
This counts every character in the message, including spaces and punctuation, and most_common() returns the counts sorted from most to least frequent. Here is a sample dialog:
what is your message?Hi there, new Pythonist!
[(' ', 3), ('E', 3), ('H', 3), ('T', 3), ('I', 2), ('N', 2), ('!', 1), (',', 1), ('O', 1), ('P', 1), ('S', 1), ('R', 1), ('W', 1), ('Y', 1)]
I'm writing code to count letters, and I need the output to be
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
but my output is
[('e', 1), ('g', 2), ('g', 2), ('l', 1), ('o', 2), ('o', 2)]
This is my code
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    return trans
Can anyone explain how to get the expected output with my code?
Thank you
Why not just use a Counter?
Example:
from collections import Counter
c = Counter("Foobar")
print sorted(c.items())
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Another way is to use a dict, or better, a defaultdict (useful when running Python 2.6 or lower, since Counter was added in Python 2.7).
Example:
from collections import defaultdict
def countLetters(word):
    d = defaultdict(int)  # missing keys start at 0
    for j in word:
        d[j] += 1
    return sorted(d.items())
print countLetters("Foobar")
Output:
[('F', 1), ('a', 1), ('b', 1), ('o', 2), ('r', 1)]
Or use a generator expression with str.count:
word = "Foobar"
print sorted((letter, word.count(letter)) for letter in set(word))
>>> from collections import Counter
>>> Counter('google')
Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})
>>> from operator import itemgetter
>>> sorted(Counter('google').items(), key=itemgetter(0))
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
>>>
Actually, there is no need for key:
>>> sorted(Counter('google').items())
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
This works because tuples are compared item by item: first by the first element, then by the second, and so on.
def countLetters(word):
    seen = []   # distinct letters, in order of first appearance
    pairs = []  # (letter, count) tuples
    for letter in word:
        if letter not in seen:
            seen.append(letter)
    for letter in seen:
        pairs.append((letter, word.count(letter)))
    return sorted(pairs)
This is a rather old-fashioned way of doing it; you can use Counter, as in the answers above, to make life easier.
You can modify your code like this (Python 2.5+):
def countLetters(word):
    word=list(word)
    word.sort()
    trans=[]
    for j in word:
        row=[]
        a=word.count(j)
        row.append(j)
        row.append(a)
        trans.append(tuple(row))
    ans = list(set(trans))  # drop the duplicate tuples
    ans.sort()              # sets are unordered, so sort again
    return ans
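The problem is that your j loop runs once for every occurrence of a letter, so a repeated letter such as 'g' produces the same tuple twice.
I think a quick fix is to iterate with for j in set(word) instead.
This ensures each letter is visited only once (sort the result afterwards, since a set has no order).
Alternatively, remove the duplicates from the finished list:
trans = sorted(set(trans))
Converting a list to a set removes duplicates (which I think is what you want), and sorted() restores the alphabetical order that the set loses.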
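Putting the set(word) fix into the question's code gives a minimal sketch like the following. It keeps the asker's countLetters name and word.count approach, and sorts the set so the letters come out alphabetically:

```python
def countLetters(word):
    """Return (letter, count) tuples, one per distinct letter, sorted by letter."""
    trans = []
    for j in sorted(set(word)):  # each distinct letter exactly once, in order
        trans.append((j, word.count(j)))
    return trans


print(countLetters('google'))  # [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
```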