Getting ord() to read from a file - python

My code counts how many times each letter appears and tallies it against the respective letter. So if A appears twice, it will show 2:A. My problem is that I want it to read from a file, and when ord() tries to, it can't. I don't know how to work around this.
t=open('lettersTEst.txt','r')
tList=[0]*26
aL=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
idx=0
for char in t:
    ch=ord(char)
    if ch >=65 and ch <= 90:
        pos=int(ch)-65
        tList[pos]+=1
for ele in tList:
    print(idx, ": ", tList[ch])
    idx+=1

When you iterate over a file you get lines. If you want characters you need to iterate over each line as well.
for line in t:
    for char in line:
        ch = ord(char)
        ...
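A minimal sketch of the nested loop applied to the counting task in the question. The helper name count_uppercase is hypothetical, and io.StringIO stands in for the opened file so the example runs without lettersTEst.txt on disk:

```python
import io

def count_uppercase(lines):
    """Count occurrences of A-Z across an iterable of lines."""
    counts = [0] * 26
    for line in lines:          # iterating a file yields lines
        for char in line:       # iterating a line yields characters
            code = ord(char)
            if ord('A') <= code <= ord('Z'):
                counts[code - ord('A')] += 1
    return counts

sample = io.StringIO("ABBA\nCab\n")  # stand-in for open('lettersTEst.txt')
counts = count_uppercase(sample)
for idx, total in enumerate(counts):
    if total:
        print(chr(ord('A') + idx), ":", total)
```

Only uppercase letters are counted here because the original code tests for codes 65 to 90; lowercase letters and newlines are ignored.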

You need to loop over the individual characters of each line of the file, and you could use a Counter instead of a list.
If you want uppercase characters only, add an if char.isupper() check before you add to the Counter.
Example:
>>> from collections import Counter
>>> c = Counter()
>>> with open('lettersTEst.txt') as f:
...     for line in f:
...         for char in line:
...             c[char] += 1
...
>>> for k,v in c.items():
...     print('{}:{}'.format(k,v))
...
a:2
:4
e:1
g:1
i:3
h:1
m:1
l:1
n:1
p:1
s:4
r:1
t:2

While I prefer @JohnKugelman's answer over my own, I'd like to show two alternative ways of iterating over every character of a file in a single for loop.
The first uses the two-argument form of iter, which takes a callable (read one character) and a sentinel (keep calling the function until it returns this value). In this case I'd use functools.partial to make the function that reads one character at a time:
import functools
read_a_byte = functools.partial(t.read, 1)
for char in iter(read_a_byte,''):
    ch = ord(char)
    ...
The second is frequently used to flatten two dimensional lists, itertools.chain.from_iterable takes something that is iterated over (the file) and chains each generated value (each line) together in iteration.
import itertools
char_iterator = itertools.chain.from_iterable(t)
for char in char_iterator:
    ch = ord(char)
    ...
Then you could pass either to collections.Counter to construct a basic counter but it wouldn't follow the same logic you have applied with ord:
import collections, functools, pprint
read_a_byte = functools.partial(t.read, 1)
c = collections.Counter(iter(read_a_byte,''))
pprint.pprint(dict(c))
{'a': 8,
'b': 2,
'c': 9,
'd': 4,
'e': 11,
...}
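A self-contained sketch comparing the two single-loop approaches. The sample text is made up, and io.StringIO stands in for the file object t so nothing needs to exist on disk; both iterators should feed collections.Counter the same characters:

```python
import collections
import functools
import io
import itertools

text = "Hello\nWorld\n"  # made-up sample content

# Two-argument iter(): call read(1) until it returns the sentinel ''.
stream = io.StringIO(text)
by_read = collections.Counter(iter(functools.partial(stream.read, 1), ''))

# chain.from_iterable: flatten the lines of the file into characters.
stream = io.StringIO(text)
by_chain = collections.Counter(itertools.chain.from_iterable(stream))

print(by_read == by_chain)
print(by_read['l'], by_read['\n'])
```

Note that a fresh stream is created for the second pass, since the first pass consumes the file to the end.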

Related

Loop over string based on keys in dictionary

Instead of looping over each separate character of a string, I want to loop over parts of a string (multiple characters). Those parts are defined by the keys of a dictionary.
Example:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
output = ""
What I've tried (looping over each character separately, indeed):
for i in word:
    letter = my_dict[i]
    output += letter
    word = word.lstrip(letter)
My output:
"KeyError: '1'"
But I want to get key "1000" and its value "i", and then continue with key "0010" and get its value "n", etc...
Expected output:
# Expected output:
output = "internet"
Assuming it's a prefix code (otherwise you'd need to define how to deal with ambiguities), accumulate the bits until you have a match, then output the letter and clear the bits:
output = ""
bits = ""
for bit in word:
    bits += bit
    if bits in my_dict:
        letter = my_dict[bits]
        output += letter
        bits = ""
A slight variation on the lookup, prompted by Jnevill's answer:
if letter := my_dict.get(bits):
    output += letter
    bits = ""
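The accumulate-until-match idea can be wrapped in a function, sketched here under the same prefix-code assumption. The name decode_prefix_code and the error raised on leftover bits are illustrative choices, not part of the answer:

```python
def decode_prefix_code(bits, table):
    """Decode a bit string using a prefix-free code table."""
    output = []
    buffer = ""
    for bit in bits:
        buffer += bit
        letter = table.get(buffer)
        if letter is not None:      # the accumulated bits form a complete code
            output.append(letter)
            buffer = ""
    if buffer:                      # bits left over that match no code
        raise ValueError(f"undecodable trailing bits: {buffer!r}")
    return "".join(output)

my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i',
           '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l',
           '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
decoded = decode_prefix_code(word, my_dict)
print(decoded)  # internet
```

Collecting letters in a list and joining once at the end avoids repeated string concatenation on long inputs.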
You could use a regular expression to substitute the patterns with the corresponding letters. re.sub accepts a function as the replacement, which can look up the matched bits in the dictionary. The search pattern needs to list the longer keys first so that they are "consumed" in priority over shorter patterns that start with the same bits:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
import re
pattern = "|".join(sorted(my_dict.keys(),key=len,reverse=True))
output = re.sub(pattern,lambda m:my_dict[m.group(0)],word)
print(output) # internet
[EDIT]
If there are no conflicts between short and long bit patterns, the sort is not needed (as Kelly pointed out), the solution could be a single line:
output = re.sub('|'.join(my_dict),lambda m:my_dict[m[0]],word)
Issue with your code:
for i in word: # here, i is a single character
    # so you can't get corresponding value since it's multiple character keys
    letter = my_dict[i]
    output += letter # this would work fine
    word = word.lstrip(letter)
You can do a while loop on word, and remove the part you found in the dict each time. When word is empty, the loop stops and the program ends.
You can iterate over each key in the dict and test whether it matches the beginning of the word. If it does, you have the letter you are looking for. Do what you want instead of the print, and repeat.
translate_table = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
message = "1000001001100001100000100000110"
while message:
    for code, letter in translate_table.items():
        if message.startswith(code):
            # replace this with whatever you want to do with the letter
            print(letter, end="")
            # "Cut" the word to keep the remaining characters
            message = message[len(code):]
            break # a letter was found, move to next while iteration
While iterating my_dict (as DorianTurba suggests) feels like a more elegant solution, your gut was suggesting that you should iterate word. To do this you can use a while loop and then manage the length of characters you jump in each iteration depending on the size of the my_dict key that matches the first 3, 4, or 5 characters in word.
Consider:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
i=0
while len(word) > i:
    for size in [3,4,5]:
        if my_dict.get(word[i:i+size]):
            print(my_dict[word[i:i+size]])
            i += size
            break

Cannot find glitch in program using recursion for multiple nested for-loops

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w', 'x', 'y', 'z']
endlist = []
def loopfunc(n, lis):
    if n ==0:
        endlist.append(lis[0]+lis[1]+lis[2]+lis[3]+lis[4])
    for i in alphabet:
        if n >0:
            lis.append(i)
            loopfunc(n-1, lis )
loopfunc(5, [])
This program is supposed to make endlist be:
endlist = [aaaaa, aaaab, aaaac, ... zzzzy, zzzzz]
But it makes it:
endlist = [aaaaa, aaaaa, aaaaa, ... , aaaaa]
The length is right, but it won't make different words. Can anyone help me see why?
The only thing you ever add to endlist is the first 5 elements of lis, and since you have a single lis that is shared among all the recursive calls (note that you never create a new list in this code other than the initial values for endlist and lis, so every append to lis is happening to the same list), those first 5 elements are always the a values that you appended in your first 5 recursive calls. The rest of the alphabet goes onto the end of lis and is never reached by any of your other code.
Since you want strings in the end, it's a little easier just to use strings for collecting your items. This avoids the shared mutable reference that is causing your issue. With that, the recursion becomes pretty concise:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def loopfunc(n, lis=""):
    if n < 1:
        return [lis]
    res = []
    for a in alphabet:
        res.extend(loopfunc(n-1, lis + a))
    return res
l = loopfunc(5)
print(l[0], l[1], l[-1], l[-2])
# aaaaa aaaab zzzzz zzzzy
Note that with n=5 you'll have almost 12 million combinations. If you plan on having larger n values, it may be worth rewriting this as a generator.
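One possible generator rewrite, as a sketch only. The name iter_words is hypothetical; it yields each string lazily instead of building the full list, so only the strings actually consumed are ever created:

```python
from itertools import islice

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def iter_words(n, prefix=""):
    """Lazily yield every string of length n over alphabet."""
    if n < 1:
        yield prefix
        return
    for letter in alphabet:
        # each recursive call builds a new string, so nothing is shared
        yield from iter_words(n - 1, prefix + letter)

first_three = list(islice(iter_words(5), 3))
print(first_three)  # ['aaaaa', 'aaaab', 'aaaac']
```

islice is used here only to peek at the first few results without generating all 26**5 of them.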

Split a string in Python having parenthesis (multiple splitters)

I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need to get the elements outside the parenthesis in separate rows and the ones inside it in separate ones.
I am not able to think of any solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
from itertools import takewhile
s = "ab(abcds)kadf(sd)k(afsd)(lbne)"
it = iter(s)
print([ch if ch != "(" else "".join(takewhile(lambda x: x!= ")",it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to ( we just take the char else if ch is a ( we use takewhile which will keep taking chars until we hit a ) .
Or use re.findall (after import re) to capture everything between parentheses with \((.+?)\) and every other character with (\w):
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of 're.split' and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]',string)
#x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
for i in x:
    if ')' not in i:
        temp.extend(list(i))
    else:
        t = re.split(r'[)]',i)
        temp.append(t[0])
        temp.extend(list(t[1]))
print(temp)
#temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Have a look at difference in append and extend here.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
my_list=[]
in_parens=False
buffer=''
for char in my_string:
    if char =='(':
        in_parens=True
    elif char==')':
        in_parens = False
        my_list.append(buffer)
        buffer=''
    elif in_parens:
        buffer+=char
    else:
        my_list.append(char)
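The character-by-character loop can be packaged as a function, sketched below. The name split_with_parens is hypothetical, and the sketch assumes the parentheses are balanced and not nested, as in the question's example:

```python
def split_with_parens(text):
    """Split text into single characters, keeping each (...) group whole."""
    parts = []
    buffer = []
    in_parens = False
    for char in text:
        if char == '(':
            in_parens = True
        elif char == ')':
            in_parens = False
            parts.append(''.join(buffer))  # emit the finished group
            buffer = []
        elif in_parens:
            buffer.append(char)            # still inside a group
        else:
            parts.append(char)             # outside: one entry per character
    return parts

result = split_with_parens("ab(abcds)kadf(sd)k(afsd)(lbne)")
print(result)
```

An empty pair of parentheses would produce an empty string in the result; whether that is desirable depends on the input data.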
The other option is regex.
I would suggest regex. It is worth practicing.
Try: Python re. If you are new to re it may take a bit of time but you can do all kind of string manipulations once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)') # Match single character or characters in parenthesis
print([x if x else y for x,y in re_pattern.findall(search_string)])

Make a script in python that lists adjacent words through Unix?

How can I write a script in python through nested dictionaries that takes a txt file written as,
white,black,green,purple,lavendar:1
red,black,white,silver:3
black,white,magenta,scarlet:4
and make it print for each entry before the : character, all neighbors it showed up next to
white: black silver magenta
black: white green red
green: black purple
and so on
Edit: Well, I didn't post what I have because it is rather unsubstantial...I'll update it if I figure out anything else... I just have been stuck for a while -
all I have figured out how to do is post each word/letter on a separate line with:
from sys import argv
script,filename=argv
txt=open(filename)
for line in txt:
    line=line[0:line.index(':')]
    for word in line.split(","):
        print(word)
I guess what I want is to have some kind of for loop that runs through each word, if the word is not in an original dictionary, I'll add it to it, then I'll search through for words that appear next to it in the file.
Input
a,c,f,g,hi,lw:1
f,g,j,ew,f,h,a,w:3
fd,s,f,g,s:4
Code
neighbours = {}
for line in open('4-input.txt'):
    line = line.strip()
    if not line:
        continue # skip empty input lines
    line = line[:line.index(':')] # take everything left of ':'
    previous_token = ''
    for token in line.split(','):
        if previous_token:
            neighbours.setdefault(previous_token, []).append(token)
            neighbours.setdefault(token, []).append(previous_token)
        previous_token = token
import pprint
pprint.pprint(neighbours)
Output
{'a': ['c', 'h', 'w'],
'c': ['a', 'f'],
'ew': ['j', 'f'],
'f': ['c', 'g', 'g', 'ew', 'h', 's', 'g'],
'fd': ['s'],
'g': ['f', 'hi', 'f', 'j', 'f', 's'],
'h': ['f', 'a'],
'hi': ['g', 'lw'],
'j': ['g', 'ew'],
'lw': ['hi'],
's': ['fd', 'f', 'g'],
'w': ['a']}
Tidying up the prettyprinted dictionary is left as an exercise for the reader. (Because dictionaries are inherently not sorted into any order, and removing the duplicates without changing the ordering of the lists is also annoying).
Easy solution:
for word, neighbour_list in neighbours.items():
    print(word, ':', ', '.join(set(neighbour_list)))
But that does change the ordering.
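One way to drop duplicates while keeping first-seen order is to store each neighbour as a dict key, since dicts preserve insertion order in Python 3.7+. This is a sketch rather than part of the answer above; the name build_neighbours is hypothetical and the sample lines are the ones from the question:

```python
def build_neighbours(lines):
    """Map each token to its neighbours, de-duplicated in first-seen order."""
    neighbours = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty input lines
        tokens = line.split(':')[0].split(',')
        for left, right in zip(tokens, tokens[1:]):
            # dict keys act as an insertion-ordered set
            neighbours.setdefault(left, {})[right] = None
            neighbours.setdefault(right, {})[left] = None
    return {token: list(seen) for token, seen in neighbours.items()}

sample = [
    "white,black,green,purple,lavendar:1",
    "red,black,white,silver:3",
    "black,white,magenta,scarlet:4",
]
result = build_neighbours(sample)
for token, adjacent in result.items():
    print(token + ':', ' '.join(adjacent))
```

For this input the first lines printed match the expected output in the question (white: black silver magenta, and so on).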
Here you go:
from collections import defaultdict
char_map = defaultdict(set)
with open('input', 'r') as input_file:
    for line in input_file:
        a_list, _ = line.split(':') # Discard the stuff after the :
        chars = a_list.split(',') # Get the elements before : as a list
        for char, next_char in zip(chars, chars[1:]): # For every adjacent pair, record
                                                      # each element as a neighbour
                                                      # of the other
            char_map[char].add(next_char)
            char_map[next_char].add(char)
print(char_map)
def parse(input_file):
    char_neighbours = {}
    with open(input_file) as File:
        for line in File:
            line = line.strip().split(':')[0]
            if line != "":
                csv_list=line.split(',')
                for i in range(len(csv_list)):  # visit every entry, including the last
                    if csv_list[i] not in char_neighbours:
                        char_neighbours[csv_list[i]] = []
                    if i < len(csv_list)-1:  # has a right-hand neighbour
                        if csv_list[i+1] not in char_neighbours[csv_list[i]]:
                            char_neighbours[csv_list[i]].append(csv_list[i+1])
                    if i > 0:  # has a left-hand neighbour
                        if csv_list[i-1] not in char_neighbours[csv_list[i]]:
                            char_neighbours[csv_list[i]].append(csv_list[i-1])
    return char_neighbours
if __name__ == "__main__":
    dictionary=parse('test.txt')
    print(dictionary)
The parse function returns a dictionary that maps each string to a list of its neighbours.

What is the best way to generate all possible three letter strings?

I am generating all possible three letters keywords e.g. aaa, aab, aac.... zzy, zzz below is my code:
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
keywords = []
for alpha1 in alphabets:
    for alpha2 in alphabets:
        for alpha3 in alphabets:
            keywords.append(alpha1+alpha2+alpha3)
Can this functionality be achieved in a more sleek and efficient way?
keywords = itertools.product(alphabets, repeat = 3)
See the documentation for itertools.product. If you need a list of strings, just use
keywords = [''.join(i) for i in itertools.product(alphabets, repeat = 3)]
alphabets also doesn't need to be a list, it can just be a string, for example:
from itertools import product
from string import ascii_lowercase
keywords = [''.join(i) for i in product(ascii_lowercase, repeat = 3)]
will work if you just want the lowercase ascii letters.
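As a quick sanity check on the itertools.product approach, this short sketch builds the list and inspects its size and endpoints (the variable names are illustrative):

```python
from itertools import product
from string import ascii_lowercase

# product(..., repeat=3) yields tuples such as ('a', 'a', 'b'); join each into a string
keywords = [''.join(letters) for letters in product(ascii_lowercase, repeat=3)]

print(len(keywords))               # 26 ** 3 entries
print(keywords[:3], keywords[-1])
```

The results come out in the same order as the three nested loops in the question, with the last letter varying fastest.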
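You could also use map instead of the list comprehension (this is one of the cases where map is still faster than the LC). Note that in Python 3 map returns a lazy iterator, so wrap it in list() if you need an actual list.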
>>> from itertools import product
>>> from string import ascii_lowercase
>>> keywords = map(''.join, product(ascii_lowercase, repeat=3))
This variation of the list comprehension is also faster than using ''.join
>>> keywords = [a+b+c for a,b,c in product(ascii_lowercase, repeat=3)]
from itertools import combinations_with_replacement
alphabets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
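# Note: this yields only letters in non-decreasing order (aaa, aab, ..., zzz), so strings such as 'aba' are skipped
for (a,b,c) in combinations_with_replacement(alphabets, 3):
    print(a+b+c)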
You can also do this without any external modules by using simple index arithmetic.
The PermutationIterator is what you are searching for.
def permutation_atindex(_int, _set, length):
    """
    Return the permutation at index '_int' for itemgetter '_set'
    with length 'length'.
    """
    items = []
    strLength = len(_set)
    index = _int % strLength
    items.append(_set[index])
    for n in range(1,length, 1):
        _int //= strLength
        index = _int % strLength
        items.append(_set[index])
    return items
class PermutationIterator:
    """
    A class that can iterate over possible permutations
    of the given 'iterable' and 'length' argument.
    """
    def __init__(self, iterable, length):
        self.length = length
        self.current = 0
        self.max = len(iterable) ** length
        self.iterable = iterable
    def __iter__(self):
        return self
    def __next__(self):
        if self.current >= self.max:
            raise StopIteration
        try:
            return permutation_atindex(self.current, self.iterable, self.length)
        finally:
            self.current += 1
Give it an iterable object and an integer as the output-length.
from string import ascii_lowercase
for e in PermutationIterator(ascii_lowercase, 3):
    print("".join(e))
This will start from 'aaa' and end with 'zzz'.
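The same index arithmetic can be sketched as a single function using divmod. The name word_at_index is hypothetical, and unlike the class above this variant reverses the digits so that the last letter varies fastest (aaa, aab, aac, ...):

```python
from string import ascii_lowercase

def word_at_index(index, alphabet, length):
    """Return the index-th string of the given length, most significant letter first."""
    letters = []
    for _ in range(length):
        # peel off one base-len(alphabet) digit per position
        index, remainder = divmod(index, len(alphabet))
        letters.append(alphabet[remainder])
    return ''.join(reversed(letters))

print(word_at_index(0, ascii_lowercase, 3))
print(word_at_index(1, ascii_lowercase, 3))
print(word_at_index(26 ** 3 - 1, ascii_lowercase, 3))
```

This treats the index as a number written in base 26, which is why no itertools import is needed.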
chars = range(ord('a'), ord('z')+1)
print([chr(a) + chr(b) +chr(c) for a in chars for b in chars for c in chars])
We could solve this without the itertools by utilizing two function definitions:
def combos(alphas, k):
    l = len(alphas)
    kRecur(alphas, "", l, k)
def kRecur(alphas, prfx, l, k):
    if k==0:
        print(prfx)
    else:
        for i in range(l):
            newPrfx = prfx + alphas[i]
            kRecur(alphas, newPrfx, l, k-1)
It's done using two functions so that the length of alphas is computed only once; the second function calls itself recursively until k reaches 0, at which point it prints the k-mer built up in prfx.
Adapted from a solution by Abhinav Ramana on Geeks4Geeks
Well, I came up with this solution while thinking about how to cover the topic:
import random
s = "aei"
b = []
length=len(s)
for _ in range(10):
    for _ in range(length):
        password = ("".join(random.sample(s,length)))
        if password not in b:
            b.append(password)
print(b)
print(len(b))
Let me describe what is going on:
Import random.
Create a string with the letters we want to use.
Create an empty list to hold the combinations.
Loop with range (I used 10, but for 3 letters it can be fewer).
Use random.sample with the string and its length to build a random arrangement of the letters, then join it.
Check whether list b already contains that arrangement; if it does, skip it, otherwise append it (we compare the final joined string).
Finally, print list b and the number of arrangements found.
It may not be the clearest or most efficient code, and because it samples at random it is not guaranteed to find every arrangement, but I think it works...
print([a+b+c for a in alphabets for b in alphabets for c in alphabets if a !=b and b!=c and c!= a])
This removes the repetition of characters in one string
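If strings without repeated letters are what is wanted, itertools.permutations expresses the same filter directly. This is a sketch of an equivalent approach, not the code from the answer above:

```python
from itertools import permutations
from string import ascii_lowercase

# permutations never reuses a position of the input, so every letter in a result is distinct
distinct = [''.join(p) for p in permutations(ascii_lowercase, 3)]

print(len(distinct))   # 26 * 25 * 24
print(distinct[0])
```

This relies on the alphabet itself containing no duplicate letters, which holds for ascii_lowercase.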
