Python: Pair alphabets after loop is completed - python

I tried to pair letters of the alphabet like this:
import string
a=string.uppercase
for i in range(0,30):
    print a[i%26]*(i / 26+1)
This prints A-Z, and after Z it prints strings like AA, BB.
But I need the strings after Z to be AA, AB, AC, AD, AE and so on, until the end of the range. In other words, the output should be:
A-Z, then AA AB AC ....

You can take advantage of the itertools module and use a generator to handle this pretty cleanly:
from itertools import count, product, islice
from string import ascii_uppercase
def multiletters(seq):
    for n in count(1):
        for s in product(seq, repeat=n):
            yield ''.join(s)
gives
>>> list(islice(multiletters('ABC'), 20))
['A', 'B', 'C', 'AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC', 'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB']
>>> list(islice(multiletters(ascii_uppercase), 30))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD']
and you can make an object and get them one by one, if you'd prefer:
>>> m = multiletters(ascii_uppercase)
>>> next(m)
'A'
>>> next(m)
'B'
>>> next(m)
'C'
[Update: I should note though that I pass data between Python and Excel all the time -- am about to do so, actually -- and never need this function. But if you have a specific question about the best way to exchange data, it's probably better to ask a separate question than to edit this one now that there are several answers to the current question.]

I think what you are looking for is a nested for loop, like this:
import string
def get_string(val):
    return string.uppercase[val%26]*(val / 26+1)
for i in range(0,26):
    for j in range(0, 26):
        print get_string(i) + get_string(j)
Note that I moved your indexing of string.uppercase into a function (get_string) so that its code is not repeated.

I think what you want is something like this
import string
def get_string(val):
    return string.uppercase[val%26]*(val / 26+1)
for i in range(-1,26):
    for j in range(0, 26):
        if i==-1:
            print get_string(j)
        else:
            print get_string(i) + get_string(j)
On the first pass through the outer loop (i == -1), no leading character is printed, which gives the first 26 Excel columns (A-Z). Each later pass prints a leading letter followed by a second letter (AA-ZZ).
Working example available on ideone.com -> http://ideone.com/M862Ra

Related

How can I reference a string (e.g. 'A') to the index of a larger list (e.g. ['A', 'B', 'C', 'D', ...])?

I have been racking my brain and scouring the internet for some hours now, please help.
Effectively I am trying to create a self-contained function (in python) for producing a caesar cipher. I have a list - 'cache' - of all letters A-Z.
def caesarcipher(text, s):
    global rawmessage  # imports a string input - the 'raw message' which is to be encrypted.
    result = ''
    cache = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
             'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Is it possible to analyze the string input (the 'rawmessage') and map each letter to its corresponding index position in the list 'cache'? For example, if the input was 'AAA', it would be recognised as [0,0,0] in the list 'cache', and if the input was 'ABC', it would be recognised as [0,1,2].
Thank you to anyone who makes the effort to help me out here.
Use a list comprehension:
positions = [cache.index(letter) for letter in rawmessage if letter in cache]
You can do this with a list comprehension. You can also take the alphabet from the string module instead of building the list by hand.
import string
print([string.ascii_uppercase.index(c) for c in "AAA"])
# [0, 0, 0]
print([string.ascii_uppercase.index(c) for c in "ABC"])
# [0, 1, 2]
result = []
for i in list(rawmessage):
    result.append(cache.index(i))
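As a rough sketch of how the index lookup above could feed into the cipher itself: cache.index() scans the list on every call, so one option is to build a letter-to-index dictionary once. The names LETTER_INDEX, letter_positions and caesar_shift are made up for this illustration, and it assumes non-letters are simply dropped, as in the first list comprehension above.

```python
from string import ascii_uppercase

# Build the letter -> index mapping once so each lookup is a dictionary access.
LETTER_INDEX = {letter: i for i, letter in enumerate(ascii_uppercase)}

def letter_positions(text):
    """Return the alphabet index of each letter in text, skipping other characters."""
    return [LETTER_INDEX[ch] for ch in text.upper() if ch in LETTER_INDEX]

def caesar_shift(text, s):
    """Shift each letter by s positions, wrapping around after Z (assumption: non-letters are dropped)."""
    return ''.join(ascii_uppercase[(i + s) % 26] for i in letter_positions(text))

print(letter_positions('ABC'))  # [0, 1, 2]
print(caesar_shift('XYZ', 3))   # ABC
```

Passing the message in as an argument, rather than reading a global, keeps the function self-contained.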

pandas expand alphanumeric characters to iterate

I have a list with alphanumeric characters like as shown below
l1 = ['G1','L1']
I would like to know whether we have something like below
for i in range(l1): #this doesn't work because range is only for numeric values
for i in range(G1:L1): #this also doesn't work
However, I want the value of i to change on each iteration from G1 to H1 to I1 to J1 to K1 to L1.
Range always expects a number and cannot work with strings.
However, you can use the built-in ord() function to convert letters to numbers and then use the chr() function to convert them back from numbers to ASCII characters.
Code
a = [chr(c)+'1' for c in range(ord('G'), ord('M'))]
print(a)
Output
['G1', 'H1', 'I1', 'J1', 'K1', 'L1']
Update: Solution for double characters.
Doing it for double characters is a little more complicated, but this StackOverflow answer has a solution to that. You can use the from_excel() and to_excel() functions from that answer in place of ord() and chr() in the code above, as follows.
Code
a = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(a)
Output
['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD', 'AE', 'AF']
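The from_excel() and to_excel() helpers are not shown in this answer. A minimal sketch of what they might look like, assuming 1-based column numbers and Excel-style labels (1 is 'A', 26 is 'Z', 27 is 'AA'); the implementations below are my own guess at the behaviour, not the code from the linked answer:

```python
from string import ascii_uppercase

def to_excel(n):
    """Convert a 1-based column number to an Excel-style label (assumed convention)."""
    if n < 1:
        raise ValueError("column number must be >= 1")
    letters = []
    while n > 0:
        # Subtract 1 first because the labels have no zero digit (A=1 ... Z=26).
        n, rem = divmod(n - 1, 26)
        letters.append(ascii_uppercase[rem])
    return ''.join(reversed(letters))

def from_excel(label):
    """Convert an Excel-style label back to its 1-based column number."""
    n = 0
    for ch in label.upper():
        n = n * 26 + ascii_uppercase.index(ch) + 1
    return n

labels = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(labels)
```

With these definitions the list runs from 'G' through 'AF', the same as the output shown above.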
You can use:
from openpyxl.utils import coordinate_to_tuple, get_column_letter
def excel_range(start, end):
    t1 = coordinate_to_tuple(start)
    t2 = coordinate_to_tuple(end)
    rows, cols = zip(t1, t2)
    cells = []
    for r in range(rows[0], rows[1]+1):
        for c in range(cols[0], cols[1]+1):
            cells.append(f'{get_column_letter(c)}{r}')
    return cells
cells = excel_range('AA1', 'AC4')
Output:
>>> cells
['AA1',
'AB1',
'AC1',
'AA2',
'AB2',
'AC2',
'AA3',
'AB3',
'AC3',
'AA4',
'AB4',
'AC4']

Joining elements in list of Strings in loop with a condition

Every 2 elements should be joined in a loop until the end of the list.
This is what I have been trying to do:
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
for i in range(len(items)+1):
    items[i]=items[i]+items[i+1]
    i=i+2
print(items)
Expected Output: ['ab' , 'cd' , 'ef' , 'gh' , 'ij']
You can supply another argument to range to specify the increment ("step"):
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
res = []
for i in range(0, len(items), 2):
    res.append(items[i] + items[i + 1])
print(res)
# ['ab', 'cd', 'ef', 'gh', 'ij']
Or, better yet, use a list comprehension instead:
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
items = [items[i] + items[i + 1] for i in range(0, len(items), 2)]
print(items)
# ['ab', 'cd', 'ef', 'gh', 'ij']
You can do it using a list comprehension like this:
items = [items[i] + items[i+1] for i in range(0, len(items), 2)]
A solution with regex:
>>> import re
>>> items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> re.findall('.{1,2}', ''.join(items))
['ab', 'cd', 'ef', 'gh', 'ij']
Your thought process is correct, but I would double-check your print statement. You are printing items, but are updating items[i] to items[i] + items[i+1]. I believe you want to print the variable you are updating.
First of all, note that len() returns the number of items in the list, not the last valid index: because indexing starts at 0, the last valid index is len(items) - 1. The documentation for len() has the details.
Next, you can use the append() method to add an object to the end of a list.
Moreover, in Python you can take advantage of the step value that range accepts: range(start, stop, step).
With that said, I would do something like the following:
output=[]
for i in range(0, len(items), 2):
    output.append(items[i]+items[i+1])
print(output)
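One thing to watch for: the answers above index items[i + 1], which raises an IndexError when the list has an odd number of elements. A small sketch that avoids this by joining a slice instead (the function name join_pairs is made up here, and keeping the leftover element on its own is an assumption about the desired behaviour):

```python
def join_pairs(items):
    """Join consecutive pairs of strings; a trailing unpaired element is kept as-is."""
    # items[i:i + 2] never goes out of range: at the end it is just one element long.
    return [''.join(items[i:i + 2]) for i in range(0, len(items), 2)]

print(join_pairs(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']))
# ['ab', 'cd', 'ef', 'gh', 'ij']
print(join_pairs(['a', 'b', 'c', 'd', 'e']))
# ['ab', 'cd', 'e']
```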

How to print each letter in a string only once

Hello everyone, I have a Python question.
I'm trying to print each letter in the given string only once.
How do I do this using a for loop and sort the letters from a to z?
Here's what I have:
import string
sentence_str = ("No punctuation should be attached to a word in your list, "
                "e.g., end. Not a correct word, but end is.")
letter_str = sentence_str
letter_str = letter_str.lower()
badchar_str = string.punctuation + string.whitespace
Alist = []
for i in badchar_str:
    letter_str = letter_str.replace(i,'')
letter_str = list(letter_str)
letter_str.sort()
for i in letter_str:
    Alist.append(i)
    print(Alist)
Answer I get:
['a']
['a', 'a']
['a', 'a', 'a']
['a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a', 'b']
['a', 'a', 'a', 'a', 'a', 'b', 'b']
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c']....
I need:
['a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'l', 'n', 'o', 'p', 'r', 's', 't', 'u', 'w', 'y']
no errors...
Just check whether the letter is already in your list before appending it, and print once after the loop:
for i in letter_str:
    if i not in Alist:
        Alist.append(i)
print(Alist)
Alternatively, use the set data structure that Python provides instead of a list. Sets do not allow duplicates.
aSet = set(letter_str)
Using itertools.ifilter (Python 2 only; in Python 3 the built-in filter is already lazy), which you could say has an implicit for-loop:
In [20]: a=[i for i in itertools.ifilter(lambda x: x.isalpha(), sentence_str.lower())]
In [21]: set(a)
Out[21]:
set(['a',
'c',
'b',
'e',
'd',
'g',
'i',
'h',
'l',
'o',
'n',
'p',
's',
'r',
'u',
't',
'w',
'y'])
Malvolio correctly states that the answer should be as simple as possible. For that we use Python's set type, which takes care of uniqueness simply and efficiently.
However, his answer does not deal with removing punctuation and whitespace. Furthermore, all the answers, as well as the code in the question, do that rather inefficiently (by looping through badchar_str and calling replace on the original string).
The best (i.e., simplest, most efficient, and most idiomatic) way to find all unique letters in the sentence is this:
import string
sentence_str = ("No punctuation should be attached to a word in your list, "
                "e.g., end. Not a correct word, but end is.")
bad_chars = set(string.punctuation + string.whitespace)
unique_letters = set(sentence_str.lower()) - bad_chars
If you want them to be sorted, simply replace the last line with:
unique_letters = sorted(set(sentence_str.lower()) - bad_chars)
If the order in which you want to print doesn't matter, you can use:
import string
sentence_str = ("No punctuation should be attached to a word in your list, "
                "e.g., end. Not a correct word, but end is.")
letter_str = sentence_str.lower()
badchar_str = string.punctuation + string.whitespace
for i in badchar_str:
    letter_str = letter_str.replace(i,'')
print(set(letter_str))
Or, if you want to print in sorted order, you could convert it back to a list, call sort(), and then print.
First principles, Clarice. Simplicity.
list(set(sentence_str))
You can use set() to remove duplicate characters and sorted() to order them:
import string
sentence_str = "No punctuation should be attached to a word in your list, e.g., end. Not a correct word, but end is."
letter_str = sentence_str
letter_str = letter_str.lower()
badchar_str = string.punctuation + string.whitespace
for i in badchar_str:
    letter_str = letter_str.replace(i,'')
characters = list(letter_str)
print(sorted(set(characters)))
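For reference, a minimal Python 3 sketch that combines the set-based ideas from the answers above. The function name unique_letters is made up for illustration, and it assumes that only the ASCII letters a-z count as letters, so digits and other symbols are dropped along with punctuation and whitespace:

```python
import string

def unique_letters(sentence):
    """Return the sorted unique ASCII letters in sentence, ignoring case."""
    allowed = set(string.ascii_lowercase)
    # Intersect with the allowed letters instead of removing bad characters one by one.
    return sorted(set(sentence.lower()) & allowed)

sentence_str = ("No punctuation should be attached to a word in your list, "
                "e.g., end. Not a correct word, but end is.")
print(unique_letters(sentence_str))
# ['a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'l', 'n', 'o', 'p', 'r', 's', 't', 'u', 'w', 'y']
```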

Is that a tag list or something else?

I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
    # wordsUniqeTags holds a list of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        if wordsUniqeTags.has_key(w):
            wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
        else:
            wordsUniqeTags[w] = set([t])
    # Starting to count
    res = []
    for w in wordsUniqeTags:
        if len(wordsUniqeTags[w]) >= n:
            res.append((w, wordsUniqeTags[w]))
    return res
MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because "who's" and "what's" don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.
It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
    # wordsUniqeTags holds a set of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        wordsUniqeTags.setdefault(w, set()).add(t)
    # Starting to count
    return [(word,tags) for word,tags in wordsUniqeTags.iteritems() if len(tags) >= n]
You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).
How about making use of the Counter and defaultdict functionality in the collections module?
from collections import defaultdict, Counter
def most_ambiguous_words(words, n):
    counts = defaultdict(Counter)
    for (word,tag) in words:
        counts[word][tag] += 1
    # Keep words with at least n distinct tags
    return [(w, counts[w].keys()) for w in counts if len(counts[w]) >= n]
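To see the effect of the fix without loading the Brown corpus, here is a small Python 3 sketch using defaultdict(set). The sample data is invented purely for illustration and does not come from NLTK; with the real corpus you would pass brown.tagged_words() instead:

```python
from collections import defaultdict

def most_ambiguous_words(tagged_words, n):
    """Return (word, tags) pairs for words seen with at least n distinct tags."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        # add() stores the whole tag string; set(tag) would split it into characters.
        tags_by_word[word].add(tag)
    return [(word, tags) for word, tags in tags_by_word.items() if len(tags) >= n]

# Hypothetical (word, tag) pairs standing in for a tagged corpus.
sample = [('that', 'CS'), ('that', 'DT'), ('that', 'WPS'),
          ('run', 'VB'), ('run', 'NN'), ('that', 'CS')]
result = most_ambiguous_words(sample, 3)
print(result)
```

Only 'that' is returned here, since it is the only word in the sample with three distinct tags.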
