pandas expand alphanumeric characters to iterate

I have a list of alphanumeric strings, as shown below:
    l1 = ['G1','L1']
I would like to know whether something like the following is possible:
    for i in range(l1): # this doesn't work because range() only accepts integers
    for i in range(G1:L1): # this doesn't work either
However, I want the value of i to change on each iteration from G1 to H1 to I1 to J1 to K1 to L1.

range() only accepts integers and cannot work with strings.
However, you can use the built-in ord() function to convert letters to their integer code points, and then use chr() to convert those integers back to characters. Because range() excludes its stop value, the stop in the code below is ord('M'), one past 'L'.
Code
a = [chr(c)+'1' for c in range(ord('G'), ord('M'))]
print(a)
Output
['G1', 'H1', 'I1', 'J1', 'K1', 'L1']
Update: Solution for double characters.
Doing it for double-letter columns is a little more complicated, but this StackOverflow answer has a solution for that. You can take the from_excel() and to_excel() functions from that answer and substitute them into my code above, as follows.
Code
a = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(a)
Output
['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD', 'AE', 'AF']
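The linked answer is not reproduced here, so the following is only a minimal sketch of what from_excel() and to_excel() could look like, assuming they convert between Excel-style column letters and 1-based column numbers. The function bodies are my own assumption, not the code from that answer.

```python
def from_excel(col):
    """Convert column letters to a 1-based number ('A' -> 1, 'AA' -> 27)."""
    n = 0
    for ch in col.upper():
        # Treat the letters as digits of a base-26 number without a zero digit.
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n


def to_excel(n):
    """Convert a 1-based column number back to letters (27 -> 'AA')."""
    letters = []
    while n > 0:
        # Subtract 1 first because there is no zero digit in this numbering.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord('A') + rem))
    return ''.join(reversed(letters))


a = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(a)
```

As with the single-letter version, the stop value is excluded, so the list ends at 'AF'. To include 'AG' itself, use from_excel('AG') + 1 as the stop.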

You can use:
    from openpyxl.utils import coordinate_to_tuple, get_column_letter

    def excel_range(start, end):
        t1 = coordinate_to_tuple(start)  # (row, column), e.g. 'AA1' -> (1, 27)
        t2 = coordinate_to_tuple(end)
        rows, cols = zip(t1, t2)
        cells = []
        for r in range(rows[0], rows[1]+1):
            for c in range(cols[0], cols[1]+1):
                cells.append(f'{get_column_letter(c)}{r}')
        return cells

    cells = excel_range('AA1', 'AC4')
Output:
>>> cells
['AA1',
'AB1',
'AC1',
'AA2',
'AB2',
'AC2',
'AA3',
'AB3',
'AC3',
'AA4',
'AB4',
'AC4']

Related

How can I reference a string (e.g. 'A') to the index of a larger list (e.g. ['A', 'B', 'C', 'D', ...])?

I have been racking my brain and scouring the internet for some hours now, please help.
Effectively I am trying to create a self-contained function (in python) for producing a caesar cipher. I have a list - 'cache' - of all letters A-Z.
    def caesarcipher(text, s):
        global rawmessage  # imports a string input - the 'raw message' which is to be encrypted.
        result = ''
        cache = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
                 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Is it possible to analyze the string input (the 'rawmessage') and map each letter to its corresponding index position in the list 'cache'? E.g. if the input was 'AAA' then the console would recognise it as [0,0,0] in the list 'cache'. Or if the input was 'ABC' then the console would recognise it as [0,1,2] in the list 'cache'.
Thank you to anyone who makes the effort to help me out here.

Use a list comprehension:
positions = [cache.index(letter) for letter in rawmessage if letter in cache]

You can, with a list comprehension. You can also get the alphabet from the string module instead of writing the list out by hand.
import string
print([string.ascii_uppercase.index(c) for c in "AAA"])
# [0, 0, 0]
print([string.ascii_uppercase.index(c) for c in "ABC"])
# [0, 1, 2]

    result = []
    for i in list(rawmessage):
        result.append(cache.index(i))
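The lookups above call index() once per character, which scans the list each time and raises ValueError for a character that is not in cache unless it is filtered out first. A possible alternative, sketched here under the assumption that only uppercase A-Z should be shifted and everything else passed through unchanged, is to precompute a letter-to-position dictionary. The names POSITION, letter_positions and caesar_cipher are illustrative and do not come from the question.

```python
import string

ALPHABET = string.ascii_uppercase
# Map each letter to its index once, so later lookups do not scan a list.
POSITION = {letter: i for i, letter in enumerate(ALPHABET)}


def letter_positions(text):
    """Return the alphabet index of every A-Z character in text."""
    return [POSITION[ch] for ch in text if ch in POSITION]


def caesar_cipher(text, shift):
    """Shift A-Z characters by `shift` places, wrapping around after Z."""
    out = []
    for ch in text:
        if ch in POSITION:
            out.append(ALPHABET[(POSITION[ch] + shift) % len(ALPHABET)])
        else:
            out.append(ch)  # assumption: leave other characters untouched
    return ''.join(out)


print(letter_positions('ABC'))
print(caesar_cipher('ABC', 3))
```

Passing the text as an argument, rather than reading a global such as rawmessage, keeps the function self-contained, which is what the question was aiming for.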

Is it possible to take an unknown amount of lists, and get every possible combination of those lists in the original order? [duplicate]

This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 1 year ago.
I am writing a cipher script in which each encoded character could stand for one of several letters,
e.g. BC = a, w, k.
The encoding part of the script was simple; however, I'm running into a problem while trying to decode sentences. Because each encoded character looks like BC, I have to split the entire string into pairs, then check the key file to see which letters are possible for each pair.
So the encoded text could look like TVVQBTTV and the lists would look like:
['G', 'H', 'P', 'R', 'T', 'Y']
['G', 'H', 'P', 'R', 'T', 'Y', 'E', 'L', 'Z', '.']
['G', 'H', 'P', 'R', 'T', 'Y', 'E', 'L', 'Z', '.', 'A', 'F', 'M', 'O', 'S', 'W', 'Z']
['G', 'H', 'P', 'R', 'T', 'Y', 'E', 'L', 'Z', '.', 'A', 'F', 'M', 'O', 'S', 'W', 'Z', 'G', 'H', 'P', 'R', 'T', 'Y']
My goal is to print every possible combination (e.g. GGGG, GGGH, GGGP, etc.) to the console, so that the person receiving the message has to look through all of the combinations to find the right one. This is an attempt to make the cipher harder to break. However, the number of lists grows with the number of characters in the sentence, so 'Hello, how are you?' could look like: TVVQVQVQBTVPBWVOBTTBTVVQDDVOBW
and there are too many lists to put in this box without it looking thoroughly messy. Is there any way to do this?

I'm a bit confused why your 4 lists extend the previous lists, rather than being 4 separate lists mapping to the possible values of TV, VQ, BT, and TV. In other words I don't get why the possible values of the key 'VQ' include the possible values of 'TV', and so on. But as the question is written:
    import itertools

    def get_list_of_possible_vals(key):
        # Check the key file here. This stub is just an example: it returns
        # the list of characters the given key could map to.
        return ['a', 'b', 'c', 'd']

    encoded_list = ['TV', 'VQ', 'VQ', 'VQ', 'BT', 'VP']  # input data
    decoded_list = []
    list_of_lists = []
    for enc in encoded_list:
        # Build a new list each time so that entries already stored in
        # list_of_lists are not modified by later iterations.
        decoded_list = decoded_list + get_list_of_possible_vals(enc)
        list_of_lists.append(decoded_list)
        # If you want the lists of decoded values to be separate rather than stacking,
        # do list_of_lists.append(get_list_of_possible_vals(enc)) instead

    # Cartesian product of the lists
    for element in itertools.product(*list_of_lists):
        print(''.join(element))
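For the case where each pair has its own separate list of possible letters, the idea can be sketched as follows. The KEY dictionary is made-up sample data standing in for the key file, and split_pairs, candidates and count_candidates are hypothetical helper names. Since itertools.product is lazy, the combinations can be counted before deciding whether printing them all is practical.

```python
import itertools
from math import prod  # available in Python 3.8+

# Hypothetical key: each two-character code maps to its possible letters.
KEY = {
    'TV': ['G', 'H'],
    'VQ': ['E', 'L'],
    'BT': ['A'],
}


def split_pairs(encoded):
    """Split the encoded text into two-character codes."""
    return [encoded[i:i + 2] for i in range(0, len(encoded), 2)]


def candidates(encoded, key):
    """Lazily yield every possible decoding, keeping the original order."""
    lists = [key[pair] for pair in split_pairs(encoded)]
    for combo in itertools.product(*lists):
        yield ''.join(combo)


def count_candidates(encoded, key):
    """Number of decodings: the product of the individual list lengths."""
    return prod(len(key[pair]) for pair in split_pairs(encoded))


message = 'TVVQBTTV'
print(count_candidates(message, KEY))
for text in itertools.islice(candidates(message, KEY), 5):
    print(text)
```

The count grows multiplicatively with the message length, so for a whole sentence it is worth checking count_candidates() before printing everything.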

Remove words from list containing certain characters

I have a long list of words that I'm trying to go through, removing any word that contains a specific character. However, the solution I thought would work doesn't remove any words.
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
validwords = []
    for i in l3:
        for x in firstchect:
            if i not in x:
                validwords.append(x)
                continue
            else:
                break
If a word from firstcheck has a character from l3 I want it removed or not added to this other list. I tried it both ways. Can anyone offer insight on what could be going wrong? I'm pretty sure I could use some list comprehension but I'm not very good at that.

The accepted answer makes use of np.sum, which means importing a large numerical library to perform a simple task that plain Python can easily do by itself:
validwords = [w for w in firstcheck if all(c not in w for c in l3)]

You can use a list comprehension:
import numpy as np
[w for w in firstcheck if np.sum([c in w for c in l3])==0]
It seems all the words contain at least 1 char from l3 and the output of above is an empty list.
If firstcheck is defined as below:
firstcheck = ['a', 'z', 'poach', 'omnificent']
The code should output:
['a', 'z']

If you want to avoid writing the nested loops yourself, you can use re directly.
import re
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['azz', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
# Build a regex character class that matches any character in l3.
strings_to_remove = "[{}]".format("".join(l3))
# Keep a word only if removing those characters leaves it unchanged.
validwords = [x for x in firstcheck if re.sub(strings_to_remove, '', x) == x]
print(validwords)
Output:
['azz']

There were a few mistakes in the code; the rest was fine:
    l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
    firstcheck = ['aza', 'ca', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
    validwords = []
    flag = 1
    for x in firstcheck:
        for i in l3:
            if i not in x:
                flag = 1
            else:
                flag = 0
                break
        if flag == 1:
            validwords.append(x)
    print(validwords)
The first mistake was the order of the for loops: we need to iterate through the words first and then through l3, so that a word is not added more than once.
Next, firstcheck was misspelled as firstchect in the inner for loop, which caused an error.
I also added a flag, so that a word is added to validwords only if the flag is still 1 after all of l3 has been checked.
To check, I added the new elements 'aza' and 'ca'; the output is now correctly 'aza' and 'ca'.
Hope this helps you.
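The same check can also be written without a flag. This is only a sketch of one alternative: it turns l3 into a set and uses set.isdisjoint(), which is true when the word shares no character with the set. The variable name banned is illustrative.

```python
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['aza', 'ca', 'poach', 'omnificent', 'aminoxylol', 'teetotaller',
              'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']

# A set makes each membership test cheap and states the intent directly.
banned = set(l3)

# Keep a word only if none of its characters is in the banned set.
validwords = [word for word in firstcheck if banned.isdisjoint(word)]
print(validwords)
```

isdisjoint() accepts any iterable, so the word string can be passed to it directly.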

Python: Pair alphabets after loop is completed

I tried to generate letter labels like this:
    import string
    a = string.uppercase
    for i in range(0, 30):
        print a[i%26]*(i / 26+1)
This prints A-Z, and after Z it prints repeated letters such as AA, BB.
But I need the strings after Z to continue as AA, AB, AC, AD, AE, and so on, up to the end of the given range. The result should be:
A-Z, then AA AB AC ....

You can take advantage of the itertools module and use a generator to handle this pretty cleanly:
    from itertools import count, product, islice
    from string import ascii_uppercase

    def multiletters(seq):
        for n in count(1):
            for s in product(seq, repeat=n):
                yield ''.join(s)
gives
>>> list(islice(multiletters('ABC'), 20))
['A', 'B', 'C', 'AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC', 'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB']
>>> list(islice(multiletters(ascii_uppercase), 30))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD']
and you can make an object and get them one by one, if you'd prefer:
>>> m = multiletters(ascii_uppercase)
>>> next(m)
'A'
>>> next(m)
'B'
>>> next(m)
'C'
[Update: I should note though that I pass data between Python and Excel all the time -- am about to do so, actually -- and never need this function. But if you have a specific question about the best way to exchange data, it's probably better to ask a separate question than to edit this one now that there are several answers to the current question.]

I think what you are looking for is a nested for loop, like this:
    import string

    def get_string(val):
        return string.uppercase[val%26]*(val / 26+1)

    for i in range(0,26):
        for j in range(0, 26):
            print get_string(i) + get_string(j)
Note that I moved your indexing of string.uppercase into a function (get_string) so that its code would not be repeated.

I think what you want is something like this
    import string

    def get_string(val):
        return string.uppercase[val%26]*(val / 26+1)

    for i in range(-1,26):
        for j in range(0, 26):
            if i==-1:
                print get_string(j)
            else:
                print get_string(i) + get_string(j)
The first time through the outer loop, no leading character is printed (this gives the first 26 Excel columns); after that, each label is a leading letter followed by a second letter.
Working example available on ideone.com -> http://ideone.com/M862Ra
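The two nested-loop answers above use Python 2 syntax (the print statement, string.uppercase, and integer division with /). As a rough Python 3 sketch of the same idea, assuming labels up to two letters long are enough, the single letters can be followed by all two-letter pairs. The function name column_labels is illustrative.

```python
from itertools import product
from string import ascii_uppercase


def column_labels(limit):
    """Return the first `limit` labels: A..Z, then AA, AB, ..., ZZ."""
    labels = list(ascii_uppercase)
    # product() yields ('A', 'A'), ('A', 'B'), ... in the required order.
    labels += [first + second for first, second in product(ascii_uppercase, repeat=2)]
    return labels[:limit]  # at most 26 + 26*26 = 702 labels are available


for label in column_labels(30):
    print(label)
```

For sequences that must continue past ZZ, the generator-based answer earlier in this thread does not have the two-letter limit.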

Is that a tag list or something else?

I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
    from nltk.corpus import brown

    def MostAmbiguousWords(words, n):
        # wordsUniqeTags holds a set of unique tags that have been observed for a given word
        wordsUniqeTags = {}
        for (w,t) in words:
            if wordsUniqeTags.has_key(w):
                wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
            else:
                wordsUniqeTags[w] = set([t])
        # Starting to count
        res = []
        for w in wordsUniqeTags:
            if len(wordsUniqeTags[w]) >= n:
                res.append((w, wordsUniqeTags[w]))
        return res

    MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because who's and what's don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.

It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
    def most_ambiguous_words(words, n):
        # wordsUniqeTags holds a set of unique tags that have been observed for a given word
        wordsUniqeTags = {}
        for (w,t) in words:
            wordsUniqeTags.setdefault(w, set()).add(t)
        # Starting to count
        return [(word,tags) for word,tags in wordsUniqeTags.items() if len(tags) >= n]

You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).

How about making use of the Counter and defaultdict functionality in the collections module?
    from collections import defaultdict, Counter

    def most_ambiguous_words(words, n):
        counts = defaultdict(Counter)
        for (word,tag) in words:
            counts[word][tag] += 1
        # Keep the words that were seen with at least n distinct tags.
        return [(w, counts[w].keys()) for w in counts if len(counts[w]) >= n]
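To try the approach without downloading the Brown corpus, here is a small self-contained Python 3 sketch. The sample list is made-up data in the same (word, tag) shape that brown.tagged_words() yields, and the tags in it are only illustrative. The function collects whole tag strings into a set per word, which avoids the set(t) problem described above.

```python
from collections import defaultdict


def most_ambiguous_words(tagged_words, n):
    """Return (word, sorted tags) for words seen with at least n distinct tags."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        # add() stores the whole tag string; set(tag) would split it into characters.
        tags_by_word[word].add(tag)
    return [(word, sorted(tags))
            for word, tags in tags_by_word.items()
            if len(tags) >= n]


# Hypothetical sample data, not real corpus counts.
sample = [
    ('that', 'CS'), ('that', 'DT'), ('that', 'WPS'),
    ('who', 'WPS'), ('that', 'CS'), ('run', 'VB'), ('run', 'NN'),
]
print(most_ambiguous_words(sample, 3))
```

With real data, the same function could be called as most_ambiguous_words(brown.tagged_words(), 13), assuming nltk and the Brown corpus are installed.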
