I have a CSV file with the following data:
bel.lez.za;bellézza
e.la.bo.ra.re;elaboràre
a.li.an.te;alïante
u.mi.do;ùmido
The first value is the word divided into syllables, and the second marks the stress.
I'd like to merge the two pieces of information and obtain the following output:
bel.léz.za
e.la.bo.rà.re
a.lï.an.te
ù.mi.do
I computed the position of the stressed vowel and tried to substitute it for the same unstressed vowel in the first value, but the full stops make indexing difficult. Is there a way to tell Python to ignore full stops while counting? Or is there an easier way to do this? Thanks.
After splitting the two values for each line I computed the position of the stressed vowels:
char_list = ['ò', 'à', 'ù', 'ì', 'è', 'é', 'ï']
for character in char_list:
    if character in value[1]:
        position_of_stressed_vowel = value[1].index(character)
I'd suggest merging/aligning the two forms in parallel instead of trying to substitute things via indexing. The idea is to iterate through the plain form and take out one character from the accented form for every character from the plain form, keeping dots as they are.
(Or perhaps, the idea is to add the dots to the accented form instead of adding the accented characters to the syllabified form.)
def merge_accents(plain, accented):
    output = ""
    acc_chars = iter(accented)
    for char in plain:
        if char == ".":
            output += char
        else:
            output += next(acc_chars)
    return output
Test:
data = [['bel.lez.za', 'bellézza'],
        ['e.la.bo.ra.re', 'elaboràre'],
        ['a.li.an.te', 'alïante'],
        ['u.mi.do', 'ùmido']]
# Returns
# bel.léz.za
# e.la.bo.rà.re
# a.lï.an.te
# ù.mi.do
for plain, accented in data:
    print(merge_accents(plain, accented))
Is there a way to tell python to ignore full stops while counting?
Yes, by implementing it yourself with an index lookup that tells you which index in the dot-delimited string corresponds to a given index in the plain word:
i = 0
corrected_index = []
for char in value[0]:
    if char != ".":
        corrected_index.append(i)
    i += 1
Now you can correct the index and replace the character. Strings are immutable in Python, so build a new string rather than assigning to an index:
pos = corrected_index[position_of_stressed_vowel]
value[0] = value[0][:pos] + character + value[0][pos + 1:]
Make sure each stressed vowel is stored as a single precomposed character (NFC-normalized, rather than a base letter followed by a combining accent) so that it occupies exactly one index.
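Putting the index lookup together, a minimal end-to-end sketch might look like this. The function name apply_stress, the STRESSED constant, and the NFC normalization step are assumptions added for illustration; they are not part of the answer above.

```python
import unicodedata

# Same stressed characters as char_list in the question.
STRESSED = "òàùìèéï"

def apply_stress(syllabified, stressed):
    # Assumption: the input may contain combining accents, so normalize
    # to NFC to make each accented vowel a single code point.
    stressed = unicodedata.normalize("NFC", stressed)
    # corrected_index[k] is the position in `syllabified` of the k-th
    # character that is not a dot.
    corrected_index = [i for i, ch in enumerate(syllabified) if ch != "."]
    # Strings are immutable, so edit a list of characters instead.
    chars = list(syllabified)
    for pos, ch in enumerate(stressed):
        if ch in STRESSED:
            chars[corrected_index[pos]] = ch
    return "".join(chars)

print(apply_stress("bel.lez.za", "bellézza"))
```

Unlike the loop in the question, this replaces every stressed vowel it finds, not only one of them.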
You can loop over the first half of the string, keep track of the index excluding the dots, and add the character at the tracked index from the second half to a buffer (the modified string). Like the code below:
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
converted_data = []
# Loop over the data.
for pair in data:
    # Split the pair on ";"
    first_half, second_half = pair.split(';')
    # Create variables to keep track of the current letter and the modified string.
    current_letter = 0
    modified_second_half = ''
    # Loop over the letters of the first half of the string.
    for current_char in first_half:
        # If the current_char is a dot add it to the modified string.
        if current_char == '.':
            modified_second_half += '.'
        # If the current_char is not a dot add the current letter from the second half to the modified string,
        # and update the current letter value.
        else:
            modified_second_half += second_half[current_letter]
            current_letter += 1
    converted_data.append(modified_second_half)
print(converted_data)
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
def slice_same(input, lens):
    # slices the given string into the given lengths.
    res = []
    strt = 0
    for size in lens:
        res.append(input[strt : strt + size])
        strt += size
    return res
# split into two.
data = [x.split(';') for x in data]
# Add third column that's the length of each piece.
data = [[x, y, [len(z) for z in x.split('.')]] for x, y in data]
# Put text and lens through function.
data = ['.'.join(slice_same(y, z)) for x, y, z in data]
print(data)
Output:
['bel.léz.za',
'e.la.bo.rà.re',
'a.lï.an.te',
'ù.mi.do']
I was previously working on a problem of String encryption: How to add randomly generated characters in specific locations in a string? (obfuscation to be more specific).
Now I am working on its second part that is to remove the randomly added characters and digits from the obfuscated String.
My code works for removing one random character and digit from the string (when encryption_str is set to 1) but for removing two, three .. nth .. number of characters (when encryption_str is set to 2, 3 or n), I don't understand how to modify it.
My Code:
import string, random
def decrypt():
    encryption_str = 2  # Doesn't produce correct output when set to any other number except 1
    data = "osqlTqlmAe23h"
    content = data[::-1]
    print("Modified String: ", content)
    result = []
    result[:0] = content
    indices = []
    for i in range(0, encryption_str+3):  # I don't understand how to change it
        indices.append(i)
    for i in indices:
        del result[i+1]
    message = "".join(result)
    print("Original String: ", message)
decrypt()
Output for Encryption level 1 (Correct Output)
Output for Encryption level 2 (Incorrect Output)
Appending characters is easy; removing them is a bit harder, because each deletion changes the length of the string and shifts the positions of the remaining characters.
But there is an easy way: keep the good characters instead. For that you just need to iterate with encryption_str + 1 as the step (which avoids adding an if on the index):
def decrypt(content, nb_random_chars):
    content = content[::-1]
    result = []
    for i in range(0, len(content), nb_random_chars + 1):
        result.append(content[i])
    message = "".join(result)
    print("Modified String: ", content)
    print("Original String: ", message)
# 3 lines in 1 with :
result = [content[i] for i in range(0, len(content), nb_random_chars + 1)]
Both will give hello
decrypt("osqlTqlmAe23h", 2)
decrypt("osqFlTFqlmFAe2F3h", 3)
Why not try some modulo arithmetic? On the reversed string, keep the characters whose index is a multiple of encryption_str + 1, with something like:
''.join([x for num, x in enumerate(data[::-1]) if num % (encryption_str + 1) == 0])
How about a slice (which is really just a more compact notation for @azro's answer)?
result = content[0::(encryption_str+1)]
That is, take every (encryption_str+1)-th character from content, starting with the first.
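To check the slicing approach for several levels, a round trip can be sketched as below. The obfuscate function is a hypothetical reconstruction of the encoding step (it is not shown in this thread): it assumes n random letters or digits are inserted after every character except the last, and that the result is then reversed. The names obfuscate and deobfuscate are likewise made up for this example.

```python
import random
import string

def obfuscate(message, n):
    # Hypothetical encoder: insert n random characters after every
    # character except the last one, then reverse the result.
    pool = string.ascii_letters + string.digits
    out = []
    for i, ch in enumerate(message):
        out.append(ch)
        if i < len(message) - 1:
            out.append("".join(random.choice(pool) for _ in range(n)))
    return "".join(out)[::-1]

def deobfuscate(data, n):
    # Reverse, then keep every (n + 1)-th character.
    return data[::-1][::n + 1]

print(deobfuscate("osqlTqlmAe23h", 2))
print(deobfuscate(obfuscate("hello", 4), 4))
```

Because the step is derived from n, the same decoding line works for any level without computing indices to delete.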
I have a string like this in python3:
ab_cdef_ghilm__nop_q__rs
Starting from a specific character, identified by its index, I want to slice a window of 5 characters on each side of it. Whenever the _ character is found, it has to be skipped in favor of the next character. For example, taking the character "i" in this string, I want a final string of 11 characters centered on the "i", skipping every _ that occurs, like this:
defghilmnop
My strings are long, and I want to choose the index position at which to do this.
In this case index=10.
Is there a command that crops a string to a specific size while skipping a specific character?
For the moment, what I'm able to do is remove the _ characters from the string while counting their occurrences, use that count to shift the middle index position, and finally crop a window of the desired size. I would like something more direct, though, so if I could just jump over every "_" that is found, that would be perfect.
Situation B) index=13
I want 5 characters on the left and 5 on the right of this index, dropping (and not counting) the _ characters, which gives this output:
ghilmnopqrs
So basically, when the index corresponds to a regular character, start from it; when the index corresponds to a _ character, shift to the right up to the next regular character, so that the result is still a string of 11 characters.
To make a long story short, the output is 11 characters with the index position in the middle. If the index position is a _, skip it and use the nearest character as the middle.
I don't think there's a specific command for this, but you could build your own.
For example:
s = 'ab_cdef_ghilm__nop_q__rs'
def get_slice(s, idx, n=5, ignored_chars='_'):
    if s[idx] in ignored_chars:
        # adjust idx to first valid on right side:
        idx = next((i for i, ch in enumerate(s[idx:], idx) if ch not in ignored_chars), None)
        if idx is None:
            return ''
    d = {i: ch for i, ch in enumerate(s) if ch not in ignored_chars}
    if idx in d:
        keys = [k for k in d.keys()]
        idx = keys.index(idx)
        return ''.join(d[k] for k in keys[max(0, idx-n):min(idx+n+1, len(s))])
print(get_slice(s, 10, 5, '_'))
print(get_slice(s, 13, 5, '_'))
Prints:
defghilmnop
ghilmnopqrs
In case print(get_slice(s, 1, 5, '_')):
abcdefg
EDIT: Added check for starting index equals ignored char.
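You can define a function like slice below, which cuts the string so that it has the given number of non-"_" characters on the left and on the right of the index. Note that the returned window still contains any "_" characters that fall inside it, so call .replace("_", "") on the result if you want them removed.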
st = "ab_cdef_ghilm__nop_q__rs"
def slice(st, ind, c_count):
    cp = [char!="_" for char in st]
    for i in range(len(st)):
        if sum(cp[ind:ind+i]) == c_count:
            break
    right = ind + i
    for i in range(len(st)):
        if sum(cp[ind-i:ind]) == c_count:
            break
    left = ind - i
    return st[left:right+1]
slice(st, 10, 5)
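For comparison, here is a small sketch that filters the left and right sides separately and returns the window with the skipped characters already removed. The name window and its parameters are made up for this example, and shifting to the right when the index lands on a skipped character is taken from the question's "situation B".

```python
def window(s, idx, n=5, skip="_"):
    # If idx points at a skipped character, move right to the next
    # regular character (as in situation B of the question).
    while idx < len(s) and s[idx] in skip:
        idx += 1
    if idx == len(s):
        return ""
    # Regular characters before and after the center, in order.
    left = [ch for ch in s[:idx] if ch not in skip]
    right = [ch for ch in s[idx + 1:] if ch not in skip]
    # Keep at most n on each side; max() keeps n=0 from selecting everything.
    left = left[max(0, len(left) - n):]
    return "".join(left) + s[idx] + "".join(right[:n])

s = "ab_cdef_ghilm__nop_q__rs"
print(window(s, 10))
print(window(s, 13))
```

Near the ends of the string the window is simply shorter, since fewer than n characters are available on that side.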
I need to find a given pattern in a text file and print the matching patterns. The text file is a string of digits and the pattern can be any string of digits or placeholders represented by 'X'.
I figured the way to approach this problem would be by loading the sequence into a variable, then creating a list of testable subsequences, and then testing each subsequence. This is my first function in python so I'm confused as to how to create the list of test sequences easily and then test it.
def find(pattern):  # finds a pattern in the given input file
    with open('sequence.txt', 'r') as myfile:
        string = myfile.read()
    print('Test data is:', string)
    testableStrings = []
    # how to create a list of testable sequences?
    for x in testableStrings:
        if x == pattern:
            print(x)
    return
For example, searching for "X10X" in "11012102" should print "1101" and "2102".
Let pattern = "X10X", string = "11012102", and n = len(pattern), just for the illustration that follows.
Without using regular expressions, your algorithm may be as follows:
Construct a list of all substrings of string with length n:
In[2]: parts = [string[i:i+n] for i in range(len(string) - n + 1)]
In[3]: parts
Out[3]: ['1101', '1012', '0121', '1210', '2102']
Compare pattern with each element in parts:
for part in parts:
The comparison of pattern with part (both now have equal lengths) goes character by character in corresponding positions:
    for ch1, ch2 in zip(pattern, part):
If ch1 is the X symbol or ch1 == ch2, the comparison continues with the next pair of characters; otherwise we break out of it:
        if ch1 == "X" or ch1 == ch2:
            continue
        else:
            break
Finally, if every character comparison succeeded, i.e. all pairs of corresponding characters were exhausted, the else branch of the for statement is executed (yes, for statements may have an else branch for exactly that case).
Now you can do anything with the matched part, e.g. print it or append it to a list:
    else:
        print(part)
So all in one place:
pattern = "X10X"
string = "11012102"
n = len(pattern)
parts = [string[i:i+n] for i in range(len(string) - n + 1)]
for part in parts:
    for ch1, ch2 in zip(pattern, part):
        if ch1 == "X" or ch1 == ch2:
            continue
        else:
            break
    else:
        print(part)
The output:
1101
2102
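If regular expressions are acceptable after all, the same search can be sketched with the re module. The function name find_matches is made up for this example, and it assumes that "X" should match any single digit (the loop above lets "X" match any character).

```python
import re

def find_matches(pattern, text):
    # Translate the pattern: 'X' matches any single digit (an assumption),
    # every other character must match literally.
    body = "".join(r"\d" if ch == "X" else re.escape(ch) for ch in pattern)
    # A lookahead with a capturing group also reports overlapping matches,
    # which a plain findall on the pattern would skip.
    regex = re.compile("(?=(" + body + "))")
    return [m.group(1) for m in regex.finditer(text)]

print(find_matches("X10X", "11012102"))
```

The lookahead matters for inputs such as "X1X" in "11111", where consecutive matches share characters.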
You probably wanted to create the list of testable sequences from the individual rows of the input file. So instead of
with open('sequence.txt', 'r') as myfile:
    string = myfile.read()
use
with open('sequence.txt') as myfile:  # 'r' is default
    testableStrings = [row.strip() for row in myfile]
The strip() method removes whitespace characters from the start and end of rows, including \n symbols at the end of lines.
Example of the sequence.txt file:
123456789
87654321
111122223333
The output of the print(testableStrings) command:
['123456789', '87654321', '111122223333']
I'm looking for help in creating a script to add periods to a string in every place but first and last, using as many periods as needed to create as many combinations as possible:
The output for the string 1234 would be:
["1234", "1.234", "12.34", "123.4", "1.2.34", "1.23.4" etc. ]
And obviously this needs to work for all lengths of string.
You should solve this type of problems yourself, these are simple algorithms to manipulate data that you should know how to come up with.
However, here is the solution (long version for more clarity):
my_str = "1234"  # original string
# recursive function for constructing dots
def construct_dot(s, t):
    # s - the string to put dots
    # t - number of dots to put
    # zero dots will return the original string in a list (stop criteria)
    if t == 0: return [s]
    # allocation for results list
    new_list = []
    # iterate the next dot location, considering the remaining dots.
    for p in range(1, len(s) - t + 1):
        new_str = str(s[:p]) + '.'  # put the dot in the location
        res_str = str(s[p:])  # crop the string from the dot to the end
        sub_list = construct_dot(res_str, t-1)  # make a list with t-1 dots (recursive)
        # append concatenated strings
        for sl in sub_list:
            new_list.append(new_str + sl)
    # we result with a list of the string with the dots.
    return new_list
# now we will iterate the number of the dots that we want to put in the string.
# 0 dots will return the original string, and we can put maximum of len(string) - 1 dots.
all_list = []
for n_dots in range(len(my_str)):
    all_list.extend(construct_dot(my_str, n_dots))
# and see the results
print(all_list)
Output is:
['1234', '1.234', '12.34', '123.4', '1.2.34', '1.23.4', '12.3.4', '1.2.3.4']
A concise solution without recursion: using binary combinations (think of 0, 1, 10, 11, etc) to determine where to insert the dots.
Between each letter, put a dot when there's a 1 at this index and an empty string when there's a 0.
your_string = "1234"
def dot_combinations(string):
    i = 0
    combinations = []
    # Iterate while the binary representation length is smaller than the string size
    while i.bit_length() < len(string):
        current_word = []
        for index, letter in enumerate(string):
            current_word.append(letter)
            # Append a dot if there's a 1 in this position
            if (1 << index) & i:
                current_word.append(".")
        i += 1
        combinations.append("".join(current_word))
    return combinations
print(dot_combinations(your_string))
Output:
['1234', '1.234', '12.34', '1.2.34', '123.4', '1.23.4', '12.3.4', '1.2.3.4']
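The same "one yes/no choice per gap" idea can also be sketched with itertools.product, which enumerates the choices directly instead of reading them from the bits of a counter. The function name dot_variants is made up for this example, and the results come out in a different order than in the answers above.

```python
from itertools import product

def dot_variants(s):
    if not s:
        return [s]
    results = []
    # One separator choice ('' or '.') for each gap between adjacent characters.
    for seps in product(["", "."], repeat=len(s) - 1):
        # zip stops at the shorter input, so the last character is added separately.
        pieces = [ch + sep for ch, sep in zip(s, seps)]
        results.append("".join(pieces) + s[-1])
    return results

print(dot_variants("1234"))
```

A string of length n has n - 1 gaps, so the list always holds 2 ** (n - 1) entries.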
I have to write a function that takes one argument, text, containing a block of text in the form of a str, and returns a sorted list of "symmetric" words. A symmetric word is defined as a word where, for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equidistant from the respective ends of the alphabet. For example, bevy is a symmetric word because b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet, and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
This is all I could come up with, but it is wrong and I can't run it:
def symmetrics(text):
    func_char= ",.?!:'\/"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    sym = []
    for word in text.lower().split():
        n = range(0,len(word))
        if word[n] == word[len(word)-1-n]:
            sym.append(word)
    return sym
The code above doesn't take into account the positions in alpha1 and alpha2, as I don't know how to use them. Can anyone help me?
Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
An alternative way to approach the problem is by using the str.translate() method:
def is_sym(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    # str.maketrans is the Python 3 spelling; Python 2 used string.maketrans
    tr = str.maketrans(alpha1, alpha2)
    n = len(word) // 2
    return word[:n] == word[::-1][:n].translate(tr)
print(is_sym('aloz'))
print(is_sym('boy'))
print(is_sym('bread'))
(The building of the translation table can be easily factored out.)
The for loop could be modified as:
    for word in text.lower().split():
        for n in range(0,len(word)//2):
            if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
                break
        else:
            sym.append(word)
    return sym
According to your symmetric rule, we may verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    length = len(word)
    # integer division, so that range() also works on Python 3
    for i in range(length // 2):
        if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
            return False
    return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
    func_char= ",.?!:'\/;"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    sym = []
    for word in text.lower().split():
        if is_symmetric_word(word) and not (word in sym):
            sym.append(word)
    return sym
The following are two test cases from you:
is_symmetrics("boy bread aloz bray")  # ['boy', 'aloz']
is_symmetrics("There is a car and a book;")  # ['a']
Code first. Discussion below the code.
import string
# get alphabet and reversed alphabet
try:
    # Python 2.x
    alpha1 = string.lowercase
except AttributeError:
    # Python 3.x and newer
    alpha1 = string.ascii_lowercase
alpha2 = alpha1[::-1]  # use slicing to reverse alpha1
# make a dictionary where the key, value pairs are symmetric
# for example symd['a'] == 'z', symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))
def is_symmetric_word(word):
    if not word:
        return False  # zero-length word is not symmetric
    i1 = 0
    i2 = len(word) - 1
    while True:
        if i1 >= i2:
            return True  # we have checked the whole string
        # get a pair of chars
        c1 = word[i1]
        c2 = word[i2]
        if _symd[c1] != c2:
            return False  # the pair wasn't symmetric
        i1 += 1
        i2 -= 1
# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "
def _filter_ch(ch):
    if ch in _filter_to_space:
        return ' '  # return a space
    elif ch in alpha1:
        return ch  # it's an alphabet letter so return it
    else:
        # It's something we don't want. Return empty string.
        return ''
def clean(text):
    return ''.join(_filter_ch(ch) for ch in text.lower())
def symmetrics(text):
    # filter text: keep only chars in the alphabet or spaces
    for word in clean(text).split():
        if is_symmetric_word(word):
            # use of yield makes this a generator.
            yield word
lst = list(symmetrics("The boy...is a yob."))
print(lst)  # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences; they need to be the same length, but since we are using a string and a reversed copy of the string, they will be the same length.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False, otherwise it sets i1 to the first character in the string and i2 to the last. It compares characters as long as they continue to be symmetric, and increments i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds any pair of characters that are not symmetric, it returns False. We have to do the check for whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
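Since the question asks for a sorted list, a list-returning variant could be sketched as follows. The names MIRROR and symmetrics_list are made up for this example, and it simplifies the filtering by treating every non-letter as a word separator (the code above drops unexpected characters instead) and by removing duplicates.

```python
import string

ALPHA = string.ascii_lowercase
# Pair each letter with its mirror: 'a' <-> 'z', 'b' <-> 'y', and so on.
MIRROR = dict(zip(ALPHA, reversed(ALPHA)))

def is_symmetric_word(word):
    # Compare the first half with the word read backwards; the middle
    # letter of an odd-length word is not checked.
    half = len(word) // 2
    return all(MIRROR[a] == b for a, b in zip(word[:half], reversed(word)))

def symmetrics_list(text):
    # Assumption: every character that is not a lowercase ASCII letter
    # separates words.
    cleaned = "".join(ch if ch in ALPHA else " " for ch in text.lower())
    # A set removes duplicates; sorted() returns the list the question asks for.
    return sorted({w for w in cleaned.split() if is_symmetric_word(w)})

print(symmetrics_list("boy bread aloz bray"))
print(symmetrics_list("There is a car and a book;"))
```

For the two examples in the question this gives ['aloz', 'boy'] and ['a'].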
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as @heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: @heltonbiker's change introduced a dependency on Python version! I left it in with a suitable try:/except block to handle the problem. It appears that Python 3.x has improved the name of the lowercase ASCII alphabet to string.ascii_lowercase instead of plain string.lowercase.