Getting the index of the word 'print' in a multiline string - python

I am trying to find the index of every occurrence of the word 'print' in a multiline text, but I have run into a problem:
If a line contains 'print' twice, the code returns the index of the first 'print' two times.
It never finds the index of the second 'print' on the same line.
My code is:
text = '''print is print as
it is the function an
print is print and not print
'''
text_list = []
for line in text.splitlines():
    # 'line' represents each line in the multiline string
    text_list.append([])
    for letter in line:
        # append the letters of each line to a list inside text_list
        text_list[len(text_list)-1].append(letter)
for line in text_list:
    for letter in line:
        # check whether the letter after 'p' is 'r', then 'i', then 'n', and finally 't'
        if letter == "p":
            num = 1
            if text_list[text_list.index(line)][line.index(letter)+num] == 'r':
                num += 1
                if text_list[text_list.index(line)][line.index(letter)+num] == 'i':
                    num += 1
                    if text_list[text_list.index(line)][line.index(letter)+num] == 'n':
                        num += 1
                        if text_list[text_list.index(line)][line.index(letter)+num] == 't':
                            num += 1
                            print(f'index (start,end) = {text_list.index(line)}.{line.index(letter)}, {text_list.index(line)}.{line.index(letter)+num}')
When I run it, it prints:
index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line
index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the third print
As you can see, the indexes in the result are repeated. This is the text_list:
>>> text_list
[['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 's'],
['i', 't', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', ' ', 'a', 'n'],
['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 'n', 'd', ' ', 'n', 'o', 't', ' ', 'p', 'r', 'i', 'n', 't']]
>>>
Each list inside text_list is one line of the text. There are three lines, so there are three lists inside text_list. How do I get the index of the second 'print' in the first line, and the indexes of the second and third 'print' in the third line? As shown above, the code only returns the index of the first 'print' in the first and third lines.
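A note on why the indexes repeat: list.index always returns the position of the first matching element, so every 'p' in a line resolves to the position of the first 'p'. Below is a minimal sketch of the same idea that takes positions from the loop counter instead. The function name find_word is made up for illustration, and it compares a slice of the line rather than chaining letter-by-letter checks.

```python
def find_word(text, word="print"):
    """Return (line_no, start, end) for every occurrence of word, line by line."""
    hits = []
    for line_no, line in enumerate(text.splitlines()):
        for start in range(len(line)):
            # Compare a slice of the line against the word at this position
            if line[start:start + len(word)] == word:
                hits.append((line_no, start, start + len(word)))
    return hits


text = '''print is print as
it is the function an
print is print and not print
'''
for line_no, start, end in find_word(text):
    print(f'index (start,end) = {line_no}.{start}, {line_no}.{end}')
```

Because the start position comes from range rather than from index, the second and third 'print' on a line get their own positions.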

import re
text = '''print is print as
it is the function an
print is print and not print
'''
for line_number, line in enumerate(text.split('\n')):
    occurrences = [m.start() for m in re.finditer('print', line)]
    if occurrences:
        for occurrence in occurrences:
            print('Found `print` at character %d on line %d' % (occurrence, line_number + 1))
->
Found `print` at character 0 on line 1
Found `print` at character 9 on line 1
Found `print` at character 0 on line 3
Found `print` at character 9 on line 3
Found `print` at character 23 on line 3
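If 'print' should only count as a whole word (so that it is not found inside 'printer', for example), the pattern can be wrapped in \b word boundaries. This is a small sketch under that assumption. The sample sentence is made up and is not from the question.

```python
import re

sample = "print the printer output, then print"  # made-up sample text
pattern = re.compile(r"\bprint\b")

# Each match object exposes both the start and the end offset
spans = [(m.start(), m.end()) for m in pattern.finditer(sample)]
print(spans)
```

The 'print' inside 'printer' is skipped because no word boundary follows it.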

Strings already have an index method for finding a substring, and you can pass a start position as an extra argument to find the next occurrence of a given substring:
>>> text = '''print is print as
it is the function an
print is print and not print
'''
>>> text.index("print")
0
>>> text.index("print",1)
9
>>> text.index("print",10)
40
>>> text.index("print",41)
49
>>> text.index("print",50)
63
>>> text.index("print",64)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
text.index("print",64)
ValueError: substring not found
>>>
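The manual calls above can be wrapped in a loop. Here is a minimal sketch using str.find, which returns -1 instead of raising ValueError when nothing is found. The helper name find_all is an assumption for illustration.

```python
def find_all(text, sub):
    """Collect every start index of sub in text using str.find."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit (this also finds overlapping matches)
        start = text.find(sub, start + 1)
    return positions


text = '''print is print as
it is the function an
print is print and not print
'''
print(find_all(text, "print"))  # [0, 9, 40, 49, 63]
```

These are offsets into the whole string, not per-line positions.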

You can use regular expressions:
import re
text = '''print is print as
it is the function an
print is print and not print
'''
for i in re.finditer("print", text):
    print(i.start())
# OR AS A LIST
[i.start() for i in re.finditer("print", text)]

You were on the right track initially. You split your text into lines. The next step is to split each line into words, not letters, using the split() method. You can then easily get the word index of each 'print' string in each line.
The following code prints the desired indexes as a list of lists, with each inner list corresponding to a separate line:
text = '''print is print as
it is the function an
print is print and not print
'''
index_list = []
for line in text.splitlines():
    index_list.append([])
    for idx, word in enumerate(line.split()):
        if word == 'print':
            index_list[-1].append(idx)
print(index_list)
#[[0, 2], [], [0, 2, 5]]

Related

How do I replace a string at a index in python? [duplicate]

I already know how to remove the character at an index, like this:
i = "hello!"
i= i[:0] + i[1:]
print(i)
'ello!'
But how do I replace it?
Say I want to put an H where the old h was. If I do this:
i[0] = "H"
I get this error:
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    i[0] ="H"
TypeError: 'str' object does not support item assignment
How do I fix this?
Strings are immutable in Python, so you can't assign like i[0] = 'H'. What you can do is convert the string to a list, which is mutable, assign a new value at the index you want, and join the list back into a string.
i = "hello!"
i_list = list(i)
i_list[0] = 'H'
i_new = ''.join(i_list)
print(i_new)
Hello!
Without creating a list you could also do:
i = "hello!"
i = "H" + i[1:]
More general:
def change_letter(string, letter, index): # note string is actually a bad name for a variable
    return string[:index] + letter + string[index+1:]
s = "hello!"
s_new = change_letter(s, "H", 0)
print(s_new)
# should print "Hello!"
Also note that there is a built-in string method, .capitalize(), for this particular case.
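One caveat worth knowing: str.capitalize() uppercases the first character but also lowercases the rest of the string, so it is not always equivalent to replacing only index 0. A short sketch of the difference follows. The sample string is made up.

```python
s = "hello World!"  # made-up sample with a capital letter later in the string

capitalized = s.capitalize()        # uppercases the first character and lowercases the rest
first_only = s[:1].upper() + s[1:]  # changes only the first character

print(capitalized)
print(first_only)
```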
This is a duplicate of this post
As said there, you have to make a list out of your string, change the character by assigning a new value to the item at that index, and then rebuild the string with join.
>>> s = list("Hello zorld")
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'z', 'o', 'r', 'l', 'd']
>>> s[6] = 'W'
>>> s
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
>>> "".join(s)
'Hello World'
i = "hello!"
print(i) ## will print hello!
i = "H" + i[1:]
print(i) ## will print Hello!

How do I count a specific character that is surrounded by random letters

I'm trying to count punctuations that are: apostrophe (') and hyphen (-) using dictionaries. I want to see if I can pull this off using list/dictionary/for loops and boolean expressions. These punctuations MUST ONLY BE COUNTED if they are surrounded by any other letters! E.g. jack-in-a-box (that is 3 hyphens) and shouldn't (1 apostrophe). These letters can be anything from a to z. Also, since this is part of an assignment, no modules/libraries can be used. I'm out of ideas and don't know what to do.
Any help would be greatly appreciated.
This is what I tried, but I get a KeyError: 0
def countpunc2():
    filename = input("Name of file? ")
    text = open(filename, "r").read()
    text = text.lower() #make all the words lowercase (for our convenience)
    for ch in '!"#$%&()*+./:<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    for ch in '--':
        text = text.replace(ch, ' ')
    words = text.split('\n') #splitting the text for words
    wordlist = str(words)
    count = {} #create dictionary; the keys/values are added on
    punctuations = ",;'-"
    letters = "abcdefghijklmnopqrstuvwxyz"
    for i, char in enumerate(wordlist):
        if i < 1:
            continue
        if i > len(wordlist) - 2:
            continue
        if char in punctuations:
            if char not in count:
                count[char] = 0
            if count[i-1] in letters and count[i+1] in letters:
                count[char] += 1
    print(count)
UPDATE:
I changed the code to:
def countpunc2():
    filename = input("Name of file? ")
    text = open(filename, "r").read()
    text = text.lower() #make all the words lowercase (for our convenience)
    for ch in '!"#$%&()*+./:<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    for ch in '--':
        text = text.replace(ch, ' ')
    words = text.split('\n') #splitting the text for words
    wordlist = str(words)
    count = {} #create dictionary; the keys/values are added on
    punctuations = ",;'-"
    letters = "abcdefghijklmnopqrstuvwxyz"
    for i, char in enumerate(wordlist):
        if i < 1:
            continue
        if i > len(wordlist) - 2:
            continue
        if char in punctuations:
            if char not in count:
                count[char] = 0
            if wordlist[i-1] in letters and wordlist[i+1] in letters:
                count[char] += 1
    print(count)
While it is giving me an output it is not correct.
Sample file: https://www.dropbox.com/s/kqwvudflxnmldqr/sample1.txt?dl=0
The expected results must be: {',' : 27, '-' : 10, ';' : 5, "'" : 1}
I'd probably keep it simpler than that.
#!/usr/bin/env python3
sample = "I'd rather take a day off, it's hard work sitting down and writing a code. It's amazin' how some people find this so easy. Bunch of know-it-alls."
punc = "!\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"
letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
d = {}
for i, char in enumerate(sample):
    if i < 1:
        continue
    if i > len(sample) - 2:
        continue
    if char in punc:
        if char not in d:
            d[char] = 0
        if sample[i - 1] in letters and sample[i + 1] in letters:
            d[char] += 1
print(d)
Output:
{"'": 3, ',': 0, '.': 0, '-': 2}
I'm not sure where you're getting the ";" from. Also, your comma has a space next to it, so it doesn't count here. If it should count, add a space to the letters variable.
Explanation of what's happening:
We initialize a dict, read in the sample text as sample, and iterate over it character by character, using enumerate to keep track of the indexes. If a character is too close to the start or end to qualify, we skip it.
I check the characters before and after the current one using the i variable from enumerate, and add to its count if it qualifies.
NOTE: despite the shebang, this code also works in Python 2.
You could map the characters of your input string into 3 categories: alphabetic (a), punctuation (p) and spaces (s). Then group them in triples (sequences of 3 consecutive characters). From those, isolate the a-p-a triples and count the occurrences of each distinct punctuation character.
For example:
string="""jack-in-a-box (that is 3 hyphens) and shouldn't (1 apostrophe)."""
categ = [ "pa"[c.isalpha()] if c != " " else "s" for c in string ]
triples = [ triple for triple in zip(categ,categ[1:],categ[2:]) ]
pChars = [ p for p,triple in zip(string[1:],triples) if triple==("a","p","a") ]
result = { p:pChars.count(p) for p in set(pChars) }
print(result) # {"'": 1, '-': 3}
If you're not allowed to use isalpha() or zip(), you can code the equivalent using the in operator and for loops.
Here is an example that does it in a very spelled out way:
end_cap_characters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
special_characters = [";", ":", "'", "-", ","]
def count_special_characters(in_string):
    result = {}
    for i in range(1, len(in_string) - 1):
        if in_string[i - 1] in end_cap_characters:
            if in_string[i + 1] in end_cap_characters:
                if in_string[i] in special_characters:
                    if in_string[i] not in result:
                        result[in_string[i]] = 1
                    else:
                        result[in_string[i]] +=1
    return result
print(count_special_characters("jack-in-the-box"))
print(count_special_characters("shouldn't"))
print(count_special_characters("jack-in-the-box, shouldn't and a comma that works,is that one"))
Output:
{'-': 3}
{"'": 1}
{'-': 3, "'": 1, ',': 1}
Obviously this can be condensed, but I will leave that as an exercise for you ;).
Update
Based on your edited question and posted code, you need to update the line:
if count[i-1] in letters and count[i+1] in letters:
to:
if wordlist[i-1] in letters and wordlist[i+1] in letters:
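Two other details in the question's code may also affect the counts. The loop for ch in '--' iterates over the string one character at a time, so it replaces every single hyphen rather than only the double dash. And str(words) turns the list into its printed representation, which adds brackets, quotes and commas to the text being scanned. Below is a minimal sketch without modules, assuming that a double dash should act as a separator and that only apostrophes and hyphens are of interest. The function name count_inner_punctuation and the sample sentence are made up for illustration.

```python
def count_inner_punctuation(text, punctuations="'-"):
    """Count punctuation marks that have a letter directly on both sides."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    text = text.lower()
    # Replace the two-character dash as a whole string, not character by character
    text = text.replace("--", " ")
    count = {}
    # Skip the first and last positions: they cannot have a neighbour on both sides
    for i in range(1, len(text) - 1):
        char = text[i]
        if char in punctuations and text[i - 1] in letters and text[i + 1] in letters:
            count[char] = count.get(char, 0) + 1
    return count


print(count_inner_punctuation("jack-in-a-box shouldn't -- well-known"))
```

Scanning the text directly, rather than str(words), keeps the list's own quotes and commas out of the count.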

Using .index() function in nested lists

I am trying to make a program that finds a certain value in a nested list, so I wrote this code:
list = [['S', 'T', 'U', 'T'], ['O', 'P', 'Q', 'R']]
However, when I entered
list.index('O')
it gave me an error message saying
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    list.index('O')
ValueError: 'O' is not in list
Any ideas?
It is really simple: 'O' is not in the outer list, which only contains the other lists. Here is an example:
list_you_have = [['S', 'T', 'U', 'T'], ['O', 'P', 'Q', 'R']]
print list_you_have.index(['O','P','Q','R']) #outputs 1
Now if you do it like this:
print list_you_have[1].index('O') # it outputs 0 because you're pointing to
                                  # the list which actually contains that 'O' char.
Now a function for a nested char search would be:
def nested_find(list_to_search,char):
    for i, o in enumerate(list_to_search):
        if char in o:
            print "Char %s found at list %s at index %s" % (char, i, o.index(char))
Or maybe an even simpler solution, as @zondo commented, would be:
def nested_find(list_to_search,char):
    newlist = sum(list_to_search, [])
    if char in newlist:
        print "Char %s is at position %s" % (char, newlist.index(char))
You can check whether the item is present in one line (item is the value you are looking for, nestedlists is your nested list):
print item in reduce(lambda x, y: x + y, nestedlists)
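The snippets in this thread use Python 2 print statements. For reference, here is a small Python 3 sketch that returns the position instead of printing it. The name nested_index is hypothetical, and returning None for a missing value is a design choice made here, not something the answers prescribe.

```python
def nested_index(nested, value):
    """Return (outer_index, inner_index) of the first match, or None if absent."""
    for outer, inner_list in enumerate(nested):
        if value in inner_list:
            return outer, inner_list.index(value)
    return None


grid = [['S', 'T', 'U', 'T'], ['O', 'P', 'Q', 'R']]
print(nested_index(grid, 'O'))  # (1, 0)
print(nested_index(grid, 'Z'))  # None
```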

How to get all substrings in a list of characters (python)

I want to iterate over a list of characters
temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
so that I can obtain two strings, "hello" and "world"
My current way to do this is:
#temp is the name of the list
#temp2 is the starting index of the first alphabetical character found
for j in range(len(temp)):
    if temp[j].isalpha() and temp[j-1] != '#':
        temp2 = j
        while temp[temp2].isalpha() and temp2 < len(temp)-1:
            temp2 += 1
        print(temp[j:temp2+1])
        j = temp2
The issue is that this prints out
['h', 'e', 'l', 'l', 'o']
['e', 'l', 'l', 'o']
['l', 'l', 'o']
['l', 'o']
['o']
etc. How can I print out only the full valid string?
Edit: I should have been more specific about what constitutes a "valid" string. A string is valid as long as all characters within it are either alphabetical or numerical. I didn't include the "isnumerical()" method within my check conditions because it isn't particularly relevant to the question.
If you only want hello and world, and your words are always separated by #, you can easily do it using join and split:
>>> temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
>>> "".join(temp).split('#')
['hello', 'world']
Furthermore, if you need to print the full valid string, you can join the pieces with a space:
>>> t = "".join(temp).split('#')
>>> print(' '.join(t))
hello world
You can do it like this:
''.join(temp).split('#')
List has the method index which returns position of an element. You can use slicing to join the characters.
In [10]: temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
In [11]: pos = temp.index('#')
In [14]: ''.join(temp[:pos])
Out[14]: 'hello'
In [17]: ''.join(temp[pos+1:])
Out[17]: 'world'
An alternate, itertools-based solution:
>>> temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd']
>>> import itertools
>>> ["".join(str)
...     for isstr, str in itertools.groupby(temp, lambda c: c != '#')
...     if isstr]
['hello', 'world']
itertools.groupby is used to ... well ... group consecutive items depending on whether or not they are equal to #. The list comprehension discards the sub-lists containing only # and joins the non-# sub-lists.
The only advantage is that this way you don't have to build the full string just to split it afterward. That is probably only relevant if the string is really long.
If you just want letters, use isalpha(), replacing the # and any other non-letters with a space, and then split if you want a list of words:
print("".join(x if x.isalpha() else " " for x in temp).split())
If you want both words in a single string, replace the # with a space and join using the conditional expression:
print("".join(x if x.isalpha() else " " for x in temp))
hello world
To do it using a loop like your own code, just iterate over the items and add each one to the output string if isalpha() is true, otherwise add a space:
out = ""
for s in temp:
    if s.isalpha():
        out += s
    else:
        out += " "
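Using a loop to get a list of words:
words = []
out = ""
for s in temp:
    if s.isalpha():
        out += s
    elif out:
        # a separator ends the current word
        words.append(out)
        out = ""
if out:
    # the last word has no separator after it
    words.append(out)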
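The question's edit says that digits count as valid characters too. A minimal sketch of the grouping approach under that assumption uses str.isalnum as the key function, so no separator character has to be named. The trailing '2' in the sample list is added here only to show a digit being kept.

```python
from itertools import groupby

# Sample list from the question, with a digit appended for illustration
temp = ['h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd', '2']

# groupby yields runs of consecutive items sharing the same key (valid or not valid)
words = ["".join(group) for is_valid, group in groupby(temp, key=str.isalnum) if is_valid]
print(words)  # ['hello', 'world2']
```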

Iteration over table seems to not be working in python

I have this code:
table = [
    ['a', 'b', 'c', 'd', 'e'],
    ['f', 'g', 'h', 'i', 'k'],
    ['l', 'm', 'n', 'o', 'p'],
    ['q', 'r', 's', 't', 'u'],
    ['v', 'w', 'x', 'y', 'z']]
m = raw_input()
for row in table:
    for column in row:
        for letter in m:
            if letter == 'j':
                letter = 'i'
            if column == letter:
                print column
It iterates over the alphabet table and checks each letter in the text provided; if they match, it prints the current letter of the table. My problem is that when I pass hello to it, it prints:
e
h
l
l
o
instead of:
h
e
l
l
o
What is causing this? There are errors with several other examples of text, but not all. Is there something wrong with my logic?
It works as expected, printing the letters in alphabetical order, just as they appear in the table.
To print the letters in the order they were entered, you need to iterate over the user input first:
for letter in m: # this first
    for row in table:
        for column in row:
            if letter == 'j':
                letter = 'i'
            if column == letter:
                print column
Have you considered using translate instead of a table?
If you only want to check a string and replace characters inside it, that is much easier.
from string import maketrans
in_str = "j"
out_str = "i"
translate_in_out = maketrans(in_str, out_str)
m = raw_input()
print m.translate(translate_in_out)
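string.maketrans only exists in Python 2. In Python 3 the equivalent is the static method str.maketrans. A short sketch of the same j-to-i replacement follows. The sample input is made up, and the variable name j_to_i is an assumption for illustration.

```python
# Build a translation table that maps 'j' to 'i' (Python 3)
j_to_i = str.maketrans("j", "i")

m = "jumping jacks"  # made-up input in place of reading from the user
translated = m.translate(j_to_i)
print(translated)  # iumping iacks
```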
It works fine. But instead of checking whether each letter of your word is in the table, you are checking whether each letter of the table is in your word. You take the first row, then the first column (the letter a), and then check whether a equals any letter in m. You need to start from the other side:
m = raw_input()
for letter in m:
    if letter == 'j':
        letter = 'i'
    for row in table:
        for column in row:
            if column == letter:
                print column
Replace the unwanted characters in m.
m = m.replace('j', 'i')
Iterate over m and check if each character is in a table row.
for char in m:
    for row in table:
        if char in row:
            print char
