Python: look for a string only at the beginning of a string

I am using difflib.Differ() on two lists.
The way Differ works, it prefixes a line with + if the line is unique to sequence 2 and with - if it is unique to sequence 1. The marker is placed at the very beginning of the line.
I want to find the entries in my list that begin with - or +, but only when the string starts with that character, since most of my entries also contain these characters elsewhere in the string.
In the code snippet below, diff_list is the list. I want to check for a + or - in the very first position of every string in this list:
for x in diff_list:
    if "+" or "-" in x[0]:
        print x
This prints every line, even those that don't begin with - or +.

Did you try startswith?
s = '+asdf' # sample data
if s.startswith('+') or s.startswith('-'):
    pass # do work here
Docs:
https://docs.python.org/3.4/library/stdtypes.html#str.startswith
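For what it's worth, the condition in the question fails because Python parses "+" or "-" in x[0] as "+" or ("-" in x[0]), and a non-empty string such as "+" is always truthy. Below is a minimal sketch of the fix, assuming Python 3. The helper name changed_lines and the sample lists are made up for illustration. It relies on str.startswith accepting a tuple of prefixes:

```python
import difflib

def changed_lines(diff_lines):
    """Keep only the lines that Differ marked as added (+) or removed (-)."""
    # startswith accepts a tuple, so one call covers both prefixes
    return [line for line in diff_lines if line.startswith(("+", "-"))]

# Sample data (hypothetical): the strings contain + and - in the middle too
before = ["a-b", "c+d", "same"]
after = ["a-b", "x+y", "same"]
diff_list = list(difflib.Differ().compare(before, after))

print(changed_lines(diff_list))  # ['- c+d', '+ x+y']
```

Unchanged lines come back from Differ prefixed with two spaces, so "  a-b" is skipped even though it contains a - further in.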

Related

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
        x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 characters of each piece using a slice in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
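The two split-based answers differ when the string does not end with ff: filtering with if x keeps the trailing piece, while [:-1] drops it. A small sketch, assuming Python 3. The function name ids_before_marker and the trailing "1234" are hypothetical additions for illustration:

```python
def ids_before_marker(text, marker="ff", width=8):
    """Return the `width` characters that precede each occurrence of `marker`."""
    # Every piece except the last one is followed by a marker.
    return [piece[-width:] for piece in text.split(marker)[:-1]]

s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"

print(ids_before_marker(s))           # ['909a9090', '90434390', 'sdfs9000']
print(ids_before_marker(s + "1234"))  # same result, the trailing "1234" is ignored

# The `if x` filter only removes empty pieces, so trailing data slips through:
print([x[-8:] for x in (s + "1234").split("ff") if x])
# ['909a9090', '90434390', 'sdfs9000', '1234']
```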
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None # offset of the next match
last = 0 # where to resume searching
l = 8 # number of preceding characters to extract
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    # max() avoids a negative start index when a match is near the beginning
    preceding = fullstr[max(0, found - l):found]
    print("At position {} found preceding characters '{}' ".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.

Operate on part of sequence while returning whole sequence

I want to shorten a Python class name by truncating all but the last part, i.e. module.path.to.Class => mo.pa.to.Class.
This could be accomplished by splitting the string, storing the list in a variable, operating on all but the last part, and joining the parts back together.
I would like to know if there is a way to do this in one step, i.e.:
split to parts
create two copies of sequence (tee ?)
apply truncation to one sequence and not the other
join selected parts of sequence
Something like:
'.'.join( [chain(map(lambda x: x[:2], foo[:-1]), bar[-1]) for foo, bar in tee(name.split('.'))] )
But I'm unable to figure out working with ...foo, bar in tee(...
If you want to do it by splitting, you can first split once on the last dot, then split the first part again to get the package names, shorten each to its first two characters, and finally join everything back together. If you insist on doing it inline:
name = "module.path.to.Class"
short = ".".join([[x[:2] for x in p.split(".")] + [n] for p, n in [name.rsplit(".", 1)]][0])
print(short) # mo.pa.to.Class
This creates unnecessary lists just to squeeze the logic into a list comprehension; in practice it is probably slower than doing it in a plain, procedural fashion:
def shorten_path(source):
    indices = source.split(".")
    # truncate every part except the last, then append the last part unchanged
    return ".".join([x[:2] for x in indices[:-1]] + indices[-1:])
name = "module.path.to.Class"
print(shorten_path(name)) # mo.pa.to.Class
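The "split, treat the last part differently, join" idea from the question can also be written without tee by using extended iterable unpacking. This is only a sketch, assuming Python 3. The names shorten and keep are illustrative and do not come from the answers above:

```python
def shorten(name, keep=2):
    """Truncate every dotted part except the last to `keep` characters."""
    # *packages collects everything before the final part (possibly nothing)
    *packages, cls = name.split(".")
    return ".".join([part[:keep] for part in packages] + [cls])

print(shorten("module.path.to.Class"))  # mo.pa.to.Class
print(shorten("Class"))                 # Class
```

The starred target does the job the question wanted from tee: one split yields both the parts to truncate and the part to keep.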
You could do this in one line with a regular expression:
>>> import re
>>> re.sub(r'(\b\w{2})\w*(\.)', r'\1\2', 'module.path.to.Class')
'mo.pa.to.Class'
The pattern r'(\b\w{2})\w*(\.)' captures two groups: the first two letters of a word, and the dot at the end of the word.
The substitution pattern r'\1\2' concatenates the two captured groups - the first two letters of the word and the dot.
No count parameter is passed to re.sub, so all occurrences of the pattern are substituted.
The final word - the class name - is not truncated because it isn't followed by a dot, so it doesn't match the pattern.
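To see the pattern at work on a few more inputs, here is a small self-contained sketch. The wrapper name shorten_with_regex and the extra sample names are assumptions for illustration:

```python
import re

# Group 1 keeps the first two word characters, \w* swallows the rest of the
# word, and group 2 keeps the dot. A part with no trailing dot never matches.
PATTERN = re.compile(r'(\b\w{2})\w*(\.)')

def shorten_with_regex(name):
    return PATTERN.sub(r'\1\2', name)

print(shorten_with_regex('module.path.to.Class'))  # mo.pa.to.Class
print(shorten_with_regex('a.b.Class'))             # a.b.Class (one-letter parts cannot match \w{2})
print(shorten_with_regex('Class'))                 # Class
```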

In Python I'm trying to parse a string where the element and the index plus one returns the desired index

Here is my code challenge and my code. I'm stuck and not sure why it's not working properly.
- Write a function named plaintext that takes a single parameter: a string encoded in this format: before each character of the message, add a digit and a series of other characters. The digit should correspond to the number of characters that will precede the message's actual, meaningful character. The function should return the decoded word in string form.
""" my pseudocode:
#convert string to a list
#enumerate list
#parse string where the element and the index plus one returns the desired index
#return decoded message of desired indexes """
encoded_message = "0h2ake1zy"
#encoded_message ="2xwz"
#encoded_message = "0u2zyi2467"
def plaintext(string):
    while(True):
        #encoded_message = raw_input("enter encoded message:")
        for index, character in enumerate(list(encoded_message)):
            character = int(character)
            decoded_msg = index + character + 1
            print decoded_msg
You need to iterate over the string's characters, and in each iteration skip the specified number of characters and take the following one:
def plaintext(s):
    res = ''
    i = 0
    while i < len(s):
        # Skip the number of chars specified
        i += int(s[i])
        # Take the letter after them
        i += 1
        res += s[i]
        # Move on to the next position
        i += 1
    return res
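Here is a variant of the same loop that computes the target index directly and fails loudly on malformed input, checked against the three sample strings from the question. It is only a sketch, assuming Python 3. The name decode and the choice to raise ValueError are assumptions, not part of the challenge:

```python
def decode(encoded):
    """Decode a message where each group is <digit><digit filler chars><letter>."""
    decoded = []
    i = 0
    while i < len(encoded):
        skip = int(encoded[i])   # raises ValueError if the group does not start with a digit
        target = i + 1 + skip    # index of the meaningful character
        if target >= len(encoded):
            raise ValueError("truncated group starting at index %d" % i)
        decoded.append(encoded[target])
        i = target + 1           # the next group starts right after it
    return "".join(decoded)

print(decode("0h2ake1zy"))   # hey
print(decode("2xwz"))        # z
print(decode("0u2zyi2467"))  # ui7
```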
Here are some hints.
First decide what looping construct you want to use. Python offers choices: iterate over individual characters, loop over the indices of the characters, while loop. You certainly don't want both a while and a for loop.
You're going to be processing the string in groups, "0h", then "2ake", then "1zy" to take your first example string. What is the condition that will cause you to exit the loop?
Now, look at your line decoded_msg = index + character + 1. To construct the decoded string, you want to index into the string itself, based on the digit's value. So, this line should contain something like, encoded_message[x] for some x, that you have to figure out using the digit.
Also, you'll want to accumulate characters as you go along. So you'll need to begin the loop with an empty result string decoded_msg="" and add a character to it decoded_msg += ... for each iteration of the loop.
I hope this helps a little more than just giving the answer.

Replacing symbols using list comprehension

I want to simplify replacing specific characters of a string in-situ - with a list comprehension. Attempts so far simply return a list of strings - each list item with each character replaced from the check string.
Advice / solutions?
Inputs:
reveal = "password"
ltrTried = "sr"
Required Output:
return = "**ss**r*"
Getting:
('**ss****', '******r*')
If you want to do this using a list comprehension, you'd want to replace it letter by letter like this:
reveal = "".join((letter if letter in ltrFound else "*") for letter in reveal)
Notice that
We're iterating over your reveal string, not your ltrFound list (or string).
Each item is replaced using the ternary operator letter if letter in ltrFound else "*". This ensures that if the letter in reveal is not in ltrFound, it will get replaced with a *.
We end by joining together all the letters.
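Put together as a runnable sketch with the inputs from the question (ltrFound above corresponds to the question's ltrTried). The function name mask and its placeholder parameter are illustrative assumptions:

```python
def mask(secret, guessed, placeholder="*"):
    # Keep each character that has been guessed; hide every other one.
    return "".join(ch if ch in guessed else placeholder for ch in secret)

reveal = "password"
ltrTried = "sr"

print(mask(reveal, ltrTried))  # **ss**r*
print(mask(reveal, ""))        # ********
```

Iterating over reveal rather than over the guessed letters is what yields a single string instead of one masked string per guess.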
Just for fun, here's a different way to do this immutably, by using a translation map.
If you wanted to replace everything that was in ltrFound, that would be easy:
tr = str.maketrans(ltrFound, '*' * len(ltrFound))
print(reveal.translate(tr))
But you want to do the opposite: replace everything that's not in ltrFound. And you don't want to build a translation table of all of the 100K+ characters that aren't in ltrFound. So, what can you do?
You can build a table of just the characters that are in reveal but not in ltrFound (5 of them with ltrFound = "sr"):
notFound = ''.join(set(reveal) - set(ltrFound)) # 'adopw', in arbitrary order
tr = str.maketrans(notFound, '*' * len(notFound))
print(reveal.translate(tr))
The above is using Python 3.x; for 2.x, maketrans is a function in the string module rather than a static method of the str class (and there are a few other differences, but they don't matter here). So:
import string
notFound = ''.join(set(reveal) - set(ltrFound)) # 'adopw', in arbitrary order
tr = string.maketrans(notFound, '*' * len(notFound))
print(reveal.translate(tr))
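Try this:
import re
re.sub("[^%s]" % re.escape(guesses), "*", solution_string)
assuming guesses is a non-empty string. re.escape keeps characters such as ] or - in guesses from being interpreted as part of the character class.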

How to slice a list element (get a substring of a list element)?

This program:
lijst = ('123-Abc','456-Def','789-Ghi')
print "lijst[1] = " + lijst[1]
print "lijst[1][4:] = " + lijst[1][4:]
print "lijst[1][4:1] = " + lijst[1][4:1]
has this output:
lijst[1] = 456-Def
lijst[1][4:] = Def
lijst[1][4:1] =
??
I had hoped the last line would be "D"!
So what is the correct syntax to get a substring from a list element?
(I'm running Python 2.7.3 on a Raspberry Pi.)
The correct syntax for slicing is [start:stop] and not [start:count] the way that it was used in the question. So you are actually looking for lijst[1][4:4+1] or lijst[1][4:5] for your last line.
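If a start-and-count interface is more convenient, it is easy to build on top of slicing. A minimal sketch, assuming Python 3. The helper take is a made-up name, not a built-in:

```python
lijst = ('123-Abc', '456-Def', '789-Ghi')

def take(text, start, count):
    """Return `count` characters of `text` beginning at index `start`."""
    return text[start:start + count]

print(repr(lijst[1][4:1]))   # '' because stop (1) is not greater than start (4)
print(take(lijst[1], 4, 1))  # D
print(take(lijst[1], 4, 3))  # Def
```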
There are all sorts of nice reasons for having this. You can use the same index to split a string into two parts, for example
lijst[1][:4] == "456-"
lijst[1][4:] == "Def"
and
lijst[1][:4] + lijst[1][4:] == lijst[1]
Note how you can leave out the first or last entry to indicate either the start or the end of the string.
Another nice feature of using indices like this is that the length of the substring is given by stop-start. So
lijst[1][2:6] == "6-De"
and the length of this substring is 6 - 2 = 4
One last note is that you can also skip entries in the string by adding a third index, the step:
lijst[1][0:7:2] == "46Df"
This goes from the start (index 0) to the end (index 7) and shows every second entry. Since you can leave out the start and the end indices, this is equivalent to
lijst[1][::2]
Getting a substring between the indices a and b (excluding b) is achieved using the slice [a:b]. And to get the character at index i you simply use [i] (think of the string as an array of characters).
>>> test = "456-Def"
>>> test[4:5]
'D'
>>> test[4:8]
'Def'
>>> test[4]
'D'
