PYTHON: How do I grab a specific part of a string? - python

I have the following string: "https://www.instagram.com/paula.mtzm/"
I want to store the user name "paula.mtzm" in a variable.
Does anyone know how to do this? Maybe I could somehow remove the leading part of the string, "https://www.instagram.com/", and then drop the last character "/"?

"https://www.instagram.com/paula.mtzm/".split(".com/")[-1].replace("/", "")
This should do what you want. Effectively it splits the string into a list using the separator .com/, gets the last item of that list ("paula.mtzm/"), and finally removes any remaining /s
I'm not sure how specific your use-case is so I don't know how suitable this is in general.
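If the URL might come from a different host or lack the trailing slash, parsing it is a little more forgiving than splitting on ".com/". A minimal sketch using the standard library's urllib.parse; the helper name instagram_username is made up for illustration, and it assumes the user name is always the first path segment:

```python
from urllib.parse import urlparse


def instagram_username(url):
    """Return the first path segment of a profile URL (hypothetical helper)."""
    path = urlparse(url).path  # for the example URL this is "/paula.mtzm/"
    # Drop leading/trailing slashes, then keep only the first segment.
    return path.strip("/").split("/")[0]


user = instagram_username("https://www.instagram.com/paula.mtzm/")
print(user)
```

This also works when the trailing "/" is missing, since strip("/") simply has nothing to remove.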

This is actually pretty easy:
Strings are indexed in Python just like a list. So:
string = "potato"
print(string[0])  # this will print "p" to the console
# we can 'slice' in the index too
print(string[0:3])  # this will print "pot" to the console
So for your specific problem you could have your code search for the 3rd
forward slash and grab everything after that.
If you always know the web address you can just start your index at the end of
the address and where the user begins:
string = "https://www.instagram.com/paula.mtzm/"
string_index = 26  # the 'p' in paula begins here
user_name = string[string_index:].rstrip("/")  # slice to the end, then drop the trailing "/"
print(user_name)  # outputs paula.mtzm
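The "search for the 3rd forward slash" idea mentioned above could look roughly like this. It is only a sketch: the function name after_third_slash is invented here, and it assumes the URL starts with a scheme such as "https://", so that the third slash is the one right after the host name.

```python
def after_third_slash(url):
    """Return whatever follows the third "/" in url, without a trailing slash."""
    # Splitting at most three times gives, for the example URL:
    # ["https:", "", "www.instagram.com", "paula.mtzm/"]
    parts = url.split("/", 3)
    if len(parts) < 4:
        return ""  # fewer than three slashes, so there is nothing after the third
    return parts[3].rstrip("/")


user_name = after_third_slash("https://www.instagram.com/paula.mtzm/")
print(user_name)
```

Unlike the hard-coded index 26, this does not depend on the length of the host name.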

Related

"# this is a string": how does Python identify it as a string and not a comment?

I really want to know how Python identifies a # inside quotes as part of a string and a bare # as a comment.
I mean, how does the code that tells the two apart actually work? Does Python read a line and somehow exclude the string in order to find the comment?
"# this is a string" # this is a comment
How is the comment identified? Does Python exclude the string, and if so, how?
How can we write code that does the same, for example to design a compiler for our own language in Python?
I am a newbie, please help.
You need to know that whether something is a string or a comment can be determined from just one single character. That is the job of the scanner (or lexical analyzer if you want to sound fancy).
If it starts with a ", it's a string. If it starts with #, it's a comment.
In the code that makes up Python itself, there's probably a loop that goes something like this:
# While there is still source code to read
while not done:
    # Get the current character
    current = source[pos]
    # If the current character is a pound sign
    if current == "#":
        # While we are not at the end of the line
        while current != "\n":
            # Get the next character
            pos += 1
            current = source[pos]
    elif current == '"':
        # Code to read a string omitted for brevity...
        pass
    else:
        done = True
In the real Python lexer, there are probably dozens more of those if statements, but I hope you have a better idea of how it works now. :)
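To make that loop concrete, here is a small runnable scanner in the same spirit. It is a deliberately simplified sketch, not how CPython's tokenizer is written: the function name scan is made up, and it only understands double-quoted strings without escape sequences plus # comments.

```python
def scan(source):
    """Return a list of ("string", text) and ("comment", text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        current = source[pos]
        if current == "#":
            # A comment runs to the end of the line (or the end of the input).
            end = source.find("\n", pos)
            if end == -1:
                end = len(source)
            tokens.append(("comment", source[pos:end]))
            pos = end
        elif current == '"':
            # A string runs to the next double quote; any # inside is just text.
            end = source.find('"', pos + 1)
            if end == -1:
                raise SyntaxError("unterminated string")
            tokens.append(("string", source[pos:end + 1]))
            pos = end + 1
        else:
            # Anything else is skipped in this sketch.
            pos += 1
    return tokens


tokens = scan('"# this is a string" # this is a comment\n')
for kind, text in tokens:
    print(kind, text)
```

The key point is that the decision is made at the first character of each token: once the scanner has entered a string, it does not look for # until the closing quote has been consumed.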
Because of the quotes
# This is a comment
x = "# this is a string"
x = '# this is also a string'
x = """# this string
spans
multiple
lines"""
"# this is a string" # this is a comment
In simple terms, the interpreter sees the first ", then it takes everything that follows as part of the string until it finds the matching " which terminates the string. Then it sees the subsequent # and interprets everything to follow as a comment. The first # is ignored because it is between the two quotes, and hence is taken as part of the string.
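You can watch Python make this distinction with the standard library's tokenize module, which exposes the token stream for a piece of source text. A short sketch (the variable names are arbitrary):

```python
import io
import tokenize

source = '"# this is a string" # this is a comment\n'

found = []
# generate_tokens expects a readline-style callable that returns str lines.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type in (tokenize.STRING, tokenize.COMMENT):
        found.append((tokenize.tok_name[tok.type], tok.string))

for kind, text in found:
    print(kind, text)
```

The first # ends up inside the STRING token, and only the second one starts a COMMENT token.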

How to get everything after string x in python

I have a string:
s3://tester/test.pdf
I want to strip the s3://tester/ prefix, so that even if I have s3://tester/folder/anotherone/test.pdf I get the entire path after s3://tester/.
I have attempted to use the split and partition methods but I can't seem to get it right.
Currently I am trying:
string.partition('/')[3]
But I get an error saying that the index is out of range.
EDIT: I should have specified that the name of the bucket will not always be the same so I want to make sure that it is only grabbing anything after the 3rd '/'.
You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/',path,3)[-1])
Output:
test.pdf
Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)
To answer "How to get everything after x amount of characters in python"
string[x:]
PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
import re
p = re.search(r'(?<=s3:\/\/tester\/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw updates duh.
Got the wrong end of the stick on this one, my bad.
The re method below should work whatever the bucket name is, returning everything after the third / in the string.
p = ''.join(re.findall(r'\/[^\/]+', s)[1:])[1:]
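Since an S3 URI has the usual scheme://host/path shape, another option is to let urllib.parse separate the bucket from the rest. This is a sketch rather than an official S3 API; the helper name split_s3_uri is made up, and it assumes the bucket is always the part between "//" and the next "/".

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key) for an s3://bucket/key style URI (hypothetical helper)."""
    parsed = urlparse(uri)
    # netloc is the bucket; path is "/folder/..." so drop the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://tester/folder/anotherone/test.pdf")
print(bucket)
print(key)
```

Because the bucket is parsed rather than hard-coded, this keeps working when the bucket name changes.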

Why is my string variable not working when I try to combine it with other strings inside a loop?

I have a problem with my code. It may be a scope issue, but I'm not sure.
I have a list of objects with a key "systems", that has subkeys that are the names of the systems with further key/values.
I can use the variable in the loop, can print it or assign it to another variable. But when I try to combine the variable with other strings it fails.
I have already checked the variable type with print type() and I get string back.
def get_systems(a_objects):
    for a in a_objects:
        name = (a['name'])
        if 'systems' in a:
            for sys in a['systems']:
                print(name)  # This works and prints the name
                print(name + '-' + (sys))  # This does not work and prints only the '-' and the sys
    return
When I printed each element of the list I did not notice, but when I printed the full list I noticed that every element had a '\r' at the end. That made it impossible to combine with any other string.
A simple .strip() in the code that extracted each string with a regex removed the carriage return and problem was fixed.
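The effect is easy to reproduce. On many terminals a carriage return moves the cursor back to the start of the line, so the text printed after it overwrites the name. A small sketch with made-up values ("web01" and "db" are not from the original data); repr() makes the hidden character visible:

```python
name = "web01\r"  # hypothetical value with a trailing carriage return
system = "db"     # hypothetical system name

raw = name + "-" + system
print(repr(raw))  # repr shows the \r that a plain print hides

clean = name.strip() + "-" + system
print(clean)
```

Printing with repr() is a quick way to check for invisible characters whenever a string "looks right" but behaves oddly.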

Extract e-mail addresses from .txt files in python

I would like to parse out e-mail addresses from several text files in Python. In a first attempt, I tried to get the following element that includes an e-mail address from a list of strings ('2To whom correspondence should be addressed. E-mail: joachim+pnas#uci.edu.\n').
When I try to find the list element that includes the e-mail address via i.find("#") == 0 it does not give me the content[i]. Am I misunderstanding the .find() function? Is there a better way to do this?
from os import listdir
TextFileList = []
PathInput = "C:/Users/p282705/Desktop/PythonProjects/ExtractingEmailList/text/"
# Count the number of different files you have!
for filename in listdir(PathInput):
    if filename.endswith(".txt"):  # In case you accidentally put other files in directory
        TextFileList.append(filename)
for i in TextFileList:
    file = open(PathInput + i, 'r')
    content = file.readlines()
    file.close()
for i in content:
    if i.find("#") == 0:
        print(i)
The standard way of checking whether a string contains a character, in Python, is using the in operator. In your case, that would be:
for i in content:
    if "#" in i:
        print(i)
The find method, as you were using it, returns the position where the # character is located, starting at 0, as described in the official Python documentation. Your test i.find("#") == 0 is therefore only true when the line starts with #.
For instance, in the string abc#google.com, it will return 3. If the character is not found, it will return -1. The equivalent code would be:
for i in content:
    if i.find("#") != -1:
        print(i)
However, this is considered unpythonic and the in operator is preferred.
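A quick way to see the difference is to run find on a few strings. The sample strings other than abc#google.com are made up for illustration:

```python
samples = ["abc#google.com", "#leading", "no marker here"]

# find returns the index of the first match, or -1 when there is none.
positions = [s.find("#") for s in samples]
print(positions)

# Comparing with 0 only accepts strings that start with "#".
starts_with_hash = [s for s in samples if s.find("#") == 0]

# The in operator accepts any string that contains "#".
contains_hash = [s for s in samples if "#" in s]
print(starts_with_hash)
print(contains_hash)
```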
Find returns the index if you find the substring you are searching for. This isn't correct for what you are trying to do.
You would be better off using a regular expression (RE) to search for an occurrence of #. You may also run into a situation where there is more than one email address per line (again, I don't know your input data, so I can't guess).
Something along these lines would benefit you:
import re
for i in content:
    findEmail = re.search(r'[\w\.-]+#[\w\.-]+', i)
    if findEmail:
        print(findEmail.group(0))
You would need to adjust this for valid email addresses... I'm not entirely sure if you can have symbols like +...
The find method in Python returns the index of that character in a string. Maybe you can try this?
words = i.split(' ')  # Split the string into words (avoid naming a variable "list")
for x in words:  # Search each word for the # character
    if x.find("#") != -1:
        print(x)
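To pick up several addresses on one line, and to allow a "+" in the local part, re.findall can replace re.search. This is a rough sketch, not a full address validator. The pattern and the helper name find_addresses are assumptions, and it keeps the "#" separator exactly as it appears in the sample line from the question:

```python
import re

# Local part may contain word characters, ".", "+" and "-".
ADDRESS = re.compile(r"[\w.+-]+#[\w.-]+")


def find_addresses(line):
    """Return every address-like token in line (hypothetical helper)."""
    # The domain part can swallow a sentence-ending ".", so trim it off.
    return [match.rstrip(".") for match in ADDRESS.findall(line)]


line = "2To whom correspondence should be addressed. E-mail: joachim+pnas#uci.edu.\n"
found = find_addresses(line)
print(found)
```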

Python not reading from list correctly

Okay, below is my issue:
This program reads from a file and makes a list without using rstrip('\n'), which I did on purpose. From there, it prints the list, sorts it, prints it again, saves the new, sorted list to a text file, and allows you to search the list for a value.
The problem I am having is this:
When I search for a name, no matter how I type it in, it tells me that it's not in the list.
The code worked until I changed the way I was testing for the variable. Here is the search function:
def searchNames(nameList):
    another = 'y'
    while another.lower() == 'y':
        search = input("What name are you looking for? (Use 'Lastname, Firstname', including comma: ")
        if search in nameList:
            print("The name was found at index", nameList.index(search), "in the list.")
            another = input("Check another name? Y for yes, anything else for no: ")
        else:
            print("The name was not found in the list.")
            another = input("Check another name? Y for yes, anything else for no: ")
For the full code, http://pastebin.com/PMskBtzJ
For the content of the text file: http://pastebin.com/dAhmnXfZ
Ideas? I feel like I should note that I have tried to add ( + '\n') to the search variable
You say you explicitly did not strip off the newlines.
So, your nameList is a list of strings like ['van Rossum, Guido\n', 'Python, Monty\n'].
But your search is the string returned by input, which will not have a newline. So it can't possibly match any of the strings in the list.
There are a few ways to fix this.
First, of course, you could strip the newlines in your list.
Alternatively, you could strip them on the fly during the search:
if search in (name.rstrip() for name in nameList):
Or you could even add them onto the search string:
if search+'\n' in nameList:
If you're doing lots of searches, I would do the stripping just once and keep a list of stripped names around.
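A minimal sketch of the "strip once" approach, using the two example entries from above. The variable names raw_lines and names are made up, and the search string is hard-coded here in place of input():

```python
raw_lines = ["van Rossum, Guido\n", "Python, Monty\n"]

# Strip the trailing newline from every entry a single time, up front.
names = [line.rstrip("\n") for line in raw_lines]

search = "Python, Monty"  # stands in for the value returned by input()
found = search in names
index = names.index(search) if found else -1
print(found, index)
```

After this, every later search compares like with like, and nothing needs to be stripped inside the loop.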
As a side note, searching the list to find out if the name is in the list, and then searching it again to find the index, is a little silly. Just search it once:
try:
    i = nameList.index(search)
except ValueError:
    print("The name was not found in the list.")
else:
    print("The name was found at index", i, "in the list.")
another = input("Check another name? Y for yes, anything else for no: ")
The reason for this error is that every entry in your list ends with a "\n", for example "john, smith\n". Your search function then uses the input, which does NOT include "\n".
You've not given us much to go on, but maybe using sys.stdin.readline() instead of input() would help? I don't believe 2.x input() is going to leave a newline on the end of your inputs, which would make the "in" operator never find a match. sys.stdin.readline() does leave the newline at the end.
Also 'string' in list_ is slow compared to 'string' in set_ - if you don't really need indices, you might use a set instead, particularly if your collection is large.
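As a sketch of that idea: a set gives fast membership tests, and if the index is still needed, a dictionary from name to position gives both at once. The names below reuse the examples from the earlier answer, and the variable names are arbitrary:

```python
names = ["van Rossum, Guido", "Python, Monty"]

# A set answers "is it there?" without scanning the whole list.
name_set = set(names)
print("Python, Monty" in name_set)

# If you also need the position, map each name to its index once.
position = {name: i for i, name in enumerate(names)}
print(position.get("Python, Monty", -1))
print(position.get("Cleese, John", -1))
```

Note that with duplicate names the dictionary keeps the last index, whereas list.index returns the first.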
