title = 'Example####+||'
blacklisted_chars = ['#','|','#','+']
for i in blacklisted_chars:
    convert = title.replace(i, '')
print(convert)
# Example####||
I want to remove every blacklisted character in the list by replacing it with '', but when the code runs, only the last entry in blacklisted_chars appears to have been removed in the printed result.
How can I make it replace all of the characters so that only 'Example' is printed?
Strings are immutable in Python. You assign a new string with
convert = title.replace(i, '')
title remains unchanged after this statement. convert is an entirely new string that is missing i.
On the next iteration, you replace a different value of i, but still from the original title. So in the end it looks like you only ran
convert = title.replace('+', '')
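To see this concretely, here is a small trace of the original loop (the loop variable name ch is my own choice). Each pass starts again from the untouched title, so only the last result survives in convert:

```python
title = 'Example####+||'

for ch in ['#', '|', '#', '+']:
    # replace() returns a new string; title itself is never modified
    convert = title.replace(ch, '')
    print(repr(ch), '->', convert)

print(title)    # still the original string
print(convert)  # only the '+' has been removed
```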
You have two very similar options, depending on whether you want to keep the original title around or not.
If you do, make another reference to it, and keep updating that reference with the results, so that each successive iteration builds on the result of the previous removal:
convert = title
for i in blacklisted_chars:
    convert = convert.replace(i, '')
print(convert)
If you don't care to retain the original title, use that name directly:
for i in blacklisted_chars:
    title = title.replace(i, '')
print(title)
You can achieve a similar result without an explicit loop using re.sub:
import re
convert = re.sub('[#|+]', '', title)
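If the blacklist lives in a list, one way to build the character class instead of typing it by hand is sketched below. strip_chars is a hypothetical helper name, not something from the question, and the sketch assumes the list is non-empty (an empty class [] is not a valid pattern).

```python
import re

def strip_chars(text, chars):
    """Remove every character in `chars` from `text` (hypothetical helper)."""
    # re.escape keeps characters such as '|' or '+' literal inside the class
    pattern = "[" + re.escape("".join(chars)) + "]"
    return re.sub(pattern, "", text)

title = 'Example####+||'
blacklisted_chars = ['#', '|', '#', '+']
print(strip_chars(title, blacklisted_chars))
```

Duplicates in the list are harmless here, since a character class ignores repeats.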
Try this:
title = 'Example####+||'
blacklisted_chars = ['#','|','#','+']
for i in blacklisted_chars:
    title = title.replace(i, '')
print(title)
Explanation: Since you were storing the result of title.replace in the convert variable, every iteration it was being overwritten. What you need is to apply replace to the result of the previous iteration, which can be the variable with the original string or another variable containing a copy of it if you want to keep the original value unchanged.
P.S.: strings are iterable, so you can also achieve the same result by writing the blacklist as a single string, like this:
blacklisted_chars = '#|#+'
I have a list with one item in it, which I try to dismantle and rebuild.
I'm not really sure if it is the 'right' way, but for now it will do.
I tried replace/substitute and other ways of manipulating the list, but didn't get far, so this is what I came up with:
This is the list I get: alias_account = ['account-12345']
I then use this code to remove the [' from the front and the '] from the back.
NAME = ('%s' % alias_account).split(',')
for x in NAME:
    key = x.split("-")[0]
    value = x.split("-")[1]
    alias_account = value[:-2]
    alias_account1 = key[2:]
    alias_account = ('%s-%s') % (alias_account1, alias_account)
This works beautifully when running print alias_account.
The problem starts when I have a list like ['acc-ount-12345'] or ['account'].
So my question is: how do I handle all of these possibilities?
Should I use try/except with other split options,
or are there fancier split options?
To access a single list element, you can index its position in square brackets:
alias_account[0]
To hide the quotes marking the result as a string, you can use print():
print(alias_account[0])
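If the goal is then to separate the name from the trailing number, str.rpartition is one option that copes with all three inputs mentioned in the question. split_alias is a hypothetical helper, and I am assuming the number is whatever follows the last hyphen:

```python
def split_alias(alias):
    """Split on the last '-'; return (name, None) when there is no hyphen."""
    head, sep, tail = alias.rpartition('-')
    if not sep:
        # No hyphen at all: rpartition puts the whole string in `tail`
        return tail, None
    return head, tail

for alias_account in (['account-12345'], ['acc-ount-12345'], ['account']):
    print(split_alias(alias_account[0]))
```

No try/except is needed, because rpartition always returns three parts.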
For example, given a list of strings prices = ["US$200", "CA$80", "GA$500"],
I am trying to only return ["US", "CA", "GA"].
Here is my code - what am I doing wrong?
def get_country_codes(prices):
    prices = ""
    list = prices.split()
    list.remove("$")
    "".join(list)
    return list
Since each of the strings in the prices argument has the form '[country_code]$[number]', you can split each of them on '$' and take the first part.
Here's an example of how you can do this:
def get_country_codes(prices):
    return [p.split('$')[0] for p in prices]
So get_country_codes(['US$200', 'CA$80', 'GA$500']) returns ['US', 'CA', 'GA'].
Also, as a side note, I would recommend against naming a variable list, as this shadows the built-in name list, which is the list type itself.
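A small illustration of why that naming matters (shadow_demo is a made-up function for this sketch): once the name list refers to an ordinary list object, calling it like the built-in type fails.

```python
def shadow_demo():
    # The local name `list` now hides the built-in type inside this function
    list = "US$200".split('$')
    try:
        list("abc")  # tries to call a list object, not the list type
    except TypeError as exc:
        return type(exc).__name__
    return "no error"

print(shadow_demo())
```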
There are multiple problems with your code, and you have to fix all of them to make it work:
def get_country_codes(prices):
    prices = ""
Whatever value your caller passed in, you're throwing that away and replacing it with "". You don't want to do that, so just get rid of that last line.
list = prices.split()
You really shouldn't be calling this list list, since that hides the built-in. Also, split with no argument splits on runs of whitespace, so what you get may not be what you want:
>>> "US$200, CA$80, GA$500".split()
['US$200,', 'CA$80,', 'GA$500']
I suppose you can get away with having those stray commas, since you're just going to throw them away. But it's better to split with your actual separators, the ', '. So, let's change that line:
prices = prices.split(", ")
list.remove("$")
list.remove deletes only the first element equal to its argument, here the string "$". No element of the list is exactly "$", so instead of removing anything the call raises ValueError.
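A quick check of that call, using the list from the question (the outcome variable is only there for illustration):

```python
prices = ["US$200", "CA$80", "GA$500"]

try:
    prices.remove("$")  # looks for an element that is exactly "$"
    outcome = "removed"
except ValueError:
    outcome = "ValueError"

print(outcome)
print(prices)  # the list is unchanged
```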
More generally, you don't want to throw away any of the strings in the list. Instead, you want to replace the strings, with strings that are truncated at the $. So, you need a loop:
countries = []
for price in prices:
    country, dollar, price = price.partition('$')
    countries.append(country)
If you're familiar with list comprehensions, you can rewrite this as a one-liner:
countries = [price.partition('$')[0] for price in prices]
"".join(list)
This just creates a new string and then throws it away. You have to assign it to something if you want to use it, like this:
result = "".join(countries)
But… do you really want to join anything here? It sounds like you want the result to be a list of strings, ['US', 'CA', 'GA'], not one big string 'USCAGA', right? So, just get rid of this line.
return list
Just change the variable name to countries and you're done.
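Putting those pieces together, the whole function might look like the sketch below. It assumes prices is already a list of strings, as in the question, so no split step is needed; the _sep and _amount names are just placeholders for the parts that get discarded.

```python
def get_country_codes(prices):
    countries = []
    for price in prices:
        # partition returns (text before '$', '$', text after '$')
        country, _sep, _amount = price.partition('$')
        countries.append(country)
    return countries

print(get_country_codes(["US$200", "CA$80", "GA$500"]))
```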
Since your data is structured so that the first two characters are the country code, you can use simple string slicing.
def get_country_codes(prices):
    return [p[:2] for p in prices]
You call the function with a prices argument, but the first line of the function overwrites it with an empty string:
prices = ''
I would also suggest using the '$' character as the split character, like:
parts = prices.split('$')
Try something like this (note that it handles a single price string such as 'US$200', not the whole list):
def get_country_codes(prices):
    parts = prices.split('$')
    return parts[0]
I have a dataframe in which one column ('entity') contains various names of countries and non-state entities. I need to clean the column because the string values (provided by manual data entry) are all lower-case (china instead of China). I can't just perform the .title() operation on the column, since there are string values that I want left alone (e.g., al Something should not be turned into Al Something).
I'm having trouble creating a function to help me with this problem and could use some guidance from the community. In the past I've used dictionaries to map incorrect strings to correct ones, and I can still revert to that way of doing things, but I thought creating this function might be more straightforward and efficient, and I wanted to challenge myself. But no change occurs in the entity column when I execute the function. Thanks in advance!
myString = ['al Group1', 'al Group2']
entities = df['entity']
def title_fix(entities):
    new_titles = []
    for entity in entities:
        if entity in myString:
            new_titles.append(myString)
        else:
            new_title.append(entity.title())
    return new_title
title_fix(df)
The entities in the line entities = df['entity'] is not the same variable as the entities in the line def title_fix(entities):. This second entities variable is the argument to the function title_fix, and it exists only within the function. It takes on whatever argument you pass into your call to title_fix, which is df.
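Here is a pandas-free sketch of that point. I use a plain dict as a stand-in for df (an assumption for illustration only); like a DataFrame, iterating over it yields the column labels rather than the values. show_items is a hypothetical function.

```python
table = {'entity': ['china', 'al Group1']}  # plain-dict stand-in for df
entities = table['entity']

def show_items(entities):
    # `entities` here is the parameter, not the module-level name above
    return [item for item in entities]

print(show_items(table))     # iterates over the keys: ['entity']
print(show_items(entities))  # iterates over the column values
```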
Try this instead of your function:
# A list of entity names to leave alone (must exactly match character-for-character)
myString = ['al Group1', 'al Group2']
# Apply title case to every entity NOT in myString
df['entity'] = df['entity'].apply(lambda x: x if x in myString else x.title())
# Print the modified DataFrame
df
Note that this solution requires that each string in myString exactly matches the target string in df['entity'], otherwise the target string will not be replaced.
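The exact-match requirement is easy to see with a plain list in place of the column (the sample names below are invented for this sketch). The third entry differs from the ignore list only by case, so it still gets title-cased:

```python
myString = ['al Group1', 'al Group2']
names = ['china', 'al Group1', 'al group1']  # made-up sample values

# Same condition as the lambda: keep exact matches, title-case the rest
fixed = [x if x in myString else x.title() for x in names]
print(fixed)  # ['China', 'al Group1', 'Al Group1']
```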
Your code had several bugs, such as spelling and indentation. Fixed code:
myString = ['al Group1', 'al Group2']
entities = df['entity']
def title_fix(entities):
    new_titles = []
    for entity in entities:
        if entity in myString:
            new_titles.append(entity)
        else:
            new_titles.append(entity.title())
    return new_titles
df['entity'] = title_fix(entities)
However, what you want to achieve can be done in a one-liner. I came up with 3 solutions. I don't know pandas that well and I have no idea about the performance differences between these solutions, but here they are.
ignored makes a little bit more sense than myString so I'll use it.
ignored = ['al Group1', 'al Group2']
First solution:
df['entity'] = df['entity'].apply(lambda x: x.title() if x not in ignored else x)
Second:
df.entity[~df.entity.isin(ignored)] = df.entity.str.title()
Third:
df.loc[~df.entity.isin(ignored), 'entity'] = df.entity.str.title()
This has taken me over a day of trial and error. I am trying to keep a dictionary of queries and their respective matches in a search. My problem is that there can be one or more matches. My current solution is:
match5[query_site] will already have the first match but if it finds another match it will append it using the code below.
temp5=[] #temporary variable to create array
if isinstance(match5[query_site],list): #check if already a list
    temp5.extend(match5[query_site])
    temp5.append(match_site)
else:
    temp5.append(match5[query_site])
match5[query_site]=temp5 #add new location
That if statement is literally to prevent extend converting my str element into an array of letters. If I try to initialize the first match as a single element array I get None if I try to directly append. I feel like there should be a more pythonic method to achieve this without a temporary variable and conditional statement.
Update: Here is an example of my output when it works
5'flank: ['8_73793824', '6_133347883', '4_167491131', '18_535703', '14_48370386']
3'flank: X_11731384
There are 5 matches for my "5'flank" and only 1 match for my "3'flank".
So what about this:
if query_site not in match5:  # here for the first time
    match5[query_site] = [match_site]
elif isinstance(match5[query_site], str):  # was already here, a single occurrence
    match5[query_site] = [match5[query_site], match_site]  # make it a list of strings
else:  # already a list, so just append
    match5[query_site].append(match_site)
I like using setdefault() for cases like this.
temp5 = match5.setdefault(query_site, [])
temp5.append(match_site)
It's sort of like get() in that it returns an existing value if the key exists but you can provide a default value. The difference is that if the key doesn't exist already setdefault inserts the default value into the dict.
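A short comparison of the two, using match names from the question's sample output (probe is just an illustrative variable):

```python
match5 = {}

# get() returns the default but leaves the dict untouched
probe = match5.get("5'flank", [])
print(probe, match5)  # [] {}

# setdefault() stores the default on first use, then returns the stored list
match5.setdefault("5'flank", []).append('8_73793824')
match5.setdefault("5'flank", []).append('6_133347883')
match5.setdefault("3'flank", []).append('X_11731384')
print(match5)
```

Every value ends up as a list, even when there is only one match, so no isinstance check is needed later.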
This is all you need to do:
if query_site not in match5:
    match5[query_site] = []
temp5 = match5[query_site]
temp5.append(match_site)
You could also do
temp5 = match5.setdefault(query_site, [])
temp5.append(match_site)
Assuming match5 is a dictionary, what about this:
if query_site not in match5:  # first match ever
    match5[query_site] = [match_site]
else:  # entry already there, just append
    match5[query_site].append(match_site)
Make every entry of the dictionary a list from the start, and just append to it.
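A different route to the same "always a list" idea, not mentioned in the answers above, is collections.defaultdict, which creates the empty list on first access. The hits list below is invented sample data built from the question's output:

```python
from collections import defaultdict

match5 = defaultdict(list)  # missing keys start out as an empty list

hits = [("5'flank", '8_73793824'),
        ("5'flank", '6_133347883'),
        ("3'flank", 'X_11731384')]

for query_site, match_site in hits:
    match5[query_site].append(match_site)

print(dict(match5))
```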
I have this code:
topic = "test4"
topics = sns.get_all_topics()
topicsList = topics['ListTopicsResponse']['ListTopicsResult']['Topics']
topicsListNames = [t['TopicArn'] for t in topicsList]
That returns a list:
[u'arn:aws:sns:us-east-1:10:test4', u'arn:aws:sns:us-east-1:11:test7']
What I'm trying to do now is create a variable that holds the complete string matching the topic variable.
I have the variable topic = "test4", and I want a variable topicResult that holds u'arn:aws:sns:us-east-1:10:test4'.
The string matching topic is not always in the first position of the list.
Do you know how to do this?
topicResult = " ".join([t['TopicArn'] for t in topicsList if t['TopicArn'].endswith(topic)])
This checks each string in the list to see whether it ends with the value of topic. " ".join() gives you a single string; if you would rather keep a list of the strings that end with topic, drop the join. If topic won't always be at the end of the string, you can check whether topic is anywhere inside the string instead:
topicResult = " ".join([t['TopicArn'] for t in topicsList if topic in t['TopicArn']])
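If you expect exactly one match and want it as a plain string, a sketch along these lines may be safer than joining. The sample topicsList is hard-coded here in place of the sns.get_all_topics() response, and I am assuming the topic name is the last colon-separated field of the ARN, as it is in the output shown in the question:

```python
# Stand-in for the data extracted from the SNS response
topicsList = [{'TopicArn': 'arn:aws:sns:us-east-1:10:test4'},
              {'TopicArn': 'arn:aws:sns:us-east-1:11:test7'}]
topic = "test4"

# Matching on ':' + topic avoids picking up a topic named e.g. 'mytest4'
matches = [t['TopicArn'] for t in topicsList
           if t['TopicArn'].endswith(':' + topic)]

# None signals that no topic with that name was found
topicResult = matches[0] if matches else None
print(topicResult)
```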
You could use a list comprehension with a condition, but the built-in filter also works (I have not measured whether it is any faster):
topicsListNames = filter(lambda item: item['TopicArn'].endswith(topic), topicsList)
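Basically, this line takes topicsList and keeps only the items item for which item['TopicArn'].endswith(topic) is True, i.e. the items whose 'TopicArn' value ends with the value of the topic variable. All of these "good" items are returned, and topicsListNames references them. Note that the result holds the matching dictionaries rather than the ARN strings, and that in Python 3 filter returns a lazy iterator, so wrap it in list() if you need a list.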
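For reference, here is roughly how that plays out on Python 3 with hard-coded sample data standing in for the SNS response (matching_items and arns are names I made up for the sketch):

```python
topicsList = [{'TopicArn': 'arn:aws:sns:us-east-1:10:test4'},
              {'TopicArn': 'arn:aws:sns:us-east-1:11:test7'}]
topic = "test4"

filtered = filter(lambda item: item['TopicArn'].endswith(topic), topicsList)

# On Python 3, filter is lazy; list() materialises the matching dicts
matching_items = list(filtered)

# Pull the ARN strings out of the matching dicts
arns = [item['TopicArn'] for item in matching_items]
print(arns)
```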