I know the change I need belongs in the first function, but for the life of me I can't work out what it is.
I've tried removing commas and adding spaces in each .split() call; each attempt has yielded an undesired return value.
def get_country_codes(prices):
    price_list = prices.split(',')
    results = ''
    for price in price_list:
        results += price.split('$')[0]
    return results

def main():
    prices = "US$40, AU$89, JP$200"
    price_result = get_country_codes(prices)
    print(price_result)

if __name__ == "__main__":
    main()
The current output:
US AU JP
The desired output:
US, AU, JP
It looks like you could benefit from using a list to collect the country codes of the prices instead of a string. Then you can use ', '.join() later.
Maybe like this:
def get_country_codes(prices):
    country_code_list = []
    for price in prices.split(','):
        country_code = price.split('$')[0].strip()
        country_code_list.append(country_code)
    return country_code_list

if __name__ == '__main__':
    prices = "US$40, AU$89, JP$200"
    result_list = get_country_codes(prices)
    print(', '.join(result_list))
Or if you like really short code:
prices = "US$40, AU$89, JP$200"
print(
    ', '.join(
        price.split('$')[0].strip()
        for price in prices.split(',')))
You could also use regex if you want to. Since you know country codes will be two capital letters only (A-Z), you can look for a match of two capital letters that precede a dollar sign.
import re

def get_country_codes(prices):
    # find every pair of capital letters that is followed by a dollar sign
    country_codes = re.findall(r'([A-Z]{2})\$', prices)
    return ', '.join(country_codes)
Look at the successive steps:
Your string:
In [1]: prices = "US$40, AU$89, JP$200"
split into a list on comma
In [2]: alist = prices.split(',')
In [3]: alist
Out[3]: ['US$40', ' AU$89', ' JP$200']
split the substrings on $
In [4]: [price.split('$') for price in alist]
Out[4]: [['US', '40'], [' AU', '89'], [' JP', '200']]
select the first element:
In [5]: [price.split('$')[0] for price in alist]
Out[5]: ['US', ' AU', ' JP']
Your += joins the strings as they are, which is the same as join with ''. Note that the substrings still carry the leading blank from the original string.
In [6]: ''.join([price.split('$')[0] for price in alist])
Out[6]: 'US AU JP'
Join with comma:
In [7]: ','.join([price.split('$')[0] for price in alist])
Out[7]: 'US, AU, JP'
join is the easiest way of joining a list of strings with a specific delimiter between, in effect reversing a split. += in a loop is harder to use, since it tends to add an extra delimiter at the start or end.
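To make that last point concrete, here is a minimal sketch comparing the two approaches on the string from the question. The helper names codes_with_concat and codes_with_join are made up for this illustration.

```python
def codes_with_concat(prices):
    """Build the result with += in a loop; this leaves a trailing delimiter."""
    result = ''
    for price in prices.split(','):
        result += price.split('$')[0].strip() + ', '
    return result


def codes_with_join(prices):
    """Collect the codes first, then join them once with the delimiter."""
    return ', '.join(price.split('$')[0].strip() for price in prices.split(','))


prices = "US$40, AU$89, JP$200"
print(repr(codes_with_concat(prices)))  # 'US, AU, JP, '
print(repr(codes_with_join(prices)))    # 'US, AU, JP'
```

The += version needs extra code to trim the trailing ', ', while join only ever puts the delimiter between items.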
I have a dataframe as follows (example is simplified):
id    prediction1          prediction2
1234  Cocker_spaniel       german_Shepard
5678  rhodesian_ridgeback  australian_shepard
I need to remove the underscores and make sure the string is in lower case so I can search it easier later.
I am not quite sure how to loop through this. My initial thought, as a student, is something like the following:
for row in image_predictions['p1']:
    image_predictions['p1'] = image_predictions['p1'].replace('_', ' ')
The above code is for replacing the underscore with a space and I believe the code would be similar for lowercase using the .lower() method.
Any advice to point me in the right direction?
For in place modification you can use:
df.update(df[['prediction1', 'prediction2']]
          .apply(lambda c: c.str.lower()
                            .str.replace('_', ' ', regex=False))
          )
Output:
     id          prediction1         prediction2
0  1234       cocker spaniel      german shepard
1  5678  rhodesian ridgeback  australian shepard
You can use image_predictions['p1'].apply() to apply a function to each cell of the p1 column:
def myFunction(x):
    return x.replace('_', ' ')

image_predictions['p1'] = image_predictions['p1'].apply(myFunction)
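Note that myFunction above only replaces underscores; the question also asks for lowercase. A minimal sketch of a helper that does both, in plain Python (the name normalize_label is made up for this example):

```python
def normalize_label(label):
    """Lowercase a label and turn underscores into spaces."""
    return label.lower().replace('_', ' ')


print(normalize_label('Cocker_spaniel'))   # cocker spaniel
print(normalize_label('german_Shepard'))   # german shepard
```

Assuming the column holds plain strings, it could be passed to apply in the same way as myFunction, i.e. image_predictions['p1'].apply(normalize_label).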
I wanted to see whether it was possible to avoid specifying the columns for replacement. This approach creates a dict that maps A -> a, B -> b, etc., and _ -> space, then uses replace with regex=True:
import string
replace_dict = dict(zip(string.ascii_uppercase,string.ascii_lowercase))
replace_dict['_'] = ' '
df.replace(replace_dict, regex=True, inplace=True)
print(df)
I need some help please.
I have a dataframe with multiple columns where 2 are:
Content_Clean = Column filled with Content - String
Removals: list of strings to be removed from Content_Clean Column
Problem: I am trying to replace words in Content_Clean with spaces if in Removals Column:
Example:
Content Clean: 'Johnny and Mary went to the store'
Removals: ['Johnny','Mary']
Output: 'and went to the store'
Example Code:
for i in data_eng['Removals']:
    for u in i:
        data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')
This does not work because the Removals column contains a list per row.
Another Example:
data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x))
Does not work as this code is only looking for one string.
The problem is that the Removals column holds a list per row, and I want to use that list to remove words from (or replace them with spaces in) the Content_Clean column on a per-row basis.
Here you go. This worked on my test data. Let me know if it works for you
def repl(row):
    for word in row['Removals']:
        row['Content_Clean'] = row['Content_Clean'].replace(word, '')
    return row

data_eng = data_eng.apply(repl, axis=1)
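One caveat with plain str.replace is that it also removes the word when it appears inside a longer word (for example 'Mary' inside 'Maryland'), and it leaves doubled spaces behind. A minimal sketch of a per-row helper that matches whole words only and tidies the whitespace; the name remove_words is made up for this example, and it assumes each Removals entry is a list of plain strings:

```python
import re


def remove_words(text, words):
    """Remove each whole word in `words` from `text` and collapse extra spaces."""
    if not words:
        return text
    # re.escape guards against regex metacharacters in the words;
    # \b keeps the match from firing inside a longer word.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    cleaned = re.sub(pattern, ' ', text)
    # split() with no arguments drops leading, trailing and repeated whitespace
    return ' '.join(cleaned.split())


print(remove_words('Johnny and Mary went to the store', ['Johnny', 'Mary']))
# and went to the store
```

With the dataframe from the question this could presumably be applied row by row, e.g. data_eng.apply(lambda row: remove_words(row['Content_Clean'], row['Removals']), axis=1).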
You can call the str.replace(old, new) method to remove unwanted words from a string.
Here is one small example I have done.
a_string = "I do not like to eat apples and watermelons"
stripped_string = a_string.replace(" do not", "")
print(stripped_string)
This will remove "do not" from the sentence
I have the following string for which I want to extract data:
text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded: .......26.4m\nDepth moulded to main deck: ....9.2m\n'
Every variable I want to extract starts with \n.
The value I want to get follows a colon ':' and more than one dot.
When the value is not preceded by a colon followed by dots, I don't want to extract it.
For example my preferred output looks like:
LOA = 189.9
LBP = 176.0
BM = 26.4
DM = 9.2
import re

text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded: .......26.4m\nDepth moulded to main deck: ....9.2m\n'

# capture all the characters BEFORE the ':' character on each line
variables = re.findall(r'(.*?):', text_example)
# matches all floats and integers (does not account for minus signs)
values = re.findall(r'(\d+(?:\.\d+)?)', text_example)
# zip into a dictionary (this assumes both regexes return the same number of results)
result = dict(zip(variables, values))
print(result)
# {'Length oa': '189.9', 'Length bp': '176', 'Breadth moulded': '26.4', 'Depth moulded to main deck': '9.2'}
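The two findall calls above pair names and values purely by position, so a stray number or colon elsewhere in the text would shift the pairing. As an alternative sketch, a single pattern can require the "colon, then two or more dots, then a number" shape on one line and convert the value to float, which is closer to the output asked for. The pattern is my own assumption about the format, based only on the sample string:

```python
import re

text_example = ('\nExample text \nTECHNICAL PARTICULARS\n'
                'Length oa: ...............189.9m\n'
                'Length bp: ........176m\n'
                'Breadth moulded: .......26.4m\n'
                'Depth moulded to main deck: ....9.2m\n')

# ^            start of a line (re.MULTILINE)
# ([^:\n]+):   the variable name, up to the colon
# \s*\.{2,}    optional spaces, then at least two dots
# (\d+(?:\.\d+)?)  an integer or decimal number
pattern = re.compile(r'^([^:\n]+):\s*\.{2,}\s*(\d+(?:\.\d+)?)', re.MULTILINE)

result = {name.strip(): float(value)
          for name, value in pattern.findall(text_example)}
print(result)
# {'Length oa': 189.9, 'Length bp': 176.0, 'Breadth moulded': 26.4, 'Depth moulded to main deck': 9.2}
```

Lines without the colon-and-dots shape, such as 'Example text', are simply skipped.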
You can build a regex with capture groups and pick the match you need by index:
re.findall(r'(\\n|\n)([A-Za-z\s]*)(?:(\:\s*\.+))(\d*\.*\d*)',text_example)[2]
('\n', 'Breadth moulded', ': .......', '26.4')
Suppose I have this following list
[('2015-2016-regular', '2016-playoff'), ('2016-2017-regular', '2017-playoff'), ('2017-2018-regular',)]
which represents the two previous complete NHL years and the current one.
I would like to convert it so that it gives me
[('Regular Season 2015-2016 ', 'Playoff 2016'), ('Regular Season 2016-2017', 'Playoff 2017'), ('Regular Season 2017-2018 ',)]
My English is bad and those writing will be used as titles. Are there any errors in the last list?
How could I write a function that does this conversion while respecting the 80-character line-length norm?
This is a little hacky, but it's an odd question and use case so oh well. Since you have a really limited set of replacements, you can just use a dict to define them and then use a list comprehension with string formatting:
repl_dict = {
    '-regular': 'Regular Season ',
    '-playoff': 'Playoff '
}
# url_list is assumed to be the list of tuples from the question
new_list = [
    tuple(
        '{}{}'.format(repl_dict[name[name.rfind('-'):]], name[:name.rfind('-')])
        for name in tup
    )
    for tup in url_list
]
I tried this. So, I unpacked the tuple. I know where I have to split and which parts to join and did the needful. capitalize() function is for making the first letter uppercase. Also I need to be careful whether the tuple has one or two elements.
l = [('2015-2016-regular', '2016-playoff'), ('2016-2017-regular', '2017-playoff'), ('2017-2018-regular',)]
ans = []
for i in l:
    if len(i) == 2:
        fir = i[0].split('-')
        sec = i[1].split('-')
        ans.append((fir[2].capitalize() + " " + fir[0] + '-' + fir[1], sec[1].capitalize() + " " + sec[0]))
    else:
        fir = i[0].split('-')
        ans.append((fir[2].capitalize() + " " + fir[0] + '-' + fir[1],))
print(ans)
Output:
[('Regular 2015-2016', 'Playoff 2016'), ('Regular 2016-2017', 'Playoff 2017'), ('Regular 2017-2018',)]
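The output above reads 'Regular' rather than the 'Regular Season' asked for, and the loop needs a separate branch for one-element tuples. A minimal sketch that handles both, using str.rpartition to split off the last dash-separated part; the names LABELS, format_title and seasons are made up for this example:

```python
# Maps the suffix of each entry to the title text wanted in the question.
LABELS = {'regular': 'Regular Season', 'playoff': 'Playoff'}


def format_title(name):
    """Turn '2015-2016-regular' into 'Regular Season 2015-2016'."""
    period, _, kind = name.rpartition('-')
    return '{} {}'.format(LABELS[kind], period)


seasons = [('2015-2016-regular', '2016-playoff'),
           ('2016-2017-regular', '2017-playoff'),
           ('2017-2018-regular',)]

# tuple(...) over a generator works for tuples of any length
titles = [tuple(format_title(name) for name in season) for season in seasons]
print(titles)
```

Keeping the formatting in a small function also keeps every line well under 80 characters.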
I am writing a program in Python to parse a Ledger/hledger journal file.
I'm having problems coming up with a regex that I'm sure is quite simple. I want to parse a string of the form:
expenses:food:food and wine 20.99
and capture the account sections (between colons, allowing any spaces), regardless of the number of sub-accounts, and the total, in groups. There can be any number of spaces between the final character of the sub-account name and the price digits.
expenses:food:wine:speciality 19.99 is also allowable (no space in sub-account).
So far I've got (\S+):|(\S+ \S+):|(\S+ (?!\d))|(\d+.\d+), which does not allow for any number of sub-accounts and possible spaces. I don't think I want OR operators in there either, as this is going to be concatenated with other regexes with .join() as part of the parsing function.
Any help greatly appreciated.
Thanks.
You can use the following:
((?:[^\s:]+)(?:\:[^\s:]+)*)\s*(\d+\.\d+)
Now we can use:
import re

s = 'expenses:food:wine:speciality 19.99'
rgx = re.compile(r'((?:[^\s:]+)(?:\:[^\s:]+)*)\s*(\d+\.\d+)')
mat = rgx.match(s)
if mat:
    categories, price = mat.groups()
    categories = categories.split(':')
Now categories will be a list containing the categories, and price a string with the price. For your sample input this gives:
>>> categories
['expenses', 'food', 'wine', 'speciality']
>>> price
'19.99'
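The pattern above does not accept spaces inside a sub-account name, so the first sample line from the question ('food and wine') would not match. A possible variant, offered as a sketch rather than a full ledger parser: capture everything up to the final run of whitespace before the amount, then split on colons. The function name parse_ledger_line is made up for this example, and amounts are assumed to look like digits.digits:

```python
import re

# (.+?)         the account path, as short as possible
# \s+           one or more spaces before the amount
# (\d+\.\d+)    the amount
LINE_RE = re.compile(r'^(.+?)\s+(\d+\.\d+)\s*$')


def parse_ledger_line(line):
    """Return (list of account parts, amount string), or None if no match."""
    mat = LINE_RE.match(line)
    if mat is None:
        return None
    accounts, price = mat.groups()
    return [part.strip() for part in accounts.split(':')], price


print(parse_ledger_line('expenses:food:food and wine 20.99'))
# (['expenses', 'food', 'food and wine'], '20.99')
print(parse_ledger_line('expenses:food:wine:speciality    19.99'))
# (['expenses', 'food', 'wine', 'speciality'], '19.99')
```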
You don't need regex for such a simple thing at all, native str.split() is more than enough:
def split_ledger(line):
    entries = line.split(":")  # first split all the entries
    last = entries.pop()  # take the last entry
    return entries + last.rsplit(" ", 1)  # split on last space and return all together

# two spaces before the amount, so one is left on the sub-account name
print(split_ledger("expenses:food:food and wine  20.99"))
# ['expenses', 'food', 'food and wine ', '20.99']
print(split_ledger("expenses:food:wine:speciality  19.99"))
# ['expenses', 'food', 'wine', 'speciality ', '19.99']
Or if you don't want the leading/trailing whitespace in any of the entries:
def split_ledger(line):
    entries = [e.strip() for e in line.split(":")]
    last = entries.pop()
    return entries + [e.strip() for e in last.rsplit(" ", 1)]

print(split_ledger("expenses:food:food and wine  20.99"))
# ['expenses', 'food', 'food and wine', '20.99']
print(split_ledger("expenses:food:wine:speciality  19.99"))
# ['expenses', 'food', 'wine', 'speciality', '19.99']