I have a python script that scrapes a webpage and downloads the mp3s found on it.
I am trying to name the files using elements that I have successfully captured in a separate function.
I am having trouble naming the downloaded files. This is what I have so far:
def make_safe_filename(text: str) -> str:
    """convert forbidden chars to underscores"""
    return ''.join(c if (c.isalnum() or c.isspace()) else '_' for c in text).strip('_ ')
filename = make_safe_filename(a['artist'] + a['title'] + a['date'] + a['article_url'])
I am trying to save the file name as "Artist - Title - Date - Article_url", but I am struggling to do this. At the moment the variables are all mashed together without spaces, e.g. ArtistTitleDateArticle_url.mp3
I've tried
filename = make_safe_filename(a['artist'] + "-" + a['title'] + "-" + a['date'] + "-" +
a['article_url'])
but this throws up errors.
Can anyone shed some light on where I am going wrong? I know it's something to do with combining variables but I am stuck. Thanks in advance.
I am guessing your a is a dictionary? Maybe you could clarify this in your question? Also what do you typically have in a['article_url']? Could you also post the traceback?
This is my attempt (note: no changes to the function):
def make_safe_filename(text: str) -> str:
    """convert forbidden chars to underscores"""
    return ''.join(c if (c.isalnum() or c.isspace()) else '_' for c in text).strip('_ ')
a = {
    'artist': 'Metallica',
    'title': 'Nothing Else Matters',
    'date': '1991',
    'article_url': 'unknown',
}
filename = make_safe_filename(a['artist'] + '-' + a['title'] + '-' + a['date'] + '-' + a['article_url'])
print(filename)
Which produced the following output:
Metallica_Nothing Else Matters_1991_unknown
Your code should actually work, but if you add the - before passing the joined string to the function, the function will just replace those with _ as well. Instead, you could pass the individual fields and then join them in the function, after replacing the "illegal" characters in each field individually. Also, you could use regular expressions and re.sub for the actual replacing:
import re
def safe_filename(*fields):
    return " - ".join(re.sub(r"[^\w\s]", "_", s) for s in fields)
>>> safe_filename("Art!st", "S()ng", "§$%")
'Art_st - S__ng - ___'
Of course, if your a is a dictionary and you always want the same fields from that dict (artist, title, etc.) you could also just pass the dict itself and extract the fields within the function.
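A minimal sketch of that dict-based variant, assuming the keys are always artist, title, date and article_url. The function name safe_filename_from_record, the FIELDS tuple, the ".mp3" suffix and the sample URL are made up for illustration and are not from the question:

```python
import re

# Hypothetical field order; adjust to whichever keys the scraper provides.
FIELDS = ("artist", "title", "date", "article_url")

def safe_filename_from_record(record, fields=FIELDS, sep=" - "):
    """Clean each field separately, then join, so the separator survives."""
    cleaned = (re.sub(r"[^\w\s]", "_", str(record[key])) for key in fields)
    return sep.join(cleaned) + ".mp3"

a = {
    "artist": "Metallica",
    "title": "Nothing Else Matters",
    "date": "1991",
    "article_url": "http://example.com/a",  # sample value, not from the question
}
print(safe_filename_from_record(a))
# Metallica - Nothing Else Matters - 1991 - http___example_com_a.mp3
```

Because the separator is added after cleaning, the " - " between fields is never turned into underscores.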
I had a similar problem recently, the best solution is probably to use regex, but I'm too lazy to learn that, so I wrote a replaceAll function:
def replaceAll(string, characters, replacement):
    s = string
    for i in characters:
        s = s.replace(i, replacement)
    return s
and then I used it to make a usable filename:
fName = replaceAll(name, '*<>?|"/\\.,\':', "")
in your case it would be:
filename = replaceAll(a['artist'] + a['title'] + a['date'] + a['article_url'], '*<>?|"/\\.,\':', "-")
I add different values to Houdini variables with Python.
Some of these variables are file paths and end with a "/"; others are just names and do not end with a "/".
In my current code I use [:-1] to remove the last character of the file path, so I don't have the "/".
The problem is that if I add a value like "Var_ABC", the result will be "Var_AB", since it also removes the last character.
How can I remove the last character only if the last character is a "/"?
That's what I have, and it works so far:
def set_vars():
    count = hou.evalParm('vars_names')
    user_name = hou.evalParm('user_name')
    for idx in range(1, count + 1):
        output = hou.evalParm('vars_' + str(idx))
        vars_path_out = hou.evalParm('vars_path_' + str(idx))
        vars_path = vars_path_out[:-1]
        hou.hscript("setenv -g " + output + "=" + vars_path)
        final_vars = hou.hscript("setenv -g " + output + "=" + vars_path)
    hou.ui.displayMessage(user_name + ", " + "all variables are set.")
Thank you
As @jasonharper mentioned in the comments, you should probably use rstrip here. It is built in and, IMO, more readable than the conditional one-liner:
vars_path_out.rstrip('/')
This returns the string with any trailing / characters removed (all of them, if there are several). A string that does not end with / is returned as-is.
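The two approaches differ when a value ends in more than one slash. A small sketch of that difference; the helper name strip_one_trailing_slash and the sample values are made up for illustration:

```python
def strip_one_trailing_slash(path):
    """Remove a single trailing '/' if present; otherwise return path unchanged."""
    return path[:-1] if path.endswith("/") else path

# Sample values only; real ones would come from hou.evalParm(...).
for value in ("/jobs/shots/", "Var_ABC", "/jobs//", ""):
    print(repr(value), repr(value.rstrip("/")), repr(strip_one_trailing_slash(value)))
```

For "/jobs//", rstrip('/') gives "/jobs" while the conditional slice gives "/jobs/". Both leave "Var_ABC" and the empty string untouched.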
Try this in your code:
vars_path_out = hou.evalParm('vars_path_' + str(idx))
vars_path = vars_path_out  # default: keep the value unchanged
if vars_path_out.endswith('/'):
    vars_path = vars_path_out[:-1]
Or, based on jasonharper's comment:
vars_path = vars_path_out.rstrip('/')
This is much better than the first version.
Use the endswith method to check whether it ends with /:
if vars_path_out.endswith('/'):
Or simply check the last character:
if vars_path_out[-1] == '/':
Like this:
vars_path = vars_path_out[:-1] if vars_path_out.endswith('/') else vars_path_out
Or like this:
if vars_path_out.endswith('/'):
    vars_path = vars_path_out[:-1]
else:
    vars_path = vars_path_out
Another way is the rstrip method:
vars_path = vars_path_out.rstrip('/')
I have a very simple problem that I have been unable to find the solution to, so I thought I'd try my "luck" here.
I have a string that is created using variables and static text altogether. It is as follows:
filename_gps = 'id' + str(trip_id) + '_gps_did' + did + '_start' + str(trip_start) + '_end' + str(trip_end) + '.json'
However, my problem is that pylint complains that this line is too long. How would I format this string expression over multiple lines without it looking weird, while still staying within the "rules" of pylint?
At one point I ended up with it looking like this, which is incredibly "ugly" to look at:
filename_gps = 'id' + str(
trip_id) + '_gps_did' + did + '_start' + str(
trip_start) + '_end' + str(
trip_end) + '.json'
I found that it would follow the "rules" of pylint if I formatted it like this:
filename_gps = 'id' + str(
    trip_id) + '_gps_did' + did + '_start' + str(
    trip_start) + '_end' + str(
    trip_end) + '.json'
Which is much "prettier" to look at, but in case I didn't have the "str()" casts, how would I go about creating such a string?
I doubt that there is a difference between pylint for Python 2.x and 3.x, but if there is I am using Python 3.x.
Don't use so many str() calls. Use string formatting:
filename_gps = 'id{}_gps_did{}_start{}_end{}.json'.format(
    trip_id, did, trip_start, trip_end)
If you do have a long expression with a lot of parts, you can create a longer logical line by using (...) parentheses:
filename_gps = (
    'id' + str(trip_id) + '_gps_did' + did + '_start' +
    str(trip_start) + '_end' + str(trip_end) + '.json')
This would work for breaking up a string you are using as a template in a formatting operation, too:
foo_bar = (
    'This is a very long string with some {} formatting placeholders '
    'that is broken across multiple logical lines. Note that there are '
    'no "+" operators used, because Python auto-joins consecutive string '
    'literals.'.format(spam))
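Since the question mentions Python 3.x, the same parenthesized auto-joining should also work with f-strings (Python 3.6 or later), which avoids both the str() calls and the .format() call. A short sketch; the sample values are invented for illustration:

```python
# Sample values only; the real ones come from the asker's code.
trip_id = 42
did = "abc123"
trip_start = 1500000000
trip_end = 1500003600

# Adjacent f-string literals inside parentheses are joined into one string.
filename_gps = (
    f"id{trip_id}_gps_did{did}"
    f"_start{trip_start}_end{trip_end}.json"
)
print(filename_gps)  # id42_gps_didabc123_start1500000000_end1500003600.json
```

Each piece needs its own f prefix; a plain literal in the middle would not have its placeholders filled in.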
I want to write multiple values to a text file using Python.
I wrote the following line in my code:
text_file.write("sA" + str(chart_count) + ".Name = " + str(State_name.groups())[2:-3] + "\n")
Note: State_name.groups() is a regex captured word. So it is captured as a tuple and to remove the ( ) brackets from the tuple I have used string slicing.
Now the output comes as:
sA0.Name = GLASS_OPEN
No problem here
But I want the output to be like this:
sA0.Name = 'GLASS_HATCH_OPENED_PROTECTION_FCT'
I want the variable value to be enclosed inside the single quotes.
Does this work for you?
text_file.write("sA" + str(chart_count) + ".Name = '" + str(State_name.groups())[2:-3] + "'\n")
# ^single quote here and here^
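An alternative sketch that avoids slicing the string form of the tuple: read the captured text with group(1) and let a format string add the quotes. The regex and the input line here are hypothetical, since the question does not show them, and the sketch assumes the pattern has exactly one capture group:

```python
import re

# Hypothetical input line and pattern; the real regex is not shown in the question.
line = "state GLASS_HATCH_OPENED_PROTECTION_FCT"
State_name = re.search(r"state (\w+)", line)
chart_count = 0

# group(1) returns the captured text directly, so no slicing of str(tuple) is needed.
entry = "sA{}.Name = '{}'\n".format(chart_count, State_name.group(1))
print(entry, end="")  # sA0.Name = 'GLASS_HATCH_OPENED_PROTECTION_FCT'
```

The resulting string can be passed to text_file.write(entry) as before.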
I'm not sure why this isn't working:
import re
import csv
def check(q, s):
    match = re.search(r'%s' % q, s, re.IGNORECASE)
    if match:
        return True
    else:
        return False
tstr = []
# test strings
tstr.append('testthisisnotworking')
tstr.append('This is a TEsT')
tstr.append('This is a TEST mon!')
f = open('testwords.txt', 'rU')
reader = csv.reader(f)
for type, term, exp in reader:
    for i in range(2):
        if check(exp, tstr[i]):
            print exp + " hit on " + tstr[i]
        else:
            print exp + " did NOT hit on " + tstr[i]
f.close()
testwords.txt contains this line:
blah, blah, test
So essentially 'test' is the RegEx pattern. Nothing complex, just a simple word. Here's the output:
test did NOT hit on testthisisnotworking
test hit on This is a TEsT
test hit on This is a TEST mon!
Why does it NOT hit on the first string? I also tried \s*test\s* with no luck. Help?
The csv module by default returns blank spaces around words in the input (this can be changed by using a different "dialect"). So exp contains " test" with a leading space.
A quick way to fix this would be to add:
exp = exp.strip()
after you read from the CSV file.
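The csv module also has a skipinitialspace option that drops whitespace immediately following a delimiter, which may be what "a different dialect" amounts to in practice. A small sketch, written for Python 3 (the thread itself uses Python 2) and using an in-memory stand-in for testwords.txt:

```python
import csv
import io

# Stand-in for testwords.txt, using the single line quoted in the question.
sample = "blah, blah, test\n"

default_rows = list(csv.reader(io.StringIO(sample)))
trimmed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows)  # [['blah', ' blah', ' test']]
print(trimmed_rows)  # [['blah', 'blah', 'test']]
```

Note that skipinitialspace only handles spaces after the comma; trailing spaces before a comma would still need strip().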
Adding a print repr(exp) to the top of the first for loop shows that exp is ' test', note the leading space.
This isn't that surprising, since csv.reader() splits only on commas and keeps the surrounding whitespace. Try changing your code to the following:
for type, term, exp in reader:
    exp = exp.strip()
    for s in tstr:
        if check(exp, s):
            print exp + " hit on " + s
        else:
            print exp + " did NOT hit on " + s
Note that, in addition to the strip() call, which removes leading and trailing whitespace, I changed your second for loop to iterate directly over the strings in tstr instead of over a range. There was actually a bug in your code: tstr contains three values, but you only checked the first two, because for i in range(2) only gives you i=0 and i=1.
I'm programming an IRC and XMPP bot that needs to convert user provided input to a filename. I have already written a function to do this. Is it sane enough?
Here is the code:
import os
import string
allowednamechars = string.ascii_letters + string.digits + '_+/$.-'
def stripname(name, allowed=""):
    """ strip all not allowed chars from name. """
    n = name.replace(os.sep, '+')
    n = n.replace("@", '+')
    n = n.replace("#", '-')
    n = n.replace("!", '.')
    res = u""
    for c in n:
        if ord(c) < 31: continue
        elif c in allowednamechars + allowed: res += c
        else: res += "-" + str(ord(c))
    return res
It's a whitelist with extra code to remove control characters and replace os.sep, as well as some replacements to make the filename Google App Engine compatible.
The bot in question is at http://jsonbot.googlecode.com.
So what do you think of it?
urllib.quote(name.encode("utf8")) will produce something human-readable, which should also be safe. Example:
In [1]: urllib.quote(u"foo bar$=+:;../..(boo)\u00c5".encode('utf8'))
Out[1]: 'foo%20bar%24%3D%2B%3A%3B../..%28boo%29%C3%85'
You might consider just doing base64.urlsafe_b64encode(name), which will always produce a safe name, unless you really want a human-readable file name. Otherwise, the number of edge cases is pretty long, and if you forget one of them, you've got a security problem.
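A Python 3 sketch of both suggestions, where quote lives in urllib.parse rather than urllib. One caveat worth noting: quote() leaves "/" unescaped by default, which is why "../.." survives in the output above. Passing safe="" escapes the slashes too, although dots are still kept as they are. The sample name is the one from the example above:

```python
import base64
from urllib.parse import quote

name = "foo bar$=+:;../..(boo)\u00c5"

# safe="" also percent-encodes "/", which quote() leaves alone by default.
quoted = quote(name, safe="")

# URL-safe Base64 uses only letters, digits, "-", "_" and "=" padding.
encoded = base64.urlsafe_b64encode(name.encode("utf8")).decode("ascii")
decoded = base64.urlsafe_b64decode(encoded).decode("utf8")

print(quoted)
print(encoded)
print(decoded == name)  # True: the original name can be recovered
```

The Base64 form is not human-readable, but it is reversible and contains no path separators or dots.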