I have a very simple problem that I have been unable to find the solution to, so I thought I'd try my "luck" here.
I have a string that is created using variables and static text altogether. It is as follows:
filename_gps = 'id' + str(trip_id) + '_gps_did' + did + '_start' + str(trip_start) + '_end' + str(trip_end) + '.json'
However, pylint complains that this line is too long. How would I format this string expression over multiple lines without it looking weird, while still staying within the "rules" of pylint?
At one point I ended up with it looking like this, which is incredibly "ugly" to look at:
filename_gps = 'id' + str(
trip_id) + '_gps_did' + did + '_start' + str(
trip_start) + '_end' + str(
trip_end) + '.json'
I found that it would follow the "rules" of pylint if I formatted it like this:
filename_gps = 'id' + str(
trip_id) + '_gps_did' + did + '_start' + str(
trip_start) + '_end' + str(
trip_end) + '.json'
Which is much "prettier" to look at, but in case I didn't have the "str()" casts, how would I go about creating such a string?
I doubt that there is a difference between pylint for Python 2.x and 3.x, but if there is I am using Python 3.x.
Don't use so many str() calls. Use string formatting:
filename_gps = 'id{}_gps_did{}_start{}_end{}.json'.format(
    trip_id, did, trip_start, trip_end)
If you do have a long expression with a lot of parts, you can create a longer logical line by using (...) parentheses:
filename_gps = (
    'id' + str(trip_id) + '_gps_did' + did + '_start' +
    str(trip_start) + '_end' + str(trip_end) + '.json')
This would work for breaking up a string you are using as a template in a formatting operation, too:
foo_bar = (
    'This is a very long string with some {} formatting placeholders '
    'that is broken across multiple logical lines. Note that there are '
    'no "+" operators used, because Python auto-joins consecutive string '
    'literals.'.format(spam))
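On Python 3.6+, an f-string is another option. A minimal sketch, assuming the same four variables as in the question (the sample values here are made up for illustration):

```python
# Sample values are made up for illustration; the names come from the question.
trip_id = 7
did = "abc"
trip_start = 100
trip_end = 200

# Adjacent f-string literals inside parentheses are joined automatically,
# so each physical line stays short and no "+" or str() is needed.
filename_gps = (
    f"id{trip_id}_gps_did{did}"
    f"_start{trip_start}_end{trip_end}.json"
)
print(filename_gps)
```

The placeholders call str() on each value implicitly, so this also covers the case without explicit str() casts.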
I am trying to build a quick script that extracts only certain information from invoice PDFs without using regex.
When I try to define the grammar for, say, electric usage, I get an error "cannot import name 'parseString' from 'pyparsing'"
I have tried reinstalling, modifying casing from camel to snake, etc etc but I am at a loss at this point.
Here is the (I think) relevant documentation:
https://pyparsing-docs.readthedocs.io/en/latest/pyparsing.html
The code:
electric_usage = pp.Word(nums) + ',' + pp.Word(nums) + 'kwh'
dates_1 = pp.Word(nums) + '-' + pp.Word(nums) + '-' + pp.Word(nums)
dates_2 = pp.Word(nums) + '/' + pp.Word(nums) + '/' + pp.Word(nums)
for str in pdf_text:
    usage_pulled = electric_usage.parseString(pdf_text)
    print(usage_pulled)
Here is an example of one of the regex patterns that actually seems to work to pull usage values:
'[0-9]+[0-9]+[0-9]+[,]+[0-9]+[0-9]+[0-9]'
and cost:
'[$]+[0-9]+[0-9]+[,]+[0-9]+[0-9]+[0-9]+[.]+[0-9]+[0-9]+$'
I have a python script that scrapes a webpage and downloads the mp3s found on it.
I am trying to name the files using elements that I have successfully captured in a separate function.
I am having trouble naming the downloaded files, this is what I have so far:
def make_safe_filename(text: str) -> str:
    """convert forbidden chars to underscores"""
    return ''.join(c if (c.isalnum() or c.isspace()) else '_' for c in text).strip('_ ')
filename = make_safe_filename(a['artist'] + a['title'] + a['date'] + a['article_url'])
I am trying to save the file name as "Artist - Title - Date - Article_url" however I am struggling to do this. At the moment the variables are all mashed together without spaces, eg. ArtistTitleDateArticle_url.mp3
I've tried
filename = make_safe_filename(a['artist'] + "-" + a['title'] + "-" + a['date'] + "-" +
a['article_url'])
but this throws up errors.
Can anyone shed some light on where I am going wrong? I know it's something to do with combining variables but I am stuck. Thanks in advance.
I am guessing your a is a dictionary? Maybe you could clarify this in your question? Also what do you typically have in a['article_url']? Could you also post the traceback?
This is my attempt (note: no changes to the function):
def make_safe_filename(text: str) -> str:
    """convert forbidden chars to underscores"""
    return ''.join(c if (c.isalnum() or c.isspace()) else '_' for c in text).strip('_ ')
a = {
    'artist': 'Metallica',
    'title': 'Nothing Else Matters',
    'date': '1991',
    'article_url': 'unknown',
}
filename = make_safe_filename(a['artist'] + '-' + a['title'] + '-' + a['date'] + '-' + a['article_url'])
print(filename)
Which produced the following output:
Metallica_Nothing Else Matters_1991_unknown
Your code should actually work, but if you add the - before passing the joined string to the function, the function will replace those with _ as well. Instead, you could pass the individual fields and join them in the function, after replacing the "illegal" characters in each field individually. Also, you could use regular expressions and re.sub for the actual replacing:
import re
def safe_filename(*fields):
    return " - ".join(re.sub(r"[^\w\s]", "_", s) for s in fields)
>>> safe_filename("Art!st", "S()ng", "§$%")
'Art_st - S__ng - ___'
Of course, if your a is a dictionary and you always want the same fields from that dict (artist, title, etc.) you could also just pass the dict itself and extract the fields within the function.
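A minimal sketch of that dict-based variant, assuming a is a dictionary with the four keys from the question. The function name safe_filename_from_dict and the keys parameter are made up for illustration:

```python
import re


def safe_filename_from_dict(track, keys=("artist", "title", "date", "article_url")):
    """Clean each field separately, then join the fields with ' - '.

    The name and the default keys are assumptions for this sketch.
    """
    # Replace everything that is not a word character or whitespace.
    cleaned = (re.sub(r"[^\w\s]", "_", str(track[key])) for key in keys)
    return " - ".join(cleaned)


a = {
    "artist": "Metallica",
    "title": "Nothing Else Matters",
    "date": "1991",
    "article_url": "unknown",
}
print(safe_filename_from_dict(a))
```

Because the separator is added after the cleaning step, the hyphens survive in the final name.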
I had a similar problem recently. The best solution is probably to use regex, but I'm too lazy to learn that, so I wrote a replaceAll function:
def replaceAll(string, characters, replacement):
    s = string
    for i in characters:
        s = s.replace(i, replacement)
    return s
and then I used it to make a usable filename:
fName = replaceAll(name, '*<>?|"/\\.,\':', "")
In your case it would be:
filename = replaceAll(a['artist'] + a['title'] + a['date'] + a['article_url'], '*<>?|"/\\.,\':', "-")
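The same idea can also be written with the built-in str.translate, which does all replacements in one pass. This is only a sketch of an alternative, not a change to the function above; the name replace_all and the sample title are made up:

```python
def replace_all(text, characters, replacement):
    """Replace every character listed in `characters` with `replacement`."""
    # str.maketrans accepts a dict mapping single characters to strings.
    table = str.maketrans({ch: replacement for ch in characters})
    return text.translate(table)


# Hypothetical sample input, for illustration only.
print(replace_all('AC/DC: Back in Black', '*<>?|"/\\.,\':', '-'))
```

Passing an empty string as the replacement removes the characters instead.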
I started learning python two days ago. Today I built a web scraping script which pulls data from yahoo finance and puts it in a csv file. The problem I have is that some values are string because yahoo finance displays them as such.
For example: Revenue: 806.43M
When I copy them into the csv I can't use them for calculations, so I was wondering if it is possible to separate the "806.43" and the "M" while still keeping both, so that I can see the unit of the number, and put them in two different columns.
For writing the csv file I use this command:
f.write(revenue + "," + revenue_value + "\n")
where:
print(revenue)
Revenue (ttm)
print(revenue_value)
806.43M
so in the end I should be able to use a command which looks something like this
f.write(revenue + "," + revenue_value + "," + revenue_unit + "\n")
where revenue_value is 806.43 and revenue_unit is M
Hope someone could help with the problem.
I believe the easiest way is to parse the number as string and convert it to a float based on the unit in the end of the string.
The following should do the trick:
def parse_number(number_str) -> float:
    mapping = {
        "K": 1000,
        "M": 1000000,
        "B": 1000000000
    }
    unit = number_str[-1]
    number_float = float(number_str[:-1])
    return number_float * mapping[unit]
And here's an example:
my_number = "806.43M"
print(parse_number(my_number))
>>> 806430000.0
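If you would rather keep the number and the unit in two separate columns, as the question asks, a small helper could split them instead of multiplying. A minimal sketch, assuming the unit is at most one trailing letter; the name split_value_unit is made up:

```python
def split_value_unit(number_str):
    """Split '806.43M' into ('806.43', 'M'); the unit is '' if there is none."""
    text = number_str.strip()
    # Assumption: the unit, if present, is a single trailing letter.
    if text and text[-1].isalpha():
        return text[:-1], text[-1]
    return text, ""


revenue_value, revenue_unit = split_value_unit("806.43M")
print(revenue_value, revenue_unit)
```

Both parts stay strings here, so the value column can be converted with float() later when it is needed for calculations.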
You can always try regular expressions.
Here's a pretty good online tool to let you practice using Python-specific standards.
import re
sample = "Revenue (ttm): 806.43M"
# Note: the `(?P<name here>)` section is a named group. That way we can identify what we want to capture.
financials_pattern = r'''
    (?P<category>.+?):?\s+?   # Capture everything up until the colon
    (?P<value>[\d\.]+)        # Capture only numeric values and decimal points
    (?P<unit>[\w]*)?          # Capture a trailing unit type (M, MM, etc.)
'''
# Flags:
# re.I -> Ignore character case (upper vs lower)
# re.X -> Allows for 'verbose' pattern construction, as seen above
res = re.search(financials_pattern, sample, flags = re.I | re.X)
Print our dictionary of values:
res.groupdict()
Output:
{'category': 'Revenue (ttm)',
'value': '806.43',
'unit': 'M'}
We can also use .groups() to list results in a tuple.
res.groups()
Output:
('Revenue (ttm)', '806.43', 'M')
In this case, we'll immediately unpack those results into your variable names.
revenue = None # If this is None after trying to set it, don't print anything.
revenue, revenue_value, revenue_unit = res.groups()
We'll use fancy f-strings to print out both your f.write() call along with the results we've captured.
if revenue:
    print(f'f.write(revenue + "," + revenue_value + "," + revenue_unit + "\\n")\n')
    print(f'f.write("{revenue}" + "," + "{revenue_value}" + "," + "{revenue_unit}" + "\\n")')
Output:
f.write(revenue + "," + revenue_value + "," + revenue_unit + "\n")
f.write("Revenue (ttm)" + "," + "806.43" + "," + "M" + "\n")
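Since the target is a csv file, the standard csv module could write the row instead of manual concatenation; it also quotes fields that happen to contain commas. A minimal sketch, using an in-memory buffer in place of the real file handle f (the helper name write_row is made up):

```python
import csv
import io


def write_row(handle, label, value, unit):
    """Write one CSV row; `handle` is any open text file-like object."""
    # lineterminator="\n" mirrors the "\n" used in the f.write() call.
    csv.writer(handle, lineterminator="\n").writerow([label, value, unit])


# io.StringIO stands in for the real file here, for illustration only.
buffer = io.StringIO()
write_row(buffer, "Revenue (ttm)", "806.43", "M")
print(buffer.getvalue(), end="")
```

With a real file, open it with newline="" as the csv documentation recommends and pass the file object as handle.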
I have three UV sensors - integers output; one BME280 - float output (temperature and pressure); and one GPS Module - float output.
I need to build a string in this form - #teamname;temperature;pressure;uv_1;uv_2;uv_3;gpscoordinates#
and send it via ser.write at least once per second. I'm using an APC220 module.
Is this the right (and fastest) way to do it?
textstr = str("#" + "teamname" + ";" + str(temperature) + ";" + str(pressure) + ";" + str(uv_1) + ";" + str(uv_2) + ";" + str(uv_3) + "#")
(...)
ser.write(('%s \n'%(textstr)).encode('utf-8'))
You may try something like this:
values = [teamname, temperature, pressure, uv_1, uv_2, uv_3, gpscoordinates]
joined = ';'.join(map(str, values))
ser.write(('#%s# \n' % joined).encode('utf-8'))
If using python 3.6+ then you can do this instead
textstr = f"#teamname;{temperature};{pressure};{uv_1};{uv_2};{uv_3}# \n"
(...)
ser.write((textstr).encode('utf-8'))
If teamname and gpscoordinates are also variables then add them the same way
textstr = f"#{teamname};{temperature};{pressure};{uv_1};{uv_2};{uv_3};{gpscoordinates}# \n"
(...)
ser.write((textstr).encode('utf-8'))
For more info about string formatting
https://realpython.com/python-f-strings/
It might improve readability to use python's format:
textstr = "#teamname;{};{};{};{};{};gpscoordinates#".format(temperature, pressure, uv_1, uv_2, uv_3)
ser.write(('%s \n'%(textstr)).encode('utf-8'))
assuming gpscoordinates is text (it's not in your attempted code). If it's a variable, then replace the text with {} and add it as a param to format.
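To keep the formatting separate from the serial port, the frame could be built in a small function that returns the bytes to send. A minimal sketch, assuming teamname and gpscoordinates are variables too; the function name build_frame and the sample readings are made up:

```python
def build_frame(teamname, temperature, pressure, uv_1, uv_2, uv_3, gpscoordinates):
    """Return the '#...#' frame as UTF-8 bytes, ready for ser.write()."""
    fields = [teamname, temperature, pressure, uv_1, uv_2, uv_3, gpscoordinates]
    # str() is applied to every field, so ints and floats can be mixed freely.
    text = "#" + ";".join(str(field) for field in fields) + "# \n"
    return text.encode("utf-8")


# Sample readings are made up for illustration.
frame = build_frame("team", 21.5, 1013.2, 1, 2, 3, "48.1,17.1")
print(frame)
```

The result can then be passed directly to ser.write(frame). Building one short string per second is cheap, so readability matters more than speed here.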
I want to write multiple values to a text file using Python.
I wrote the following line in my code:
text_file.write("sA" + str(chart_count) + ".Name = " + str(State_name.groups())[2:-3] + "\n")
Note: State_name.groups() is a regex captured word. So it is captured as a tuple and to remove the ( ) brackets from the tuple I have used string slicing.
Now the output comes as:
sA0.Name = GLASS_OPEN
No problem here
But I want the output to be like this:
sA0.Name = 'GLASS_HATCH_OPENED_PROTECTION_FCT'
I want the variable value to be enclosed inside the single quotes.
Does this work for you?
text_file.write("sA" + str(chart_count) + ".Name = '" + str(State_name.groups())[2:-3] + "'\n")
# Note the single quote added right after ".Name = " and the one added right before "\n".
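Slicing the string form of the tuple is fragile, so it may be cleaner to take the captured text with group(1) and add the quotes in an f-string. A minimal sketch, assuming the regex has a single capture group; the pattern and the input line below are made up, since the original ones are not shown:

```python
import re

chart_count = 0
# Hypothetical input and pattern, used only to obtain a match object.
state_name = re.search(r"State:\s*(\w+)", "State: GLASS_OPEN")

# group(1) returns the captured text directly, so no slicing is needed.
line = f"sA{chart_count}.Name = '{state_name.group(1)}'\n"
print(line, end="")
```

The resulting line can be passed to text_file.write(line) as before.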