I am a beginner in Python and decided to attempt a script to help me in my part-time job as a weather observer. Essentially, I have a list of observations, and I have written regexes to gather the information I need and put it in a spreadsheet using xlwings. I have a list of partial observations and I am trying to extract the sky condition from them. The sky condition contains any of the words in the list called "words". I am sure there is a much better way to do this, but I am trying to have the script look at the items in each element of the list and determine whether one of the key words is in the element. If it finds one, I add it to a new list called 'found', and I eventually want to add this information to the Excel spreadsheet. My problem is that I can't work out where to increment over the different sky conditions. I need it to iterate over each element in a line of the skycons list and then increment to the next line. I have moved the increment to several different parts of the script, but it still won't work properly: it either increments too soon and does not iterate over all the elements of the line, or it does not increment at all. Here is the code:
skycons = [' -RA BR BKN008 OVC012 09/08 ',
' RA BR BKN008 OVC012 09/08 ',
' R02/2600VP6000FT -RA BR BKN008 OVC012 09/08 ',
' -RA BR SCT007 BKN013 OVC019 09/08 ',
' CLR 09/08 ']
i = 0
words = ['FEW', 'SCT', 'BKN', 'OVC', 'CLR']
found = []
for item in skycons[i].split():
    print(skycons[i].split())
    for word in words:
        print(word)
        if item.startswith(word):
            found += item
            print(found)
    sky = " ".join(found)
    print(sky)
    i+=1
This may be a mess and I am open to suggestions. I basically need to grab the sky condition from each observation and insert it in a spreadsheet. I have tried doing this strictly with a regex, but it would end up grabbing other elements of the observation that are not present in the above list. Any assistance is greatly appreciated.
found += item doesn't do what you seem to think it does. Look at this:
>>> found = []
>>> found += 'OVC'
>>> found
['O', 'V', 'C']
I don't think that's what you're expecting. It appends each character to the list, not the entire string. Try append() instead:
found.append(item)
Also, i isn't getting incremented at the right times. I think you need another outer loop to go through the METARs. Here's what I ended up with:
skycons = [' -RA BR BKN008 OVC012 09/08 ',
' RA BR BKN008 OVC012 09/08 ',
' R02/2600VP6000FT -RA BR BKN008 OVC012 09/08 ',
' -RA BR SCT007 BKN013 OVC019 09/08 ',
' CLR 09/08 ']
words = ['FEW', 'SCT', 'BKN', 'OVC', 'CLR']
for metar in skycons:
    found = []
    for item in metar.split():
        for word in words:
            if item.startswith(word):
                found.append(item)
    sky = " ".join(found)
    print(sky)
... which outputs...
BKN008 OVC012
BKN008 OVC012
BKN008 OVC012
SCT007 BKN013 OVC019
CLR
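The nested loop above can also be condensed, since str.startswith accepts a tuple of prefixes. The following is only a sketch of that idea; the names SKY_PREFIXES, extract_sky and sky_by_obs are made up for illustration and do not come from the original post.

```python
# Hypothetical names for illustration; the prefixes are the ones from the question.
SKY_PREFIXES = ('FEW', 'SCT', 'BKN', 'OVC', 'CLR')

def extract_sky(metar):
    """Return the sky-condition groups of one observation as a single string."""
    # str.startswith returns True if the item starts with any prefix in the tuple.
    groups = [item for item in metar.split() if item.startswith(SKY_PREFIXES)]
    return " ".join(groups)

skycons = [' -RA BR BKN008 OVC012 09/08 ',
           ' -RA BR SCT007 BKN013 OVC019 09/08 ',
           ' CLR 09/08 ']

# One result per observation, ready to be written to a spreadsheet column.
sky_by_obs = [extract_sky(metar) for metar in skycons]
print(sky_by_obs)
```

Keeping one string per observation in a list, rather than printing inside the loop, should make it easier to hand the results to xlwings afterwards.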
my_list = [
'<instance id="line-nw8_059:8174:">',
' advanced micro devices inc sunnyvale calif and siemens ag of west germany '
'said they agreed to jointly develop manufacture and market microchips for '
'data communications and telecommunications with an emphasis on the '
'integrated services digital network the integrated services digital '
'network or isdn is an international standard used to transmit voice data '
'graphics and video images over telephone <head>line</head> ',
'<instance id="line-nw7_098:12684:">',
' in your may 21 story about the phone industry billing customers for '
'unconnected calls i was surprised that you did not discuss whether such '
'billing is appropriate a caller who keeps a <head>line</head> open '
'waiting for a connection uses communications switching and transmission '
'equipment just as if a conversation were taking place ',
'<instance id="line-nw8_106:13309:">'
]
I have to replace every <instance id="line-nw8_106:13309:"> tag (any variation) with a whitespace, and also add them all to their own list. I have figured out how to add them to their own list with a regex like this:
instanceList = []
instanceMatch = '<instance id="([^"]*)"'
for i in contentsTestSplit:
    matchy = re.match(instanceMatch, i)
    if matchy:
        instanceMatchy = matchy.group(0)
        instanceList.append(instanceMatchy)
print("instance list: ", instanceList)
So this works, but I can't figure out how to replace all of them with whitespace. I have attempted this, along with using replace methods, and it is not working. Any help would be appreciated:
instanceList = []
instanceMatch = '<instance id="([^"]*)"'
pat = re.compile(r'<instance id="([^"]*)"')
for i in contentsTestSplit:
    matchy = re.match(instanceMatch, i)
    if matchy:
        instanceMatchy = matchy.group(0)
        instanceList.append(instanceMatchy)
        i = pat.sub("", i)
print("instance list: ", instanceList)
I have also attempted this. It locates the occurrences accurately but doesn't replace them:
for i in contentsTestSplit:
    if i.startswith("<instance id="):
        i.replace(i, "")
You can use re.sub to replace every instance tag. Pass it a custom function as the replacement: the function receives each match, appends it to your instances list, and returns the replacement text.
import re
def _sub(match):
    # Record the matched tag, then return its replacement
    # (return ' ' instead if you want a space rather than nothing).
    instanceList.append(match[0])
    return ''
instanceList = []
instanceMatch = '<instance id="([^"]*)"'
for i in my_list:
    re.sub(instanceMatch, _sub, i)
I didn't know what you wanted to do with the processed data, but re.sub(instanceMatch, _sub, i) returns your text with the substitutions.
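As a rough sketch of how the substituted text could be kept rather than discarded, the version below collects both the tags and the cleaned lines. Two things here are assumptions and not part of the answer above: the helper name split_tags is invented, and the pattern is extended with a trailing '>' so that the whole tag is replaced by a space.

```python
import re

# Assumption: '>' is added to the pattern so the closing bracket is removed too.
TAG_PATTERN = re.compile(r'<instance id="[^"]*">')

def split_tags(lines):
    """Return (tags, cleaned): all instance tags found, and each line with tags replaced by a space."""
    tags = []
    cleaned = []
    for line in lines:
        # With no capture group, findall returns the full text of every match.
        tags.extend(TAG_PATTERN.findall(line))
        cleaned.append(TAG_PATTERN.sub(' ', line))
    return tags, cleaned

sample = ['<instance id="line-nw8_059:8174:">',
          ' over telephone <head>line</head> ',
          '<instance id="line-nw7_098:12684:">']
tags, cleaned = split_tags(sample)
print(tags)
print(cleaned)
```

Strings are immutable in Python, which is why assigning to the loop variable or calling i.replace(...) without storing the result leaves the original list unchanged; building a new list sidesteps that.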
I wonder why id="([^"]*)" captures a string, while id="[^"]*" captures nothing. What is the purpose of the ()?
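To illustrate the question about parentheses: in Python's re module, () creates a capture group, so the text matched inside it can be retrieved separately from the whole match. A small demonstration, using one of the tags from the question as sample text (the variable names are only for illustration):

```python
import re

text = '<instance id="line-nw8_106:13309:">'

with_group = re.match(r'<instance id="([^"]*)"', text)
without_group = re.match(r'<instance id="[^"]*"', text)

# group(0) is always the whole match, with or without parentheses.
whole = with_group.group(0)
# group(1) is the text matched inside the first pair of parentheses.
captured_id = with_group.group(1)

print(whole)
print(captured_id)
print(with_group.groups())     # tuple of all captured groups
print(without_group.groups())  # empty tuple: the pattern has no capture group
```

Both patterns match the same text; the parentheses only change what can be extracted afterwards.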
I'm trying to get rid of the whitespace on this list however I am unable to. Anyone know where I am going wrong with my code?
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ', 'you lay down your arms', ' like flowering mines ', ' to conquer me home. ']
love_maybe_lines_joined = '\n'.join(love_maybe_lines)
love_maybe_lines_stripped = love_maybe_lines_joined.strip()
print(love_maybe_lines_stripped)
Terminal:
Always
in the middle of our bloodiest battles
you lay down your arms
like flowering mines
to conquer me home.
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ', 'you lay down your arms', ' like flowering mines ', ' to conquer me home. ']
love_maybe_lines = [item.strip() for item in love_maybe_lines]
That may help.
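The reason the original attempt falls short is that strip() on the joined string only trims the very start and very end of the whole text, not each line. A minimal sketch that strips every item first and then joins (the names stripped and poem are just for illustration):

```python
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ',
                    'you lay down your arms', ' like flowering mines ',
                    ' to conquer me home. ']

# Strip each line individually, then join the cleaned lines with newlines.
stripped = [line.strip() for line in love_maybe_lines]
poem = '\n'.join(stripped)
print(poem)
```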
I have the below text:
subject = "Madam / Dear Sir, ', ' ', 'The terrorist destroyed the building at 23:45 with a remote
detonation device', ' ', 'so a new line character is appended to the string"
I have used the below regex code to search :
[p for p in re.split('\,', str(subject)) if re.search('(M[a-z]+ / \w+ \w+r)', p)]
getting output: Madam / Dear Sir
Expected output : The terrorist destroyed the building at 23:45 with a remote
detonation device
Please note the expected output should always be after the regex expression is found.
Can you please help me on this?
You can extend the split a bit more \s*',\s*'\s* to match all the parts that you don't want until the next part that you do want.
Then use a loop to first match your pattern M[a-z]+ / \w+ \w+r. Then get the next item, if there is one.
Example code
import re
subject = "Madam / Dear Sir, ', ' ', 'The terrorist destroyed the building at 23:45 with a remote detonation device', ' ', 'so a new line character is appended to the string"
# Split on the quote/comma separators and drop the empty strings left between them.
filteredList = list(filter(None, re.split(r"\s*',\s*'\s*", subject)))
l = len(filteredList)
for i, s in enumerate(filteredList):
    # When the salutation matches, print the item that follows it (if any).
    if re.match(r"M[a-z]+ / \w+ \w+r", s) and i + 1 < l:
        print(filteredList[i + 1])
Output
The terrorist destroyed the building at 23:45 with a remote detonation device
I have code that looks like this:
import re
activity = "Basketball - Girls 9th"
activity = re.sub(r'\s', ' ', activity).split('-')
activity = str(activity [1:2]) + str(activity [0:1])
print("".join(activity))
I want the output to look like Girl's 9th Basketball, but the current output when printed is
[' Girls 9th']['Basketball ']
I want to get rid of the square brackets. I know I can simply trim it, but I would rather know how to do it right.
You're almost there. .join already turns a list into a string, so you can omit the str() calls.
import re
activity = "Basketball - Girls 9th"
activity = re.sub(r'\s', ' ', activity).split('-')
activity = activity[1:2] + activity[0:1]
print(" ".join(activity))
You are stringifying the lists, which gives the same result as print(someList): the representation of a list, which puts the [] around it.
import re
activity = "Basketball - Girls 9th"
activity = re.sub(r'\s', ' ', activity).split('-')
activity = activity[1:2] + [" "] + activity[0:1]  # reorders and reassigns the list
print("".join(activity))
This just reorders the list and adds the needed space item in between so your output fits.
You could consider adding a step:
# remove empty list items and strip leading/trailing whitespace
cleaned = [x.strip() for x in activity if x.strip() != ""]
print(" ".join(cleaned))  # put a space between each list item
You can solve it in one line:
new_activity = ' '.join(activity.split(' - ')[::-1])
You can try something like this:
import re
activity = "Basketball - Girls 9th"
activity = re.sub(r'\s', ' ', activity).split('-')
activity = activity[1].strip() + ' ' + activity[0].strip()
print(activity)
output:
Girls 9th Basketball
So I have 100 million sentences, and for each sentence I'd like to see whether it contains one of 6000 smaller sentences (matching whole words only). So far my code is
smaller_sentences = [...]
for large_sentence in file:
    for small_sentence in smaller_sentences:
        if ((' ' + small_sentence + ' ') in large_sentence
                or large_sentence.startswith(small_sentence + ' ')
                or large_sentence.endswith(' ' + small_sentence)):
            outfile.write(large_sentence)
            break
But this code runs prohibitively slowly. Do you know of a faster way to go about doing this?
Without knowing more about the domain (word/sentence length), the frequency of reads/writes/queries, and the specifics of the algorithm, it is hard to give a definitive answer.
But, in the first instance you can switch your condition around.
This checks the whole string (slow), then the head (fast), then the tail (fast).
((' ' + small_sentence + ' ') in large_sentence
 or large_sentence.startswith(small_sentence + ' ')
 or large_sentence.endswith(' ' + small_sentence))
This checks the head (fast), then the tail (fast), then the whole string (slow). It is not a huge bump in the Big-O sense, but it might add some speed if you know the small sentences are more likely to appear at the start or end.
(large_sentence.startswith(small_sentence + ' ')
 or large_sentence.endswith(' ' + small_sentence)
 or (' ' + small_sentence + ' ') in large_sentence)
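A different technique from the reordering above, offered only as a sketch: combine all the small sentences into one compiled regular expression, so that each large sentence is scanned by the regex engine instead of by a Python-level inner loop. Whether this is actually faster for 6000 phrases and 100 million sentences is an assumption that would need measuring. The function names build_matcher and matching_sentences are hypothetical. Note one behavioural difference: this version also matches a large sentence that is exactly equal to a small sentence, which the original three-part condition does not.

```python
import re

def build_matcher(small_sentences):
    """Compile one pattern that matches any small sentence on whole-word boundaries."""
    # Longest first, so a longer phrase is tried before a shorter one that is its prefix.
    ordered = sorted(small_sentences, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # (?<!\S) and (?!\S) require whitespace or the string edge on each side.
    return re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")

def matching_sentences(large_sentences, matcher):
    """Return the large sentences that contain at least one small sentence."""
    return [s for s in large_sentences if matcher.search(s)]

smaller_sentences = ["big dog", "cat"]
large_sentences = ["the big dog barks", "concatenate strings",
                   "a big doghouse", "I saw a cat\n"]
matcher = build_matcher(smaller_sentences)
hits = matching_sentences(large_sentences, matcher)
print(hits)
```

In a real run the list comprehension would be replaced by iterating over the input file and writing each matching line to outfile as it is found, so the 100 million sentences never need to be held in memory at once.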