I have a complicated JSON object with dictionary information about words and I want to get only the synonyms. I managed to retrieve them, but some words have two or more lists of synonyms (since they can be for example a verb and a noun at the same time). I would like to get only the first list of synonyms. Here is what I've done:
import requests
import json
with open(r'C:\Users...') as file:
    list = []
    for line in file.readlines():
        list += line.split()
for keyword in list:
    print(keyword)
    ship_api_url = "https://..."
    request_data = requests.get(ship_api_url)
    data = request_data.text
    parsed = json.loads(data)
    # print(json.dumps(parsed, indent=3))
    for item in parsed:
        print(item['meta']['syns'][0])
And here's what I get - note that the word 'watch' has three lists of synonyms, the word 'create' has only one list of synonyms and the word 'created' has two lists of synonyms:
watch
['custodian', 'guard', 'guardian', 'keeper', 'lookout', 'minder', 'picket', 'sentinel', 'sentry', 'warden', 'warder', 'watcher', 'watchman']
['eye', 'follow', 'observe']
['anticipate', 'await', 'expect', 'hope (for)']
create
['beget', 'breed', 'bring', 'bring about', 'bring on', 'catalyze', 'cause', 'do', 'draw on', 'effect', 'effectuate', 'engender', 'generate', 'induce', 'invoke', 'make', 'occasion', 'produce', 'prompt', 'result (in)', 'spawn', 'translate (into)', 'work', 'yield']
created
['begot', 'bred', 'brought', 'brought about', 'brought on', 'catalyzed', 'caused', 'did', 'drew on', 'effected', 'effectuated', 'engendered', 'generated', 'induced', 'invoked', 'made', 'occasioned', 'produced', 'prompted', 'resulted (in)', 'spawned', 'translated (into)', 'worked', 'yielded']
['beget', 'breed', 'bring', 'bring about', 'bring on', 'catalyze', 'cause', 'do', 'draw on', 'effect', 'effectuate', 'engender', 'generate', 'induce', 'invoke', 'make', 'occasion', 'produce', 'prompt', 'result (in)', 'spawn', 'translate (into)', 'work', 'yield']
If I add another [0] after the [0] I already have, I get the first word of each list, not the first whole list as I need...
If I got it right, you want to do something like this:
import requests
import json
with open(r'C:\Users...') as file:
    keywords = []
    for line in file.readlines():
        keywords += line.split()
for keyword in keywords:
    print(keyword)
    ship_api_url = "https://..."
    request_data = requests.get(ship_api_url)
    data = request_data.text
    parsed = json.loads(data)
    # print(json.dumps(parsed, indent=3))
    for item in parsed:
        # iterate over the synonym lists themselves; using a list as an index raises TypeError
        for syn_list in item['meta']['syns']:
            print(syn_list)
Also, don't name your variable list: it is not a reserved word, but the name shadows Python's built-in list type.
As suggested in a comment by martineau, I solved the problem by adding a break statement after the print(item['meta']['syns'][0]) to stop the loop.
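For reference, the break idea can be sketched as a small helper. The `parsed` data below is a made-up, trimmed-down stand-in for the real API response, and `first_synonym_list` is a hypothetical name; the `return` inside the loop plays the role of the `break`.

```python
# Hypothetical stand-in for the parsed response: one entry per sense of the word.
parsed = [
    {'meta': {'syns': [['custodian', 'guard'], ['timepiece']]}},
    {'meta': {'syns': [['eye', 'follow', 'observe']]}},
    {'meta': {'syns': [['anticipate', 'await']]}},
]

def first_synonym_list(entries):
    """Return the first synonym list of the first entry that has one."""
    for item in entries:
        syns = item.get('meta', {}).get('syns', [])
        if syns:
            return syns[0]  # stop at the first hit, like break
    return []

print(first_synonym_list(parsed))
```

Using `.get()` with defaults is a design choice here, so that an entry without synonyms is skipped instead of raising a KeyError.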
I have list1 let's say:
items=['SETTLEMENT DATE:', 'CASH ACCOUNT:', 'ISIN:', 'TRADE DATE:', 'PRICE CFA', 'CASH ACCOUNT:', 'SECURITY NAME:']
I have a list2 let's say:
split_t=['{1:F01SCBLMUMUXSSU0438794344}{2:O5991054200218SCBLGHACXSSU04387943442002181454N}{3:{108:2175129}}{4:', ':20:EPACK', 'SALE', 'CDI', ':21:EPACK', 'SALE', 'CDI', ':79:ATTN:MU', 'TEAM', 'KINDLY', 'ACCEPT', 'THIS', 'AS', 'AUTHORISATION', 'TO', 'SETTLE', 'TRADE', 'WITH', 'DETAILS', 'BELOW', 'MARKET:', 'COTE', 'DIVOIRE', 'CLIENT', 'NAME:', 'EPACK', 'OFFSHORE', 'ACCOUNT', 'NAME:', 'STANDARD', 'CHARTERED', 'GHANA', 'NOMINEE', 'RE', 'DATABANK', 'EPACK', 'INVESTMENT', 'FUND', 'LTD', 'IVORY', 'COAST', 'TRADE', 'TYPE:', 'DELIVER', 'AGAINST', 'PAYMENT', 'SCA:', '2CEPACKIVO', 'CASH', 'ACCOUNT:', '420551901501', 'TRADE', 'DETAILS:', 'TRADE', 'DATE:', '17.02.2020', 'SETTLEMENT', 'DATE:', '20.02.2020', 'SECURITY', 'NAME:', 'SONATEL', 'ISIN:', 'SN0000000019', 'CLEARING', 'BIC:', 'SCBLCIABSSUXXX', 'QUANTITY:', '10,500', 'PRICE', 'CFA', '14,500.4667', 'CONSIDERATION', 'CFA', '152,254,900.00', 'TOTAL', 'FEES', '1,796,608.00', 'SETTLEMENT', 'AMOUNT', 'CFA', '150,458,292.35', 'CURRENCY:', 'CFA', 'AC:', 'CI0000010373', 'REGARDS', 'STANDARD', 'CHARTERED', 'BANK', '-}']
I want to search for each item of list1 as a contiguous run of elements in list2 and, when there's a match, return the element of list2 that immediately follows it.
As you can see, one item of list1 can span two contiguous items in list2.
For example, the first element of list1, 'SETTLEMENT DATE:', matches 'SETTLEMENT', 'DATE:' in list2, and I want to return the element that follows the match, '20.02.2020'.
I have written my python function accordingly:
def test(items, split_t):
    phrases = [w for w in items]
    for i, t in enumerate(split_t):
        to_match = split_t[i+1: i+1+len(phrases)]
        if to_match and all(p == m for p,m in zip(phrases, to_match)):
            return [*map(lambda x:split_t[i])]
This returns None even though there are matches, as you can see. I may have implemented the *map in the return statement incorrectly, but I can't work out what is wrong from debugging. Any help is highly appreciated.
One way is:
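>>> import re
>>> def test(items, split_t):
...     split_t_str = ' '.join(split_t)
...     res = {}
...     for i in items:
...         # re.escape guards against regex metacharacters in the search phrase
...         m = re.search(rf'(?<={re.escape(i)})\s(.*?)\s', split_t_str)
...         if m:  # skip phrases that do not occur in the text
...             res[i] = m.group(1)
...     return res
...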
>>> test(items, split_t)
{'SETTLEMENT DATE:': '20.02.2020', 'CASH ACCOUNT:': '420551901501', 'ISIN:': 'SN0000000019', 'TRADE DATE:': '17.02.2020', 'PRICE CFA': '14,500.4667', 'SECURITY NAME:': 'SONATEL'}
The above:
creates a str from split_t, i.e., split_t_str,
iterates over items using each element to construct a regex for performing a positive lookbehind assertion (see re's docs) against split_t_str,
stores each element as key in a dict, called res, and the corresponding match as value, and
returns the dict
If the items of "list 2" contain no spaces, you can do it this way:
def match(l1, l2):
    result = []
    string = ' '.join(l2) + ' '
    for i in l1:
        index = string.find(i)
        if index != -1:
            result.append(string[index + len(i) + 1:string.find(' ', index + len(i) + 1)])
    return result
print(match(items, split_t))
Output:
['20.02.2020', '420551901501', 'SN0000000019', '17.02.2020', '14,500.4667', '420551901501', 'SONATEL']
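Both answers above join list2 into one string first. A token-based variant, closer to what the question attempted, could look like the sketch below. The helper name `next_after` and the shortened `tokens` list are assumptions made for illustration only.

```python
def next_after(phrases, tokens):
    """Map each phrase to the token that follows its first contiguous match."""
    found = {}
    for phrase in phrases:
        words = phrase.split()  # 'SETTLEMENT DATE:' -> ['SETTLEMENT', 'DATE:']
        n = len(words)
        # stop early enough that tokens[i + n] always exists
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == words:
                found[phrase] = tokens[i + n]
                break
    return found

# Shortened, illustrative token list (not the full message from the question)
tokens = ['TRADE', 'DATE:', '17.02.2020', 'SETTLEMENT', 'DATE:', '20.02.2020',
          'ISIN:', 'SN0000000019']
print(next_after(['SETTLEMENT DATE:', 'ISIN:', 'QUANTITY:'], tokens))
```

Phrases that never occur (here 'QUANTITY:') are simply left out of the result, and comparing whole tokens avoids partial matches inside a longer word.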
I'm doing this for a project for which I need to do some web scraping, specifically from Wikipedia. This is the second phase of the project, so I need to create a poem about a person that the user enters (they have to have a Wikipedia page). I am using the Datamuse API from Python to get some rhyming words, which works really well.
Function ->
import requests
def get_10_rhyme_words(word):
    key = 'https://api.datamuse.com/words?rel_rhy='
    rhyme_words = []
    rhymes = requests.get(key + word)
    for i in rhymes.json()[0:10]:
        rhyme_words.append(i['word'])
    return rhyme_words
The criteria for the poem are that it needs to be at least 50 words long and make sense, so I came up with something like this:
“firstName” is nothing like “nameWord1”,
but it sounds a lot like “nameWord2”.
“genderPronoun” is a “professionFinal”,
Which sounds a lot like “professionWord1”.
“genderPronoun”’s favourite food might be waffles,
But it might also be “foodWord1”.
I now close this poem about the gorgeous “firstName”,
By saying “genderPronoun”’s name sounds a lot like “nameWord3”.
professionFinal was a variable used to describe their profession.
It works well for the name, but I get an IndexError every time I run it for the profession.
Name ->
The name poem
Here is a short poem on Serena:
Serena is nothing like hyena, but it sounds a lot like marina.
Profession ->
The Profession Poem (Error)
Here is a short poem on Serena:
Traceback (most recent call last):
  File "main.py", line 153, in <module>
    line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
Here is the code I am using to make the poem ->
#Writing a poem about the person
firstName = person.split()[0]
foodWord = 'waffles'
print('\nHere is a short poem on {}:\n'.format(firstName))
nameRhymes = get_10_rhyme_words(firstName)
professionRhymes = get_10_rhyme_words(professionFinal)
foodRhymes = get_10_rhyme_words(foodWord)
if gender == 'Male':
    heOrShe = 'He'
else:
    heOrShe = 'She'
if gender == 'Male':
    himOrHer = 'Him'
else:
    himOrHer = 'Her'
line1 = firstName + ' is nothing like ' + random.choice(nameRhymes) + ','
line2 = 'but it sounds a lot like ' + random.choice(nameRhymes) + '.'
line3 = heOrShe + ' is a ' + professionFinal + ','
line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
line5 = heOrShe + '\'s favourite food might be foodWord,'
line6 = 'but it might also be ' + random.choice(foodRhymes) + '.'
line7 = 'I now close this poem about the gorgeous {},'.format(firstName)
line8 = 'By saying {0}\'s name sounds a lot like {1}'.format(firstName, random.choice(nameRhymes))
print(line1)
print(line2)
print(line3)
print(line4)
print(line5)
print(line6)
print(line7)
print(line8)
**ignore the inconsistency and the lack of loops for printing each line
How do I make it so I don't get the error because frankly, I don't even know why I'm getting it...
Thanks!
(P.S.) Sorry for making it this long. Bye!
You should add a check on what the request returns. If it returns an empty list, that list cannot be passed to random.choice(), which requires a non-empty sequence.
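A minimal sketch of such a check, assuming a fallback string is acceptable in the poem (the helper name `pick_rhyme` and the fallback text are made up for illustration):

```python
import random

def pick_rhyme(rhymes, fallback='(no rhyme found)'):
    """Return a random rhyme, or the fallback when the list is empty."""
    # An empty list is falsy, so random.choice() is only called when it is safe.
    return random.choice(rhymes) if rhymes else fallback

print('which sounds a lot like ' + pick_rhyme([]) + '.')
print('which sounds a lot like ' + pick_rhyme(['marina']) + '.')
```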
This part of the error:
    line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
suggests that professionRhymes is probably an empty list.
Thanks to everyone that responded. It seems the consensus was enough to make me print the list and see that it comes up empty. Sadly, I am using repl and it doesn't have a debugger. But thanks guys, I found out the problem and will alter my poem to suit the needs of the program. As for the people asking the code, I only needed to check if their profession was that of a scientist, sportsperson, or politician. So I made a list, made a for loop check for keywords related to professions, then picked the right one. That is what professionFinal was.
Code:
#Finding their profession
#Declaring keywords for each profession
sportspersonKeywords = ['Sportsperson', 'Sportsman', 'Sportsman', 'Sports', 'Sport', 'Coach', 'Game', 'Olympics', 'Paralympics', 'Medal', 'Bronze', 'Silver', 'Gold', 'Player', 'sportsperson', 'sportsman', 'sportsman', 'sports', 'sport', 'coach', 'game', 'olympics', 'paralympics', 'medal', 'bronze', 'silver', 'gold', 'player', 'footballer', 'Footballer']
scientistKeywords = ['Scientist', 'Mathematician', 'Chemistry', 'Biology', 'Physics', 'Nobel Prize', 'Invention', 'Discovery', 'Invented', 'Discovered', 'science', 'scientist', 'mathematician', 'chemistry', 'biology', 'physics', 'nobel prize', 'invention', 'discovery', 'invented', 'discovered', 'science', 'Physicist', 'physicist', 'chemist', 'Chemist', 'Biologist', 'biologist']
politicianKeywords = ['Politician', 'Politics', 'Election', 'President', 'Vice-President', 'Vice President', 'Senate', 'Senator', 'Representative', 'Democracy', 'politician', 'politics', 'election', 'president', 'vice-president', 'vice president', 'senate', 'senator', 'representative', 'democracy']
#Declaring the first sentence (from the summary)
firstSentence = summary.split('.')[0]
profession = ['Scientist', 'Sportsperson', 'Politician']
professionFinal = ''
#Splitting the first sentence of the summary into separate words
firstSentenceList = firstSentence.split()
#Checking each word in the first sentence against the keywords in each profession to try to get a match
for i in firstSentenceList:
    if i in sportspersonKeywords:
        professionFinal = profession[1]
        break
    elif i in scientistKeywords:
        professionFinal = profession[0]
        break
    elif i in politicianKeywords:
        professionFinal = profession[2]
        break
#if a match is found, then that person has that profession, if not, then their profession is not in our parameters
if professionFinal == '':
    print('[PROFESSION]: NOT A SPORTPERSON, SCIENTIST, OR POLITICIAN')
else:
    print('[PROFESSION]: ' + professionFinal)
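As a side note, the keyword lists repeat every word in two capitalizations. A possible simplification, sketched here with shortened keyword sets and a hypothetical function name `detect_profession`, is to lowercase each word (and strip surrounding punctuation) before the lookup:

```python
import string

# Shortened, illustrative keyword sets; the full lists from the post would go here.
PROFESSION_KEYWORDS = {
    'Sportsperson': {'sportsperson', 'footballer', 'coach', 'player', 'olympics'},
    'Scientist': {'scientist', 'physicist', 'chemist', 'biologist', 'mathematician'},
    'Politician': {'politician', 'president', 'senator', 'senate', 'election'},
}

def detect_profession(first_sentence):
    """Return the first profession whose keyword appears in the sentence, else ''."""
    for raw in first_sentence.split():
        word = raw.strip(string.punctuation).lower()  # 'Player,' -> 'player'
        for profession, keywords in PROFESSION_KEYWORDS.items():
            if word in keywords:
                return profession
    return ''

print(detect_profession('She is an American professional tennis Player,'))
```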
Thanks guys!
So I have a data set with user, date, and post columns. I'm trying to generate a column containing the calories of the foods mentioned in the post column for each user. This dataset has a length of 21, and the code below finds the food words, gets their calorie values, appends them to that user's calorie list, and appends that list to the new column. The newly generated column, however, somehow has a length of 25:
Current data: 21
New column: 25
Does anybody know why this occurs? Here is the code below and samples of what the original dataset and the new column look like:
while len(col) < len(data['post']):
    for post, api_id, api_key in zip(data['post'], ids_keys.keys(), ids_keys.values()): # cycles through text data & api keys
        headers = {
            'Content-Type': 'application/x-www-form-urlencoded',
            'x-app-id': api_id,
            'x-app-key': api_key,
            'x-remote-user-id': '0'
        }
        calories = []
        print('Current data:', len(data['post']), '\n New column: ', len(col)) # prints length of post vs new cal column
        for word in eval(post):
            if word not in food:
                continue
            else:
                print('Detected Word: ', word)
                query = {'query': '{}'.format(word)}
                try:
                    response = requests.request("POST", url, headers=headers, data=query)
                except KeyError as ke:
                    print(ke, 'Out of calls, next key...')
                    ids_keys.pop(api_id) # drop current api id & key from dict if out of calls
                    print('API keys left:', len(ids_keys))
                finally:
                    stats = response.json()
                    print('Food Stats: \n', stats)
                    print('Calories in food: ', stats['foods'][0]['nf_calories'])
                    calories.append(stats['foods'][0]['nf_calories'])
                    print('Current Key', api_id, ':', api_key)
        col.append(calories)
        if len(col) == len(data['post']):
            break
I attempted to use the while loop to only append up to the length of the dataset, but to no avail.
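The post does not show enough to say for certain where the extra rows come from. One thing worth checking, though, is that zip() stops at its shortest input, so pairing posts directly with the API keys only visits as many posts as there are keys, and each new pass of the while loop starts again from the first post. The sketch below uses made-up stand-ins for data['post'] and ids_keys (the count of 5 keys is an assumption) and shows one way to get exactly one entry per post by cycling through the keys.

```python
from itertools import cycle

posts = [f'post {n}' for n in range(21)]            # stand-in for data['post']
ids_keys = {f'id{n}': f'key{n}' for n in range(5)}  # hypothetical: 5 credentials

# zip() stops at the shortest input: only the first 5 posts are visited
visited = [post for post, api_id in zip(posts, ids_keys)]
print(len(visited))

# Cycling through the credentials pairs every post with some key
col = []
for post, (api_id, api_key) in zip(posts, cycle(ids_keys.items())):
    calories = []  # the per-word API lookups would fill this list
    col.append(calories)
print(len(col))
```

Separately, requests does not raise an exception for an HTTP error status unless raise_for_status() is called, so the except KeyError branch is unlikely to be triggered by the request itself.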
Original Data Set:
pd.DataFrame({'user':['avskk', 'janejellyn', 'firlena227','...'],
'date': ['October 22', 'October 22', 'October 22','...'],
'post': [['autumn', 'fully', 'arrived', 'cooking', 'breakfast', 'toaster','...'],
['breakfast', 'chinese', 'sticky', 'rice', 'tempeh', 'sausage', 'cucumber', 'salad', 'lunch', 'going', 'lunch', 'coworkers', 'probably', 'black', 'bean', 'burger'],
['potato', 'bean', 'inspiring', 'food', 'day', 'today', '...']]
})
New Column:
pd.DataFrame({'Calories': [[22,33,45,32,2,5,7,9,76],
[43,78,54,97,32,56,97],
[23,55,32,22,7,99,66,98,54,35,33]]
})
I have a list of items, e.g.:
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it',]
I want to merge the list items based on a condition: every item that starts with # should have the # replaced by the most recent preceding item without one (here 'when' or 'who'). The output I want is:
['when', 'when i am here','when go and get it', 'when life is hell', 'who', 'who i am here','who go and get it',]
You can iterate over a, save the word if it does not start with '#', or replace '#' with the saved word if it does:
for i, s in enumerate(a):
    if s.startswith('#'):
        a[i] = p + s[1:]
    else:
        p = s + ' '
a becomes:
['when', 'when i am here', 'when go and get it', 'when life is hell', 'who', 'who i am here', 'who go and get it']
Just going off the info you provided, you could do this.
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it']
whoWhen = "" #are we adding 'who or when'
output = [] #new list
for i in a: #loop through
    if " " not in i: #if there's only 1 word
        whoWhen = i + " " #specify we will use that word
        output.append(i.upper()) #put it in the list
    else:
        output.append(i.replace("#", whoWhen)) #replace hashtag with word
print(output)
Prints:
['WHEN', 'when i am here', 'when go and get it', 'when life is hell', 'WHO', 'who i am here', 'who go and get it']
Here you go:
def carry_concat(string_list):
    replacement = "" # current replacement string ("when" or "who" or whatever)
    replaced_list = [] # the new list
    for value in string_list:
        if value[0] == "#":
            # add string with replacement
            replaced_list.append(replacement + " " + value[1:])
        else:
            # set this string as the future replacement value
            replacement = value
            # add string without replacement
            replaced_list.append(value)
    return replaced_list
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it',]
print(a)
print(carry_concat(a))
This prints:
['when', '#i am here', '#go and get it', '#life is hell', 'who', '#i am here', '#go and get it']
['when', 'when i am here', 'when go and get it', 'when life is hell', 'who', 'who i am here', 'who go and get it']
I am getting JSON with tags from a website:
['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
And I want to print it more or less like this:
#adventureinthecomments #artist:s.guri #clothes #commentslockeddown #dashie #slippers #edit #fractal #nopony #recursion
Does anybody know what method I need to use to remove all the commas and add a hashtag before each word?
P.S. I'm using Python 3.
One way is to join the items into a single string with '#', remove all whitespace, and then replace '#' with ' #' (with a leading space):
arr = ['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
s= "#"
res = '#' + s.join(arr)
newVal = res.replace(' ','')
newNew = newVal.replace('#', ' #')
print(newNew)
What's the rule for splitting the original phrases? The first one becomes a single tag:
'adventure in the comments' = '#adventureinthecomments'
but
'dashie slippers' is split into #dashie #slippers
?
If there are no rules, this could work:
>>> jsontags = ['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
>>> '#'+' #'.join([tag.replace(' ','') for tag in jsontags])
This will be the result
'#adventureinthecomments #artist:s.guri #clothes #commentslockeddown #dashieslippers #edit #fractal #nopony #recursion #safe #simplebackground #slippers #tanksforthememories #therideneverends #transparentbackground #vector #wat #weneedtogodeeper'
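The same one-liner can be wrapped in a small function for reuse. This is only a sketch; the name `to_hashtags` is made up, and it assumes every tag should become exactly one hashtag with its internal whitespace removed.

```python
def to_hashtags(tags):
    """Turn each tag into one hashtag, dropping whitespace inside the tag."""
    # tag.split() also collapses tabs and repeated spaces before joining
    return ' '.join('#' + ''.join(tag.split()) for tag in tags)

print(to_hashtags(['adventure in the comments', 'artist:s.guri', 'no pony']))
```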