I am getting a JSON list of tags from a website:
['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
And I want to print it more or less like this:
#adventureinthecomments #artist:s.guri #clothes #commentslockeddown #dashie #slippers #edit #fractal #nopony #recursion
Does anybody know which method I need to use to remove all the commas and add a hashtag before each word?
P.S. I'm using Python 3.
One way is to join the list into a single string with '#' as the separator, remove all the spaces, and then replace each '#' with ' #' (with a leading space):
arr = ['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
s= "#"
res = '#' + s.join(arr)
newVal = res.replace(' ','')
newNew = newVal.replace('#', ' #')
print(newNew)
What is the rule for splitting the original phrases? The first one becomes a single tag:
'adventure in the comments' = '#adventureinthecomments'
but
'dashie slippers' is split into '#dashie #slippers'.
If there is no rule, this could work:
>>> jsontags = ['adventure in the comments', 'artist:s.guri', 'clothes', 'comments locked down', 'dashie slippers', 'edit', 'fractal', 'no pony', 'recursion', 'safe', 'simple background', 'slippers', 'tanks for the memories', 'the ride never ends', 'transparent background', 'vector', 'wat', 'we need to go deeper']
>>> '#'+' #'.join([tag.replace(' ','') for tag in jsontags])
This will be the result
'#adventureinthecomments #artist:s.guri #clothes #commentslockeddown #dashieslippers #edit #fractal #nopony #recursion #safe #simplebackground #slippers #tanksforthememories #therideneverends #transparentbackground #vector #wat #weneedtogodeeper'
I'm doing this for a project for which I need to do some web scraping, specifically from Wikipedia. This is the second phase of the project: I need to create a poem about a person that the user enters (they have to have a Wikipedia page). I am using the Datamuse API from Python to get some rhyming words, which works really well.
Function ->
import requests

def get_10_rhyme_words(word):
    key = 'https://api.datamuse.com/words?rel_rhy='
    rhyme_words = []
    rhymes = requests.get(key + word)
    for i in rhymes.json()[0:10]:
        rhyme_words.append(i['word'])
    return rhyme_words
The criteria for the poem are that it needs to be at least 50 words long and make sense, so I came up with something like this:
“firstName” is nothing like “nameWord1”,
but it sounds a lot like “nameWord2”.
“genderPronoun” is a “professionFinal”,
Which sounds a lot like “professionWord1”.
“genderPronoun”’s favourite food might be waffles,
But it might also be “foodWord1”.
I now close this poem about the gorgeous “firstName”,
By saying “genderPronoun”’s name sounds a lot like “nameWord3”.
professionFinal was a variable used to describe their profession.
It works well for the name, but I get an IndexError every time I run it for the profession.
Name ->
The name poem
Here is a short poem on Serena:
Serena is nothing like hyena, but it sounds a lot like marina.
Profession ->
The Profession Poem (Error)
Here is a short poem on Serena:
Traceback (most recent call last):
  File "main.py", line 153, in <module>
    line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
  File "/usr/lib/python3.8/random.py", line 290, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
Here is the code I am using to make the poem ->
#Writing a poem about the person
firstName = person.split()[0]
foodWord = 'waffles'
print('\nHere is a short poem on {}:\n'.format(firstName))
nameRhymes = get_10_rhyme_words(firstName)
professionRhymes = get_10_rhyme_words(professionFinal)
foodRhymes = get_10_rhyme_words(foodWord)
if gender == 'Male':
    heOrShe = 'He'
else:
    heOrShe = 'She'
if gender == 'Male':
    himOrHer = 'Him'
else:
    himOrHer = 'Her'
line1 = firstName + ' is nothing like ' + random.choice(nameRhymes) + ','
line2 = 'but it sounds a lot like ' + random.choice(nameRhymes) + '.'
line3 = heOrShe + ' is a ' + professionFinal + ','
line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
line5 = heOrShe + '\'s favourite food might be foodWord,'
line6 = 'but it might also be ' + random.choice(foodRhymes) + '.'
line7 = 'I now close this poem about the gorgeous {},'.format(firstName)
line8 = 'By saying {0}\'s name sounds a lot like {1}'.format(firstName, random.choice(nameRhymes))
print(line1)
print(line2)
print(line3)
print(line4)
print(line5)
print(line6)
print(line7)
print(line8)
**ignore the inconsistency and the lack of loops for printing each line
How do I make it so I don't get the error because frankly, I don't even know why I'm getting it...
Thanks!
(P.S.) Sorry for making it this long. Bye!
You should add a check on what the request returns. If it returns an empty list, the result cannot be passed to random.choice(), which requires a sequence with at least one item.
This part of the error shows it:
line4 = 'which sounds a lot like ' + random.choice(professionRhymes) + '.'
File "/usr/lib/python3.8/random.py", line 290, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
professionRhymes is probably an empty list.
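One way to guard against the empty list is sketched below. The helper name pick_rhyme and the fallback text are assumptions made for illustration, not part of the original program; the rhyme lists are hard-coded here so the sketch runs without calling the API.

```python
import random

def pick_rhyme(rhymes, fallback):
    """Return a random rhyme, or `fallback` when the list is empty."""
    # random.choice raises IndexError on an empty sequence, so check first.
    if not rhymes:
        return fallback
    return random.choice(rhymes)

# An empty result no longer crashes the poem:
print('which sounds a lot like ' + pick_rhyme([], 'nothing at all') + '.')
# A non-empty result behaves as before:
print('which sounds a lot like ' + pick_rhyme(['marina'], 'nothing at all') + '.')
```

In the poem code, each random.choice(professionRhymes) call could then be replaced by pick_rhyme(professionRhymes, ...) with whatever fallback wording suits the line.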
Thanks to everyone who responded. The consensus was enough to make me print the list and see that it comes up empty. Sadly, I am using repl and it doesn't have a debugger. But thanks guys, I found the problem and will alter my poem to suit the needs of the program. As for the people asking for the code: I only needed to check whether their profession was that of a scientist, sportsperson, or politician. So I made a list, made a for loop check for keywords related to professions, then picked the right one. That is what professionFinal was.
Code:
#Finding their profession
#Declaring keywords for each profession
sportspersonKeywords = ['Sportsperson', 'Sportsman', 'Sportsman', 'Sports', 'Sport', 'Coach', 'Game', 'Olympics', 'Paralympics', 'Medal', 'Bronze', 'Silver', 'Gold', 'Player', 'sportsperson', 'sportsman', 'sportsman', 'sports', 'sport', 'coach', 'game', 'olympics', 'paralympics', 'medal', 'bronze', 'silver', 'gold', 'player', 'footballer', 'Footballer']
scientistKeywords = ['Scientist', 'Mathematician', 'Chemistry', 'Biology', 'Physics', 'Nobel Prize', 'Invention', 'Discovery', 'Invented', 'Discovered', 'science', 'scientist', 'mathematician', 'chemistry', 'biology', 'physics', 'nobel prize', 'invention', 'discovery', 'invented', 'discovered', 'science', 'Physicist', 'physicist', 'chemist', 'Chemist', 'Biologist', 'biologist']
politicianKeywords = ['Politician', 'Politics', 'Election', 'President', 'Vice-President', 'Vice President', 'Senate', 'Senator', 'Representative', 'Democracy', 'politician', 'politics', 'election', 'president', 'vice-president', 'vice president', 'senate', 'senator', 'representative', 'democracy']
#Declaring the first sentence (from the summary)
firstSentence = summary.split('.')[0]
profession = ['Scientist', 'Sportsperson', 'Politician']
professionFinal = ''
#Splitting the first sentence of the summary into separate words
firstSentenceList = firstSentence.split()
#Checking each word in the first sentence against the keywords in each profession to try to get a match
for i in firstSentenceList:
    if i in sportspersonKeywords:
        professionFinal = profession[1]
        break
    elif i in scientistKeywords:
        professionFinal = profession[0]
        break
    elif i in politicianKeywords:
        professionFinal = profession[2]
        break
#if a match is found, then that person has that profession, if not, then their profession is not in our parameters
if professionFinal == '':
    print('[PROFESSION]: NOT A SPORTPERSON, SCIENTIST, OR POLITICIAN')
else:
    print('[PROFESSION]: ' + professionFinal)
Thanks guys!
I'm making a program that counts how many times a band has played a song, using a webpage of all their setlists. I have grabbed the webpage and converted all the songs played into one big list, so all I wanted to do was check whether the song name was in the list and add to a counter. It isn't working, and I can't figure out why.
I've also tried using the count function instead, and that didn't work either.
sugaree_counter = 0
link = 'https://www.cs.cmu.edu/~mleone/gdead/dead-sets/' + year + '/' + month+ '-' + day + '-' + year + '.txt'
page = requests.get(link)
page_text = page.text
page_list = [page_text.split('\n')]
print(page_list)
This code returns the list:
[['Winterland Arena, San Francisco, CA (1/2/72)', '', "Truckin'", 'Sugaree',
'Mr. Charlie', 'Beat it on Down the Line', 'Loser', 'Jack Straw',
'Chinatown Shuffle', 'Your Love At Home', 'Tennessee Jed', 'El Paso',
'You Win Again', 'Big Railroad Blues', 'Mexicali Blues',
'Playing in the Band', 'Next Time You See Me', 'Brown Eyed Women',
'Casey Jones', '', "Good Lovin'", 'China Cat Sunflower', 'I Know You Rider',
"Good Lovin'", 'Ramble On Rose', 'Sugar Magnolia', 'Not Fade Away',
"Goin' Down the Road Feeling Bad", 'Not Fade Away', '',
'One More Saturday Night', '', '']]
But when I do:
sugaree_counter = int(sugaree_counter)
if 'Sugaree' in page_list:
    sugaree_counter += 1
print(str(sugaree_counter))
It will always be zero.
It should add 1 to that because 'Sugaree' is in that list
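Your page_list is a list of lists, so you need two for loops, one over the pages and one over the items in each page, and you need to compare each item with the song name:
for page in page_list:
    for item in page:
        if item == 'Sugaree':
            sugaree_counter += 1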
Use sum() and a list comprehension:
sugaree_counter = sum([page.count('Sugaree') for page in page_list])
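The nesting can also be avoided at the source: str.split() already returns a list, so the extra brackets around it are what create the list of lists. A minimal sketch, using a short hard-coded string (sample_text, an assumption standing in for page.text) so that it runs without downloading anything:

```python
# Hypothetical stand-in for page.text; the real code would use the downloaded page.
sample_text = "Winterland Arena, San Francisco, CA (1/2/72)\n\nTruckin'\nSugaree\nNot Fade Away\nNot Fade Away"

# No surrounding brackets: split() already returns a flat list of lines.
page_list = sample_text.split('\n')

# The membership test and count() now see the song names directly.
print('Sugaree' in page_list)
sugaree_counter = page_list.count('Sugaree')
print(sugaree_counter)
```

With a flat list, the original `if 'Sugaree' in page_list` test works unchanged, and count() also picks up songs that were played more than once in a show.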
I have a list of items, e.g.:
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it',]
I want to merge the list items based on a condition: in every item that starts with #, the # should be replaced by the most recent item that does not start with # (when or who). The output I want is:
['when', 'when i am here','when go and get it', 'when life is hell', 'who', 'who i am here','who go and get it',]
You can iterate over a, saving the word if it does not start with '#', or replacing the '#' with the saved word if it does:
for i, s in enumerate(a):
    if s.startswith('#'):
        a[i] = p + s[1:]
    else:
        p = s + ' '
a becomes:
['when', 'when i am here', 'when go and get it', 'when life is hell', 'who', 'who i am here', 'who go and get it']
Just going off the info you provided, you could do this.
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it']
whoWhen = "" #are we adding 'who or when'
output = [] #new list
for i in a: #loop through
    if " " not in i: #if there's only 1 word
        whoWhen = i + " " #specify we will use that word
        output.append(i.upper()) #put it in the list
    else:
        output.append(i.replace("#", whoWhen)) #replace hashtag with word
print(output)
Prints:
['WHEN', 'when i am here', 'when go and get it', 'when life is hell', 'WHO', 'who i am here', 'who go and get it']
Here you go:
def carry_concat(string_list):
    replacement = "" # current replacement string ("when" or "who" or whatever)
    replaced_list = [] # the new list
    for value in string_list:
        if value[0] == "#":
            # add string with replacement
            replaced_list.append(replacement + " " + value[1:])
        else:
            # set this string as the future replacement value
            replacement = value
            # add string without replacement
            replaced_list.append(value)
    return replaced_list
a = ['when', '#i am here','#go and get it', '#life is hell', 'who', '#i am here','#go and get it',]
print(a)
print(carry_concat(a))
This prints:
['when', '#i am here', '#go and get it', '#life is hell', 'who', '#i am here', '#go and get it']
['when', 'when i am here', 'when go and get it', 'when life is hell', 'who', 'who i am here', 'who go and get it']
I have a list with items that I would like to split further and add as new items in the list. For example, given the list below:
pre_songs = ['Totem', 'One, Two, Three', 'Rent', 'Vapors', 'Get Loud > Inspire Strikes Back',
'Enceladus', 'Moon Socket', 'Out of This World > Scheme', 'Walk to the Light',
'When The Dust Settles', 'Click Lang Echo']
I would like to take the "Get Loud > Inspire Strikes Back" and "Out of This World > Scheme" items, split them on the ">", and make "Get Loud", "Inspire Strikes Back", "Out of This World", and "Scheme" separate items in the list.
I tried using the code below but it doesn't work:
pre_setlist = []
for song in pre_songs:
    if song.contains('>'):
        pre_setlist.append(song.split('>'))
    else:
        pre_setlist.append(song)
Use extend:
pre_setlist = []
for song in pre_songs:
    pre_setlist.extend([x.strip() for x in song.split('>')])
Shorter, if there is always one space before and after the >:
pre_setlist = []
for song in pre_songs:
    pre_setlist.extend(song.split(' > '))
Result for both versions with your example list:
>>> pre_setlist
['Totem',
'One, Two, Three',
'Rent',
'Vapors',
'Get Loud',
'Inspire Strikes Back',
'Enceladus',
'Moon Socket',
'Out of This World',
'Scheme',
'Walk to the Light',
'When The Dust Settles',
'Click Lang Echo']
This is one way.
from itertools import chain
pre_setlist = []
for song in pre_songs:
    if '>' in song:
        pre_setlist.append(song.split(' > '))
    else:
        pre_setlist.append([song])
list(chain.from_iterable(pre_setlist))
# ['Totem', 'One, Two, Three', 'Rent', 'Vapors', 'Get Loud',
# 'Inspire Strikes Back', 'Enceladus', 'Moon Socket',
# 'Out of This World', 'Scheme', 'Walk to the Light',
# 'When The Dust Settles', 'Click Lang Echo']
This can be written more succinctly as a list comprehension:
from itertools import chain
pre_setlist = [[song] if '>' not in song \
else song.split(' > ') for song in pre_songs]
list(chain.from_iterable(pre_setlist))
You can try this:
pre_songs = ['Totem', 'One, Two, Three', 'Rent', 'Vapors', 'Get Loud > Inspire Strikes Back',
'Enceladus', 'Moon Socket', 'Out of This World > Scheme', 'Walk to the Light',
'When The Dust Settles', 'Click Lang Echo']
final_songs = [i for b in [c.split(' > ') for c in pre_songs] for i in b]
Output:
['Totem', 'One, Two, Three', 'Rent', 'Vapors', 'Get Loud', 'Inspire Strikes Back', 'Enceladus', 'Moon Socket', 'Out of This World', 'Scheme', 'Walk to the Light', 'When The Dust Settles', 'Click Lang Echo']
for song in pre_songs:
    if '>' in song:
        pre_setlist.extend(song.split('>'))
    else:
        pre_setlist.append(song)
You need to change two things:
Use '>' in song instead of song.contains('>'); Python strings have no contains() method.
Use extend instead of append, since song.split() gives you a list back.
I'm trying to learn HTML scraping for a project, using Python and lxml. I've been successful so far in getting the data I need, but now I have another problem. The site I'm scraping (op.gg) adds new tables with more information as you scroll down. When I run my script (below), it only gets the first 50 entries and nothing more. My question is: how can I get at least the first 200 names on the page, if that is even possible?
from lxml import html
import requests
page = requests.get('https://na.op.gg/ranking/ladder/')
tree = html.fromstring(page.content)
names = tree.xpath('//td[@class="SummonerName Cell"]/a/text()')
print(names)
Borrowing the idea from Pedro: https://na.op.gg/ranking/ajax2/ladders/start=number gives you 50 records starting from number. For example:
https://na.op.gg/ranking/ajax2/ladders/start=0 gets records 1-50,
https://na.op.gg/ranking/ajax2/ladders/start=50 gets records 51-100,
https://na.op.gg/ranking/ajax2/ladders/start=100 gets records 101-150,
https://na.op.gg/ranking/ajax2/ladders/start=150 gets records 151-200,
etc.
After that, you need to change your scraping code, because this page is different from your original one. Suppose you want to get the first 200 names; here is the amended code:
from lxml import html
import requests
start_url = 'https://na.op.gg/ranking/ajax2/ladders/start='
names_200 = list()
for i in [0, 50, 100, 150]:
    dest_url = start_url + str(i)
    page = requests.get(dest_url)
    tree = html.fromstring(page.content)
    names_50 = tree.xpath('//a[not(@target) and not(@onclick)]/text()')
    names_200.extend(names_50)
print(names_200)
print(len(names_200))
Output:
[u'am\xc3\xa9liorer', 'pireaNn', 'C9 Ray', 'P1 Pirean', 'Pobelter', 'mulgokizary', 'consensual clown', 'Jue VioIe Grace', 'Deep Learning', 'Keegun', 'Free Papa Chau', 'C9 Gun', 'Dhokla', 'Arrowlol', 'FOX Brandini', 'Jurassiq', 'Win or Learn', 'Acoldblazeolive', u'R\xc3\xa9venge', u'M\xc3\xa9ru', 'Imaqtpie', 'Rohammers', 'blaberfish2', 'qldurtms', u'd\xc3\xa0wolfsclaw', 'TheOddOrange', 'PandaTv 656826', 'stuntopolis', 'Butler Delta', 'P1 Shady', 'Entranced', u'Linsan\xc3\xadty', 'Ablazeolive', 'BukZacH', 'Anivia Kid', 'Contractz', 'Eitori', 'MistyStumpey', 'Prodedgy', 'Splitting', u'S\xc4\x99b B\xc4\x99rnal', 'N For New York', 'Naeun', '5tunt', 'C9 Winter', 'Doubtfull', 'MikeYeung', 'Rikara', u'RAH\xc3\x9cLK', ' Sudzzi', 'joong ki song', 'xWeixin VinLeous', 'rhubarbs', u'Ch\xc3\xa0se', 'XueGao', 'Erry', 'C9 EonYoung', 'Yeonbee', 'M ckg', u'Ari\xc3\xa1na Lovato', 'OmarGod', 'Wiggily', 'lmpactful', 'Str1fe', 'LL Stylish', '2017', 'FlREFLY', 'God Fist Monk', 'rWeiXin VinLeous', 'Grigne', 'fantastic ad', 'bobqinX', 'grigne 1v10', 'Sora1', 'Juuichi san ', 'duoking2', 'SandPaperX', 'Xinthus', 'TwichTv CoMMa', 'xFSN Rin', 'UBC CJ', 'PotIuck', 'DarkWingsForSale', 'Get After lt', 'old chicken', u'\xc4\x86ris', 'VK Deemo', 'Pekin Woof', 'YIlIlIlIlI', 'RiceLegend', 'Chimonaa1', 'DJNDREE5', u'CloudNguy\xc3\xa9n', 'Diamond 1 Khazix', 'dawolfsfang', 'clg imaqtpie69', 'Pyrites', 'Lava', 'Rathma', 'PieCakeLord', 'feed l0rd', 'Eygon', 'Autolycus1', 'FateFalls 20xx', 'nIsHIlEzHIlA', 'C9 Sword', 'TET Fear', 'a very bad time', u'Jur\xc3\xa1ssiq', 'Ginormous Noob', 'Saskioo', 'S D 2 NA', 'C9 Smoothie', 'dufTlalgkqtlek', 'Pants are Dragon', u'H\xc3\xb3llywood', 'Serenitty', 'Waggily ', 'never lucky help', u'insan\xc3\xadty', 'Joyul', 'TheeBrandini', 'FoTheWin', 'RyuShoryu', 'avi is me', 'iKingVex', 'PrismaI', 'An Obese Panda', 'TdollasAKATmoney', 'feud999', 'Soligo', 'Steel I', 'SNH48 Ruri', 'BillyBoss1', 'Annie Bot', 'Descraton', 'Cris', 'GrayHoves', 'RegisZZ', 'lron Pyrite', 'Zaion', 
'Allorim', 't d', u'Alex \xc3\xafch', 'godrjsdnd', 'DOUBLELIFTSUCKS', 'John Mcrae', u'Lobo Solitari\xc3\xb3', 'MikeYeunglol', 'i xo u', 'NoahMost', 'Vsionz', 'GladeGleamBright', 'Tuesdayy', 'RealDarkness', 'CC Dean', 'na mid xd LFT', 'Piggy Kitten', 'Abou222', 'TG Strompest', 'MooseHater', 'Day after Day', 'bat8man', 'AxAxAxAxA', 'Boyfriend', 'EvanRL', '63FYWJMbam', 'Fiftygbl', u'Br\xc4\xb1an', 'MlST', u'S\xc3\xb8ren Bjerg', 'FOX Akaadian', '5word', 'tchikou', 'Hakuho', 'Noobkiller291', 'woxiangwanAD', 'Doublelift', 'Jlaol', u'z\xc3\xa3ts', 'Cow Goes Mooooo', u'Be Like \xc3\x91e\xc3\xb8\xc3\xb8', 'Liquid Painless', 'Zergy', 'Huge Rooster', 'Shiphtur', 'Nikkone', 'wiggily1', 'Dylaran', u'C\xc3\xa0m', 'byulbit', 'dirtybirdy82', 'FreeXpHere', u'V\xc2\xb5lcan', 'KaNKl', 'LCS Actor 4', 'bie sha wo', 'Mookiez', 'BKSMOOTH', 'FatMiku']
200
BTW, you can extend this to suit your requirements.
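For instance, the hard-coded offsets can be generated with range() instead of being listed by hand. This is only a sketch: ladder_urls and its parameters are hypothetical names, and the page size of 50 is taken from the URLs above rather than from any documented API.

```python
def ladder_urls(total, page_size=50,
                start_url='https://na.op.gg/ranking/ajax2/ladders/start='):
    """Build the request URLs needed to cover the first `total` records."""
    # range() steps through the start offsets: 0, page_size, 2 * page_size, ...
    return [start_url + str(offset) for offset in range(0, total, page_size)]

# Print the URLs that would be fetched for the first 200 names.
for url in ladder_urls(200):
    print(url)
```

Each returned URL can then be passed to requests.get() in the same loop as before.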