I have a list
list = ['1a-b2', '2j-u3', '5k-hy', '1h-j3']
and I have a string like below
main = '{"datatype: null" "country_code":"eu","offset":0,"id":"2y-9k"}'
How can I replace the id value in the string with its respective index in the master list?
For example,
I want to replace "1h-j3" in the main string with its index in the list. This should also be done in a loop for all items.
I have tried concatenating with + and %, but neither worked, so kindly help me with this. Both the list items and the main variable are strings.
The expected output is as follows:
in first loop
main = '{"datatype: null" "country_code":"eu","offset":0,"id":"1a-b2"}'
in second loop
main = '{"datatype: null" "country_code":"eu","offset":0,"id":"2j-u3"}'
in third loop
main = '{"datatype: null" "country_code":"eu","offset":0,"id":"5k-hy"}'
and so on
I can think of two approaches, depending on what kind of data the main variable holds. See below.
If the value is valid JSON:
import json
items_list = ['1a-b2', '2j-u3', '5k-hy', "1h-j3"]
# if main_dict was a valid json
main_dict = json.loads('{"datatype": "null", "country_code":"eu","offset":0,"id":"1h-j3"}')
main_dict["id"] = items_list.index(main_dict["id"])
main_dict = json.dumps(main_dict)
Otherwise, it comes down to plain string manipulation. There may be better ways:
# If it is not valid JSON
str_main = '{"datatype: null" "country_code":"eu","offset":0,"id":"1h-j3"}'
import re
# Use a regex to find the id value to replace.
found = re.findall(r'"id":".*"', str_main, re.IGNORECASE)
if found:
    # found[0] looks like '"id":"1h-j3"'; keep only the value part.
    key = found[0].split(":")[1].replace('"', '')
    _id = items_list.index(key)
    str_main = str_main.replace(key, str(_id))
print(str_main)
produced output
{"datatype: null" "country_code":"eu","offset":0,"id":"3"}
--UPDATE--
Given the updated requirement in the question, I assume a simple loop like the one below is enough.
items_list = ['1a-b2', '2j-u3', '5k-hy', "1h-j3"]
base_str = '{"datatype: null" "country_code":"eu","offset":0,"id":"_ID_"}'
for item in items_list:
    main = base_str.replace('_ID_', item)
    print(main)
Produces output like
{"datatype: null" "country_code":"eu","offset":0,"id":"1a-b2"}
{"datatype: null" "country_code":"eu","offset":0,"id":"2j-u3"}
{"datatype: null" "country_code":"eu","offset":0,"id":"5k-hy"}
{"datatype: null" "country_code":"eu","offset":0,"id":"1h-j3"}
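If the payload is meant to be valid JSON (an assumption, since the string in the question is missing a quote and a comma), it can also be built from a dict and serialized with json.dumps, which avoids placeholder substitution altogether. A minimal sketch; base and payloads are illustrative names, and "datatype" is assumed to be a key whose value is null:

```python
import json

items_list = ['1a-b2', '2j-u3', '5k-hy', '1h-j3']

# Assumed corrected form of the payload: "datatype" is taken to be a key
# whose value is JSON null (None in Python).
base = {"datatype": None, "country_code": "eu", "offset": 0}

payloads = []
for item in items_list:
    # Copy the shared fields and set the id for this iteration.
    payload = dict(base, id=item)
    payloads.append(json.dumps(payload))

for p in payloads:
    print(p)
```

Each entry in payloads is a string, so it can be used wherever main was used in the loop.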
I am making an item menu in a console app, where I read the data from a text file and print it as shown in the code snippet below.
with open("itemList.txt", "r") as itemFile:
    for row in itemFile:
        row = row.strip("\n")
        itemlist.append(row.split())
print("\n---------------------")
print("Welcome!"+userName)
print("---------------------\n")
for everything in itemlist:
    itemCode = everything[0]
    itemName = str(everything[1]).split("_")
    itemPrice = everything[2]
    itemQuantity = everything[3]
    print(itemCode+"\t|"+itemName+"\t|"+itemPrice+"\t|"+itemQuantity+"\n")
My problem is that my data contains names like "Full_Cream_Milk", joined with "_". I am using .split() to remove the underscores so that the name prints as "Full Cream Milk", but doing so turns my itemName variable into a list, which causes this error:
Exception has occurred: TypeError
can only concatenate str (not "list") to str
My question is: how do I keep itemName from becoming a list? Or is there a better way to remove the "_"?
I have also tried writing it as shown below, and it still turns into a list. I'm not sure whether that is because I'm getting the data from a list, because it worked before I added the split() call.
itemName = everything[1]
itemName = itemName.split("_")
My guess is that you want to split 'Full_Cream_Milk' on '_' and then join the parts back together as 'FullCreamMilk'. In that case, the following snippet will do the job.
itemName = everything[1]
split_words = itemName.split("_")
itemName = ''.join(split_words)
If you wish to remove all of the underscores, you can use re.sub.
import re
itemName = re.sub('_', '', everything[1])
Or just str.replace.
itemName = everything[1].replace('_', '')
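Since the question asks for "Full Cream Milk" with spaces, replacing each underscore with a space (rather than removing it) may be closer to what is wanted. A small sketch, with a made-up row standing in for one line of itemList.txt:

```python
# Hypothetical row, shaped like the result of row.split() on one file line.
everything = ["A001", "Full_Cream_Milk", "4.50", "12"]

# str.replace returns a new string, so itemName stays a str.
itemName = everything[1].replace("_", " ")

# Equivalent with split/join, joining on a space instead of an empty string.
same_name = " ".join(everything[1].split("_"))

# Concatenation now works because every piece is a str.
line = everything[0] + "\t|" + itemName + "\t|" + everything[2] + "\t|" + everything[3]
print(line)
```

Either form keeps itemName a string, so the original print statement no longer raises the TypeError.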
I would like to scrape all of the MLB batters stats for 2018. Here is my code so far:
#import modules
from urllib.request import urlopen
from lxml import html
#fetch url/html
response = urlopen("https://www.baseball-reference.com/leagues/MLB/2018-standard-batting.shtml")
content = response.read()
tree = html.fromstring( content )
#parse data
comment_html = tree.xpath('//comment()[contains(., "players_standard_batting")]')[0]
comment_html = str(comment_html).replace("-->", "")
comment_html = comment_html.replace("<!--", "")
tree = html.fromstring( comment_html )
for batter_row in tree.xpath('//table[@id="players_standard_batting"]/tbody/tr[contains(@class, "full_table")]'):
    csk = batter_row.xpath('./td[@data-stat="player"]/@csk')[0]
When I scraped all of the batters there is 0.01 attached to each name. I tried to remove attached numbers using the following code:
bat_data = [csk]
string = '0.01'
result = []
for x in bat_data :
    if string in x:
        substring = x.replace(string,'')
        if substring != "":
            result.append(substring)
    else:
        result.append(x)
print(result)
This code removed the number, however, only the last name was printed:
Output:
['Zunino, Mike']
Also, there is a bracket and quotations around the name. The name is also in reverse order.
1) How can I print all of the batters names?
2) How can I remove the quotation marks and brackets?
3) Can I reverse the order of the names so the first name gets printed and then the last name?
The final output I am hoping for would be all of the batters names like so: Mike Zunino.
I am new to this site... I am also new to scraping/coding and will greatly appreciate any help I can get! =)
There are several ways to do this. Here is one approach that doesn't require any post-processing; you get the names in the form you wanted:
from urllib.request import urlopen
from lxml.html import fromstring
url = "https://www.baseball-reference.com/leagues/MLB/2018-standard-batting.shtml"
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for batter_row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    csk = batter_row.xpath('.//td[@data-stat="player"]/a')[0].text
    print(csk)
The output may look like:
Jose Abreu
Ronald Acuna
Jason Adam
Willy Adames
Austin L. Adams
You get only the last batter because you are overwriting the value of csk each time in your first loop. Initialize the empty list bat_data first and then add each batter to it.
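bat_data = []
for batter_row in tree.xpath('//table[@id="players_standard_batting"]/tbody/tr[contains(@class, "full_table")]'):
    csk = batter_row.xpath('./td[@data-stat="player"]/@csk')[0]
    bat_data.append(csk)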
This will give you a list of all batters, ['Abreu,Jose0.01', 'Acuna,Ronald0.01', 'Adam,Jason0.01', ...]
Then loop through this list but you don't have to check if string is in the name. Just do x.replace('0.01', '') and then check if the string is empty.
To reverse the order of the names
substring = substring.split(',')
substring.reverse()
nn = " ".join(substring)
Then append nn to the result.
You are getting the quotes and the brackets because you are printing the list. Instead iterate through the list and print each item.
Your code, edited on the assumption that bat_data has been filled correctly:
string = '0.01'
result = []
for x in bat_data:
    substring = x.replace(string, '')
    if substring != "":
        substring = substring.split(',')
        substring.reverse()
        substring = ' '.join(substring)
        result.append(substring)
for x in result:
    print(x)
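The same steps can be folded into one helper. This is only a sketch, assuming every scraped value looks like 'Last,First0.01'; clean_name and the sample list are illustrative and not part of the original code. The helper also trims spaces, since the output in the question shows 'Zunino, Mike' with a space after the comma:

```python
def clean_name(raw, suffix="0.01"):
    """Turn 'Last,First0.01' into 'First Last'."""
    name = raw.replace(suffix, "")
    # Split on the comma, trim stray spaces, and put the first name first.
    parts = [part.strip() for part in name.split(",")]
    return " ".join(reversed(parts))


# Sample values in the shape described above (not fetched from the site).
bat_data = ["Abreu,Jose0.01", "Acuna,Ronald0.01", "Zunino, Mike0.01"]

# Skip entries that are empty once the suffix is removed.
result = [clean_name(raw) for raw in bat_data if raw.replace("0.01", "")]

for name in result:
    print(name)
```

Printing each item in its own loop iteration, rather than printing the list, is what gets rid of the brackets and quotes.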
1) Print all batter names
print(result)
This will print everything in the result object. If it’s not printing what you expect then there’s something else wrong going on.
2) Remove quotations
The brackets and quotes appear because you are printing a list object. Try this:
print(result[0])
This prints the element of result at index 0.
3) Reverse order of names
Try
name = " ".join(result[0].split(", ")[::-1])
I have a list in Python, e.g.:
['E/123', 'E/145']
I want to add this to a SQL statement that I made:
WHERE GrantInformation.GrantRefNumber LIKE 'E/123'
OR GrantInformation.GrantRefNumber LIKE 'E/145'
The code I have is:
items = []
i = 0
while 1:
    i += 1
    item = input('Enter item %d: '%i)
    if item == '':
        break
    items.append(item)
print(items)
refList = " OR GrantInformation.GrantRefNumber LIKE ".join(items)
The problem is that when I insert that string into my SQL, the whole thing is treated as one string literal, so the query looks for WHERE GrantInformation.GrantRefNumber Like 'ES/P004355/1 OR GrantInformation.GrantRefNumber LIKE ES/P001452/1'
This obviously returns nothing, as 'ES/P004355/1 OR GrantInformation.GrantRefNumber LIKE ES/P001452/1' does not appear in the field.
How do I make it so that ' GrantInformation.GrantRefNumber LIKE ' is not part of the string literal?
Thank you
The preferred way to do this is to use an ORM like SQLAlchemy, which does the query construction for you, so you don't have to do the string concatenation yourself.
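If pulling in an ORM is more than the task needs, the database driver's own placeholders give much of the same benefit: the conditions are joined as text, while the values are passed separately and never become part of the SQL string. A minimal sketch using the standard-library sqlite3 module with a throwaway in-memory table; the ? placeholder style and the sample rows are assumptions, and other drivers use different markers such as %s:

```python
import sqlite3

items = ["E/123", "E/145"]

# One "column LIKE ?" condition per item; the values never touch the SQL text.
conditions = " OR ".join("GrantRefNumber LIKE ?" for _ in items)
query = "SELECT GrantRefNumber FROM GrantInformation WHERE " + conditions

# Throwaway in-memory table so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GrantInformation (GrantRefNumber TEXT)")
conn.executemany(
    "INSERT INTO GrantInformation VALUES (?)",
    [("E/123",), ("E/145",), ("E/999",)],
)

# The list of items is bound to the placeholders by the driver.
rows = conn.execute(query + " ORDER BY GrantRefNumber", items).fetchall()
print(query)
print(rows)
conn.close()
```

With an empty items list the WHERE clause would be empty, so a real script would need to handle that case before running the query.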
join() inserts the string between the items of the list passed as its argument. You need to put the condition into each list element as well:
>>> items = ['A', 'B']
>>> " OR ".join(["GrantInformation.GrantRefNumber LIKE '%s'" % num for num in items])
"GrantInformation.GrantRefNumber LIKE 'A' OR GrantInformation.GrantRefNumber LIKE 'B'"
I have the following data from IMDb. What I need to do is extract the movies with a score lower than 5 that received more than 100000 votes. My problem is that I don't understand what the last lines of code, the ones about the voting, really do.
import re
import requests
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, judging by the keyword arguments used below
# two lists, one for movie data, the other for vote data
movie_data = []
vote_data = []
# this does some reformatting to get the right unicode escape:
# it converts hexadecimal XML/HTML entities (&#x..;) into decimal ones (&#..;)
hexentityMassage = [(re.compile('&#x([^;]+);'), lambda m: '&#%d;' % int(m.group(1), 16))]
for i in range(20):
    next_url = 'http://www.imdb.com/search/title?sort=num_votes,desc&start=%d&title_type=feature&year=1950,2012'%(i*50+1)
    r = requests.get(next_url)
    bs = BeautifulSoup(r.text,convertEntities=BeautifulSoup.HTML_ENTITIES,markupMassage=hexentityMassage)
    # movie info is found in the table cell called 'title'
    for movie in bs.findAll('td', 'title'):
        title = movie.find('a').contents[0].replace('&amp;','&') #get '&' as in 'Batman & Robin'
        genres = movie.find('span', 'genre').findAll('a')
        year = int(movie.find('span', 'year_type').contents[0].strip('()'))
        genres = [g.contents[0] for g in genres]
        runtime = movie.find('span', 'runtime').contents[0]
        rating = float(movie.find('span', 'value').contents[0])
        movie_data.append([title, genres, runtime, rating, year])
    # vote counts are found in a separate cell called 'sort_col'
    for voting in bs.findAll('td', 'sort_col'):
        vote_data.append(int(voting.contents[0].replace(',','')))
The part you are asking about is this snippet:
for voting in bs.findAll('td', 'sort_col'):
    vote_data.append(int(voting.contents[0].replace(',','')))
The first line loops over all the td tags whose class attribute is sort_col, i.e. class="sort_col".
The second line does three things:
1) it takes the first element of the list returned by voting.contents and replaces each ',' in it with '' (an empty string),
2) it casts the result to int,
3) it appends that int to vote_data.
Broken up into steps, it looks like this:
for voting in bs.findAll('td', 'sort_col'):
    # voting.contents returns a list like this [u'377,936']
    str_vote = voting.contents[0]
    # str_vote will be '377,936'
    int_vote = int(str_vote.replace(',', ''))
    # int_vote will be 377936
    vote_data.append(int_vote)
Print the values inside the loop to understand it better. If you tag your question correctly, you might get a good answer faster.
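Once both lists are filled, the filtering the question asks for (rating below 5, more than 100000 votes) can be done by pairing them up. This sketch assumes the two lists line up one-to-one, i.e. each page yields one 'sort_col' cell per 'title' cell, and it uses made-up rows in place of scraped data:

```python
# Made-up rows in the same layout as movie_data:
# [title, genres, runtime, rating, year]
movie_data = [
    ["Movie A", ["Drama"], "120 mins.", 8.1, 1999],
    ["Movie B", ["Action"], "95 mins.", 4.2, 2004],
    ["Movie C", ["Comedy"], "101 mins.", 3.9, 2010],
]
vote_data = [377936, 150000, 42000]

# zip pairs each movie with the vote count at the same position.
low_rated_popular = [
    (movie[0], movie[3], votes)
    for movie, votes in zip(movie_data, vote_data)
    if movie[3] < 5 and votes > 100000
]

print(low_rated_popular)
```

Here only "Movie B" passes both conditions; "Movie C" is rated below 5 but has too few votes.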
I have a string that contains dictionary:
data = 'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\\/"});'
How can I get the value of the dictionary element count (in my case 17737)?
P.S. Maybe I need to delete IN.Tags.Share.handleCount from the string before getting the dictionary, e.g. with
k = data.replace("IN.Tags.Share.handleCount", ""), but the problem is that the '()' remains after the deletion.
Thanks
import re, ast
data = 'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\\/"});'
# Capture the {...} part and evaluate it as a Python dict literal.
m = re.match(r'.*({.*})', data)
d = ast.literal_eval(m.group(1))
print(d['count'])
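Because the text inside the parentheses is JSON, json.loads is another option; it also decodes the JSON escape \/ to a plain slash. A sketch, assuming the wrapper always has the form name({...});

```python
import json
import re

data = 'IN.Tags.Share.handleCount({"count":17737,"fCnt":"17K","fCntPlusOne":"17K","url":"www.test.com\\/"});'

# Capture everything from the first "{" to the last "}".
match = re.search(r"\{.*\}", data)
payload = json.loads(match.group(0))

count = payload["count"]
print(count)  # prints 17737
```

json.loads would also handle values such as true, false and null, which ast.literal_eval rejects because they are not Python literals.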