I am getting the output pasted below.
[{'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/92E58C7C69D015DA528D8D7F22844BF49D702DFC'}, {'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/3086E306E7CB623F377B6F99261F82CC8BB57115'}, {'accel-world-infinite-burst-2016': 'https://yifysubtitles.org/movie-imdb/tt5923132'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/E92B664EE87663D7E5EC8E9FEED574C586A95A62'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/4F6F194996AC29924DB7596FB646C368C4E4224B'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/movies/anna-to-the-infinite-power-1983/request-subtitle'}, {'infinite-2021': 'https://yts.mx/torrent/download/304DB2FEC8901E996B066B74E5D5C010D2F818B4'}, {'infinite-2021': 'https://yts.mx/torrent/download/1320D6D3B332399B2F4865F36823731ABD1444C0'}, {'infinite-2021': 'https://yts.mx/torrent/download/45821E5B2E339382E7EAEFB2D89967BB2C9835F6'}, {'infinite-2021': 'https://yifysubtitles.org/movie-imdb/tt6654210'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/47EB04FBC7DC37358F86A5BFC115A0361F019B5B'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/88223BEAA09D0A3D8FB7EEA62BA9C5EB5FDE9282'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/movies/infinite-potential-the-life-ideas-of-david-bohm-2020/request-subtitle'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/0E2ACFF422AF4F62877F59EAE4EF93C0B3623828'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/52437F80F6BDB6FD326A179FC8A63003832F5896'}, {'the-infinite-man-2014': 'https://yifysubtitles.org/movie-imdb/tt2553424'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/DA101D139EE3668EEC9EC5B855B446A39C6C5681'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/8759CD554E8BB6CFFCFCE529230252AC3A22D4D4'}, {'nick-and-norahs-infinite-playlist-2008': 
'https://yifysubtitles.org/movie-imdb/tt0981227'}]
As you can see, each movie has multiple links, and the movie name is repeated for each link. I want all links for the same movie to appear in a single object, e.g.
[{accel-world-infinite-burst-2016:{link1,link2,link3,link4},........]
for item in li:
    # print(item.partition("movies/")[2])
    movieName["Movies"].append(item.partition("movies/")[2])
    req = requests.get(item)
    s = soup(req.text, "html.parser")
    m = s.find_all("p", {"class": "hidden-xs hidden-sm"})
    # print(m[0])
    for a in m[0].find_all('a', href=True):
        # movieName['Movies'][item.partition("movies/")[2]] = (a['href'])
        downloadLinks.append({item.partition("movies/")[2]: a['href']})
You can try this:
# input = your list of dicts (note that this name shadows the builtin input())
otp_dict = {}
for l in input:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = [value]
        else:
            otp_dict[key].append(value)
print(otp_dict)
Output: {'accel-world-infinite-burst-2016': [link1, link2], ...}
The output is a dict that maps each movie to a list of links. If you want sets instead, as in your desired output, try this:
otp_dict = {}
for l in input:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = {value}
        else:
            otp_dict[key].add(value)
Output: {'accel-world-infinite-burst-2016': {link1, link2}, ...}
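The same grouping can also be written with collections.defaultdict, which removes the membership check. This is only a minimal sketch, assuming the input is a list of single-key dicts like the one in the question; the names pairs and group_links and the example.com URLs are hypothetical placeholders, not part of the original code.

```python
from collections import defaultdict


def group_links(pairs):
    """Merge a list of {movie: link} dicts into {movie: [link, ...]}."""
    grouped = defaultdict(list)
    for pair in pairs:
        for movie, link in pair.items():
            grouped[movie].append(link)
    # Convert back to a plain dict so missing keys raise KeyError again.
    return dict(grouped)


# Placeholder data in the same shape as the question's output.
pairs = [
    {"infinite-2021": "https://example.com/a"},
    {"infinite-2021": "https://example.com/b"},
    {"the-infinite-man-2014": "https://example.com/c"},
]
grouped = group_links(pairs)
print(grouped)
```

Using defaultdict(set) with .add() instead of .append() would give the set-valued variant.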
I would like to create a dictionary from an XML file using XPath. Here's an example of the XML:
<Contracts>
  <Contract ID="1">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
  <Contract ID="2">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
</Contracts>
What I would like is a dictionary with the contract ID as the key and the unwanted patterns as the value.
Here's my code:
UnwantedPatterns = []
key = []
DictUP = {}
for ID in root.xpath('//Contracts'):
    key = ID.xpath('./Contract/@ID')
    for patterns in root.xpath('.//Contract/UnwantedPatterns/Pattern'):
        DictUP[key] = UnwantedPatterns.append(patterns.text)
I get the error "unhashable type: 'list'". Thank you for your help. The output should look like this:
{1: 0,1
2: 0,1}
xpath() returns a list, so instead of
key = ID.xpath('./Contract/@ID')
try
key = ID.xpath('./Contract/@ID')[0]
As for the output: a dictionary cannot hold multiple values under the same key, so DictUP[key] = UnwantedPatterns.append(patterns.text) overwrites the value on each iteration. In addition, list.append() returns None, so the value stored is None rather than the list.
Try
# Iterate per contract so that each ID only collects its own patterns.
for contract in root.xpath('//Contract'):
    key = contract.xpath('./@ID')[0]
    _patterns = []
    for unwanted in contract.xpath('./UnwantedPatterns'):
        _patterns.extend([pattern.text for pattern in unwanted.xpath('./Pattern')])
    DictUP[key] = _patterns
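For a version that runs without extra dependencies, the per-contract grouping can be sketched with the standard library's xml.etree.ElementTree in place of the xpath() calls used above. This assumes the XML has a <Contracts> root and properly closed <UnwantedPatterns> elements; the name patterns_by_contract is made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the XML from the question.
xml_text = """<Contracts>
  <Contract ID="1">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
  <Contract ID="2">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
</Contracts>"""

root = ET.fromstring(xml_text)
patterns_by_contract = {}
for contract in root.findall("./Contract"):
    # Search relative to this contract so IDs do not share patterns.
    patterns_by_contract[contract.get("ID")] = [
        pattern.text
        for pattern in contract.findall("./UnwantedPatterns/Pattern")
    ]
print(patterns_by_contract)
```

The keys and values are strings here; wrap them in int() if numeric keys are needed.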
Not really sure how to word this question properly, but I'm basically playing around with python and using Selenium to scrape a website and I'm trying to create a JSON file with the data.
Here's the goal I'm aiming to achieve:
{
"main1" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
},
"main2" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
}
}
The problem I'm facing at the moment is that the website has no indentation or child elements. It looks like this (but longer and actual copy, of course):
<h3>Main1</h3>
<p>Sub1</p>
<p>Sub2</p>
<p>Sub3</p>
<p>Sub4</p>
<h3>Main2</h3>
Now I want to iterate through the HTML in order to use the <h3> tags as the parent ("Main" in the JSON example) and the <p> tags as the children (sub[num]). I'm new to both Python and Selenium, so I may have done this wrong, but I've tried using items.find_elements_by_tag_name('el') to separate the two, but I don't know how to put them back together in the order in which they originally came.
I then tried looping through all the elements and separating the tags using if (item.tag_name == "el"): loops. This works perfectly when I print the results of each loop, but when it comes to putting them together in a JSON file, I have the same issue as the previous method where I cannot seem to get the 2 to join. I've tried a few variations and I either get key errors or only the last item in the loop gets recorded.
Just for reference, here's the code for this step:
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//*")
statuses = [
    "Status1",
    "Status2",
    "Status3",
    "Status4"
]
for item in itemList:  # iterate through the HTML
    if (item.tag_name == "h3"):  # Separate H3 Tags
        main = item.text
        print("======================================")
        print(main)
        print("======================================")
    if (item.tag_name == 'p'):  # Separate P tags
        for status in statuses:
            if (status in item.text):  # Filter P tags to only display info that contains words in the Status array
                delimeters = ":", "(", "See"
                regexPattern = "|".join(map(re.escape, delimeters))
                zoneData = re.split(regexPattern, item.text)
                # Split P tags into separate parts
                sub1 = zoneData[0]
                sub2 = zoneData[1].translate({ord('*'): None})
                sub3 = zoneData[2].translate({ord(")"): None})
                print(sub1)
                print(sub2)
                print(sub3)
The final option I've decided to try is to try going through all the HTML again, but using enumerate() and using the element's IDs and including all the tags between the 2 IDs, but I'm not really sure what my plan of action is with this just yet.
In general, the last option seems a bit convoluted and I'm pretty certain there's a simpler way to do this. What would you suggest?
Here's my idea, but I didn't do the data part, you can add it later.
I assume that there's no duplicate in main name, or else you will lose some info.
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//p|.//h3")  # only finds h3 or p

def construct(item_list):
    current_main = ''
    final_dict: dict = {}
    for item in item_list:
        if item.tag_name == "h3":
            current_main = item.text
            final_dict[current_main] = {}  # create empty dict inside main. remove if you want to update the main dict
        if item.tag_name == "p":
            p_name = item.text
            final_dict[current_main][p_name] = "data"
    return final_dict
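To try the idea without a browser, here is a small sketch that stands in plain (tag, text) tuples for the Selenium elements and dumps the result as JSON. The function name group_by_heading and the sample data are assumptions for illustration; with real elements you would pass (item.tag_name, item.text) pairs instead.

```python
import json


def group_by_heading(elements):
    """Nest each <p> text under the most recent <h3> text."""
    result = {}
    current = None
    for tag, text in elements:
        if tag == "h3":
            current = text
            result[current] = {}
        elif tag == "p" and current is not None:
            # Paragraphs that appear before the first heading are skipped.
            result[current][text] = "data"
    return result


# Stand-in for the flat page structure shown in the question.
elements = [
    ("h3", "Main1"), ("p", "Sub1"), ("p", "Sub2"),
    ("h3", "Main2"), ("p", "Sub1"),
]
nested = group_by_heading(elements)
print(json.dumps(nested, indent=4))
```

Because the elements are visited in document order, tracking the current heading is enough to rebuild the parent/child relationship.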
I have a CSV file:
shack_imei.csv:
shack, imei
F10, "5555"
code:
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
print shack

def get_imei_from_entered_shack(shack):
    for key, value in my_dict.iteritems():
        if key == shack:
            return value

list = str(get_imei_from_entered_shack(shack))
print list
which gives me "5555"
But I need this value in a list structure like this:
["5555"]
I've tried a lot of different methods, and they all end up with extra ' or""
EDIT 1:
new simpler code:
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
imei = my_dict[shack]
print imei
"5555"
list(imei) gives me ['"5555"'], I need it to be ["5555"]
You can change your "return" statement:
shack = raw_input('Enter Shack:')
print shack

def get_imei_from_entered_shack(shack):
    for key, value in my_dict.iteritems():
        if key == shack:
            return [str(value)]

list = get_imei_from_entered_shack(shack)
print list
As far as I understand, you want to create a list containing the returned string, which you do with [ ]
list = [str(get_imei_from_entered_shack(shack))]
There are a few problems with this code, which are too long to tackle in comments
my_dict
my_dict = dict(reader) only works well if every row of the CSV is a key/value pair. If there are duplicate keys, later rows silently overwrite earlier ones.
get_imei_from_entered_shack
Why write this special function instead of just asking my_dict for the value? Even if you don't want it to throw an exception when you ask for a shack that doesn't exist, you can use the dict.get(<key>, <default>) method:
my_dict.get(shack, None)
does the same as your 4-line function.
list
Don't give variables the same names as builtins such as list.
list2
if you want a list, you can do [<value>] or list(<value>) (unless you replaced list with your own variable assignment)
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
imei = my_dict[shack]
imei = imei.replace('"',"")
IMEI_LIST =[]
IMEI_LIST.append(imei)
print IMEI_LIST
['5555']
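The stray quotes appear to come from the space after the comma: with the default settings the csv module treats a quote that does not start the field as a literal character. As a hedged Python 3 sketch, assuming the file looks exactly like the sample above, passing skipinitialspace=True lets the reader strip the quotes itself. The in-memory csv_text and the name shack_to_imei are stand-ins for illustration.

```python
import csv
import io

# In-memory stand-in for shack_imei.csv from the question.
csv_text = 'shack, imei\nF10, "5555"\n'

reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
next(reader)  # skip the header row
shack_to_imei = {shack: imei for shack, imei in reader}

# Wrap the single value in a list, as requested.
imei_list = [shack_to_imei["F10"]]
print(imei_list)
```

With a real file, open("shack_imei.csv", newline="") would replace the io.StringIO object.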
After the end of my code, I have a dictionary like so:
{'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
What I want to do is to find each of the keys in a separate file, teams.txt, which is formatted like this:
1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'
Using the year, which is 1901, and the team, which is the key of each item in the dictionary, I want to create a new dictionary where the key is the third column in teams.txt if the year and team both match, and the value is the value of the team in the first dictionary.
I figured this would be easiest if I created a function to "lookup" the year and the team, and return "franch", and then apply that function to each key in the dictionary. This is what I have so far, but it gives me a KeyError
def franch(year, team_str):
    team_str = str(team_str)
    with open('teams.txt') as imp_file:
        teams = imp_file.readlines()
    for team in teams:
        (yearID, teamID, franchID) = team.split(',')
        yearID = int(yearID)
        if yearID == year:
            if teamID == team_str:
                break
    franchID = franchID[1:4]
    return franchID
And in the other function with the dictionary that I want to apply this function to:
franch_teams={}
for team in teams:
    team = team.replace('"', "'")
    franch_teams[franch(year, team)] = teams[team]
The ideal output of what I am trying to accomplish would look like:
{'"MIN"': 1475.9778073075058, '"LAD"': 1554.1437268304624, '"CHW"': 1552.228925324831}
Thanks!
Does this code suit your needs?
I am doing an extra check for equality, because there were different string signs in different parts of your code.
def almost_equals(one, two):
    one = one.replace('"', '').replace("'", "")
    two = two.replace('"', '').replace("'", "")
    return one == two

def create_data(year, data, text_content):
    """ This function returns new dictionary. """
    content = [line.split(',') for line in text_content.split('\n')]
    res = {}
    for key in data.keys():
        for one_list in content:
            if year == one_list[0] and almost_equals(key, one_list[1]):
                res[one_list[2]] = data[key]
    return res

teams_txt = """1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'"""
year = '1901'
data = { '"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831 }
result = create_data(year, data, teams_txt)
And the output:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
Update:
To read from text file use this function:
def read_text_file(filename):
    with open(filename) as file_object:
        result = file_object.read()
    return result

teams_txt = read_text_file('teams.txt')
You may try something like:
#!/usr/bin/env python
def clean(_str):
    return _str.strip('"').strip("'")

first = {'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
clean_first = dict()
second = dict()
for k,v in first.items():
    clean_first[clean(k)] = v
with open("teams.txt", "r") as _file:
    lines = _file.readlines()
for line in lines:
    _,old,new = line.split(",")
    second[new.strip()] = clean_first[clean(old)]
print second
Which gives the expected:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
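Since the question asks for a match on both year and team, one possible variation is to build a lookup keyed by (year, team) once and then translate every key through it. This is a Python 3 sketch under the assumption that each line of teams.txt has exactly three comma-separated fields; load_franchises, teams_lines and ratings are illustrative names, and the resulting keys carry no quote characters, unlike the outputs shown above.

```python
def clean(token):
    """Strip surrounding whitespace and single or double quotes."""
    return token.strip().strip("'\"")


def load_franchises(lines):
    """Map (year, team) to franchise for each 'year,team,franchise' line."""
    lookup = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        year, team, franchise = line.split(",")
        lookup[(int(year), clean(team))] = clean(franchise)
    return lookup


# Inline stand-ins for teams.txt and the dictionary from the question.
teams_lines = ["1901,'BRO','LAD'", "1901,'CHA','CHW'", "1901,'WS1','MIN'"]
ratings = {
    '"WS1"': 1475.9778073075058,
    '"BRO"': 1554.1437268304624,
    '"CHA"': 1552.228925324831,
}

lookup = load_franchises(teams_lines)
franch_teams = {
    lookup[(1901, clean(team))]: value for team, value in ratings.items()
}
print(franch_teams)
```

A team that is missing for the given year still raises KeyError here; lookup.get(...) could be used if a default is preferred.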
I am very new to Python and I have a JSON news feed from which I need to get a selected 'title' and image 'src'.
I have managed to print every 'title' and only the image 'src' values whose layout is "1024 Landscape".
How can I print, for example, just the second title? How do I address that particular one?
The feed is : http://www.stuff.co.nz/_json/ipad-big-picture
for story in data.get('stories', []):
    print 'Title:', story['title']
    for img in story.get('images', []):
        for var in img.get('variants', []):
            if var.get('layout') == "1024 Landscape":
                print ' img:', (var.get('src')).split('/')[-1], ' layout:', var.get('layout')
Thanks
First just get your stories object (list of dicts):
stories = data.get('stories', [])
Once you have this list you can just access by index:
if len(stories) >= 2:
    print stories[1]['title']
Or try first and catch the exception:
i = 1
try:
    print stories[i]['title']
except IndexError:
    print "Story does not exist at index %d" % i
So, when trying to get all 1024 Landscape images for a specific story, it might look like this:
imgs = set()
for img in stories[1].get('images', []):
    for variant in img.get('variants', []):
        if variant.get('layout') == '1024 Landscape':
            imgs.add(variant['src'])
print imgs
set([u'http://static.stuff.co.nz/1341147692/827/7202827.jpg'])
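The two steps (pick a story by index, then filter its image variants) can be combined into one helper. The following is a Python 3 sketch only; the function name story_images and the sample data are hypothetical and merely mimic the structure the code above relies on (stories, images, variants, layout, src).

```python
def story_images(data, index, layout="1024 Landscape"):
    """Return (title, [src, ...]) for one story, or (None, []) if out of range."""
    stories = data.get("stories", [])
    if not 0 <= index < len(stories):
        return None, []
    story = stories[index]
    srcs = [
        variant["src"]
        for img in story.get("images", [])
        for variant in img.get("variants", [])
        if variant.get("layout") == layout
    ]
    return story.get("title"), srcs


# Hypothetical sample shaped like the feed described in the question.
data = {
    "stories": [
        {"title": "First", "images": []},
        {
            "title": "Second",
            "images": [
                {"variants": [
                    {"layout": "1024 Landscape", "src": "http://example.com/a.jpg"},
                    {"layout": "Thumbnail", "src": "http://example.com/b.jpg"},
                ]}
            ],
        },
    ]
}
title, srcs = story_images(data, 1)
print(title, srcs)
```

Index 1 selects the second story because list indices start at 0.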