I am currently extracting data from who.is. There were about 2000 links, so I simply iterated over them with my function, but the output looks something like this:
[{"email":"email#email.com", "Phone_no.":"+123456789", "more data":"more data", "even more data":"even more data"},
{"email":"email#email.com", "Phone_no.":"+123456789", "more data":"more data", "even more data":"even more data"},
{"email":"email#email.com", "Phone_no.":"+123456789", "more data":"more data", "even more data":"even more data"}]
The desired output is something like:
["email#email.com","email#email.com","email#email.com"],["+123456789","+123456789","+123456789"]
You should iterate over each dictionary in your list.
That would look something like this:
email = []
phone_no = []
for d in data:
    # .items() yields (key, value) pairs; iterating over d alone yields only keys
    for key, value in d.items():
        if key == 'email':
            email.append(value)
        elif key == 'Phone_no.':
            phone_no.append(value)
So you end up with the email list holding all email addresses and phone_no holding all phone numbers.
Assuming your data is in a variable called data:
emails = [d['email'] for d in data]
phone_numbers = [d['Phone_no.'] for d in data]
print(emails)
print(phone_numbers)
Output:
['email#email.com', 'email#email.com', 'email#email.com']
['+123456789', '+123456789', '+123456789']
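With around 2000 scraped records, some of them might lack one of the keys (an assumption; the sample above always has both), in which case d['email'] would raise a KeyError. A minimal sketch that skips records without the key, using made-up sample data and a hypothetical helper named collect:

```python
# Sample records are hypothetical; the second one has no phone number.
data = [
    {"email": "a#example.com", "Phone_no.": "+111"},
    {"email": "b#example.com"},
    {"email": "c#example.com", "Phone_no.": "+333"},
]

def collect(records, key):
    """Return the values stored under `key`, skipping records that lack it."""
    return [d[key] for d in records if key in d]

emails = collect(data, "email")
phone_numbers = collect(data, "Phone_no.")
print(emails)
print(phone_numbers)
```

Note that skipping records means the two lists can end up with different lengths, so they no longer line up index by index.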
I am getting the output pasted below.
[{'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/92E58C7C69D015DA528D8D7F22844BF49D702DFC'}, {'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/3086E306E7CB623F377B6F99261F82CC8BB57115'}, {'accel-world-infinite-burst-2016': 'https://yifysubtitles.org/movie-imdb/tt5923132'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/E92B664EE87663D7E5EC8E9FEED574C586A95A62'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/4F6F194996AC29924DB7596FB646C368C4E4224B'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/movies/anna-to-the-infinite-power-1983/request-subtitle'}, {'infinite-2021': 'https://yts.mx/torrent/download/304DB2FEC8901E996B066B74E5D5C010D2F818B4'}, {'infinite-2021': 'https://yts.mx/torrent/download/1320D6D3B332399B2F4865F36823731ABD1444C0'}, {'infinite-2021': 'https://yts.mx/torrent/download/45821E5B2E339382E7EAEFB2D89967BB2C9835F6'}, {'infinite-2021': 'https://yifysubtitles.org/movie-imdb/tt6654210'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/47EB04FBC7DC37358F86A5BFC115A0361F019B5B'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/88223BEAA09D0A3D8FB7EEA62BA9C5EB5FDE9282'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/movies/infinite-potential-the-life-ideas-of-david-bohm-2020/request-subtitle'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/0E2ACFF422AF4F62877F59EAE4EF93C0B3623828'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/52437F80F6BDB6FD326A179FC8A63003832F5896'}, {'the-infinite-man-2014': 'https://yifysubtitles.org/movie-imdb/tt2553424'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/DA101D139EE3668EEC9EC5B855B446A39C6C5681'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/8759CD554E8BB6CFFCFCE529230252AC3A22D4D4'}, {'nick-and-norahs-infinite-playlist-2008': 
'https://yifysubtitles.org/movie-imdb/tt0981227'}]
As you can see, each movie has multiple links, and the movie name is repeated for each link. I want all links related to the same movie to appear in the same object, e.g.
[{accel-world-infinite-burst-2016:{link1,link2,link3,link4},........]
for item in li:
    # print(item.partition("movies/")[2])
    movieName["Movies"].append(item.partition("movies/")[2])
    req = requests.get(item)
    s = soup(req.text, "html.parser")
    m = s.find_all("p", {"class": "hidden-xs hidden-sm"})
    # print(m[0])
    for a in m[0].find_all('a', href=True):
        # movieName['Movies'][item.partition("movies/")[2]]=(a['href'])
        downloadLinks.append({item.partition("movies/")[2]: a['href']})
You can try this:
# input = your list of dicts
otp_dict = {}
for l in input:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = [value]
        else:
            otp_dict[key].append(value)
print(otp_dict)
Output: {'accel-world-infinite-burst-2016': [link1, link2], ...}
The output is a dict containing a list of links per movie. If you want sets instead, as in your desired output, try this:
otp_dict = {}
for l in input:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = {value}
        else:
            otp_dict[key].add(value)
Output: {'accel-world-infinite-burst-2016': {link1, link2}, ...}
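The same grouping can also be sketched with collections.defaultdict, which removes the need for the "key not in dict" check. The download_links list here is a shortened, hypothetical stand-in for the list of single-key dicts from the question:

```python
from collections import defaultdict

# Shortened, hypothetical stand-in for the list of single-key dicts above.
download_links = [
    {"infinite-2021": "link1"},
    {"infinite-2021": "link2"},
    {"the-infinite-man-2014": "link3"},
]

# Missing keys start out as an empty list, so append always works.
grouped = defaultdict(list)
for entry in download_links:
    for movie, link in entry.items():
        grouped[movie].append(link)

grouped = dict(grouped)  # convert back to a plain dict for printing
print(grouped)
```

Using defaultdict(set) with .add(link) instead would give sets of links, at the cost of losing the original order.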
This is somewhat of a continuation of a previous post of mine, except now I have API data to work with. I am trying to get the keys Type and Email as columns in a data frame to come up with a final number. My code:
jsp_full = []
for p in payloads:
    payload = {"payload": {"segmentId": p}}
    r = requests.post(url, headers=header, json=payload)
    #print(r, r.reason)
    time.sleep(r.elapsed.total_seconds())
    json_data = r.json() if r and r.status_code == 200 else None
    json_keys = json_data['payload']['supporters']
    json_package = []
    jsp_full.append(json_package)
    for row in json_keys:
        SID = row['supporterId']
        Handle = row['contacts']
        a_key = 'value'
        list_values = [a_list[a_key] for a_list in Handle]
        string = str(list_values).split(",")
        data = {
            'SupporterID': SID,
            'Email': strip_characters(string[-1]),
            'Type': labels(p)
        }
        json_package.append(data)
    t2 = round(time.perf_counter(), 2)
    b_key = "Email"
    e = len([b_list[b_key] for b_list in json_package])
    t = str(labels(p))
    #print(json_package)
    print(f'There are {e} emails in the {t} segment')
    print(f'Finished in {t2 - t1} seconds')
    excel = pd.DataFrame(json_package)
    excel.to_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(t, str(today)), sheet_name=t)
This part works fine. Each payload in the API represents a different segment of people, so I split them out into different files. However, I am at a point where I need to combine all records into a single data frame, which is why I append to jsp_full. This is a list of lists of dictionaries.
Once I have that, I would run the rest of my code, which looks like this:
S= pd.DataFrame(jsp_full[0], index = {0})
Advocacy_Supporters = S.sort_values("Type").groupby("Type", as_index=False)["Email"].first()
print(Advocacy_Supporters['Email'].count())
print("The number of Unique Advocacy Supporters is :")
Advocacy_Supporters_Group = Advocacy_Supporters.groupby("Type")["Email"].nunique()
print(Advocacy_Supporters_Group)
Some sample data:
[{'SupporterID': '565f6a2f-c7fd-4f1b-bac2-e33976ef4306', 'Email': 'somebody#somewhere.edu', 'Type': 'd_Student Ambassadors'}, {'SupporterID': '7508dc12-7647-4e95-a8b8-bcb067861faf', 'Email': 'someoneelse#email.somewhere.edu', 'Type': 'd_Student Ambassadors'}, ...]
My desired output is a dataframe that looks like so:
SupporterID Email Type
565f6a2f-c7fd-4f1b-bac2-e33976ef4306 somebody#somewhere.edu d_Student Ambassadors
7508dc12-7647-4e95-a8b8-bcb067861faf someoneelse#email.somewhere.edu d_Student Ambassadors
Any help is greatly appreciated!!
Because this code creates an Excel file for each segment, all I did was read the Excel files back in with a for loop, like so:
filesnames = ['e_S Donors', 'b_Contributors', 'c_Activists', 'd_Student Ambassadors', 'a_Volunteers', 'f_Offline Action Takers']
S = pd.DataFrame()
for i in filesnames:
    data = pd.read_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(i, str(today)), sheet_name=i, engine='openpyxl')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    S = pd.concat([S, data])
This did the trick, since the data was already in the format I wanted.
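Since jsp_full is described as a list of lists of dictionaries, the round trip through Excel may not be needed. A minimal sketch, assuming every inner list holds row dictionaries with the same keys; the jsp_full contents below are made up for illustration:

```python
from itertools import chain

# Hypothetical stand-in for jsp_full: one inner list per segment.
jsp_full = [
    [{"SupporterID": "1", "Email": "a#x.edu", "Type": "d_Student Ambassadors"}],
    [{"SupporterID": "2", "Email": "b#x.edu", "Type": "a_Volunteers"},
     {"SupporterID": "3", "Email": "c#x.edu", "Type": "a_Volunteers"}],
]

# Flatten the list of lists into one list of row dictionaries.
records = list(chain.from_iterable(jsp_full))
print(len(records))
```

Passing records to pd.DataFrame(records) should then give one row per supporter with SupporterID, Email and Type columns.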
I have a CSV file:
shack_imei.csv:
shack, imei
F10, "5555"
code:
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
print shack
def get_imei_from_entered_shack(shack):
    for key, value in my_dict.iteritems():
        if key == shack:
            return value
list = str(get_imei_from_entered_shack(shack))
print list
which gives me "5555"
But I need this value in a list structure like this:
["5555"]
I've tried a lot of different methods, and they all end up with extra ' or " characters.
EDIT 1:
new simpler code:
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
imei = my_dict[shack]
print imei
"5555"
list(imei) gives me ['"5555"'], I need it to be ["5555"]
You can change your return statement:
shack = raw_input('Enter Shack:')
print shack
def get_imei_from_entered_shack(shack):
    for key, value in my_dict.iteritems():
        if key == shack:
            return [str(value)]
list = get_imei_from_entered_shack(shack)
print list
As far as I understand, you want to create a list containing the returned string, which you do with [ ]
list = [str(get_imei_from_entered_shack(shack))]
There are a few problems with this code, which are too long to tackle in comments.
my_dict
my_dict = dict(reader) only works well if the CSV is a collection of keys and values. If there are duplicate keys, later rows silently overwrite earlier ones.
get_imei_from_entered_shack
Why use this special function instead of just asking my_dict for the correct value? Even if you don't want it to throw an exception when you ask for a shack that doesn't exist, you can use the dict.get(<key>, <default>) method:
my_dict.get(shack, None)
does the same as your 4-line function.
list
Don't name variables the same as builtins.
list2
If you want a list, you can do [<value>]. Note that list(<value>) splits a string into its individual characters, and it only works at all if you have not replaced list with your own variable assignment.
reader = csv.reader(open("shack_imei.csv", "rb"))
my_dict = dict(reader)
shack = raw_input('Enter Shack:')
imei = my_dict[shack]
imei = imei.replace('"',"")
IMEI_LIST =[]
IMEI_LIST.append(imei)
print IMEI_LIST
['5555']
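As far as I can tell, the stray quotes come from the space after the comma in the file: the field then starts with a space rather than a quote, so the csv module keeps the quotes as literal text. A minimal Python 3 sketch that lets the reader strip them via skipinitialspace=True, using an in-memory copy of the file from the question instead of opening shack_imei.csv:

```python
import csv
import io

# In-memory stand-in for shack_imei.csv, same content as in the question.
csv_text = 'shack, imei\nF10, "5555"\n'

# skipinitialspace=True drops the space after each comma, so the quotes
# are then treated as CSV quoting and removed by the reader itself.
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
my_dict = dict(reader)

imei_list = [my_dict["F10"]]
print(imei_list)
```

This avoids the manual replace('"', "") step; the header row still ends up in my_dict as an ordinary entry, just as in the original code.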
After the end of my code, I have a dictionary like so:
{'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
What I want to do is to find each of the keys in a separate file, teams.txt, which is formatted like this:
1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'
Using the year, which is 1901, and the team, which is the key of each item in the dictionary, I want to create a new dictionary where the key is the third column in teams.txt if the year and team both match, and the value is the value of the team in the first dictionary.
I figured this would be easiest if I created a function to "lookup" the year and the team, and return "franch", and then apply that function to each key in the dictionary. This is what I have so far, but it gives me a KeyError
def franch(year, team_str):
    team_str = str(team_str)
    with open('teams.txt') as imp_file:
        teams = imp_file.readlines()
    for team in teams:
        (yearID, teamID, franchID) = team.split(',')
        yearID = int(yearID)
        if yearID == year:
            if teamID == team_str:
                break
    franchID = franchID[1:4]
    return franchID
And in the other function with the dictionary that I want to apply this function to:
franch_teams = {}
for team in teams:
    team = team.replace('"', "'")
    franch_teams[franch(year, team)] = teams[team]
The ideal output of what I am trying to accomplish would look like:
{'"MIN"': 1475.9778073075058, '"LAD"': 1554.1437268304624, '"CHW"': 1552.228925324831}
Thanks!
Does this code suit your needs?
I am doing an extra check for equality because different quote characters were used in different parts of your code.
def almost_equals(one, two):
    one = one.replace('"', '').replace("'", "")
    two = two.replace('"', '').replace("'", "")
    return one == two
def create_data(year, data, text_content):
    """ This function returns new dictionary. """
    content = [line.split(',') for line in text_content.split('\n')]
    res = {}
    for key in data.keys():
        for one_list in content:
            if year == one_list[0] and almost_equals(key, one_list[1]):
                res[one_list[2]] = data[key]
    return res
teams_txt = """1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'"""
year = '1901'
data = { '"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831 }
result = create_data(year, data, teams_txt)
And the output:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
Update:
To read from a text file, use this function:
def read_text_file(filename):
    with open(filename) as file_object:
        result = file_object.read()
    return result
teams_txt = read_text_file('teams.txt')
You may try something like:
#!/usr/bin/env python
def clean(_str):
    return _str.strip('"').strip("'")
first = {'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
clean_first = dict()
second = dict()
for k,v in first.items():
    clean_first[clean(k)] = v
with open("teams.txt", "r") as _file:
    lines = _file.readlines()
    for line in lines:
        _,old,new = line.split(",")
        second[new.strip()] = clean_first[clean(old)]
print second
Which gives the expected:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
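The loop above ignores the year column, which matters if teams.txt covers several seasons. A minimal Python 3 sketch that builds a (year, team) lookup first; to_franchises and ratings are hypothetical names, and the file content is an in-memory copy of the rows from the question:

```python
import csv
import io

# In-memory stand-in for teams.txt, same rows as in the question.
teams_txt = "1901,'BRO','LAD'\n1901,'CHA','CHW'\n1901,'WS1','MIN'\n"
ratings = {'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624,
           '"CHA"': 1552.228925324831}

def clean(text):
    """Strip surrounding whitespace and either kind of quote."""
    return text.strip().strip("'\"")

def to_franchises(ratings, teams_file, year):
    """Map each team's rating to its franchise ID for the given year."""
    lookup = {}
    for year_id, team_id, franch_id in csv.reader(teams_file):
        lookup[(int(year_id), clean(team_id))] = clean(franch_id)
    # Raises KeyError if a team has no row for that year.
    return {lookup[(year, clean(team))]: value for team, value in ratings.items()}

result = to_franchises(ratings, io.StringIO(teams_txt), 1901)
print(result)
```

The resulting keys are unquoted (MIN, LAD, CHW); if the quoted form from the question is required, the quotes would need to be added back when building the result.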
The following list comprehension, users = [item for item in out.split() if domain in item and userl in item], should, as it suggests, only add items to users if they meet the domain and userl criteria. However, I'm getting empty results in sorted_list. Can anyone suggest why?
domain = 'domainanme'
user_list = ['test1', 'test2', 'test3']
new_list = []
for userl in user_list:
    try:
        out = subprocess.check_output(["tasklist", "/V", "/FO", "List", "/FI", "USERNAME eq {0}\\{1}".format(domain, userl)], stderr=subprocess.STDOUT)
        users = [item for item in out.split() if domain in item and userl in item]
        sorted_list = set(users)
        print sorted_list
        if sorted_list != None: # this was an attempt to remove the EMPTY items
            for name in sorted_list:
                print name
                new_list.append(name)
        else:
            pass
    except subprocess.CalledProcessError:
        pass  # tasklist exited with an error; skip this user
Output:
set([])
set([])
set([])
This is what the output looks like:
The domain name in the output is uppercased; make sure you take that into account. Normalize the case for both to ensure a case-insensitive match:
users = [item for item in out.split() if domain.upper() in item.upper() and userl in item]
I'd parse that output a little more intelligently as the above can easily lead to false-positives (a process name that has both the domain and username in it, even as overlapping text, would match too).
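One way to parse the output more strictly is to look only at the user-name field of each process entry. A minimal sketch, assuming the LIST format puts the account on a line labelled "User Name:" (an assumption about tasklist's output, as is the sample text); users_in_output is a hypothetical helper:

```python
def users_in_output(output, domain, user):
    """Return the distinct 'DOMAIN\\user' values found on 'User Name:' lines."""
    wanted = "{0}\\{1}".format(domain, user).lower()
    found = set()
    for line in output.splitlines():
        # Split only at the first colon so values containing colons stay intact.
        label, sep, value = line.partition(":")
        if sep and label.strip().lower() == "user name":
            if value.strip().lower() == wanted:
                found.add(value.strip())
    return found

# Hypothetical excerpt of `tasklist /V /FO LIST` output.
sample = (
    "Image Name:   notepad.exe\n"
    "User Name:    DOMAINNAME\\test1\n"
    "Window Title: domainname test1 notes\n"
)
print(users_in_output(sample, "domainname", "test1"))
```

Because only the labelled field is compared, a window title or image name that happens to contain both strings no longer counts as a match.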