Making a Python executable for a specific task - python

I'm a neuroscientist and thus not very skilled in Python, but I have managed to write code that uses API access to download certain neurons from a specific website (neuromorpho.org). I want this to be publicly available so that other people who are not that familiar with Python can easily get what they need (I'm planning on posting it to GitHub and making other similar tools).
So I wanted to create an executable file in which people can select what they want and get a .csv file with neurons at the end. This works perfectly from inside the Jupyter Notebook. However, when I use Auto Py to EXE to create an executable, it doesn't work. The build runs for a long time and creates thousands of files (more than 1 GB of data), and when you launch the executable nothing happens.
I presume it has something to do with the ipywidgets that I have used to create selections for the initial query.
Here is the first part of the code, where I try to query the neurons based on the widget selection:
widg1 = widget.Dropdown(options=['abdominal ganglion', 'accessory lobe', 'accessory olfactory bulb', 'adult subesophageal zone', 'amygdala',
'antenna', 'antennal lobe', 'anterior olfactory nucleus', 'basal forbrain', 'basal ganglia',
'brainstem', 'Central complex', 'Central nervous system', 'cerebellum', 'cerebral ganglion',
'Cochlea', 'corpus callosum', 'cortex', 'electrosensory lobe', 'endocrine system', 'enthorinal cortex',
'eye circuit', 'forebrain', 'fornix', 'ganglion', 'hippocampus', 'hypothalamus', 'lateral complex',
'lateral horn', 'lateral line organ', 'left', 'Left Adult Central Complex', 'Left Mushroom Body', 'main olfactory bulb',
'meninges', 'mesencephalon', 'myelencephalon', 'neocortex', 'nuchal organs', 'olfactory cortex', 'olfactory pit', 'optic lobe',
'pallium', 'parasubiculum', ' peptidergic circuit', 'peripheral nervous system', 'pharyngeal nervous system', 'pons', 'Pro-subiculum',
'protocerebrum', 'retina', 'retinorecipient mesencephalon and diencephalon', 'Right Adult Central Complex',
'Right Mushroom Body', 'somatic nervous system', 'spinal cord', 'stomatogastric ganglion', 'subesophageal ganglion',
'subesophageal zone-(SEZ)', 'subiculum', 'subpallium', 'Subventricular zone', 'thalamus', 'ventral nerve cord',
'ventral striatum', 'ventral thalamus', 'ventrolateral neuropils', 'Not reported'],
value= 'cerebellum', description='Brain Region:')
display(widg1)
widg2 = widget.Dropdown(options=['African wild dog', 'agouti', 'Apis mellifera', 'Aplysia', 'Axolotl', 'Baboon',
'Blind mole-rat', 'blowfly', 'Blue wildebeest', 'Bonobo', 'bottlenose dolphin', 'C. elegans',
'Calango lizard', 'capuchin monkey', 'Caracal', 'cat', 'cheetah', 'chicken', 'chimpanzee', 'Clam worm', 'clouded leopard', 'Crab', 'cricket',
'Crisia eburnea', 'Domestic dog', 'domestic pig', 'dragonfly', 'drosophila melanogaster', 'drosophila sechellia',
'elephant', 'ferret', 'giraffe', 'goldfish', 'grasshopper', 'Greater kudu', 'guinea pig', 'Hamster', 'human', 'humpback whale',
'Lemur', 'leopard', 'Lion', 'locust', 'manatee', 'minke whale', 'Mongoose', 'monkey', 'Mormyrid fish', 'moth',
'mouse', 'pouched lamprey', 'Praying mantis (Hierodula membranacea)', 'Praying mantis (Hierodula membranacea)',
'proechimys', 'rabbit', 'Rana esculenta', 'Ranitomeya imitator', 'rat', 'Rhinella arenarum', 'Ruddy turnstone', 'salamander',
'Scinax granulatus', 'Sea lamprey', 'Semipalmated plover', 'Semipalmated sandpiper', 'sheep', 'Silkmoth', 'spiny lobster', 'Stellers Sculpin',
'Tiger', 'Toadfish', 'Treeshrew', 'turtle', 'Wallaby', 'Xenopus laevis', 'Xenopus tropicalis', 'Zebra', 'zebra finch', 'zebrafish', 'Not reported'],
value= 'mouse', description='Animal:')
display(widg2)
widg3 = widget.Dropdown(options=['Glia', 'interneuron', 'principal cell', 'sensory', 'Not reported'],
value= 'principal cell', description='Cell Type:')
display(widg3)
str1 = widg1.value
str2 = widg2.value
str3 = widg3.value
query = (
"http://neuromorpho.org/api/neuron/select?q=brain_region:%s&fq=species:%s&fq=cell_type:%s" % (str1, str2, str3))
print(query)
response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data
url = 'http://neuromorpho.org/api/neuron/select'
params = {
'page' : 0,
'q' : 'brain_region:' + widg1.value,
'fq' : [
'cell_type:' + widg3.value,
'species:' + widg2.value,
]
}
first_page_response = requests.get(url, params)
if first_page_response.status_code == 404 or first_page_response.status_code == 500:
    exit(1)
print(first_page_response.json())
totalPages = first_page_response.json()['page']['totalPages']
df_dict = {
'NeuronID' : list(),
'Neuron Name' : list(),
'Archive' : list(),
'Note' : list(),
'Age Scale' : list(),
'Gender' : list(),
'Age Classification' : list(),
'Brain Region' : list(),
'Cell Type' : list(),
'Species' : list(),
'Strain' : list(),
'Scientific Name' : list(),
'Stain' : list(),
'Experiment Condition' : list(),
'Protocol' : list(),
'Slicing Direction' : list(),
'Reconstruction Software' : list(),
'Objective Type' : list(),
'Original Format' : list(),
'Domain' : list(),
'Attributes' : list(),
'Magnification' : list(),
'Upload Date' : list(),
'Deposition Date' : list(),
'Shrinkage Reported' : list(),
'Shrinkage Corrected' : list(),
'Reported Value' : list(),
'Reported XY' : list(),
'Reported Z' : list(),
'Corrected Value' : list(),
'Corrected XY' : list(),
'Corrected Z' : list(),
'Slicing Thickness' : list(),
'Min Age' : list(),
'Max Age' : list(),
'Min Weight' : list(),
'Max Weight' : list(),
'Png URL' : list(),
'Reference PMID' : list(),
'Reference DOI' : list(),
'Physical Integrity' : list() }
for pageNum in range(totalPages):
    params['page'] = pageNum
    response = requests.get(url, params)
    print('Querying page {} -> status code: {}'.format(
        pageNum, response.status_code))
    if (response.status_code == 200): #only parse successful requests
        data = response.json()
        for row in data['_embedded']['neuronResources']:
            df_dict['NeuronID'].append(str(row['neuron_id']))
            df_dict['Neuron Name'].append(str(row['neuron_name']))
            df_dict['Archive'].append(str(row['archive']))
            df_dict['Note'].append(str(row['note']))
            df_dict['Age Scale'].append(str(row['age_scale']))
            df_dict['Gender'].append(str(row['gender']))
            df_dict['Age Classification'].append(str(row['age_classification']))
            df_dict['Brain Region'].append(str(row['brain_region']))
            df_dict['Cell Type'].append(str(row['cell_type']))
            df_dict['Species'].append(str(row['species']))
            df_dict['Strain'].append(str(row['strain']))
            df_dict['Scientific Name'].append(str(row['scientific_name']))
            df_dict['Stain'].append(str(row['stain']))
            df_dict['Experiment Condition'].append(str(row['experiment_condition']))
            df_dict['Protocol'].append(str(row['protocol']))
            df_dict['Slicing Direction'].append(str(row['slicing_direction']))
            df_dict['Reconstruction Software'].append(str(row['reconstruction_software']))
            df_dict['Objective Type'].append(str(row['objective_type']))
            df_dict['Original Format'].append(str(row['original_format']))
            df_dict['Domain'].append(str(row['domain']))
            df_dict['Attributes'].append(str(row['attributes']))
            df_dict['Magnification'].append(str(row['magnification']))
            df_dict['Upload Date'].append(str(row['upload_date']))
            df_dict['Deposition Date'].append(str(row['deposition_date']))
            df_dict['Shrinkage Reported'].append(str(row['shrinkage_reported']))
            df_dict['Shrinkage Corrected'].append(str(row['shrinkage_corrected']))
            df_dict['Reported Value'].append(str(row['reported_value']))
            df_dict['Reported XY'].append(str(row['reported_xy']))
            df_dict['Reported Z'].append(str(row['reported_z']))
            df_dict['Corrected Value'].append(str(row['corrected_value']))
            df_dict['Corrected XY'].append(str(row['corrected_xy']))
            df_dict['Corrected Z'].append(str(row['corrected_z']))
            df_dict['Slicing Thickness'].append(str(row['slicing_thickness']))
            df_dict['Min Age'].append(str(row['min_age']))
            df_dict['Max Age'].append(str(row['max_age']))
            df_dict['Min Weight'].append(str(row['min_weight']))
            df_dict['Max Weight'].append(str(row['max_weight']))
            df_dict['Png URL'].append(str(row['png_url']))
            df_dict['Reference PMID'].append(str(row['reference_pmid']))
            df_dict['Reference DOI'].append(str(row['reference_doi']))
            df_dict['Physical Integrity'].append(str(row['physical_Integrity']))
neurons_df = pd.DataFrame(df_dict)
I know that this might be confusing to somebody not familiar with this, but I have placed some markdown cells inside the notebook to explain the problem in detail.

I recommend taking a look at PyInstaller and Nuitka. Both can produce standalone executables.
Example with Nuitka:
(linux) python -m nuitka --onefile --output-dir=./nuitka --standalone --follow-imports --plugin-enable=qt-plugins ./../updater.py
(windows) python -m nuitka --onefile --windows-uac-admin --windows-disable-console --windows-icon-from-ico=.\updater\resources\ico\au.ico --output-dir=.\nuitka --standalone --follow-imports --plugin-enable=qt-plugins --windows-company-name=Name --windows-product-name=Name --windows-product-version=1.0.0 --windows-file-description=Name .\..\updater.py
Example with pyinstaller:
pyinstaller --onefile --windowed --icon=./updater/resources/ico/au.ico updater.py
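If the ipywidgets dropdowns are indeed the obstacle (they need a notebook front end to render, so a frozen executable has nothing to display them in), one option is to read the three selections from the command line instead. The following is only a minimal sketch under that assumption: `build_params`, `parse_args`, and the flag names `--brain-region`, `--species`, and `--cell-type` are hypothetical names chosen for illustration, and the defaults mirror the dropdown defaults in the question.

```python
import argparse

# Same endpoint as in the question.
API_URL = "http://neuromorpho.org/api/neuron/select"


def build_params(brain_region, species, cell_type, page=0):
    """Build the same query parameters the notebook passes to requests.get."""
    return {
        "page": page,
        "q": "brain_region:" + brain_region,
        "fq": ["cell_type:" + cell_type, "species:" + species],
    }


def parse_args(argv=None):
    """Read the selections from the command line (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Query NeuroMorpho.Org neurons")
    parser.add_argument("--brain-region", default="cerebellum")
    parser.add_argument("--species", default="mouse")
    parser.add_argument("--cell-type", default="principal cell")
    return parser.parse_args(argv)


# Explicit argument list for demonstration; pass None to read sys.argv instead.
args = parse_args(["--species", "rat"])
params = build_params(args.brain_region, args.species, args.cell_type)
print(API_URL, params)
```

The resulting `params` dict can be handed to `requests.get(url, params)` exactly as in the question, so the paging loop would not need to change.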

Related

My search input function works, but it only prints the last person's information in the list of dicts

I'm kind of new to python, and I need some help. I'm making an employee list menu. My list of dictionaries is:
person_infos = [ {'name': 'John Doe', 'age': '46', 'job position': 'Chair Builder', 'pay per hour': '14.96','date hired': '2/26/19'},
{'name': 'Phillip Waltertower', 'age': '19', 'job position': 'Sign Holder', 'pay per hour': '10','date hired': '5/9/19'},
{'name': 'Karen Johnson', 'age': '40', 'job position': 'Manager', 'pay per hour': '100','date hired': '9/10/01'},
{'name': 'Linda Bledsoe', 'age': '60', 'job position': 'CEO', 'pay per hour': '700', 'date hired': '8/24/99'},
{'name': 'Beto Aretz', 'age': '22', 'job position': 'Social Media Manager', 'pay per hour': '49','date hired': '2/18/12'}]
and my "search the list of dicts input function" is how the program is supposed to print the correct dictionary based on the name the user inputs:
def search_query(person_infos):
    if answer == '3':
        search_query = input('Who would you like to find: ')
        they_are_found = False
        location = None
        for i, each_employee in enumerate(person_infos):
            if each_employee['name'] == search_query:
                they_are_found = True
            location = i
        if they_are_found:
            print('Found: ', person_infos[location]['name'], person_infos[location]['job position'], person_infos[location]['date hired'], person_infos[location]['pay per hour'])
        else:
            print('Sorry, your search query is non-existent.')
and I also have this-
elif answer =='3':
    person_infos = search_query(person_infos)
This seems like a step in the right direction, but for
search_query = input('Who would you like to find: ')
if I input one of the names in person_infos, like "John Doe", it just prints the last dictionary's information (whichever dictionary is last in the list is always the one printed) instead of John Doe's. In this case, it only prints Beto Aretz's.
Can someone please help? It's something I've been struggling with for a while, and it would be awesome.
I've researched a lot and could not find anything that either used things I knew how to do or dealt with searching by input.
Thanks,
LR
At first glance, it looks like your location = i is not indented inside your if statement, so it gets set to the latest i on every iteration of the for loop and ends up pointing at the last entry. Let me know if this helps.
def search_query(person_infos):
    if answer == '3':
        search_query = input('Who would you like to find: ')
        they_are_found = False
        location = None
        for i, each_employee in enumerate(person_infos):
            if each_employee['name'] == search_query:
                they_are_found = True
                location = i
        if they_are_found:
            print('Found: ', person_infos[location]['name'], person_infos[location]['job position'], person_infos[location]['date hired'], person_infos[location]['pay per hour'])
        else:
            print('Sorry, your search query is non-existent.')
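A related point: the function in the question returns nothing, so `person_infos = search_query(person_infos)` would replace the list with `None` after the first search. One way to avoid both problems is to separate the lookup from the printing and return the matching dictionary. This is only a sketch; `find_employee` is a hypothetical helper name, and the shortened `person_infos` list below is sample data based on the question.

```python
def find_employee(person_infos, name):
    """Return the first dict whose 'name' equals name, or None if absent."""
    for employee in person_infos:
        if employee['name'] == name:
            return employee  # stop at the first match
    return None


# Shortened sample data based on the list in the question.
person_infos = [
    {'name': 'John Doe', 'job position': 'Chair Builder'},
    {'name': 'Beto Aretz', 'job position': 'Social Media Manager'},
]

match = find_employee(person_infos, 'John Doe')
if match:
    print('Found:', match['name'], match['job position'])
else:
    print('Sorry, your search query is non-existent.')
```

Because the helper returns the match instead of reassigning anything, the caller keeps its original `person_infos` list.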

For loop on a dictionary giving out of range error

I'm having trouble understanding dictionaries and for loops.
I have this example that takes a nested dictionary representing a playlist of songs. The first version of the code runs just fine, but when I try to clean it up by moving it into a function, it keeps saying index out of range. Can anybody give their two cents?
Example playlist from a JSON file:
playlist = {
'title': 'faves',
' author': 'Me',
'songs': [
{
'title': 'song1',
'artist': ['john', 'smith'],
'genre': 'Pop',
'duration' : 3.23
},
{
'title': 'song2',
'artist': ['john2'],
'genre': 'Rock',
'duration' : 3.45
},
{
'title': 'song3',
'artist': ['john3', 'smith3'],
'genre': 'Jazz',
'duration' : 2.45
}
]
}
This first code snippet works well and prints the right strings.
sa = f" and {song['artist'][1]}"
for song in playlist['songs']:
    print(f"{song['title']} by {song['artist'][0]}{sa if len(song['artist']) >= 2 else ''}, runtime: {song['duration']}, genre: {song['genre']}")
song1 by john and smith3, runtime: 3.23, genre: Pop
song2 by john2, runtime: 3.45, genre: Rock
song3 by john3 and smith3, runtime: 2.45, genre: Jazz
But when I try to run this, it says index out of range. It's evaluating artist_two, but it's not supposed to do that unless there is more than one artist for a song.
def print_playlist(songs):
    print(songs)
    for song in songs:
        title = song['title']
        duration = song['duration']
        genre = song['genre']
        artists = song['artist']
        artist_one = song['artist'][0]
        artist_two = song['artist'][1]
        sa = f" and {artist_two}"
        print(f"{title} by {artist_one}{sa if len(artists) >=2 else ''}, runtime: {duration}, genre: {genre}")
print_playlist(playlist['songs'])
You can use this method to make a string of the names with " and " in between them.
artist_list=["John","Smith"]
y=" and ".join(str(x) for x in artist_list)
print(y)
This gives the output John and Smith
And if you make the artist list ["John","Smith","Dave"],
your output will look like John and Smith and Dave
As mentioned in the comment above, you are assuming there are always at least 2 elements in the artist list, so song['artist'][1] fails for a song with a single artist. You should rather use an approach like mine, which I found in "Concatenate item in list to strings".
Thank you, Zack Tarr.
The final code looks like this:
def print_playlist(songs):
    for song in songs:
        title = song['title']
        duration = song['duration']
        genre = song['genre']
        artists = song['artist']
        artist_plus = " and ".join( artist for artist in artists)
        print(f" {title} by {artist_plus if len(artists) >= 2 else artists[0]}, runtime: {duration}, genre: {genre}")
print_playlist(playlist['songs'])
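As a side note, `str.join` already returns the single element unchanged when the list has only one item, so the length check may not be needed at all. A small sketch (the helper name `format_artists` is made up for illustration):

```python
def format_artists(artists):
    """Join any number of artist names with ' and '."""
    return " and ".join(artists)


print(format_artists(['john', 'smith']))  # john and smith
print(format_artists(['john2']))          # john2
```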

Python How to split text with no fixed blankspace

I have the text below (a response from a Zebra printer):
30.0 DARKNESS
4 IPS PRINT SPEED
+000 TEAR OFF
TEAR OFF PRINT MODE
GAP/NOTCH MEDIA TYPE
WEB SENSOR TYPE
MANUAL SENSOR SELECT
THERMAL-TRANS. PRINT METHOD
480 PRINT WIDTH
0387 LABEL LENGTH
39.0IN 975MM MAXIMUM LENGTH
CONNECTED USB COMM.
BIDIRECTIONAL PARALLEL COMM.
9600 BAUD
8 BITS DATA BITS
NONE PARITY
DTR & XON/XOFF HOST HANDSHAKE
NONE PROTOCOL
AUTO SER COMM. MODE
<~> 7EH CONTROL CHAR
<^> 5EH COMMAND CHAR
<,> 2CH DELIM. CHAR
ZPL II ZPL MODE
NO MOTION MEDIA POWER UP
FEED
I want to get the value for each setting via Python.
I expect to get something like a dict {'DARKNESS':30,'PRINT SPEED':'4 IPS' ....}
Normally, the code I would expect to use is
for line in lines:
    x=line.split(' ')
    the_value=x[0]
    the_setting=x[1]
but the whitespace between the columns is not a fixed width.
I don't have a good idea for how to split it.
Using the split() function isn't a good choice here, because the values also contain spaces.
I'm stuck here.
Any ideas?
Using my suggestion in combination with yours, I got this to work (I made a txt file with your examples in it):
import re
file = open('untitled.txt','r')
my_dict = {}
for line in file:
    x,y = re.split(r'\s{4,}',line.strip())
    my_dict[y] = x
This is the output of the dictionary I made with this code:
{'DARKNESS': '30.0', 'PRINT SPEED': '4 IPS', 'TEAR OFF': '+000', 'PRINT MODE': 'TEAR OFF', 'MEDIA TYPE': 'GAP/NOTCH', 'SENSOR TYPE': 'WEB', 'SENSOR SELECT': 'MANUAL', 'PRINT METHOD': 'THERMAL-TRANS.', 'PRINT WIDTH': '480', 'LABEL LENGTH': '0387', 'MAXIMUM LENGTH': '39.0IN 975MM', 'USB COMM.': 'CONNECTED', 'PARALLEL COMM.': 'BIDIRECTIONAL', 'BAUD': '9600', 'DATA BITS': '8 BITS', 'PARITY': 'NONE', 'HOST HANDSHAKE': 'DTR & XON/XOFF', 'PROTOCOL': 'NONE', 'SER COMM. MODE': 'AUTO', 'CONTROL CHAR': '<~> 7EH', 'COMMAND CHAR': '<^> 5EH', 'DELIM. CHAR': '<,> 2CH', 'ZPL MODE': 'ZPL II', 'MEDIA POWER UP': 'NO MOTION'}
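A line with only one column, such as the trailing FEED, would make the two-name unpacking fail, so it can help to check the number of parts first. The sketch below works on an in-memory string rather than a file; the column widths in `sample` are an assumption (the spacing of the real printer output is not preserved in the question), and `parse_settings` is a hypothetical helper name.

```python
import re

# Hypothetical sample: the value column is padded with several spaces.
# The exact widths are an assumption for illustration.
sample = """\
30.0                DARKNESS
4 IPS               PRINT SPEED
39.0IN 975MM        MAXIMUM LENGTH
FEED
"""


def parse_settings(text):
    """Map each setting name to its value, splitting on runs of 4+ whitespace."""
    settings = {}
    for line in text.splitlines():
        parts = re.split(r'\s{4,}', line.strip())
        if len(parts) == 2:  # skip lines without both a value and a name
            value, name = parts
            settings[name] = value
    return settings


print(parse_settings(sample))
```

Single spaces inside a value (for example `4 IPS`) are left alone because the pattern only matches four or more consecutive whitespace characters.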
Well, you can do the following:
file=open('yourfile','r').read().split('\n')
lines=[line.split(' ') for line in file]
items=[[i.replace(' ','') for i in item if i!=''] for item in lines]
output_dict={i[0]:i[1] for i in items if i}
I used three main features of Python here: the one-line loop (a list comprehension),
loop=[dosomething(item) for item in array if item=='somevalue'] #if statement is not necessary
the replace() function,
print 'Hello You'.replace('You','world') # outputs Hello world
and the split() function:
print 'hello,world'.split(',') # outputs ['hello', 'world']
You can find more documentation here: python string methods
Thanks @TheDetective, your answer is useful.
It works better now.
(Comments have a length limit, so I have to post this as an answer.)
>>> for line in lines:
...     re.split(r'\s{4,}',line.rstrip().lstrip())
...
['\x02 30.0', 'DARKNESS']
['4 IPS', 'PRINT SPEED']
['+000', 'TEAR OFF']
['TEAR OFF', 'PRINT MODE']
['GAP/NOTCH', 'MEDIA TYPE']
['WEB', 'SENSOR TYPE']
['MANUAL', 'SENSOR SELECT']
['THERMAL-TRANS.', 'PRINT METHOD']
['480', 'PRINT WIDTH']
['0387', 'LABEL LENGTH']
['39.0IN 975MM', 'MAXIMUM LENGTH']
['CONNECTED', 'USB COMM.']
['BIDIRECTIONAL', 'PARALLEL COMM.']
['9600', 'BAUD']
['8 BITS', 'DATA BITS']
['NONE', 'PARITY']
['DTR & XON/XOFF', 'HOST HANDSHAKE']
['NONE', 'PROTOCOL']
['AUTO', 'SER COMM. MODE']
['<~> 7EH', 'CONTROL CHAR']
['<^> 5EH', 'COMMAND CHAR']
['<,> 2CH', 'DELIM. CHAR']
['ZPL II', 'ZPL MODE']
['NO MOTION', 'MEDIA POWER UP']
['FEED']
>>>
Since I've already done it, you might as well have my answer too.
The two items of information occupy fixed places on each line. Therefore, string slicing can be used to pick them from lines. I omit the last line because there is no information about its field name.
>>> result = {}
>>> with open('temp.txt') as temp:
...     for line in temp.readlines():
...         if line.startswith('FEED'):
...             break
...         result[line[20:].strip()] = line[:20].strip()
...
>>> result
{'DARKNESS': '30.0', 'PARITY': 'NONE', 'PRINT WIDTH': '480', 'DATA BITS': '8 BITS', 'PROTOCOL': 'NONE', 'COMMAND CHAR': '<^> 5EH', 'USB COMM.': 'CONNECTED', 'BAUD': '9600', 'PRINT MODE': 'TEAR OFF', 'MEDIA POWER UP': 'NO MOTION', 'DELIM. CHAR': '<,> 2CH', 'MAXIMUM LENGTH': '39.0IN 975MM', 'SENSOR SELECT': 'MANUAL', 'SENSOR TYPE': 'WEB', 'LABEL LENGTH': '0387', 'PARALLEL COMM.': 'BIDIRECTIONAL', 'CONTROL CHAR': '<~> 7EH', 'TEAR OFF': '+000', 'PRINT SPEED': '4 IPS', 'PRINT METHOD': 'THERMAL-TRANS.', 'HOST HANDSHAKE': 'DTR & XON/XOFF', 'ZPL MODE': 'ZPL II', 'MEDIA TYPE': 'GAP/NOTCH', 'SER COMM. MODE': 'AUTO'}
Use the Python split function:
https://www.tutorialspoint.com/python/string_split.htm
You can iterate over the lines using split('\n')
and then use a regex to split the rest.
The accepted answer only splits when the whitespace between the value and the setting name is four characters or wider, which can cause bugs when it is narrower. My solution should avoid this by taking the first 20 characters of each line as the value.
import re
settings = {}
# text holds the full printer response as a single string
for line in text.split('\n'):
    # Split the line in the correct parts
    myArray = re.findall('(^.{20})(.*)', line.lstrip().rstrip())
    # Check that you have found both key and value
    if len(myArray) > 0:
        myTupple = myArray[0]
        settings[myTupple[1].rstrip()] = myTupple[0].rstrip()

For loop outputting duplicates

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
comments_table = []
What I am trying to achieve with this replacer function is to replace people's names in the strings found in com (a dict) with a code unique to them, which is found in a (a dict), via regex. Replacing the name with the code works, but adding the new string, with the code instead of the name, to the list is where I am going wrong.
def replace_first_name():
    for k,v in a.items():
        for z, y in com.items():
            for item in y:
                firstname = a[k][0]
                lastname = a[k][1]
                full_name = firstname + ' ' + lastname
                if firstname in item:
                    if full_name in item:
                        t = re.compile(re.escape(full_name), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print ('1')
                        comments_table.append({
                            'post_id': z, 'comment': comment
                        })
                        continue
                    else:
                        t = re.compile(re.escape(firstname), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print ('2')
                        comments_table.append({
                            'post_id':z, 'comment':comment
                        })
                else:
                    print ('3')
                    if fuzz.ratio(item,item) > 90:
                        comments_table.append({
                            'post_id': z, 'comment': item
                        })
                    else:
                        pass
The problem is with the output as seen below:
[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]
I don't want comments that already have their name replaced with the number to make their way into the final list. Therefore, I want my expected output to look like this:
[{'comment': '1330, getting no points', 'post_id': '6'},{'comment': '0001,played in this game? Didn\'t notice him', 'post_id': '6'}, {'comment':'Love this shot', 'post_id':'6'}]
I have looked into using an iterator by making y an iter_list, but I didn't get anywhere. Any help would be appreciated. Thanks!
Not sure why you are doing the regexp replace since you are checking if the first name/full name is present with in. Also not sure what the fuzz.ratio(item, item) thing in case 3 is supposed to do, but here's how you can do the simple/naive replacement:
#!/usr/bin/python
import re
def replace_names(authors, com):
    res = []
    for post_id, comments in com.items():
        for comment in comments:
            for author_id, author in authors.items():
                first_name, last_name = author[0], author[1]
                full_name = first_name + ' ' + last_name
                if full_name in comment:
                    comment = comment.replace(full_name, author_id)
                    break
                elif first_name in comment:
                    comment = comment.replace(first_name, author_id)
                    break
            res.append({'post_id': post_id, 'comment': comment})
    return res
a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
for comment in replace_names(a, com):
    print comment
Which produces this output:
{'comment': '1330, getting no points', 'post_id': '6'}
{'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}
{'comment': 'Love this shot!', 'post_id': '6'}
It's a bit tricky to understand what your intention is with the original code, but (one of) the reason(s) you are getting duplicates is that you are processing authors in the outer loop, which means you process each comment once for each author. By swapping the loops, you ensure that each comment is processed only once.
You may also have intended to have a break where you have the continue, but I'm not entirely sure I understand how your original code is supposed to work.
The use of global variables is also a bit confusing.

Python search loop slow

I am running a search on a list of ads (adscrape). Each ad is a dict within adscrape (e.g. ad below). It searches through a list of IDs (database_ids) which could be between 200,000 - 1,000,000 items long. I want to find any ads in adscrape that don't have an ID already in database_ids.
My current code is below. It takes a loooong time, and multiple seconds for each ad to scan through database_ids. Is there a more efficient/faster way of running this (finding which items in a big list, are in another big list)?
database_ids = ['id1','id2','id3'...]
ad = {'body': u'\xa0SUV', 'loc': u'SA', 'last scan': '06/02/16', 'eng': u'\xa06cyl 2.7L ', 'make': u'Hyundai', 'year': u'2006', 'id': u'OAG-AD-12371713', 'first scan': '06/02/16', 'odo': u'168911', 'active': 'Y', 'adtype': u'Dealer: Used Car', 'model': u'Tucson Auto 4x4 ', 'trans': u'\xa0Automatic', 'price': u'9990'}
for ad in adscrape:
    ad['last scan'] = date
    ad['active'] = 'Y'
    adscrape_ids.append(ad['id'])
    if ad['id'] not in database_ids:
        ad['first scan'] = date
        print 'new ad:',ad
        newads.append(ad)
You can use a list comprehension for this, as in the code below. Use the existing database_ids list and adscrape list as given above.
Code:
new_adds_ids = [ad for ad in adscrape if ad['id'] not in database_ids]
You can build ids_map as a dict and check whether an id is in the list by looking up its key in ids_map, as in the snippet below:
database_ids = ['id1','id2','id3']
ad = {'id': u'OAG-AD-12371713', 'body': u'\xa0SUV', 'loc': u'SA', 'last scan': '06/02/16', 'eng': u'\xa06cyl 2.7L ', 'make': u'Hyundai', 'year': u'2006', 'first scan': '06/02/16', 'odo': u'168911', 'active': 'Y', 'adtype': u'Dealer: Used Car', 'model': u'Tucson Auto 4x4 ', 'trans': u'\xa0Automatic', 'price': u'9990'}
#build ids map
ids_map = dict((k, v) for v, k in enumerate(database_ids))
for ad in adscrape:
    # some logic before checking whether id in database_ids
    try:
        ids_map[ad['id']]
    except KeyError:
        pass
    else:
        #error not thrown perform logic for existed ids
        print 'id %s in list' % ad['id']
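The same hashing idea can be expressed with a set, which is usually the simplest fix here: `x not in some_list` scans the whole list on every check, while membership in a set is a constant-time lookup on average. A minimal sketch, with `find_new_ads` as a hypothetical helper name and tiny sample data standing in for the real lists:

```python
def find_new_ads(adscrape, database_ids):
    """Return the ads whose 'id' is not already in database_ids."""
    known_ids = set(database_ids)  # built once, before the loop
    return [ad for ad in adscrape if ad['id'] not in known_ids]


# Tiny sample data standing in for the real lists.
database_ids = ['id1', 'id2', 'id3']
adscrape = [{'id': 'id2'}, {'id': 'OAG-AD-12371713'}]

print(find_new_ads(adscrape, database_ids))  # [{'id': 'OAG-AD-12371713'}]
```

The set is built once from `database_ids`, so its cost is paid a single time rather than once per ad.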
