Outputting Bubble sorting - python

I have a file called countries.txt that lists all the countries by name, area (in km²) and population (e.g. ["Afghanistan", 647500.0, 25500100]).
def readCountries(filename):
    result = []
    with open(filename) as lines:
        for line in lines:
            result.append(line.strip('\n').split(',\t'))
    for sublist in result:
        sublist[1] = float(sublist[1])  # area
        sublist[2] = int(sublist[2])    # population
    return result
I am trying to sort the list with a bubble sort according to the area of each country:
>>> c = countryByArea(7)
>>> c
["India", 3287590.0, 1239240000]
Given the parameter n, it should return the country with the nth largest area.
I have this, but I'm not sure how to output the information:
def countryByArea(area):
    myList = readCountries('countries.txt')
    for i in range(0, len(myList)):
        for j in range(0, len(myList) - 1):
            # compare by area (index 1), not by the whole row
            if myList[j][1] > myList[j + 1][1]:
                temp = myList[j]
                myList[j] = myList[j + 1]
                myList[j + 1] = temp

First of all, implement a generic bubble sort function. This is a correct bubble sort implementation; I'm sure you can find other implementations on http://rosettacode.org:
def bubble_sort(a_list, a_key):
    changed = True
    while changed:
        changed = False
        for i in range(len(a_list) - 1):
            if a_key(a_list[i]) > a_key(a_list[i + 1]):
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
                changed = True
Then simply pass a key function that selects the value you want to sort by (in this case the middle value, i.e. index 1 of each row):
import csv

def sort_by_area(fname):
    with open(fname) as f:
        a = list(csv.reader(f))
    # areas look like "647500.0", which int() rejects, so convert with float()
    bubble_sort(a, lambda row: float(row[1]))
    return a

a = sort_by_area("a_file.txt")
print(a[-7])  # the 7th largest by area
You can take this and combine it with your own code to complete your assignment, but really this is a question you should have asked a classmate or your teacher for help with.
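A minimal sketch of how the pieces combine, assuming the rows are already parsed as [name, area, population] lists (the shape readCountries produces). `country_by_area` is a hypothetical helper name, and the "Examplestan" row is made-up illustration data; only the Afghanistan and India rows come from the question.

```python
def bubble_sort(a_list, a_key):
    """Sort a_list in place, ascending by a_key, using repeated adjacent swaps."""
    changed = True
    while changed:
        changed = False
        for i in range(len(a_list) - 1):
            if a_key(a_list[i]) > a_key(a_list[i + 1]):
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
                changed = True


def country_by_area(rows, n):
    """Hypothetical helper: return the row with the nth largest area (n=1 is the largest)."""
    if not 1 <= n <= len(rows):
        raise ValueError("n must be between 1 and len(rows)")
    ordered = list(rows)  # copy, so the caller's list keeps its order
    bubble_sort(ordered, lambda row: row[1])  # ascending by area (index 1)
    return ordered[-n]  # counting from the end gives the nth largest


# Illustrative data: "Examplestan" is made up
rows = [
    ["Afghanistan", 647500.0, 25500100],
    ["India", 3287590.0, 1239240000],
    ["Examplestan", 1000.0, 5],
]
print(country_by_area(rows, 1))  # ['India', 3287590.0, 1239240000]
print(country_by_area(rows, 2))  # ['Afghanistan', 647500.0, 25500100]
```

With the real file this would be called as country_by_area(readCountries('countries.txt'), 7). Sorting ascending and indexing from the end keeps the generic bubble_sort unchanged.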

Related

Python: Fill data in a list in a tuple

I need to create a function that reads the data given and creates a list that contains tuples each of which has as its first element the name of the airport and as its second and third its geographical coordinates as float numbers.
airport_data = """
Alexandroupoli 40.855869°N 25.956264°E
Athens 37.936389°N 23.947222°E
Chania 35.531667°N 24.149722°E
Chios 38.343056°N 26.140556°E
Corfu 39.601944°N 19.911667°E
Heraklion 35.339722°N 25.180278°E"""
import re

airports = []
airport_data1 = re.sub("[°N#°E]", "", airport_data)

def process_airports(string):
    airports_temp = list(string.split())
    airports = [tuple(airports_temp[x:x+3]) for x in range(0, len(airports_temp), 3)]
    return airports

print(process_airports(airport_data1))
This is my code so far but I'm new to Python, so I'm struggling to debug my code.
If you want the second and third element of the tuple to be a float, you have to convert them using the float() function.
One way to do this is to create the tuple with round brackets in your list comprehension and convert the values there:
def process_airports(string):
    airports_temp = string.split()
    airports = [(airports_temp[x], float(airports_temp[x+1]), float(airports_temp[x+2]))
                for x in range(0, len(airports_temp), 3)]
    return airports
This yields a pretty unwieldy expression, so this problem might be solved more readably with a classical for loop.
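A sketch of the classical for-loop version mentioned above, assuming (like airport_data1) that the degree signs and N/E letters have already been stripped and that airport names contain no spaces. Each line is handled on its own, so the grouping into tuples no longer depends on index arithmetic.

```python
def process_airports(text):
    """Build (name, lat, lon) tuples from lines like 'Athens 37.936389 23.947222'."""
    airports = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. the one after the opening triple quote
        name, lat, lon = line.split()
        airports.append((name, float(lat), float(lon)))
    return airports


sample = """
Athens 37.936389 23.947222
Corfu 39.601944 19.911667"""
print(process_airports(sample))
# [('Athens', 37.936389, 23.947222), ('Corfu', 39.601944, 19.911667)]
```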
Also note that split() already returns a list.
Further remark: if you just cut off the letters from the coordinates, this might come back to bite you when your airports are in different quadrants.
You need to take N/S and E/W into account for latitude and longitude.
Maybe something like this:
def process_airports(string):
    airports = []
    for line in string.split('\n'):
        if not line:
            continue
        name, lat, lon = line.split()
        # drop the trailing "°N"/"°S" or "°E"/"°W"; south and west become negative
        airports.append((name,
                         float(lat[:-2]) * (1 if lat[-1] == "N" else -1),
                         float(lon[:-2]) * (1 if lon[-1] == "E" else -1)
                         ))
    return airports

>>> process_airports(airport_data)
[('Alexandroupoli', 40.855869, 25.956264), ('Athens', 37.936389, 23.947222), ('Chania', 35.531667, 24.149722), ('Chios', 38.343056, 26.140556), ('Corfu', 39.601944, 19.911667), ('Heraklion', 35.339722, 25.180278)]
I preferred the double split to make the distinction between lines and tuple elements explicit.

ASCII-alphabetical topological sort

I have been trying to figure out an issue in which my program does the topological sort, but not in the format I'm trying to get. For instance, if I give it the input:
Learn Python
Understand higher-order functions
Learn Python
Read the Python tutorial
Do assignment 1
Learn Python
It should output:
Read the Python tutorial
Understand higher-order functions
Learn Python
Do assignment 1
Instead, the first two items come out swapped. This happens in some of my other test cases as well, where two seemingly random items get swapped. Here's my code:
import sys

graph = {}

def populate(name, dep):
    if name in graph:
        graph[name].append(dep)
    else:
        graph[name] = [dep]
    if dep not in graph:
        graph[dep] = []

def main():
    last = ""
    for line in sys.stdin:
        lne = line.strip()
        if last == "":
            last = lne
        else:
            populate(last, lne)
            last = ""

def topoSort(graph):
    sortedList = []  # result
    zeroDegree = []
    inDegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            inDegree[v] += 1
    for i in inDegree:
        if inDegree[i] == 0:
            zeroDegree.append(i)
    while zeroDegree:
        v = zeroDegree.pop(0)
        sortedList.append(v)
        # selection sort for alphabetical sort
        for x in graph[v]:
            inDegree[x] -= 1
            if inDegree[x] == 0:
                zeroDegree.insert(0, x)
    sortedList.reverse()
    # for y in range(len(sortedList)):
    #     min = y
    #     for j in range(y + 1, len(sortedList)):
    #         if sortedList[min] > sortedList[y]:
    #             min = j
    #     sortedList[y], sortedList[min] = sortedList[min], sortedList[y]
    return sortedList

if __name__ == '__main__':
    main()
    result = topoSort(graph)
    if len(result) == len(graph):
        print(result)
    else:
        print("cycle")
Any Ideas as to why this may be occurring?
Set elements are not ordered, and dictionaries only keep insertion order (guaranteed since Python 3.7), never alphabetical order. Nothing in your topoSort breaks ties alphabetically between items that become ready at the same time, so their order depends on how they were inserted. I think that is why you get seemingly random swaps; I guess it has something to do with inDegree, but I didn't debug it in detail.
I can't offer you a specific fix for your code, but according to the desired input and output it should look like this:
# read pairs of lines from stdin until Ctrl+D is pressed (on Linux) or EOF is reached
graph = set()
while True:
    try:
        graph |= {(input().strip(), input().strip())}
    except EOFError:
        break
# apply topological sort and print it to stdout
print("----")
while graph:
    z = {(a, b) for a, b in graph if not [1 for c, d in graph if b == c]}
    print("\n".join(sorted({b for a, b in z})
                    + sorted({a for a, b in z if not [1 for c, d in graph if a == d]})))
    graph -= z
The great advantage of Python (here 3.9.1) is how short a solution can be. Instead of lists I would use sets, because they are easier to edit: graph | {elements} adds items to the set and graph - {elements} removes them. Duplicates are ignored.
First, pairs of lines are read from stdin with input(), input() and added to the graph set as tuples.
The line z = {...} selects the elements that can be printed next; they are then subtracted from the graph set.
Sets are unordered, so the printed elements are turned into sorted lists and joined with newlines.
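A different technique that gives a deterministic ASCII-alphabetical order is Kahn's algorithm with a min-heap (heapq) of the items whose prerequisites are all done: whenever several items are ready, the smallest string is emitted first. This is a sketch, not a patch of either program above; it assumes (item, prerequisite) pairs in the same order populate(name, dep) receives them, and topo_sort is a hypothetical name.

```python
import heapq


def topo_sort(pairs):
    """Prerequisites first; ties broken by string order. Raises ValueError on a cycle."""
    waiting_on = {}  # item -> prerequisites not yet emitted
    needed_by = {}   # prerequisite -> items that depend on it
    for item, prereq in pairs:
        waiting_on.setdefault(item, set()).add(prereq)
        waiting_on.setdefault(prereq, set())
        needed_by.setdefault(prereq, set()).add(item)

    # min-heap of items whose prerequisites are all done
    ready = [item for item, prereqs in waiting_on.items() if not prereqs]
    heapq.heapify(ready)
    order = []
    while ready:
        current = heapq.heappop(ready)  # smallest string first
        order.append(current)
        for item in needed_by.get(current, ()):
            waiting_on[item].discard(current)
            if not waiting_on[item]:
                heapq.heappush(ready, item)

    if len(order) != len(waiting_on):
        raise ValueError("cycle")
    return order


pairs = [
    ("Learn Python", "Understand higher-order functions"),
    ("Learn Python", "Read the Python tutorial"),
    ("Do assignment 1", "Learn Python"),
]
print("\n".join(topo_sort(pairs)))
# Read the Python tutorial
# Understand higher-order functions
# Learn Python
# Do assignment 1
```

Python compares strings by code point, which matches ASCII order for ASCII text. To feed it the question's stdin format, the non-empty stripped lines can be paired with zip(lines[::2], lines[1::2]).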

selecting a value using a conditional statement based on a list of tuples

I have a list of tuples converted from a dictionary. I want to compare a conditional value against the values in the list of tuples, starting from the beginning of the list, to see whether it is higher or lower. When a tuple's value is lower than this conditional value, I want to use that specific tuple for further coding.
Can somebody give me some insight into how this is achieved?
I am relatively new to coding and self-taught, and I am not 100% sure the example would run, but I have tried my best to demonstrate the idea.
`tuple_list = [(12:00:00, £55.50), (13:00:00, £65.50), (14:00:00, £75.50), (15:00:00, £45.50), (16:00:00, £55.50)]
conditional_value = £50
if conditional_value != for x in tuple_list.values()
y = 0
if conditional_value < tuple_list(y)
y++1
else
///"return the relevant value from the tuple_list to use for further coding. I would be
looking to work with £45.50"///`
Thank you.
Just form a new list with a condition:
tuple_list = [("12:00:00", 55.50), ("13:00:00", 65.50), ("14:00:00", 75.50), ("15:00:00", 45.50), ("16:00:00", 55.50)]
threshold = 50
below = [tpl for tpl in tuple_list if tpl[1] < threshold]
print(below)
Which yields
[('15:00:00', 45.5)]
Note that I added quotation marks and removed the currency sign to be able to compare the values. If your actual values contain the £ sign, you'll have to strip it off first.
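A sketch of that preprocessing step, assuming the prices arrive as strings with a leading "£" (parse_price and first_below are hypothetical names). It also returns only the first tuple below the threshold, which matches the question's "starting from the beginning of the list"; next() with a default of None avoids an exception when nothing qualifies.

```python
def parse_price(text):
    """Convert a price string such as '£45.50' to a float (assumes a leading '£')."""
    return float(text.strip().lstrip("£"))


def first_below(pairs, threshold):
    """Return the first (time, price) tuple whose price is below threshold, or None."""
    return next((pair for pair in pairs if pair[1] < threshold), None)


raw = [("12:00:00", "£55.50"), ("13:00:00", "£65.50"), ("14:00:00", "£75.50"),
       ("15:00:00", "£45.50"), ("16:00:00", "£55.50")]
prices = [(time, parse_price(price)) for time, price in raw]
print(first_below(prices, 50))  # ('15:00:00', 45.5)
```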
If I'm understanding your question correctly, this should be what you're looking for:
for key, value in tuple_list:
    if conditional_value < value:
        continue  # Skips to next in the list.
    else:
        # Do further coding with key and value here, e.g.:
        print(key, value)
You can use
tuple_list = [("12:00:00", 55.50), ("13:00:00", 65.50), ("14:00:00", 75.50), ("15:00:00", 45.50), ("16:00:00", 55.50)]
conditional_value = 50
new_tuple_list = list(filter(lambda x: x[1] < conditional_value, tuple_list))
This code returns a new_tuple_list with all items whose value is less than conditional_value.

How can I take the lowest value in this code?

I'm trying to get the lowest value from the following list. The result should be in the form country, price, date.
I'm using Python for the code.
valores= ["al[8075]['2019-05-27']", "de[2177]['2019-05-27']", "at[3946]['2019-05-27']", "be[3019]['2019-05-26']", "by[5741]['2019-05-27']", "ba[0]['2019-05-26', '2019-05-27']", "bg[3223]['2019-05-26']", "hr[4358]['2019-05-26']", "dk[5006]['2019-05-27']", "sk[4964]['2019-05-27']", "si[5253]['2019-05-26']", "es[3813]['2019-05-27']", "ee[4699]['2019-05-27']", "ru[4889]['2019-05-27']", "fi[5410]['2019-05-26']", "fr[2506]['2019-05-26']", "gi[0]['2019-05-26', '2019-05-27']", "gr[1468]['2019-05-26']", "hu[3475]['2019-05-27']", "ie[5360]['2019-05-26']", "is[0]['2019-05-26']", "it[2970]['2019-05-26']", "lv[2482]['2019-05-27']", "lt[1276]['2019-05-27']", "lu[0]['2019-05-26']", "mk[5417]['2019-05-26']", "mt[3532]['2019-05-26']", "md[6158]['2019-05-27']", "me[11080]['2019-05-26']", "no[2967]['2019-05-27']", "nl[3640]['2019-05-27']", "pl[2596]['2019-05-27']", "pt[5409]['2019-05-27']", "uk[5010]['2019-05-27']", "cz[5493]['2019-05-26']", "ro[1017]['2019-05-27']", "rs[6535]['2019-05-27']", "se[3971]['2019-05-26']", "ch[5112]['2019-05-26']", "tr[3761]['2019-05-26']", "ua[5187]['2019-05-26']"]
For example, here country (ro), price (1017), date ('2019-05-27') is the lowest, so the result should be:
valores = "ro[1017]['2019-05-27']"
Python's max() and min() functions take a key argument, so whenever you need a minimum or maximum you can often leverage these built-ins. The only code you have to write is a function that converts each value to the representation to compare for max/min purposes.
def f(s):
    # price between the first "[" and "]"; a price of 0 maps to infinity so it never wins
    return int(s.split('[')[1].split(']')[0]) or float('inf')

lowest = min(valores, key=f)  # ro[1017]['2019-05-27']
There is more than one way of coding this. The following will do it:
lowest = 1000000
target = " "
for i in valores:
    ix = i.find("[") + 1
    iy = i.find("]")
    value = int(i[ix:iy])
    if value < lowest and value != 0:
        lowest = value
        target = i
print(target)
It will output
ro[1017]['2019-05-27']
However, here I am assuming you do not want 0 values, otherwise the answer would be
"ba[0]['2019-05-26', '2019-05-27']"
If you want to include 0, just modify the if block.
This should work for you. I assume you want the lowest non-zero price.
I split every string in the list on the opening square bracket [ and strip the leftover brackets [ and ] from each piece, so each sublist is [country, price, dates].
I then sort on the price, which is the second item of each sublist, and filter out the 0 prices.
The result is then the first element of the filtered list.
valores= ["al[8075]['2019-05-27']", "de[2177]['2019-05-27']", "at[3946]['2019-05-27']", "be[3019]['2019-05-26']", "by[5741]['2019-05-27']", "ba[0]['2019-05-26', '2019-05-27']", "bg[3223]['2019-05-26']", "hr[4358]['2019-05-26']", "dk[5006]['2019-05-27']", "sk[4964]['2019-05-27']", "si[5253]['2019-05-26']", "es[3813]['2019-05-27']", "ee[4699]['2019-05-27']", "ru[4889]['2019-05-27']", "fi[5410]['2019-05-26']", "fr[2506]['2019-05-26']", "gi[0]['2019-05-26', '2019-05-27']", "gr[1468]['2019-05-26']", "hu[3475]['2019-05-27']", "ie[5360]['2019-05-26']", "is[0]['2019-05-26']", "it[2970]['2019-05-26']", "lv[2482]['2019-05-27']", "lt[1276]['2019-05-27']", "lu[0]['2019-05-26']", "mk[5417]['2019-05-26']", "mt[3532]['2019-05-26']", "md[6158]['2019-05-27']", "me[11080]['2019-05-26']", "no[2967]['2019-05-27']", "nl[3640]['2019-05-27']", "pl[2596]['2019-05-27']", "pt[5409]['2019-05-27']", "uk[5010]['2019-05-27']", "cz[5493]['2019-05-26']", "ro[1017]['2019-05-27']", "rs[6535]['2019-05-27']", "se[3971]['2019-05-26']", "ch[5112]['2019-05-26']", "tr[3761]['2019-05-26']", "ua[5187]['2019-05-26']"]
results = []
# Iterate through valores
for item in valores:
    # Extract elements from each string by splitting on [ and then stripping the extra square brackets
    items = [it.strip('][') for it in item.split('[')]
    results.append(items)
# Sort on the second element, which is the price, and filter out prices that are 0
res = list(
    filter(lambda x: int(x[1]) > 0,
           sorted(results, key=lambda x: int(x[1])))
)
# This is your lowest non-zero price
print(res[0])
The output will be
['ro', '1017', "'2019-05-27'"]
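A sketch that parses each entry once into (country, price, dates) with real types and then applies min() with a key, assuming every string follows the pattern code[price][list of dates] seen above. The regex and the names parse_entry and lowest_nonzero are illustrative assumptions; ast.literal_eval turns the "['2019-05-26', ...]" text into an actual list instead of leaving quotes in the result.

```python
import ast
import re

# Assumed format: lowercase country code, price in [], then a Python-style list of dates
ENTRY = re.compile(r"([a-z]+)\[(\d+)\](\[.*\])")


def parse_entry(entry):
    """Split "ro[1017]['2019-05-27']" into ('ro', 1017, ['2019-05-27'])."""
    match = ENTRY.fullmatch(entry)
    if match is None:
        raise ValueError("unexpected format: {!r}".format(entry))
    country, price, dates = match.groups()
    # literal_eval safely evaluates the list literal holding the dates
    return country, int(price), ast.literal_eval(dates)


def lowest_nonzero(entries):
    """Return the parsed (country, price, dates) with the smallest non-zero price."""
    parsed = (parse_entry(e) for e in entries)
    return min((p for p in parsed if p[1] > 0), key=lambda p: p[1])


sample = ["al[8075]['2019-05-27']", "ba[0]['2019-05-26', '2019-05-27']",
          "ro[1017]['2019-05-27']", "gr[1468]['2019-05-26']"]
print(lowest_nonzero(sample))  # ('ro', 1017, ['2019-05-27'])
```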

Python Linear Search Better Efficiency

I've got a question regarding Linear Searching in Python. Say I've got the base code of
for l in lines:
    for f in search_data:
        if my_search_function(l[1], [f[0], f[2]]):
            print("Found it!")
            break
in which we want to determine where in search_data the value stored in l[1] exists. Say my_search_function() looks like this:
def my_search_function(search_key, search_values):
    for s in search_values:
        if search_key in s:
            return True
    return False
Is there any way to increase the speed of processing? Binary Search would not work in this case, as lines and search_data are multidimensional lists and I need to preserve the indexes. I've tried an outside-in approach, i.e.
for line in lines:
    negative_index = -1
    positive_index = 0
    middle_element = len(search_data) // 2
    found = False
    while positive_index < middle_element:
        # print(str(positive_index) + "," + str(negative_index))
        if my_search_function(line[1], [search_data[positive_index][0], search_data[negative_index][0]]):
            print("Found it!")
            break
        positive_index = positive_index + 1
        negative_index = negative_index - 1
However, I'm not seeing any speed increase from this. Does anyone have a better approach? I'm looking to cut the processing time in half, as I'm working with large amounts of CSV data and processing one file takes more than 00:15, which is unacceptable since I'm processing batches of 30+ files. The data I'm searching on is essentially SKUs. A value from lines[0] could be something like AS123JK, and a valid match for that value could be AS123. So a HashMap would not work here, unless there is a way to do partial matches in a HashMap lookup that doesn't require breaking the values down like ['AS123', 'AS123J', 'AS123JK'], which is not ideal in this scenario. Thanks!
"Binary Search would not work in this case, as lines and search_data are multidimensional lists and I need to preserve the indexes."
Regardless, it may be worth your while to extract the strings (along with some reference to the original data structure) into a flat list, sort it, and perform fast binary searches on it with help of the bisect module.
Or, instead of a large number of searches, also sort a combined list of all the search keys and traverse both lists in parallel, looking for matches (proceeding like the merge step in merge sort, without actually outputting a merged list).
Code to illustrate the second approach:
lines = ['AS12', 'AS123', 'AS123J', 'AS123JK', 'AS124']
search_keys = ['AS123', 'AS125']
try:
    iter_keys = iter(sorted(search_keys))
    key = next(iter_keys)
    for line in sorted(lines):
        # skip keys that sort before this line's prefix of the same length
        while key < line[:len(key)]:
            key = next(iter_keys)
        # check after advancing, so a line matching the new key is not missed
        if line.startswith(key):
            print('Line {} matches {}'.format(line, key))
except StopIteration:  # all keys processed
    pass
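The first suggestion (extract one column into a flat sorted list that keeps a reference to the original row, then binary-search it with bisect) can be sketched like this. It assumes the searched column holds strings and that you look up which rows start with a given key; index_by_column and rows_with_prefix are hypothetical names. All values sharing a prefix are contiguous once sorted, so one bisect finds the first candidate and a short scan collects the rest, while the stored row index preserves the link to the original data.

```python
import bisect


def index_by_column(rows, column):
    """Flatten one column into a sorted list of (value, row_index) pairs."""
    return sorted((row[column], i) for i, row in enumerate(rows))


def rows_with_prefix(index, prefix):
    """Return the original row indexes whose value starts with prefix."""
    # (prefix,) sorts before every (prefix, i) pair, so bisect_left lands
    # on the first entry whose value is >= prefix
    i = bisect.bisect_left(index, (prefix,))
    found = []
    while i < len(index) and index[i][0].startswith(prefix):
        found.append(index[i][1])
        i += 1
    return found


# Illustrative rows: column 1 holds the SKU-like value
rows = [["x", "AS123JK"], ["y", "AS12"], ["z", "AS123"], ["w", "AS124"]]
index = index_by_column(rows, 1)
print(rows_with_prefix(index, "AS123"))  # [2, 0]
```

Building the index costs one sort per column; each lookup is then a binary search plus the number of matches.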
It depends on the details of the problem.
For instance, if you search for complete words, you could build a hash table on the searchable elements, and the final search would be a simple lookup.
Filling the hash table takes roughly linear time.
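For the prefix case in the question, a hash table can still help without storing every prefix of every value, assuming the search keys come in only a few distinct lengths (plausible for SKUs, but an assumption). Hash the keys once, remember which lengths occur, and for each value look up only its prefixes of those lengths. build_index and prefix_matches are hypothetical names; the cost per value is one set lookup per distinct key length.

```python
def build_index(search_keys):
    """Hash the search keys and record which key lengths occur."""
    key_set = set(search_keys)
    lengths = sorted({len(k) for k in key_set})
    return key_set, lengths


def prefix_matches(value, key_set, lengths):
    """Return every search key that is a prefix of value."""
    return [value[:n] for n in lengths if n <= len(value) and value[:n] in key_set]


key_set, lengths = build_index(["AS123", "AS125", "BX9"])
print(prefix_matches("AS123JK", key_set, lengths))  # ['AS123']
```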
Ultimately, I broke down and implemented binary search on my multidimensional lists, sorting them with the sorted() function and a lambda as the key argument. Here is the first-pass code that I whipped up. It's not 100% efficient, but it's a vast improvement over where we were:
def binary_search(master_row, source_data, master_search_index, source_search_index):
    # search() is the author's partial-match helper (not shown here)
    target = master_row[master_search_index]
    lower_bound = 0
    upper_bound = len(source_data) - 1
    while lower_bound <= upper_bound:
        middle_pos = (lower_bound + upper_bound) // 2
        candidate = source_data[middle_pos][source_search_index]
        if candidate < target:
            if search([candidate], [target]):
                return {"result": True, "index": middle_pos}
            lower_bound = middle_pos + 1
        elif candidate > target:
            if search([target], [candidate]):
                return {"result": True, "index": middle_pos}
            upper_bound = middle_pos - 1
        else:
            if len(candidate) > 5:
                return {"result": True, "index": middle_pos}
            break
    # not found: still return a dict so the caller can read results["result"]
    return {"result": False, "index": None}
and then where we actually make the Binary Search call
# master_copy is the first multidimensional list, data_copy is the second;
# the search columns are the columns we want to search against
for line in master_copy:
    for m in master_search_columns:
        found = False
        for d in data_search_columns:
            data_copy = sorted(data_copy, key=lambda x: x[d], reverse=False)
            results = binary_search(line, data_copy, m, d)
            found = results["result"]
            if found:
                line = update_row(line, data_copy[results["index"]], column_mapping)
                found_count = found_count + 1
                break
        if found:
            break
For sorting a multidimensional list, see the question "Python Sort Multidimensional Array Based on 2nd Element of Subarray".
