Organizing the list with a pivot - python

I need to organize a list. The item at index zero is the pivot: every item smaller than it should end up to its left in the list, and every item greater than it should end up to its right.
I can't use "sort" or any other built-in function in Python. Could anyone give me a clue?

Since you give no attempted code, I'll just give you an idea of the algorithm.
Go through the list, starting at index 1 (index 0 holds the pivot value), and examine each entry. If the entry is less than or equal to the pivot value, it belongs in the lower part of the list; if you do the coding right, no swap is needed and you can just leave the value in place. If the entry is greater than the pivot value, swap list entries to move the examined value to the upper end of the list. You stop when you have examined all the list entries. If you really need the pivot value between the two sublists, make one last swap to put the pivot value in its proper place.
You need two variables to keep track of the limits of the examined lower part of the list and the examined upper part of the list. The unexamined values will be between these two sublists. These variables are indices into the list; much of your code will deal with using and updating these indices as you progress through the list's values.
Now think about the exact definition of those two key index variables, their initial values, how they change for each examined value, and just when you will stop the loop that examines the list's values.
Note that the algorithm I suggest is not the best way to do the partition. There is another, more complex algorithm that reduces the number of swaps. But you should learn to crawl before you walk, and my suggestion does get the job done with the same order of execution time.
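A minimal sketch of the two-index idea described above, not the only way to code it. The names partition, low and high are my own, and I am assuming that len() is acceptable under the "no built-in functions" restriction (if it is not, the length would have to be counted with a loop first).

```python
def partition(items):
    """Rearrange items in place around items[0]; return the pivot's final index.

    Returns -1 for an empty list.
    """
    if not items:
        return -1
    pivot = items[0]
    low = 1                  # items[1:low] holds examined values <= pivot
    high = len(items) - 1    # items[high + 1:] holds examined values > pivot
    while low <= high:       # items[low:high + 1] is still unexamined
        if items[low] <= pivot:
            # Already on the correct side: leave it in place and move on.
            low += 1
        else:
            # Move the large value to the upper end; the value swapped in
            # from position high is unexamined, so low does not advance.
            items[low], items[high] = items[high], items[low]
            high -= 1
    # One last swap puts the pivot between the two sublists.
    items[0], items[high] = items[high], items[0]
    return high


numbers = [5, 9, 2, 7, 5, 1, 8]
pivot_index = partition(numbers)
print(numbers, pivot_index)
```

The loop ends once the lower and upper regions meet, so every entry is examined exactly once.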

Related

Is there a Python pandas function for retrieving a specific value of a dataframe based on its content?

I've got multiple Excel files and I need a specific value from each, but in each file the cell with the value changes position slightly. However, this value is always preceded by a generic description, which remains constant across all the files.
I was wondering if there was a way to ask Python to grab the value to the right of the element containing the string "xxx".
Try iterating over the Excel files (I guess you loaded each as a separate pandas object?),
something like for df in [dataframe1, dataframe2, ..., dataframeN].
Then you could pick the column you need (if the column stays constant), e.g. df['columnX'], and find which index it has:
df.index[df['columnX']=="xxx"]. It may make sense to add .tolist() at the end, so that if "xxx" is a value that occurs more than once, you get all occurrences in a list.
The last step would be to take index+1 to get the value you want.
Hope it was helpful.
In general I would highly suggest to be more specific in your questions and provide code / examples.

How to efficiently manage a list of elements that can either have one of its elements removed or swapped with its next one?

I have to build a program that takes two inputs: eventList, a list of strings that each hold the type of operation and the id of the element that will undergo it, and idList, a list of ints, each one being the id of an element.
The two possible events are the deletion of the corresponding id, or having the id swap its position in idList with the following one (i.e. if the selected id is located in idList[2], it will swap values with idList[3]).
It has to pass strict tests with a set timeout and has to use dictionaries.
This is for a programming assignment. I've already built this program, but I can't find a way to get a decent running time and pass the tester's timeouts.
I've also tried using lists instead of dicts, but I still can't pass some timeouts because of the time it takes to use .pop() and .index(), and I've been told the only way to pass all of them is to use dicts.
How I currently handle swaps:
def overtake(dictElement, elementId):
    elementIndex = dictElement[elementId]
    overtakerId = dictSearchOvertaker(dictElement, elementIndex)
    dictElement[elementId], dictElement[overtakerId] = dictElement[overtakerId], dictElement[elementId]
    return dictElement
How I currently handle deletions:
def eliminate(dictElement, elementId):
    #elementIndex = dictElement[elementId]
    del dictElement[elementId]
    return dictUpdate(dictElement, elementId)
How I update the dictionary after an element is deleted:
def dictUpdate(dictElement, elementIndex):
    listedDict = dictElement.items()
    i = 0
    for item in listedDict:
        i += 1
        if item[1] > elementIndex:
            dictElement[item[0]] -= 1
    return dictElement
I'm expected to handle a list of 200k elements where every element gets deleted one by one in 1.5 seconds, but it takes me more than 5 minutes, and even longer for a test where I get an idList with 1500 elements and every element gets swapped with the following one until, in the end, idList is reversed.
One thing that strikes me about this problem is that you're given a single list of operations and expected to return the result of doing all of them. That means you don't necessarily need to do them all one by one, and can instead do operations in a single batch that would otherwise be individually time-consuming.
Swapping two items is O(1) as long as you already know where they are. That's where a dict would come in -- a dict can help you associate one piece of information with another in such a way that you can find it in O(1) time. In this case, you want a way to find the index of an item given its id.
Deleting an item from the middle of a Python list is O(N), even if you already know its index, because internally it's an array and you need to shift everything over to take up the empty space every time you delete something that's not at the end. A naive solution is going to therefore be O(K*N), which is probably the thing the assignment is trying to get you to avoid. But nothing in the problem requires that you actually delete each item from the list one by one, just that the final result you return does not contain those items.
So, my approach would be:
1. Build a dict of id -> index. (This is just a single O(n) iteration over the list.)
2. Create an empty set to track deletions.
3. For each operation:
   - If it's a swap:
     - If the id is in your set, raise an exception.
     - Use your dict to find the indices of the two ids.
     - Swap the two items in the list.
     - Update your dict so it continues to match the list.
   - If it's a delete:
     - Add the id to your set.
4. Create a new list to return as the result.
5. For each item in the original list:
   - Check to see if it's in your set.
   - If it's in the set, skip it (it got deleted).
   - If not, append it to the result.
6. Return the result.
Where N is the list size and K is the number of operations, this ends up being O(N+K), because you iterated over the entire list of IDs exactly twice, and the entire list of operations exactly once, and everything you did inside those iterations was O(1).
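The steps above can be sketched roughly as follows. The function name apply_events and the event format, (kind, element_id) tuples with kind being "swap" or "delete", are assumptions for illustration, since the question does not show how the event strings are encoded. This sketch also refuses to swap with a neighbour that has already been marked as deleted, because the deferred-deletion approach as described does not say how to handle that case.

```python
def apply_events(id_list, events):
    """Apply swap/delete events to a copy of id_list and return the result.

    events is an iterable of (kind, element_id) tuples (hypothetical format).
    """
    ids = list(id_list)
    # id -> current position in ids, built in one O(N) pass.
    index_of = {element_id: i for i, element_id in enumerate(ids)}
    deleted = set()  # ids marked as deleted; removed only at the very end

    for kind, element_id in events:
        if element_id in deleted:
            raise ValueError(f"id {element_id} was already deleted")
        if kind == "delete":
            deleted.add(element_id)
        elif kind == "swap":
            i = index_of[element_id]
            j = i + 1
            if j >= len(ids):
                raise IndexError(f"id {element_id} has no following element")
            other = ids[j]
            if other in deleted:
                # Not covered by the approach as described; fail loudly.
                raise ValueError("following element was already deleted")
            ids[i], ids[j] = other, element_id
            # Keep the dict in step with the list.
            index_of[element_id] = j
            index_of[other] = i
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    # Single final pass that drops everything marked as deleted.
    return [element_id for element_id in ids if element_id not in deleted]


print(apply_events([10, 20, 30, 40],
                   [("swap", 10), ("swap", 10), ("delete", 20)]))
```

Each event costs O(1) dict and set work, and the list is only filtered once at the end.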

Duplicate element being added through for loop

NOTE: I do not want to use del
I am trying to understand algorithms better which is why I want to avoid the built-in del statement.
I am populating a list of 10 randomly generated numbers. I then am trying to remove an item from the list by index, using a for loop:
if remove_index < lst_size:
    for value in range(remove_index, lst_size-1):
        lst[value] = lst[value+1]
    lst_size -= 1
Everything works fine, except that the loop is adding the last item twice. Meaning, if the 8th item has the value 4, it will add a 9th item also valued 4. I am not sure why it is doing this. I still am able to move the value at the selected index (while moving everything up), but it adds on the duplicate.
Nothing is being added to your list. It starts out with lst_size elements, and, since you don't delete any, it retains the same number by the time you're done.
If, once you've copied all the items from remove_index onwards to the previous index in the list, you want to remove the last item, then you can do so either using del or lst.pop().
At the risk of sounding flippant, this is a general rule: if you want to do something, you have to do it. Saying "I don't want to use del" won't alter that fact.
Merely decrementing lst_size will have no effect on the list - while you may be using it to store the size of your list, they are not connected, and changing one has no effect on the other.
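A minimal sketch of the fix described above, assuming lst.pop() is acceptable even though del is not wanted; the helper name remove_at is my own. The loop shifts every later item one position to the left, and the final pop() is what actually shrinks the list.

```python
def remove_at(lst, remove_index):
    """Remove the item at remove_index in place by shifting, then popping."""
    if 0 <= remove_index < len(lst):
        # Copy each later item one slot to the left, overwriting the
        # item being removed. This leaves a duplicate in the last slot.
        for i in range(remove_index, len(lst) - 1):
            lst[i] = lst[i + 1]
        # Drop the leftover duplicate so the list really gets shorter.
        lst.pop()
    return lst


values = [3, 1, 4, 1, 5]
remove_at(values, 1)
print(values)
```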

Need help understanding some code (Beginner)

I am trying to learn about while and for loops. This function prints out the highest number in a list. But, I'm not entirely sure how it works. Can anyone break down how it works for me. Maybe step by step and/or with a flowchart. I'm struggling and want to learn.
def highest_number(list_tested):
    x=list_tested[0]
    for number in list_tested:
        if x<number:
            x=number
    print(x)
highest_number([1,5,3,2,3,4,5,8,5,21,2,8,9,3])
One of the most helpful things for understanding new code is going through it step by step:
PythonTutor has a visualizer: Paste in your code and hit visualize execution.
What this does is go from the first to the last number, asking each time:
Is this new number bigger than the one I have? If so, keep the new number, if not keep the old number.
At the end, x will be the largest number.
See my comments for step by step explanation of each line
def highest_number(list_tested): # function defined to take a list
    x=list_tested[0] # x is assigned the value of first element of list
    for number in list_tested: # iterate over all the elements of input list
        if x<number: # if value in 'x' is smaller than the current number
            x=number # then store the value of current element in 'x'
    print(x) # after iteration complete, print the value of 'x'
highest_number([1,5,3,2,3,4,5,8,5,21,2,8,9,3]) # just call to the function defined above
So basically, the function finds the largest number in the list by value.
It starts by setting the largest number (x) to the first element of the list, and then compares it to each element of the list in turn. Whenever it finds an element that is greater than the largest number found so far (which is stored in x), it stores that element in x instead. So at the end, the largest value is stored in x.
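To see those steps without a visualizer, here is a small variant of the same loop that records every time x is replaced. The name highest_number_traced and the steps list are my own additions for illustration; the comparison logic is unchanged from the original function.

```python
def highest_number_traced(list_tested):
    """Return the largest value plus a record of each time x was replaced."""
    x = list_tested[0]
    steps = []  # (old value of x, new value of x) for every update
    for number in list_tested:
        if x < number:
            steps.append((x, number))
            x = number
    return x, steps


largest, steps = highest_number_traced([1, 5, 3, 2, 3, 4, 5, 8, 5, 21, 2, 8, 9, 3])
for old, new in steps:
    print(f"{new} is bigger than {old}, so x becomes {new}")
print("largest:", largest)
```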
It looks like you are new to the programming world. It may help to start with some basic concepts, for and while loops among them, before jumping into something like this.
Here is one of the explanations you may easily find on the Internet http://www.teamten.com/lawrence/programming/intro/intro8.html

Triple list VS double dictionary

I have 40,000 documents with 93.08 words per document on average, where every word is a number (which can index a dictionary) and every word has a count (frequency). Read more here.
I am torn between two data structures for storing the data and was wondering which one I should choose, and which one Python people would choose.
Triple-list:
A list, where every node:
__ is a list, where every node:
__.... is a list of two values; word_id and count.
Double-dictionary:
A dictionary, with keys the doc_id and values dictionaries.
That value dictionary would have a word_id as a key and the count as a value.
I feel that the first will require less space (since it doesn't store the doc_id), while the second will be easier to handle and access. I mean, accessing the i-th element in the list is O(n), while it is constant in the dictionary, I think. Which one should I choose?
You should use a dictionary. It will make handling your code easier to understand and to program and it will have a lower complexity as well.
The only reason you would use a list, is if you cared about the order of the documents.
If you don't care about the order of the items you should definitely use a dictionary because dictionaries are used to group associated data while lists are generally used to group more generic items.
Moreover lookups in dictionaries are faster than that of a list.
Searching a list for a value is O(n), while looking up a key in a dictionary is O(1) on average, though dictionaries are considerably larger in memory than lists.
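A small sketch of the double-dictionary layout, with made-up doc_id and word_id values; the helper name build_index and the (doc_id, word_id, count) input format are assumptions for illustration only.

```python
def build_index(triples):
    """Build {doc_id: {word_id: count}} from (doc_id, word_id, count) triples."""
    index = {}
    for doc_id, word_id, count in triples:
        # Create the inner dictionary the first time a doc_id is seen.
        index.setdefault(doc_id, {})[word_id] = count
    return index


index = build_index([(0, 17, 3), (0, 42, 1), (1, 17, 5)])
print(index[0][42])                  # count of word 42 in document 0
print(index.get(1, {}).get(99, 0))   # a missing word falls back to 0
```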
Essentially you just want to store a large amount of numbers, for which the most space efficient choice is an array. These are one-dimensional so you could write a class which takes in three indices (the last being 0 for word_id and 1 for count) and does some basic addition and multiplication to find the correct 1D index.
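One possible reading of that suggestion, as a rough sketch using the standard library array module. The class name FlatCorpus and its get method are hypothetical. Because documents have different numbers of words, this sketch also keeps a second array of per-document offsets, which is my own addition and not something the answer spells out.

```python
from array import array


class FlatCorpus:
    """Store (word_id, count) pairs for all documents in one flat array."""

    def __init__(self, docs):
        # docs: list of documents, each a list of (word_id, count) pairs.
        self._offsets = array("L", [0])  # pair index where each doc starts
        self._data = array("L")          # word_id, count, word_id, count, ...
        for doc in docs:
            for word_id, count in doc:
                self._data.append(word_id)
                self._data.append(count)
            self._offsets.append(self._offsets[-1] + len(doc))

    def get(self, doc, pos, field):
        """Return word_id (field=0) or count (field=1) of entry pos in doc."""
        start = self._offsets[doc]
        end = self._offsets[doc + 1]
        if not 0 <= pos < end - start or field not in (0, 1):
            raise IndexError("position or field out of range")
        # Two slots per pair, so the 1D index is 2 * pair_index + field.
        return self._data[2 * (start + pos) + field]


corpus = FlatCorpus([[(7, 2), (9, 1)], [(3, 5)]])
print(corpus.get(0, 1, 0), corpus.get(0, 1, 1))
```

Unlike the dictionary layout, finding a particular word_id inside a document here means scanning that document's pairs.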
