I've searched around and most of the errors I see are when people are trying to iterate over a list and modify it at the same time. In my case, I am trying to take one list, and remove items from that list that are present in a second list.
import pymysql

schemaOnly = ["table1", "table2", "table6", "table9"]

db = pymysql.connect(my connection stuff)
tables = db.cursor()
tables.execute("SHOW TABLES")
tablesTuple = tables.fetchall()

tablesList = []

# I do this because there is no way to remove items from a tuple,
# which is what I get back from tables.fetchall
for item in tablesTuple:
    tablesList.append(item)

for schemaTable in schemaOnly:
    tablesList.remove(schemaTable)
When I put various print statements in the code, everything looks proper and as if it is going to work. But when it gets to the actual tablesList.remove(schemaTable), I get the dreaded ValueError: list.remove(x): x not in list.
If there is a better way to do this I am open to ideas. It just seemed logical to me to iterate through the list and remove items.
Thanks in advance!
** Edit **
Everyone in the comments and the first answer is correct. The reason this is failing is that the conversion from a tuple to a list was creating a badly formatted list: each element was itself a 1-tuple, so nothing matched when trying to remove items in the next loop. The solution was to take the first item from each tuple and put those into a list, like so: tablesList = [x[0] for x in tablesTuple]. Once I did this, the second loop worked and the table names were correctly removed.
Thanks for pointing me in the right direction!
I assume that fetchall returns tuples, one for each database row matched.
Now the problem is that the elements in tablesList are tuples, whereas schemaOnly contains strings. Python does not consider these to be equal.
Thus when you attempt to call remove on tablesList with a string from schemaOnly, Python cannot find any such value.
You need to inspect the values in tablesList and find a way to convert them to strings. I suspect it would be by simply taking the first element out of each tuple, but I do not have a MySQL database at hand, so I cannot test that.
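For example, a minimal sketch of that conversion, assuming each row from fetchall is a 1-tuple like ('table1',):

# Each row is assumed to be a 1-tuple holding just the table name.
tablesList = [row[0] for row in tablesTuple]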
Regarding your question, if there is a better way to do this: Yes.
Instead of adding items to the list, and then removing them, you can append only the items that you want. For example:
for item in tablesTuple:
    if item not in schemaOnly:
        tablesList.append(item)
Also, schemaOnly can be written as a set, to improve search complexity from O(n) to O(1):
schemaOnly = {"table1", "table2", "table6", "table9"}
This will only be meaningful with big lists, but in my experience it's useful semantically.
And finally, you can write the whole thing in one list comprehension:
tablesList = [item for item in tablesTuple if item not in schemaOnly]
And if you don't need to keep repetitions (or if there aren't any in the first place), you can also do this:
tablesSet = set(tablesTuple) - schemaOnly
Which also has the best big-O complexity of all these variations.
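Putting the pieces together (again assuming fetchall returns 1-tuples of table names, as discussed above):

schemaOnly = {"table1", "table2", "table6", "table9"}
# Unpack each 1-tuple and keep only the names that are not schema-only.
tablesList = [row[0] for row in tablesTuple if row[0] not in schemaOnly]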
Related
I have to build a program that takes two inputs: eventList, a list of strings that hold the type of operation and the id of the element that will undergo it, and idList, a list of ints, each one being the id of an element.
The two possible events are the deletion of the corresponding id, or having the id swap its position in idList with the following one (i.e. if the selected id is located at idList[2], it will swap values with idList[3]).
It has to pass strict tests with a set timeout and has to use dictionaries.
This is for a programming assignment; I've already built this program, but I can't find a way to get a decent time and pass the tester's timeouts.
I've also tried using lists instead of dicts, but I still can't pass some timeouts because of the time it takes to use .pop() and .index(), and I've been told the only way to pass all of them is to use dicts.
How I currently handle swaps:
def overtake(dictElement, elementId):
    elementIndex = dictElement[elementId]
    overtakerId = dictSearchOvertaker(dictElement, elementIndex)
    dictElement[elementId], dictElement[overtakerId] = dictElement[overtakerId], dictElement[elementId]
    return dictElement
How I currently handle deletions:
def eliminate(dictElement, elementId):
    #elementIndex = dictElement[elementId]
    del dictElement[elementId]
    return dictUpdate(dictElement, elementId)
How I update the dictionary after an element is deleted:
def dictUpdate(dictElement, elementIndex):
    listedDict = dictElement.items()
    i = 0
    for item in listedDict:
        i += 1
        if item[1] > elementIndex:
            dictElement[item[0]] -= 1
    return dictElement
I'm expected to handle a list of 200k elements, where every element gets deleted one by one, in 1.5 seconds, but it takes me more than 5 minutes; it takes even longer for a test where I get an idList with 1500 elements and every element gets swapped with the following one until, in the end, idList is reversed.
One thing that strikes me about this problem is that you're given a single list of operations and expected to return the result of doing all of them. That means you don't necessarily need to do them all one by one, and can instead do operations in a single batch that would otherwise be individually time-consuming.
Swapping two items is O(1) as long as you already know where they are. That's where a dict would come in -- a dict can help you associate one piece of information with another in such a way that you can find it in O(1) time. In this case, you want a way to find the index of an item given its id.
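For instance, a one-line sketch of that mapping, using the idList name from the question:

# Map each id to its current position in idList; a single O(N) pass.
indexOf = {elementId: i for i, elementId in enumerate(idList)}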
Deleting an item from the middle of a Python list is O(N), even if you already know its index, because internally it's an array and you need to shift everything over to take up the empty space every time you delete something that's not at the end. A naive solution is going to therefore be O(K*N), which is probably the thing the assignment is trying to get you to avoid. But nothing in the problem requires that you actually delete each item from the list one by one, just that the final result you return does not contain those items.
So, my approach would be:
- Build a dict of id -> index. (This is just a single O(N) iteration over the list.)
- Create an empty set to track deletions.
- For each operation:
  - If it's a swap:
    - If the id is in your set, raise an exception.
    - Use your dict to find the indices of the two ids.
    - Swap the two items in the list.
    - Update your dict so it continues to match the list.
  - If it's a delete:
    - Add the id to your set.
- Create a new list to return as the result.
- For each item in the original list:
  - Check to see if it's in your set.
  - If it's in the set, skip it (it got deleted).
  - If not, append it to the result.
- Return the result.
Where N is the list size and K is the number of operations, this ends up being O(N+K), because you iterated over the entire list of IDs exactly twice, and the entire list of operations exactly once, and everything you did inside those iterations was O(1).
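Here is a minimal sketch of those steps. The event format is my assumption (the question doesn't show one); I'm guessing strings like "swap 42" and "delete 42", so adapt the parsing to whatever eventList actually holds:

def process(eventList, idList):
    # id -> current index in the working list: one O(N) pass.
    indexOf = {elementId: i for i, elementId in enumerate(idList)}
    items = list(idList)  # working copy so we can swap in place
    deleted = set()       # ids marked for deletion, applied at the end

    for event in eventList:
        op, rawId = event.split()  # assumed format: "swap 42" / "delete 42"
        elementId = int(rawId)
        if op == "swap":
            if elementId in deleted:
                raise ValueError("cannot swap a deleted id")
            i = indexOf[elementId]
            otherId = items[i + 1]  # assumes the id is not the last element
            # Swap the two items and keep the index map in sync: all O(1).
            items[i], items[i + 1] = items[i + 1], items[i]
            indexOf[elementId], indexOf[otherId] = i + 1, i
        else:  # "delete"
            deleted.add(elementId)

    # Single O(N) pass to drop everything that was deleted.
    return [x for x in items if x not in deleted]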
I have an input of about 2-5 million strings of about 400 characters each, coming from a stored text file.
I need to check for duplicates before adding them to the list that I check (doesn't have to be a list, can be any other data type, the list is technically a set since all items are unique).
I can expect at most about 0.01% of my data to be non-unique, and I need to filter them out.
I'm wondering if there is any faster way for me to check if the item exists in the list rather than:
a = []
for item in data:
    if item not in a:
        a.append(item)
I do not want to lose the order.
Would hashing be faster (I don't need encryption)? But then I'd have to maintain a hash table for all the values to check first.
Is there any way I'm missing?
I'm on Python 2, and can go up to Python 3.5 at most.
It's hard to answer this question because it keeps changing ;-) The version I'm answering asks whether there's a faster way than:
a = []
for item in data:
    if item not in a:
        a.append(item)
That will be horridly slow, taking time quadratic in len(data). In any version of Python the following will take expected-case time linear in len(data):
seen = set()
for item in data:
    if item not in seen:
        seen.add(item)
        emit(item)
where emit() does whatever you like (append to a list, write to a file, whatever).
In comments I already noted ways to achieve the same thing with ordered dictionaries (whether ordered by language guarantee in Python 3.7, or via the OrderedDict type from the collections package). The code just above is the most memory-efficient, though.
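For reference, a short sketch of the ordered-dict variant mentioned above:

from collections import OrderedDict

# Dict keys are unique, and OrderedDict preserves first-seen order
# (on Python 3.7+ a plain dict works too: list(dict.fromkeys(data))).
unique = list(OrderedDict.fromkeys(data))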
You can try this,
a = list(set(data))
A list is an ordered sequence of elements, whereas a set is a collection of distinct elements and is unordered, so note that this approach does not preserve the original order.
My problem is that I need a list with a length of 6:
list=[[],[],[],[],[],[]]
Ok, that's not difficult. Next I'm going to insert integers into the list:
list=[[60],[47],[0],[47],[],[]]
Here comes the real problem: How can I now extend the lists and fill them again and so on, so that it looks something like that:
list=[[60,47,13],[47,13,8],[1,3,1],[13,8,5],[],[]]
I can't find a solution because, at the beginning, I do not know the length of each list. I know they are all the same length, but I'm not able to say exactly what length they will have at the end, so I'm forced to add elements to each of these lists one at a time, but for some reason I can't.
Btw: This is not homework, it's part of a private project :)
You don't. You use normal list operations to add elements.
L[0].append(47)
Don't use the name list for your variable; it conflicts with the built-in function list().
my_list = [[],[],[],[],[],[]]
my_list[0].append(60)
my_list[1].append(47)
my_list[2].append(0)
my_list[3].append(47)
print(my_list)  # prints [[60], [47], [0], [47], [], []]
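If later rounds add one value to each inner list, a loop over the outer list is enough. A sketch with made-up values for the next round:

next_values = [47, 13, 1, 8]  # hypothetical data for the first four lists
for sub, value in zip(my_list, next_values):
    sub.append(value)
# my_list is now [[60, 47], [47, 13], [0, 1], [47, 8], [], []]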
I'm a Python newbie. I have a series of objects that need to be inserted at specific indices of a list, but they come out of order, so I can't just append them. How can I grow the list whenever necessary to avoid IndexErrors?
def set(index, item):
    if len(nodes) <= index:
        pass  # Grow list to index+1 (this is the part I'm asking about)
    nodes[index] = item
I know you can create a list with an initial capacity via nodes = (index+1) * [None] but what's the usual way to grow it in place? The following doesn't seem efficient:
for _ in xrange(len(nodes), index+1):
    nodes.append(None)
In addition, I suppose there's probably a class in the Standard Library that I should be using instead of built-in lists?
This is the best way of doing it.
>>> lst.extend([None]*additional_size)
Oops, seems like I misunderstood your question at first. If you are asking how to expand the length of a list so you can insert something at an index larger than the current length of the list, then lst.extend([None]*(new_size - len(lst))) would probably be the way to go, as others have suggested. Of course, if you know in advance what the maximum index you will be needing is, it would make sense to create the list in advance and fill it with Nones.
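Putting that together, a small helper (set_at is my own hypothetical name, not anything standard):

def set_at(lst, index, item, fill=None):
    # Grow lst with `fill` until lst[index] is a valid position, then assign.
    if index >= len(lst):
        lst.extend([fill] * (index + 1 - len(lst)))
    lst[index] = item

nodes = []
set_at(nodes, 3, "x")  # nodes is now [None, None, None, 'x']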
For reference, I leave the original text: to insert something in the middle of the existing list, the usual way is not to worry about growing the list yourself. List objects come with an insert method that will let you insert an object at any point in the list. So instead of your set function, just use
lst.insert(index, item)
or you could do
lst[index:index] = [item]
which does the same thing. Python will take care of resizing the list for you.
There is not necessarily any class in the standard library that you should be using instead of list, especially if you need this sort of random-access insertion. However, there are some classes in the collections module which you should be aware of, since they can be useful for other situations (e.g. if you're always appending to one end of the list, and you don't know in advance how many items you need, deque would be appropriate).
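For completeness, a tiny deque sketch:

from collections import deque

d = deque([2, 3])
d.append(4)      # O(1) append on the right
d.appendleft(1)  # O(1) append on the left, unlike list.insert(0, ...)
# d is now deque([1, 2, 3, 4])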
Perhaps something like:
lst += [None] * additional_size
(you shouldn't call your list variable list, since it is also the name of the list constructor).
I have a config file that contains a list of strings. I need to read these strings in order and store them in memory and I'm going to be iterating over them many times when certain events take place. Since once they're read from the file I don't need to add or modify the list, a tuple seems like the most appropriate data structure.
However, I'm a little confused about the best way to construct the tuple, since it's immutable. Should I parse the strings into a list and then put them in a tuple? Is that wasteful? Is there a way to get them into a tuple first, without the overhead of copying/destroying the tuple every time I add a new element?
As you said, you're going to read the data gradually, so a tuple isn't a good idea after all, as it's immutable.
Is there a reason for not using a simple list for holding the strings?
Since your data is changing, I am not sure you need a tuple. A list should do fine.
Look at the following, which should provide further information. Assigning a tuple is much faster than assigning a list. But if you are trying to modify elements every now and then, creating a tuple may not make much sense.
Are tuples more efficient than lists in Python?
I wouldn't worry about the overhead of first creating a list and then a tuple from that list. My guess is that the overhead will turn out to be negligible if you measure it.
On the other hand, I would stick with the list and iterate over that instead of creating a tuple. Tuples should be used for struct like data and list for lists of data, which is what your data sounds like to me.
with open("config") as infile:
config = tuple(infile)
You may want to try using chained generators to create your tuple. You can use the generators to perform multiple filtering and transformation operations on your input without creating intermediate lists. All of the generator processing is delayed until iteration. In the example below the processing/iteration all happens on the last line.
Like so:
f = open('settings.cfg')
# Keep only lines that plausibly hold a "key: value" pair,
# split once on ':', and strip whitespace from both pieces.
step1 = (tuple(i.strip() for i in l.split(':', 1)) for l in f if len(l) > 2 and ':' in l)
# For "Tag" keys whose value contains commas, split the value into a list;
# otherwise pass the value through unchanged (old-style and/or idiom).
step2 = ((l[0], ',' in l[1] and 'Tag' in l[0] and l[1].split(',') or l[1]) for l in step1)
t = tuple(step2)