I have values in a list of lists.
I would like to send the whole block to a conversion function which then returns all the converted values in the same structure.
my_list = [sensor1, ..., sensor4]  # each sensor = [hum1, ..., hum3], each hum = [value1, value2, value3, value4]
So several nested lists
def conversion(my_list):
    for sensor in my_list:
        for hum in sensor:
            for value in hum:
                map(function, value)
Is there a way to do this as a one-liner with a list comprehension? I'm not sure how to use the map function in comprehensions, especially when you have several nested iterations.
map(function, value)
Since you are just mapping a function over each value, without collecting the return values in a list, using a list comprehension is not a good idea. You could do it, but you would be collecting list items that have no value, for the sole purpose of throwing them away later, just to save a few lines that actually serve a much better purpose: clearly showing what's going on, instead of hiding it in a single, long, complicated line.
So my advice would be to keep it as it is. It makes more sense like that and clearly shows what’s going on.
I am however collecting the values. They all need to be converted and saved in the same structure as they were.
In that case, you still don't want a list comprehension, as that would mean creating a new list for no real reason. Instead, just update the innermost list. To do that, you need to change the way you're iterating, though:
for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = map(function, value)
This will update the inner list.
Alternatively, since value is actually a list of values, you can also replace the value list’s contents using the slicing syntax:
for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = map(function, value)
Also one final note: if you are using Python 3, remember that map returns an iterator, so you need to convert its result to a list first using list(map(function, value)), or use a list comprehension for that part: [function(v) for v in value].
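On Python 3, the in-place version above would then look something like this (a minimal sketch; function stands for whatever conversion you use):

for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = [function(v) for v in value]  # or: list(map(function, value))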
This is the right way to do it. You could use a list comprehension for this, but you shouldn't, both for readability and because it's probably not faster anyway.
Related
I need the length of a list that is the value for a key in a dictionary, and I have to use this length in a for loop. Is it better to fetch the length of the list associated with the key every time, or to fetch the length from a second dictionary that has the same keys?
I am using len() in the for loop as of now.
len() is very fast; it runs in constant time (see Cost of len() function), so I would not build a new data structure just to cache its answer. Just use it each time you need it.
Building a whole extra data structure would definitely use more resources and would most likely be slower. Just make sure you write your loop over my_dict.items(), not over the keys, so you don't unnecessarily redo the key lookups inside the loop.
E.g., use something like this for efficient looping over your dict:
my_dict = <some dict where the values are lists>

for key, value in my_dict.items():
    # use key, value (your list) and len(value) (its length) as needed
    ...
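For instance, with a made-up dict (hypothetical data, just to show the pattern):

my_dict = {"a": [1, 2, 3], "b": [4, 5]}  # hypothetical example data
for key, value in my_dict.items():
    print(key, len(value))  # prints: a 3, then b 2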
I have an input of about 2-5 millions strings of about 400 characters each, coming from a stored text file.
I need to check for duplicates before adding them to the list that I check (doesn't have to be a list, can be any other data type, the list is technically a set since all items are unique).
I can expect about 0.01% at max of my data to be non-unique and I need to filter them out.
I'm wondering if there is any faster way for me to check if the item exists in the list rather than:
a = []
for item in data:
    if item not in a:
        a.append(item)
I do not want to lose the order.
Would hashing be faster (I don't need encryption)? But then I'd have to maintain a hash table for all the values to check first.
Is there any way I'm missing?
I'm on Python 2 and can at most go up to Python 3.5.
It's hard to answer this question because it keeps changing ;-) The version I'm answering asks whether there's a faster way than:
a = []
for item in data:
    if item not in a:
        a.append(item)
That will be horridly slow, taking time quadratic in len(data). In any version of Python the following will take expected-case time linear in len(data):
seen = set()
for item in data:
    if item not in seen:
        seen.add(item)
        emit(item)
where emit() does whatever you like (append to a list, write to a file, whatever).
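For example, with emit() chosen to collect the unique items into a list (one concrete choice, assuming data is your input sequence):

seen = set()
result = []
for item in data:
    if item not in seen:
        seen.add(item)
        result.append(item)  # "emit": keep the first occurrence, in input order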
In comments I already noted ways to achieve the same thing with ordered dictionaries (whether ordered by language guarantee as of Python 3.7, or via the OrderedDict type from the collections module). The code just above is the most memory-efficient, though.
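A sketch of that ordered-dictionary approach: dict keys are unique, and OrderedDict (or a plain dict on Python 3.7+) preserves insertion order, so fromkeys() deduplicates while keeping the first occurrence of each item:

from collections import OrderedDict

unique = list(OrderedDict.fromkeys(data))  # order-preserving dedup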
You can try this:
a = list(set(data))
A list is an ordered sequence of elements, whereas a set is an unordered collection of distinct elements.
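Note that this loses the original order, which the question needs to preserve; a quick hypothetical check:

data = ["b", "a", "b", "c"]
print(list(set(data)))  # order is arbitrary, e.g. ['a', 'c', 'b']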
What I have is a dictionary of words and I'm generating objects that contain
(1) Original word (e.g. cats)
(2) Alphabetized word (e.g. acst)
(3) Length of the word
Without knowing the length of the longest word, is it possible to create an array (or, in Python, a list) such that, as I scan through the dictionary, it will append an object with x chars into a list in array[x]?
For example, when I encounter the word "a", it will append the generated object to the list at array[1]. Next, for aardvark, it will append the generated object to the list at array[8], etc.
I thought about creating an array of size 1 and then adding on to it, but I'm not sure how it would work.
For example: for the first word, a, it will be appended to the list stored in array[1]. However, for the next word, aardvark, how am I supposed to check/generate more spots in the list until it hits 8? If I append to array, I need to give the append function an argument, but I can't give it just any argument, since I don't want to change previously entered values (e.g. 'a' in array[1]).
I'm trying to optimize my code for an assignment, so the alternative is going through the list a second time after I've determined the longest word. However, I think it would be better to do it as I alphabetize the words and create the objects such that I don't have to go through the lengthy dictionary twice.
Also, quick question about syntax: listOfStuff[x].append(y) will initialize/append to the list within listOfStuff at the value x with the value y, correct?
Store the lengths as keys in a dict rather than as indexes in a list. This is really easy if you use a defaultdict from the collections module - your algorithm will look like this:
from collections import defaultdict

results = defaultdict(list)
for word in words:
    results[len(word)].append(word)
This ties in to your second question: listOfStuff[x].append(y) will append to a list that already exists at listOfStuff[x]. It will not create a new one if that index hasn't already been initialised with a (possibly empty) list. If x isn't a valid index into the list (e.g. x=3 into a listOfStuff of length 2), you'll get an IndexError. If the index exists but holds something other than a list, you will probably get an AttributeError.
Using a dict takes care of the first problem for you - assigning to a non-existent dict key is always valid. Using a defaultdict extends this idea to also reading from a non-existent key - it will insert a default value given by calling the function you give the defaultdict when you create it (in this case, we gave it list, so it calls it and gets an empty list) into the dict the first time you use it.
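A tiny hypothetical demo of that read-from-missing-key behaviour:

from collections import defaultdict

d = defaultdict(list)
d[3].append("cat")  # no KeyError: d[3] is created as [] on first access
print(d[3])         # ['cat']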
If you can't use collections for some reason, the next best way is still to use dicts - they have a method called setdefault that works similarly to defaultdicts. You can use it like this:
results = {}
for word in words:
    results.setdefault(len(word), []).append(word)
As you can see, setdefault takes two arguments: a key and a default value. If the key already exists in the dict, setdefault just returns its current value, as if you'd done results[key]. If that would be an error, however, it inserts the second argument into the dictionary at that key, and then returns it. This is a little clunkier to use than defaultdict, but for an empty-list default it is otherwise the same. (defaultdict is better when the default is expensive to create, since it only calls the factory function as needed, whereas the default passed to setdefault has to be built up front.)
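For a quick sanity check, here's the setdefault version on a hypothetical word list (the defaultdict version produces the same grouping):

words = ["a", "be", "do", "cat", "aardvark"]  # hypothetical sample input
results = {}
for word in words:
    results.setdefault(len(word), []).append(word)
print(results)  # on Python 3.7+: {1: ['a'], 2: ['be', 'do'], 3: ['cat'], 8: ['aardvark']}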
It is technically possible to do this with nested lists, but it is ugly. You have to:
Detect the case that the list isn't big enough
Figure out how many more elements the list needs
Grow the list to that size
The most Pythonic way to do the first bit is to catch the error (something you could also do with dicts if setdefault and defaultdict didn't exist). The whole thing looks like this:
results = []
for word in words:
    try:
        results[len(word)]
    except IndexError:
        # Grow the list so that the new highest index is len(word)
        new_length = len(word) + 1
        difference = new_length - len(results)
        results.extend([] for _ in range(difference))
    finally:
        results[len(word)].append(word)
Stay with dicts to avoid this kind of mess. Lists are specifically optimised for the case where the exact numeric index of an element isn't meaningful outside the list, which doesn't match your use case. This kind of code is really common when there's a mismatch between what your code needs to do and what your data structures are good at, and it's worth learning as early as possible how to avoid it.
This is a question and answer I wanted to share, since I found it very useful.
Suppose I have a dictionary accessible with different keys. And at each position of the dictionary I have a list of a fixed length:
a = {}
a["hello"] = [2, 3, 4]
a["bye"] = [0, 10, 100]
a["goodbye"] = [2, 5, 50]
I was interested in computing the sum across all entries of a, using only position 1 of their respective lists.
In the example, I wanted to sum:
finalsum = sum([3, 10, 5])  # ----> 18
Just skip the keys entirely, since they don't really matter.
sum(i[1] for i in a.itervalues())
Also as a side note, you don't need to do a.keys() when iterating over a dict, you can just say for key in a and it will use the keys.
You can use a.values() to get a list of all the values in a dict. As far as I can tell, the keys are irrelevant. a.itervalues() works by iterating rather than constructing a new list. By using this, and a generator expression as the argument to sum, there are no extraneous lists created.
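For reference, on Python 3 itervalues() no longer exists; values() already returns a lazy view, so the same idea becomes:

finalsum = sum(v[1] for v in a.values())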
I used a list comprehension for my one-line solution (here separated into two lines):
elements = [a[pos][1] for pos in a.keys()]  # ----> [3, 5, 10]
finalsum = sum(elements)
I'm happy with this solution :) , but, any other suggestions?
I have a config file that contains a list of strings. I need to read these strings in order and store them in memory and I'm going to be iterating over them many times when certain events take place. Since once they're read from the file I don't need to add or modify the list, a tuple seems like the most appropriate data structure.
However, I'm a little confused about the best way to construct the tuple in the first place, since it's immutable. Should I parse the strings into a list and then put them in a tuple? Is that wasteful? Is there a way to get them into a tuple directly, without the overhead of copying/destroying a tuple every time I add a new element?
As you said, you're going to read the data gradually - so a tuple isn't a good idea after all, as it's immutable.
Is there a reason for not using a simple list for holding the strings?
Since your data is changing, I am not sure you need a tuple. A list should do fine.
Look at the following, which should provide further information. Assigning a tuple is much faster than assigning a list, but if you are going to modify elements every now and then, creating a tuple may not make sense:
Are tuples more efficient than lists in Python?
I wouldn't worry about the overhead of first creating a list and then a tuple from that list. My guess is that the overhead will turn out to be negligible if you measure it.
On the other hand, I would stick with the list and iterate over that instead of creating a tuple. Tuples should be used for struct-like data and lists for sequences of data, which is what your data sounds like to me.
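To illustrate that rule of thumb (a hypothetical example):

point = (12.5, -3.0)           # tuple: struct-like, fixed fields with distinct meanings
readings = [0.1, 0.4, 0.35]    # list: a homogeneous sequence of data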
with open("config") as infile:
config = tuple(infile)
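Note that each element keeps its trailing newline; if you don't want that, strip it on the way in (a minimal sketch):

with open("config") as infile:
    config = tuple(line.rstrip("\n") for line in infile)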
You may want to try using chained generators to create your tuple. You can use the generators to perform multiple filtering and transformation operations on your input without creating intermediate lists. All of the generator processing is delayed until iteration. In the example below the processing/iteration all happens on the last line.
Like so:
f = open('settings.cfg')
# step1: keep only plausible "key: value" lines, split on the first colon, strip each piece
step1 = (tuple(i.strip() for i in l.split(':', 1)) for l in f if len(l) > 2 and ':' in l)
# step2: for keys containing 'Tag' whose value contains commas, split the value into a list
step2 = ((l[0], ',' in l[1] and 'Tag' in l[0] and l[1].split(',') or l[1]) for l in step1)
t = tuple(step2)
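For instance, given a hypothetical settings.cfg containing:

Name: server1
Tags: a, b, c

t comes out as (('Name', 'server1'), ('Tags', ['a', ' b', ' c'])). Note that only the top-level key/value split is stripped, so the comma-split pieces keep their leading spaces.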