I was reading an article and came across the piece of code below. I ran it and it worked for me:
x = df.columns
x_labels = [v for v in sorted(x.unique())]
x_to_num = {p[1]:p[0] for p in enumerate(x_labels)}
# Up to here it is clear, but I don't understand what is going on with this map.
x.map(x_to_num)
The final result from the map is given below:
Int64Index([ 0, 3, 28, 1, 26, 23, 27, 22, 20, 21, 24, 18, 10, 7, 8, 15, 19,
13, 14, 17, 25, 16, 9, 11, 6, 12, 5, 2, 4],
dtype='int64')
Can someone please explain how .map() works here? I searched online but could not find anything related.
PS: df is a pandas DataFrame.
Let's look at what the map() function does in Python in general.
>>> l = [1, 2, 3]
>>> list(map(str, l))
# ['1', '2', '3']
Here a list of numeric elements is converted to a list of strings.
So map applies a function to every element of an iterable.
You were probably confused because the general syntax of map (map(MappingFunction, IterableObject)) is not used here and things still work.
The variable x plays the role of IterableObject, while the dictionary x_to_num holds the mapping and plays the role of MappingFunction: each column label is looked up in the dictionary and replaced by its position in the sorted list of labels.
Edit: the idea itself is not specific to pandas, but the .map method is. x is a pandas Index, whose .map accepts a dictionary. With a plain iterable you would write map(x_to_num.get, x) instead.
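To make the lookup concrete, here is a minimal pandas-free sketch of what x.map(x_to_num) computes. The column labels are made up for illustration and stand in for df.columns.

```python
# Hypothetical column labels standing in for df.columns
columns = ["price", "area", "zip", "beds"]

# Same two steps as in the question: sort the labels, then number them
x_labels = sorted(set(columns))
x_to_num = {label: position for position, label in enumerate(x_labels)}

# Equivalent of x.map(x_to_num): look every label up in the dictionary,
# keeping the original column order
mapped = [x_to_num[label] for label in columns]
print(mapped)
```

Each number is the position the label would have after sorting, which is why the result in the question looks like a shuffled range.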
I am trying to write to a CSV file. I want to write three variables on a row and then write a variable number of columns.
So for example my script will do a bunch of calculations and come up with the idea that I need 12 columns.
So the 'variable' needs to contain column 0 thru 11.
How to do this dynamically?
numberofcolumns = 12
with open(f+".csv",'wb') as output_csvfile:
    filewriter = csv.writer(output_csvfile)
    filewriter.writerow([constant1,constant2,constant3,variable[0],...,variable[n]])
What I want is to do
filewriter.writerow([constant1, constant2, constant3, variable[0], variable[1],....,variable[11]])
However, the last index may not be 11; it may be 8 or 10 or whatever, because the length is dynamic. How can I make this code write up to the Nth column if writerow() isn't defined to take *args?
What martineau pointed out in a comment is correct. writerow accepts a list, or sequence, of any length.
So you could do something like the following:
variable = list(range(12))  # list() is needed in Python 3, where range is not a list
# Change your writerow line to be something like this:
filewriter.writerow([constant1,constant2,constant3] + variable)
range here is just a way of creating a sequence of however many items. In Python 3, wrap it in list() so that it can be concatenated with another list.
Notice that the above example uses + to put two sequences/lists together.
Here's an example of that from the command line/repl:
>>> variable = list(range(12))
>>> variable
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> ["x", "y", "z"] + variable
['x', 'y', 'z', 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
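Putting this together, the following is a small Python 3 sketch of writing rows with a varying number of trailing columns. The helper name write_row and the placeholder constants are assumptions for illustration, and an in-memory buffer is used in place of a real file.

```python
import csv
import io


def write_row(writer, constants, values):
    """Write the fixed columns followed by however many values there are."""
    writer.writerow(list(constants) + list(values))


# In-memory stand-in for a file opened with open(path, "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)

write_row(writer, ["a", "b", "c"], range(12))  # 3 fixed + 12 variable columns
write_row(writer, ["a", "b", "c"], range(8))   # 3 fixed + 8 variable columns

rows = buffer.getvalue().splitlines()
print(rows)
```

Because writerow takes a single sequence, the number of columns is simply the length of the list passed in, so no *args support is needed.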
I have a dictionary containing a variable number of numpy arrays (all same length), each array is stored in its respective key.
For each index I want to replace the value in one of the arrays by a newly calculated value. (This is a very simplified version of what I'm actually doing.)
The problem is that when I try this as shown below, the value at the current index of every array in the dictionary is replaced, not just the one I specify.
Sorry if the formatting of the example code is confusing, it's my first question here (Don't quite get how to show the line example_dict["key1"][idx] = idx+10 properly indented in the next line of the for loop...).
>>> import numpy as np
>>> example_dict = dict.fromkeys(["key1", "key2"], np.array(range(10)))
>>> example_dict["key1"]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> example_dict["key2"]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> for idx in range(10):
...     example_dict["key1"][idx] = idx+10
...
>>> example_dict["key1"]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> example_dict["key2"]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
I expected the loop to only access the array in example_dict["key1"], but somehow the same operation is applied to the array stored in example_dict["key2"] as well.
>>> hex(id(example_dict["key1"]))
'0x26a543ea990'
>>> hex(id(example_dict["key2"]))
'0x26a543ea990'
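example_dict["key1"] and example_dict["key2"] point to the same object: dict.fromkeys evaluates its value argument once and assigns that single array to every key. To fix this, use a dict comprehension, which creates a new array for each key.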
import numpy
keys = ["key1", "key2"]
example_dict = {key: numpy.array(range(10)) for key in keys}
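The same aliasing can be reproduced without numpy. The sketch below uses plain lists instead of arrays (an assumption made only to keep it dependency-free) and compares dict.fromkeys with a dict comprehension.

```python
keys = ["key1", "key2"]

# dict.fromkeys stores the very same list object under every key
shared = dict.fromkeys(keys, list(range(3)))
shared["key1"][0] = 99
print(shared["key1"] is shared["key2"], shared["key2"])

# The comprehension evaluates list(range(3)) once per key,
# so each key gets its own independent list
separate = {key: list(range(3)) for key in keys}
separate["key1"][0] = 99
print(separate["key1"] is separate["key2"], separate["key2"])
```

Any mutable value behaves this way with dict.fromkeys, which is why it is usually reserved for immutable defaults such as None or 0.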
I have a list of lists. The inner lists look like the following:
[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23],
[9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]
Every small list has 8 values from 0-23 and no value repeats within a small list.
What I need now are three lists that together contain all the values 0-23. There may be several combinations that accomplish this, but I only need one.
In this particular case the output would be:
[0,2,5,8,7,12,16,18], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15]
I thought to do something with the order but I'm not a python pro so it is hard for me to handle all the lists within the list (to compare all).
Thanks for your help.
The following appears to work:
from itertools import combinations, chain
lol = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
for p in combinations(lol, 3):
    if len(set(chain.from_iterable(p))) == 24:
        print(p)
        break  # if only one is required
This displays the following:
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
If it is always the case that three lists together form the numbers 0-23 and you only want the first such combination, you can generate the combinations of length 3 and check that their pairwise set intersections are empty:
>>> li = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
>>> import itertools
>>> for t in itertools.combinations(li, 3):
...     if not set(t[0]) & set(t[1]) and not set(t[0]) & set(t[2]) and not set(t[1]) & set(t[2]):
...         print(t)
...         break
...
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
Let's do a recursive solution.
We need a list of lists that contain these values:
target_set = set(range(24))
This is a function that recursively tries to find a list of lists that match exactly that set:
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        # Done
        return []
    if not list_of_lists:
        # Failed
        raise ValueError()
    # Two cases -- either the first element works, or it doesn't
    try:
        first_as_set = set(list_of_lists[0])
        if first_as_set <= target_set:
            # If it's a subset, call this recursively for the rest
            return [list_of_lists[0]] + find_covering_lists(
                target_set - first_as_set, list_of_lists[1:])
    except ValueError:
        pass  # The recursive call failed to find a solution
    # If we get here, the first element failed.
    return find_covering_lists(target_set, list_of_lists[1:])
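As a usage sketch, here is the function applied to the lists from the question. The function body is repeated only so that the snippet runs on its own; the variable names lol and result are chosen for illustration.

```python
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        return []  # nothing left to cover
    if not list_of_lists:
        raise ValueError("no combination covers the target set")
    try:
        first_as_set = set(list_of_lists[0])
        if first_as_set <= target_set:
            # Use the first list and cover the remainder with the rest
            return [list_of_lists[0]] + find_covering_lists(
                target_set - first_as_set, list_of_lists[1:])
    except ValueError:
        pass  # using the first list led to a dead end
    # Skip the first list and try again with the rest
    return find_covering_lists(target_set, list_of_lists[1:])


lol = [[0, 2, 5, 8, 7, 12, 16, 18], [0, 9, 18, 23, 5, 8, 15, 16],
       [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15],
       [2, 8, 23, 0, 7, 16, 9, 15], [0, 5, 8, 7, 9, 11, 20, 16]]

result = find_covering_lists(set(range(24)), lol)
print(result)
```

Unlike the combinations-based answers, this version does not need to know in advance how many lists make up the cover; it raises ValueError when no exact cover exists.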
I have multiple .txt files that contain multiple lines similar to this:
[class1] 1:-28 9:-315 13:-354227 2:-36.247 17:-342 8:-34 14:-3825
[class2] 14:-31.8679 7:-32.3582 2:-32.4127 1:-32.7257 8:-32.9804 16:-33.2156
I want to know how to read the numbers before the :s and store them in an array.
>>> import re
>>> text = "[class1] 1:-28 9:-315 13:-354227 2:-36.247 17:-342 8:-34 14:-3825"
>>> list(map(int, re.findall(r'(\S+):\S+', text))) # You could also do map(float, ...)
[1, 9, 13, 2, 17, 8, 14]
I would use regex, but here is a version without it, clearer than @Thrustmaster's solution IMO.
>>> text = "[class1] 1:-28 9:-315 13:-354227 2:-36.247 17:-342 8:-34 14:-3825"
>>> [int(x.split(':')[0]) for x in text.split()[1:]]
[1, 9, 13, 2, 17, 8, 14]
Or without using RE, if you know for sure the syntax of the file remains the same, you could use this:
>>> arr
['[class1] 1:-28 9:-315 13:-354227 2:-36.247 17:-342 8:-34 14:-3825', '[class2] 14:-31.8679 7:-32.3582 2:-32.4127 1:-32.7257 8:-32.9804 16:-33.2156']
>>> newArr = [[int(y[:y.index(":")]) for y in x.split(" ")[1:]] for x in arr]
>>> newArr
[[1, 9, 13, 2, 17, 8, 14], [14, 7, 2, 1, 8, 16]]
UPDATE:
If you have several files, maybe you would do something like this (based on @jamylak's clearer version of my solution):
[[[int(x.split(':')[0]) for x in line.split()[1:]] for line in open(fileName)] for fileName in fileNames]
where fileNames is the list of files you are talking about.
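For readability, the nested comprehension can also be unrolled into a small function. This is only a sketch: the name read_indices is made up, and it assumes every line starts with a bracketed label followed by index:value pairs, as in the question.

```python
def read_indices(lines):
    """Return, for each line, the integers that appear before a colon."""
    result = []
    for line in lines:
        fields = line.split()[1:]  # skip the leading "[classN]" label
        result.append([int(field.split(":")[0]) for field in fields])
    return result


sample = [
    "[class1] 1:-28 9:-315 13:-354227 2:-36.247 17:-342 8:-34 14:-3825",
    "[class2] 14:-31.8679 7:-32.3582 2:-32.4127 1:-32.7257 8:-32.9804 16:-33.2156",
]
indices = read_indices(sample)
print(indices)
```

Since an open file is itself an iterable of lines, the same function works on files, for example `with open(fileName) as handle: read_indices(handle)`, which also makes sure each file is closed.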