Related
[[[1.0, 1.0], [1.0, 10.0], [1.0, 11.0], [1.0, 12.0], [1.0, 13.0]],
[[2.0, 1.0], [2.0, 10.0], [2.0, 11.0], [2.0, 12.0], [2.0, 13.0]],
[[3.0, 1.0], [3.0, 10.0], [3.0, 11.0], [3.0, 12.0], [3.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]]]
If I have a list like this, how do I return the duplicate rows as different groups?
I don't know how to compare whether two lists are the same when they contain sub-lists.
For this problem, the ideal output would be:
[[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]]]
Convert each row into a sorted tuple so rows can be compared, then group equal rows. Any group with more than one element is a set of duplicates.
Input:
ll = [[[1.0, 1.0], [1.0, 10.0], [1.0, 11.0], [1.0, 12.0], [1.0, 13.0]],
[[2.0, 1.0], [2.0, 10.0], [2.0, 11.0], [2.0, 12.0], [2.0, 13.0]],
[[3.0, 1.0], [3.0, 10.0], [3.0, 11.0], [3.0, 12.0], [3.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]]]
import itertools

# sort each row's sublists and convert to a tuple so equal rows compare equal
l = [tuple(sorted(lst)) for lst in ll]

out = []
# groupby only groups adjacent equal items, so sort the rows first
for k, g in itertools.groupby(sorted(l)):
    g = list(g)
    if len(g) > 1:
        out.append(g)
print(out)
[[([4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]),
([4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0])]]
Or as a one-liner:
out = list(filter(lambda s: len(s) > 1, [list(g) for k, g in itertools.groupby(sorted(tuple(sorted(lst)) for lst in ll))]))
You can find all nested elements whose count is greater than one and keep them with a list comprehension.
l=[[[1.0, 1.0], [1.0, 10.0], [1.0, 11.0], [1.0, 12.0], [1.0, 13.0]],
[[2.0, 1.0], [2.0, 10.0], [2.0, 11.0], [2.0, 12.0], [2.0, 13.0]],
[[3.0, 1.0], [3.0, 10.0], [3.0, 11.0], [3.0, 12.0], [3.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]]]
[x for x in l if l.count(x)>1]
#output
[[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]],
[[4.0, 1.0], [4.0, 10.0], [4.0, 11.0], [4.0, 12.0], [4.0, 13.0]]]
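Note that l.count(x) rescans the whole list for every row, which is quadratic for large inputs. A hedged alternative sketch using collections.Counter (the rows are converted to tuples only to make them hashable; the variable names here are made up):
from collections import Counter

counts = Counter(tuple(map(tuple, row)) for row in l)
duplicates = [row for row in l if counts[tuple(map(tuple, row))] > 1]
print(duplicates)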
I have a dataset that consists of ID (participant), run, index number (that is, the index number of a slalom turn) and performance (which could be velocity or time). In addition, for each id and run I know at which index in the slalom turn they actually start to turn.
My goal is to create a new column in the dataframe that contains 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.
For example:
For ID1, the point where this skier starts to turn is index 4 for the first run and 9 for the second run. Therefore, for the first run I want all rows in the new column to contain 0 until index number 4 and 1 thereafter. For the second run I want all rows to contain 0 until index number 9 and 1 thereafter.
Is there a simple way to do this with pandas or vanilla python?
example = [[1.0, 1.0, 1.0, 0.6912982024915187],
[1.0, 1.0, 2.0, 0.16453900411106737],
[1.0, 1.0, 3.0, 0.11362801727310845],
[1.0, 1.0, 4.0, 0.587778444335624],
[1.0, 1.0, 5.0, 0.8455388913351765],
[1.0, 1.0, 6.0, 0.5719366584505648],
[1.0, 1.0, 7.0, 0.4665520044952449],
[1.0, 1.0, 8.0, 0.9105152709573275],
[1.0, 1.0, 9.0, 0.4600099001744885],
[1.0, 1.0, 10.0, 0.8577060884077763],
[1.0, 2.0, 1.0, 0.11550722410813963],
[1.0, 2.0, 2.0, 0.5729090378222077],
[1.0, 2.0, 3.0, 0.43990164344919824],
[1.0, 2.0, 4.0, 0.595242293948498],
[1.0, 2.0, 5.0, 0.443684017624451],
[1.0, 2.0, 6.0, 0.3608135854303052],
[1.0, 2.0, 7.0, 0.28525404982906766],
[1.0, 2.0, 8.0, 0.11561422303194391],
[1.0, 2.0, 9.0, 0.8579134051748011],
[1.0, 2.0, 10.0, 0.540598113345226],
[2.0, 1.0, 1.0, 0.4058570295736075],
[2.0, 1.0, 2.0, 0.9422426000325298],
[2.0, 1.0, 3.0, 0.7918655742964762],
[2.0, 1.0, 4.0, 0.4145753321336241],
[2.0, 1.0, 5.0, 0.5256388261997529],
[2.0, 1.0, 6.0, 0.8140335187050629],
[2.0, 1.0, 7.0, 0.12134416740848841],
[2.0, 1.0, 8.0, 0.9016748379372173],
[2.0, 1.0, 9.0, 0.462241316800442],
[2.0, 1.0, 10.0, 0.7839715857746699],
[2.0, 2.0, 1.0, 0.5300527244824904],
[2.0, 2.0, 2.0, 0.8784844676567194],
[2.0, 2.0, 3.0, 0.14395673182343738],
[2.0, 2.0, 4.0, 0.7606405990262495],
[2.0, 2.0, 5.0, 0.5123048342846208],
[2.0, 2.0, 6.0, 0.25608277502943655],
[2.0, 2.0, 7.0, 0.4264542956426933],
[2.0, 2.0, 8.0, 0.9144976708651866],
[2.0, 2.0, 9.0, 0.875888479621729],
[2.0, 2.0, 10.0, 0.3428732760552141]]
turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively
turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively
df = pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])
I believe it is a better idea to turn turnPhase into a dictionary, and then use apply:
turn_dict = {1: [4, 9],
             2: [2, 5]}
We also need to change the column types, because dictionary keys and list indexes must be int:
df['id'] = df['id'].astype(int)
df['index'] = df['index'].astype(int)
Finally, apply:
df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run']) - 1] else 1, axis=1)
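A hedged alternative sketch (not part of the answer above): build a small lookup table of turn-start indexes and merge it into the dataframe, then compare columns. The 'turn_start' helper column is made up for illustration; 'phase' follows the question.
df['run'] = df['run'].astype(int)
starts = pd.DataFrame(
    [(pid, r + 1, start)
     for pid, runs in turn_dict.items()
     for r, start in enumerate(runs)],
    columns=['id', 'run', 'turn_start'])
df = df.merge(starts, on=['id', 'run'], how='left')
# 1 from the turn-start index onwards, 0 before it (same convention as the lambda above)
df['phase'] = (df['index'] >= df['turn_start']).astype(int)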
I have two arrays that I want to add together by inserting the elements of each row of the first list into the same row in the second list. So instead of converting two 2x3 matrices into one 4x3 matrix, I want to convert them into one 2x6 matrix. I have tried the following, as well as .append and .extend:
test_1 = [[0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]
test_2 = [[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]]
test_3 = test_1 + test_2
print(test_3)
This gives me the output:
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
However, what I want is:
[[0.0, 0.0, 0.0, 1.0, 2.0, 3.0],[0.0, 0.0, 0.0, 4.0, 5.0, 6.0]]
How do I add the elements of one matrix into the same row on the other matrix?
test_3 = [a+b for a,b in zip(test_1,test_2)]
However, if you're eventually going to convert this to numpy, as most Python matrix processing does, then you can use np.concatenate to do this.
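A minimal sketch of that numpy route, assuming test_1 and test_2 are defined as above:
import numpy as np

a = np.array(test_1)
b = np.array(test_2)
# stack the two 2x3 arrays side by side along the column axis -> 2x6
test_3 = np.concatenate((a, b), axis=1)  # np.hstack((a, b)) is equivalent here
print(test_3.tolist())
# [[0.0, 0.0, 0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 4.0, 5.0, 6.0]]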
test_1 = [[0.0, 0.0, 0.0],[0.0, 0.0, 0.0]]
test_2 = [[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]]
test_3 = []
for i in range(len(test_1)):
    var = test_1[i] + test_2[i]
    test_3.append(var)
print(test_3)
I have a list of lists with sublists all of which contain float values.
For example the one below has 2 lists with sublists each:
mylist = [[[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]], [[2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]]]
I want to calculate the standard deviation and the mean of the sublists and what I applied was this:
mean = [statistics.mean(d) for d in mylist]
stdev = [statistics.stdev(d) for d in mylist]
but it also takes the 0.0 values into account, which I do not want, because I only set them to 0 so the entries would not be empty. Is there a way to ignore these 0s, as if they did not exist in the sublist, so they are not taken into consideration at all? I could not find a way with my current approach.
You can use numpy's nanmean and nanstd functions.
import numpy as np
def zero_to_nan(d):
    array = np.array(d)
    array[array == 0] = np.nan
    return array
mean = [np.nanmean(zero_to_nan(d)) for d in mylist]
stdev = [np.nanstd(zero_to_nan(d)) for d in mylist]
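Note that statistics.stdev computes the sample standard deviation (ddof=1), while np.nanstd defaults to the population standard deviation (ddof=0). If you want results comparable to the statistics module, pass ddof=1:
stdev = [np.nanstd(zero_to_nan(d), ddof=1) for d in mylist]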
You can do this with a list comprehension.
The following lambda function flattens the nested list into a single list and filters out all zeros:
flatten = lambda nested: [x for sublist in nested for x in sublist if x != 0]
Note that the list comprehension has two for clauses and one if clause, similar to this code snippet, which does essentially the same:
flat_list = []
for sublist in nested:
    for x in sublist:
        if x != 0:
            flat_list.append(x)
To apply this to your list you can use map. The map function returns an iterator, so pass it to list to get a list:
flat = list(map(flatten, mylist))
Now you can calculate the mean and standard deviation:
mean = [statistics.mean(d) for d in flat]
stdev = [statistics.stdev(d) for d in flat]
print(mean)
print(stdev)
mean = [statistics.mean([k for k in d if k != 0]) for d in mylist]
stdev = [statistics.stdev([k for k in d if k != 0]) for d in mylist]
Try:
mean = [statistics.mean([k for k in d if k]) for d in mylist]
stdev = [statistics.stdev([k for k in d if k]) for d in mylist]
I'm having some issues importing data from file in Python. I am quite new to Python, so my error is probably quite simple.
I am reading in 3-column, tab-delimited text files with no headers. I am creating 3 instances of the class using three different datafiles.
I can see that each object is referencing a different memory location, so they are separate.
When I look at the data stored in each instance, each instance has the same contents, consisting of the three datafiles appended to each other.
What have I done wrong?
The class to read in the data is:
class Minimal:
    def __init__(self, data=[]):
        self.data = data

    def readFile(self, filename):
        f = open(filename, 'r')
        for line in f:
            line = line.strip()
            columns = line.split()
            # creates a list of angle, intensity and error and appends it to the diffraction pattern
            self.data.append([float(columns[0]), float(columns[1]), float(columns[2])])
        f.close()

    def printData(self):
        for dataPoint in self.data:
            print str(dataPoint)
The datafiles look like:
1 4 2
2 5 2.3
3 4 2
4 6 2.5
5 8 5
6 10 3
The program I am using to actually create the instances of Minimal is:
from minimal import Minimal
d1 = Minimal()
d1.readFile("data1.xye")
d2 = Minimal()
d2.readFile("data2.xye")
d3 = Minimal()
d3.readFile("data3.xye")
print "Data1"
print d1
d1.printData()
print "\nData2"
print d2
d2.printData()
print "\nData3"
print d3
d3.printData()
The output is:
Data1
<minimal.Minimal instance at 0x016A35F8>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
Data2
<minimal.Minimal instance at 0x016A3620>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
Data3
<minimal.Minimal instance at 0x016A3648>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
The default value of data is evaluated only once, so the data attributes of all Minimal instances reference the same list.
>>> class Minimal:
...     def __init__(self, data=[]):
...         self.data = data
...
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
True
Replace it as follows:
>>> class Minimal:
...     def __init__(self, data=None):
...         self.data = data or []
...
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
False
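A slightly more explicit variant checks for None, so an explicitly passed empty list is kept as given (with data or [], an empty argument would be replaced by a fresh list):
>>> class Minimal:
...     def __init__(self, data=None):
...         self.data = data if data is not None else []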
See “Least Astonishment” and the Mutable Default Argument.
Consider the following:
def d():
    print("d() invoked")
    return 1

def f(p=d()):
    pass

print("Start")
f()
f()
It prints
d() invoked
Start
Not
Start
d() invoked
d() invoked
Why? Because default arguments are computed once, when the function is defined, and stored on the function object for reuse on every subsequent call. They are not recomputed on each invocation.
In other words, they behave more or less like:
_f_p_default = d()

def f(p=None):
    if p is None:
        p = _f_p_default
    pass
Make the above substitution in your code, and you will understand the problem immediately.
The correct form for your code was already provided by @falsetru; I'm just trying to explain the rationale.