How to plot numpy arrays in pandas dataframe - python

I have the DataFrame:
df =
sample_type observed_data
A [0.2, 0.5, 0.17, 0.1]
A [0.9, 0.3, 0.24, 0.5]
A [0.9, 0.5, 0.6, 0.39]
B [0.01, 0.07, 0.15, 0.26]
B [0.08, 0.14, 0.32, 0.58]
B [0.01, 0.16, 0.42, 0.41]
where the data type in the observed_data column is np.array. What's the easiest and most efficient way of plotting each of the numpy arrays overlaid on the same plot using matplotlib and/or plotly, showing A and B as separate colors or line types (e.g. dashed, dotted, etc.)?

You can use this...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'sample_type' : ['A', 'A', 'A', 'B', 'B', 'B'],
                   'observed_data' : [[0.2, 0.5, 0.17, 0.1], [0.9, 0.3, 0.24, 0.5], [0.9, 0.5, 0.6, 0.39],
                                      [0.01, 0.07, 0.15, 0.26], [0.08, 0.14, 0.32, 0.58], [0.01, 0.16, 0.42, 0.41]]})
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for ind, cell in df['observed_data'].items():
    if len(cell) > 0:
        if df.loc[ind, 'sample_type'] == 'A':
            plotted = plt.plot(np.linspace(0, 1, len(cell)), cell, color='blue', marker='o', linestyle='-.')
        else:
            plotted = plt.plot(np.linspace(0, 1, len(cell)), cell, color='red', marker='*', linestyle=':')
plt.show()
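A variant of the loop above, in case a legend with one entry per sample type is wanted. This is only a sketch: the STYLES table and the plot_by_type helper are names made up for illustration, and the x-axis is assumed to be evenly spaced on [0, 1] as in the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical style table: one entry per sample type.
STYLES = {
    "A": {"color": "blue", "linestyle": "-."},
    "B": {"color": "red", "linestyle": ":"},
}

def plot_by_type(df, ax):
    """Draw every array in observed_data on ax, styled by sample_type."""
    for sample_type, group in df.groupby("sample_type"):
        for i, cell in enumerate(group["observed_data"]):
            # Label only the first line of each group so the legend
            # shows one entry per sample type.
            label = sample_type if i == 0 else "_nolegend_"
            ax.plot(np.linspace(0, 1, len(cell)), cell,
                    label=label, **STYLES[sample_type])
    ax.legend()
    return ax

df = pd.DataFrame({
    "sample_type": ["A", "A", "A", "B", "B", "B"],
    "observed_data": [[0.2, 0.5, 0.17, 0.1], [0.9, 0.3, 0.24, 0.5],
                      [0.9, 0.5, 0.6, 0.39], [0.01, 0.07, 0.15, 0.26],
                      [0.08, 0.14, 0.32, 0.58], [0.01, 0.16, 0.42, 0.41]],
})
fig, ax = plt.subplots()
plot_by_type(df, ax)
```

Call plt.show() afterwards to display the figure. Adding another sample type only needs another entry in STYLES.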

Related

Pandas .apply with conditional if in one column

I have a dataframe as below. I am trying to check whether there is a 0 or 1 in the vector column; if yes,
add 10 / (x + 2) to that element x, otherwise keep the element unchanged.
df = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3'],
'vector': [[0.01, 0.07, 0.0, 0.14, 0.0, 0.55, 0.11],
[0.12, 0.27, 0.1, 0.14, 0.1, 0.09, 0.19],
[0.58, 0.07, 0.02, 0.14, 0.04, 0.06, 1]]})
df
Output:
user vector
0 user 1 [0.01, 0.07, 0.0, 0.14, 0.0, 0.55, 0.11]
1 user 2 [0.12, 0.27, 0.1, 0.14, 0.1, 0.09, 0.19]
2 user 3 [0.58, 0.07, 0.02, 0.14, 0.04, 0.06, 1]
I used the following code:
df['vector']=df.apply(lambda x: x['vector']+10/(x['vector']+2) if x['vector']==0|1 else x['vector'], axis=1)
But the Output:
user vector
0 user 1 [0.01, 0.07, 0.0, 0.14, 0.0, 0.55, 0.11]
1 user 2 [0.12, 0.27, 0.1, 0.14, 0.1, 0.09, 0.19]
2 user 3 [0.58, 0.07, 0.02, 0.14, 0.04, 0.06, 1]
The expected output:
Use a list comprehension (faster than apply):
df['vector'] = [[x+10/(x+2) if x in [0,1] else x for x in v] for v in df['vector']]
Output:
user vector
0 user 1 [0.01, 0.07, 5.0, 0.14, 5.0, 0.55, 0.11]
1 user 2 [0.12, 0.27, 0.1, 0.14, 0.1, 0.09, 0.19]
2 user 3 [0.58, 0.07, 0.02, 0.14, 0.04, 0.06, 4.333333333333334]
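For what it's worth, here is a small sketch of why the original apply call left the data untouched, next to the element-wise version from the answer. The transform helper is a name made up for illustration.

```python
vector = [0.58, 0.07, 0.02, 0.14, 0.04, 0.06, 1]

# `|` binds tighter than `==`, so `vector == 0|1` is parsed as
# `vector == (0 | 1)`, i.e. `vector == 1`.
bitwise_or = 0 | 1

# A list is never equal to the integer 1, so the condition is always
# False and the lambda always returns the vector unchanged.
whole_list_test = (vector == 0 | 1)

def transform(v):
    """Apply x + 10/(x + 2) only to elements equal to 0 or 1."""
    return [x + 10 / (x + 2) if x in (0, 1) else x for x in v]

result = transform(vector)
```

The comparison has to happen per element, which is what the list comprehension does.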

math.cos(t) and an array for t results in an error. how to solve this? [duplicate]

This question already has answers here:
Cosinus of an array in Python
(3 answers)
Closed 1 year ago.
So this is the code:
def s(t):
    return math.cos(2*(math.pi)*10*t)
s1_n = s(n1T)
Error message: "only size-1 arrays can be converted to Python scalars"
This is the output for
n1T
"array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
0.99, 1. ])"
These are the modules used:
from thkdsp import *
from dsplab import *
from numpy import arange, shape, array, zeros, size, ones, isscalar
%pylab inline
from thkdsp.interpolation import ideal_interpolation
from thkdsp.interpolation import ideal_interpolation_mat
How to solve this?
Thanks!
Found it myself :)
Just use numpy.cos(...) and it works perfectly.
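A minimal sketch of that fix: math.cos only accepts a single scalar, while numpy.cos is applied element-wise to the whole array. The array n1T is rebuilt here with np.linspace as an assumption about how it was created.

```python
import numpy as np

def s(t):
    # np.cos works element-wise on arrays and also accepts plain scalars
    return np.cos(2 * np.pi * 10 * t)

# Assumed reconstruction of n1T from the question: 0, 0.01, ..., 1
n1T = np.linspace(0, 1, 101)
s1_n = s(n1T)
```

The result has the same shape as the input, one cosine value per sample.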

Why does my precision change between arrays and lists?

I am trying to check for shared elements between two arrays. Here are the arrays and how they are formed:
# get range between two values from a df
d = pd.DataFrame([[0.06, 0.81]], columns=['start','stop'])
# rounding to 2 to enforce 2 sig digits
a = np.arange(
    np.around(d.iloc[0]['start'], 2),
    np.around(d.iloc[0]['stop'], 2),
    .01
)
and the other array:
b = np.around(0.6999999999999993,2), np.around(1.2400000000000002,2)
b = np.arange(
    b[0],
    b[1],
    .01
)
Now, I want to check if they share any values:
bool(set(a) & set(b))
This gives me False, but it should be True, because this is what a and b look like printed out:
# a
array([0.06, 0.07, 0.08, 0.09, 0.1 , 0.11, 0.12, 0.13, 0.14, 0.15, 0.16,
0.17, 0.18, 0.19, 0.2 , 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27,
0.28, 0.29, 0.3 , 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38,
0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49,
0.5 , 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6 ,
0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7 , 0.71,
0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ])
# b
array([0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1. , 1.01, 1.02,
1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23])
I get this weird behavior where if I print out a and b in my jupyter notebook, and then copy and paste the results into new variables and rerun the test, I get True. See below:
a = [0.06, 0.07, 0.08, 0.09, 0.1 , 0.11, 0.12, 0.13, 0.14, 0.15, 0.16,
0.17, 0.18, 0.19, 0.2 , 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27,
0.28, 0.29, 0.3 , 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38,
0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49,
0.5 , 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6 ,
0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7 , 0.71,
0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ]
b = [0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ,
0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 , 0.91,
0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1. , 1.01, 1.02,
1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 , 1.11, 1.12, 1.13,
1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.2 , 1.21, 1.22, 1.23]
bool(set(a) & set(b))
This made me think that the arrays needed to be lists, so I ran this with the original a and b data:
bool(set(list(a)) & set(list(b)))
but still False.
Any ideas?
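No answer is quoted for this question, so the following is only a hedged sketch of one likely explanation. np.arange with a float step builds each element by floating-point arithmetic, so an element that prints as 0.7 need not be the same double as the literal 0.7, and set membership compares exact values. Copying the printed output creates fresh literals, which would explain why the pasted lists match. Two possible workarounds, with the endpoints hard-coded from the question:

```python
import numpy as np

a = np.arange(0.06, 0.81, 0.01)
b = np.arange(0.7, 1.24, 0.01)

# Workaround 1: round both arrays onto the same 2-decimal grid
# before comparing, so equal-looking values become equal doubles.
shared_rounded = set(np.round(a, 2)) & set(np.round(b, 2))

# Workaround 2: do the arithmetic in integer hundredths, where
# equality is exact, and divide by 100 only for display.
a_int = np.arange(6, 81)
b_int = np.arange(70, 124)
shared_int = np.intersect1d(a_int, b_int)
```

For approximate comparisons without rounding, np.isclose is another option.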

Computation within a list of dictionaries

I have a list of dictionaries:
wt
Out[189]:
[defaultdict(int,
{'A01': 0.15,
'A02': 0.17,
'A03': 0.13,
'A04': 0.17,
'A05': 0.01,
'A06': 0.12,
'A07': 0.15,
'A08': 0.0,
'A09': 0.02,
'A10': 0.09}),
defaultdict(int,
{'A01': 0.02,
'A02': 0.02,
'A03': 0.06,
'A04': 0.08,
'A05': 0.08,
'A06': 0.04,
'A07': 0.02,
'A08': 0.24,
'A09': 0.34,
'A10': 0.1}),
defaultdict(int,
{'A01': 0.0,
'A02': 0.12,
'A03': 0.01,
'A04': 0.01,
'A05': 0.11,
'A06': 0.13,
'A07': 0.1,
'A08': 0.36,
'A09': 0.13,
'A10': 0.03})]
And I have another dictionary:
zz
Out[188]: defaultdict(int, {'S1': 0.44, 'S2': 0.44, 'S3': 0.12})
I need to run a loop to aggregate the following computation:
'S1':0.44 * 'A01':0.15 + 'S2':0.44 * 'A01':0.02 + 'S3':0.12 * 'A01':0.00 ----- to be stored in a dict with the key 'A01'
'S1':0.44 * 'A02':0.17 + 'S2':0.44 * 'A02':0.02 + 'S3':0.12 * 'A02':0.12 ----- to be stored in a dict with the key 'A02'
.
.
and so on, up to:
'S1':0.44 * 'A10':0.09 + 'S2':0.44 * 'A10':0.1 + 'S3':0.12 * 'A10':0.03 ----- to be stored in a dict with the key 'A10'
Can somebody please suggest a loop for this? The issue I'm facing is that:
wt[0]
Out[197]:
defaultdict(int,
{'A01': 0.15,
'A02': 0.17,
'A03': 0.13,
'A04': 0.17,
'A05': 0.01,
'A06': 0.12,
'A07': 0.15,
'A08': 0.0,
'A09': 0.02,
'A10': 0.09})
But:
wt[0][0]
Out[199]: 0
I'm not able to access each value within the dict.
You can do your aggregation with a dict comprehension:
from collections import defaultdict
x = [defaultdict(int, {'A01': 0.15, 'A02': 0.17, 'A03': 0.13, 'A04': 0.17, 'A05': 0.01, 'A06': 0.12, 'A07': 0.15, 'A08': 0.0, 'A09': 0.02, 'A10': 0.09}),
     defaultdict(int, {'A01': 0.02, 'A02': 0.02, 'A03': 0.06, 'A04': 0.08, 'A05': 0.08, 'A06': 0.04, 'A07': 0.02, 'A08': 0.24, 'A09': 0.34, 'A10': 0.1}),
     defaultdict(int, {'A01': 0.0, 'A02': 0.12, 'A03': 0.01, 'A04': 0.01, 'A05': 0.11, 'A06': 0.13, 'A07': 0.1, 'A08': 0.36, 'A09': 0.13, 'A10': 0.03})]
mult = defaultdict(int, {'S1': 0.44, 'S2': 0.44, 'S3': 0.12})
# For each key, sum weight * value over the three dicts (S1 pairs with x[0], etc.)
d = {k: sum(row[k] * mult['S'+str(idx+1)]
            for idx, row in enumerate(x)) for k in x[0].keys()}
If you want to multiply your matrix with a vector, you should try numpy:
import numpy as np
# Transform data to matrix
x = np.array([[d['A'+str(i+1).zfill(2)] for i in range(len(d))] for d in x])
v = np.array([mult['S'+str(i+1)] for i in range(len(mult))]).reshape(1, 3)
print(np.matmul(v, x))
# [[0.0748 0.098 0.0848 0.1112 0.0528 0.086 0.0868 0.1488 0.174 0.0872]]
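As for why wt[0][0] returned 0 in the question: a defaultdict(int) calls int() for any missing key, and 0 is not one of the string keys. A small sketch with a shortened copy of the first dict (the name wt0 is just for illustration):

```python
from collections import defaultdict

wt0 = defaultdict(int, {'A01': 0.15, 'A02': 0.17})

# 0 is not a key, so the default factory int() runs and returns 0 ...
missing = wt0[0]
# ... and the lookup silently inserts the key 0 into the dict.
inserted = 0 in wt0

# Values are reached by their string keys, or by iterating.
by_key = wt0['A01']
string_keys = [k for k in wt0 if isinstance(k, str)]

# .get() reads without inserting a default entry.
safe = wt0.get('A99')
```

Iterating with wt0.items() gives every key/value pair without needing to know the keys in advance.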

how to average between values in two files?

I have two files containing matrices that look like this:
File1:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10].....'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}
File2:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,
0.26, 0.11].....'key100',g,l,i,o,+: [0.2, 0.0, 0.23, 0.16, 0.21]}
Both files have the same 'keys'. I want to average the values between the two files, so the result file looks like this:
Desired output file:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.94, 0.04, 0.01],'key2',g,l,i,o,+: [0.05, 0.15, 0.925,
0.26, 0.105].....'key100',g,l,i,o,+: [0.15, 0.1, 0.29, 0.175, 0.205]}
I have thought about the python script I could write, but since I am quite new to this, any quick ideas would be welcome:
import gzip
import numpy as np
inFile1 = gzip.open('/home/file1')
inFile2 = gzip.open('/home/file2')
inFile.next()
for line in inFile:
    cols = line.strip().split('\t')
    data = cols[6:]
for line in inFile2:
    cols = line.strip().split('\t')
    data2 = cols[6:]
newdata = (data + data2)/2
You could use regex to replace the strings and make it JSON compatible. Then you can easily convert it into a dict and then just use normal python to analyse the data (compare the dicts):
import re
import json
s = '''{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10],'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}'''
s2 = re.sub(r"'(key\d+)',g,l,i,o,\+", r'"\1"', s)
print(s2)
d = json.loads(s2)
print(d)
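Once both files are converted this way, the averaging itself is a short dict comprehension. A sketch under some assumptions: both inputs hold the same keys with lists of equal length, the parse and average helpers are names made up for illustration, and the `,g,l,i,o,+` suffix is dropped from the keys just as in the regex above. The two strings are shortened samples from the question.

```python
import json
import re

KEY_PATTERN = r"'(key\d+)',g,l,i,o,\+"

def parse(text):
    """Rewrite the keys into JSON strings and load the result as a dict."""
    return json.loads(re.sub(KEY_PATTERN, r'"\1"', text))

def average(d1, d2):
    """Element-wise mean of the lists stored under each shared key."""
    return {k: [(x + y) / 2 for x, y in zip(d1[k], d2[k])] for k in d1}

file1 = ("{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],"
         "'key2',g,l,i,o,+: [0.1, 0.2, 0.90, 0.26, 0.10]}")
file2 = ("{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],"
         "'key2',g,l,i,o,+: [0.0, 0.1, 0.95, 0.26, 0.11]}")

result = average(parse(file1), parse(file2))
```

If the original key text has to be kept in the output file, it would need to be added back when writing the result.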
The problem is your data format, as Wodin commented:
what is this format? It looks a bit like a Python dict, but the
,g,l,i,o,+ doesn't make sense for a dict.
I tried with your data; you can take a hint from this code.
I tested with:
File1.txt
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,0.26, 0.10]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
File2.txt:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,0.26, 0.11]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
Code:
import re
pattern = r"('key\w+',g,l,i,o,\+):\s(\[.+?\])"
with open('File1.txt', 'r') as f:
    for line in f:
        average = {}
        pr = re.finditer(pattern, line)
        for find in pr:
            # Rescan File2.txt for the entry with the same key
            with open('File2.txt', 'r') as ff:
                for line2 in ff:
                    for find1 in re.finditer(pattern, line2):
                        if find.group(1) == find1.group(1):
                            # Pair up the two lists and take the mean of each pair
                            average_part = list(map(lambda x: sum(x) / len(x), list(zip(eval(find.group(2)), eval(find1.group(2))))))
                            rest_part = find.group(1)
                            average[rest_part] = average_part
        print(average)
output:
{"'key2',g,l,i,o,+": [0.05, 0.15000000000000002, 0.925, 0.26, 0.10500000000000001], "'key1',g,l,i,o,+": [0.0, 0.0, 0.94, 0.04, 0.01]}
{"'key3',g,l,i,o,+": [0.0, 0.0, 0.98, 0.02, 0.01], "'key4',g,l,i,o,+": [0.1, 0.2, 0.9, 0.268, 0.1]}
