Pairing Adjacent Values in a Numpy Array - python

Say I have an array of values array = [0.0, 0.2, 0.5, 0.8, 1.0], and I want to pair adjacent values into a secondary list paired_array = [[0.0, 0.2], [0.2, 0.5], [0.5, 0.8], [0.8, 1.0]], is there an easy way of doing that in numpy?
For context, the pairs represent probability ranges which I will be using to randomise the values in a numpy array of type string. For example string_array = ['Fe', 'Pt', 'Fe', 'Pt', 'Fe', 'Pt', 'Fe', 'Pt'] may become something like randomised_array = ['Pt', 'Fe', 'Pt', 'Pt', 'Pt', 'Pt', 'Fe', 'Fe']. The ranges represent the probability a value is 'Pt' or 'Fe' in this case.

TRY:
from numpy.lib.stride_tricks import sliding_window_view
array = [0.0, 0.2, 0.5, 0.8, 1.0]
result = sliding_window_view(array, 2)
OUTPUT:
array([[0. , 0.2],
       [0.2, 0.5],
       [0.5, 0.8],
       [0.8, 1. ]])


How to plot numpy arrays in pandas dataframe

I have the DataFrame:
df =
sample_type observed_data
A [0.2, 0.5, 0.17, 0.1]
A [0.9, 0.3, 0.24, 0.5]
A [0.9, 0.5, 0.6, 0.39]
B [0.01, 0.07, 0.15, 0.26]
B [0.08, 0.14, 0.32, 0.58]
B [0.01, 0.16, 0.42, 0.41]
where the data type in the observed_data column is np.array. What's the easiest and most efficient way of plotting each of the numpy arrays overlayed on the same plot using matplotlib and/or plotly and showing A and B as separate colors or line types (eg. dashed, dotted, etc.)?
You can use this (note that Series.iteritems was removed in pandas 2.0; use items instead):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'sample_type': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'observed_data': [[0.2, 0.5, 0.17, 0.1], [0.9, 0.3, 0.24, 0.5], [0.9, 0.5, 0.6, 0.39],
                                     [0.01, 0.07, 0.15, 0.26], [0.08, 0.14, 0.32, 0.58], [0.01, 0.16, 0.42, 0.41]]})
for ind, cell in df['observed_data'].items():
    if len(cell) > 0:
        if df.loc[ind, 'sample_type'] == 'A':
            plotted = plt.plot(np.linspace(0, 1, len(cell)), cell, color='blue', marker='o', linestyle='-.')
        else:
            plotted = plt.plot(np.linspace(0, 1, len(cell)), cell, color='red', marker='*', linestyle=':')
plt.show()
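If you'd rather not hard-code a branch per sample type, one alternative is to keep the plot options in a dict keyed by sample_type; this is a sketch, and the particular style mapping is an assumption:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-GUI backend for illustration; drop this line interactively
import matplotlib.pyplot as plt

df = pd.DataFrame({'sample_type': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'observed_data': [[0.2, 0.5, 0.17, 0.1], [0.9, 0.3, 0.24, 0.5],
                                     [0.9, 0.5, 0.6, 0.39], [0.01, 0.07, 0.15, 0.26],
                                     [0.08, 0.14, 0.32, 0.58], [0.01, 0.16, 0.42, 0.41]]})

# one style per sample type; adding a type 'C' only requires a new dict entry
styles = {'A': dict(color='blue', marker='o', linestyle='-.'),
          'B': dict(color='red', marker='*', linestyle=':')}

for _, row in df.iterrows():
    y = np.asarray(row['observed_data'])
    plt.plot(np.linspace(0, 1, len(y)), y, **styles[row['sample_type']])
plt.show()
```

This scales to more sample types without nesting further if/else branches.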

Transform JSON file to Data Frame in Python

I have a text file with a JSON-like structure that I want to transform into a data frame.
The file includes several strings like this one:
{'cap': {'english': 0.1000, 'universal': 0.225}, 'display_scores': {'english': {'astroturf': 0.5, 'fake_follower': 0.8, 'financial': 0.2, 'other': 1.8, 'overall': 1.8, 'self_declared': 0.0, 'spammer': 0.2}, 'universal': {'astroturf': 0.4, 'fake_follower': 0.2, 'financial': 0.2, 'other': 0.4, 'overall': 0.8, 'self_declared': 0.0, 'spammer': 0.0}}, 'raw_scores': {'english': {'astroturf': 0.1, 'fake_follower': 0.16, 'financial': 0.05, 'other': 0.35, 'overall': 0.35, 'self_declared': 0.0, 'spammer': 0.04}, 'universal': {'astroturf': 0.07, 'fake_follower': 0.03, 'financial': 0.05, 'other': 0.09, 'overall': 0.16, 'self_declared': 0.0, 'spammer': 0.01}}, 'user': {'majority_lang': 'de', 'user_data': {'id_str': '123456', 'screen_name': 'beispiel01'}}}
tweets_data_path = "data.txt"
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
tweets_data
df = pd.DataFrame.from_dict(pd.json_normalize(tweets_data), orient='columns')
df
However, apparently something is wrong with either the json.loads or the append call, because tweets_data is empty when I inspect it.
Do you have an idea?
The json.loads calls fail because the file is not valid JSON: the lines use single quotes, so every call raises json.JSONDecodeError, and the bare except silently skips each line, leaving tweets_data empty. Since each line is a Python-literal dict, parse it with ast.literal_eval instead:
import ast

tweets_data_path = "data.txt"
tweets_data = []
with open(tweets_data_path, 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            tweets_data.append(ast.literal_eval(line))
        except (ValueError, SyntaxError):
            continue
print(tweets_data)
Each element of tweets_data is now a real dict, so pd.json_normalize(tweets_data) works as intended.
Instead of loading the JSON into a dictionary and then converting that dictionary into a pandas DataFrame, you can use pandas' built-in reader, provided the file actually contains valid JSON (double-quoted keys and strings):
df = pd.read_json(tweets_data_path)
Alternatively, if you wish to load the JSON into a dictionary first and then convert it to a DataFrame:
with open(tweets_data_path) as tweets_file:
    tweets_data = json.loads(tweets_file.read())
df = pd.DataFrame.from_dict(tweets_data, orient='columns')
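Once a line has been parsed into a real dict, pd.json_normalize flattens the nested keys into dotted column names. A minimal sketch with one hypothetical record in the file's single-quoted format:

```python
import ast
import pandas as pd

# hypothetical single record, abbreviated from the format shown in the question
line = ("{'cap': {'english': 0.1, 'universal': 0.225}, "
        "'user': {'majority_lang': 'de', 'user_data': {'screen_name': 'beispiel01'}}}")

record = ast.literal_eval(line)   # single-quoted dict literal -> real dict
df = pd.json_normalize([record])  # nested keys become 'cap.english', 'user.majority_lang', ...
print(df.columns.tolist())
```

Each level of nesting contributes one dotted segment to the column name, which gives a flat, analysis-ready frame.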

Why the constraint cannot generate all possible combinations?

I am trying to generate all possible combinations under some constraints using the python-constraint package. Here is the main code:
list1 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
list2 = [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
list3 = [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0]
problem = constraint.Problem()
problem.addVariable('p1', list1)
problem.addVariable('p2', list2)
problem.addVariable('p3', list3)
problem.addConstraint(our_constraint, ['p1', 'p2', 'p3'])
solutions = problem.getSolutions()
The three lists restrict each variable to a specific range.
The constraint function, our_constraint, is defined as:
def our_constraint(p1, p2, p3):
    if (p1 + p3 + p2) == 0:
        return True
However, the output I get does not include every possible combination. Some of it is shown below:
{'p3': -1.0, 'p1': 2.0, 'p2': -1.0}
{'p3': -1.0, 'p1': 1.8, 'p2': -0.8}
{'p3': -1.0, 'p1': 1.7, 'p2': -0.7}
{'p3': -1.0, 'p1': 1.6, 'p2': -0.6}
{'p3': -1.0, 'p1': 1.5, 'p2': -0.5}
{'p3': -1.0, 'p1': 1.3, 'p2': -0.3}
{'p3': -1.0, 'p1': 1.2, 'p2': -0.2}
{'p3': -1.0, 'p1': 1.1, 'p2': -0.1}
It does not include combinations like
{'p3': -1.0, 'p1': 1.4, 'p2': -0.4}
{'p3': -1.0, 'p1': 1.9, 'p2': -0.9}
I tried rearranging the order of p1, p2 and p3 in our_constraint, and found that this generates different combinations. Why is that, and how can I fix it?
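The missing combinations are a floating-point effect, not a bug in python-constraint: sums like 1.4 + (-0.4) + (-1.0) do not come out exactly 0.0 in binary floating point, and changing the order of the operands changes the rounding, which is why reordering p1, p2 and p3 produces different solutions. Comparing against zero with a tolerance fixes it; a sketch:

```python
import math

def our_constraint(p1, p2, p3):
    # exact equality fails for sums like 1.4 + (-0.4) + (-1.0);
    # compare against zero with an absolute tolerance instead
    return math.isclose(p1 + p2 + p3, 0.0, abs_tol=1e-9)

print(1.4 + (-0.4) + (-1.0))            # a tiny nonzero residue, not 0.0
print(our_constraint(1.4, -0.4, -1.0))  # True
```

math.isclose also returns an explicit bool, so the constraint no longer relies on the implicit None (falsy) return when the condition fails.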

Dictionary looping getting every value

without any imports
# given
deps = {'W': ['R', 'S'], 'C': [], 'S': ['C'], 'R': ['C'], 'F': ['W']}
prob = {'C': [0.5], 'R': [0.2, 0.8], 'S': [0.5, 0.1], 'W': [0.01, 0.9, 0.9, 0.99], 'F' : [0.4, 0.3]}
k = 'F'
# want to return: L = [[0.2, 0.8], [0.5, 0.1], [0.01, 0.9, 0.9, 0.99], [0.4, 0.3]]
# attempt
L = []
for i in deps[k]:
    s = i
    while deps[s] != []:
        L.append(prob[s])
        s = deps[s]
print(L)
I'm having trouble figuring this out. Given the two dictionaries, dependents and probabilities, I want to traverse from a selected starting point and collect every value; in the example above I chose 'F'.
It would first go into the deps of 'F', find 'W', then check its deps, ['R', 'S']. Checking 'R', it sees that the dependent of 'R' is 'C', and 'C' has no dependents, so we stop at 'R' and append its probability to L.
[[0.2, 0.8]]
then we go into S and do the same thing
[[0.2, 0.8], [0.5, 0.1]]
then we're done with that and we're back at W
[[0.2, 0.8], [0.5, 0.1], [0.01, 0.9, 0.9, 0.99]]
and finally since we're done with W we get the prob dict of F
[[0.2, 0.8], [0.5, 0.1], [0.01, 0.9, 0.9, 0.99], [0.4, 0.3]]
My code fails when there's more than one dependent value, and I'm not sure how to handle that. I'm trying to write a function that does this given deps, prob, and a value of k.
I would solve the problem with a while loop that keeps checking whether you've used all the values you've found so far. You can use a structure like:
deps = {'W': ['R', 'S'], 'C': [], 'S': ['C'], 'R': ['C'], 'F': ['W']}
# out = ['F', 'W', 'R', 'S']
prob = {'C': [0.5], 'R': [0.2, 0.8], 'S': [0.5, 0.1], 'W': [0.01, 0.9, 0.9, 0.99], 'F': [0.4, 0.3]}
k = 'F'

def get_values(dep_dictionary, prob_dict, start_key):
    used_keys = []
    keys_to_use = [start_key]
    probability = []
    # build a list of linked keys from the deps dictionary
    while used_keys != keys_to_use:
        print('used: {}'.format(used_keys))
        print('to use: {}'.format(keys_to_use))
        for i in range(len(keys_to_use)):
            if keys_to_use[i] not in used_keys:
                new_keys = dep_dictionary[keys_to_use[i]]
                if len(new_keys):
                    for sub_key in new_keys:
                        if sub_key not in keys_to_use:
                            keys_to_use.append(sub_key)
                    used_keys.append(keys_to_use[i])
                else:
                    del keys_to_use[i]
    # at this point used_keys == ['F', 'W', 'R', 'S']
    for key in used_keys:
        probability.append(prob_dict[key])
    print(probability)

get_values(deps, prob, k)
Which outputs:
used: []
to use: ['F']
used: ['F']
to use: ['F', 'W']
used: ['F', 'W']
to use: ['F', 'W', 'R', 'S']
used: ['F', 'W', 'R', 'S']
to use: ['F', 'W', 'R', 'S', 'C']
[[0.4, 0.3], [0.01, 0.9, 0.9, 0.99], [0.2, 0.8], [0.5, 0.1]]
Here you can see the output is correct ([[0.4, 0.3], [0.01, 0.9, 0.9, 0.99], [0.2, 0.8], [0.5, 0.1]]), though not in exactly the same order, which doesn't sound like a huge issue. If it is, you can always adjust the
for key in used_keys:
    probability.append(prob_dict[key])
bit so that probability is a dictionary instead. You can also take the print() statements out; they were just there to debug and show what is going on inside the loop. And you would probably have the function return probability instead of printing it, but I'll leave that to your discretion!
Here is a solution that uses a stack-based depth-first search to traverse the dependency tree. It appends a node's probabilities only if the node has dependencies, then simply reverses the list at the end.
def prob_list(root):
    nodes_to_visit = [root]
    prob_list = []
    while nodes_to_visit:
        curr = nodes_to_visit.pop()
        print(f"Visiting {curr}")
        if deps[curr]:
            prob_list.append(prob[curr])
        for dep in deps[curr]:
            nodes_to_visit.append(dep)
    return list(reversed(prob_list))

print(prob_list("F"))  # [[0.2, 0.8], [0.5, 0.1], [0.01, 0.9, 0.9, 0.99], [0.4, 0.3]]
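For exactly the order the question asks for, a recursive post-order walk (emit each dependency's probabilities before the node's own, skipping leaves) is another option; a sketch using the question's data:

```python
deps = {'W': ['R', 'S'], 'C': [], 'S': ['C'], 'R': ['C'], 'F': ['W']}
prob = {'C': [0.5], 'R': [0.2, 0.8], 'S': [0.5, 0.1],
        'W': [0.01, 0.9, 0.9, 0.99], 'F': [0.4, 0.3]}

def prob_list_rec(key):
    # post-order: collect each dependency's probabilities first,
    # then append this node's own (leaves with no deps contribute nothing)
    out = []
    for dep in deps[key]:
        out.extend(prob_list_rec(dep))
    if deps[key]:
        out.append(prob[key])
    return out

print(prob_list_rec('F'))  # [[0.2, 0.8], [0.5, 0.1], [0.01, 0.9, 0.9, 0.99], [0.4, 0.3]]
```

One caveat: a non-leaf node reachable along two paths would be emitted once per path. For this data only the leaf 'C' is shared, so it makes no difference here.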

how to average between values in two files?

I have two files of matrices, which look like this:
File1:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10].....'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}
File2:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,
0.26, 0.11].....'key100',g,l,i,o,+: [0.2, 0.0, 0.23, 0.16, 0.21]}
Both files have the same 'keys'. I want to average the values between the two files, so the result file looks like this:
Desired output file:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.94, 0.04, 0.01],'key2',g,l,i,o,+: [0.05, 0.15, 0.925,
0.26, 0.105].....'key100',g,l,i,o,+: [0.15, 0.1, 0.29, 0.175, 0.205]}
I have thought about the python script I could write, but since I am quite new to this, any quick ideas would be welcome:
import gzip
import numpy as np

inFile1 = gzip.open('/home/file1')
inFile2 = gzip.open('/home/file2')
inFile1.next()
for line in inFile1:
    cols = line.strip().split('\t')
    data = cols[6:]
for line in inFile2:
    cols = line.strip().split('\t')
    data2 = cols[6:]
newdata = (data + data2) / 2
You could use a regex to rewrite the strings into valid JSON. Then you can easily convert the result into a dict and use plain Python to analyse the data (compare the dicts):
import re
import json

s = '''{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10],'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}'''
s2 = re.sub(r"'(key\d+)',g,l,i,o,\+", r'"\1"', s)
print(s2)
d = json.loads(s2)
print(d)
The problem is your data format, as Wodin commented:
what is this format? It looks a bit like a Python dict, but the
,g,l,i,o,+ doesn't make sense for a dict.
I tried it with your data; you can take hints from this code. I used:
File1.txt
{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,0.26, 0.10]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
File2.txt:
{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,0.26, 0.11]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}
Code:
import re

pattern = r"('key\w+',g,l,i,o,\+):\s(\[.+?\])"
with open('File1.txt', 'r') as f:
    for line in f:
        average = {}
        pr = re.finditer(pattern, line)
        for find in pr:
            with open('File2.txt', 'r') as ff:
                for line2 in ff:
                    for find1 in re.finditer(pattern, line2):
                        if find.group(1) == find1.group(1):
                            average_part = list(map(lambda x: sum(x) / len(x), zip(eval(find.group(2)), eval(find1.group(2)))))
                            rest_part = find.group(1)
                            average[rest_part] = average_part
        print(average)
output:
{"'key2',g,l,i,o,+": [0.05, 0.15000000000000002, 0.925, 0.26, 0.10500000000000001], "'key1',g,l,i,o,+": [0.0, 0.0, 0.94, 0.04, 0.01]}
{"'key3',g,l,i,o,+": [0.0, 0.0, 0.98, 0.02, 0.01], "'key4',g,l,i,o,+": [0.1, 0.2, 0.9, 0.268, 0.1]}
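A more direct sketch that avoids eval and rescanning File2 for every key: parse each file into a dict of float lists with a regex, then average element-wise with zip. The parse helper and the inline sample strings are illustrative, assuming the quirky format shown above:

```python
import re

# capture the key name and the bracketed list of numbers
pattern = r"'(key\w+)',g,l,i,o,\+:\s*(\[[^\]]+\])"

def parse(text):
    # map each key to its list of floats
    return {k: [float(x) for x in v.strip('[]').split(',')]
            for k, v in re.findall(pattern, text)}

d1 = parse("{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01]}")
d2 = parse("{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01]}")

# element-wise mean of the two value lists for every key in the first file
avg = {k: [(a + b) / 2 for a, b in zip(d1[k], d2[k])] for k in d1}
print(avg)
```

In practice you would build d1 and d2 from the full file contents (e.g. f.read()) rather than inline strings; reading each file once and joining on the keys in memory scales much better than reopening File2 per match.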
