Matplotlib, how to represent an array as an image? - python

This is what I have tried so far:
import itertools
import numpy as np
import matplotlib.pyplot as plt

with open('base.txt', 'r') as f:
    vst = map(int, itertools.imap(float, f))

v1 = vst[::3]
print type(v1)
a = np.asarray(v1)
print len(a)
a11 = a.reshape(50, 100)
plt.imshow(a11, cmap='hot')
plt.colorbar()
plt.show()
I have a (50, 100) array and each element has a numerical value (in the range 1200-5400). I would like an image that represents the array, but the result I get does not look right.
What should I change to get a proper image?

I don't have the data from base.txt.
However, to simulate your problem, I generated random numbers between 1200 and 5500 and built a 50 x 100 numpy array, which I believe is close to your data and requirement.
Then I simply plotted the data with your plot code.
I get a true representation of the array.
See if this helps.
Demo Code
#import itertools
import numpy as np
from numpy import array
import matplotlib.pyplot as plt
import random

# Generate a list of 5000 ints between 1200 and 5500
M = 5000
myList = [random.randrange(1200, 5500) for i in xrange(0, M)]

# Chunk the flat list into sublists of 50 values each
n = 50
newList = [myList[i:i+n] for i in range(0, len(myList), n)]

# Convert to a numpy array and reshape to 50 x 100
nArray = array(newList)
print nArray
a11 = nArray.reshape(50, 100)
plt.imshow(a11, cmap='hot')
plt.colorbar()
plt.show()
Plot
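As an aside (not part of the original answer): the file read, slice, and reshape from the question can also be done with numpy directly, which sidesteps the map/itertools handling. A minimal sketch, assuming base.txt holds one float per line:
import numpy as np
import matplotlib.pyplot as plt

# Read one float per line (hypothetical layout of base.txt)
vals = np.loadtxt('base.txt')

# Keep every third value, as in the question, and reshape to 50 x 100
a11 = vals[::3].astype(int).reshape(50, 100)

plt.imshow(a11, cmap='hot')
plt.colorbar()
plt.show()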

Related

How to use for loop for a model

I understand that I have to put this all into a function and then call the function from a for loop ten times, but I'm not sure how. Any help would be deeply appreciated.
import random
import matplotlib.pyplot as plt
import statistics as stats
import numpy as np
import scipy.stats

# list1, list2, list3 are built earlier in the script (not shown)
plt.hist(list1, bins=100, alpha=0.5)
array1 = np.array(list1)
array2 = np.array(list2)
array3 = np.array(list3)
# Run the t-test using the scipy library
scipy.stats.ttest_ind(array1, array2)
Use range (for x in range(0,10)):
import random
import matplotlib.pyplot as plt
import statistics as stats
import numpy as np
# Library for scientific statistics
import scipy.stats

for x in range(0, 10):
    print(x)
    # Create two lists of random numbers that follow a normal ("Gaussian") distribution
    # Start with an empty list named "list1"
    list1 = []
    # Loop that runs 30 times
    for i in range(30):
        # Random numbers drawn from a pool with a mean of 12 and standard deviation of 5
        value1 = random.gauss(12, 5)
        # Add the random value to the first list, list1
        list1.append(value1)
    print(list1)
    # Do the same with a second list
    list2 = []
    for i in range(30):
        # Random numbers drawn from a pool with a mean of 14 and standard deviation of 4
        value2 = random.gauss(14, 4)
        list2.append(value2)
    print(list2)
    # Create a histogram of the two lists using the matplotlib library
    plt.hist(list1, bins=50, alpha=0.5)
    plt.hist(list2, bins=50, alpha=0.5)
    # Run a t-test on the two sets of data
    array1 = np.array(list1)
    array2 = np.array(list2)
    # Run the t-test using the scipy library and print the result
    print(scipy.stats.ttest_ind(array1, array2))
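The question actually asks to put this into a function and call it from a loop ten times; a minimal sketch of that structure (my own, using the same distributions as above):
import random
import numpy as np
import scipy.stats

def run_trial(n=30):
    # Simulate the two samples and return the t-test result
    list1 = [random.gauss(12, 5) for _ in range(n)]
    list2 = [random.gauss(14, 4) for _ in range(n)]
    return scipy.stats.ttest_ind(np.array(list1), np.array(list2))

for trial in range(10):
    print(trial, run_trial())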

Hough Transform on arrays of coordinates (stock prices)

I want to apply a Hough Transform to stock prices (an array of numbers).
I read the OpenCV and scikit-image docs and examples, but could not figure out how to apply the transformation to arrays of numbers instead of images.
I created a 2D array from the data. The first dimension is X (simply the index of the data) and the second dimension is the close prices.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pywt as wt
from skimage.transform import (hough_line, hough_line_peaks, probabilistic_hough_line)
from matplotlib import cm

path = "22-31May-100Tick.csv"
df = pd.read_csv(path)
y = df.Close.values
x = np.arange(0, len(y), 1)
data = []
for i in x:
    a = [i, y[i]]
    data.append(a)
data = np.array(data)
How is it possible to apply the transformation with OpenCV or scikit-image?
Thank you.
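One possible approach (my own sketch, not an answer from the thread): rasterise the (index, price) points into a binary image and run scikit-image's hough_line on that image. A minimal example with a made-up price series standing in for df.Close.values:
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

# Hypothetical price series standing in for df.Close.values
y = np.cumsum(np.random.randn(500)) + 100.0
x = np.arange(len(y))

# Rasterise the (index, price) points into a binary image:
# one column per index, rows are discretised price levels
n_rows = 200
span = y.max() - y.min()
rows = np.round((y - y.min()) / span * (n_rows - 1)).astype(int)
img = np.zeros((n_rows, len(y)), dtype=bool)
img[rows, x] = True

# Standard straight-line Hough transform on the binary image
hspace, angles, dists = hough_line(img)
accum, best_angles, best_dists = hough_line_peaks(hspace, angles, dists)
print(len(best_angles), "candidate trend lines found")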

Audio waveform matching

I am matching two waveforms of 400 ms. I am using correlate to check the shift.
# assuming numpy's correlate, linspace, argmax; b1, b2 are the waveforms, dt1 the sample spacing
from numpy import correlate, linspace, argmax

cc = correlate(b1, b2, mode="same")
n = len(cc)
cc = 2 * cc / n
dur = n * dt1 / 2
d = linspace(-dur, dur, n)
idx = argmax(cc)
I am getting the shift between the two waveforms, but how do I get the actual match position of the two waveforms?
You probably want mode="full", and you need to do some more math to pick the correlation peak and adjust for the sequence-length padding.
Hopefully this example will help show the issues:
import math
import numpy as np
import matplotlib.pyplot as plt
a = [math.sin(i* math.pi/10) for i in range(300)]
b = [math.cos(i*math.pi/10) for i in range(300)]
plt.plot(a, 'red')
plt.plot(b, 'green')
axb = np.correlate(a, b, mode="full") / 100.0
x = range(len(axb))
plt.plot(x, axb)
plt.show()
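To turn the peak of the full-mode correlation into an actual shift, subtract the zero-lag offset len(b) - 1 from the argmax; a small self-contained sketch (my addition) using the same signals:
import math
import numpy as np

a = [math.sin(i * math.pi / 10) for i in range(300)]
b = [math.cos(i * math.pi / 10) for i in range(300)]

axb = np.correlate(a, b, mode="full")
# In "full" mode, output index len(b) - 1 corresponds to zero lag,
# so subtracting it converts the peak position into a signed shift
lag = np.argmax(axb) - (len(b) - 1)
print("peak correlation at a shift of", lag, "samples")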

Python: faster way of counting occurrences in numpy arrays (large dataset)

I am new to Python. I have a numpy.array whose size is 66049x1 (66049 rows and 1 column). The values are sorted from smallest to largest and are of float type, with some of them repeated.
I need to determine the frequency of occurrence of each value (the number of times a given value is equalled but not surpassed, i.e. X<=x in statistical terms), in order to later plot the sample cumulative distribution function.
The code I am currently using is below, but it is extremely slow, as it has to loop 66049x66049 = 4,362,470,401 times. Is there any way to speed up this piece of code? Would using dictionaries perhaps help? Unfortunately I cannot change the size of the arrays I am working with.
+++Function header+++
...
...
directoryPath = raw_input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]
x1 = numpy.delete(x, 0, 0)
x2 = numpy.zeros((x1.shape[0]))
x2 = sorted(x1)
x3 = numpy.around(x2, decimals=3)
count = numpy.zeros(len(x3))
# Iterates over the x3 array to find the number of occurrences of each value
for i in range(len(x3)):
    temp = x3[i]
    for j in range(len(x3)):
        if (temp <= x3[j]):
            count[j] = count[j] + 1
# Creates a 2D array with (value, occurrences)
x4 = numpy.zeros((len(x3), 2))
for i in range(len(x3)):
    x4[i, 0] = x3[i]
    x4[i, 1] = numpy.around((count[i] / x1.shape[0]), decimals=3)
...
...
+++Function continues+++
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data'])
df_p.T.plot(kind='hist')
plt.show()
That whole script took a very short time to execute (~2 s) for a (100,000 x 1) array. I didn't time it precisely, but if you provide the time yours took we can compare.
I used Counter from collections to count the number of occurrences; my experience with it has always been great (timewise). I converted the result into a DataFrame to plot it and used T to transpose.
Your data does repeat a bit, but you can try to refine it some more. As it is, it's pretty fast.
Edit
Create CDF using cumsum()
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data']).T
df_p['cumu'] = df_p['data'].cumsum()
df_p['cumu'].plot(kind='line')
plt.show()
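One caveat (my note, not from the original answer): the Counter keys come out in first-seen order, so the cumulative sum above is not guaranteed to be a monotone CDF; sorting the index and normalising fixes that. A small refinement, using the same hypothetical random data:
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

arr = np.random.randint(0, 100, (100000, 1))
cnt = Counter(arr[:, 0])
df_p = pd.DataFrame(cnt, index=['data']).T

# Sort by value so the cumulative sum increases monotonically,
# and normalise so the curve ends at 1.0
df_p = df_p.sort_index()
df_p['cumu'] = df_p['data'].cumsum() / df_p['data'].sum()
df_p['cumu'].plot(kind='line')
plt.show()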
Edit 2
For scatter() plot you must specify the (x,y) explicitly. Also, calling df_p['cumu'] will result in a Series, not a DataFrame.
To properly display a scatter plot you'll need the following:
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data']).T
df_p['cumu'] = df_p['data'].cumsum()
df_p.plot(kind='scatter', x='data', y='cumu')
plt.show()
You should use np.where and then count the length of the obtained vector of indices:
indices = np.where(x3 <= value)
count = len(indices[0])
If efficiency counts, you can use the numpy function bincount, which needs integers:
import numpy as np
a=np.random.rand(66049).reshape((66049,1)).round(3)
z=np.bincount(np.int32(1000*a[:,0]))
It takes about 1 ms.
Regards.
# for counting a single value
mask = (my_np_array == value_to_count).astype('uint8')
# or a condition
mask = (my_np_array <= max_value).astype('uint8')
count = np.sum(mask)
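As a further alternative (my own sketch, not one of the answers above): because x3 is already sorted, numpy's searchsorted can produce all of the X <= x counts in a single vectorised call, replacing the double loop entirely. Assuming an x3 like the one in the question:
import numpy as np

# Stand-in for the sorted, rounded x3 array from the question
x3 = np.sort(np.round(np.random.rand(66049), 3))

# For each element, the number of values <= it: since x3 is sorted,
# searchsorted with side='right' returns exactly that count
count = np.searchsorted(x3, x3, side='right')

# Normalise to get the sample cumulative distribution function
ecdf = count / float(len(x3))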

convert list of values .txt to vector and plot

I have two txt files. The data in the first one are:
0
0.1
0.5
0.3
and in the second one are:
20
32
35
39
So what I want to do is:
1) read both text files, 2) save the different values in vectors, 3) plot.
At the moment I have written the following code:
fichero = open('signal1t.txt', 'r')
listx = []
for linea in fichero:
    listx.append(linea.strip())

fichero = open('signal2.txt', 'r')
listy = []
for linea in fichero:
    listy.append(linea.strip())
But the problem is that it doesn't work very well; in fact it doesn't save numbers.
Can anybody help me?
A simpler solution is to use numpy:
import numpy as np
listx=np.loadtxt('signal1t.txt')
listy=np.loadtxt('signal2t.txt')
Then you just have to plot using matplotlib:
import matplotlib.pyplot as plt
plt.plot(listx,listy)
plt.show()
You have to typecast the read string to float:
listx.append(float(linea.strip()))
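Putting that cast into the question's loops gives something like the following (a sketch, keeping the file names from the question):
import matplotlib.pyplot as plt

listx = []
with open('signal1t.txt', 'r') as fichero:
    for linea in fichero:
        listx.append(float(linea.strip()))

listy = []
with open('signal2.txt', 'r') as fichero:
    for linea in fichero:
        listy.append(float(linea.strip()))

plt.plot(listx, listy)
plt.show()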
What you wish to do is very simple with numpy and matplotlib:
import numpy as np
import matplotlib.pyplot as plt
listx = np.genfromtxt('signal1.txt')
listy = np.genfromtxt('signal2.txt')
plt.plot(listx, listy, 'x')
plt.show()
