I guess it is a simple question: I am doing a simple while iteration and want to save the data in an array so I can simply plot it.
tr = 25 #sec
fr = 50 #Hz
dt = 0.002 #2ms
df = fr*(dt/tr)
i=0;
f = 0
data = 0
while(f<50):
    i=i+1
    f = ramp(fr,f,df)
    data[i] = f
plot(data)
How do I correctly define the data array? How do I save the results in the array?
One possibility:
data = []
while(f<50):
    f = ramp(fr,f,df)
    data.append(f)
Here, i is no longer needed.
You could initialize a list like this:
data=[]
Then you could add data like this:
data.append(f)
For plotting, matplotlib is a good choice and it is easy to install and use.
import pylab
pylab.plot(data)
pylab.show()
He needs "i" b/c it starts from 1 in the collection. For your code to work use:
data = {} # this is dictionary and not list
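For illustration, a minimal sketch of that dictionary-based version, assuming fr, df and the ramp function are defined as in the question:
data = {}   # dictionary keyed by the loop counter
i = 0
f = 0
while f < 50:
    i = i + 1
    f = ramp(fr, f, df)   # ramp() is the function from the question, assumed available
    data[i] = f
plot(list(data.values()))   # plot() as used in the question, e.g. pylab.plot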
I have a document with the following structure:
CUSTOMERID1
conversation-id-123
conversation-id-123
conversation-id-123
CUSTOMERID2
conversation-id-456
conversation-id-789
I'd like to parse the document to get a frequency distribution plot with the number of conversations on the X axis and the # of customers on the Y axis. Does anyone know the easiest way to do this with Python?
I'm familiar with the frequency distribution plot piece but am struggling with how to parse the data into the right data structure to build the plot. Thank you for any help you can provide ahead of time!
You can try the following:
>>> import pandas as pd
>>> dict_ = {}
>>> with open('file.csv') as f:
...     for line in f:
...         if line.startswith('CUSTOMERID'):
...             dict_[line.strip('\n')] = list_ = []
...         else:
...             list_.append(line.strip().split('-'))
...
>>> df = pd.DataFrame.from_dict(dict_, orient='index').stack()
>>> df.transform(lambda x:x[-1]).groupby(level=0).count().plot(kind='bar')
Output: (bar chart of the conversation count for each customer)
If you want only 1 and 2 on the X axis, just change the line dict_[line.strip('\n')] = list_ = [] to dict_[line.strip('CUSTOMERID/\n')] = list_ = [].
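As an alternative sketch under the same file layout (the file name 'file.csv' is an assumption), you could build the counts with plain Python and collections.Counter, then plot how many customers have each conversation count:
from collections import Counter
import matplotlib.pyplot as plt

conversations_per_customer = Counter()
with open('file.csv') as f:
    current = None
    for line in f:
        line = line.strip()
        if line.startswith('CUSTOMERID'):
            current = line
        elif line and current is not None:
            conversations_per_customer[current] += 1

# number of customers (Y) for each conversation count (X)
distribution = Counter(conversations_per_customer.values())
plt.bar(list(distribution.keys()), list(distribution.values()))
plt.xlabel('number of conversations')
plt.ylabel('number of customers')
plt.show()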
I was trying to get data (a list) from a file and assign that list to the variables in my Python script.
I want to know how to do it without having to assign all the variables manually.
Variables = [MPDev,WDev,DDev,LDev,PDev,MPAll,WAll,DAll,LAll,PAll,MPBlit,WBlit,DBlit,LBlit,PBlit,MPCour,WCour,DCour,LCour,PCour]
dataupdate = open("griddata.txt","r")
datalist = dataupdate.read()
#Inside the file is written:
#['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0',']
var = 0
for e in Variables:
    e = datalist[var]
    var += 1
I got it working anyway, but I would like to know a better way so I can improve my skills. Thanks.
Get used to handling data as a pandas DataFrame. It's easy to read and easy to write.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
import pandas as pd
data = pd.read_csv("griddata.txt", names = ['MPDev',
'WDev',
'DDev',
'LDev',
'PDev',
'MPAll',
'WAll',
'DAll',
'LAll',
'PAll',
'MPBlit',
'WBlit',
'DBlit',
'LBlit',
'PBlit',
'MPCour',
'WCour',
'DCour',
'LCour',
'PCour']
)
import ast
Variables = [MPDev,WDev,DDev,LDev,PDev,MPAll,WAll,DAll,LAll,PAll,MPBlit,WBlit,DBlit,LBlit,PBlit,MPCour,WCour,DCour,LCour,PCour]
dataupdate = open("tmp.txt","r")
datalist = ast.literal_eval(dataupdate.read())
#Inside the file is written:
#['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0',']
for i in Variables:
    i = datalist[Variables.index(i)]
Another alternative is to use a dictionary.
mydict={}
var = 0
for e in Variables:
    mydict[e] = datalist[var]
    var += 1
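A shorter sketch of the same dictionary idea, assuming Variables holds the names as strings and datalist is the parsed list:
mydict = dict(zip(Variables, datalist))   # pairs each name with the value at the same position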
I have a large time series (a pandas DataFrame) of wind speed (10-minute averages) which contains erroneous data from a dead sensor. How can it be flagged automatically? I was trying a moving average.
An approach other than a moving average would be much appreciated. I have attached the sample data image below.
There are several ways to deal with this problem. I will first switch to differences:
%matplotlib inline
import pandas as pd
import numpy as np
np.random.seed(0)
n = 200
y = np.cumsum(np.random.randn(n))
y[100:120] = 2
y[150:160] = 0
ts = pd.Series(y)
ts.diff().plot();
The next step is to find how long the streaks of consecutive zeros are.
def getZeroStrikeLen(x):
    """Accept a boolean array only."""
    res = np.diff(np.where(np.concatenate(([x[0]],
                                            x[:-1] != x[1:],
                                            [True])))[0])[::2]
    return res
vec = ts.diff().values == 0
out = getZeroStrikeLen(vec)
Now if len(out) > 0, you can conclude that there is a problem. If you want to go one step further, you can have a look at this. It is in R, but it's not that hard to replicate in Python.
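As a rough alternative sketch (not the answer's method): flag flat stretches directly with a rolling standard deviation, assuming ts is the wind-speed series and an assumed window of 6 samples (one hour of 10-minute averages):
window = 6                                  # assumed window length
flat = ts.rolling(window).std() < 1e-6      # True where the signal has been constant over the window
suspect = ts[flat]                          # candidate dead-sensor readings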
I am using the scipy stats module to calculate the linear regression, i.e.
slope, intercept, r_value, p_value, std_err = \
    stats.linregress(data['cov_0.0075']['num'], data['cov_0.0075']['com'])
where data is a dictionary containing several 'cov_x' keys, each corresponding to a DataFrame with columns 'num' and 'com'.
I want to be able to loop through this dictionary and do linear regression on each 'cov_x'. I am not sure how to do this. I tried:
for i in data:
    slope_+str(i), intercept+str(i), r_value+str(i), p_value+str(i), std_err+str(i) = stats.linregress(data[i]['num'],data[i]['com'])
Essentially I want len(x) slope_x values.
You could use a list comprehension to collect all the stats.linregress return values:
result = [stats.linregress(df['num'],df['com']) for key, df in data.items()]
result is a list of 5-tuples. To collect the first, second, third, etc. elements from each 5-tuple into separate lists, use zip(*result):
slopes, intercepts, r_values, p_values, stderrs = zip(*result)
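If you would rather keep the results keyed by the 'cov_x' names, a small sketch along the same lines (the .slope attribute is available on the named tuple that recent SciPy versions return):
results = {key: stats.linregress(df['num'], df['com']) for key, df in data.items()}
slopes = {key: res.slope for key, res in results.items()}   # e.g. slopes['cov_0.0075']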
You should be able to do what you're trying to, but there are a couple of things you should watch out for.
First, you can't add a string to a variable name and store it that way. No plus signs on the left of the equals sign. Ever.
You should be able to accomplish what you're trying to do, however. Just make sure that you use the dict data type if you want string indexing.
import scipy.stats as stats
import pandas as pd
import numpy as np
data = {}
l = ['cov_0.0075','cov_0.005']
for i in l:
    x = np.random.random(100)
    y = np.random.random(100)+15
    d = {'num':x,'com':y}
    df = pd.DataFrame(data=d)
    data[i] = df
slope = {}
intercept = {}
r_value = {}
p_value = {}
std_error = {}
for i in data:
    slope[str(i)], \
    intercept[str(i)], \
    r_value[str(i)], \
    p_value[str(i)], std_error[str(i)] = stats.linregress(data[i]['num'],data[i]['com'])
print(slope,intercept,r_value,p_value,std_error)
should work just fine. Otherwise, you can store individual values and put them in a list later.
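For instance, a minimal sketch of the list-based variant, reusing the data dictionary from the example above:
slopes = []
for key in data:
    result = stats.linregress(data[key]['num'], data[key]['com'])
    slopes.append(result[0])   # the first element of the returned tuple is the slope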
I'm working on a method to average data from multiple files and put the results into a single file. Each line of the files looks like:
File #1
Test1,5,2,1,8
Test2,10,4,3,2
...
File #2
Test1,2,4,5,1
Test2,4,6,10,3
...
Here is the code I use to store the data:
totalData = []
for i in range(0, len(files)):
    data = []
    if ".csv" in files[i]:
        infile = open(files[i],"r")
        temp = infile.readline()
        while temp != "":
            data.append([c.strip() for c in temp.split(",")])
            temp = infile.readline()
        totalData.append(data)
So what I'm left with is totalData looking like the following:
totalData = [[
[Test1,5,2,1,8],
[Test2,10,4,3,2]],
[[Test1,2,4,5,1],
[Test2,4,6,10,3]]]
What I want is, for each of Test1, Test2, etc., to average all the first values, then all the second values, and so forth. So testAverage would look like:
testAverage = [[Test1,3.5,3,3,4.5],
[Test2,7,5,6.5,2.5]]
I'm struggling to think of a concise/efficient way to do this. Any help is greatly appreciated! Also, if there are better ways to manage this type of data, please let me know.
It just needs two loops:
totalData = [ [['Test1',5,2,1,8],['Test2',10,4,3,2]],
[['Test1',2,4,5,1],['Test2',4,6,10,3]] ]
for t in range(len(totalData[0])):            # tests
    result = [totalData[0][t][0],]
    for i in range(1, len(totalData[0][0])):  # numbers
        total = 0.0
        for j in range(len(totalData)):
            total += totalData[j][t][i]
        total /= len(totalData)
        result.append(total)
    print(result)
First flatten it out (you will need itertools, and a list so it can be sorted):
import itertools
results = list(itertools.chain.from_iterable(totalData))
then sort it
results.sort()
then use groupby, skipping the test name in each row before averaging
data = {}
for key, values in itertools.groupby(results, lambda x: x[0]):
    columns = zip(*(row[1:] for row in values))
    data[key] = [sum(c)*1.0/len(c) for c in columns]
and finally just print your data
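For example, a minimal sketch:
for key, averages in data.items():
    print(key, averages)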
If your data structure is regular, the best option is probably to use numpy. You should be able to install it with pip from the terminal:
pip install numpy
Then in python:
import numpy as np
totalData = np.array(totalData)
# drop the first column of each row ('Test1', 'Test2', ...), since it is not a number
totalData = np.array(totalData[:, :, 1:], float)
# average
np.mean(totalData, axis=0)
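If you also want the test names back in the output (as in the testAverage example above), a minimal sketch along the same lines:
import numpy as np

arr = np.array(totalData)               # shape: (files, tests, columns)
names = arr[0, :, 0]                    # 'Test1', 'Test2', ...
values = arr[:, :, 1:].astype(float)    # the numeric part
averages = values.mean(axis=0)          # average over the files
testAverage = [[name] + avg.tolist() for name, avg in zip(names, averages)]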