I want to read my CSV file first:
https://github.com/hamzaal014/file/blob/main/file.csv
The .csv file contains two columns, X and Y. Here is my script:
import numpy as np
from pandas import DataFrame as df
import csv
origin_data = open("file.csv", "r")
dato = list(csv.reader(origin_data, delimiter=","))
print(dato)
rowcount = 0
# iterating through the whole file
for row in dato:
    rowcount += 1
# printing the result
# print("Number of lines present:-", rowcount)
print(rowcount)
dati = df(dato, columns=['x', 'y'])
window = 6
roll_avg = dati.rolling(window).mean()
roll_avg_cumulative = dati['y'].cumsum()/np.arange(1, 25)
print(roll_avg_cumulative)
But my script is not working. What am I doing wrong?
Error --------------------------------------------------------------------
Traceback (most recent call last):
File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 163, in _na_arithmetic_op
result = func(left, right)
File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 239, in evaluate
return _evaluate(op, op_str, a, b) # type: ignore[misc]
File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 128, in _evaluate_numexpr
result = _evaluate_standard(op, op_str, a, b)
File "/home/haz/miniconda39/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 69, in _evaluate_standard
return op(a, b)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
When reading from a file, you get strings back. That is the source of your problem: the strings are never converted into numbers. You can fix it by passing dtype=float:
dati = df(dato, columns=['x', 'y'], dtype=float)
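If you prefer to keep your csv.reader pipeline, converting after construction also works; a minimal sketch, assuming file.csv really has no header row:

import csv
from pandas import DataFrame as df

with open("file.csv", "r") as origin_data:
    dato = list(csv.reader(origin_data, delimiter=","))

# astype(float) converts every column from str to float in one go
dati = df(dato, columns=['x', 'y']).astype(float)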
If it is helpful to you, I would also like to point out a few things that may improve your code:
You are using pandas as your data container, so I would suggest using the pandas function for converting a CSV file to a DataFrame (pandas.read_csv) instead of doing it manually.
The row count can be calculated with len() without iterating over all the rows.
Please stick to the widely used import aliases (import pandas as pd) instead of creating your own; this makes your code more readable to everyone else.
So your code can become:
import numpy as np
import pandas as pd
dati = pd.read_csv("file.csv", sep=",", dtype=float, names=["x", "y"])
rowcount = len(dati)
window = 6
roll_avg = dati.rolling(window).mean()
roll_avg_cumulative = dati["y"].cumsum() / np.arange(1, 25)
print(roll_avg_cumulative)
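Incidentally, the cumsum-divided-by-count expression is just the cumulative (expanding) mean, which pandas can compute directly; an equivalent one-liner:

roll_avg_cumulative = dati["y"].expanding().mean()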
What went wrong in your code: all values are loaded as strings.
A simple way:
import numpy as np
import pandas as pd

dati = pd.read_csv('file.csv', header=None)
window = 6
roll_avg = dati.rolling(window).mean()
print(dati[1].cumsum())
roll_avg_cumulative = dati[1].cumsum() / np.arange(1, len(dati) + 1)
print(roll_avg_cumulative)
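With header=None the columns get the integer names 0 and 1, which is why the y column is addressed as dati[1]. If you prefer named columns, you can rename them afterwards; a minimal sketch:

dati.columns = ['x', 'y']
print(dati['y'].cumsum())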
I am trying to read the data, but its first column has data in exponential format, which is not letting me read the file. Here is a minimal working example of my code, and here is the link to the data file for trying out the code:
import numpy as np
filename ="0 A.dat"
data = np.loadtxt(filename, delimiter=',', skiprows=3)
But I am getting this error:
ValueError: could not convert string to float:
You can read them with pandas. The exponential notation is not actually the problem: the empty string after the colon in your ValueError suggests each data row ends with a trailing comma, and np.loadtxt fails on the resulting empty field, whereas pandas just turns it into an empty column:
import pandas as pd
data = pd.read_csv(filename, delimiter=',', skiprows=3)
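If you want to stay with numpy, restricting loadtxt to the real columns should also work; a hedged sketch, assuming five data columns as in the array dump further down:

import numpy as np

# usecols=range(5) skips the empty trailing field produced by the comma
# at the end of each row
data = np.loadtxt("0 A.dat", delimiter=',', skiprows=3, usecols=range(5))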
import numpy as np

def yesfloat(string):
    """True if the given string parses as a float, else False."""
    try:
        float(string)
        return True
    except ValueError:
        return False

data = []
with open('0 A.dat', 'r') as f:
    d = f.readlines()
for line in d:
    fields = line.rstrip().split(",")
    data.append([float(field) if yesfloat(field) else field for field in fields])
data = np.array(data, dtype='O')
data
I don't know if that is the answer you are looking for, but I tried it with your data and it returned this:
array([list(['% Version 1.00']), list(['%']),
list(['%freq[Hz]\tTrc1_S21[dB]\tTrc2_S21[U]\tTrc3_S21[U]\tTrc4_S21[U]']),
...,
list([9998199819.981998, -22.89936928953151, 0.07161954135843378, -0.0618770495057106, -0.03606368601322174, '']),
list([9999099909.991, -22.91188769540125, 0.07151639513438152, -0.06464007496833801, -0.03059829212725163, '']),
list([10000000000.0, -22.92596306398167, 0.07140059761720122, -0.0669037401676178, -0.02493862248957157, ''])],
dtype=object)
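If you only need the numeric block, you could then slice off the header rows and the trailing empty field and cast to a regular float array; a sketch assuming, as in the dump above, that the first three rows are text and each data row holds five numbers plus '':

numeric = np.array([row[:5] for row in data[3:]], dtype=float)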
I am trying to load data in a csv file (with delimiter ',') into a numpy array. An example of a line is: 81905.75578271,81906.6205052,50685.487931,.... (1000 columns).
I have this code, but it does not seem to work properly: at the exit of the function the debugger cannot recognize the data, and when I call x_train.shape it returns 0:
import numpy as np

def load_data(path):
    # return np.loadtxt(path, dtype=int, delimiter=',')
    file = open(path, 'r')
    data = []
    for line in file:
        array_vals = line.split(",")
        array = []
        for val in array_vals:
            if not val:
                array.append(float(val))
        data.append(np.asarray(array))
    return np.asarray(data)

x_train = load_data(path)
This should give you your required output. (The reason your loop returns nothing, by the way, is the inverted test: "if not val:" appends a field only when it is empty, so every row stays empty.)
import numpy as np

def load_data(path):
    return np.loadtxt(path, delimiter=',')
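For reference, the manual loop also works once that test is flipped; a minimal sketch:

import numpy as np

def load_data(path):
    data = []
    with open(path, 'r') as file:
        for line in file:
            # keep non-empty fields (the original kept only empty ones)
            data.append([float(val) for val in line.split(",") if val.strip()])
    return np.asarray(data)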
I have this code that reads numbers and is meant to calculate the std and %rms using numpy:
import numpy as np
import glob
import os

values = []
line_number = 6
road = '/Users/allisondavis/Documents/HCl'
for pbpfile in glob.glob(os.path.join(road, 'pbpfile*')):
    lines = open(pbpfile, 'r').readlines()
    while line_number < len(lines):
        variables = lines[line_number].split()
        values.append(variables)
        line_number = line_number + 3

a = np.asarray(values).astype(np.float)
std = np.std(a)
rms = std * 100
print rms
However, I keep getting this error:
Traceback (most recent call last):
File "rmscalc.py", line 17, in <module>
a = np.asarray(values).astype(np.float)
ValueError: setting an array element with a sequence.
Any idea how to fix this? I am new to Python/NumPy. If I print my values, it looks something like this:
[[1,2,3,4],[2,4,5,6],[1,3,5,6]]
I can think of a modification to your code which can potentially fix your problem. The "setting an array element with a sequence" error usually means the rows of values have different lengths (some lines split into a different number of fields), so numpy cannot build a rectangular array from them.
Initialize values as a numpy array and use numpy append or concatenate, which flattens everything into one 1-D array of samples:
values = np.array([], dtype=float)
Then inside the loop:
variables = np.array(lines[line_number].split(), dtype=float)
values = np.append(values, variables)
# or, equivalently
values = np.concatenate((values, variables))
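Put together, the loop could look like this; a sketch assuming every selected line splits cleanly into floats (note that line_number is also reset per file here, which the original loop forgot to do):

import glob
import os
import numpy as np

values = np.array([], dtype=float)
road = '/Users/allisondavis/Documents/HCl'
for pbpfile in glob.glob(os.path.join(road, 'pbpfile*')):
    lines = open(pbpfile, 'r').readlines()
    line_number = 6                     # restart at line 6 for each file
    while line_number < len(lines):
        variables = np.array(lines[line_number].split(), dtype=float)
        values = np.concatenate((values, variables))
        line_number += 3

std = np.std(values)
rms = std * 100
print(rms)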
Alternatively, if your files are .csv (or any other type pandas can read):
import pandas as pd

# Replace `read_csv` with your appropriate file reader
a = pd.concat([pd.read_csv(pbpfile)
               for pbpfile in glob.glob(os.path.join(road, 'pbpfile*'))]).values
# or
a = np.concatenate([pd.read_csv(pbpfile).values
                    for pbpfile in glob.glob(os.path.join(road, 'pbpfile*'))], axis=0)
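Either way, building the final array in one go (or with one concatenate per file) is much cheaper than growing a numpy array element by element, since every np.append copies the whole array.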
Sample csv
time,type,-1,
time,type,0,w
time,type,1,a,12,b,13,c,15,name,apple
time,type,5,r,2,s,43,t,45,u,67,style,blue,font,13
time,type,11,a,12,c,15
time,type,5,r,2,s,43,t,45,u,67,style,green,font,15
time,type,1,a,12,b,13,c,15,name,apple
time,type,11,a,12,c,15
time,type,5,r,2,s,43,t,45,u,67,style,green,font,15
time,type,1,a,12,b,13,c,15,name,apple
time,type,5,r,2,s,43,t,45,u,67,style,yellow,font,9
time,type,19,b,12
type,19,b,42
I would like to filter each of the following "type,1", "type,5", "type,11", "type,19" into a separate pandas frame for further analysis. What is the best way to do it? [Also, I will be ignoring "type,0" and "type,-1".]
Sample Code
import pandas as pd
type1_header = ['type','a','b','c','name']
type5_header = ['type','r','s','t','u','style','font']
type11_header = ['type','a','c']
type19_header = ['type','b']
type1_data = pd.read_csv(file_path_to_csv, usecols=[2, 4, 6, 8, 10], names=type1_header)
type5_data = pd.read_csv(file_path_to_csv, usecols=[2, 4, 6, 8, 10, 12, 14], names=type5_header)
import pandas as pd

headers = {1: ['a', 'b', 'c', 'name'],
           5: ['r', 's', 't', 'u', 'style', 'font'],
           }
usecols = {1: [4, 6, 8, 10],
           5: [4, 6, 8, 10, 12, 14],
           }
frames = {}
for h in headers:
    frames[h] = pd.DataFrame(columns=headers[h])

count = 0
for line in open('irreg.csv'):
    row = line.rstrip('\n').split(',')
    count += 1
    try:
        ID = int(row[2])
    except ValueError:
        # e.g. the short last sample line, which is missing its time field
        print('WARNING: line %d: malformed row %r' % (count, line.rstrip()))
        continue
    row_subset = []
    if ID in frames:
        for col in usecols[ID]:
            row_subset.append(row[col])
        frames[ID].loc[len(frames[ID])] = row_subset
    else:
        print('WARNING: line %d: type %s not found' % (count, row[2]))
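Adding the remaining types from the sample follows the same pattern; the indices below are just read off the sample rows (column 2 is the type id, and the values sit at every second position after it):

headers[11] = ['a', 'c']
usecols[11] = [4, 6]
headers[19] = ['b']
usecols[19] = [4]

Note also that appending to a DataFrame with .loc row by row is slow for large files; collecting plain lists per type and calling pd.DataFrame once at the end scales better.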
That said, how often do you do this, and how often does the data change? For a one-off it is probably easiest to split up the incoming csv file, e.g. with
grep 'type,19' irreg.csv > 19.csv
at the command line, and then import each csv according to its headers and usecols.