In the following code I iterate through a list of images and count the frequencies of a given number, in this case zeros and ones. I then write this out to a csv. This works fine when I write out the list of frequencies only, but when I try to add the filename then I get the error:
ValueError: Shape of passed values is (1, 2), indices imply (2, 2)
When I try to write out one list of frequencies (number of ones) and the filenames it works fine.
My code is as follows:
import os
from osgeo import gdal
import pandas as pd
import numpy as np
# Input directory to the .kea files
InDir = "inDirectory"
# Make a list of the files
files = [file for file in os.listdir(InDir) if file.endswith('.kea')]
# Create empty list to store the counts
ZeroValues = []
OneValues = []
# Iterate through each kea file and open it
for file in files:
print('opening ' + file)
# Open file
ds = gdal.Open(os.path.join(InDir, file))
# Specify the image band
band = ds.GetRasterBand(1)
# Read the pixel values as an array
arr = band.ReadAsArray()
# remove values that are not equal (!=) to 0 (no data)
ZeroPixels = arr[arr==0]
OnePixels = arr[arr==1]
print('Number of 0 pixels = ' + str(len(ZeroPixels)))
print('Number of 1 pixels = ' + str(len(OnePixels)))
# Count the number of values in the array (length) and add to the list
ZeroValues.append(len(ZeroPixels))
OneValues.append(len(OnePixels))
# Close file
ds = Non
# Pandas datagram and out to csv
out = pd.DataFrame(ZeroValues, OneValues, files)
# Write the pandas dataframe to a csv
out.to_csv("out.csv", header=False, index=files)
Pandas thinks you're trying to pass OneValues and files as positional index and columns arguments. See docs.
Try wrapping your fields in a dict:
import pandas as pd
ZeroValues = [2,3,4]
OneValues = [5,6,7]
files = ["A.kea","B.kea","C.kea"]
df = pd.DataFrame(dict(zero_vals=ZeroValues, one_vals=OneValues, fname=files))
Output:
fname one_vals zero_vals
0 A.kea 5 2
1 B.kea 6 3
2 C.kea 7 4
Related
i've done got my outputs for the csv file, but i dont know how to write it into csv file because output result is numpy array
def find_mode(np_array) :
vals,counts = np.unique(np_array, return_counts=True)
index = np.argmax(counts)
return(vals[index])
folder = ("C:/Users/ROG FLOW/Desktop/Untuk SIDANG TA/Sudah Aman/testbikincsv/folderdatacitra/*.jpg")
for file in glob.glob(folder):
a = cv2.imread(file)
rows = a.shape[0]
cols = a.shape[1]
middlex = cols/2
middley = rows/2
middle = [middlex,middley]
titikawalx = middlex - 10
titikawaly = middley - 10
titikakhirx = middlex + 10
titikakhiry = middley + 10
crop = a[int(titikawaly):int(titikakhiry), int(titikawalx):int(titikakhirx)]
c = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
H,S,V = cv2.split(c)
hsv_split = np.concatenate((H,S,V),axis=1)
Modus_citra = (find_mode(H)) #how to put this in csv
my outputs is modus citra which is array np.uint8, im trying to put it on csv file but im still confused how to write it into csv because the result in loop.
can someone help me how to write it into csv file ? i appreciate every help
Run your loop, and put the data into lists
eg. mydata = [result1,result2,result3]
Then use csv.writerows(mydata) to write your list into csv rows
https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows
You can save your NumPy arrays to CSV filesĀ using the savetxt() function. This function takes a filename and array as arguments and saves the array into CSV format. You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma. For example:
import numpy as np
my_array = np.array([1,2,3,4,5,6,7,8,9,10])
my_file = np.savetxt('randomtext.csv', my_array, delimiter = ',', fmt = '%d')
print(my_file)
So, I am quite new to python and have been googling a lot but have not found a good solution. What I am looking to do is automate text to columns using python in an excel document without headers.
Here is the excel sheet I have
it is a CSV file where all the data is in one column without headers
ex. hi ho loe time jobs barber
jim joan hello
009 00487 08234 0240 2.0348 20.34829
delimeter is space and comma
What I want to come out is saved in another excel with the first two rows deleted and seperated into columns
( this can be done using text to column in excel but i would like to automate this for several excel sheets)
009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829
the code i have written so far is like this:
import pandas as pd
import csv
path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'
os.chdir(path)
for root, dirs, files in os.walk(path):
for f in files:
df = pd.read_csv(f, delimiter='\t' + ';', engine = 'python')
Original file with name as data.xlsx:
This means all the data we need is under the column Data.
Code to split data into multiple columns for a single file:
import pandas as pd
import numpy as np
f = 'data.xlsx'
# -- Insert the following code in your `for f in files` loop --
file_data = pd.read_excel(f)
# Since number of values to be split is not known, set the value of `num_cols` to
# number of columns you expect in the modified excel file
num_cols = 20
# Create a dataframe with twenty columns
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
# Change the column name of the first column in new_file to "Data"
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
# Add the value of the first cell in the original file to the first cell of the
# new excel file
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]
# Loop through all rows of original excel file
for index, row in file_data.iterrows():
# Skip the first row
if index == 0:
continue
# Split the row by `space`. This gives us a list of strings.
split_data = file_data.loc[index, "Data"].split(" ")
print(split_data)
# Convert each element to a float (a number) if we want numbers and not strings
# split_data = [float(i) for i in split_data]
# Make sure the size of the list matches to the number of columns in the `new_file`
# np.NaN represents no value.
split_data = [np.NaN] + split_data + [np.NaN] * (num_cols - len(split_data) - 1)
# Store the list at a given index using `.loc` method
new_file.loc[index] = split_data
# Drop all the columns where there is not a single number
new_file.dropna(axis=1, how='all', inplace=True)
# Get the original excel file name
new_file_name = f.split(".")[0]
# Save the new excel file at the same location where the original file is.
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
This creates a new excel file (with a single sheet) of name data_modified.xlsx:
Summary (code without comments):
import pandas as pd
import numpy as np
f = 'data.xlsx'
file_data = pd.read_excel(f)
num_cols = 20
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]
for index, row in file_data.iterrows():
if index == 0:
continue
split_data = file_data.loc[index, "Data"].split(" ")
split_data = [np.NaN] + split_data + [np.NaN] * (num_cols - len(split_data) - 1)
new_file.loc[index] = split_data
new_file.dropna(axis=1, how='all', inplace=True)
new_file_name = f.split(".")[0]
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
I have a number of csv files. I need to extract all respective rows from each file and save it as a new file.
i.e. first output file must contain first rows of all input files and so on.
I have done the following.
import pandas as pd
import os
import numpy as np
data = pd.DataFrame('', columns =['ObjectID', 'SPI'], index = np.arange(1,100))
path = r'C:\Users\bikra\Desktop\Pandas'
i = 1
for files in os.listdir(path):
if files[-4:] == '.csv':
for j in range(0,10, 1):
#print(files)
dataset = pd.read_csv(r'C:\Users\bikra\Desktop\Pandas'+'\\'+files)
spi1 = dataset.loc[j,'SPI']
data.loc[i]['ObjectID'] = files[:]
data.loc[i]['SPI'] = spi1
data.to_csv(r'C:\Users\bikra\Desktop\Pandas\output\\'+str(j)+'.csv')
i + 1
It works well when index (i.e. 'j' ) is specified. But when I tried to loop, the output csv file contains only first row. Where am I wrong?
You better use append:
data = data.append(spi1)
I have a large csv file which I have split into six individual files. I am using a 'for loop' to read each file and create a column
in which the values ascend by one.
whole_file=['100Hz1-raw.csv','100Hz2-raw.csv','100Hz3-raw.csv','100Hz4-raw.csv','100Hz5-raw.csv','100Hz6-raw.csv']
first_file=True
for piece in whole_file:
if not first_file:
skip_row = [0] # if it is not the first csv file then skip the header row (row 0) of that file
else:
skip_row = []
V_raw = pd.read_csv(piece)
V_raw['centiseconds'] = np.arange(len(V_raw)) #label each centisecond
My output:
My desired output
Is there a clever way of doing what I intend.
Store the last value for centiseconds and count from there:
whole_file=['100Hz1-raw.csv','100Hz2-raw.csv','100Hz3-raw.csv','100Hz4-raw.csv','100Hz5-raw.csv','100Hz6-raw.csv']
first_file=True
## create old_centiseconds variable
old_centiseconds = 0
for piece in whole_file:
if not first_file:
skip_row = [0] # if it is not the first csv file then skip the header row (row 0) of that file
else:
skip_row = []
V_raw = pd.read_csv(piece)
# add old_centiseconds onto what you had before
V_raw['centiseconds'] = np.arange(len(V_raw)) + old_centiseconds #label each centisecond
# update old_centiseconds
old_centiseconds += len(V_raw)
As I said in my comment you may want to view the data as a numpy array as this requires less memory. You can this by opening the .csv files as numpy array and then append to an empty list. If you would like to append these numpy arrays together you can .vstack. The following code should be able to do this:
from numpy import genfromtxt
whole_file=['100Hz1-raw.csv','100Hz2-raw.csv','100Hz3-raw.csv','100Hz4-raw.csv','100Hz5-raw.csv','100Hz6-raw.csv']
whole_file_numpy_array = []
for file_name in whole_file:
my_data = genfromtxt(file_name, delimiter=',')
whole_file_numpy_array.append(file_name)
combined_numpy_array = np.vstack(whole_file_numpy_array)
I have two csv files with 1 row of data each and multiple columns
csv1: 0.1924321564, 0.8937481241, 0.6080270062, ........
csv2: 0.1800000000, 0.7397439374, 0.3949274792, ........
I want to subtract the first value in csv1 from the first value in csv2:
e.g 0.1924321564 - 0.1800000000 = 0.0124321564
0.8937481241 - 0.7397439374 = 0.15400418706
and continue this for the remaining columns.
I then want to take the results of the subtraction of each column and sum them together into one value e.g sum(0.0124321564 + 0.15400418706 + n)
I am very new to python so this is the code I started with:
import numpy as np
import csv
array1 = np.array('1.csv')
array2 = np.array('2.csv')
array3 = np.subtract(array1, array2)
total = np.sum(array3)
genfromtxt
note: the delimeter is comma followed by a space because that is what you showed. Please change accordingly.
import numpy as np
array1 = np.genfromtxt('1.csv', delimiter=', ')
array2 = np.genfromtxt('2.csv', delimiter=', ')
(array1 - array2).sum()
0.37953587010000012