Subtract elements from 2 arrays found in 2 different .csv files - python

I have two CSV files, each with one row of data and multiple columns:
csv1: 0.1924321564, 0.8937481241, 0.6080270062, ........
csv2: 0.1800000000, 0.7397439374, 0.3949274792, ........
I want to subtract the first value in csv2 from the first value in csv1:
e.g. 0.1924321564 - 0.1800000000 = 0.0124321564
0.8937481241 - 0.7397439374 = 0.1540041867
and continue this for the remaining columns.
I then want to sum the differences for all columns into one value, e.g. 0.0124321564 + 0.1540041867 + ...
I am very new to Python, so this is the code I started with:
import numpy as np
import csv
array1 = np.array('1.csv')
array2 = np.array('2.csv')
array3 = np.subtract(array1, array2)
total = np.sum(array3)

Use np.genfromtxt to read each file into an array.
Note: the delimiter is a comma followed by a space because that is what you showed. Please change it accordingly.
import numpy as np
array1 = np.genfromtxt('1.csv', delimiter=', ')
array2 = np.genfromtxt('2.csv', delimiter=', ')
(array1 - array2).sum()
0.37953587010000012
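As a self-contained check of the approach above, here is a sketch that feeds the first three columns from the question to np.genfromtxt through io.StringIO instead of the real 1.csv and 2.csv (the in-memory strings are stand-ins). It swaps the answer's delimiter=', ' for delimiter=',' with autostrip=True, so either separator style parses. With these three columns the total comes out at about 0.3795358701.

```python
import io
import numpy as np

# In-memory stand-ins for 1.csv and 2.csv (first three columns from the question)
csv1 = io.StringIO("0.1924321564, 0.8937481241, 0.6080270062\n")
csv2 = io.StringIO("0.1800000000, 0.7397439374, 0.3949274792\n")

# delimiter=',' plus autostrip=True accepts both "," and ", " between values
array1 = np.genfromtxt(csv1, delimiter=',', autostrip=True)
array2 = np.genfromtxt(csv2, delimiter=',', autostrip=True)

# Element-wise subtraction: each column of csv1 minus the same column of csv2
differences = array1 - array2
total = differences.sum()

print(differences)
print(total)
```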

Related

How to merge three arrays?

My dataset consists of three columns, and I need to merge the data into one column. For example, if 1, 2, and 3 are the first entries of each column, the merged column should be 123. I tried to solve this with the concatenate command, but it does not do what I want. Here is my script:
import numpy as np
import pandas as pd

tr = pd.read_csv("YMD.txt", sep='\t', header=None)
Y = tr[0]
M = tr[1]
D = tr[2]
# This stacks the three columns end to end instead of merging them row by row
np.concatenate((Y, M, D))
You don't need pandas or NumPy to read a tab-delimited file and merge the first 3 columns into a new list:
ymd = []
with open("YMD.txt") as f:
    for row in f:
        # Split each line on tabs and join the first three fields into one string
        row = row.strip().split("\t")
        ymd.append("".join(row[:3]))
print(ymd)
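If the data is already in a DataFrame, as in the question's script, the same row-wise merge can be sketched with pandas string concatenation. The io.StringIO sample below is a made-up stand-in for YMD.txt. Note that if a column is read as float (for example because it contains missing values), astype(str) would produce strings like '1.0'.

```python
import io
import pandas as pd

# Made-up tab-delimited sample standing in for YMD.txt
sample = io.StringIO("1\t2\t3\n2020\t1\t15\n")
tr = pd.read_csv(sample, sep='\t', header=None)

# Convert each column to str, then add the Series together element-wise (row by row)
merged = tr[0].astype(str) + tr[1].astype(str) + tr[2].astype(str)
print(merged.tolist())
```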

Machine Learning Sololearn's Coach Problem about Column data etc

Problem:
Machine Learning - What's in a Column?
Getting a column from a numpy array.
Task -
Given a csv file and a column name, print the elements in the given column.
Input Format -
First line: filename of a csv file ;
Second line: column name
Output Format -
Numpy array
Sample Input -
'usercode/files/one.csv' (filename) ;
'a' (column name)
File 'one.csv' contents:
a,b
1,3
2,4
Sample Output -
[1 2]
----------
My answer:
import pandas as p
df = p.read_csv('usercode/files/one.csv')
details = df[['a', 'b']].values
print(details[:,1])
But I think it needs to output [1 2] in one case and [3 4] in the other, so that it satisfies both Case 1 and Case 2. My code can't do that: if I satisfy Case 1, Case 2 fails, and vice versa.
import pandas as pd
filename = input()
column_name = input()
df = pd.read_csv(filename)
arr = df[column_name].values
print(arr)
filename and column_name are values the user enters, so read them with input() instead of hard-coding them; df[column_name] then returns whichever column the test case asks for.
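To see why taking the column name as a parameter covers both test cases, here is a sketch that runs the same selection twice against the sample file held in memory. column_values is a hypothetical helper name, and io.StringIO stands in for 'usercode/files/one.csv'.

```python
import io
import pandas as pd

def column_values(source, column_name):
    # Read the CSV and return the requested column as a NumPy array
    df = pd.read_csv(source)
    return df[column_name].values

# Contents of one.csv from the problem statement
sample = "a,b\n1,3\n2,4\n"
print(column_values(io.StringIO(sample), 'a'))  # prints [1 2]
print(column_values(io.StringIO(sample), 'b'))  # prints [3 4]
```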

Starting at index 1 of a row when using writerow in Python

I have a script that creates a matrix of size n and writes it to a CSV file.
I want the matrix to have "borders" of size n, i.e. a header row and a label column numbered 1 to n.
My code:
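import csv
import random

# n (matrix size), x and y (random range) and writer (a csv.writer) are assumed to be defined elsewhere in the script
a = []
firstRow = []
for i in range(n):
    row = []
    row.append(i+1)
    firstRow.append(i+1)
    for j in range(n):
        row.append(random.randint(x,y))
    a.append(row)
writer.writerow(firstRow)
writer.writerows(a)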
Output when using n = 3:
1,2,3
1,74,82,68
2,87,70,72
3,68,71,74
I need the output to look like this:
1, 2, 3
1,74,82,68
2,87,70,72
3,68,71,74
with a blank cell at CSV position (0, 0).
I also need the whole matrix to start at row 1 instead of 0.
Using pandas we can get the following valid csv with a few lines of easy-to-understand code:
,1,2,3
1,91,66,70
2,82,24,79
3,57,56,73
Example code used:
import pandas as pd
import numpy as np
# Create random numbers 0-99, 3x3
data = np.random.randint(0,100, size=(3,3))
df = pd.DataFrame(data)
# Add 1 to index and columns
df.columns = df.columns + 1
df.index = df.index + 1
#df.to_csv('output.csv') # Uncomment this row to write to file.
print(df.to_csv())
And if you insist on removing the leading comma:
with open('output.csv', 'w') as f:
    f.write(df.to_csv()[1:])
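The same layout can also be produced with only the standard csv module, closer to the question's original writerow approach: write an empty string as the first header cell so position (0, 0) stays blank, then prefix each data row with its label. A minimal sketch; write_labelled_matrix is a hypothetical helper, and it writes to an in-memory buffer so the result can be inspected.

```python
import csv
import io
import random

def write_labelled_matrix(f, n, low, high):
    # Hypothetical helper: n x n random ints in [low, high] with 1-based row and column labels
    writer = csv.writer(f)
    # Header row: empty cell at (0, 0), then column labels 1..n
    writer.writerow([''] + list(range(1, n + 1)))
    for i in range(1, n + 1):
        # Each data row starts with its row label, followed by n random values
        writer.writerow([i] + [random.randint(low, high) for _ in range(n)])

buffer = io.StringIO()
write_labelled_matrix(buffer, 3, 0, 99)
print(buffer.getvalue())
```

To write a real file instead, pass a file opened with open('output.csv', 'w', newline=''), which is how the csv module documentation recommends opening files for csv.writer.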

Pandas ValueError: Shape of passed values

In the following code I iterate through a list of images and count the frequencies of a given number, in this case zeros and ones. I then write this out to a CSV. This works fine when I write out only the lists of frequencies, but when I try to add the filenames, I get the error:
ValueError: Shape of passed values is (1, 2), indices imply (2, 2)
When I write out just one list of frequencies (the number of ones) together with the filenames, it works fine.
My code is as follows:
import os
from osgeo import gdal
import pandas as pd
import numpy as np
# Input directory to the .kea files
InDir = "inDirectory"
# Make a list of the files
files = [file for file in os.listdir(InDir) if file.endswith('.kea')]
# Create empty lists to store the counts
ZeroValues = []
OneValues = []
# Iterate through each kea file and open it
for file in files:
    print('opening ' + file)
    # Open file
    ds = gdal.Open(os.path.join(InDir, file))
    # Specify the image band
    band = ds.GetRasterBand(1)
    # Read the pixel values as an array
    arr = band.ReadAsArray()
    # Select the pixels equal to 0 and the pixels equal to 1
    ZeroPixels = arr[arr==0]
    OnePixels = arr[arr==1]
    print('Number of 0 pixels = ' + str(len(ZeroPixels)))
    print('Number of 1 pixels = ' + str(len(OnePixels)))
    # Count the number of values in the array (length) and add to the list
    ZeroValues.append(len(ZeroPixels))
    OneValues.append(len(OnePixels))
    # Close file
    ds = None
# Pandas DataFrame and out to csv
out = pd.DataFrame(ZeroValues, OneValues, files)
# Write the pandas dataframe to a csv
out.to_csv("out.csv", header=False, index=files)
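The first three positional parameters of pd.DataFrame are data, index and columns, so pandas treats ZeroValues as the data, OneValues as the index and files as the column labels. See the DataFrame constructor documentation.
Try wrapping your fields in a dict: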
import pandas as pd
ZeroValues = [2,3,4]
OneValues = [5,6,7]
files = ["A.kea","B.kea","C.kea"]
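df = pd.DataFrame(dict(zero_vals=ZeroValues, one_vals=OneValues, fname=files))
print(df)
Output (older pandas versions sort the dict keys alphabetically, as shown here; pandas 0.23+ on Python 3.6+ keeps the insertion order zero_vals, one_vals, fname):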
fname one_vals zero_vals
0 A.kea 5 2
1 B.kea 6 3
2 C.kea 7 4
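Putting the dict fix back into the question's final write step, a sketch might look like this. The counts and filenames are placeholder values rather than real .kea results, and the column names are illustrative. The explicit columns= argument fixes the column order regardless of pandas version, and index=False leaves out pandas' default 0..n-1 row index. It writes to an in-memory buffer; in the real script you would call out.to_csv("out.csv", index=False).

```python
import io
import pandas as pd

# Placeholder counts standing in for the values collected from the .kea files
files = ["A.kea", "B.kea", "C.kea"]
zero_values = [2, 3, 4]
one_values = [5, 6, 7]

# A dict turns each list into a named column; columns= sets their order explicitly
out = pd.DataFrame({"fname": files, "zeros": zero_values, "ones": one_values},
                   columns=["fname", "zeros", "ones"])

buffer = io.StringIO()
out.to_csv(buffer, index=False)  # index=False omits the 0..n-1 row index
print(buffer.getvalue())
```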

Finding same values in a row of csv in python

I have code that looks for values in a CSV file that are within 1.0 of each other in the same row. However, when I run it, it prints everything, not just the rows that meet my condition, i.e. that the values in the 2nd and 3rd columns are within 1.0 of each other. I want the code to display the first column (the time at which the row was recorded, or better yet the row number) together with the 2nd and 3rd columns, since those are the values that should be within 1.0 of each other. This is what the data file looks like:
Time Chan1 Chan2
04:07.0 52.31515503 16.49450684
04:07.1 23.55230713 62.48802185
04:08.0 46.06217957 24.94955444
04:08.0 41.72077942 31.32516479
04:08.0 19.80723572 25.73182678
Here's my code:
import numpy as np
from matplotlib import *
from pylab import *
filename = raw_input("Enter file name: ") + '.csv'
filepath = '/home/david/Desktop/' + filename
data = np.genfromtxt(filepath, delimiter=',', dtype=float)
first=[row[0] for row in data]
rownum1=[row[1] for row in data]
rownum2=[row[2] for row in data]
for row in data:
    if ((abs(row[1]-row[2]) <= 1.0)):
        print("The values in row 0 are 1 and 2, are within 1.0 of each other.", first, rownum1, rownum2)
This is my output:
26.3460998535, 44.587371826199998, 42.610519409200002, 24.7272491455, 89.397918701199998, 25.479614257800002, 30.991180419900001, 25.676086425800001
But I want this as an output:
4:09.0, 23.456, 22.5
You can do that like this:
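import numpy as np
# delimiter=',' assumes the file is comma-separated, as in the question's code;
# encoding='utf-8' makes the Time column load as str rather than bytes
data = np.genfromtxt(filepath, delimiter=',', names=True, dtype=None, encoding='utf-8')
idx = np.abs(data['Chan1'] - data['Chan2']) < 1
print(data[idx])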
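A self-contained sketch of the same masking approach that also reports the row number the question asks for. It assumes the file is comma-separated with a Time,Chan1,Chan2 header; the sample reuses three rows from the question plus the row from the desired output, since none of the question's own rows are within 1.0 of each other. It uses <= 1.0 to match the question's condition.

```python
import io
import numpy as np

# Sample data written as comma-separated text (assumption: the real file uses commas)
sample = io.StringIO(
    "Time,Chan1,Chan2\n"
    "04:07.0,52.31515503,16.49450684\n"
    "04:07.1,23.55230713,62.48802185\n"
    "04:08.0,46.06217957,24.94955444\n"
    "04:09.0,23.456,22.5\n"
)

# names=True takes field names from the header; dtype=None infers a type per column
data = np.genfromtxt(sample, delimiter=',', names=True, dtype=None, encoding='utf-8')

# Boolean mask: True where the two channels differ by at most 1.0
close = np.abs(data['Chan1'] - data['Chan2']) <= 1.0
# 0-based positions of the matching data rows (the header line is not counted)
row_numbers = np.nonzero(close)[0]

for i in row_numbers:
    row = data[i]
    print(i, row['Time'], row['Chan1'], row['Chan2'])
```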
