Name  Gender  Physics  Maths
A             45       55
X             22       64
C             0        86
I have a CSV file like this. I have made some modifications to get a list containing only the marks, in the form [[45,55],[22,64]].
I want to find the minimum for each subject.
But when I run my code, I only get the minimum for the first subject; the remaining values are simply copied from one of the rows.
The answer I want: [0, 55]
The answer I get: [0, 86]
def find_min(marks, cols, rows):
    minimum = []
    temp = []
    for list in marks:
        min1 = min([x for x in list])
        minimum.append(min1)
    # for j in range(rows):
    #     for i in range(cols):
    #         temp.append(marks)
    #     x = min(temp)
    #     minimum.append(x)
    return minimum
How do I modify my code?
I can't use any other modules/libraries like csv or pandas.
I tried using zip(*marks), but that just prints my marks list as is.
Is there any way to separate the inner lists from the larger list?
This will calculate the minimum per subject:
In [707]: marks = [[45,55],[22,64]]
In [697]: [min(idx) for idx in zip(*marks)]
Out[697]: [22, 55]
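For context (my own illustration, not part of the original answer), zip(*marks) transposes the rows into per-subject columns, so each tuple holds one subject's marks; taking min of each tuple then gives the per-subject minimum:

list(zip(*marks))   # [(45, 22), (55, 64)]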
Try transposing the marks array (which is one student per row) so each list entry corresponds to a column ("subject") from your CSV:
def find_min(marks):
    mt = zip(*marks)
    mins = [min(row) for row in mt]
    return mins
example usage:
marks = [[45,55],[22,64],[0,86]]
print(find_min(marks))
which prints:
[0, 55]
I am trying to get the row and column numbers that meet three conditions in a Pandas DataFrame.
I have a large DataFrame of 0, 1, and -1 values (bigger than 1850 rows); when I try to get the row and column indices, it takes forever to produce the output.
The following is an example I have been trying to use:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randint(2, size=(1845,1850)))
b = pd.DataFrame(np.random.randint(2, size=(5,1850)))
b[b == 1] = -1
c = pd.concat([a,b], ignore_index=True)
column_positive = []
row_positive = []
column_negative = []
row_negative = []
column_zero = []
row_zero = []
for column in range(0, c.shape[0]):
    for row in range(0, c.shape[1]):
        if c.iloc[column, row] == 1:
            column_positive.append(column)
            row_positive.append(row)
        elif c.iloc[column, row] == -1:
            column_negative.append(column)
            row_negative.append(row)
        else:
            column_zero.append(column)
            row_zero.append(row)
I did some web searching and found that np.where() does something like this, but I have no idea how to use it here.
Could anyone suggest a better alternative?
You are right, np.where would be one way to do it. Here's an implementation with it -
# Extract the values from c into an array for ease in further processing
c_arr = c.values
# Use np.where to get row and column indices corresponding to three comparisons
column_zero, row_zero = np.where(c_arr==0)
column_negative, row_negative = np.where(c_arr==-1)
column_positive, row_positive = np.where(c_arr==1)
If you don't mind having the rows and columns as an Nx2 shaped array, you could do it in a bit more concise manner, like so -
neg_idx, zero_idx, pos_idx = [np.argwhere(c_arr == item) for item in [-1,0,1]]
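As a small illustration (my own sketch, not from the original answer), np.argwhere returns one Nx2 row per match, which can be split back into separate row and column index arrays if needed:

import numpy as np

c_arr = np.array([[ 0,  1],
                  [-1,  0]])
pos_idx = np.argwhere(c_arr == 1)          # array([[0, 1]]): one match at row 0, column 1
rows, cols = pos_idx[:, 0], pos_idx[:, 1]  # separate index arrays, like np.where would give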
I have a CSV that looks like this:
0.500187550,CPU1,7.93
0.500187550,CPU2,1.62
0.500187550,CPU3,7.93
0.500187550,CPU4,1.62
1.000445359,CPU1,9.96
1.000445359,CPU2,1.61
1.000445359,CPU3,9.96
1.000445359,CPU4,1.61
1.500674877,CPU1,9.94
1.500674877,CPU2,1.61
1.500674877,CPU3,9.94
1.500674877,CPU4,1.61
The first column is time, the second the CPU used and the third is energy.
As a final result I would like to have these arrays:
Time:
[0.500187550, 1.000445359, 1.500674877]
Energy (per CPU): e.g. CPU1
[7.93, 9.96, 9.94]
For parsing the CSV I'm using:
query = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
#Arrays global time and power:
for row in query:
    x = row[0]
    x = float(x)
    x_array.append(x)  # column 0 to array
    y = row[2]
    y = float(y)
    y_array.append(y)  # column 2 to array
print x_array
print y_array
This way I get all the time and energy data into two arrays: x_array and y_array.
Then I reorder the arrays by core:
energy_core_ord_array = []
time_ord_array = []
#Dividing array into energy and time per core:
for i in range(number_cores[0]):
    e = 0 + i
    for j in range(len(x_array)/(int(number_cores[0]))):
        time_ord = x_array[e]
        time_ord_array.append(time_ord)
        energy_core_ord = y_array[e]
        energy_core_ord_array.append(energy_core_ord)
        e = e + int(number_cores[0])
And lastly, I cut the time array down to the length it should have:
final_time_ord_array = []
for i in range(len(x_array)/(int(number_cores[0]))):
    final_time_ord = time_ord_array[i]
    final_time_ord_array.append(final_time_ord)
Up to here, although the code is not elegant, it works.
The problem comes when I try to get the array for each core.
I get it for the first core, but I don't know how to iterate for the next one, or how to store each array in a variable with its own name, for example.
final_energy_core_ord_array = []
#Trunk energy core array:
for i in range(len(x_array)/(int(number_cores[0]))):
    final_energy_core_ord = energy_core_ord_array[i]
    final_energy_core_ord_array.append(final_energy_core_ord)
So using Pandas (a library for handling dataframes in Python) you can do something like this, which is much quicker than trying to process the CSV manually like you're doing:
import pandas as pd
csvfile = "C:/Users/Simon/Desktop/test.csv"
data = pd.read_csv(csvfile, header=None, names=['time','cpu','energy'])
times = list(pd.unique(data.time.ravel()))
print times
cpuList = data.groupby(['cpu'])
cpuEnergy = {}
for i in range(len(cpuList)):
    curCPU = 'CPU' + str(i+1)
    cpuEnergy[curCPU] = list(cpuList.get_group('CPU' + str(i+1))['energy'])
for k, v in cpuEnergy.items():
    print k, v
that will give the following as output:
[0.50018755000000004, 1.000445359, 1.5006748769999998]
CPU4 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU2 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU3 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
CPU1 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
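As a side note (my own addition, using the same data frame as above), the per-CPU dictionary can also be built in a single groupby expression instead of the explicit loop:

cpuEnergy = data.groupby('cpu')['energy'].apply(list).to_dict()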
Finally I got the answer using globals()... not a great idea, but it works. I'll leave it here in case someone finds it useful.
final_energy_core_ord_array = []
#Trunk energy core array:
a = 0
for j in range(number_cores[0]):
    for i in range(len(x_array)/(int(number_cores[0]))):
        final_energy_core_ord = energy_core_ord_array[a + i]
        final_energy_core_ord_array.append(final_energy_core_ord)
    globals()['core%s' % j] = final_energy_core_ord_array
    final_energy_core_ord_array = []
    a = a + 12

print 'Final time and cores:'
print final_time_ord_array
for j in range(number_cores[0]):
    print globals()['core%s' % j]
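For what it's worth, here is a sketch of the same grouping without globals(), keyed by core index in a dictionary. It assumes, as in the sample CSV, that the rows cycle through the cores in order (CPU1, CPU2, ... for each timestamp):

cores = {}
n_cores = int(number_cores[0])
# y_array is the flat energy column in file order, so the row index modulo the
# number of cores identifies which core each sample belongs to
for idx, energy in enumerate(y_array):
    cores.setdefault(idx % n_cores, []).append(energy)

for j in range(n_cores):
    print('core%s: %s' % (j, cores[j]))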
My code below is getting stuck at a random point:
import functions
from itertools import product
from random import randrange
values = {}
tables = {}
letters = "abcdefghi"
nums = "123456789"
for x in product(letters, nums):  # unnecessary
    values[x[0] + x[1]] = 0
for x in product(nums, letters):  # unnecessary
    tables[x[0] + x[1]] = 0

for line_cnt in range(1,10):
    for column_cnt in range(1,10):
        num = randrange(1,10)
        table_cnt = functions.which_table(line_cnt, column_cnt)  # Returns a number identifying the table considered
        # gets the values already in the line, column and table considered
        line = [y for x,y in values.items() if x.startswith(letters[line_cnt-1])]
        column = [y for x,y in values.items() if x.endswith(nums[column_cnt-1])]
        table = [x for x,y in tables.items() if x.startswith(str(table_cnt))]
        # if num is not contained in any of these then it's acceptable, otherwise find another number
        while num in line or num in column or num in table:
            num = randrange(1,10)
        values[letters[line_cnt-1] + nums[column_cnt-1]] = num  # Assign the number to the values dictionary
        print(line_cnt)  # debug
        print(sorted(values))  # debug
As you can see, it's a program that generates random sudoku schemes using two dictionaries: values, which contains the complete scheme, and tables, which contains the values for each table.
Example :
5th square on the first line = 3
|
v
values["a5"] = 3
tables["2b"] = 3
So what is the problem? Am I missing something?
import functions
...
table_cnt = functions.which_table(line_cnt, column_cnt) #Returns a number identifying the table considered
It's nice when we can execute the code right away on our own computer to test it. In other words, it would have been nice to replace "table_cnt" with a fixed value for the example (here, a simple string would have sufficed).
for x in product(letters, nums):
    values[x[0] + x[1]] = 0
Not that important, but this is more elegant:
values = {x+y: 0 for x, y in product(letters, nums)}
And now, the core of the problem:
while num in line or num in column or num in table:
    num = randrange(1,10)
This is where you loop forever. So, you are trying to generate a random sudoku. From your code, this is how you would generate a random list:
nums = []
for _ in range(9):
    num = randrange(1, 10)
    while num in nums:
        num = randrange(1, 10)
    nums.append(num)
The problem with this approach is that you have no idea how long the program will take to finish. It could take one second, or one year (although, that is unlikely). This is because there is no guarantee the program will not keep picking a number already taken, over and over.
Still, in practice it should take a relatively short time to finish (this approach is not efficient, but the list is very short). However, in the case of the sudoku, you can end up in an impossible setting. For example:
line = [6, 9, 1, 2, 3, 4, 5, 8, 0]
column = [0, 0, 0, 0, 7, 0, 0, 0, 0]
Here those are the first line (or any line, actually) and the last column. When the algorithm tries to find a value for line[8], it will always fail, since 7 is blocked by the column.
If you want to keep it this way (aka brute force), you should detect such a situation and start over, as in the sketch below. Again, this is very inefficient, and you should look at how to generate sudokus properly (my naive approach would be to start with a solved one and swap lines and columns randomly, but I know this is not a good way).
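To make the restart idea concrete, here is a rough sketch of my own (it does not use the asker's functions module): fill the grid cell by cell from a shuffled candidate list, and restart the whole grid whenever a cell has no legal candidate left. Expect a fair number of restarts; a proper generator would backtrack instead.

from random import shuffle

def generate_grid():
    while True:                      # restart on dead ends
        grid = [[0] * 9 for _ in range(9)]
        if fill(grid):
            return grid

def fill(grid):
    for r in range(9):
        for c in range(9):
            candidates = list(range(1, 10))
            shuffle(candidates)
            for num in candidates:
                row = grid[r]
                col = [grid[i][c] for i in range(9)]
                box = [grid[i][j]
                       for i in range(r // 3 * 3, r // 3 * 3 + 3)
                       for j in range(c // 3 * 3, c // 3 * 3 + 3)]
                if num not in row and num not in col and num not in box:
                    grid[r][c] = num
                    break
            else:                    # no candidate fits: dead end, restart
                return False
    return True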
I have 3 lists, all of the same length. One of the lists holds numbers representing a day, and the other two lists are data which correspond to that day, e.g.
day = [1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4....]
data1 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4....] # (effectively random numbers)
data2 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4....] # (again, effectively random numbers)
What I need to do is to take data1 and data2 for day 1, perform operations on it, and then repeat the process for day 2, day 3, day 4 and so on.
I currently have:
def sortfile(day, data1, data2):
    x=[]
    y=[]
    date=[]
    temp1=[]
    temp2=[]
    i=0
    for i in range(0,len(day)-1):
        if day[i] == day[i+1]:
            x.append(data1[i])
            y.append(data2[i])
            i+=1
            #print x, y
        else:
            for i in range(len(x)):
                temp1.append(x)
            for i in range(len(y)):
                temp2.append(y)
            date.append(day[i])
            x=[]
            y=[]
            i+=1
    while i!=(len(epoch)-1):
        x.append(data1[i])
        y.append(data2[i])
        i+=1
    date.append(day[i])
    return date, temp1, temp2
This is supposed to append to the x array while the day stays the same and then, if it changes, append all the data from the x array to the temp1 array and clear the x array. It will then perform operations on temp1 and temp2. However, when I run this as a check (I'm aware that I'm not clearing temp1 and temp2 at any point), temp1 just fills with the full list of days and temp2 is empty. I'm not sure why this is, and I'm open to completely restarting if a better way is suggested!
Thanks in advance.
Just zip the lists:
x = []
y = []
date = []
temp1 = []
temp2 = []
day = [1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4]
data1 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4]
data2 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4]
zipped = zip(day, data1, data2)  # wrap in list(...) for python 3
for ind, (dy, dt1, dt2) in enumerate(zipped[:-1]):
    if zipped[ind+1][0] == dy:
        x.append(dt1)
        y.append(dt2)
    else:
        temp1 += x
        temp2 += y
        x = []
        y = []
Not sure what your while loop is doing, as it is outside the for loop, and you don't actually return or use x and y, so that code seems irrelevant and may well be the reason your code is not returning what you expect.
groupby and zip are a good solution for this problem: groupby lets you group runs of sorted data together, and zip allows you to access the elements at each index of day, data1, and data2 together as a tuple.
from operator import itemgetter
from itertools import groupby
day = [1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4]
data1 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4]
data2 = [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5,1,2,3,4]
x = []
y = []
for day_num, data in groupby(zip(day, data1, data2), itemgetter(0)):
    data = list(data)
    data1_total = sum(d[1] for d in data)
    x.append(data1_total)
    data2_total = sum(d[2] for d in data)
    y.append(data2_total)
itemgetter is just a function that tells groupby to group the tuple of elements by the first element in the tuple (the day value).
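For reference, with the sample lists above each day's values are summed, so the code ends up with:

print(x)  # [21, 15, 15, 10]
print(y)  # [21, 15, 15, 10]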
Another option is to use defaultdict and simply iterate over days adding data as we go:
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for n, d in enumerate(day):
    d1[d].append(data1[n])
    d2[d].append(data2[n])
This creates two dicts like {day: [value1, value2...]...}. Note that this solution doesn't require days to be sorted.
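For example, indexing the resulting dicts by day gives the per-day slices of the sample lists above:

print(d1[1])  # [1, 2, 3, 4, 5, 6]
print(d2[4])  # [1, 2, 3, 4]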
Running several threads is similar to running several different programs concurrently, sharing the same data. You can start a thread for each array.
Read more about threading in the threading documentation.
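A minimal sketch of that idea, assuming a hypothetical process() function that does the per-list work:

import threading

def process(values):
    # placeholder for whatever per-list processing is needed
    print(sum(values))

threads = [threading.Thread(target=process, args=(lst,)) for lst in (day, data1, data2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Note that for CPU-bound work in CPython the GIL limits how much threads help; this only sketches the structure.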