Python CSV transpose data in one column to rows


I have a CSV file that contains data only in the first column.
I want to use Python to transpose every 4 rows into one row of a new, empty CSV file: rows 1 to 4 become the first row, rows 5 to 8 become the second row, and so on, so that the output CSV file holds a 5 * 4 matrix.
How can I write a script to do this? Any hints or suggestions are welcome, thank you.
I am using Python 2.7.4 under Windows 8.1 x64.
Update #1
I used the following code provided by thefortheye:
import sys, os
os.chdir('C:\Users\Heinz\Desktop')
print os.getcwd()
from itertools import islice
with open("test_csv.csv") as in_f, open("Output.csv", "w") as out_file:
    for line in ([i.rstrip()] + map(str.rstrip, islice(in_f, 3)) for i in in_f):
        out_file.write("\t".join(line) + "\n")
the input CSV file is,
and the result is,
This is not what I want.

You can use a list comprehension like this:
data = range(20)
print data
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print [data[i:i + 4] for i in xrange(0, len(data), 4)]
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
Instead of 4, you might want to use 56.
Since you are planning to read from the file, you might want to do something like this:
from itertools import islice
with open("Input.txt") as in_file:
    print [[int(line)] + map(int, islice(in_file, 3)) for line in in_file]
Edit: As per the updated question,
from itertools import islice
with open("Input.txt") as in_f, open("Output.txt", "w") as out_file:
    for line in ([i.rstrip()] + map(str.rstrip, islice(in_f, 3)) for i in in_f):
        out_file.write("\t".join(line) + "\n")
Edit: Since you are looking for comma-separated values, you can join the items with a comma instead of a tab, like this:
out_file.write(",".join(line) + "\n")
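For readers on Python 3, the same grouping idea can be sketched with the csv module. This is only an illustration under stated assumptions: the helper name group_rows and the in-memory io.StringIO objects are hypothetical stand-ins for the real input and output files, and the group size of 4 comes from the question.

```python
import csv
import io
from itertools import islice

def group_rows(lines, size=4):
    """Yield lists of `size` stripped values from an iterable of lines."""
    it = (line.strip() for line in lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# In-memory stand-ins for the input and output files (hypothetical data).
in_f = io.StringIO("\n".join(str(n) for n in range(1, 21)) + "\n")
out_f = io.StringIO()

# csv.writer takes care of the comma delimiter and any quoting that is needed.
csv.writer(out_f, lineterminator="\n").writerows(group_rows(in_f))
print(out_f.getvalue())
```

With real files in Python 3, the csv documentation recommends opening them with newline="" so that the writer controls the line endings.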

You can use a list comprehension with a double loop like this:
>>> M = 3
>>> N = 5
>>> a = range(M * N)
>>> o = [[a[i * N + j] for j in xrange(N)] for i in xrange(M)]
>>> print o
[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]

Related

Alternative to for loops for calculating 15^6 combinations in Python

Today, I have a nested for loop in python to calculate the value of all different combinations in a horse racing card consisting of six different races; i.e. six different arrays (of different lengths, but up to 15 items per array). It can be up to 11 390 625 combinations (15^6).
For each horse in each race, I calculate a value (EV) which I want to multiply.
Array 1: 1A,1B,1C,1D,1E,1F
Array 2: 2A,2B,2C,2D,2E,2F
Array 3: 3A,3B,3C,3D,3E,3F
Array 4: 4A,4B,4C,4D,4E,4F
Array 5: 5A,5B,5C,5D,5E,5F
Array 6: 6A,6B,6C,6D,6E,6F
1A * 1B * 1C * 1D * 1E * 1F = X,XX
.... .... .... .... ... ...
6A * 6B * 6C * 6D * 6E * 6F = X,XX
Doing four levels is OK. It takes me about 3 minutes.
I have not yet been able to do six levels.
I need help in creating a better way of doing this, and have no idea how to proceed. Does numpy perhaps offer help here? Pandas? I've tried compiling the code with Cython, but it did not help much.
My function takes in a list containing the horses in numerical order and their EV. (Since horse starting numbers do not start with zero, I add 1 to the index). I iterate through all the different races, and save the output for the combination into a dataframe.
def calculateCombos(horses_in_race_1,horses_in_race_2,horses_in_race_3,horses_in_race_4,horses_in_race_5,horses_in_race_6,totalCombinations, df):
    totalCombinations = 0
    for idx1, hr1_ev in enumerate(horses_in_race_1):
        hr1_no = idx1 + 1
        for idx2, hr2_ev in enumerate(horses_in_race_2):
            hr2_no = idx2 + 1
            for idx3, hr3_ev in enumerate(horses_in_race_3):
                hr3_no = idx3 + 1
                for idx4, hr4_ev in enumerate(horses_in_race_4):
                    hr4_no = idx4 + 1
                    for idx5, hr5_ev in enumerate(horses_in_race_5):
                        hr5_no = idx5 + 1
                        for idx6, hr6_ev in enumerate(horses_in_race_6):
                            hr6_no = idx6 + 1
                            totalCombinations = totalCombinations + 1
                            combinationEV = hr1_ev * hr2_ev * hr3_ev * hr4_ev * hr5_ev * hr6_ev
                            new_row = {'Race1':str(hr1_no),'Race2':str(hr2_no),'Race3':str(hr3_no),'Race4':str(hr4_no),'Race5':str(hr5_no),'Race6':str(hr6_no), 'EV':combinationEV}
                            df = appendCombinationToDF(df, new_row)
    return df

Why don't you try this and see if you can run the function without any issues? This works on my laptop (I'm using PyCharm). If you can't run this, then I would say that you need a better PC perhaps. I did not encounter any memory error.
Assume that we have the following:
horses_in_race_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_3 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_4 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_5 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
horses_in_race_6 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
I have re-written the function as follows - made a change in enumeration. Also, not using df as I do not know what function this is - appendCombinationToDF
def calculateCombos(horses_in_race_1,horses_in_race_2,horses_in_race_3,horses_in_race_4,horses_in_race_5,horses_in_race_6):
    for idx1, hr1_ev in enumerate(horses_in_race_1, start = 1):
        for idx2, hr2_ev in enumerate(horses_in_race_2, start = 1):
            for idx3, hr3_ev in enumerate(horses_in_race_3, start = 1):
                for idx4, hr4_ev in enumerate(horses_in_race_4, start = 1):
                    for idx5, hr5_ev in enumerate(horses_in_race_5, start = 1):
                        for idx6, hr6_ev in enumerate(horses_in_race_6, start = 1):
                            combinationEV = hr1_ev * hr2_ev * hr3_ev * hr4_ev * hr5_ev * hr6_ev
                            new_row = {'Race1':str(idx1),'Race2':str(idx2),'Race3':str(idx3),'Race4':str(idx4),'Race5':str(idx5),'Race6':str(idx6), 'EV':combinationEV}
                            l.append(new_row)
                            #df = appendCombinationToDF(df, new_row)
l = [] # df = ...
calculateCombos(horses_in_race_1, horses_in_race_2, horses_in_race_3, horses_in_race_4, horses_in_race_5, horses_in_race_6)
Executing len(l), I get:
11390625 # maximum combinations possible. This means that above function ran successfully and computation succeeded.
If the above can be executed, replace list l with df and see if function can execute without encountering memory error. I was able to run the above in less than 20-30 seconds.
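As an alternative to six hand-written loops, the nesting can be expressed with itertools.product. The sketch below is not the asker's code: the generator name combo_evs and the small three-race card are hypothetical, and math.prod requires Python 3.8 or later. It collects plain tuples, on the assumption that building the DataFrame once at the end is cheaper than appending to it row by row.

```python
from itertools import product
from math import prod  # available from Python 3.8

def combo_evs(races):
    """Yield (horse_numbers, combined_ev) for every pick of one horse per race.

    `races` is a list of lists of EV values; horse numbers are 1-based positions.
    """
    numbered = [list(enumerate(evs, start=1)) for evs in races]
    for combo in product(*numbered):
        numbers = tuple(no for no, _ in combo)
        yield numbers, prod(ev for _, ev in combo)

# Small hypothetical card: three races instead of six, to keep the demo quick.
races = [[0.5, 1.5], [2.0, 4.0, 1.0], [3.0]]
rows = list(combo_evs(races))
print(len(rows))  # 2 * 3 * 1 combinations
print(rows[0])
```

The same function works unchanged for six races, because product accepts any number of input lists.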

Python3 read file and place 2D array with int value one line

I would like to know whether these 2 lines can be combined into 1 line:
data = [ line.strip().split() for line in f ] # f = file
data = [ [ int(num) for num in nums ] for nums in data ]
Example lines of file:
9 3 14 3 10 17
9 8 19 12 5 9
Example result:
[[9, 3, 14, 3, 10, 17], [9, 8, 19, 12, 5, 9]]

Try:
with open("file.txt", "r") as f:
    data = [[int(num) for num in line.split()] for line in f]
print(data)
[[9, 3, 14, 3, 10, 17], [9, 8, 19, 12, 5, 9]]
or using numpy can be slightly neater:
import numpy as np
data = np.loadtxt("file.txt", dtype=int).tolist()

how to create a multidimensional array on the fly using python?

I have a loop which generates a value_list each time it runs. At the end of each iteration I want to append the list to a single multidimensional array.
I have:
value_list = [1,2,3,4] in 1st iteration
value_list = [5,6,7,8] in 2nd iteration
value_list = [9,10,11,12] in 3rd iteration
etc...
At the end of each iteration I want one multi dimensional array like
value_list_copy = [[1,2,3,4]] in the 1st iteration
value_list_copy = [[1,2,3,4],[5,6,7,8]] in the 2nd iteration
value_list_copy = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
etc...
How could I achieve this?
Thanks

You can use a nested comprehension and itertools.count:
from itertools import count, islice
cols = 4
rows = 5
c = count(1)
matrix = [[next(c) for _ in range(cols)] for _ in range(rows)]
# [[1, 2, 3, 4],
# [5, 6, 7, 8],
# [9, 10, 11, 12],
# [13, 14, 15, 16],
# [17, 18, 19, 20]]
The cool kids might also want to zip the count iterator with itself:
list(islice(zip(*[c]*cols), rows))
# [(1, 2, 3, 4),
# (5, 6, 7, 8),
# (9, 10, 11, 12),
# (13, 14, 15, 16),
# (17, 18, 19, 20)]

If you are using Python 3.8 or later, you can use the walrus operator (:=), also known as an assignment expression.
count=0
rows=5
cols=4
[[(count:=count+1) for _ in range(cols)] for _ in range(rows)]
Output:
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20]]
Without using :=.
rows=5
cols=4
[list(range(i,i+cols)) for i in range(1,rows*cols,cols)]

Try this:
limit = 10
length_of_elements_in_each_list = 4
[list(range(i, i + length_of_elements_in_each_list)) for i in range(1, limit, length_of_elements_in_each_list)]
You can set limit and length_of_elements_in_each_list according to your needs.

Try this:
value_list_copy = []
for i in range(n):  # n is the number of times your loop runs
    value_list_copy.append(value_list)  # append the current value_list on every iteration
This gives you a list of lists.
print(value_list_copy)
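A runnable sketch of that loop follows. The function generate_values is a hypothetical stand-in for whatever produces value_list in the real program, and appending a copy is a precaution that only matters if value_list is later modified in place.

```python
def generate_values(iteration, width=4):
    """Hypothetical stand-in for the code that produces value_list each iteration."""
    start = iteration * width + 1
    return list(range(start, start + width))

value_list_copy = []
for iteration in range(3):
    value_list = generate_values(iteration)
    # Append a copy so later in-place changes to value_list cannot alter stored rows.
    value_list_copy.append(list(value_list))
    print(value_list_copy)
```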

Here are two other possible solutions:
Double for loop approach
rows, cols, start = 3, 4, 1
value_list_copy = []
for j in range(rows):
    value_list = []
    for i in range(start, cols + start):
        value_list.append((j*cols)+i)
    value_list_copy.append(value_list)
    print(
        f'value_list = {value_list}\n'
        f'value_list_copy = {value_list_copy}\n'
    )
List comp method
rows, cols, start = 3, 4, 1
value_list_copy_2 = [
    [
        (j*cols)+i for i in range(start, cols + start)
    ] for j in range(rows)
]
print(f'value_list_copy_2 = {value_list_copy_2}')

How to calculate median from 2 different lists in Python

I have two lists, note = [6,8,10,13,14,17] and Effective = [3,5,6,7,5,1]. The first holds the grades and the second the number of students in the class who got each grade, so 3 students got a 6 and 1 got a 17. I want to calculate the mean and the median. For the mean I have:
note = [6,8,10,13,14,17]
Effective = [3,5,6,7,5,1]
products = []
for num1, num2 in zip(note, Effective):
    products.append(num1 * num2)
print(sum(products)/(sum(Effective)))
My first question is, how do I turn both lists into a 3rd list:
(6,6,6,8,8,8,8,8,10,10,10,10,10,10,13,13,13,13,13,13,13,14,14,14,14,14,17)
in order to get the median.
Thanks,
Donka

Here's one approach iterating over Effective on an inner level to replicate each number as many times as specified in Effective, and taking the median using statistics.median:
from statistics import median
out = []
for i in range(len(note)):
    for _ in range(Effective[i]):
        out.append(note[i])
print(median(out))
# 10

To get your list you could do something like
total = []
for grade, freq in zip(note, Effective):
    total += freq*[grade]

You can use np.repeat to get a list with the new values.
note = [6,8,10,13,14,17]
Effective = [3,5,6,7,5,1]
import numpy as np
new_list = np.repeat(note,Effective)
np.median(new_list),np.mean(new_list)

To build the third list you expect, you can do something like this:
from statistics import median
note = [6,8,10,13,14,17]
Effective = [3,5,6,7,5,1]
newList = []
for index,value in enumerate(Effective):
    for j in range(value):
        newList.append(note[index])
print(newList)
print("Median is {}".format(median(newList)))
Output:
[6, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 17]
Median is 10

For computing the median I suggest you use statistics.median:
from statistics import median
note = [6, 8, 10, 13, 14, 17]
effective = [3, 5, 6, 7, 5, 1]
total = [n for n, e in zip(note, effective) for _ in range(e)]
result = median(total)
print(result)
Output
10
If you look at total (in the code above), you have:
[6, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 17]
A functional alternative, using repeat:
from statistics import median
from itertools import repeat
note = [6, 8, 10, 13, 14, 17]
effective = [3, 5, 6, 7, 5, 1]
total = [v for vs in map(repeat, note, effective) for v in vs]
result = median(total)
print(result)
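If the counts are large, the expanded list can be avoided altogether. The sketch below uses a different technique from the answers above: it looks up the median position in the running totals of the counts. The function name weighted_median is hypothetical, and it assumes the grades are already sorted in ascending order, as they are in the question.

```python
from bisect import bisect_right
from itertools import accumulate

def weighted_median(values, counts):
    """Median of `values` repeated `counts` times; `values` must be sorted ascending."""
    cumulative = list(accumulate(counts))  # running totals of the counts
    total = cumulative[-1]

    def value_at(k):
        # Value at 0-based position k of the expanded list, which is never built.
        return values[bisect_right(cumulative, k)]

    if total % 2:
        return value_at(total // 2)
    return (value_at(total // 2 - 1) + value_at(total // 2)) / 2

note = [6, 8, 10, 13, 14, 17]
effective = [3, 5, 6, 7, 5, 1]
print(weighted_median(note, effective))  # 10
```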

note = [6,8,10,13,14,17]
effective = [3,5,6,7,5,1]
newlist=[]
for i in range(0,len(note)):
    for j in range(effective[i]):
        newlist.append(note[i])
print(newlist)

Python list slicing for n-2 every n

If I have a list test
test = [i for i in range(20)]
print(test)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
and I want to get the last 3 numbers every 5 numbers such that I get a list that looks like:
[2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
Is there a way to do this with list slicing? I can do it with a modulo function like
[i for i in test if i % 5 > 1]
But I'm wondering if there is a way to do this with list slicing? Thanks

Use the filter function:
list(filter(lambda x: x % 5 > 1, test)) # [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]

If ordering does not matter, you can try the following:
test[2::5] + test[3::5] + test[4::5]
Or, more generally:
start = 2  # number of indices to skip
n = 5
new_test = []
while start < n:
    new_test.extend(test[start::n])
    start += 1

Yes, but I very much doubt it will be faster than a simple list comprehension:
from itertools import chain, zip_longest as zipl
def offset_modulo(l, x, n):
    sentinel = object()
    slices = (l[i::n] for i in range(x, n))
    iterable = chain.from_iterable(zipl(*slices, fillvalue=sentinel))
    return list(filter(lambda x: x is not sentinel, iterable))
print(offset_modulo(range(20), 2, 5))
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
print(offset_modulo(range(24), 2, 5))
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19, 22, 23]
Basically, this approach takes the list slices l[i::n] for each offset i in range(x, n), which together cover every index i such that i % n >= x. It then uses zip_longest and chain to interleave those slices back into the original order.
Edit:
A simpler way
def offset(l, x, n):
    diff = n-x
    slices = (l[i:i+diff] for i in range(x, len(l), n))
    return list(chain.from_iterable(slices))
offset(range(20), 2, 5)
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
offset(range(24), 2, 5)
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19, 22, 23]
Where we get the slices of the adjacent elements we want, then chain those together.

I propose this solution:
from functools import reduce
reduce(lambda x, y: x + y, zip(test[2::5], test[3::5], test[4::5]))
Testing with timeit, it is faster than filter and the list comprehension (at least on my PC). Note that the result is a tuple rather than a list.
Here is the code to carry out an execution time comparison:
import numpy as np
import timeit
a = timeit.repeat('list(filter(lambda x: x % 5 > 1, test))',
                  setup='from functools import reduce; test = list(range(20))',
                  repeat=20,
                  number=100000)
b = timeit.repeat('[i for i in test if i % 5 > 1]',
                  repeat=20,
                  setup='test = list(range(20))',
                  number=100000)
c = timeit.repeat('reduce(lambda x, y: x + y, zip(test[2::5], test[3::5], test[4::5]))',
                  repeat=20,
                  setup='from functools import reduce;test = list(range(20))',
                  number=100000)
list(map(lambda x: print("{}:\t\t {} ({})".format(x[0], np.mean(x[1]), np.std(x[1]))),
         [("filter list", a),
          ('comprehension', b),
          ('reduce + zip', c)]))
The previous code produces the following results:
filter list: 0.2983790061000036 (0.007463432805174629)
comprehension: 0.15115660065002884 (0.004455055805853705)
reduce + zip: 0.11976779574997636 (0.002553487341208172)
I hope this can help :)
