I have data in the following format:
user,item,rating
1,1,3
1,2,2
2,1,2
2,4,1
and so on
I want to convert this into matrix form.
So, the output is like this:
Item--> 1,2,3,4....
user
1 3,2,0,0....
2 2,0,0,1
....and so on..
How do I do this in Python?
Thanks
import numpy as np

data = [
    (1, 1, 3),
    (1, 2, 2),
    (2, 1, 2),
    (2, 4, 1),
]
# To read from a file instead:
# import csv
# with open('data.csv') as f:
#     next(f)  # Skip header
#     data = [tuple(map(int, row)) for row in csv.reader(f)]
n = max(max(user, item) for user, item, rating in data)  # Get size of matrix
matrix = np.zeros((n, n), dtype=int)
for user, item, rating in data:
    matrix[user-1][item-1] = rating  # Convert to 0-based index.
for row in matrix:
    print(row.tolist())
prints
[3, 2, 0, 0]
[2, 0, 0, 1]
[0, 0, 0, 0]
[0, 0, 0, 0]
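A small variation (not from the answers above): if the matrix need not be square, you can size each axis independently, so users and items get exactly the dimensions the data requires.

```python
import numpy as np

data = [(1, 1, 3), (1, 2, 2), (2, 1, 2), (2, 4, 1)]

# Size each axis independently so the matrix need not be square.
n_users = max(user for user, _, _ in data)
n_items = max(item for _, item, _ in data)

matrix = np.zeros((n_users, n_items), dtype=int)
for user, item, rating in data:
    matrix[user - 1, item - 1] = rating  # 1-based IDs -> 0-based indices
print(matrix)
```

For this sample the result is a 2x4 matrix instead of the 4x4 above.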
A different approach from falsetru's: if you are reading from a file and writing to a file, it may be easier to work with a dictionary.
from collections import defaultdict

# infile and outfile are assumed to be already-open file handles
valdict = defaultdict(int)
nuser = 0
nitem = 0
for line in infile:
    user, item, rating = (int(x) for x in line.strip().split(","))
    valdict[user, item] = rating
    nuser = max(nuser, user)
    nitem = max(nitem, item)

towrite = ",".join(str(i) for i in range(1, nitem + 1)) + "\n"
for i in range(1, nuser + 1):
    towrite += str(i)
    for j in range(1, nitem + 1):
        towrite += "," + str(valdict[i, j])
    towrite += "\n"
outfile.write(towrite)
I currently have a CSV file that contains voltage values from an oscilloscope. The data is stored in a singular row made up of 5000 cells, like this:
[0, 0, 0, 2, 2, 2, 4, 2, 2, 2, 0, 0, 0...]
[screenshot of part of the Excel file]
When I try to import the data into an array, it creates a list of length 1, containing all 5000 values inside array[0]. So when I print array[0], it shows all 5000 values, and when I print array[1] through array[4999], an IndexError occurs. How can I have each value from the cells in its own spot in the array?
Here is my code:
import csv

array = []
with open("test2.csv", 'r') as f:
    cols = csv.reader(f, delimiter=",")
    for r in cols:
        array.append([r])
print(array[0])
import csv

array = []
with open("test2.csv", 'r') as f:
    cols = csv.reader(f, delimiter=",")
    # r is a single row (a list of all the cells)
    for r in cols:
        # loop through each item in r
        for i in r:
            # append each element as its own one-item list
            array.append([i])
print(array)
Output:
[['0'], ['0'], ['0'], ['2'], ['4'], ['5']]
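Equivalently, since the file contains a single row, you can grab that row once with next() and build the list in one comprehension. This is a sketch using io.StringIO to stand in for the real file; the cell values here are made up.

```python
import csv
import io

# io.StringIO stands in for open("test2.csv"); values are illustrative.
f = io.StringIO("0,0,0,2,2,2,4")
row = next(csv.reader(f))     # the one and only CSV row
array = [[v] for v in row]    # one single-item list per cell, as above
print(array)  # → [['0'], ['0'], ['0'], ['2'], ['2'], ['2'], ['4']]
```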
I have a text file as follows:
id1, 0, 0, 0
id2, 0, 0, 0
id3, 0, 0, 0
I want to store this data in an array, but I also need each value to be an individual element in the corresponding line's list.
So the output should look like this:
[["id1", 0, 0, 0], ["id2", 0, 0, 0], ["id3", 0, 0, 0]...]
data = "data.txt"
dataArray = []
file = open(data, 'r')
contents = file.read()
file.close()
lines = contents.split("\n")
for line in lines:
    dataArray.append(line)
I couldn't find a way to split the string and store the data after this part.
Assuming there is a single space after each comma in every line, you can just change your for loop to:
for line in lines:
    dataArray.append(line.split(", "))
Here each line is further split on a comma followed by a space; split() itself returns a list.
If that is not the case (i.e. the spacing is non-uniform), you can combine split(), strip() and a list comprehension like this:
for line in lines:
    dataArray.append([e.strip() for e in line.split(',')])
Here strip() is required because after split() the individual elements may contain leading or trailing spaces.
If you need the numbers converted to int, modify it further to:
for line in lines:
    dataArray.append([int(e) if e.strip().isnumeric() else e.strip() for e in line.split(',')])
Your data is in CSV format, which is supported by Python's standard library:
(sos_config) ~/wk/cliosoft/projects/sos_config $ ipython
Python 3.9.4 (default, Apr 9 2021, 01:03:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import csv
In [2]: !cat /tmp/data.csv
id1, 0, 0, 0
id2, 0, 0, 0
id3, 0, 0, 0
In [3]: with open('/tmp/data.csv') as f:
...: reader = csv.reader(f)
...: data_array = [row for row in reader]
...:
In [4]: data_array
Out[4]:
[['id1', ' 0', ' 0', ' 0'],
['id2', ' 0', ' 0', ' 0'],
['id3', ' 0', ' 0', ' 0']]
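If the stray spaces after the commas (the ' 0' cells above) are unwanted, csv.reader's skipinitialspace flag removes them at parse time. A sketch, again with io.StringIO standing in for the file:

```python
import csv
import io

# io.StringIO stands in for open('/tmp/data.csv')
f = io.StringIO("id1, 0, 0, 0\nid2, 0, 0, 0\n")
# skipinitialspace drops the blank after each delimiter, so cells come out clean
reader = csv.reader(f, skipinitialspace=True)
data_array = list(reader)
print(data_array)  # → [['id1', '0', '0', '0'], ['id2', '0', '0', '0']]
```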
You can try this; it handles integers and strings separately, and also takes care of the newline characters:
with open(data, 'r') as f:
    content = f.read().splitlines()

result = [
    [
        int(k) if k.strip().isnumeric() else k
        for k in item.strip().split(',')
    ]
    for item in content
]
out_gate,useless_column,in_gate,num_connect
a,u,b,1
a,s,b,3
b,e,a,2
b,l,c,4
c,e,a,5
c,s,b,5
c,s,b,3
c,c,a,4
d,o,c,2
d,l,c,3
d,u,a,1
d,m,b,2
Shown above is a given sample CSV file. My final goal is to produce the answer as a CSV file like the one below:
,a,b,c,d
a,0,4,0,0
b,2,0,4,0
c,9,8,0,0
d,1,2,5,0
I am trying to match each gate (a, b, c, d) against in_gate, so that, for example, when out_gate 'c' goes to in_gate 'b', the number of connections is 8, and 'c' -> 'a' gives 9.
I want to solve it with lists (or tuples, dictionaries, sets) or collections.defaultdict, WITHOUT USING PANDAS OR NUMPY, and I want a solution that can be applied to many gates (around 10 to 40) as well.
I understand there is a similar question and it helped a lot, but I still have some trouble getting it to run. Lastly, is there any way to do this using lists of columns and a for loop?
(e.g. list1 = ['a', 'b', 'c', 'd'], list2 = ['b', 'b', 'a', 'c', 'a', 'b', 'b', 'a', 'c', 'c', 'a', 'b'])
What if there are some useless columns that are unrelated to the data, while the final goal remains the same?
Thanks
I'd use a Counter for this task. To keep the code simple, I'll read the data from a string. And I'll let you figure out how to produce the output as a CSV file in the format of your choice.
import csv
from collections import Counter

data = '''\
out_gate,in_gate,num_connect
a,b,1
a,b,3
b,a,2
b,c,4
c,a,5
c,b,5
c,b,3
c,a,4
d,c,2
d,c,3
d,a,1
d,b,2
'''.splitlines()

reader = csv.reader(data)
# Skip header
next(reader)

# A Counter to accumulate the data
counts = Counter()
for ogate, igate, num in reader:
    counts[ogate, igate] += int(num)

# We could grab the keys from the data, but it's easier to hard-code them
keys = 'abcd'

# Display the accumulated data
for ogate in keys:
    print(ogate, [counts[ogate, igate] for igate in keys])
output
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
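Not part of the original answer, but to finish the job: one way to emit the accumulated matrix in the CSV format the question asks for. This sketch writes to an in-memory buffer; a real file opened with newline='' works the same. The counts here are the totals from the sample data.

```python
import csv
import io
from collections import Counter

# Accumulated totals from the sample data above
counts = Counter({('a', 'b'): 4, ('b', 'a'): 2, ('b', 'c'): 4,
                  ('c', 'a'): 9, ('c', 'b'): 8, ('d', 'a'): 1,
                  ('d', 'b'): 2, ('d', 'c'): 5})
keys = 'abcd'

buf = io.StringIO()  # swap in open('out.csv', 'w', newline='') for a real file
writer = csv.writer(buf)
writer.writerow([''] + list(keys))  # header row: ,a,b,c,d
for ogate in keys:
    writer.writerow([ogate] + [counts[ogate, igate] for igate in keys])
print(buf.getvalue())
```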
If I understand your problem correctly, you could try using a nested collections.defaultdict for this:
import csv
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))
with open('gates.csv') as in_file:
    csv_reader = csv.reader(in_file)
    next(csv_reader)  # skip header
    for row in csv_reader:
        outs, ins, connect = row
        d[outs][ins] += int(connect)

gates = sorted(d)
for outs in gates:
    print(outs, [d[outs][ins] for ins in gates])
Which Outputs:
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
I have the following dataset in a CSV file
[1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
Now I want to count the lengths of runs of equal consecutive values and store those lengths in an array (I don't want the overall frequency of each value). So my output should be like this:
[3, 4, 3, 2, 1]
My code is as follows:
import csv

with open("c:/Users/Niels/Desktop/test.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=';')
    data = []
    for column in reader:
        data.append(column[0])

results = data
results = [int(i) for i in results]
print results

dataFiltered = []
for i in results:
    if i == (i+1):
        counter = counter + 1
        dataFiltered.append(counter)
    counter = 0
print dataFiltered
My idea was to compare the cell values. I know something is wrong in the for loop over results, but I can't figure out where my mistake is.
I won't go into the details of your loop, which is very wrong; for starters, if i == (i+1): can never be True.
Instead, you'd be better off with itertools.groupby, summing the lengths of the groups:
import itertools
results = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
freq = [len(list(v)) for _,v in itertools.groupby(results)]
print(freq)
len(list(v)) uses list to force iteration over the grouped items so we can compute the length (maybe sum(1 for x in v) would be more performant; I haven't benchmarked both approaches).
I get:
[3, 4, 3, 2, 1]
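The sum-based variant mentioned above counts each group without materializing it as a list:

```python
import itertools

results = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
# sum(1 for _ in v) counts the group's members lazily
freq = [sum(1 for _ in v) for _, v in itertools.groupby(results)]
print(freq)  # → [3, 4, 3, 2, 1]
```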
Aside: reading the first column of a CSV file and converting the results to integers can be achieved simply by:
results = [int(row[0]) for row in reader]
I wrote this function. The input and expected results are indicated in the docstring.
import numpy
from collections import defaultdict

def summarize_significance(sign_list):
    """Summarize a series of individual significance data in a list of occurrences.

    For a group of e.g. 5 measurements and two different states, the input data
    has the form:
        sign_list = [[-1, 1],
                     [ 0, 1],
                     [ 0, 0],
                     [ 0,-1],
                     [ 0,-1]]
    where -1, 0, 1 indicate decrease, no change or increase respectively.
    The result is a list of 3-item lists indicating how many measurements
    decrease, do not change or increase (as list items 0, 1, 2 respectively)
    for each state:
        returns: [[1, 4, 0], [2, 1, 2]]
    """
    swapped = numpy.swapaxes(sign_list, 0, 1)
    summary = []
    for row in swapped:
        mydd = defaultdict(int)
        for item in row:
            mydd[item] += 1
        summary.append([mydd.get(-1, 0), mydd.get(0, 0), mydd.get(1, 0)])
    return summary
I am wondering if there is a more elegant, efficient way of doing the same thing. Some ideas?
Here's one that uses less code and is probably more efficient because it just iterates through sign_list once without calling swapaxes, and doesn't build a bunch of dictionaries.
summary = [[0, 0, 0] for _ in sign_list[0]]
for row in sign_list:
    for index, sign in enumerate(row):
        summary[index][sign + 1] += 1
return summary
No, just more complex ways of doing so.
import itertools

def summarize_significance(sign_list):
    res = []
    for s in zip(*sign_list):
        d = dict((x[0], len(list(x[1]))) for x in itertools.groupby(sorted(s)))
        res.append([d.get(x, 0) for x in (-1, 0, 1)])
    return res
For starters, you could do:
swapped = numpy.swapaxes(sign_list, 0, 1)
summary = []
for row in swapped:
    mydd = {-1: 0, 0: 0, 1: 0}
    for item in row:
        mydd[item] += 1
    summary.append([mydd[-1], mydd[0], mydd[1]])
return summary
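Another compact option, my own sketch rather than one of the answers above: Counter plus zip(*...) does the transpose and the tallying in one comprehension, with no numpy dependency.

```python
from collections import Counter

def summarize_significance(sign_list):
    # zip(*sign_list) transposes rows to columns; Counter tallies each column.
    return [[Counter(col).get(s, 0) for s in (-1, 0, 1)]
            for col in zip(*sign_list)]

sign_list = [[-1, 1], [0, 1], [0, 0], [0, -1], [0, -1]]
print(summarize_significance(sign_list))  # → [[1, 4, 0], [2, 1, 2]]
```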