Compare cell values csv file python

I have the following dataset in a CSV file
[1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
Now I want to count runs of equal values by comparing neighbouring cells and store the counts in an array; I don't want the overall frequency of each value. So my output should be like this:
[3, 4, 3, 2, 1]
My code is as follows:
import csv
with open("c:/Users/Niels/Desktop/test.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=';')
    data = []
    for column in reader:
        data.append(column[0])
results = data
results = [int(i) for i in results]
print results
dataFiltered = []
for i in results:
    if i == (i+1):
        counter = counter + 1
        dataFiltered.append(counter)
        counter = 0
print dataFiltered
My idea was to compare the cell values. I know something is wrong in the for loop over results, but I can't figure out where my mistake is.

I won't go into the details of your loop, which is very wrong; if i == (i+1): just cannot be True, for starters.
Instead, you'd be better off with itertools.groupby, computing the length of each group:
import itertools
results = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2]
freq = [len(list(v)) for _,v in itertools.groupby(results)]
print(freq)
len(list(v)) uses list to force the iteration over the grouped items so we can compute the length (maybe sum(1 for x in v) would be more performant/appropriate; I haven't benchmarked the two approaches).
I get:
[3, 4, 3, 2, 1]
Aside: reading the first column of a CSV file and converting the values to integers can be achieved simply with:
results = [int(row[0]) for row in reader]
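Putting both pieces together, here is a minimal end-to-end sketch (assuming the same file path as the question and Python 3, where the file is opened in text mode rather than 'rb'):

import csv
import itertools

# read the first column of the CSV and convert to integers
with open("c:/Users/Niels/Desktop/test.csv", newline='') as f:
    reader = csv.reader(f, delimiter=';')
    results = [int(row[0]) for row in reader]

# length of each run of consecutive equal values
run_lengths = [len(list(v)) for _, v in itertools.groupby(results)]
print(run_lengths)  # [3, 4, 3, 2, 1] for the sample data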


Read List in List

I have a text file and there are 3 lines of data in it:
[1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3]
[1, 1, 3, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 3, 3]
[1, 2, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1, 1, 3, 3]
I try to open it and read the data:
with open("rafine.txt") as f:
l = [line.strip() for line in f.readlines()]
f.close()
Now I have a list in a list.
If I say print(l[0]) it shows me [1, 2, 1, 1, 3, 1, 1, 2, 1, 3, 1, 1, 1, 3, 3].
But I want to get the numbers in it.
So when I write print(l[0][0])
I want to see 1, but it shows me [.
How can I fix this?
You can use literal_eval to parse the lines from the file & build the matrix:
from ast import literal_eval
with open("test.txt") as f:
matrix = []
for line in f:
row = literal_eval(line)
matrix.append(row)
print(matrix[0][0])
print(matrix[1][4])
print(matrix[2][8])
result:
1
3
1
import json

with open("rafine.txt") as f:
    for line in f.readlines():
        line = json.loads(line)
        print(line)
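Note that json.loads works here only because each line of rafine.txt happens to be a valid JSON array of numbers; it would fail on lines containing single-quoted strings, where ast.literal_eval would still work.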
The best approach depends on what assumption you make about the data in your text file:
ast.literal_eval
If the data in your file is formatted the same way it would be inside Python source code, the best approach is to use literal_eval:
from ast import literal_eval

data = []  # will contain list of lists
with open("filename") as f:
    for line in f:
        row = literal_eval(line)
        data.append(row)
or, the short version:
with open(filename) as f:
    data = [literal_eval(line) for line in f]
re.findall
If you can make only a few assumptions about the data, using regular expressions to find all digits might be a way forward. The code below builds the lists by simply extracting any digits in the text file, regardless of separators or other characters:
import re

data = []  # will contain list of lists
with open("filename") as f:
    for line in f:
        row = [int(i) for i in re.findall(r'\d+', line)]
        data.append(row)
or, in short:
with open(filename) as f:
    data = [[int(i) for i in re.findall(r'\d+', line)] for line in f]
handwritten parsing
If both options are not suitable, there is always an option to parse by hand, to tailor for the exact format:
data = []  # will contain list of lists
with open(filename) as f:
    for line in f:
        row = [int(i) for i in line.strip()[1:-1].split(', ')]
        data.append(row)
The strip() removes the trailing newline, [1:-1] removes the first and last characters (the brackets), and split(', ') splits the rest into a list. for i in ... will iterate over the items in this list (assigning i to each item) and int(i) will convert each i to an integer.
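For example, checking the steps in the interpreter on a made-up sample line:

>>> line = "[1, 2, 1, 1, 3]\n"
>>> line.strip()[1:-1]
'1, 2, 1, 1, 3'
>>> [int(i) for i in line.strip()[1:-1].split(', ')]
[1, 2, 1, 1, 3]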

Finding the best supersets in a list based on intersections

I have a file including lines as follows (finalInjectionList is the input file):
[0, 2, 3]
[0, 2, 3, 4]
[0, 3]
[1, 2, 4]
[2, 3]
[2, 3, 4]
Here [0, 2, 3, 4] and [1, 2, 4] are the best supersets for my problem, and I want to write them to an output file, because they are supersets of some of the other lines and NOT subsets of any line.
my code:
import ast
import itertools

def get_data(filename):
    with open(filename, 'r') as fi:
        data = fi.readlines()
    return data

def get_ast_set(line):
    return set(ast.literal_eval(line))

def check_infile(datafile, savefile):
    list1 = [get_ast_set(row) for row in get_data(datafile)]
    print(list1)
    outlist = []
    #for i in range(len(list1)):
    for a, b in itertools.combinations(list1, 2):
        if a.issuperset(b):
            with open(savefile, 'a') as fo:
                fo.writelines(str(a))

if __name__ == "__main__":
    datafile = str("./finalInjectionList")
    savefile = str("./filteredSets")
    check_infile(datafile, savefile)
My code writes all supersets, e.g. {2, 3, 4} as well. But {0, 2, 3, 4} already covers {2, 3, 4}, so I do not want to write {2, 3, 4} to the output file.
Is there any suggestion?
Your logic in the for loop with itertools.combinations is a bit flawed, as it would also produce the combination ({2, 3, 4}, {2, 3}), where {2, 3, 4} is the superset.
I would approach the problem by removing items from the list if they are a subset of another item.
import ast

with open(r"C:\Users\%USERNAME%\Desktop\test.txt", 'r') as f:
    data = f.readlines()

data = [d.replace('\n', '') for d in data]
data = [set(ast.literal_eval(d)) for d in data]
data.sort(key=len)
data1 = data

for d in data:
    flag = 0
    for d1 in data1:
        print(d, d1)
        if d == d1:
            print('both sets are same')
            continue
        if d.issubset(d1):
            print(str(d) + ' is a subset of ' + str(d1))
            flag = 1
            break
        else:
            print(str(d) + ' is not a subset of ' + str(d1))
    if flag == 1:
        # if the set is a subset of another set, remove it
        data1 = [d1 for d1 in data1 if d1 != d]

print('set: ', data1)  # data1 will contain your result at the end of the loop
With input:
0, 2, 3
0, 2, 3, 4
0, 3
1, 2, 4
2, 3
2, 3, 4
The output will be
[{1, 2, 4}, {0, 2, 3, 4}]
which can be written to the file
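A more compact variant of the same idea, as a sketch (assuming the data list of sets built above): keep only the sets that are not a proper subset of any other set. Like the loop above, it is quadratic in the number of sets, which is fine for small inputs.

# keep a set only if it is not a proper subset (<) of any other set
maximal = [s for s in data if not any(s < t for t in data)]
print(maximal)  # [{1, 2, 4}, {0, 2, 3, 4}] for the sample input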
Solved it by modifying the routine check_infile:
import ast

def findparent(d, node):
    """Goes through the chain of parents until we reach a node which is its own parent,
    meaning no other node has it as a subset."""
    if d[node] == node:
        return node
    else:
        return findparent(d, d[node])

def get_data(filename):
    with open(filename, 'r') as fi:
        data = fi.readlines()
    return data

def get_ast_set(line):
    return set(ast.literal_eval(line))

def check_infile(datafile, savefile):
    """Find the minimum number of supersets as follows:
    1) identify a superset of each set
    2) go through the superset chains (findparent) to find the set of nodes which are supersets (roots)"""
    list1 = [get_ast_set(row) for row in get_data(datafile)]
    print(list1)
    n = len(list1)
    # Initially each node is its own parent (i.e. include self as superset)
    # Here parent means superset
    parents = {u: u for u in range(n)}
    for u in range(n):
        a = list1[u]
        for v in range(u + 1, n):
            b = list1[v]
            if a.issuperset(b):
                parents[v] = u  # index u is superset of v
            elif b.issuperset(a):
                parents[u] = v  # index v is superset of u
    # Collect the root nodes
    roots = set()
    for u in range(n):
        roots.add(findparent(parents, u))
    with open(savefile, 'w') as fo:
        for i in roots:
            fo.write(str(list1[i]))
            fo.write('\n')

if __name__ == "__main__":
    datafile = "./finalInjectionList.txt"
    savefile = "./filteredSets.txt"
    check_infile(datafile, savefile)
Test File (finalInjectionList.txt)
[0, 2, 3]
[0, 2, 3, 4]
[0, 3]
[1, 2, 4]
[2, 3]
[2, 3, 4]
Output File (filteredSets.txt)
{0, 2, 3, 4}
{1, 2, 4}
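As a side note, findparent recurses once per link in the parent chain, so a very long chain could in principle hit Python's recursion limit; an iterative drop-in replacement would be a simple sketch:

def findparent(d, node):
    # follow parent pointers until a node is its own parent
    while d[node] != node:
        node = d[node]
    return node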

Reading a text document containing python list into a python program

I have a text file (dummy.txt) which reads as below:
['abc',1,1,3,3,0,0]
['sdf',3,2,5,1,3,1]
['xyz',0,3,4,1,1,1]
I expect this to be in lists in python as below:
article1 = ['abc',1,1,3,3,0,0]
article2 = ['sdf',3,2,5,1,3,1]
article3 = ['xyz',0,3,4,1,1,1]
As many articles have to be created as there are lines in dummy.txt.
I was trying the following things:
I opened the file, split it by '\n', and appended the pieces to an empty list in Python; the entries had extra quotes and square brackets, so I tried ast.literal_eval, which did not work either.
my_list = []
fvt = open("dummy.txt", "r")
for line in fvt.read():
    my_list.append(line.split('\n'))
my_list = ast.literal_eval(my_list)
I also tried to manually remove the additional quotes and extra square brackets using replace; that did not help me either. Any leads much appreciated.
This should help.
import ast

myLists = []
with open(filename) as infile:
    for line in infile:  # iterate over each line
        myLists.append(ast.literal_eval(line))  # convert to a Python object and append
print(myLists)
Output:
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
fvt.read() will produce the entire file as one string, so iterating over it means line will contain a single character. This will not work very well; you also call literal_eval(..) on the entire list of strings, not on a single string.
You can obtain the results by iterating over the file handler, and each time call literal_eval(..) on a single line:
from ast import literal_eval

with open("dummy.txt", "r") as f:
    my_list = [literal_eval(line) for line in f]
or by using map:
from ast import literal_eval

with open("dummy.txt", "r") as f:
    my_list = list(map(literal_eval, f))
We then obtain:
>>> my_list
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
ast.literal_eval is the right approach. Note that creating a variable number of variables like article1, article2, ... is not a good idea. Use a dictionary instead if your names are meaningful, a list otherwise.
As Willem mentioned in his answer, fvt.read() will give you the whole file as one string. It is much easier to exploit the fact that files are iterable line by line. Keep the for loop, but get rid of the call to read.
Additionally,
my_list = ast.literal_eval(my_list)
is problematic because a) you evaluate the wrong data structure: you want to evaluate the line, not the list my_list to which you append, and b) you reassign the name my_list, so at this point the old my_list is gone.
Consider the following demo. (Replace fake_file with the actual file you are opening.)
>>> from io import StringIO
>>> from ast import literal_eval
>>>
>>> fake_file = StringIO('''['abc',1,1,3,3,0,0]
... ['sdf',3,2,5,1,3,1]
... ['xyz',0,3,4,1,1,1]''')
>>> result = [literal_eval(line) for line in fake_file]
>>> result
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
Of course, you could also use a dictionary to hold the evaluated lines:
>>> result = {'article{}'.format(i):literal_eval(line) for i, line in enumerate(fake_file, 1)}
>>> result
{'article2': ['sdf', 3, 2, 5, 1, 3, 1], 'article1': ['abc', 1, 1, 3, 3, 0, 0], 'article3': ['xyz', 0, 3, 4, 1, 1, 1]}
where now you can issue
>>> result['article2']
['sdf', 3, 2, 5, 1, 3, 1]
... but as these names are not very meaningful, I'd just go for the list instead which you can index with 0, 1, 2, ...
When I do this:
import ast
x = '[ "A", 1]'
x = ast.literal_eval(x)
print(x)
I get:
['A', 1]
So, your code should iterate over the file object itself rather than over fvt.read():
for line in fvt:
    my_list.append(ast.literal_eval(line))
Try this split-based approach (no imports needed):
with open('dummy.txt', 'r') as f:
    l = [[int(x) if x.strip().isdigit() else x.strip().strip("'")
          for x in line.strip()[1:-1].split(',')]
         for line in f]
Now:
print(l)
Is:
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
As expected!!!
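Note that this manual parsing relies on every line having exactly the [...] shape shown; if the format can vary, the ast.literal_eval approach from the answers above is the more robust choice.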

numpy.searchsorted for multiple instances of the same entry

I have the following variables:
import numpy as np
gens = np.array([2, 1, 2, 1, 0, 1, 2, 1, 2])
p = [0,1]
I want to return the entries of gens that match each element of p.
So ideally I would like it to return:
result = [[4],[2,3,5,7],[0,2,6,8]]
#[[where matched 0], [where matched 1], [the rest]]
My attempts so far only work with one variable:
indx = gens.argsort()
res = np.searchsorted(gens[indx], [0])
gens[res] #gives 4, which is the position of 0
But when I try with
indx = gens.argsort()
res = np.searchsorted(gens[indx], [1])
gens[res] #gives 1, which is the position of the first 1.
So:
how can I search for an entry that has multiple occurrences?
how can I search for multiple entries, each of which has multiple occurrences?
You can use np.where
>>> np.where(gens == p[0])[0]
array([4])
>>> np.where(gens == p[1])[0]
array([1, 3, 5, 7])
>>> np.where((gens != p[0]) & (gens != p[1]))[0]
array([0, 2, 6, 8])
Or np.in1d and np.nonzero
>>> np.nonzero(np.in1d(gens, p[0]))[0]
>>> np.nonzero(np.in1d(gens, p[1]))[0]
>>> np.nonzero(~np.in1d(gens, p))[0]
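For the same gens and p as above, these should produce the same three index arrays:
array([4])
array([1, 3, 5, 7])
array([0, 2, 6, 8])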

data accumulating from csv file using python

out_gate,useless_column,in_gate,num_connect
a,u,b,1
a,s,b,3
b,e,a,2
b,l,c,4
c,e,a,5
c,s,b,5
c,s,b,3
c,c,a,4
d,o,c,2
d,l,c,3
d,u,a,1
d,m,b,2
Shown above is a given sample CSV file. My final goal is to get the answer in the form of a CSV file like below:
,a,b,c,d
a,0,4,0,0
b,2,0,4,0
c,9,8,0,0
d,1,2,5,0
I am trying to match each gate (a, b, c, d) to the in_gate values one by one; so, for example, out_gate 'c' -> in_gate 'b' has 8 connections in total, and 'c' -> 'a' gives 9.
I want to solve it with lists (or tuples, dictionaries, sets) or collections.defaultdict, WITHOUT USING PANDAS OR NUMPY, and I want a solution that can be applied to many gates (around 10 to 40) as well.
I understand there is a similar question and it helped a lot, but I still have some trouble getting my code to run. Lastly, is there any way to do it using lists of columns and a for loop?
((ex) list1=[a,b,c,d], list2=[b,b,a,c,a,b,b,a,c,c,a,b])
What if there are some useless columns that are not related to the data, but the final goal remains the same?
Thanks
I'd use a Counter for this task. To keep the code simple, I'll read the data from a string. And I'll let you figure out how to produce the output as a CSV file in the format of your choice.
import csv
from collections import Counter
data = '''\
out_gate,in_gate,num_connect
a,b,1
a,b,3
b,a,2
b,c,4
c,a,5
c,b,5
c,b,3
c,a,4
d,c,2
d,c,3
d,a,1
d,b,2
'''.splitlines()
reader = csv.reader(data)

# skip the header
next(reader)

# A Counter to accumulate the data
counts = Counter()

# Accumulate the data
for ogate, igate, num in reader:
    counts[ogate, igate] += int(num)

# We could grab the keys from the data, but it's easier to hard-code them
keys = 'abcd'

# Display the accumulated data
for ogate in keys:
    print(ogate, [counts[ogate, igate] for igate in keys])
output
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
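To finish the job, here is a minimal sketch of writing the accumulated counts in the requested CSV format using csv.writer (the output file name result.csv is an assumption):

import csv

# 'counts' and 'keys' come from the snippet above
with open('result.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow([''] + list(keys))  # header row: ,a,b,c,d
    for ogate in keys:
        writer.writerow([ogate] + [counts[ogate, igate] for igate in keys])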
If I understand your problem correctly, you could try using a nested collections.defaultdict for this:
import csv
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

with open('gates.csv') as in_file:
    csv_reader = csv.reader(in_file)
    next(csv_reader)  # skip the header
    for row in csv_reader:
        outs, _, ins, connect = row  # ignore the useless column
        d[outs][ins] += int(connect)

gates = sorted(d)
for outs in gates:
    print(outs, [d[outs][ins] for ins in gates])
Which outputs:
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
