How do I put my condition result into a list? - python

I want to open a file, extract the numbers that follow each = sign, and put them into a list. The first steps work, but I'm stuck on collecting the results into a list.
I tried creating a list and appending each result to it, but when I print the list it shows only the last result:
import cv2 as cv
import time
import numpy
from math import log
import csv
import re
statList = []
with open("C:\\ProgramData\\OutilTestObjets3D\\MaquetteCB-2019\\DataSet\\DEFAULT\\terrain\\3DObjects\\building\\house01.ive.stat.txt", 'r') as f:
    #
    statList = f.readlines()
    statList = [x.strip() for x in statList]
    for line in statList :
        if (re.search("=" ,str(line))):
            if (re.search('#IND',str(line))):
                print("ERREUR")
            else:
                results = re.findall("=\s*?(\d+\.\d+|\d+)", str(line))
                print ("result="+str(results))
                statList.append(log(float(results[0])))
    floatList = [str(results)]
    print(floatList)

It's because you overwrite the results variable on every pass through the loop, so only the last match survives. Accumulate the matches in a list instead.
Try:
#
results = []
statList = f.readlines()
statList = [x.strip() for x in statList]
for line in statList :
    if (re.search("=" ,str(line))):
        if (re.search('#IND',str(line))):
            print("ERREUR")
        else:
            results.extend(re.findall(r"=\s*?(\d+\.\d+|\d+)", str(line)))
            print ("result="+str(results))
            statList.append(log(float(results[0])))
floatList = [str(results)]
print(floatList)

The problem with your program is that you define an empty list statList, then rebind it with statList = f.readlines() and append results to that same list. So rename the empty list; then you can use extend, since re.findall returns a list. Finally, use the built-in map function to apply a function to every item of your list:
from math import log
import re
final_result = []
with open("file.txt", 'r') as f:
    #
    statList = f.readlines()
    statList = [x.strip() for x in statList]
    for line in statList :
        if (re.search("=" ,str(line))):
            if (re.search('#IND',str(line))):
                print("ERREUR")
            else:
                result = re.findall(r"=\s*?(\d+\.\d+|\d+)", str(line))
                print("result=" + result[0])
                final_result.extend(result)
                # final_result.append(result[0])
floats_list = list(map(float, final_result))
logs_list = list(map(log, floats_list))
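The steps described above (keep the lines that contain =, skip the #IND entries, collect the matches, then map float and log over them) can be pulled together into one function. This is only a minimal sketch, not the asker's exact program: the name extract_logs and the sample lines are made up for illustration, and it assumes every captured number is positive, since math.log raises ValueError for 0.

```python
import re
from math import log

# Hypothetical helper; the name and the sample data are illustrative only.
NUMBER_AFTER_EQUALS = re.compile(r"=\s*(\d+\.\d+|\d+)")

def extract_logs(lines):
    """Return (floats, logs) for the numbers that follow '=' in lines."""
    matches = []
    for line in lines:
        line = line.strip()
        if "=" not in line:
            continue  # nothing to extract on this line
        if "#IND" in line:
            print("ERREUR")  # same marker as in the question
            continue
        # findall returns a list, so extend keeps every match so far
        matches.extend(NUMBER_AFTER_EQUALS.findall(line))
    floats = list(map(float, matches))
    logs = list(map(log, floats))  # assumes all values are > 0
    return floats, logs

sample = ["width = 2.5", "height=10", "ratio = -1.#IND", "no number here"]
floats, logs = extract_logs(sample)
print(floats)
```

Keeping the accumulator separate from the list being iterated avoids the name clash that caused the original problem.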

Related

How to increase the speed of CSV data matching?

I have a script that parses two CSV files and compares the first column of one file with the second column of the other. The files are big, so the process takes a long time. How can I make it faster? I tried using yield from on lines before the for loop, but then I have to convert lines[1:] to list(lines[1:]), which defeats the purpose.
def pk():
    with open('way/to/first.csv') as csv_file:
        lines = csv_file.readlines()
        full_list = []
        for line in lines[1:]:
            array = line.split(',')
            list_pk = array[0].replace('"', '')
            full_list.append(list_pk)
    return full_list
def fk():
    with open('way/to/second.csv') as csv_file:
        lines = csv_file.readlines()
        full_list = []
        for line in lines[1:]:
            array = line.split(',')
            list_fk = array[1].replace('"', '')
            full_list.append(list_fk)
    return full_list
def res():
    f = fk()
    p = pk()
    for i in f:
        if i not in p:
            raise AssertionError(f'{i} not found')
Try using python's "set difference" to find the elements in set A that do not have a match in set B:
def res():
    fset = set(fk())
    pset = set(pk())
    print('items in F that are missing from P:')
    print(fset - pset)
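Here is a sketch of the same idea end to end, assuming the inputs are ordinary comma-separated files with one header row. The csv module handles the quoting, and a set makes each membership test constant time on average instead of a scan over the whole list. The function names column_values and missing_keys are hypothetical, and the in-memory sample data stands in for the two files.

```python
import csv
import io

def column_values(csv_file, index):
    """Collect one column of a CSV file into a set, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    return {row[index] for row in reader if len(row) > index}

def missing_keys(fk_file, pk_file):
    """Values in column 1 of fk_file that never occur in column 0 of pk_file."""
    return column_values(fk_file, 1) - column_values(pk_file, 0)

# In-memory stand-ins for first.csv and second.csv (illustrative data).
first = io.StringIO('"id","name"\n"1","a"\n"2","b"\n')
second = io.StringIO('"row","fk"\n"10","1"\n"11","3"\n')

missing = missing_keys(second, first)
print(sorted(missing))  # ['3']
```

With real files, the StringIO objects would be replaced by open(path, newline='') handles; the reader then streams rows instead of loading every line with readlines().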

No output from sum of Python regex list

The task is to read a file, find the integers in it with re.findall() and the regular expression '[0-9]+', convert the extracted strings to integers, and sum them.
My code, where sample.txt is my text file:
import re
hand = open('sample.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[0-9]+',line)
    print x
x = [int(i) for i in x]
add = sum(x)
print add
OUTPUT:
You need to add the matches from each line to another list, so that the numbers found on the current line are kept when the loop moves on to the next line.
import re
hand = open('sample.txt')
l = []
for line in hand:
    x = re.findall('[0-9]+',line)
    l.extend(x)
j = [int(i) for i in l]
add = sum(j)
print(add)
or
with open('sample.txt') as f:
    print(sum(map(int, re.findall(r'\d+', f.read()))))
Try this (the accumulator is named total so that it does not shadow the built-in sum):
import re
hand = open("a.txt")
x=list()
for line in hand:
    y = re.findall('[0-9]+',line)
    x = x+y
total=0
for z in x:
    total = total + int(z)
print(total)

Dictionaries overwriting in Python

This program is meant to take the grammar rules found in Binary.txt and store them in a dictionary. The rules are:
N = N D
N = D
D = 0
D = 1
but the current code returns D: D = 1, N: N = D, whereas I want N: N D, N: D, D: 0, D: 1
import sys
import string
#default length of 3
stringLength = 3
#get last argument of command line(file)
filename1 = sys.argv[-1]
#get a length from user
try:
    stringLength = int(input('Length? '))
    filename = input('Filename: ')
except ValueError:
    print("Not a number")
#checks
print(stringLength)
print(filename)
def str2dict(filename="Binary.txt"):
    result = {}
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]] = line
            print (result)
    return result
print (str2dict("Binary.txt"))
Firstly, a plain dictionary is the wrong data structure here. A dictionary in Python is a simple key-to-value mapping, whereas you want a map from a key to multiple values. For that you'll need:
from collections import defaultdict
result = defaultdict(list)
Next, you never split on '='. You need to do that to get the key and value you are looking for:
key, value = line.split('=', 1)  # returns a list, which is unpacked into 2 variables
Putting the two together, the function becomes:
def str2dict(filename="Binary.txt"):
    result = defaultdict(list)
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            key, value = line.split('=', 1)
            result[key.strip()].append(value.strip())
    return result
Dictionaries, by definition, cannot have duplicate keys. Therefore there can only ever be a single 'D' key. You could, however, store a list of values at that key if you'd like. For example:
from collections import defaultdict
# rest of your code...
    result = defaultdict(list) # Use defaultdict so that an insert to an empty key creates a new list automatically
    with open(filename, "r") as grammar:
        #read file
        lines = grammar.readlines()
        count = 0
        #loop through
        for line in lines:
            print(line)
            result[line[0]].append(line)
            print (result)
    return result
This will result in something like (trailing newlines omitted):
{"N" : ["N = N D", "N = D"], "D" : ["D = 0", "D = 1"]}
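For reference, here is a small self-contained sketch that combines the two suggestions (defaultdict plus splitting on '=') and produces the mapping the asker described. The function name parse_rules is hypothetical, and the rules are passed in as a list of lines rather than read from Binary.txt so that the example runs on its own.

```python
from collections import defaultdict

def parse_rules(lines):
    """Map each left-hand side to the list of its right-hand sides."""
    rules = defaultdict(list)
    for line in lines:
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        rules[key.strip()].append(value.strip())
    return dict(rules)

# The four rules from the question, as they would come out of readlines().
grammar = ["N = N D\n", "N = D\n", "D = 0\n", "D = 1\n"]
rules = parse_rules(grammar)
print(rules)  # {'N': ['N D', 'D'], 'D': ['0', '1']}
```

To read from a file instead, the list can be replaced by an open file object, since iterating over a file yields its lines.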

Python: How do I save generator output into text file?

I am using the following generator to calculate a moving average:
import itertools
from collections import deque
def moving_average(iterable, n=50):
    it = iter(iterable)
    d = deque(itertools.islice(it, n-1))
    d.appendleft(0)
    s = sum(d)
    for elem in it:
        s += elem - d.popleft()
        d.append(elem)
        yield s / float(n)
I can print the generator output, but I can't figure out how to save that output into a text file.
x = (1,2,2,4,1,3)
avg = moving_average(x,2)
for value in avg:
    print value
When I change the print line to write to a file, output is printed to the screen, a file is created but it stays empty.
Thanks in advance.
def generator(howmany):
    for x in range(howmany):
        yield x
g = generator(10)
with open('output.txt', 'w') as f:
    for x in g:
        f.write(str(x))
with open('output.txt', 'r') as f:
    print(f.readlines())
output:
>>>
['0123456789']
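Applied to the moving_average generator from the question, the same pattern might look like the sketch below. It writes one value per line and uses a with block so the file is flushed and closed. One possible cause of the empty file, which the question does not confirm, is that a generator can be consumed only once: if the values were already printed in an earlier loop, a second loop over the same generator object writes nothing. The file name averages.txt and the temporary directory are assumptions made for the example.

```python
import itertools
import os
import tempfile
from collections import deque

def moving_average(iterable, n=50):
    # Same generator as in the question.
    it = iter(iterable)
    d = deque(itertools.islice(it, n - 1))
    d.appendleft(0)
    s = sum(d)
    for elem in it:
        s += elem - d.popleft()
        d.append(elem)
        yield s / float(n)

# Hypothetical output location, chosen so the example cleans up easily.
path = os.path.join(tempfile.mkdtemp(), "averages.txt")

# Create a fresh generator and consume it exactly once, inside the with block.
with open(path, "w") as f:
    for value in moving_average((1, 2, 2, 4, 1, 3), 2):
        f.write("%s\n" % value)

with open(path) as f:
    written = f.read().split()
print(written)
```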

Compiling lines from a file that are separated by a certain element - Python

File:
>1
ATTTTttttGGGG
ccCgCgGAgggGGT
gggggttttTTTTTTTTT
>2
ATcggGGGGGGA
>3
ATCGGGGGGATTT
gggggttAGTAttt
I'm writing a function that reads files in this format.
The format has multiple sections embedded in it, separated by lines consisting of '>' plus a name (e.g. '>1', '>2').
I'm trying to take the lines of text between the '>' lines and join them into one string per section,
so the result would look like:
name_list = ['>1','>2','>3']
sequence_list = ['ATTTTttttGGGGccCgCgGAgggGGTgggggttttTTTTTTTTT','ATcggGGGGGGA','ATCGGGGGGATTTgggggttAGTAttt']
import os
import re
# Open File
in_file=open(FASTA,'r')
dir,file=os.path.split(FASTA)
temp = os.path.join(dir,output)
out_file=open(temp,'w')
# Generating lines
lines = []
name_list = []
seq_list = []
for line in in_file:
    line = line.strip()
    lines.append(line)
in_file.close()
indx = range(0,len(lines))
# Organizing the elements
for line in lines:
    for i in line:
        if i == '>':
            name_list.append(line)
        else:
            break
I don't know what to do in the else: branch.
I tried creating an index with range(0,len(lines)),
so maybe I could find a '>' and then join all the lines at the following indices until the next '>', adding the result to the list called seq_list.
Any help would be greatly appreciated.
You should take a look at Biopython, which has a FASTA parser, but here's an example using the standard library:
import re
with open('filename') as f:
    print([i.replace('\n','') for i in re.split(r'\>\d+',f.read()) if i])
out:
['ATTTTttttGGGGccCgCgGAgggGGTgggggttttTTTTTTTTT',
'ATcggGGGGGGA',
'ATCGGGGGGATTTgggggttAGTAttt']
Using Biopython [sudo pip install biopython]:
from Bio import SeqIO
with open("example.fasta", "r") as handle:
    print(list(SeqIO.parse(handle, "fasta")))
out:
[SeqRecord(seq=Seq('ATTTTttttGGGGccCgCgGAgggGGTgggggttttTTTTTTTTT', SingleLetterAlphabet()), id='1', name='1', description='1', dbxrefs=[]),
SeqRecord(seq=Seq('ATcggGGGGGGA', SingleLetterAlphabet()), id='2', name='2', description='2', dbxrefs=[]),
SeqRecord(seq=Seq('ATCGGGGGGATTTgggggttAGTAttt', SingleLetterAlphabet()), id='3', name='3', description='3', dbxrefs=[])]
A dictionary would make life easier:
>>> d = {}
>>> with open('t.txt') as f:
...     for line in f:
...         if line.startswith('>'):
...             key = line.strip()
...             if key not in d:
...                 d[key] = []
...         else:
...             d[key].append(line.strip())
...
>>> d
{'>1': ['ATTTTttttGGGG', 'ccCgCgGAgggGGT', 'gggggttttTTTTTTTTT'],
'>2': ['ATcggGGGGGGA'], '>3': ['ATCGGGGGGATTT', 'gggggttAGTAttt']}
>>> sequence_list = [''.join(k) for k in d.values()]
>>> sequence_list
['ATTTTttttGGGGccCgCgGAgggGGTgggggttttTTTTTTTTT',
'ATcggGGGGGGA', 'ATCGGGGGGATTTgggggttAGTAttt']
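If the two parallel lists from the question are wanted directly, a single pass over the lines is enough: start a new section at each '>' line and append every other line to the current section. The sketch below assumes the file layout shown in the question; the function name read_sections is hypothetical, and the sample text is inlined so the example runs without a file.

```python
def read_sections(lines):
    """Return (name_list, sequence_list) for '>'-delimited sections."""
    name_list = []
    chunks = []  # one list of sequence lines per section
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name_list.append(line)
            chunks.append([])  # open a new section
        elif chunks:
            chunks[-1].append(line)  # belongs to the most recent '>' line
    sequence_list = ["".join(parts) for parts in chunks]
    return name_list, sequence_list

# The sample file from the question.
sample = """>1
ATTTTttttGGGG
ccCgCgGAgggGGT
gggggttttTTTTTTTTT
>2
ATcggGGGGGGA
>3
ATCGGGGGGATTT
gggggttAGTAttt
""".splitlines()

name_list, sequence_list = read_sections(sample)
print(name_list)
print(sequence_list)
```

Unlike the dictionary version, this keeps the sections in file order on any Python version and tolerates repeated names.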
