Access the elements of a list around the current element? - python

I am trying to figure out if it is possible to access the elements of a list around the element you are currently at. I have a list that is large (20k+ lines) and I want to find every instance of the string 'Name'. Additionally, I also want to get +/- 5 elements around each 'Name' element. So 5 lines before and 5 lines after. The code I am using is below.
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt', 'w') as outfile:
    for line in infile:
        if search_string in line:
            outfile.writelines([line, next(infile), next(infile),
                                next(infile), next(infile), next(infile)])
Getting the lines after the occurrence of 'Name' is pretty straightforward, but figuring out how to access the elements before it has me stumped. Anyone have any ideas?

20k lines isn't that much. If it's OK to read all of them into a list, we can take slices around the index where a match is found, like this:
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt', 'w') as outfile:
    # Keep the trailing newlines: writelines() does not add them.
    lines = infile.readlines()
    n = len(lines)
    for i in range(n):
        if search_string in lines[i]:
            start = max(0, i - 5)
            end = min(n, i + 6)  # slice end is exclusive
            outfile.writelines(lines[start:end])
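The slicing idea above can also be wrapped in a small function so it is easy to try on an in-memory list. This is only a sketch: the name context_windows and the radius parameter are made up for illustration and are not from the original answer.

```python
def context_windows(lines, search_string, radius=5):
    """Return one slice of `lines` per match, padded by `radius` items on each side."""
    windows = []
    for i, line in enumerate(lines):
        if search_string in line:
            start = max(0, i - radius)             # clamp at the start of the list
            end = min(len(lines), i + radius + 1)  # slice end is exclusive
            windows.append(lines[start:end])
    return windows

sample = ['a', 'b', 'Name: x', 'c', 'd', 'e', 'Name: y']
result = context_windows(sample, 'Name', radius=1)
print(result)
```

Windows that are close together will overlap, just as in the loop above.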

You can use the enumerate function, which lets you iterate over both the elements and their indexes.
Example that accesses the elements 5 indexes before and after the current element:
n = len(l)
for i, x in enumerate(l):
    print(l[max(i-5, 0)])  # Clamp at 0 so a negative index doesn't wrap around to the end of the list
    print(x)
    print(l[min(i+5, n-1)])  # Clamp at the last index to avoid an IndexError

You need to keep track of the index of where you currently are in the list, so something like:
# Read the file into list_of_lines
index = 0
while index < len(list_of_lines):
    if list_of_lines[index] == 'Name':
        print(list_of_lines[index - 1])  # This is the previous line (wraps to the last line when index is 0)
        print(list_of_lines[index + 1])  # This is the next line (raises IndexError on the last line)
        # And so on...
    index += 1

Let's say you have your lines stored in your list:
lines = ['line1', 'line2', 'line3', 'line4', 'line5', 'line6', 'line7', 'line8', 'line9']
You could define a function that returns the elements in groups of n consecutive items, as a generator:
def each_cons(iterable, n = 2):
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size-n+1:
        yield iterable[i:i+n]
        i += 1
Then, just call the function. To show the content I'm calling list on it, but you can iterate over it:
lines_by_3_cons = each_cons(lines, 3) # or any number of lines, 5 in your case
print(list(lines_by_3_cons))
#=> [['line1', 'line2', 'line3'], ['line2', 'line3', 'line4'], ['line3', 'line4', 'line5'], ['line4', 'line5', 'line6'], ['line5', 'line6', 'line7'], ['line6', 'line7', 'line8'], ['line7', 'line8', 'line9']]

I personally loved this problem. The other answers all read the whole file into memory; I think I wrote a memory-efficient version.
Here, check this out!
myfile = open('infile.txt')
stack_print_moments = []
expression = 'MYEXPRESSION'
neighbourhood_size = 5
def print_stack(stack):
    for line in stack:
        print(line.strip())
    print('-----')
current_stack = []
for index, line in enumerate(myfile):
    current_stack.append(line)
    if len(current_stack) > 2 * neighbourhood_size + 1:
        current_stack.pop(0)
    if expression in line:
        stack_print_moments.append(index + neighbourhood_size)
    if index in stack_print_moments:
        print_stack(current_stack)
last_index = index
for index in range(last_index, last_index + neighbourhood_size + 1):
    if index in stack_print_moments:
        print_stack(current_stack)
    current_stack.pop(0)
More advanced code is here: Github link
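For comparison, here is a rough sketch of the same rolling-buffer idea using collections.deque, which drops old lines automatically. It is not the code from the answer above: the function name iter_context and its parameters are invented for this example, and it accepts any iterable of lines (an open file works too).

```python
from collections import deque

def iter_context(lines, needle, radius=5):
    """Yield, per match, a list of up to `radius` lines before, the match, and up to `radius` lines after."""
    before = deque(maxlen=radius)  # rolling buffer of the most recent lines
    pending = []                   # entries of [window, trailing_lines_still_needed]
    for line in lines:
        # Feed the new line to every window still waiting for trailing lines.
        for entry in pending:
            entry[0].append(line)
            entry[1] -= 1
        if needle in line:
            pending.append([list(before) + [line], radius])
        before.append(line)
        # The oldest window always completes first, so emit from the front.
        while pending and pending[0][1] <= 0:
            yield pending.pop(0)[0]
    # Input exhausted: emit the windows that ran out of trailing lines.
    for window, _ in pending:
        yield window

sample = ['a', 'b', 'c', 'd', 'Name 1', 'e', 'f', 'g', 'h', 'Name 2']
windows = list(iter_context(sample, 'Name', radius=2))
print(windows)
```

Only about radius lines per pending match are held at any time, so memory use does not grow with the size of the file.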

Related

How to put a group of integers in a row in a text file into a list?

I have a text file composed mostly of numbers something like this:
3 011236547892X
9 02321489764 Q
4 031246547873B
I would like to extract each of the following (spaces 5 to 14 (counting from zero)) into a list:
1236547892
321489764
1246547873
(Please note: each "number" is 10 "characters" long - the second row has a space at the end.)
and then perform analysis on the contents of each list.
I have umpteen versions, however I think I am closest with:
with open('k_d_m.txt') as f:
    for line in f:
        range = line.split()
        num_lst = [x for x in range(3,10)]
        print(num_lst)
However I have: TypeError: 'list' object is not callable
What is the best way forward?
What I want to do with num_lst is, amongst other things, as follows:
num_lst = list(map(int, str(num)))
print(num_lst)
nth = 2
odd_total = sum(num_lst[0::nth])
even_total = sum(num_lst[1::nth])
print(odd_total)
print(even_total)
if odd_total - even_total == 0 or odd_total - even_total == 11:
    print("The number is ok")
else:
    print("The number is not ok")
Use a simple slice:
with open('k_d_m.txt') as f:
    num_lst = [x[5:15] for x in f]
Response to comment:
with open('k_d_m.txt') as f:
    for line in f:
        num_lst = list(line[5:15])
        print(num_lst)
First of all, you shouldn't name your variable range, because that shadows the built-in range() function; once range is a list, calling range(3,10) raises the TypeError you are seeing. You can easily get characters 5 through 14 of a string using string[5:15]. Try this:
num_lst = []
with open('k_d_m.txt') as f:
    for line in f:
        num_lst.append(line[5:15])
print(num_lst)

Splitting an array into two arrays in Python

I have a list of numbers like so:
7072624 through 7072631
7072672 through 7072687
7072752 through 7072759
7072768 through 7072783
The code below is what I have so far. I've removed the word "through" and it now prints a list of numbers.
import os
def file_read(fname):
    content_array = []
    with open (fname) as f:
        for line in f:
            content_array.append(line)
    #print(content_array[33])
    #new_content_array = [word for line in content_array[33:175] for word in line.split()]
    new_content_array = [word for line in content_array[33:37] for word in line.split()]
    while 'through' in new_content_array: new_content_array.remove('through')
    print(new_content_array)
file_read('numbersfile.txt')
This gives me the following output.
['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
So what I'm wanting to do but struggling to find is how to split the 'new_content_array' into two arrays so the output is as follows.
array1 = [7072624, 7072672, 7072752, 7072768]
array2 = [7072631, 7072687, 7072759, 7072783]
I then want to be able to take each value in array 2 from the value in array 1
7072631 - 7072624
7072687 - 7072672
7072759 - 7072752
7072783 - 7072768
I've been having a search but can't find anything similar to my situation.
Thanks in advance!
Try this below:
list_data = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
array1 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 == 0]
array2 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 != 0]
l = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759','7072768', '7072783']
l1 = [l[i] for i in range(len(l)) if i % 2 == 0]
l2 = [l[i] for i in range(len(l)) if i % 2 == 1]
print(l1) # ['7072624', '7072672', '7072752', '7072768']
print(l2) # ['7072631', '7072687', '7072759', '7072783']
result = list(zip(l1,l2))
As a result you will get:
[('7072624', '7072631'),
('7072672', '7072687'),
('7072752', '7072759'),
('7072768', '7072783')]
I did this with list comprehensions, but you could also use filter.
You could try splitting each line on the 'through' keyword, then removing all non-numeric characters (such as newlines or spaces) with a regex inside a list comprehension:
import os
import re
def file_read(fname):
    new_content_array = []
    with open (fname) as f:
        for line in f:
            line_array = line.split('through')
            # Keep only the digits of each half of the line
            new_content_array.append([re.sub(r'[^0-9]', "", element) for element in line_array])
    print(new_content_array)
file_read('numbersfile.txt')
Output looks like this:
[['7072624', '7072631'], ['7072672', '7072687'], ['7072752', '7072759'], ['7072768', '7072783']]
Then you could just extract the first element of each nested list into one variable, and the second element into another.
Good luck
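None of the answers above shows the subtraction step the question asks for, so here is a minimal sketch of it. It starts from the list printed in the question and uses slice steps instead of index tests; the name differences is just an illustrative choice.

```python
list_data = ['7072624', '7072631', '7072672', '7072687',
             '7072752', '7072759', '7072768', '7072783']

numbers = [int(s) for s in list_data]
array1 = numbers[0::2]  # every other value, starting with the first (range starts)
array2 = numbers[1::2]  # every other value, starting with the second (range ends)

# Subtract each start from its matching end.
differences = [end - start for start, end in zip(array1, array2)]
print(differences)
```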

Find double lines; a faster way

This is what I do to find all double lines in a textfile
import regex #regex is as re
#capture all lines in buffer
r = f.readlines()
#create list of all linenumbers
lines = list(range(1,endline+1))
#merge both lists
z=[list(a) for a in zip(r, lines)]
#sort list
newsorting = sorted(z)
#put doubles in list
listdoubles = []
for i in range(0,len(newsorting)-1):
    if (i+1) <= len(newsorting):
        if (newsorting[i][0] == newsorting[i+1][0]) and (not regex.search(r'^\s*$',newsorting[i][0])):
            listdoubles.append(newsorting[i][1])
            listdoubles.append(newsorting[i+1][1])
#remove any repeated line numbers
listdoubles = list(set(listdoubles))
#sort line numeric
listdoubles = sorted(listdoubles, key=int)
print(listdoubles)
But it is very slow. When I have over 10.000 lines it takes 10 seconds to create this list.
Is there a way to do it faster?
You can use a simpler approach:
for each line
    if it has been seen before then display it
    else add it to the set of known lines
In code:
seen = set()
for L in f:
    if L in seen:
        print(L)
    else:
        seen.add(L)
If you want to display the line numbers where duplicates appear, the code can simply be changed to use a dictionary that maps each line's content to the line number where it was first seen:
seen = {}
for n, L in enumerate(f):
    if L in seen:
        print("Line %i is a duplicate of line %i" % (n, seen[L]))
    else:
        seen[L] = n
Both dict and set in Python are based on hashing and provide constant-time lookup operations.
EDIT
If you need only the line numbers of last duplicate of a line then the output clearly cannot be done during the processing but you will have first to process the whole input before emitting any output...
# lastdup will be a map from line content to the line number the
# last duplicate was found. On first insertion the value is None
# to mark the line is not a duplicate
lastdup = {}
for n, L in enumerate(f):
    if L in lastdup:
        lastdup[L] = n
    else:
        lastdup[L] = None
# Now all values that are not None are the last duplicate of a line
result = sorted(x for x in lastdup.values() if x is not None)
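If what you want is the same output as the original code (every line number that takes part in a duplicate, with blank lines ignored), one possible variation is to collect all positions per line text. This is a sketch rather than a drop-in replacement: the function name duplicate_line_numbers is made up, and line numbers are assumed to start at 1 as in the question.

```python
from collections import defaultdict

def duplicate_line_numbers(lines):
    """Return the sorted 1-based numbers of all non-blank lines that occur more than once."""
    positions = defaultdict(list)        # line text -> list of line numbers
    for number, text in enumerate(lines, start=1):
        if text.strip():                 # skip empty or whitespace-only lines
            positions[text].append(number)
    return sorted(n for nums in positions.values() if len(nums) > 1 for n in nums)

sample = ['a\n', 'b\n', '\n', 'a\n', '\n', 'c\n', 'b\n']
doubles = duplicate_line_numbers(sample)
print(doubles)
```

This still makes a single pass with hash lookups, so it avoids sorting all the lines the way the original code does.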

Making list of adjacent node pairs from Cube-formatted line file (using Python)

My files are formatted like this:
LINE NAME="FirstLine", MODE=15, ONEWAY=T, HEADWAY[1]=20, HEADWAY[2]=30,
HEADWAY[3]=20, HEADWAY[4]=30, HEADWAY[5]=30, VEHICLETYPE=2,
XYSPEED=20, N=-20609, -22042, -20600, 20601, 22839, 22838,
-20602, -20607, -20606, -20605, -20896, -20895, -20897, 20898,
-20899, -20905, -20906, -20910, 21104, -20911, -20912, 25065,
-21375
LINE NAME="SecondLine", MODE=15, ONEWAY=T, HEADWAY[1]=25, HEADWAY[2]=35,
[ETC]
I need to extract the lists of numbers that come after N= (one list for each N=), get rid of the minus-signs, and append each pair of adjacent numbers (e.g. [[20609, 22042], [22042, 20600]]) into a list of pairs. The major sticking part for Python-noob me is just extracting the lists of numbers as the first step (i.e. making what comes after each N= a list of its own).
If Python lists aren't ordered, I may have to make the lists strings and write each one as a line in a new file.
I was able to solve this by using the find method for LINE and N=. Finding LINE would increase an index and make a new item in a dictionary corresponding to that index. Finding N= would give the "definition" to that item in the dictionary -- a list with a single string element. Then for each item in the dictionary, I stripped spaces, replaced the - with '' (i.e. nothing), and used the split method with argument ',' to cut up the lists.
Then I zipped those lists Li[:-1] into themselves Li[1:] to get the adjacent-node pairs I needed.
Probably no one will ever find this useful (and I know it's probably convoluted), but here's my code:
with open(path + filename) as f:
    i = 0
    L = {}
    for line in f:
        existL = line.find("LINE")
        existN = line.find("N=")
        if existL > -1:
            # A new LINE record starts: create an empty node list for it
            i = i + 1
            L["Line" + str(i)] = []
        if existN > -1:
            go = 0
            while go == 0:
                # Take everything after the last '=' (the whole line on continuation lines)
                txtNodes = line[line.rfind('=')+1:].strip()
                nodes = txtNodes.split(',')
                for node in nodes:
                    node = node.strip()
                    node = node.replace('-','')
                    if len(node) > 3:
                        L["Line" + str(i)].append(node)
                try:
                    line = next(f)
                    if line.find("LINE") > -1:
                        go = go + 1
                        i = i + 1
                        L["Line" + str(i)] = []
                except StopIteration:
                    # End of file reached
                    go = go + 1
Li = []
while i > 1:
    L1 = L["Line" + str(i)][:-1]
    L2 = L["Line" + str(i)][1:]
    Lx = zip(L1,L2)
    i = i-1
    Li.extend(Lx)
I hate when people come to forums and don't follow up, so here's my follow-up. Sorry for posting in the wrong place initially.
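For anyone landing here later, the same extraction can probably be done more compactly with regular expressions. The sketch below makes two assumptions that are not confirmed by the question: every record begins with LINE at the start of a line, and the node list after N= runs to the end of the record. The function name adjacent_pairs is invented for the example.

```python
import re

def adjacent_pairs(text):
    """Return (node, next_node) tuples for every N= list, with minus signs dropped."""
    pairs = []
    # Split the file into records; each one is assumed to begin with LINE at the start of a line.
    for record in re.split(r'^LINE\b', text, flags=re.MULTILINE):
        match = re.search(r'\bN=(.*)', record, flags=re.DOTALL)
        if not match:
            continue
        # \d+ picks up the digits only, which discards the minus signs.
        nodes = [int(n) for n in re.findall(r'\d+', match.group(1))]
        pairs.extend(zip(nodes, nodes[1:]))
    return pairs

sample = '''LINE NAME="FirstLine", MODE=15, ONEWAY=T, HEADWAY[1]=20,
 XYSPEED=20, N=-20609, -22042, -20600,
 20601
LINE NAME="SecondLine", MODE=15, N=100, -200
'''
node_pairs = adjacent_pairs(sample)
print(node_pairs)
```

Python lists keep their order, so the pairs come out in the same sequence as the nodes appear in the file.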

How to rearrange numbers from different lines of a text file in python?

So I have a text file consisting of one column, where each row consists of two numbers:
190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893
I would like to discard the very first and the very last number (in this case, 190 and 9893) and basically move the rest of the numbers one spot forward, like this.
My desired output:
255..337
2799..2801
3733..3734
5020..5234
5530..5683
6459..8238
9191..9306
I hope that makes sense. I'm not sure how to approach this.
lines = """190..255
337..2799
2801..3733"""
values = [int(v) for line in lines.split() for v in line.split('..')]
# values = [190, 255, 337, 2799, 2801, 3733]
pairs = zip(values[1:-1:2], values[2:-1:2])
# pairs = [(255, 337), (2799, 2801)]
out = '\n'.join('%d..%d' % pair for pair in pairs)
# out = "255..337\n2799..2801"
Try this:
with open(filename, 'r') as f:
    lines = f.readlines()
numbers = []
for row in lines:
    numbers.extend(row.strip().split('..'))  # strip the trailing newline first
numbers = numbers[1:len(numbers)-1]
newLines = ['..'.join(numbers[idx:idx+2]) for idx in range(0, len(numbers), 2)]
with open(filename, 'w') as f:
    for line in newLines:
        f.write(line)
        f.write('\n')
Try this:
Read all of them into one list, split each line into two numbers, so you have one list of all your numbers.
Remove the first and last item from your list
Write out your list, two items at a time, with dots in between them.
Here's an example:
a = """190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893"""
a_list = a.replace('..','\n').split()
b_list = a_list[1:-1]
b = ''
for i in range(len(b_list)//2):
    b += '..'.join(b_list[2*i:2*i+2]) + '\n'
temp = []
with open('temp.txt') as ofile:
    for x in ofile:
        temp.append(x.rstrip("\n"))
for x in range(0, len(temp) - 1):
    # Second number of this row, then first number of the next row
    print(temp[x].split("..")[1] + ".." + temp[x+1].split("..")[0])
Maybe this will help:
def makeColumns(listOfNumbers):
    n = int()
    while n < len(listOfNumbers):
        print(listOfNumbers[n], '..', listOfNumbers[(n+1)])
        n += 2
def trim(listOfNumbers):
    listOfNumbers.pop(0)
    listOfNumbers.pop((len(listOfNumbers) - 1))
listOfNumbers = [190, 255, 337, 2799, 2801, 3733, 3734, 5020, 5234, 5530, 5683, 6459, 8238, 9191, 9306, 9893]
makeColumns(listOfNumbers)
print()
trim(listOfNumbers)
makeColumns(listOfNumbers)
I think this might be useful too. I am reading data from a file named list.
data = open("list","r")
temp = []
value = []
print(data)
for line in data:
    temp = line.split("..")
    value.append(temp[0])
    value.append(temp[1])
for i in range(1,(len(value)-1),2):
    print(value[i].strip()+".."+value[i+1])
print(value)
After reading each line, I split it and store the parts in the temporary list. Then I copy them into the main list, value, which ends up holding all of the data. Finally, I iterate from the second element to the second-to-last element to get the output of interest. The strip function is used to remove the '\n' character from the value.
You can later write these values to a file instead of printing them out.
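Pulling the ideas from the answers above into one Python 3 function, a possible sketch looks like this. The name shift_ranges is made up; it takes any iterable of rows (for example an open file) and returns the new rows as strings.

```python
def shift_ranges(rows):
    """Flatten 'a..b' rows, drop the first and last number, and re-pair what is left."""
    numbers = [part for row in rows for part in row.strip().split('..')]
    inner = numbers[1:-1]  # discard the very first and very last number
    return ['..'.join(inner[i:i + 2]) for i in range(0, len(inner), 2)]

rows = ['190..255\n', '337..2799\n', '2801..3733\n']
shifted = shift_ranges(rows)
print(shifted)
```

Writing the result back out is then a matter of joining the returned list with newlines.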
