Splitting an array into two arrays in Python - python

I have a list of numbers like so:
7072624 through 7072631
7072672 through 7072687
7072752 through 7072759
7072768 through 7072783
The code below is what I have so far. I've removed the word "through", and it now prints a list of numbers.
import os
def file_read(fname):
    content_array = []
    with open(fname) as f:
        for line in f:
            content_array.append(line)
    #print(content_array[33])
    #new_content_array = [word for line in content_array[33:175] for word in line.split()]
    new_content_array = [word for line in content_array[33:37] for word in line.split()]
    while 'through' in new_content_array: new_content_array.remove('through')
    print(new_content_array)
file_read('numbersfile.txt')
This gives me the following output.
['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
What I want to do, but am struggling to work out, is how to split new_content_array into two arrays so that the output is as follows:
array1 = [7072624, 7072672, 7072752, 7072768]
array2 = [7072631, 7072687, 7072759, 7072783]
I then want to subtract each value in array1 from the corresponding value in array2:
7072631 - 7072624
7072687 - 7072672
7072759 - 7072752
7072783 - 7072768
I've been having a search but can't find anything similar to my situation.
Thanks in advance!

Try this:
list_data = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759', '7072768', '7072783']
array1 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 == 0]
array2 = [int(list_data[i]) for i in range(len(list_data)) if i % 2 != 0]
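As an alternative sketch, slicing with a step of 2 gives the same two lists without index arithmetic, and zip then pairs the values so the differences asked for in the question can be computed directly. The names numbers, starts, ends and differences are only illustrative.

```python
list_data = ['7072624', '7072631', '7072672', '7072687',
             '7072752', '7072759', '7072768', '7072783']

# Convert once, then take every second element starting at index 0 and 1.
numbers = [int(s) for s in list_data]
starts = numbers[0::2]   # first number of each "x through y" pair
ends = numbers[1::2]     # second number of each pair

# Subtract each start from its matching end.
differences = [end - start for start, end in zip(starts, ends)]
print(differences)
```

This assumes the list always holds an even number of entries; with an odd count, zip silently drops the unpaired last value.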

l = ['7072624', '7072631', '7072672', '7072687', '7072752', '7072759','7072768', '7072783']
l1 = [l[i] for i in range(len(l)) if i % 2 == 0]
l2 = [l[i] for i in range(len(l)) if i % 2 == 1]
print(l1) # ['7072624', '7072672', '7072752', '7072768']
print(l2) # ['7072631', '7072687', '7072759', '7072783']
result = list(zip(l1,l2))
As a result you will get:
[('7072624', '7072631'),
('7072672', '7072687'),
('7072752', '7072759'),
('7072768', '7072783')]
I would do this with list comprehensions, but you could also use filter.

You could split each line on the keyword through,
then remove all non-numeric characters (such as newlines or spaces) using a lambda function and a regex inside a list comprehension:
import os
import re
def file_read(fname):
    new_content_array = []
    with open(fname) as f:
        for line in f:
            line_array = line.split('through')
            new_content_array.append([(lambda x: re.sub(r'[^0-9]', "", x))(element) for element in line_array])
    print(new_content_array)
file_read('numbersfile.txt')
Output looks like this:
[['7072624', '7072631'], ['7072672', '7072687'], ['7072752', '7072759'], ['7072768', '7072783']]
Then you can extract the first element of each nested list into one variable and the second element of each into another.
Good luck
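A minimal sketch of that last step, assuming the lines are already in memory (the sample lines list below stands in for the file contents): re.findall pulls the two numbers out of each line, and zip(*pairs) transposes the list of pairs into two columns.

```python
import re

# Stand-in for the relevant lines of numbersfile.txt.
lines = ["7072624 through 7072631\n", "7072672 through 7072687\n",
         "7072752 through 7072759\n", "7072768 through 7072783\n"]

# One [start, end] pair of integers per line.
pairs = [[int(n) for n in re.findall(r"\d+", line)] for line in lines]

# Transpose the pairs: first elements into array1, second into array2.
array1, array2 = (list(column) for column in zip(*pairs))
print(array1)
print(array2)
```

The unpacking into array1 and array2 relies on every line containing exactly two numbers.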

Related

How to put a group of integers in a row in a text file into a list?

I have a text file composed mostly of numbers something like this:
3 011236547892X
9 02321489764 Q
4 031246547873B
I would like to extract each of the following (spaces 5 to 14 (counting from zero)) into a list:
1236547892
321489764
1246547873
(Please note: each "number" is 10 "characters" long - the second row has a space at the end.)
and then perform analysis on the contents of each list.
I have umpteen versions, however I think I am closest with:
with open('k_d_m.txt') as f:
    for line in f:
        range = line.split()
        num_lst = [x for x in range(3,10)]
        print(num_lst)
However, I get: TypeError: 'list' object is not callable
What is the best way forward?
What I want to do with num_lst is, amongst other things, as follows:
num_lst = list(map(int, str(num)))
print(num_lst)
nth = 2
odd_total = sum(num_lst[0::nth])
even_total = sum(num_lst[1::nth])
print(odd_total)
print(even_total)
if odd_total - even_total == 0 or odd_total - even_total == 11:
    print("The number is ok")
else:
    print("The number is not ok")
Use a simple slice:
with open('k_d_m.txt') as f:
    num_lst = [x[5:15] for x in f]
Response to comment:
with open('k_d_m.txt') as f:
    for line in f:
        num_lst = list(line[5:15])
        print(num_lst)
First of all, you shouldn't name your variable range, because that shadows the built-in range() function; once range is a list, calling range(3,10) raises the TypeError you are seeing. You can get characters 5 through 14 of a string with string[5:15]. Try this:
num_lst = []
with open('k_d_m.txt') as f:
    for line in f:
        num_lst.append(line[5:15])
print(num_lst)
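To connect the slice to the analysis described in the question, here is a rough sketch. The helper digit_totals and the two sample lines are hypothetical; the samples are laid out so that the number really does sit in columns 5 to 14, and any non-digit character in the field (such as the trailing space) is skipped, which is an assumption about how short numbers should be handled.

```python
def digit_totals(line, start=5, stop=15):
    """Return (odd_total, even_total) for the digits in line[start:stop]."""
    field = line[start:stop]
    # Keep only digit characters; a padding space in the field is ignored.
    digits = [int(ch) for ch in field if ch.isdigit()]
    odd_total = sum(digits[0::2])    # 1st, 3rd, 5th, ... digit
    even_total = sum(digits[1::2])   # 2nd, 4th, 6th, ... digit
    return odd_total, even_total

# Hypothetical sample lines with the number in columns 5-14.
sample_lines = ["3    1236547892X", "9    321489764 Q"]
for line in sample_lines:
    odd_total, even_total = digit_totals(line)
    diff = odd_total - even_total
    verdict = "ok" if diff in (0, 11) else "not ok"
    print(line[5:15], odd_total, even_total, verdict)
```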

regex - counting by unique pattern (HH:MM:__)

I want to count the unique HH:MM:xx values (e.g. 11:11:00, 11:12:00, 11:12:11) using regex. So far I am only able to count the total number of HH:MM:SS matches in the text, and I'm not sure how to continue from here. This is my code:
import re
pattern = re.compile(r"(\d{2}):(\d{2}):(\d{2})") #capture all the pattern with HH:MM:SS
path = r'C:\Users\CL\Desktop\abc.txt'
list1 = [] # to store values in list
for line in open(path,'r'):
    for match in re.finditer(pattern, line): #draw 11:11:00, 11:12:00, 11:12:11
        list1.append(line) #append to a list
total = len(list1) #sum list
print(total) #3
Sample text:
11:11:00
abc
11:12:00
abc
11:12:11
abc
The desired output should be 2 (the unique values 11:11:xx and 11:12:xx).
See below (data1.txt is your data):
from collections import defaultdict
data = defaultdict(int)
with open('data1.txt') as f:
    lines = [l.strip() for l in f.readlines()]
    for line in lines:
        if line.count(':') == 2:
            data[line[:5]] += 1
print(data)
output
defaultdict(<class 'int'>, {'11:11': 1, '11:12': 2})
You could use re.findall here, followed by a list comprehension to remove duplicates:
with open(path, 'r') as file:
    data = file.read()
ts = re.findall(r'(\d{2}:\d{2}):\d{2}', data)
res = []
[res.append(x) for x in ts if x not in res]
print(len(res))
If you only want to count the number of occurrences, you can simply do:
txtfile = open(r"C:\Users\CL\Desktop\abc.txt", "r")
filetext = txtfile.read()
txtfile.close()
list1 = set(re.findall(r"(\d{2}:\d{2}):\d{2}",filetext))
total = len(list1) # number of unique HH:MM values
print(total) #2
You can use parentheses to specify what you want to capture (the HH:MM). Then you can use set to remove duplicates.
Have you tried using a set instead of a list?
pattern = re.compile(r"(\d{2}):(\d{2}):(\d{2})")
path = r'C:\Users\CL\Desktop\abc.txt'
s = set() # use a set instead of a list, to avoid duplicates
for line in open(path,'r'):
    for match in re.finditer(pattern, line):
        s.add(match.group(0)[:5]) #insert only the HH:MM part of the match into the set
total = len(s) #number of elements in s
print(total) #2
This way, if you try to insert an element you've already seen, we won't have multiple copies of it stored, since sets don't allow duplicates.
EDIT: As commented, we are not supposed to include seconds here, which I mistakenly did originally. Fixed now.
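For a self-contained check of the idea, the sketch below runs the same capture-group pattern over the sample text from the question, held in a string (sample_text is a stand-in for the file contents). A Counter is used so that both the number of unique HH:MM values and the per-minute counts are available.

```python
import re
from collections import Counter

# The sample text from the question, inlined instead of read from a file.
sample_text = "11:11:00\nabc\n11:12:00\nabc\n11:12:11\nabc\n"

# The parentheses capture only HH:MM; the seconds are matched but discarded.
minutes = re.findall(r"(\d{2}:\d{2}):\d{2}", sample_text)

per_minute = Counter(minutes)   # occurrences of each distinct HH:MM
print(len(per_minute))          # number of unique HH:MM values
print(per_minute)
```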

Access the elements of a list around the current element?

I am trying to figure out if it is possible to access the elements of a list around the element you are currently at. I have a list that is large (20k+ lines) and I want to find every instance of the string 'Name'. Additionally, I also want to get +/- 5 elements around each 'Name' element. So 5 lines before and 5 lines after. The code I am using is below.
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt','w') as outfile:
    for line in infile:
        if search_string in line:
            outfile.writelines([line, next(infile), next(infile),
                                next(infile), next(infile), next(infile)])
Getting the lines after the occurrence of 'Name' is pretty straightforward, but figuring out how to access the elements before it has me stumped. Does anyone have any ideas?
20k lines isn't that much. If it's OK to read them all into a list, we can take slices around each index where a match is found, like this:
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt','w') as outfile:
    lines = infile.readlines()  # keep the trailing newlines so writelines() emits separate lines
    n = len(lines)
    for i in range(n):
        if search_string in lines[i]:
            start = max(0, i - 5)
            end = min(n, i + 6)
            outfile.writelines(lines[start:end])
You can use the function enumerate, which lets you iterate over both the elements and their indexes.
Example accessing the elements 5 indexes before and after the current element of a list l:
n = len(l)
for i, x in enumerate(l):
    print(l[max(i-5, 0)]) # Clamp at 0 so a negative index doesn't wrap around to the end of the list
    print(x)
    print(l[min(i+5, n-1)]) # Clamp at the last index to avoid an IndexError
You need to keep track of the index of where you currently are in the list.
So something like:
# Read the file into list_of_lines
index = 0
while index < len(list_of_lines):
    if list_of_lines[index] == 'Name':
        print(list_of_lines[index - 1]) # This is the previous line
        print(list_of_lines[index + 1]) # This is the next line
        # And so on...
    index += 1
Let's say you have your lines stored in your list:
lines = ['line1', 'line2', 'line3', 'line4', 'line5', 'line6', 'line7', 'line8', 'line9']
You could define a generator function that returns the elements in groups of n consecutive items:
def each_cons(iterable, n = 2):
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size-n+1:
        yield iterable[i:i+n]
        i += 1
Then just call the function. To show the content I'm calling list on it, but you can iterate over it:
lines_by_3_cons = each_cons(lines, 3) # or any number of lines, 5 in your case
print(list(lines_by_3_cons))
#=> [['line1', 'line2', 'line3'], ['line2', 'line3', 'line4'], ['line3', 'line4', 'line5'], ['line4', 'line5', 'line6'], ['line5', 'line6', 'line7'], ['line6', 'line7', 'line8'], ['line7', 'line8', 'line9']]
I personally loved this problem. The other answers here all read the whole file into memory; I think this version is more memory-efficient.
Here, check this out!
myfile = open('infile.txt')
stack_print_moments = []
expression = 'MYEXPRESSION'
neighbourhood_size = 5
def print_stack(stack):
    for line in stack:
        print(line.strip())
    print('-----')
current_stack = []
for index, line in enumerate(myfile):
    current_stack.append(line)
    if len(current_stack) > 2 * neighbourhood_size + 1:
        current_stack.pop(0)
    if expression in line:
        stack_print_moments.append(index + neighbourhood_size)
    if index in stack_print_moments:
        print_stack(current_stack)
last_index = index
for index in range(last_index, last_index + neighbourhood_size + 1):
    if index in stack_print_moments:
        print_stack(current_stack)
    current_stack.pop(0)
More advanced code is here: Github link
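Another way to stream the file without holding it all in memory is a bounded collections.deque for the lines before a match plus a countdown for the lines after it. This is only a sketch, not the code behind the link above; the function name lines_with_context and its parameters are made up for illustration, and overlapping windows are merged so no line is emitted twice.

```python
from collections import deque

def lines_with_context(lines, needle, before=5, after=5):
    """Yield every line within `before`/`after` lines of a line containing needle."""
    history = deque(maxlen=before)   # the most recent lines not yet emitted
    remaining_after = 0              # how many trailing lines are still owed
    for line in lines:
        if needle in line:
            yield from history       # emit the buffered lines before the match
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line
            remaining_after -= 1
        else:
            history.append(line)     # deque drops the oldest line automatically

# Usage with an in-memory list; a file object can be passed the same way.
sample = ["line%d" % i for i in range(20)]
sample[10] = "Name: example"
context = list(lines_with_context(sample, "Name", before=2, after=2))
print(context)
```

Because the function only iterates over its input, passing an open file handle instead of a list keeps at most `before` lines buffered at any time.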

Populating python matrix

I'm splitting a text file into words in Python. I compute the number of rows (c) and a dictionary (word_positions) that maps each word to an index. Then I create a zero matrix of shape (c, index). Here is the code:
from collections import defaultdict
import re
import numpy as np
c=0
f = open('/Users/Half_Pint_Boy/Desktop/sentenses.txt', 'r')
for line in f:
    c = c + 1
word_positions = {}
with open('/Users/Half_Pint_Boy/Desktop/sentenses.txt', 'r') as f:
    index = 0
    for word in re.findall(r'[a-z]+', f.read().lower()):
        if word not in word_positions:
            word_positions[word] = index
            index += 1
print(word_positions)
matrix=np.zeros((c, index))  # the shape must be passed as a single tuple
My question: how can I populate the matrix so that matrix[c, index] = count, where c is the row number, index is the word's indexed position, and count is the number of times that word occurs in the row?
Try the following:
import re
import numpy as np
from itertools import chain
text = open('/Users/Half_Pint_Boy/Desktop/sentenses.txt')
text_list = text.readlines()
c=0
for i in range(len(text_list)):
    c=c+1
text_niz = []
for i in range(len(text_list)):
    text_niz.append(text_list[i].lower()) # convert to lower case
slovo = []
for j in range(len(text_niz)):
    slovo.append(re.split('[^a-z]', text_niz[j])) # tokenize
for e in range(len(slovo)):
    while slovo[e].count('') != 0:
        slovo[e].remove('') # remove empty words
slovo_list = list(chain(*slovo))
print (slovo_list) # the list of all words
slovo_list=list(set(slovo_list)) # remove duplicates
x=len(slovo_list)
s = []
for i in range(len(slovo)):
    for j in range(len(slovo_list)):
        s.append(slovo[i].count(slovo_list[j])) # count each word's occurrences in each sentence
matr = np.array(s) # word-occurrence counts for all sentences, flattened
d = matr.reshape((c, x)) # reshape into a matrix (22*254 for this data)
It looks like you are trying to create something similar to an n-dimensional list. These are achieved by nesting lists inside one another, like this:
two_d_list = [[0, 1], [1, 2], ["example", "blah", "blah blah"]]
words = two_d_list[2]
single_word = two_d_list[2][1] # Notice the second index operator
This concept is very flexible in Python and can also be done with dictionaries nested inside the list, as you would like:
two_d_list = [{"word":1}, {"example":1, "blah":3}]
words = two_d_list[1] # type(words) == dict
single_word = two_d_list[1]["example"] # Similar index operator, but for the dictionary
This achieves what you would like functionally, but it does not use the syntax matrix[c,index]. Plain Python lists do not support that syntax for indexing (NumPy arrays do); commas within square brackets otherwise delineate the elements of list literals. With nested lists or dictionaries you instead access the element with matrix[c][index] = count.
You may be able to overload the index operator to achieve the syntax you want. Here is a question about achieving the syntax you desire. In summary:
Overload the __getitem__(self, index) method (and __setitem__ for assignment) in a wrapper around the list class, and have it accept a tuple. The tuple can be created without parentheses, giving the syntax matrix[c, index] = count.
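A minimal sketch of such a wrapper, using only the standard library. The class name Matrix and the two sample sentences are hypothetical; the word_positions dictionary is built the same way as in the question, and the counts are filled in with the matrix[c, index] syntax.

```python
import re

class Matrix:
    """Hypothetical wrapper that gives nested lists m[row, col] indexing."""
    def __init__(self, rows, cols, fill=0):
        self._data = [[fill] * cols for _ in range(rows)]

    def __getitem__(self, key):
        row, col = key               # m[r, c] passes the tuple (r, c)
        return self._data[row][col]

    def __setitem__(self, key, value):
        row, col = key
        self._data[row][col] = value

# Stand-in for the lines of the text file.
sentences = ["The cat sat on the mat", "The dog chased the cat"]
tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]

# Assign each distinct word the next free column index.
word_positions = {}
for words in tokenized:
    for word in words:
        word_positions.setdefault(word, len(word_positions))

# One row per sentence, one column per word; each cell is a count.
matrix = Matrix(len(sentences), len(word_positions))
for c, words in enumerate(tokenized):
    for word in words:
        matrix[c, word_positions[word]] += 1

print(matrix[0, word_positions["the"]])
```

With a NumPy array created by np.zeros((rows, cols), dtype=int), the same population loop works unchanged, since NumPy arrays accept matrix[c, index] directly.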

Sum of everything inside a list [duplicate]

This question already has answers here:
Sum of strings extracted from text file using regex
(10 answers)
Closed 6 years ago.
I am extracting numbers from sentences within a large file. Example sentence (in file finalsum.txt):
the cat in the 42 hat and the cow 1772 jumps over the moon.
I then use regular expressions to create a list where I want to add together 1772 + 42 for the entire file creating a final sum of all the numbers.
Here is my current code :
import re
import string
fname = raw_input('Enter file name: ')
try:
    if len(fname) < 1: fname = "finalsum.txt"
    handle = open(fname, 'r')
except:
    print 'Cannot open file:', fname
counts = dict()
numlist = list()
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        result = map(int, x)
        numlist.append(result)
print numlist
I know this code can be written in two lines using a list comprehension, but I am just learning. Thank you!
You should not append the list; instead, join the lists using the + operator.
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        result = map(int, x)
        numlist+=result
        #numlist.append(result)
print numlist
print sum(numlist)
Illustration:
>>> a = [1,2]
>>> b = [2,3]
>>> c = [4,5]
>>> a.append(b)
>>> a
[1, 2, [2, 3]]
>>> print b + c
[2, 3, 4, 5]
As you can see, append() adds the list object itself as a single element at the end, whereas + joins the two lists.
Check this:
import re
line = 'the cat in the 42 hat and the cow 1772 jumps over the moon 11 11 11'
y = re.findall('([0-9]+)', line)
print sum([int(z) for z in y])
This does not create nested lists; it converts every value in the list to an integer and adds them up. For the whole file, you can keep a running total in a variable and add the sum of the integers found on each line to it.
Like this:
answer = 0
for line in handle:
    line = line.rstrip()
    x = re.findall('([0-9]+)', line)
    if len(x) > 0:
        answer += sum([int(z) for z in x])
print answer
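The code in this thread is Python 2. For comparison, a Python 3 sketch of the same running-total idea collapses into a single generator expression; the lines list is a stand-in for the open file handle, and its second and third entries are made-up sample data.

```python
import re

# Stand-in for iterating over the lines of finalsum.txt.
lines = [
    "the cat in the 42 hat and the cow 1772 jumps over the moon.",
    "no numbers on this line",
    "7 and 8",
]

# Find every run of digits on every line, convert to int, and sum them all.
total = sum(int(n) for line in lines for n in re.findall(r"[0-9]+", line))
print(total)
```

Lines without numbers simply contribute nothing, so the len(x) > 0 check from the loop version is not needed here.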
