Extract values between two markers - python

I have a text file with the following information.
document.txt:
z
-0.01,0.04,-0.04,0,-0.06,-0.08,0.04,0.03
z
0.1,-0.02,0.1,0.14,0.07,0.05,0.01
z
0.05,0.05,0.12,0.13,0.08,0.01,0.12,0.11
Essentially, I want my Python program to extract the numbers between the z markers and add them to lists. So there will be 3 lists: the first will contain the numbers between the first and second z, the second will contain the numbers between the second and third z, and so on.
Here is what I have so far; it takes all the numbers, converts them to floats and puts them in a single list. Now I need to split it into lists that each contain only the numbers between two z markers.
f = open(file_name)
contents = f.readlines()
myList = []
for line in contents:
    line = line.split()
    if 'z' not in line:
        for j in line:
            j = j.split(',')
            for l in j:
                l = float(l)
                myList.append(l)

You can read document.txt into a single string and split it on "z". This gives you a list whose elements are the chunks of text between the markers, so you can then split each chunk again on "," to get the individual numbers. That should do the trick:
# open the text file in read mode and read the whole file into a string
with open("document.txt", "r") as text_file:
    data = text_file.read()
# split the string on the marker; str.split returns a new list, so assign it
data = data.split("z")
# remove the first value of the list, which is the empty text before the first "z"
del data[0]
# since you wanted 3 separate lists, split each chunk on "," and convert to float
l1 = [float(v) for v in data[0].strip().split(",")]
l2 = [float(v) for v in data[1].strip().split(",")]
l3 = [float(v) for v in data[2].strip().split(",")]
This gives you the three lists you wanted.
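The split approach above hard-codes three lists. As a sketch, assuming each marker sits alone on its own line (as in document.txt), the same idea can be written as a loop that starts a new list at every marker, so it works for any number of groups. The helper name split_on_marker is hypothetical, not taken from the answers above.

```python
def split_on_marker(lines, marker="z"):
    """Group comma-separated numbers into one list per marker line.

    Hypothetical helper; assumes each marker appears alone on its own line.
    """
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line == marker:
            groups.append([])  # each marker starts a new group
        elif groups:
            # numbers after a marker belong to the most recent group;
            # anything before the first marker is silently skipped
            groups[-1].extend(float(v) for v in line.split(","))
    return groups


sample = "z\n-0.01,0.04\nz\n0.1,-0.02,0.1\nz\n0.05\n"
print(split_on_marker(sample.splitlines()))
# [[-0.01, 0.04], [0.1, -0.02, 0.1], [0.05]]
```

With the real file you can pass the open file object directly, e.g. inside `with open("document.txt") as f:` call `l1, l2, l3 = split_on_marker(f)`; the unpacking only succeeds if the file holds exactly three groups.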

Related

How to put a group of integers in a row in a text file into a list?

I have a text file composed mostly of numbers, something like this:
3  011236547892X
9  02321489764 Q
4  031246547873B
I would like to extract each of the following (characters 5 to 14, counting from zero) into a list:
1236547892
321489764
1246547873
(Please note: each "number" is 10 "characters" long - the second row has a space at the end.)
and then perform analysis on the contents of each list.
I have tried umpteen versions, but I think I am closest with:
with open('k_d_m.txt') as f:
    for line in f:
        range = line.split()
        num_lst = [x for x in range(3,10)]
        print(num_lst)
However, I get: TypeError: 'list' object is not callable
What is the best way forward?
What I want to do with num_lst is, amongst other things, as follows:
num_lst = list(map(int, str(num)))
print(num_lst)
nth = 2
odd_total = sum(num_lst[0::nth])
even_total = sum(num_lst[1::nth])
print(odd_total)
print(even_total)
if odd_total - even_total == 0 or odd_total - even_total == 11:
    print("The number is ok")
else:
    print("The number is not ok")
Use a simple slice:
with open('k_d_m.txt') as f:
    num_lst = [x[5:15] for x in f]
Response to comment:
with open('k_d_m.txt') as f:
    for line in f:
        num_lst = list(line[5:15])
        print(num_lst)
First of all, you shouldn't name your variable range: that shadows the built-in range() function, which is why range(3,10) then raises TypeError: 'list' object is not callable. You can easily get characters 5 to 14 of a string using string[5:15]. Try this:
num_lst = []
with open('k_d_m.txt') as f:
    for line in f:
        num_lst.append(line[5:15])
print(num_lst)
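To go from the slice to the analysis described in the question, one option is to strip each field and run the question's digit-sum check on it. This is a sketch assuming the field always sits at positions 5 to 14 and contains only digits once surrounding whitespace is stripped; extract_fields and is_ok are hypothetical names. The condition (a difference of exactly 0 or 11) is copied from the question as-is.

```python
def extract_fields(lines, start=5, stop=15):
    """Slice the fixed-width field (positions start..stop-1) out of each line.

    strip() drops the trailing space/newline, e.g. on the second sample row.
    """
    return [line[start:stop].strip() for line in lines]


def is_ok(field):
    """Apply the question's check: alternating digit sums differ by 0 or 11."""
    digits = [int(ch) for ch in field]
    odd_total = sum(digits[0::2])   # 1st, 3rd, 5th, ... digits
    even_total = sum(digits[1::2])  # 2nd, 4th, 6th, ... digits
    diff = odd_total - even_total
    return diff == 0 or diff == 11


rows = ["3  011236547892X\n", "9  02321489764 Q\n", "4  031246547873B\n"]
for field in extract_fields(rows):
    print(field, "ok" if is_ok(field) else "not ok")
```

Stripping before converting matters here: int(" ") would raise ValueError on the second row, whose field ends in a space.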

Extracted float values are stored in a list of lists instead of a list of values

I am doing an exercise that finds all the floating-point values in a text file and computes their average.
I have managed to extract all the necessary values, but they are being stored in a list of lists and I don't know how to extract them as floats in order to do the calculations.
Here is my code:
import re
fname = input("Enter file name: ")
fhandle = open(fname)
x = []
count = 0
for line in fhandle:
    if not line.startswith("X-DSPAM-Confidence:"): continue
    s = re.findall(r"[-+]?\d*\.\d+|\d+", line)
    x.append(s)
    count = count + 1
print(x)
print("Done")
And this is the output of x:
[['0.8475'], ['0.6178'], ['0.6961'], ['0.7565'], ['0.7626'], ['0.7556'], ['0.7002'], ['0.7615'], ['0.7601'], ['0.7605'], ['0.6959'], ['0.7606'], ['0.7559'], ['0.7605'], ['0.6932'], ['0.7558'], ['0.6526'], ['0.6948'], ['0.6528'], ['0.7002'], ['0.7554'], ['0.6956'], ['0.6959'], ['0.7556'], ['0.9846'], ['0.8509'], ['0.9907']]
Done
You can make x a flat list of floats from the start:
# ...
for line in fhandle:
    # ...
    s = re.findall(r"[-+]?\d*\.\d+|\d+", line)
    x.extend(map(float, s))
Note that re.findall returns a list, so we extend x by it while applying float to all the strings in it.
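Putting the flat list together with the average the exercise asks for, a minimal sketch might look like this. It assumes the values of interest follow the "X-DSPAM-Confidence:" prefix, as in the question; average_confidence is a hypothetical name. Searching only the text after the prefix keeps any digits in the prefix itself out of the results.

```python
import re

# Same pattern as in the question: optional sign, then a decimal or an integer.
NUMBER_RE = re.compile(r"[-+]?\d*\.\d+|\d+")


def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Return the mean of the numbers on lines starting with prefix, or None."""
    values = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        # findall returns a list of strings; extend keeps `values` flat
        values.extend(map(float, NUMBER_RE.findall(line[len(prefix):])))
    if not values:
        return None  # avoid dividing by zero when nothing matched
    return sum(values) / len(values)
```

Usage: inside `with open(fname) as fhandle:` call `print(average_confidence(fhandle))`; the file object is iterated line by line, just like the loop in the question.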

Combining lines using regex in Python

I want to combine six lines (each containing 3 elements) into a single line with three elements, where the first is the sum of all the first elements, the second is the sum of all the second elements, and the third is the concatenation of all the third elements.
For example, given:
12.34 -79 x
-3.5 23 y
32.2E2 2 z
4.23e-10 +45 x
62E+2 -4 y
0.0 0 z
and we need
9428.84 -13 xyzxyz
Here is my current code:
import re  # imports the regular expressions module

f = open('data.txt', 'r')  # opens the file
# lines = f.readlines()
lines = list(f)  # reads all the lines of the file
p = re.compile(r'\s*^([-]?([1-9]\d|\d)[E|e]?[+\d]?(.)(\d+(E|e)[-]?\d+|\d+))\s*([-,+]?([1-9]\d+|\d))\s*([x|y|z])$')
for x in lines:
    m = p.match(x)
    if m:
        print(x)
You can do this by transposing the file's rows with zip(*...), so that all numbers of the first column end up in the first tuple, all numbers of the second column in the second tuple, and all characters in the third. Then you simply sum the first two tuples and join the third one, which contains the characters:
sum1 = 0
sum2 = 0
finalStr = ""
with open("data.txt", "r") as infile:
    lines = list(zip(*[line.split() for line in list(infile)]))
    sum1 = sum(map(float, lines[0]))
    sum2 = sum(map(float, lines[1]))
    finalStr = "".join(lines[2])
# Some formatting for float numbers
print("{:.2f}".format(sum1), end=" ")
print("{:.0f}".format(sum2), end=" ")
print(finalStr)
Output:
9428.84 -13 xyzxyz
There is no need for a regex in your case. Regular expressions are used to deconstruct strings, not to combine them. If you do not mind using pandas, the solution takes two lines:
import pandas as pd
df = pd.read_table("data.txt", sep=r'\s+', header=None)
df.sum().values.tolist()
#[9428.840000000422, -13, 'xyzxyz']
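Without pandas, the zip answer can be wrapped in a small function. In this sketch (combine_lines is a hypothetical name) the second column is parsed with int() rather than float(), so the total prints as -13 without a formatting trick, and math.fsum is used to reduce accumulated rounding error in the float column. It assumes every non-blank line has exactly three whitespace-separated fields and that there is at least one such line.

```python
import math


def combine_lines(lines):
    """Sum column 1 as floats, column 2 as ints, and concatenate column 3."""
    rows = [line.split() for line in lines if line.strip()]
    firsts, seconds, thirds = zip(*rows)  # transpose rows into columns
    total1 = math.fsum(float(v) for v in firsts)  # fsum limits rounding error
    total2 = sum(int(v) for v in seconds)         # int() accepts "+45" and "-4"
    return total1, total2, "".join(thirds)


data = """12.34 -79 x
-3.5 23 y
32.2E2 2 z
4.23e-10 +45 x
62E+2 -4 y
0.0 0 z"""
t1, t2, t3 = combine_lines(data.splitlines())
print(f"{t1:.2f} {t2} {t3}")  # 9428.84 -13 xyzxyz
```

float() already understands exponent forms such as 32.2E2 and 62E+2, which is the part the regex in the question was struggling to match.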

Read file line by line and create lists

I have a file which contains lines of the form
2.484 5.234
6.123 1.461
1.400 9.381
I would like to read these into Python lists: x containing the first value of each line and y containing the second value of each line.
How can I achieve this? Here is my attempt:
x = []
y = []
with open(filename) as file_:
    for line in file_:
        a, b = line
        x.append(a)
        y.append(b)
a, b = line
cannot work because you're trying to unpack a string into 2 variables, which only works if the string is exactly 2 characters long (it isn't here).
You want to split the line, convert each part to float and unpack the result, like this:
a, b = map(float, line.split())
split() without arguments splits on any run of whitespace (spaces, tabs, newlines), much like awk does, so this is pretty easy.
You can try this (in Python 3, map returns an iterator, so wrap it in list() to get actual values):
with open('filename.txt') as f:
    data = [list(map(float, line.split())) for line in f]
You can do it like this:
x = []
y = []
with open("file.txt", "r") as ins:
    for line in ins:
        elt = line.split()
        # convert the text fields to numbers
        x.append(float(elt[0]))
        y.append(float(elt[1]))
print(x)
print(y)
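The split-and-convert idea can also produce both lists at once by transposing the parsed rows with zip(*...). This is a sketch assuming every non-blank line holds exactly two numbers; read_columns is a hypothetical name.

```python
def read_columns(lines):
    """Return (x, y) lists of floats from lines holding two numbers each."""
    # split() handles any whitespace; blank lines are skipped
    rows = [tuple(map(float, line.split())) for line in lines if line.strip()]
    if not rows:
        return [], []
    x, y = zip(*rows)  # transpose: first column -> x, second column -> y
    return list(x), list(y)


sample = ["2.484 5.234\n", "6.123 1.461\n", "1.400 9.381\n"]
x, y = read_columns(sample)
print(x)  # [2.484, 6.123, 1.4]
print(y)  # [5.234, 1.461, 9.381]
```

With a real file, pass the open file object in place of sample. A line with more or fewer than two numbers makes the x, y unpacking fail with ValueError, which is a cheap way to notice malformed input.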

How to rearrange numbers from different lines of a text file in python?

So I have a text file consisting of one column; each line consists of two numbers separated by "..":
190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893
I would like to discard the very first and the very last number (in this case, 190 and 9893) and basically move the rest of the numbers one spot forward, like this.
My desired output:
255..337
2799..2801
3733..3734
5020..5234
5530..5683
6459..8238
9191..9306
I hope that makes sense; I'm not sure how to approach this.
lines = """190..255
337..2799
2801..3733"""
values = [int(v) for line in lines.split() for v in line.split('..')]
# values = [190, 255, 337, 2799, 2801, 3733]
pairs = list(zip(values[1:-1:2], values[2:-1:2]))
# pairs = [(255, 337), (2799, 2801)]
out = '\n'.join('%d..%d' % pair for pair in pairs)
# out = "255..337\n2799..2801"
Try this:
with open(filename, 'r') as f:
    lines = f.readlines()
numbers = []
for row in lines:
    # strip the newline so it doesn't end up inside the joined pairs
    numbers.extend(row.strip().split('..'))
# drop the very first and very last number
numbers = numbers[1:-1]
newLines = ['..'.join(numbers[idx:idx+2]) for idx in range(0, len(numbers), 2)]
with open(filename, 'w') as f:
    for line in newLines:
        f.write(line)
        f.write('\n')
Try this:
1. Read all of them into one list, splitting each line into two numbers, so you have one list of all your numbers.
2. Remove the first and last item from your list.
3. Write out your list, two items at a time, with dots in between them.
Here's an example:
a = """190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893"""
a_list = a.replace('..','\n').split()
b_list = a_list[1:-1]
b = ''
# iterate over the trimmed b_list; // keeps the count an integer in Python 3
for i in range(len(b_list) // 2):
    b += '..'.join(b_list[2*i:2*i+2]) + '\n'
Alternatively, pair the second number of each line with the first number of the next line:
temp = []
with open('temp.txt') as ofile:
    for x in ofile:
        temp.append(x.rstrip("\n"))
for x in range(0, len(temp) - 1):
    print(temp[x].split("..")[1] + ".." + temp[x+1].split("..")[0])
Maybe this will help:
def makeColumns(listOfNumbers):
    n = 0
    while n < len(listOfNumbers):
        # sep='' prints "255..337" instead of "255 .. 337"
        print(listOfNumbers[n], '..', listOfNumbers[n + 1], sep='')
        n += 2

def trim(listOfNumbers):
    listOfNumbers.pop(0)  # remove the first number
    listOfNumbers.pop()   # remove the last number

listOfNumbers = [190, 255, 337, 2799, 2801, 3733, 3734, 5020, 5234, 5530, 5683, 6459, 8238, 9191, 9306, 9893]
makeColumns(listOfNumbers)
print()
trim(listOfNumbers)
makeColumns(listOfNumbers)
I think this might be useful too. I am reading data from a file named list.
value = []
with open("list", "r") as data:
    for line in data:
        temp = line.split("..")
        value.append(temp[0])
        value.append(temp[1])
for i in range(1, len(value) - 1, 2):
    print(value[i].strip() + ".." + value[i+1])
print(value)
After reading the data, I split each line and store the parts in the temporary list temp. After that, I copy them to the main list value, which holds all of the data. Then I iterate from the second element to the second-last element, two at a time, to get the output of interest. strip() is used to remove the trailing '\n' character from the value.
You can later write these values to a file instead of printing them out.
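The answers above all follow the same three steps (flatten, trim both ends, re-pair). A compact sketch of that, assuming every non-blank line has the form "a..b" with integer values, is shown below; shift_ranges is a hypothetical name. Stripping each line first avoids the stray newline that otherwise ends up inside the re-joined pairs.

```python
def shift_ranges(lines, sep=".."):
    """Drop the first and last number and re-pair the rest (hypothetical helper)."""
    numbers = []
    for line in lines:
        line = line.strip()  # remove the newline so it can't leak into pairs
        if line:
            numbers.extend(int(v) for v in line.split(sep))
    inner = numbers[1:-1]  # discard the very first and very last number
    # pair 1st with 2nd, 3rd with 4th, ... of the remaining numbers
    return [f"{a}{sep}{b}" for a, b in zip(inner[0::2], inner[1::2])]


text = """190..255
337..2799
2801..3733"""
print("\n".join(shift_ranges(text.splitlines())))
# 255..337
# 2799..2801
```

To write the result to a file instead of printing it, open the output file in "w" mode and write the same "\n".join(...) string followed by a final "\n".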
