How to split numbers individually in python from a text file? - python

This question is similar to others, but what I'm trying to do is a bit different, and I'm also new to Python.
Suppose I have sample.txt:
123
456
321
780
The two groups are separated by white space, but I want them to look like:
>> goal = '123456'
>> start = '456312'
My starting code looks something like this:
with open('input.txt') as f:
    out = f.read().split()
    print(list(map(int, out)))
which results in:
>> [123, 456, 456, 123]
which is different from what I'm trying to get.

One thing you can do is loop through the file line by line, and if the line is empty then start a new string in the result list, otherwise append the line to the last element of the result list:
lst = ['']
with open('input.txt', 'r') as file:
    for line in file:
        line = line.rstrip()
        if len(line) == 0:
            lst.append('')
        else:
            lst[-1] += line
lst
# ['123456', '321780']

Split on \n\n. (If this isn't exactly what you need, then modify to suit!)
>>> inp = '123\n456\n\n321\n780\n'
>>> [int(num.replace('\n', '')) for num in inp.split('\n\n')]
[123456, 321780]
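Another way to sketch the same idea is with itertools.groupby, which groups consecutive non-blank lines together. This assumes, as both answers above do, that the two groups in the file are separated by a blank line; join_blocks is a hypothetical helper name, not something from the question.

```python
from itertools import groupby

def join_blocks(lines):
    """Concatenate consecutive non-blank lines; blank lines separate blocks."""
    stripped = (line.strip() for line in lines)
    # groupby keys every line by whether it has content: runs keyed True are
    # blocks of numbers, runs keyed False are the blank separators (skipped).
    return [''.join(group)
            for has_text, group in groupby(stripped, key=bool)
            if has_text]

# Works on any iterable of lines, including an open file object.
sample = ['123\n', '456\n', '\n', '321\n', '780\n']
print(join_blocks(sample))
```

Because it only needs an iterable of lines, the same function could be called as join_blocks(f) inside a with open(...) as f block.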

Related

How to put a group of integers in a row in a text file into a list?

I have a text file composed mostly of numbers something like this:
3 011236547892X
9 02321489764 Q
4 031246547873B
I would like to extract each of the following (spaces 5 to 14 (counting from zero)) into a list:
1236547892
321489764
1246547873
(Please note: each "number" is 10 "characters" long - the second row has a space at the end.)
and then perform analysis on the contents of each list.
I have umpteen versions, however I think I am closest with:
with open('k_d_m.txt') as f:
    for line in f:
        range = line.split()
        num_lst = [x for x in range(3,10)]
        print(num_lst)
However I have: TypeError: 'list' object is not callable
What is the best way forward?
What I want to do with num_lst is, amongst other things, as follows:
num_lst = list(map(int, str(num)))
print(num_lst)
nth = 2
odd_total = sum(num_lst[0::nth])
even_total = sum(num_lst[1::nth])
print(odd_total)
print(even_total)
if odd_total - even_total == 0 or odd_total - even_total == 11:
    print("The number is ok")
else:
    print("The number is not ok")
Use a simple slice:
with open('k_d_m.txt') as f:
    num_lst = [x[5:15] for x in f]
Response to comment:
with open('k_d_m.txt') as f:
    for line in f:
        num_lst = list(line[5:15])
        print(num_lst)
First of all, you shouldn't name your variable range, because that shadows the built-in range() function: after the assignment, range(3,10) tries to call your list, which is what raises the TypeError. You can easily get characters 5 to 14 of a string using string[5:15]. Try this:
num_lst = []
with open('k_d_m.txt') as f:
    for line in f:
        num_lst.append(line[5:15])
print(num_lst)
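Putting the slice together with the odd/even check from the question might look like the sketch below. It assumes that characters 5 to 14 of every line hold exactly ten digits and returns None for any line where they don't (for example a field containing a space). check_line is a hypothetical name, the sample lines are made up, and the "ok" rule is copied from the question as-is.

```python
def check_line(line):
    """Apply the question's odd/even test; return None if the field isn't ten digits."""
    field = line[5:15]                    # characters 5..14, counting from zero
    if len(field) != 10 or not field.isdecimal():
        return None                       # malformed field: let the caller decide
    digits = [int(ch) for ch in field]
    odd_total = sum(digits[0::2])         # 1st, 3rd, 5th, ... digits
    even_total = sum(digits[1::2])        # 2nd, 4th, 6th, ... digits
    return odd_total - even_total in (0, 11)

# Made-up lines for illustration only.
for line in ['A 00 1111111111', 'B 00 1234567890', 'C 00 12345 7890']:
    print(line[5:15], check_line(line))
```

Returning None rather than raising keeps the loop over the file simple; the caller can count or log the lines that could not be checked.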

How do I compare a value in one line to a value in another line?

I have a file that puts out lines that have two values each. I need to compare the second value in every line to make sure those values are not repeated more than once. I'm very new to coding so any help is appreciated.
My thinking was to turn each line into a list with two items each, and then I could compare the same position from a couple lists.
This is a sample of what my file contains:
20:19:18 -1.234567890
17:16:15 -1.098765432
14:13:12 -1.696969696
11:10:09 -1.696969696
08:07:06 -1.696969696
Here's the code I'm trying to use. Basically I want it to ignore those first two lines and print out the third line, since it gets repeated more than once:
with open('my_file') as txt:
    for line in txt: #this section turns the file into lists
        linelist = '%s' % (line)
        lista = linelist.split(' ')
    n = 1
    for line in lista:
        listn = line[n]
        listo = line[n + 1]
        listp = line[n + 2]
        if listn[1] == listo[1] and listn[1] == listp[1]:
            print line
        else:
            pass
        n += 1
What I want to see is:
14:13:12 -1.696969696
But I keep getting an error on the long if statement of "string index out of range"
You would be a lot better off using a dictionary-type structure. A dictionary lets you quickly check whether a value has already been seen.
Basically, check whether the second value is already a key in your dict. If it is, print the line; otherwise, add the second value as a key for later.
myDict = {}
with open('/home/dmoraine/pylearn/%s' % (file)) as txt:
    for line in txt:
        key = line.split()[1]
        if key in myDict:
            print(line)
        else:
            myDict[key] = None #value doesn't matter
Some simple debugging highlights the functional problem:
with open('my_file.txt') as txt:
    for line in txt: #this section turns the file into lists
        linelist = '%s' % (line)
        lista = linelist.split(' ')
        print(linelist, lista)
    n = 1
    for line in lista:
        print("line", n, ":\t", line)
        listn = line[n]
        listo = line[n + 1]
        listp = line[n + 2]
        print(listn, '|',listo, '|',listp)
        if listn[1] == listo[1] and listn[1] == listp[1]:
            print(line)
        n += 1
Output:
20:19:18 -1.234567890
['20:19:18', '-1.234567890\n']
17:16:15 -1.098765432
['17:16:15', '-1.098765432\n']
14:13:12 -1.696969696
['14:13:12', '-1.696969696\n']
11:10:09 -1.696969696
['11:10:09', '-1.696969696\n']
08:07:06 -1.696969696
['08:07:06', '-1.696969696\n']
line 1 : 08:07:06
8 | : | 0
In short, you've mis-handled the variables. When you get to the second loop, lista is the "words" of the final line; you've read and discarded all of the others. line iterates through these individual words. Your listn/o/p variables are, therefore, individual characters. Thus, there is no such thing as listn[1], and you get an error.
Instead, you need to build some sort of list of the floating-point numbers. For instance, using your top loop as a starting point:
float_list = []
for line in txt: #this section turns the file into lists
    lista = line.split(' ')
    my_float = float(lista[1]) # Convert the second field into a float
    float_list.append(my_float)
Now you need to write code that will find duplicates in float_list. Can you take it from there?
Ended up turning each line into a list, and then making a dictionary of all the lists. Thank you all for your help.
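For anyone landing here later, one possible way to finish the job is sketched below. It assumes that "repeated more than once" means the value appears at least three times, and that only the first line carrying such a value should be reported, which matches the desired output in the question. first_repeated_lines is a hypothetical helper name; values are compared as strings, which sidesteps floating-point equality.

```python
from collections import Counter

def first_repeated_lines(lines, min_count=3):
    """Return the first line for each second-field value seen at least min_count times."""
    # Keep only rows that really have two fields.
    rows = [fields for fields in (line.split() for line in lines) if len(fields) >= 2]
    counts = Counter(fields[1] for fields in rows)
    seen = set()
    result = []
    for fields in rows:
        value = fields[1]
        if counts[value] >= min_count and value not in seen:
            seen.add(value)               # report each repeated value only once
            result.append(' '.join(fields))
    return result

sample = [
    '20:19:18 -1.234567890\n',
    '17:16:15 -1.098765432\n',
    '14:13:12 -1.696969696\n',
    '11:10:09 -1.696969696\n',
    '08:07:06 -1.696969696\n',
]
print(first_repeated_lines(sample))
```

The threshold is a parameter so that min_count=2 can be passed if any duplicate at all should count.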

How To Dynamically Append Some Mark Content to a File Object

I am trying to read a file, collect some lines, batch process them and then post process the result.
Example:
with open('foo') as input:
    line_list = []
    for line in input:
        line_list.append(line)
        if len(line_list) == 10:
            result = batch_process(line_list)
            # something to do with result here
            line_list = []
    if len(line_list) > 0: # the total number of lines is very probably not a multiple of 10, e.g. 11
        result = batch_process(line_list)
        # something to do with result here
I do not want to duplicate the batch call and the post-processing, so I would like to know whether I could dynamically add some content to input, e.g.
with open('foo') as input:
    line_list = []
    # input.append("THE END")
    for line in input:
        if line != 'THE END':
            line_list.append(line)
        if len(line_list) == 10 or line == 'THE END':
            result = batch_process(line_list)
            # something to do with result here
            line_list = []
That way I would not have to duplicate the code in the if branch. Or is there a better way to know that I have reached the last line?
If your input is not too large and fits comfortably in memory, you can read everything into a list, slice the list into sub-lists of length 10 and loop over them.
k = 10
with open('foo') as input:
    lines = input.readlines()
slices = [lines[i:i+k] for i in range(0, len(lines), k)]
for slice in slices:
    batch_process(slice)
If you want to append a mark to the input lines, you also have to read all lines first.
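If the file is too large to read in one go, the same batching can be sketched lazily with itertools.islice, which also removes the duplicated tail handling from the question. batches is a hypothetical helper name, and batch_process in the usage comment stands for the asker's own function.

```python
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:          # nothing left to read
            return
        yield chunk

# Small demonstration on a range instead of a file.
for chunk in batches(range(5), 2):
    print(chunk)

# Usage sketch with a file:
# with open('foo') as infile:
#     for line_list in batches(infile, 10):
#         result = batch_process(line_list)   # the asker's own function
```

Only one batch is held in memory at a time, and the final short batch goes through the same code path as the full ones.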

"IndexError: list index out of range" when reading file

Just started learning Python and I'm struggling with this a little.
I'm opening a txt file that will be variable in length and I need to iterate over a user definable amount of lines at a time. When I get to the end of the file I receive the error in the subject field. I've also tried the readlines() function and a couple of variations on the "if" statement that causes the problem. I just can't seem to get the code to find EOF.
Hmm, as I write this, I'm thinking ... do I need to add "EOF" to the array and just look for that? Is that the best solution, to find a custom EOF?
My code snippet goes something like:
### variables defined outside of scapy PacketHandler ##
x = 0
B = 0
##########
with open('dict.txt') as f:
    lines = list(f)
global x
global B
B = B + int(sys.argv[3])
while x <= B:
    while y <= int(sys.argv[2]):
        if lines[x] != "":
            #...do stuff...
            # Scapy send packet Dot11Elt(ID="SSID",info="%s" % (lines[x].strip())
            # ....more code...
            x = x + 1
Let's say you need to read X lines at a time, put them in a list and process it:
with open('dict.txt') as f:
    enoughLines = True
    while enoughLines:
        lines = []
        for i in range(X):
            l = f.readline()
            if l != '':
                lines.append( l )
            else:
                enoughLines = False
                break
        if enoughLines:
            pass #Do what has to be done with the list "lines"
        else:
            break
    #Do what needs to be done with the list "lines" that has fewer than X lines in it
Try a for ... in loop. You have already created your list, so now iterate through it.
with open('dict.txt') as f:
    lines = list(f)
for item in lines: #each item here is an item in the list you created
    print(item)
This way you go through each line of your text file and don't have to worry about where it ends.
Edit: you can also iterate over the file object directly:
with open('dict.txt') as f:
    for row in f:
        print(row)
The following function is a generator that yields the next n lines of a file at a time:
def iter_n(obj, n):
    iterator = iter(obj)
    while True:
        result = []
        try:
            while len(result) < n:
                result.append(next(iterator))
        except StopIteration:
            if len(result) == 0:
                return  # end the generator; re-raising StopIteration here is a RuntimeError since Python 3.7
        yield result
Here is how you can use it:
>>> with open('test.txt') as f:
...     for three_lines in iter_n(f, 3):
...         print(three_lines)
...
['first line\n', 'second line\n', 'third line\n']
['fourth line\n', 'fifth line\n', 'sixth line\n']
['seventh line\n']
Contents of test.txt:
first line
second line
third line
fourth line
fifth line
sixth line
seventh line
Note that, because the file does not have a multiple of 3 lines, the last value returned is not 3 lines, but just the rest of the file.
Because this solution uses a generator, it doesn't require that the full file be read into memory (into a list), but iterates over it as needed.
In fact, the above function can iterate over any iterable object, like lists, strings, etc:
>>> for three_numbers in iter_n([1, 2, 3, 4, 5, 6, 7], 3):
...     print(three_numbers)
...
[1, 2, 3]
[4, 5, 6]
[7]
>>> for three_chars in iter_n("1234567", 3):
...     print(three_chars)
...
['1', '2', '3']
['4', '5', '6']
['7']
If you want to get n lines at a time as a list, use itertools.islice and yield each list:
from itertools import islice
def yield_lists(f,n):
    with open(f) as f:
        for sli in iter(lambda : list(islice(f,n)),[]):
            yield sli
If you prefer explicit loops, you don't need a while loop at all. Start each batch with the current line, then use an inner loop over range(n-1) that calls next on the file object with a default value of an empty string. If you get an empty string, break out of the inner loop; otherwise append the line. Then yield each list:
def yield_lists(f,n):
    with open(f) as f:
        for line in f:
            temp = [line]
            for i in range(n-1):
                line = next(f,"")
                if not line:
                    break
                temp.append(line)
            yield temp

How to rearrange numbers from different lines of a text file in python?

So I have a text file consisting of one column; each row consists of two numbers:
190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893
I would like to discard the very first and the very last number (in this case, 190 and 9893)
and basically move the rest of the numbers one spot forward, like this.
My desired output:
255..337
2799..2801
3733..3734
5020..5234
5530..5683
6459..8238
9191..9306
I hope that makes sense. I'm not sure how to approach this.
lines = """190..255
337..2799
2801..3733"""
values = [int(v) for line in lines.split() for v in line.split('..')]
# values = [190, 255, 337, 2799, 2801, 3733]
pairs = zip(values[1:-1:2], values[2:-1:2])
# pairs = [(255, 337), (2799, 2801)]
out = '\n'.join('%d..%d' % pair for pair in pairs)
# out = "255..337\n2799..2801"
Try this:
with open(filename, 'r') as f:
    lines = f.readlines()
numbers = []
for row in lines:
    numbers.extend(row.strip().split('..'))  # strip the trailing newline before splitting
numbers = numbers[1:len(numbers)-1]
newLines = ['..'.join(numbers[idx:idx+2]) for idx in range(0, len(numbers), 2)]
with open(filename, 'w') as f:
    for line in newLines:
        f.write(line)
        f.write('\n')
Try this:
1. Read all of the lines and split each one into its two numbers, so that you have one list of all your numbers.
2. Remove the first and last item from your list.
3. Write out your list, two items at a time, with dots in between them.
Here's an example:
a = """190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893"""
a_list = a.replace('..','\n').split()
b_list = a_list[1:-1]
b = ''
for i in range(len(b_list)//2):  # one iteration per remaining pair
    b += '..'.join(b_list[2*i:2*i+2]) + '\n'
temp = []
with open('temp.txt') as ofile:
    for x in ofile:
        temp.append(x.rstrip("\n"))
# join the second number of each row to the first number of the next row
for x in range(0, len(temp) - 1):
    print(temp[x].split("..")[1] +".."+ temp[x+1].split("..")[0])
Maybe this will help:
def makeColumns(listOfNumbers):
    n = 0
    while n < len(listOfNumbers):
        print(listOfNumbers[n], '..', listOfNumbers[n+1], sep='')
        n += 2
def trim(listOfNumbers):
    listOfNumbers.pop(0)
    listOfNumbers.pop(len(listOfNumbers) - 1)
listOfNumbers = [190, 255, 337, 2799, 2801, 3733, 3734, 5020, 5234, 5530, 5683, 6459, 8238, 9191, 9306, 9893]
makeColumns(listOfNumbers)
print()
trim(listOfNumbers)
makeColumns(listOfNumbers)
I think this might be useful too. I am reading data from a file named list.
data = open("list","r")
temp = []
value = []
print(data)
for line in data:
    temp = line.split("..")
    value.append(temp[0])
    value.append(temp[1])
for i in range(1,(len(value)-1),2):
    print(value[i].strip()+".."+value[i+1])
print(value)
After reading the data, I split each line and store the pieces in the temporary list. I then copy them to the main list, value, which holds all of the data. Finally, I iterate from the second element to the second-to-last element to get the output of interest. The strip function is used to remove the '\n' character from the value.
You can later write these values to a file instead of printing them out.
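Since each output line is just the second number of one row joined to the first number of the next row, the whole transformation can also be sketched by pairing consecutive rows with zip. shift_ranges is a hypothetical name, and the sketch assumes every non-blank row has the form a..b.

```python
def shift_ranges(lines):
    """Pair the end of each row with the start of the following row."""
    rows = [line.strip().split('..') for line in lines if line.strip()]
    # zip(rows, rows[1:]) walks over (row 1, row 2), (row 2, row 3), ...
    return ['%s..%s' % (current[1], following[0])
            for current, following in zip(rows, rows[1:])]

sample = ['190..255\n', '337..2799\n', '2801..3733\n']
print('\n'.join(shift_ranges(sample)))
```

Dropping the first and last numbers happens implicitly here: the first row only contributes its second number and the last row only its first.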
