Loop through fixed number of chars in text file - python

Let's say that I have a file.txt with consecutive characters (no spaces nor newlines), like this:
ABCDHELOABCDFOOOABCD
And I want to loop through the file, iterating through fixed amounts of 4 characters, like this:
[ABCD, HELO, ABCD, FOOO, ABCD]
A regular loop won't do: how can I achieve this?

You can read four characters from the file at a time, by using TextIOWrapper.read's optional size parameter. Here I'm using Python 3.8's "walrus" operator, but it's not strictly required:
with open("file.txt", "r") as file:
while chunk := file.read(4):
print(chunk)
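On Python versions before 3.8, where the walrus operator is unavailable, a minimal equivalent sketch would be:
with open("file.txt", "r") as file:
    while True:
        chunk = file.read(4)
        if not chunk:  # an empty string signals end of file
            break
        print(chunk)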

A simple loop like this would work. Not very pythonic, but it gets the job done:
s = 'ABCDHELOABCDFOOOABCD'
for i in range(0, len(s), 4):
    print(s[i:i+4])

There is a built-in textwrap module with a wrap function, so you can accomplish this without an explicit loop:
import textwrap

with open('file.txt', 'r') as f:
    chunked = textwrap.wrap(f.read(), 4)

# chunked -> ['ABCD', 'HELO', 'ABCD', 'FOOO', 'ABCD']

Assuming that you've read the input of your file and converted the entire chunk into a single string called data, you can slice it into pieces like so:
individual_strings = [data[i:i+4] for i in range(0, len(data), 4)]
This gives you a list of four-character strings as required, which you can then loop over!

Try this:
with open('file.txt', 'r') as f:
    content = f.read()
    split_by_four_letters = [content[i:i+4] for i in range(0, len(content), 4)]
    # do whatever you want with your data here

Related

How to define/ingest a hard-coded multi-line list without using quotes

I have a script that operates on elements of a list
The list is hard-coded at the top of the script and is edited periodically
When adding/removing items it would be ideal not to have to "quote" each item (especially since the users who edit it may insert entries that contain quotes and would need escaping)
i.e. right now the list is defined as:
blah = [
    'banana1',
    'banana2',
    'banana3'
]
If I wanted to add ban'ana4 then it would look like:
blah = [
    'banana1',
    'banana2',
    'banana3',
    'ban\'ana4'
]
Is there a more elegant way to do this other than making it a multi-line text string and then splitting on linebreaks?
I agree with @snakecharmerb's suggestion. It's less error-prone to store string values in a text file and load them whenever you run your Python program. For example, suppose you store the list items in the text file "test.txt":
test.txt
banana1
banana2
banana3
ban'ana4
Then you can load the list of strings into your program by reading the content in the "test.txt" file:
FILENAME = 'test.txt'

blah = []
with open(FILENAME) as f:
    for line in f:
        # cut off newline characters \n and \r
        l = line.split('\n')[0].split('\r')[0]
        blah.append(l)
Unless it is absolutely necessary to keep the list in the script file, I would read the data from a text file instead. This way any quoting is handled automatically by Python, and there is no risk of typos corrupting the script.
This wouldn't work if some elements of the list are not strings, but that doesn't seem likely in your case.
with open('somefile.txt') as f:
    mylist = [line.strip() for line in f]

# do stuff with the list
You can use split to avoid quoting; this is a very common idiom in Python. Here is an example from the REPL:
>>> '''foo
... bar'tar
... zar'''.split()
['foo', "bar'tar", 'zar']
Just note that line breaks are whitespace here, so split() just works. You will need to remove the indentation of those lines, which leads to another problem; this can be fixed by stripping the leading spaces after splitting, with something like the below:
import re
from operator import truth

class R:
    def __rsub__(self, string):
        return list(filter(truth, re.split(r'\n\s*', string)))

R = R()

def foo():
    s = '''
    foo
    bar
    tar'zar''' - R
    print(s)

foo()
Just give R a better name :)
You can use double quotes to include a single quote in the string or triple quotes to contain a mix of the other two:
blah = [
    'banana1',
    'banana2',
    'banana3',
    "ban'ana4",
    """ban'an"a5"""
]

Writing from one file to another

I've been stuck on this Python homework problem for a while now: "Write a complete python program that reads 20 real numbers from a file inner.txt and outputs them in sorted order to a file outter.txt."
Alright, so what I do is:
f=open('inner.txt','r')
n=f.readlines()
n.replace('\n',' ')
n.sort()
x=open('outter.txt','w')
x.write(print(n))
So my thought process is: open the text file, n is the list of lines read from it, I replace all the newline characters in it so it can be properly sorted, then I open the text file I want to write to and print the list to it. The first problem is it won't let me replace the newlines, and the second problem is I can't write a list to a file.
I just tried this:
>>> x= "34\n"
>>> print(int(x))
34
So, you shouldn't have to filter out the "\n" like that, but can just put it into int() to convert it into an integer. This is assuming you have one number per line and they're all integers.
You then need to store each value into a list. A list has a .sort() method you can use to then sort the list.
EDIT: Forgot to mention, as others have already said, you need to iterate over the values in n, as it's a list, not a single item.
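Putting those pieces together, a minimal sketch (assuming, as above, one integer per line in inner.txt):
numbers = []
with open('inner.txt') as f:
    for line in f:
        numbers.append(int(line))  # int() tolerates the trailing "\n"

numbers.sort()

with open('outter.txt', 'w') as out:
    for number in numbers:
        out.write(str(number) + '\n')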
Here's a step by step solution that fixes the issues you have :)
Opening the file, nothing wrong here.
f=open('inner.txt','r')
Don't forget to close the file:
f.close()
n is now a list of each line:
n=f.readlines()
There is no list.replace method, so I suggest changing the above line to n = f.read(). Then this will work (don't forget to reassign n, as strings are immutable):
n = n.replace('\n','')
You still only have a string full of numbers. However, instead of replacing the newline character, I suggest splitting the string using the newline as a delimiter:
n = n.split('\n')
Then, convert these strings to integers:
n = [int(x) for x in n]
Now, these two will work:
n.sort()
x=open('outter.txt','w')
You want to write the numbers themselves, so use this:
x.write('\n'.join(str(i) for i in n))
Finally, close the file:
x.close()
Using a context manager (the with statement) is good practice as well when handling files:
with open('inner.txt', 'r') as f:
    ...  # do stuff with f; the file is automatically closed at the end
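For reference, here is a sketch combining all of the steps above (the if x guard skips the empty string that a trailing newline would produce):
with open('inner.txt', 'r') as f:
    n = f.read()

n = [int(x) for x in n.split('\n') if x]
n.sort()

with open('outter.txt', 'w') as x:
    x.write('\n'.join(str(i) for i in n))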
I guess real means float, so you have to convert your values to float to sort properly:
raw_lines = f.readlines()
floats = map(float, raw_lines)
Then you have to sort it:
sorted_floats = sorted(floats)
To write the result back, you have to convert to strings and join with line endings:
sorted_as_string = map(str, sorted_floats)
result = '\n'.join(sorted_as_string)
Finally, you have to write result to the destination file.
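For example, using the output filename from the question:
with open('outter.txt', 'w') as destination:
    destination.write(result)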
OK, let's look step by step at what you want to do.
First: Read some integers out of a textfile.
Pythonic Version:
fileNumbers = [int(line) for line in open(r'inner.txt', 'r').readlines()]
Easy to get version:
fileNumbers = list()
with open(r'inner.txt', 'r') as fh:
    for singleLine in fh.readlines():
        fileNumbers.append(int(singleLine))
What it does:
Open the file
Read each line, convert it to int (because readlines returns string values) and append it to the list fileNumbers
Second: Sort the list
fileNumbers.sort()
What it does:
The sort function sorts the list by its values, e.g. [5,3,2,4,1] -> [1,2,3,4,5]
Third: Write it to a new textfile
with open(r'outter.txt', 'a') as fh:
    [fh.write('{0}\n'.format(str(entry))) for entry in fileNumbers]

Python Regular Expression loop

I have this code which will look for certain things in a file. The file looks like this:
name;lastname;job;5465465
name2;lastname2;job2;5465465
name3;lastname3;job3;5465465
This is the python code:
import re
import sys
filehandle = open('somefile.csv', 'r')
text = filehandle.read()
b = re.search("([a-zA-Z]+);([a-z\sA-Z]+);([a-zA-Z]*);([0-9^-]+)\n?",text)
print (b.group(2),b.group(1),b.group(3),b.group(4))
Now it will only print:
lastname;name;job;5465465
It's supposed to print the lastname first, so I did that with groups. Now I need a loop to print all lines like this:
lastname;name;job;5465465
lastname2;name2;job2;5465465
lastname3;name3;job3;5465465
I tried all kinds of loops but it doesn't go through the whole file... how do I do this?
It must be done with the re module. I know it's easy with the csv module ;)
You need to process the file line by line.
import re
import sys

with open('somefile.csv', 'r') as filehandle:
    for text in filehandle:
        b = re.search("([a-zA-Z]+);([a-z\sA-Z]+);([a-zA-Z]*);([0-9^-]+)\n?", text)
        print(b.group(2), b.group(1), b.group(3), b.group(4))
Your file has nicely semi-colon separated values, so it would be easier to just use split or the csv library as has been suggested.
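For comparison, a sketch of the same loop using split instead of re (the variable names just mirror the file layout from the question):
with open('somefile.csv', 'r') as filehandle:
    for text in filehandle:
        # each line is name;lastname;job;number
        name, lastname, job, number = text.rstrip('\n').split(';')
        print(lastname, name, job, number)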
No need for re, but a good job for csv:
import csv

with open('somefile.csv', 'r') as f:
    for rec in csv.reader(f, delimiter=';'):
        print(rec[1], rec[0], rec[2], rec[3])
You can use re if you want to check the validity of individual elements (valid phone number, no numbers in name, capitalized names, etc.).
The fault is not with the loops, but rather with your regex / capture group patterns. The class [a-zA-Z]+ will not match "lastname3" or "lastname2". This sample works:
import re
import sys

for line in open('somefile.csv', 'r'):
    b = re.search("(\w+);(\w+);(\w*);([0-9^-]+)\n?", line)
    if b:
        print "%s;%s;%s;%s" % (b.group(2), b.group(1), b.group(3), b.group(4))
Seems as if you just want to reorder what you have, in which case I don't know whether regex are needed. I believe the following might be of use:
reorder = operator.itemgetter(1, 0, 2, 3)
http://docs.python.org/library/operator.html
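For example, combined with split, a sketch could look like this (remember to import operator first):
import operator

reorder = operator.itemgetter(1, 0, 2, 3)

with open('somefile.csv') as f:
    for line in f:
        fields = line.rstrip('\n').split(';')
        print(';'.join(reorder(fields)))  # lastname;name;job;number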

Trouble sorting a list with python

I'm somewhat new to Python. I'm trying to sort through a list of strings and integers. The list contains some symbols that need to be filtered out (i.e. ro!ad should end up road). Also, they are all on one line separated by a space. So I need to use 2 arguments; one for the input file and then the output file. It should be sorted with numbers first and then the words without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
    print "Usage: ", sys.argv[0], "infile outfile"; sys.exit(1)

ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
You may not even need the key function any more since numbers are placed before letters by default. I don't see anything in your code to remove special characters though.
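A sketch of those two fixes combined, with one possible way (re.sub, as used in the other answers) to strip the special characters, reusing infilename from your code:
import re

ifile = open(infilename, 'r')
data = ifile.readlines()[0].split()  # the single line, split into items
ifile.close()

data = [re.sub(r'[^A-Za-z0-9]', '', item) for item in data]  # e.g. 'ro!ad' -> 'road'
r = sorted(data)  # digit-leading items sort before letters
print('\n'.join(r))
Note that this sorts the number strings lexicographically; the answer below shows how to force a numeric sort with a key function.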
Since they are on the same line you don't really need readlines:
with open('some.txt') as f:
    data = f.read()  # now data = "item 1 item2 etc..."
you can use re to filter out unwanted characters
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
However, you will need to do more to force the numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers, key=lambda x: int(re.sub("[^0-9]", "", x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
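A sketch of that approach, reusing infilename from your code:
with open(infilename) as ifile:
    words = ifile.read().split()  # split() with no arguments splits on any whitespace

r = sorted(words)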

How to read, in a line, all characters from column A to B

Is it possible in Python, given a file with 10000 lines, where all of them have this structure:
1, 2, xvfrt ert5a fsfs4 df f fdfd56 , 234
or similar, to read the whole string, and then to store in another string all characters from column 7 to column 17, including spaces, so the new string would be
"xvfrt ert5a" ?
Thanks a lot
lst = [line[6:17] for line in open(fname)]
another_list = []
for line in f:
    another_list.append(line[6:17])
Or as a generator (a memory friendly solution):
another_list = (line[6:17] for line in f)
I'm going to take Michael Dillon's answer a little further. If by "columns 6 through 17" you mean "the first 11 characters of the third comma-separated field", this is a good opportunity to use the csv module. Also, for Python 2.6 and above it's considered best practice to use the 'with' statement when opening files. Behold:
import csv
with open(filepath, 'rt') as f:
    lst = [row[2][:11] for row in csv.reader(f)]
This will preserve leading whitespace; if you don't want that, change the last line to
lst = [row[2].lstrip()[:11] for row in csv.reader(f)]
This technically answers the direct question:
lst = [line[6:17] for line in open(fname)]
but there is a fatal flaw. It is OK for throwaway code, but that data looks suspiciously like comma separated values, and the third field may even be space delimited chunks of data. Far better to do it like this so that if the first two columns sprout an extra digit, it will still work:
lst = [x[2].strip()[0:11] for x in [line.split(',') for line in open(fname)]]
And if those space delimited chunks might get longer, then this:
lst = [x[2].strip().split()[0:2] for x in [line.split(',') for line in open(fname)]]
Don't forget a comment or two to explain what is going on. Perhaps:
# on each line, get the 3rd comma-delimited field and break out the
# first two space-separated chunks of the licence key
Assuming, of course, that those are licence keys. No need to be too abstract in comments.
You don't say how you want to store the data from each of the 10,000 lines -- if you want them in a list, you would do something like this:
my_list = []
for line in open(filename):
    my_list.append(line[7:18])
for l in open("myfile.txt"):
c7_17 = l[6:17]
# Not sure what you want to do with c7_17 here, but go for it!
This function will compute the string that you want and print it out:
def readCols(filepath):
    f = open(filepath, 'r')
    for line in f:
        newString = line[6:17]
        print newString
