Replacing characters in a string loop - python

I have a template txt file. This txt file is to be written as 10 new files but each with some characters changed according to a list of arbitrary values:
with open('template.txt') as template_file:
    template = template_file.readlines()
for i in range(10):
    with open('output_%s.txt' % i, 'w') as new_file:
        new_file.writelines(template_file)
The length of the list is the same as the number of new files (10).
I am trying to replace part of the 2nd line of each new file with the value in my list.
So for example, I want line 2, positions [5:16] in each new file replaced with the respective value in the list:
File 0 will have element 0 of the list
File 1 will have element 1 of the list
etc..
I tried using the replace() method:
list = [element0, element1, etc...element9]
for i in template_file:
    i.replace(template_file[2][5:16], list_element)
But it only puts the first list element into every file. It won't loop over the list.
Any help appreciated

There are a couple of problems I can find which prevent your code from working:
You should write template out, which is a list of lines, not template_file, which is a file object
In Python, strings are immutable, meaning they cannot be changed. The replace function does not change the string; it returns a new copy of the string. Furthermore, replace will replace a substring with new text wherever that substring occurs. If you want to replace at a specific index, I suggest slicing the string yourself. For example:
line2 = '0123456789ABCDEFG'
element = '-ho-ho-ho-'
line2 = line2[:5] + element + line2[16:]
# line2 now is '01234-ho-ho-ho-G'
Please do not use list as a variable name. It is a type, which can be used to construct a new list as such:
empty = list() # ==> []
letters = list('abc') # ==> ['a', 'b', 'c']
The expression template_file[2][5:16] is incorrect. First, it should be template, not template_file. Second, the second line is template[1], not template[2], since Python lists are zero-based
The list_element variable is not declared in your code
Solution 1
That being said, I find that it is easier to structure your template file as a real template with placeholders. I'll talk about that later. If you still insist to replace index 5-16 of line 2 with something, here is a solution I tested and it works:
with open('template.txt') as template_file:
    template = template_file.readlines()
elements = ['ABC', 'DEF', 'GHI', 'JKL']
for i, element in enumerate(elements):
    with open('output_%02d.txt' % i, 'w') as out_file:
        line2 = template[1]
        line2 = line2[:5] + element + line2[16:]
        for line_number, line in enumerate(template, 1):
            if line_number == 2:
                line = line2
            out_file.write(line)
Notes
The code writes out all lines, but applies the special replacement to line 2
The code is clunky, nested deeply
I don't like having to hard code the index numbers (5, 16) because if the template changes, I have to change the code as well
Solution 2
If you have control of the template file, I suggest to use the string.Template class to make search and replace easier. Since I don't know what your template file looks like, I am going to make up my own template file:
line #1
This is my ${token} to be replaced
line #3
line #4
Note that I intend to replace ${token} with one of the elements in the code. Now on to the code:
import string
with open('template.txt') as template_file:
    template = string.Template(template_file.read())
elements = ['ABC', 'DEF', 'GHI', 'JKL']
for i, element in enumerate(elements):
    with open('output_%02d.txt' % i, 'w') as out_file:
        out_file.write(template.substitute(token=element))
Notes
I read the whole file in at once with template_file.read(). This could be a problem if the template file is large, but the previous solution has the same problem, since it also reads all lines into memory
I use the string.Template class to make search/replace easier
Search and replace is done by substitute(token=element), which says: replace all the $token or ${token} instances in the template with element.
The code is much cleaner and dare I say, easier to read.
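To make the substitute() behaviour concrete, here is a small sketch that works on an in-memory string instead of a file. The template text mirrors the made-up file above, and the ${other} placeholder is purely hypothetical, added only to show what happens when a value is missing:

```python
from string import Template

# Made-up template text, same shape as the example file above.
template = Template("line #1\nThis is my ${token} to be replaced\nline #3\n")

# substitute() fills in every placeholder and raises KeyError
# if a placeholder has no matching keyword argument.
filled = template.substitute(token="ABC")

# safe_substitute() leaves placeholders it has no value for untouched.
partial = Template("${token} and ${other}").safe_substitute(token="ABC")

print(filled)
print(partial)
```

If the template needs a literal dollar sign, write it as $$ so it is not treated as the start of a placeholder.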
Solution 3
If the template file is too large to fit in memory at once, you can modify the first solution to read it line by line instead of reading all lines in at once. I am not going to present that solution here; it is just a suggestion.
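For readers who want a starting point anyway, below is a rough sketch of that idea, not a tested production solution. It re-reads the template once per output file so only one line is in memory at a time. The function name write_outputs and its parameters (out_dir, start, end, target_line) are my own inventions for illustration; the slice positions 5 and 16 come from the question.

```python
import os


def write_outputs(template_path, elements, out_dir='.',
                  start=5, end=16, target_line=2):
    """Write one output file per element, streaming the template line by line."""
    for i, element in enumerate(elements):
        out_path = os.path.join(out_dir, 'output_%02d.txt' % i)
        # Reopen the template for each output so it is never loaded whole.
        with open(template_path) as template_file, \
                open(out_path, 'w') as out_file:
            for line_number, line in enumerate(template_file, 1):
                if line_number == target_line:
                    # Replace positions [start:end] of the target line only.
                    line = line[:start] + element + line[end:]
                out_file.write(line)
```

The trade-off is that the template is read from disk once per element, which is usually fine for a handful of output files.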

Looks like you need
list = [element0, element1, etc...element9]
for i in list:
    template_file = template_file.replace(template_file[2][5:16], i)


I'm trying this question but I just can't seem to get the code working. My attempt is included after the problem statement.

load_datafile() takes a single string parameter representing the filename of a datafile.
This function must read the content of the file, convert all letters to their lowercase, and store
the result in a string, and finally return that string. I will refer to this string as data throughout
this specification, you may rename it. You must also handle all exceptions in case the datafile
is not available.
Sample output:
data = load_datafile('harry.txt')
print(data)
the hottest day of the summer so far was drawing to a close and a drowsy silence
lay over the large, square houses of privet drive.
load_wordfile() takes a single string argument representing the filename of a wordfile.
This function must read the content of the wordfile and store all words in a one-dimensional
list and return the list. Make sure that the words do not have any additional whitespace or newline character in them. You must also handle all exceptions in case the files are not
available.
Sample outputs:
pos_words = load_wordfile("positivewords.txt")
print(pos_words[2:9])
['abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed',
'acclamation']
neg_words = load_wordfile("negativewords.txt")
print(neg_words[10:19])
['aborts', 'abrade', 'abrasive', 'abrupt', 'abruptly', 'abscond', 'absence',
'absent-minded', 'absentee']
MY CODE BELOW
def load_datafile('harryPotter.txt'):
    data = ""
    with open('harryPotter.txt') as file:
        lines = file.readlines()
        temp = lines[-1].lower()
    return data
Your code has two main problems. The first one is that you are assigning an empty string to the variable data and returning it, so no matter what you do with the contents of the file you always return an empty string. The second one is that file.readlines() returns a list of strings, where each line in the file is an element on the list and you are only converting the last element lines[-1] to lowercase.
To fix your code you should make sure that you store the contents of the file on the data variable and you should apply the lower() function to each line on the file and not just the last one. Something like this:
def load_datafile(file_name):
    data = ''
    with open(file_name) as file:
        lines = file.readlines()
    for line in lines:
        # each line from readlines() already ends with its own newline
        data = data + line.lower()
    return data
The previous example is not the best way of doing this but it's very easy to understand what is happening and I think that is more important when you are starting. To make it more efficient you might want to change it to:
def load_datafile(file_name):
    with open(file_name) as file:
        # lines keep their trailing newline, so join on the empty string
        return ''.join(line.lower() for line in file)
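The assignment also asks for exception handling and for load_wordfile(), which the snippets above leave out. Here is one possible sketch. Two things are my assumptions and not part of the specification: that an unreadable file should produce an empty result ('' or []) along with a printed message, and that the word file holds one word per line.

```python
def load_datafile(file_name):
    """Return the file's content in lowercase, or '' if it cannot be read."""
    try:
        with open(file_name) as file:
            return file.read().lower()
    except OSError as error:
        # Covers a missing file, a directory, permission problems, etc.
        print('Could not read %s: %s' % (file_name, error))
        return ''


def load_wordfile(file_name):
    """Return the words in the file as a list, without surrounding whitespace."""
    try:
        with open(file_name) as file:
            # strip() removes the newline; blank lines are skipped.
            return [line.strip() for line in file if line.strip()]
    except OSError as error:
        print('Could not read %s: %s' % (file_name, error))
        return []
```

Check with whoever set the assignment what should happen on failure; raising a custom error or returning None would be equally reasonable choices.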

How to import a special format as a dictionary in python?

I have text files in the format below, all on a single line:
username:password;username1:password1;username2:password2;
etc.
What I have tried so far is
with open('list.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
but I get an error saying that the length is 1 and 2 is required, which indicates the file is not being read as key:value pairs.
Is there any way to fix this or should I just reformat the file in another way?
thanks for your answers.
What I've got so far is:
with open('tester.txt') as f:
    password_list = dict(x.strip(":").split(";", 1) for x in f)
for user, password in password_list.items():
    print(user + " - " + password)
The result comes out as username:password - username1:password1
What I need is to split username:password so that key = user and value = password
Since variable f in this case is a file object and not a list, the first thing to do would be to get the lines from it. You could use the file.readlines method* (https://docs.python.org/2/library/stdtypes.html?highlight=readline#file.readlines) for this.
Furthermore, I think I would use split with the semicolon (";") as the separator. This will provide you with a list of "username:password" strings, provided your entire file looks like this.
I think you will figure out what to do after that.
EDIT
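* I automatically assumed you use Python 2.7 for some reason. In version 3.x, file objects returned by open() still have a readlines() method. You might also want to look at the "distutils.text_file" (https://docs.python.org/3.7/distutils/apiref.html?highlight=readlines#distutils.text_file.TextFile.readlines) class, though note that distutils has been removed from the standard library as of Python 3.12.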
Load the text of the file in Python with open() and read() as a string
Apply split(';') to that string to create a list like ['username:password', 'username1:password1', 'username2:password2']
Do a dict comprehension where you apply split(':') to each item of the above list to split those pairs.
with open('list.txt', 'rt') as f:
    raw_data = f.readlines()[0].strip()
list_data = raw_data.split(';')
# "if x" skips the empty entry left by the trailing semicolon
user_dict = {x.split(':')[0]: x.split(':')[1] for x in list_data if x}
print(user_dict)
Dictionary comprehension is useful here.
One-liner to pull all the info out of the text file, as requested. Hope your tutor is impressed. Ask him how it works and see what he says. Maybe update your question to include his response.
If you want me to explain, feel free to comment and I shall go into more detail.
The error you're probably getting:
ValueError: dictionary update sequence element #3 has length 1; 2 is required
is because the text line ends with a semicolon. Splitting it on semicolons then results in a list that contains some pairs, and an empty string:
>>> "username:password;username1:password1;username2:password2;".split(";")
['username:password', 'username1:password1', 'username2:password2', '']
Splitting the empty string on colons then results in a single empty string, rather than two strings.
To fix this, filter out the empty string. One example of doing this would be
[element for element in x.split(";") if element != ""]
In general, I recommend you do the work one step at a time and assign to intermediary variables.
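Following that one-step-at-a-time advice, the whole thing could look roughly like the sketch below. The function name parse_credentials is made up for this example. It works on a string, so you would pass it the result of f.read().

```python
def parse_credentials(text):
    """Turn 'user:pass;user1:pass1;' into {'user': 'pass', 'user1': 'pass1'}."""
    # Step 1: drop the trailing newline and split into 'user:pass' entries.
    entries = text.strip().split(';')
    # Step 2: filter out the empty string left by the trailing semicolon.
    pairs = [entry for entry in entries if entry != '']
    # Step 3: split each pair into key and value.
    result = {}
    for pair in pairs:
        # maxsplit=1 keeps any further colons as part of the password.
        user, password = pair.split(':', 1)
        result[user] = password
    return result


line = "username:password;username1:password1;username2:password2;\n"
print(parse_credentials(line))
```

An entry with no colon at all will still raise a ValueError at the unpacking step, which is probably what you want for a malformed file.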
Here's a simple (but long) answer. You need to get the line from the file, and then split it and the items resulting from the split:
results = {}
with open('file.txt') as file:
    for line in file:
        #Only one line, but that's fine
        #strip() removes the trailing newline before splitting
        entries = line.strip().split(';')
        for entry in entries:
            if entry != '':
                #The last item in entries will be blank, due to how split works in this example
                user, password = entry.split(':')
                results[user] = password
Try this.
with open('test.txt') as fh:
    f = fh.read().strip()
data = f.split(";")
d = {}
for i in data:
    if i:
        value = i.split(":")
        d.update({value[0]: value[1]})
print(d)

Having problems with strings and arrays

I want to read a text file and copy text that is in between '~~~~~~~~~~~~~' into an array. However, I'm new in Python and this is as far as I got:
with open("textfile.txt", "r",encoding='utf8') as f:
    searchlines = f.readlines()
    a=[0]
    b=0
    for i,line in enumerate(searchlines):
        if '~~~~~~~~~~~~~' in line:
            b=b+1
        if '~~~~~~~~~~~~~' not in line:
            if 's1mb4d' in line:
                break
            a.insert(b,line)
This is what I envisioned:
First I read all the lines of the text file,
then I declare 'a' as an array in which text should be added,
then I declare 'b' because I need it as an index. The number of lines in between the '~~~~~~~~~~~~~' is not even, that's why I use 'b' so I can put lines of text into one array index until a new '~~~~~~~~~~~~~' was found.
I check for '~~~~~~~~~~~~~', if found I increase 'b' so I can start adding lines of text into a new array index.
The text file ends with 's1mb4d', so once it's found, the program ends.
And if '~~~~~~~~~~~~~' is not found in the line, I add text to the array.
But things didn't go well. Only 1 line of the entire text between those '~~~~~~~~~~~~~' is being copied to each array index.
Here is an example of the text file:
~~~~~~~~~~~~~
Text123asdasd
asdasdjfjfjf
~~~~~~~~~~~~~
123abc
321bca
gjjgfkk
~~~~~~~~~~~~~
You could use a regular expression; give this a try:
import re
input_text = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = []
for line in input_text:
    my_text = re.findall(r'[^\~]+', line)
    if len(my_text) != 0:
        a.append(my_text)
What it does: it reads line by line and looks for all characters other than '~'. If a line consists only of '~', it is ignored; every line with text is appended to your list a.
And just because we can, a one-liner (excluding the import and the source, of course):
import re
lines = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = [re.findall(r'[^\~]+', line) for line in lines if len(re.findall(r'[^\~]+', line)) != 0]
In Python, the solution to a large part of problems is often to find the right function from the standard library that does the job. Here you should try using split instead; it should be way easier.
If I understand your goal correctly, you can do it like this:
joined_lines = ''.join(searchlines)
result = joined_lines.split('~~~~~~~~~~~~~')
The first line joins your list of lines into a single string, and then the second one cuts that big string every time it encounters the '~~~~~~~~~~~~~' sequence.
I tried to clean it up to the best of my knowledge, try this and let me know if it works. We can work together on this!:)
with open("textfile.txt", "r",encoding='utf8') as f:
    searchlines = f.readlines()
a = []
currentline = ''
for i,line in enumerate(searchlines):
    if '~~~~~~~~~~~~~' in line:
        # a separator closes the current block, if it has any text
        if currentline:
            a.append(currentline)
        currentline = ''
    elif 's1mb4d' in line:
        break
    else:
        currentline += line
Some notes:
You can use elif for your break function
Append will automatically add the next iteration to the end of the array
currentline will continue to add text on each line as long as it doesn't have 's1mb4d' or the ~~~ which I think is what you want
import re
s = ['']
with open('path\\to\\sample.txt') as f:
    for l in f:
        a = l.strip().split("\n")
        s += a
a = []
for line in s:
    my_text = re.findall(r'[^\~]+', line)
    if len(my_text) != 0:
        a.append(my_text)
print(a)
>>> [['Text123asdasd asdasdjfjfjf'], ['123abc 321bca gjjgfkk']]
If you're willing to impose/accept the constraint that the separator should be exactly 13 ~ characters (actually '\n%s\n' % ( '~' * 13) to be specific) ...
then you could accomplish this for relatively normal sized files using just
#!/usr/bin/python
## (Should be #!/usr/bin/env python; but StackOverflow's syntax highlighter?)
separator = '\n%s\n' % ('~' * 13)
with open('somefile.txt') as f:
    results = f.read().split(separator)
# Use your results, a list of the strings separated by these separators.
Note that '~' * 13 is a way, in Python, of constructing a string by repeating some smaller string thirteen times. 'xx%sxx' % 'YY' is a way to "interpolate" one string into another. Of course you could just paste the thirteen ~ characters into your source code ... but I would consider constructing the string as shown to make it clear that the length is part of the string's specification --- that this is part of your file format requirements ... and that any other number of ~ characters won't be sufficient.
If you really want any line of any number of ~ characters to serve as a separator, then you'll want to use the .split() function from the regular expressions module (re) rather than the .split() method provided by the built-in string objects.
Note that this snippet of code will return all of the text between your separator lines, including any newlines they include. There are other snippets of code which can filter those out. For example given our previous results:
# ... refine results by filtering out newlines (replacing them with spaces)
results = [' '.join(each.split('\n')) for each in results]
(You could also use the .replace() string method, but I prefer the join/split combination.) In this case we're using a list comprehension (a feature of Python) to iterate over each item in our results (which we're arbitrarily naming each), performing our transformation on it, and binding the resulting list back to the name results. I highly recommend learning and getting comfortable with list comprehensions if you're going to learn Python. They're commonly used and can be a bit exotic compared to the syntax of many other programming and scripting languages.
This should work on MS Windows as well as Unix (and Unix-like) systems because of how Python handles "universal newlines." To use these examples under Python 3 you might have to work a little on the encodings and string types. (I didn't need to for my Python3.6 installed under MacOS X using Homebrew ... but just be forewarned).
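For the case mentioned above, where a line of any number of ~ characters should count as a separator, a sketch with re.split() might look like this. The function name split_blocks and the exact pattern are my own choices and have only been checked against the sample file from the question, so treat them as a starting point.

```python
import re


def split_blocks(text):
    """Split text on lines that consist only of '~' characters."""
    # (?m) lets ^ and $ match at line boundaries; the trailing \n?
    # also consumes the newline that ends the separator line.
    parts = re.split(r'(?m)^~+[ \t]*$\n?', text)
    # Drop the empty pieces before the first and after the last separator.
    return [part for part in parts if part.strip()]


sample = (
    "~~~~~~~~~~~~~\n"
    "Text123asdasd\n"
    "asdasdjfjfjf\n"
    "~~~~~~~~~~~~~\n"
    "123abc\n"
    "321bca\n"
    "gjjgfkk\n"
    "~~~~~~~~~~~~~\n"
)
print(split_blocks(sample))
```

Each block keeps its internal newlines, so the join/split trick shown earlier can still be applied afterwards to flatten them.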

Python: Appending string constructed out of multiple lines to list

I'm trying to parse a txt file and put sentences in a list that fit my criteria.
The text file consists of several thousand lines and I'm looking for lines that start with a specific string, lets call this string 'start'.
The lines in this text file can belong together and are somehow separated with \n at random.
This means I have to look for any string that starts with 'start', put it in an empty string 'complete' and then continue scanning each line after that to see if it also starts with 'start'.
If not then I need to append it to 'complete' because then it is part of the entire sentence. If it does I need to append 'complete' to a list, create a new, empty 'complete' string and start appending to that one. This way I can loop through the entire text file without paying attention to the number of lines a sentence exists of.
My code thus far:
import sys, string
lines_1=[]
startswith = ('keys', 'values', 'files', 'folders', 'total')
completeline = ''
with open (sys.argv[1]) as f:
    data = f.read()
    for line in data:
        if line.lower().startswith(startswith):
            completeline = line
        else:
            completeline += line
lines_1.append(completeline)
# check some stuff in output
for l in lines_1:
    print "______"
    print l
print len(lines_1)
However this puts the entire content in 1 item in the list, where I'd like everything to be separated.
Keep in mind that the lines composing one sentence can span one, two, 10 or 1000 lines so it needs to spot the next startswith value, append the existing completeline to the list and then fill completeline up with the next sentence.
Much obliged!
Two issues:
Iterating over a string, not lines:
When you iterate over a string, the value yielded is a character, not a line. This means for line in data: is going character by character through the string. Split your input by newlines, returning a list, which you then iterate over. e.g. for line in data.split('\n'):
Overwriting the completeline inside the loop
You append a completed line at the end of the loop, but not when you start recording a new line inside the loop. Change the if in the loop to something like this:
if line.lower().startswith(startswith):
    if completeline:
        lines_1.append(completeline)
    completeline = line
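Putting both fixes together, the loop could be wrapped in a function along these lines. This is a sketch: the name group_lines is invented here, and I'm assuming that any lines appearing before the first matching line should be dropped, which the question doesn't spell out.

```python
def group_lines(lines, prefixes):
    """Join each line that starts with a prefix with the lines that follow it."""
    records = []
    current = ''
    for line in lines:
        if line.lower().startswith(prefixes):
            # A new record begins: store the finished one first.
            if current:
                records.append(current)
            current = line
        elif current:
            # Continuation of the record in progress.
            current += '\n' + line
    # Don't forget the record still being built when the input ends.
    if current:
        records.append(current)
    return records


startswith = ('keys', 'values', 'files', 'folders', 'total')
text = "keys: a\nmore a\nValues: b\nfiles: c\nc2\nc3"
print(group_lines(text.split('\n'), startswith))
```

Note that prefixes must be a tuple (or a single string), since that is what str.startswith() accepts.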
For a task like this
"I'm trying to parse a txt file and put sentences in a list that fit my criteria"
I usually prefer using a dictionary, for example
from collections import defaultdict

def satisfiesCriteria(criteria,sentence):
    # lower is a method, so it has to be called
    return sentence.lower().startswith(criteria)

seperatedItems = defaultdict(list)
# fileDataAsAList stands for the list of lines you read from your file
for sentence in fileDataAsAList:
    if satisfiesCriteria("start",sentence):
        seperatedItems["start"].append(sentence)
Something like this should suffice. The code is just to give you an idea of what you might like to do. You can have a list of criteria and loop over them, which will add the sentences related to each criterion into the dictionary, something like this
mycriterias = ['start','begin','whatever']
for criteria in mycriterias:
    for sentence in fileDataAsAList:
        if satisfiesCriteria(criteria ,sentence):
            seperatedItems[criteria ].append(sentence)
mind the spellings :p

Trouble sorting a list with python

I'm somewhat new to Python. I'm trying to sort through a list of strings and integers. The list contains some symbols that need to be filtered out (i.e. ro!ad should end up road). Also, they are all on one line separated by a space. So I need to use 2 arguments: one for the input file and one for the output file. The output should be sorted with numbers first and then the words without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
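You may not even need the key function any more, since strings starting with digits sort before letters by default (though as strings, '10' sorts before '9', so convert to int if you need numeric order). I don't see anything in your code to remove special characters though.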
Since they are all on the same line, you don't really need readlines
with open('some.txt') as f:
    data = f.read() #now data = "item 1 item2 etc..."
you can use re to filter out unwanted characters
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print(sorted(my_list))
however, you will need to do more to force numbers to sort numerically, maybe something like
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
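Pulling the pieces from these answers together, a full version might look something like the sketch below. The function names clean_and_sort and sort_file are made up, and I'm assuming that "special characters" means anything other than ASCII letters and digits, and that only tokens made up entirely of digits count as numbers.

```python
import re


def clean_and_sort(text):
    """Strip special characters from each token; numbers first, then words."""
    # Remove everything except letters and digits from every token.
    tokens = [re.sub(r'[^0-9A-Za-z]', '', token) for token in text.split()]
    # A token made up only of symbols ends up empty, so drop it.
    tokens = [token for token in tokens if token]
    # Numbers are sorted by value, so 9 comes before 10.
    numbers = sorted((t for t in tokens if t.isdigit()), key=int)
    words = sorted(t for t in tokens if not t.isdigit())
    return numbers + words


def sort_file(in_path, out_path):
    """Read in_path, sort its tokens, and write them one per line to out_path."""
    with open(in_path) as infile:
        result = clean_and_sort(infile.read())
    with open(out_path, 'w') as outfile:
        outfile.write('\n'.join(result) + '\n')


print(clean_and_sort("ro!ad 10 app#le 9 2 ba?nana"))
```

To use it from the command line as in the question, you would call sort_file(sys.argv[1], sys.argv[2]) after checking that both arguments were given.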
