I'm somewhat new to Python. I'm trying to sort a list of strings and integers. The list contains some symbols that need to be filtered out (e.g. ro!ad should end up as road). Also, the items are all on one line, separated by spaces. I need to use 2 arguments: one for the input file and one for the output file. The output should be sorted with numbers first and then the words, without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together, as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1]
    #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file, in the same order as the file is written and not sorted at all. The goal is to take a file (arg1.txt), sort it, and write a new file (arg2.txt); both names will be command-line arguments. I used print in this case to speed up the editing, but I need to have it write to a file. That's why the output file parts are commented out, but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
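You may not even need the key function any more, since strings that start with a digit sort before strings that start with a letter by default. Note, though, that the items are still strings and are compared character by character, so "10" sorts before "9"; convert them with int() if you need numeric order. I don't see anything in your code to remove special characters, though.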
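To illustrate the difference between the default string ordering and a numeric ordering, here is a small sketch in Python 3. The sample tokens are made up for the example, and it assumes that a "number" is a token consisting only of digits.

```python
tokens = "road 10 9 apple 100".split()

# Plain string sort: digits come before letters, but "10" < "100" < "9".
lexicographic = sorted(tokens)

# Numbers first in numeric order, then words alphabetically.
numbers = sorted((t for t in tokens if t.isdigit()), key=int)
words = sorted(t for t in tokens if not t.isdigit())
numeric_first = numbers + words

print(lexicographic)
print(numeric_first)
```

Splitting the tokens into two groups before sorting keeps the key function simple, at the cost of walking the list twice.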
Since they are on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read() #now data = "item 1 item2 etc..."
You can use re to filter out unwanted characters:
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
However, you will need to do more to force the numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
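Putting the pieces from the answers together, a minimal Python 3 sketch might look like the following. The function names clean_and_sort and sort_file are hypothetical, and the sketch assumes that a "special character" is anything other than an ASCII letter or digit and that a "number" is a token made only of digits.

```python
import re


def clean_and_sort(text):
    """Strip symbols from each token, then return numbers first, words second."""
    # Assumption: a special character is anything that is not a letter or digit.
    tokens = [re.sub(r"[^A-Za-z0-9]", "", tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok]  # drop tokens that were only symbols
    numbers = sorted((tok for tok in tokens if tok.isdigit()), key=int)
    words = sorted(tok for tok in tokens if not tok.isdigit())
    return numbers + words


def sort_file(in_path, out_path):
    """Read a whitespace-separated file and write one sorted item per line."""
    with open(in_path) as infile:
        items = clean_and_sort(infile.read())
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(items) + "\n")
```

From a script, this could be called as sort_file(sys.argv[1], sys.argv[2]) after importing sys and checking that both arguments were given.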
load_datafile() takes a single string parameter representing the filename of a datafile.
This function must read the content of the file, convert all letters to their lowercase, and store
the result in a string, and finally return that string. I will refer to this string as data throughout
this specification, you may rename it. You must also handle all exceptions in case the datafile
is not available.
Sample output:
data = load_datafile('harry.txt')
print(data)
the hottest day of the summer so far was drawing to a close and a drowsy silence
lay over the large, square houses of privet drive.
load_wordfile() takes a single string argument representing the filename of a wordfile.
This function must read the content of the wordfile and store all words in a one-dimensional
list and return the list. Make sure that the words do not have any additional whitespace or newline character in them. You must also handle all exceptions in case the files are not
available.
Sample outputs:
pos_words = load_wordfile("positivewords.txt")
print(pos_words[2:9])
['abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed',
'acclamation']
neg_words = load_wordfile("negativewords.txt")
print(neg_words[10:19])
['aborts', 'abrade', 'abrasive', 'abrupt', 'abruptly', 'abscond', 'absence',
'absent-minded', 'absentee']
MY CODE BELOW
def load_datafile('harryPotter.txt'):
data = ""
with open('harryPotter.txt') as file:
lines = file.readlines()
temp = lines[-1].lower()
return data
Your code has two main problems. The first one is that you are assigning an empty string to the variable data and returning it, so no matter what you do with the contents of the file, you always return an empty string. The second one is that file.readlines() returns a list of strings, where each line in the file is an element of the list, and you are only converting the last element, lines[-1], to lowercase.
To fix your code, make sure that you store the contents of the file in the data variable and that you apply lower() to every line in the file, not just the last one. Something like this:
def load_datafile(file_name):
    data = ''
    with open(file_name) as file:
        lines = file.readlines()
        for line in lines:
            # each line already ends with its own newline
            data = data + line.lower()
    return data
The previous example is not the best way of doing this but it's very easy to understand what is happening and I think that is more important when you are starting. To make it more efficient you might want to change it to:
def load_datafile(file_name):
    with open(file_name) as file:
        # join with '' because each line keeps its own newline
        return ''.join(line.lower() for line in file)
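The assignment also asks for exception handling and for a load_wordfile() function, which the snippets above leave out. Here is one possible Python 3 sketch. Returning an empty string or an empty list when the file cannot be opened is an assumption on my part (the specification does not say what to return), as is the assumption that the word file holds one word per line.

```python
def load_datafile(file_name):
    """Return the file's text in lowercase, or "" if it cannot be read."""
    try:
        with open(file_name) as file:
            return file.read().lower()
    except OSError as error:  # covers a missing or unreadable file
        print("Could not read", file_name, "-", error)
        return ""


def load_wordfile(file_name):
    """Return the words in the file as a list, one per non-empty line."""
    try:
        with open(file_name) as file:
            # strip() removes the newline and any surrounding whitespace
            return [line.strip() for line in file if line.strip()]
    except OSError as error:
        print("Could not read", file_name, "-", error)
        return []
```

Catching OSError rather than using a bare except keeps unrelated bugs, such as a typo in a variable name, from being silently swallowed.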
I have the text files as below format in single line,
username:password;username1:password1;username2:password2;
etc.
What I have tried so far is
with open('list.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
but I get an error saying that the length is 1 and 2 is required, which indicates the file is not being read as key:value pairs.
Is there any way to fix this, or should I just reformat the file in another way?
thanks for your answers.
What I have so far is:
with open('tester.txt') as f:
    password_list = dict(x.strip(":").split(";", 1) for x in f)
for user, password in password_list.items():
    print(user + " - " + password)
The result comes out as username:password - username1:password1
What I need is to split username:password so that key = user and value = password.
Since the variable f in this case is a file object and not a list, the first thing to do is to get the lines from it. You could use the readlines method (https://docs.python.org/2/library/stdtypes.html?highlight=readline#file.readlines)* for this.
Furthermore, I think I would use split with the semicolon (";") as its parameter. This will give you a list of "username:password" strings, provided your entire file looks like this.
I think you will figure out what to do after that.
EDIT
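* I automatically assumed you were using Python 2.7 for some reason. In version 3.x, file objects returned by open() still have a readlines() method, so the same approach works. The "distutils.text_file" class (https://docs.python.org/3.7/distutils/apiref.html?highlight=readlines#distutils.text_file.TextFile.readlines) also offers one, but distutils is deprecated and was removed in Python 3.12.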
Load the text of the file in Python with open() and read() as a string
Apply split(";") to that string to create a list like ["username:password", "username1:password1", "username2:password2"]
Do a dict comprehension where you apply split(":") to each item of the above list to split those pairs.
with open('list.txt', 'rt') as f:
    raw_data = f.readlines()[0].strip()
list_data = raw_data.split(';')
# skip the empty string left after the trailing semicolon
user_dict = { x.split(':')[0]:x.split(':')[1] for x in list_data if x }
print(user_dict)
Dictionary comprehension is useful here.
One liner to pull all the info out of the text file. As requested. Hope your tutor is impressed. Ask him How it works and see what he says. Maybe update your question to include his response.
If you want me to explain, feel free to comment and I shall go into more detail.
The error you're probably getting:
ValueError: dictionary update sequence element #3 has length 1; 2 is required
is because the text line ends with a semicolon. Splitting it on semicolons then results in a list that contains some pairs, and an empty string:
>>> "username:password;username1:password1;username2:password2;".split(";")
['username:password', 'username1:password1', 'username2:password2', '']
Splitting the empty string on colons then results in a single empty string, rather than two strings.
To fix this, filter out the empty string. One example of doing this would be
[element for element in x.split(";") if element != ""]
In general, I recommend you do the work one step at a time and assign to intermediary variables.
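As a sketch of what "one step at a time" could look like here (Python 3, with the sample line from the question hard-coded instead of read from a file): the variable names are only illustrative, and splitting on the first colon only is an assumption that allows a password to contain a colon.

```python
line = "username:password;username1:password1;username2:password2;\n"

# Step 1: remove the trailing newline, then split into "user:password" entries.
entries = line.strip().split(";")

# Step 2: drop the empty string left behind by the final semicolon.
entries = [entry for entry in entries if entry]

# Step 3: split each entry on the first colon only.
pairs = [entry.split(":", 1) for entry in entries]

# Step 4: build the dictionary from the two-element lists.
credentials = dict(pairs)
print(credentials)
```

Printing each intermediate variable while debugging makes it easy to see which step produces the unexpected value.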
Here's a simple (but long) answer. You need to get the line from the file, and then split it and the items resulting from the split:
results = {}
with open('file.txt') as file:
    for line in file:
        #Only one line, but that's fine
        #strip() removes the trailing newline before splitting
        entries = line.strip().split(';')
        for entry in entries:
            if entry != '':
                #The last item in entries will be blank, due to how split works in this example
                user, password = entry.split(':')
                results[user] = password
Try this.
f = open('test.txt').read()
data = f.split(";")
d = {}
for i in data:
    # skip the empty (or newline-only) item after the last semicolon
    if i.strip():
        value = i.split(":")
        d.update({value[0]:value[1]})
print(d)
I've been stuck on this Python homework problem for awhile now: "Write a complete python program that reads 20 real numbers from a file inner.txt and outputs them in sorted order to a file outter.txt."
Alright, so what I do is:
f=open('inner.txt','r')
n=f.readlines()
n.replace('\n',' ')
n.sort()
x=open('outter.txt','w')
x.write(print(n))
So my thought process is: Open the text file, n is the list of read lines in it, I replace all the newline prompts in it so it can be properly sorted, then I open the text file I want to write to and print the list to it. First problem is it won't let me replace the new line functions, and the second problem is I can't write a list to a file.
I just tried this:
>>> x= "34\n"
>>> print(int(x))
34
So, you shouldn't have to filter out the "\n" like that, but can just put it into int() to convert it into an integer. This is assuming you have one number per line and they're all integers.
You then need to store each value into a list. A list has a .sort() method you can use to then sort the list.
EDIT:
Forgot to mention: as others have already said, you need to iterate over the values in n, as it's a list, not a single item.
Here's a step by step solution that fixes the issues you have :)
Opening the file, nothing wrong here.
f=open('inner.txt','r')
Don't forget to close the file once you have finished reading from it:
f.close()
n is now a list of each line:
n=f.readlines()
There are no list.replace methods, so I suggest changing the above line to n = f.read(). Then, this will work (don't forget to reassign n, as strings are immutable):
n = n.replace('\n','')
You still only have a string full of numbers. However, instead of replacing the newline character, I suggest splitting the string using the newline as a delimiter:
n = n.split('\n')
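Then, convert these strings to numbers, skipping any empty string left by a trailing newline (use float rather than int, since the task asks for real numbers):
n = [float(x) for x in n if x.strip()]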
Now, these two will work:
n.sort()
x=open('outter.txt','w')
You want to write the numbers themselves, so use this:
x.write('\n'.join(str(i) for i in n))
Finally, close the file:
x.close()
Using a context manager (the with statement) is good practice as well, when handling files:
with open('inner.txt', 'r') as f:
    # do stuff with f
# automatically closed at the end
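Assembled into one piece, the steps above could look roughly like this in Python 3. The function name sort_numbers is hypothetical, and the sketch assumes one number per line, with blank lines ignored.

```python
def sort_numbers(in_path, out_path):
    """Read one number per line from in_path and write them sorted to out_path."""
    with open(in_path) as infile:
        # float() tolerates surrounding whitespace, including the newline.
        numbers = [float(line) for line in infile if line.strip()]
    numbers.sort()
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(str(number) for number in numbers) + "\n")
    return numbers
```

For the homework in the question, the call would be sort_numbers('inner.txt', 'outter.txt').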
I guess real means float. So you have to convert your results to float to sort properly.
raw_lines = f.readlines()
floats = map(float, raw_lines)
Then you have to sort it:
sorted_floats = sorted(floats)
To write the result back, you have to convert the numbers to strings and join them with line endings:
sorted_as_string = map(str, sorted_floats)
result = '\n'.join(sorted_as_string)
Finally, you have to write result to the destination file.
OK, let's look at what you want to do, step by step.
First: Read some integers out of a textfile.
Pythonic Version:
fileNumbers = [int(line) for line in open(r'inner.txt', 'r').readlines()]
Easy to get version:
fileNumbers = list()
with open(r'inner.txt', 'r') as fh:
    for singleLine in fh.readlines():
        fileNumbers.append(int(singleLine))
What it does:
Open the file
Read each line, convert it to int (because readlines returns string values) and append it to the list fileNumbers
Second: Sort the list
fileNumbers.sort()
What it does:
The sort method sorts the list in place by its values, e.g. [5,3,2,4,1] -> [1,2,3,4,5]
Third: Write it to a new textfile
with open(r'outter.txt', 'w') as fh:
    # 'w' overwrites the file, so repeated runs do not pile up old results
    for entry in fileNumbers:
        fh.write('{0}\n'.format(entry))
I am a beginner at python and I need to check the presence of a given set of string in a huge txt file. I've written this code so far and it runs with no problems on a light subsample of my database. The problem is that it takes more than 10 hours when searching through the whole database and I'm looking for a way to speed up the process.
The code so far reads a list of strings from a txt file I've put together (list.txt) and searches for every item in every line of the database (hugedataset.txt). My final output should be a list of items which are present in the database (or, alternatively, a list of items which are NOT present). I bet there is a more efficient way to do things though...
Thank you for your support!
import re
fobj_in = open('hugedataset.txt')
present=[]
with open('list.txt', 'r') as f:
    list1 = [line.strip() for line in f]
print list1
for l in fobj_in:
    for title in list1:
        if title in l:
            print title
            present.append(title)
set=set(present)
print set
Since you don't need any per-line information, you can search the whole thing in one go for each string:
data = open('hugedataset.txt').read() # Assuming it fits in memory
present=[] # As @svk points out, you could make this a set
with open('list.txt', 'r') as f:
    list1 = [line.strip() for line in f]
print list1
for title in list1:
    if title in data:
        print title
        present.append(title)
found=set(present) # a different name avoids shadowing the built-in set
print found
You could use a regexp to check for all the substrings in a single pass. See, for example, the answer to "Check to ensure a string does not contain multiple values".
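A rough Python 3 sketch of the single-pass regexp idea follows. The function name find_present is hypothetical. One caveat to be aware of: regex matches do not overlap, so a title that only occurs inside, or overlapping with, another matched title is not reported, which means the result can differ from the `title in data` loop in that case.

```python
import re


def find_present(titles, text):
    """Return the set of titles that the regex finds somewhere in text."""
    # Longest first, so a title is not cut short by a shorter one that is its prefix.
    ordered = sorted({title for title in titles if title}, key=len, reverse=True)
    if not ordered:
        return set()
    # re.escape makes characters such as "." or "(" match literally.
    pattern = re.compile("|".join(re.escape(title) for title in ordered))
    return set(pattern.findall(text))
```

Whether this is actually faster than the plain loop depends on the number of titles and the size of the file, so it is worth timing both on a subsample first.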
Is it possible in Python, given a file with 10000 lines, where all of them have this structure:
1, 2, xvfrt ert5a fsfs4 df f fdfd56 , 234
or similar, to read the whole string, and then to store in another string all characters from column 7 to column 17, including spaces, so the new string would be
"xvfrt ert5a" ?
Thanks a lot
lst = [line[6:17] for line in open(fname)]
another_list = []
for line in f:
    another_list.append(line[6:17])
Or as a generator (a memory friendly solution):
another_list = (line[6:17] for line in f)
I'm going to take Michael Dillon's answer a little further. If by "columns 6 through 17" you mean "the first 11 characters of the third comma-separated field", this is a good opportunity to use the csv module. Also, for Python 2.6 and above it's considered best practice to use the 'with' statement when opening files. Behold:
import csv
with open(filepath, 'rt') as f:
    lst = [row[2][:11] for row in csv.reader(f)]
This will preserve leading whitespace; if you don't want that, change the last line to
lst = [row[2].lstrip()[:11] for row in csv.reader(f)]
This technically answers the direct question:
lst = [line[6:17] for line in open(fname)]
but there is a fatal flaw. It is OK for throwaway code, but that data looks suspiciously like comma separated values, and the third field may even be space delimited chunks of data. Far better to do it like this so that if the first two columns sprout an extra digit, it will still work:
lst = [x[2].strip()[0:11] for x in [line.split(',') for line in open(fname)]]
And if those space delimited chunks might get longer, then this:
lst = [x[2].strip().split()[0:2] for x in [line.split(',') for line in open(fname)]]
Don't forget a comment or two to explain what is going on. Perhaps:
# on each line, get the 3rd comma-delimited field and break out the
# first two space-separated chunks of the licence key
Assuming, of course, that those are licence keys. No need to be too abstract in comments.
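To make the index arithmetic concrete, here is a small Python 3 check on the sample line from the question, comparing the fixed-position slice with the field-based approach. It assumes single spaces between the fields, as the line is shown in the question.

```python
line = "1, 2, xvfrt ert5a fsfs4 df f fdfd56 , 234"

# Columns 7..17 counted from 1 are indices 6..16 counted from 0,
# and a slice excludes its end index, hence [6:17].
by_position = line[6:17]

# Field-based alternative: third comma-separated field, first 11 characters.
by_field = line.split(",")[2].strip()[:11]

print(repr(by_position))
print(repr(by_field))
```

Both give the same string for this line, but only the field-based version keeps working if one of the first two fields grows by a digit.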
You don't say how you want to store the data from each of the 10,000 lines -- if you want them in a list, you would do something like this:
my_list = []
for line in open(filename):
    # columns 7 to 17 (counting from 1) are the slice [6:17]
    my_list.append(line[6:17])
for l in open("myfile.txt"):
    c7_17 = l[6:17]
    # Not sure what you want to do with c7_17 here, but go for it!
This function will compute the string that you want and print it out:
def readCols(filepath):
    # the with statement closes the file when the loop is done
    with open(filepath, 'r') as f:
        for line in f:
            newString = line[6:17]
            print(newString)