Nested lists and converting to floats - python

I'm trying to read some numbers from a text file and convert them to a list of floats, but nothing I try seems to work right.
Here's my code right now:
python_data = open('C:\Documents and Settings\redacted\Desktop\python_lengths.txt','r')
python_lengths = []
for line in python_data:
python_lengths.append(line.split())
python_lengths.sort()
print python_lengths
It returns:
[['12.2'], ['26'], ['34.2'], ['5.0'], ['62'], ['62'], ['62.6']]
(all brackets included)
But I can't convert it to a list of floats with any regular commands like:
python_lengths = float(python_lengths)
or:
float_lengths = [map(float, x) for x in python_lengths]
because it seems to be nested or something?

That is happening because .split() always returns a list of items even if there was just 1 element present. If you change your python_lengths.append(line.split()) to python_lengths.extend(line.split()) you will get your flat list you expected.

#eumiro's answer is correct, but here is something else that can help:
numbers = []
with open('C:\Documents and Settings\redacted\Desktop\python_lengths.txt','r') as f:
for line in f.readlines():
numbers.extend(line.split())
numbers.sort()
print numbers

def floats_from_file(f):
for line in f:
for word in line.split():
yield float(word)
with open('C:/Documents and Settings/redacted/Desktop/python_lengths.txt') as f:
python_lengths = list(floats_from_file(f))
python_lengths.sort()
print python_lengths
Note that you can use forward slashes, even on Windows. If you want to use backslashes you should use a "raw" string, to avoid problems. What sort of problems? Well, some characters are special with backslash; for example, \n represents a newline. If you just put a path in plain quotes, and one of the directory names starts with n, you will get a newline there. Solutions are to double the backslashes, use raw strings, or just use forward slashes.

Related

Having problems with strings and arrays

I want to read a text file and copy text that is in between '~~~~~~~~~~~~~' into an array. However, I'm new in Python and this is as far as I got:
with open("textfile.txt", "r",encoding='utf8') as f:
searchlines = f.readlines()
a=[0]
b=0
for i,line in enumerate(searchlines):
if '~~~~~~~~~~~~~' in line:
b=b+1
if '~~~~~~~~~~~~~' not in line:
if 's1mb4d' in line:
break
a.insert(b,line)
This is what I envisioned:
First I read all the lines of the text file,
then I declare 'a' as an array in which text should be added,
then I declare 'b' because I need it as an index. The number of lines in between the '~~~~~~~~~~~~~' is not even, that's why I use 'b' so I can put lines of text into one array index until a new '~~~~~~~~~~~~~' was found.
I check for '~~~~~~~~~~~~~', if found I increase 'b' so I can start adding lines of text into a new array index.
The text file ends with 's1mb4d', so once its found, the program ends.
And if '~~~~~~~~~~~~~' is not found in the line, I add text to the array.
But things didn't go well. Only 1 line of the entire text between those '~~~~~~~~~~~~~' is being copied to the each array index.
Here is an example of the text file:
~~~~~~~~~~~~~
Text123asdasd
asdasdjfjfjf
~~~~~~~~~~~~~
123abc
321bca
gjjgfkk
~~~~~~~~~~~~~
You could use regex expression, give a try to this:
import re
input_text = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = []
for line in input_text:
my_text = re.findall(r'[^\~]+', line)
if len(my_text) != 0:
a.append(my_text)
What it does is it reads line by line looks for all characters but '~' if line consists only of '~' it ignores it, every line with text is appended to your a list afterwards.
And just because we can, oneliner (excluding import and source ofc):
import re
lines = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = [re.findall(r'[^\~]+', line) for line in lines if len(re.findall(r'[^\~]+', line)) != 0]
In python the solution to a large part of problems is often to find the right function from the standard library that does the job. Here you should try using split instead, it should be way easier.
If I understand correctly your goal, you can do it like that :
joined_lines = ''.join(searchlines)
result = joined_lines.split('~~~~~~~~~~')
The first line joins your list of lines into a sinle string, and then the second one cut that big string every times it encounters the '~~' sequence.
I tried to clean it up to the best of my knowledge, try this and let me know if it works. We can work together on this!:)
with open("textfile.txt", "r",encoding='utf8') as f:
searchlines = f.readlines()
a = []
currentline = ''
for i,line in enumerate(searchlines):
currentline += line
if '~~~~~~~~~~~~~' in line:
a.append(currentline)
elif 's1mb4d' in line:
break
Some notes:
You can use elif for your break function
Append will automatically add the next iteration to the end of the array
currentline will continue to add text on each line as long as it doesn't have 's1mb4d' or the ~~~ which I think is what you want
s = ['']
with open('path\\to\\sample.txt') as f:
for l in f:
a = l.strip().split("\n")
s += a
a = []
for line in s:
my_text = re.findall(r'[^\~]+', line)
if len(my_text) != 0:
a.append(my_text)
print a
>>> [['Text123asdasd asdasdjfjfjf'], ['123abc 321bca gjjgfkk']]
If you're willing to impose/accept the constraint that the separator should be exactly 13 ~ characters (actually '\n%s\n' % ( '~' * 13) to be specific) ...
then you could accomplish this for relatively normal sized files using just
#!/usr/bin/python
## (Should be #!/usr/bin/env python; but StackOverflow's syntax highlighter?)
separator = '\n%s\n' % ('~' * 13)
with open('somefile.txt') as f:
results = f.read().split(separator)
# Use your results, a list of the strings separated by these separators.
Note that '~' * 13 is a way, in Python, of constructing a string by repeating some smaller string thirteen times. 'xx%sxx' % 'YY' is a way to "interpolate" one string into another. Of course you could just paste the thirteen ~ characters into your source code ... but I would consider constructing the string as shown to make it clear that the length is part of the string's specification --- that this is part of your file format requirements ... and that any other number of ~ characters won't be sufficient.
If you really want any line of any number of ~ characters to serve as a separator than you'll want to use the .split() method from the regular expressions module rather than the .split() method provided by the built-in string objects.
Note that this snippet of code will return all of the text between your separator lines, including any newlines they include. There are other snippets of code which can filter those out. For example given our previous results:
# ... refine results by filtering out newlines (replacing them with spaces)
results = [' '.join(each.split('\n')) for each in results]
(You could also use the .replace() string method; but I prefer the join/split combination). In this case we're using a list comprehension (a feature of Python) to iterate over each item in our results, which we're arbitrarily naming each), performing our transformation on it, and the resulting list is being boun back to the name results; I highly recommend learning and getting comfortable with list comprehension if you're going to learn Python. They're commonly used and can be a bit exotic compared to the syntax of many other programming and scripting languages).
This should work on MS Windows as well as Unix (and Unix-like) systems because of how Python handles "universal newlines." To use these examples under Python 3 you might have to work a little on the encodings and string types. (I didn't need to for my Python3.6 installed under MacOS X using Homebrew ... but just be forewarned).

Removing newline characters in a txt files

I'm doing Euler Problems and am at problem #8 and wanted to just copy this huge 1000-digit number to a numberToProblem8.txt file and then just read it into my script but I can't find a good way to remove newlines from it. With that code:
hugeNumberAsStr = ''
with open('numberToProblem8.txt') as f:
for line in f:
aSingleLine = line.strip()
hugeNumberAsStr.join(aSingleLine)
print(hugeNumberAsStr)
Im using print() to only check if it works and well, it doesnt. It doesnt print out anything. What's wrong with my code? I remove all the trash with strip() and then use join() to add that cleaned line into hugeNumberAsStr (need a string to join those lines, gonna use int() later on) and its repeated for all the lines.
Here is the .txt file with a number in it.
What about something like:
hugeNumberAsStr = open('numberToProblem8.txt').read()
hugeNumberAsStr = hugeNumberAsStr.strip().replace('\n', '')
Or even:
hugeNumberAsStr = ''.join([d for d in hugeNumberAsStr if d.isdigit()])
I was able to simplify it to the following to get the number from that file:
>>> int(open('numberToProblem8.txt').read().replace('\n',''))
731671765313306249192251196744265747423553491949349698352031277450632623957831801698480186947885184385861560789112949495459501737958331952853208805511125406987471585238630507156932909632952274430435576689664895044524452316173185640309871112172238311362229893423380308135336276614282806444486645238749303589072962904915604407723907138105158593079608667017242712188399879790879227492190169972088809377665727333001053367881220235421809751254540594752243525849077116705560136048395864467063244157221553975369781797784617406495514929086256932197846862248283972241375657056057490261407972968652414535100474821663704844031998900088952434506585412275886668811642717147992444292823086346567481391912316282458617866458359124566529476545682848912883142607690042242190226710556263211111093705442175069416589604080719840385096245544
You need to do hugeNumberAsStr += aSingleLine instead of hugeNumberAsStr.join(..)
str.join() joins the passed iterator and return the string value joined by str. It doesn't update the value of hugeNumberAsStr as you think. You want to create a new string with removed \n. You need to store these values in new string. For that you need append the content to the string
The join method for strings simply takes an iterable object and concatenates each part together. It then returns the resulting concatenated string. As stated in help(str.join):
join(...)
S.join(iterable) -> str
Return a string which is the concatenation of the strings in the
iterable. The separator between elements is S.
Thus the join method really does not do what you want.
The concatenation line should be more like:
hugeNumberAsString += aSingleLine
Or even:
hugeNumberAsString += line.strip()
Which gets rid of the extra line of code doing the strip.

Reading characters from file into list

I have a part of a program with the following code
file1 = [line.strip()for line in open(sometext.txt).readlines()]
print ((file1)[0])
and when the code is executed it gives me the whole contents of the txt file which is a very long sentence,
how would I go about reading every letter and placing it in a list to index each character separately? I have used the list() function which seems to put the whole text file into a list and not each character.
You can use file.read() rather than file.readlines():
file1 = [char for char in open(sometext.txt).read()]
You don't really need list-comprehension, however; instead you can do this:
file1 = list(open(sometext.txt).read())
Also, as #furas mentioned in his comment, you don't need a list to have indexing. str also has a method called index, so you could say file1 = open(sometext.txt).read() and still be able to use file1.index(). Note, str also has a find method which will return -1 if the substring is not found, rather than raising a ValueError.
With a read() is enough. Plus. if you want to store the list without \n and white spaces, you can use:
char_list = [ch for ch in open('test.txt').read() if ch != '\n' if ch != ' ']
You can remove the if statements if you want to maintain them.

Writing from one file to another

I've been stuck on this Python homework problem for awhile now: "Write a complete python program that reads 20 real numbers from a file inner.txt and outputs them in sorted order to a file outter.txt."
Alright, so what I do is:
f=open('inner.txt','r')
n=f.readlines()
n.replace('\n',' ')
n.sort()
x=open('outter.txt','w')
x.write(print(n))
So my thought process is: Open the text file, n is the list of read lines in it, I replace all the newline prompts in it so it can be properly sorted, then I open the text file I want to write to and print the list to it. First problem is it won't let me replace the new line functions, and the second problem is I can't write a list to a file.
I just tried this:
>>> x= "34\n"
>>> print(int(x))
34
So, you shouldn't have to filter out the "\n" like that, but can just put it into int() to convert it into an integer. This is assuming you have one number per line and they're all integers.
You then need to store each value into a list. A list has a .sort() method you can use to then sort the list.
EDIT:
forgot to mention, as other have already said, you need to iterate over the values in n as it's a list, not a single item.
Here's a step by step solution that fixes the issues you have :)
Opening the file, nothing wrong here.
f=open('inner.txt','r')
Don't forget to close the file:
f.close()
n is now a list of each line:
n=f.readlines()
There are no list.replace methods, so I suggest changing the above line to n = f.read(). Then, this will work (don't forget to reassign n, as strings are immutable):
n = n.replace('\n','')
You still only have a string full of numbers. However, instead of replacing the newline character, I suggest splitting the string using the newline as a delimiter:
n = n.split('\n')
Then, convert these strings to integers:
`n = [int(x) for x in n]`
Now, these two will work:
n.sort()
x=open('outter.txt','w')
You want to write the numbers themselves, so use this:
x.write('\n'.join(str(i) for i in n))
Finally, close the file:
x.close()
Using a context manager (the with statement) is good practice as well, when handling files:
with open('inner.txt', 'r') as f:
# do stuff with f
# automatically closed at the end
I guess real means float. So you have to convert your results to float to sort properly.
raw_lines = f.readlines()
floats = map(float, raw_lines)
Then you have to sort it. To write result back, you have to convert to string and join with line endings:
sortеd_as_string = map(str, sorted_floats)
result = '\n'.join(sortеd_as_string)
Finally you have have to write result to destination.
Ok let's look it step by step what you want to do.
First: Read some integers out of a textfile.
Pythonic Version:
fileNumbers = [int(line) for line in open(r'inner.txt', 'r').readlines()]
Easy to get version:
fileNumbers = list()
with open(r'inner.txt', 'r') as fh:
for singleLine in fh.readlines():
fileNumbers.append(int(singleLine))
What it does:
Open the file
Read each line, convert it to int (because readlines return string values) and append it to the list fileNumbers
Second: Sort the list
fileNumbers.sort()
What it does:
The sort function sorts the list by it's value e.g. [5,3,2,4,1] -> [1,2,3,4,5]
Third: Write it to a new textfile
with open(r'outter.txt', 'a') as fh:
[fh.write('{0}\n'.format(str(entry))) for entry in fileNumbers]

Trouble sorting a list with python

I'm somewhat new to python. I'm trying to sort through a list of strings and integers. The lists contains some symbols that need to be filtered out (i.e. ro!ad should end up road). Also, they are all on one line separated by a space. So I need to use 2 arguments; one for the input file and then the output file. It should be sorted with numbers first and then the words without the special characters each on a different line. I've been looking at loads of list functions but am having some trouble putting this together as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
You may not even need the key function any more since numbers are placed before letters by default. I don't see anything in your code to remove special characters though.
since they are on the same line you dont really need readlines
with open('some.txt') as f:
data = f.read() #now data = "item 1 item2 etc..."
you can use re to filter out unwanted characters
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition maybe overkill
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
however you will need to do more to force numbers to sort maybe something like
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings(
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).

Categories