Python list in lists from datafile with different layouts - python

I'm a little stuck here. I'm trying to read a data file in Python 3.
I want to make a list of lists.
The first 36 lines:
each line is a list that's appended to the main list
f = open("a.data","r")
h = []
a = []
for word in range(0,797):
    g = f.readline()
    h.append(g.strip())
    a.append(h)
    h = []
But from the 37th line onward,
I need a loop where this happens:
if the new line is blank, skip it;
the next 4 lines should go into a new list 'h', and 'h' should be appended to 'a'.
The thing is that readline() misbehaves in everything I tried.
Any suggestions?
Thanks in advance.
PS: the strings in the 4 lines are separated by a ;

Try this:
import re
with open('a.data', 'r') as f:
    # split the whole file on semicolons and on single or double newlines
    lst = re.split(';|\n{1,2}', f.read())
length = 36
# cut the flat list into chunks of `length` items
lstoflst = [lst[i:i+length] for i in range(0, len(lst)-1, length)]
print(lstoflst)
I read the whole file, split it at newlines and semicolons, and build a list of lists with a list comprehension.
Please consider a better data format for your next report, like CSV if possible.
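A line-by-line alternative that follows the layout described in the question more literally could look like the sketch below. It is only a sketch under assumptions: the function name parse_layout and the header_count parameter are made up for illustration, blocks are assumed to be separated by blank lines, and each line is kept whole rather than split on ";".

```python
def parse_layout(lines, header_count=36):
    """Build a list of lists from an iterable of text lines.

    The first `header_count` lines each become a one-element list.
    After that, blank lines act as separators and every run of
    non-blank lines becomes one list.
    """
    result = []
    block = []
    for index, raw in enumerate(lines):
        line = raw.strip()
        if index < header_count:
            # Header part: every line is its own list.
            result.append([line])
        elif not line:
            # Blank line: close the current block, if there is one.
            if block:
                result.append(block)
                block = []
        else:
            block.append(line)
    if block:
        # The input may not end with a blank line.
        result.append(block)
    return result


# Tiny hypothetical input: 2 header lines instead of 36.
sample = ["h1\n", "h2\n", "\n", "a;b\n", "c;d\n", "\n", "e;f\n"]
parsed = parse_layout(sample, header_count=2)
print(parsed)
```

An open file can be passed directly, e.g. with open("a.data") as f: a = parse_layout(f). If the fields inside a block are needed separately, block.append(line.split(";")) would store them as sub-lists instead.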

Related

How to read specific parts of a input file in Python 3?

Let's say I have an input file with the following data:
50 50
A
B
C
D
I know that I can extract the first line using the map function as follows:
x,y= map(int, input().split())
But I am unsure how I can retrieve the next 4 lines and put them into a list. I tried using the splitlines() function, since each value is on a separate line, but that only returns the first value.
strings = input().splitlines()
How can I choose what parts of the input file I want to read and then store them in respective variables?
Open the file, read all the lines into a list, and do what you want with them:
with open("[filename]", "r") as f:
    lines = list(map(lambda l: l.strip(), f.readlines()))
# do whatever with the lines here
# use lines.pop(0) if you want to remove the line from the list
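For the specific input in the question (two integers on the first line, then four values on their own lines), a minimal sketch could read the parts one at a time. The helper name read_header_and_items and the item_count parameter are hypothetical, and io.StringIO stands in for the real file or stdin.

```python
import io


def read_header_and_items(stream, item_count=4):
    # First line: two integers separated by whitespace.
    x, y = map(int, stream.readline().split())
    # Next `item_count` lines: kept as stripped strings.
    items = [stream.readline().strip() for _ in range(item_count)]
    return x, y, items


# io.StringIO behaves like an open text file for this purpose.
stream = io.StringIO("50 50\nA\nB\nC\nD\n")
x, y, items = read_header_and_items(stream)
print(x, y, items)
```

The same function should work with sys.stdin or with a file object returned by open(), since it only relies on readline().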

Splitting a large list of strings and creating a list of the results

I have a large list of strings. Each string has a number of segments separated by a ";":
'1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72;'
I want to split each string by the ";" and save the resulting list.
I am currently using:
player_parts = []
for line in playerinf:
    parts = line.split(";")
    player_parts = player_parts + parts
Is there a faster way to do this?
If I understand you correctly, you can try itertools.chain and unpacking a list comprehension:
from itertools import chain
lines = ['1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72;', '2,3,34,56,-2134,0.50;2,4,7,2125,-3408,0.56;']
parts = list(chain(*[line.split(';')[:-1] for line in lines]))
parts
# ['1,2,23,17,-1006,0.20',
# '1,3,3,2258,-1308,0.72',
# '2,3,34,56,-2134,0.50',
# '2,4,7,2125,-3408,0.56']
I added a [:-1] to drop the last empty element of the split(';'). If however you need that empty element, just remove [:-1].
Since chain is implemented in C (in CPython), it should be much faster than an explicit Python-level loop.
The run times for 10000 lines are:
using chain: 0.34399986267089844s
using your method: > 240.234s # (I didn't want to wait any more)
Every time you do player_parts = player_parts + parts, you're combining two lists into a new list and assigning that list to player_parts. That's very inefficient. Doing player_parts.extend(parts) would greatly improve performance, since it's adding the contents to the end of the original player_parts list.
However, it looks like you may be adding some empty strings to the player_parts list. So let's see if there's a better way.
It sounds like you have a file like this:
1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72;
1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72
1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72;
And you want this result:
['1,2,23,17,-1006,0.20', '1,3,3,2258,-1308,0.72', '1,2,23,17,-1006,0.20',
'1,3,3,2258,-1308,0.72', '1,2,23,17,-1006,0.20', '1,3,3,2258,-1308,0.72']
So this should work:
player_parts = []
with open('infile', 'r') as f:
    for line in f: # For each line in the file
        for segment in line.split(';'): # For each segment in the line
            segment = segment.strip() # Drop surrounding whitespace, including the newline
            if segment: # If anything is left
                player_parts.append(segment) # Add it to the end of the list
If you're comfortable with comprehensions and generator expressions, you can do this:
player_parts = []
with open('infile', 'r') as f:
    for line in f:
        player_parts.extend(segment.strip() for segment in line.split(';') if segment.strip())
As far as I know, list comprehensions are usually a good approach if speed is important. Note that the following gives one inner list per line rather than a single flat list:
player_parts = [line.split(';') for line in playerinf]
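To make the difference between the flat and the nested result concrete, here is a small sketch using the two sample strings from the answers above. The variable names flat, flat_chain and nested are only for illustration; chain.from_iterable is used as a variant of chain(*...) that does not need the intermediate list to be unpacked.

```python
from itertools import chain

playerinf = [
    "1,2,23,17,-1006,0.20;1,3,3,2258,-1308,0.72;",
    "2,3,34,56,-2134,0.50;2,4,7,2125,-3408,0.56;",
]

# Nested comprehension: one flat list, empty segments dropped.
flat = [seg for line in playerinf for seg in line.split(";") if seg.strip()]

# Same result with chain.from_iterable over a generator of split lines.
flat_chain = [
    seg
    for seg in chain.from_iterable(line.split(";") for line in playerinf)
    if seg.strip()
]

# One inner list per input line; the trailing ";" leaves an empty string.
nested = [line.split(";") for line in playerinf]

print(flat)
print(nested)
```

Both flat variants build the result in a single pass, which avoids the repeated list copying of player_parts = player_parts + parts.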

How to convert txt file into 2d array of each char

I am trying to read a text file I created, which looks like this:
small.txt
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
I want to create a 2D array board[21][11] in this specific situation.
I want each char to be in its own cell because I want to implement BFS and other algorithms to find a path; it's a kind of Pacman game.
Here is my code:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = rec.split()
    print chars
inner_list = []
for each in chars:
    inner_list.append(each)
output_list.append(inner_list)
print output_list
As you can see, the output I get now is [[%%%%%%%%%%%%%%%%%%%%%%%]]
You can just do:
with open('small.txt') as f:
    board = f.readlines()
The file.readlines() method returns a list of strings, which you can then use as a 2D array:
>>> board[1][5]
'e'
Note that with this approach, the newline character ('\n') ends up at the last index of each row. To get rid of it, you can use str.rstrip:
board = [row.rstrip('\n') for row in board]
As another answer noted, the line strings are already indexable by integer, but if you really want a list of lists:
array = [list(line.strip()) for line in f]
That removes the line endings and converts each string to a list.
There are a few problems with your code:
you try to split lines into lists of chars using split, but that only splits at whitespace
assuming your indentation is correct, you only ever process the last value of chars in your second loop
that second loop just wraps each of the unsplit lines in chars (which, due to the previous issue, is only the last one) in a list
Instead, you can just convert str to list...
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
... and put those into output_list directly. Also, don't forget to strip the \n:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = list(rec.strip())
    output_list.append(chars)
Or using with for autoclosing and a list comprehension:
with open("small.txt") as f:
    output_list = [list(line.strip()) for line in f]
Note, however, that if you do not want to change the values in that grid, you do not have to convert to a list of lists of chars at all; a list of strings will work just as well.
output_list = list(map(str.strip, f))
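As a rough illustration of how such a grid might be used afterwards, the sketch below loads a board as a list of lists and looks up the cells holding 'P' and 'G'. The tiny maze, the names load_board and find, and the use of io.StringIO in place of small.txt are all assumptions made for the example.

```python
import io

# Hypothetical miniature maze standing in for small.txt.
MAZE = """\
%%%%%
%P%G%
%eee%
%%%%%
"""


def load_board(stream):
    # One inner list of single characters per non-empty line.
    return [list(line.rstrip("\n")) for line in stream if line.strip()]


def find(board, target):
    # Return (row, column) of the first cell equal to target, or None.
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == target:
                return r, c
    return None


board = load_board(io.StringIO(MAZE))
start = find(board, "P")
goal = find(board, "G")
print(len(board), len(board[0]), start, goal)
```

The board is indexed as board[row][column], so the first index runs over lines of the file and the second over characters within a line.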

generating list by reading from file

I want to generate a list of server addresses and credentials by reading from a file, as a single list split on the newlines in the file.
The file is in this format:
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
The output I want looks like this:
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
I tried this:
with open('file') as f:
    credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns lists within a list:
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
I am new to Python. How can I split on the newline character and create a single list? Thank you in advance.
You could do it like this:
with open('servers.dat') as f:
    L = [[line.strip() for line in f]]
print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n as the regular file iterator reads line by line. The double list is a bit unconventional, just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of IP addresses joined into one string. That wasn't obvious, as it's off the screen in the question and your own code makes no attempt to do it.
To do that, read the first three lines individually using next, then join the remaining lines using ; as your delimiter.
def reader(f):
    yield next(f)
    yield next(f)
    yield next(f)
    yield ';'.join(ip.strip() for ip in f)
with open('servers.dat') as f:
    L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly, as the expected output has a typo: 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir/' from the data.
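If the number of non-IP lines is not always three, a variation that groups by prefix instead of by position might be more forgiving. This is only a sketch: the function name collect_credentials is made up, and it assumes that every address line starts with "ip:" and that the joined addresses should come last.

```python
def collect_credentials(lines):
    singles = []  # lines kept as individual entries
    ips = []      # lines starting with "ip:", joined at the end
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("ip:"):
            ips.append(line)
        else:
            singles.append(line)
    if ips:
        singles.append(";".join(ips))
    # Wrap in an outer list to match the shape asked for in the question.
    return [singles]


sample = [
    "login:username\n",
    "pass:password\n",
    "destPath:/directory/subdir/\n",
    "ip:10.95.64.211\n",
    "ip:10.95.64.215\n",
]
result = collect_credentials(sample)
print(result)
```

With a real file this could be called as with open('servers.dat') as f: result = collect_credentials(f).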
This should work:
arr = []
with open('file') as f:
    for line in f:
        arr.append(line.strip()) # strip the trailing newline
result = [arr] # wrap in an outer list, as in the desired output
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
    for line in f:
        arr.append(line.strip('\n'))

Trouble sorting a list with python

I'm somewhat new to Python. I'm trying to sort a list of strings and integers. The list contains some symbols that need to be filtered out (e.g. ro!ad should end up as road). Also, the items are all on one line, separated by spaces. I need to take 2 arguments: one for the input file and one for the output file. The output should be sorted with numbers first and then the words, without the special characters, each on its own line. I've been looking at loads of list functions but am having trouble putting this together, as I've never had to do anything like this. Any takers?
So far I have the basic stuff:
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1]
    #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file, exactly as written and not sorted at all. The goal is to take a file (arg1.txt), sort it, and write a new file (arg2.txt); both names will be command-line arguments. I used print in this case to speed up the editing, but need to have it write to a file. That's why the output-file lines are commented out, but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
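You may not even need the key function any more, since digits sort before letters by default when comparing strings. Note, though, that plain string sorting is lexicographic, so '10' comes before '9'; keep a numeric key if that matters. I don't see anything in your code to remove special characters, though.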
Since they are all on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read() # now data = "item 1 item2 etc..."
You can use re to filter out unwanted characters:
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print(sorted(my_list))
However, you will need to do more to force numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
"Also, they are all on one line separated by a space."
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
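Putting the pieces from these answers together, a Python 3 sketch of the whole task might look like the following. The names clean_and_sort and sort_file are hypothetical, and it assumes that "special characters" means anything other than ASCII letters and digits and that a token counts as a number only if it consists entirely of digits after cleaning.

```python
import re


def clean_and_sort(text):
    # Remove everything except ASCII letters and digits from each token.
    tokens = [re.sub(r"[^A-Za-z0-9]", "", tok) for tok in text.split()]
    # Drop tokens that became empty after cleaning.
    tokens = [tok for tok in tokens if tok]
    # Numbers first, in numeric order; then words, in string order.
    numbers = sorted((tok for tok in tokens if tok.isdigit()), key=int)
    words = sorted(tok for tok in tokens if not tok.isdigit())
    return numbers + words


def sort_file(in_path, out_path):
    # Read the single-line input file and write one item per line.
    with open(in_path) as src:
        result = clean_and_sort(src.read())
    with open(out_path, "w") as dst:
        dst.write("\n".join(result) + "\n")


print(clean_and_sort("ro!ad 10 ap#ple 9 zebra 2"))
```

From a script, sort_file(sys.argv[1], sys.argv[2]) would take the input and output file names from the command line, after an import sys.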
