How to convert txt file into 2d array of each char - python

I am trying to read a text file I created, which looks like this:
small.txt
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
I want to create a 2D array, board[21][11] in this specific case.
I want each char to be in its own cell because I want to implement BFS and other algorithms to find a specific path; it's a kind of Pacman game.
Here is my code:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = rec.split()
    print chars
inner_list = []
for each in chars:
    inner_list.append(each)
output_list.append(inner_list)
print output_list
As you can see, the output I get now is [[%%%%%%%%%%%%%%%%%%%%%%%]]

You can just do:
with open('small.txt') as f:
    board = f.readlines()
The file.readlines() method will return a list of strings, which you can then use as a 2D array:
board[1][5]
>>> 'e'
Note that with this approach the newline characters ('\n') remain at the last index of each row. To get rid of them, you can use str.rstrip:
board = [row.rstrip('\n') for row in board]

As another answer noted, the line strings are already indexable by integer, but if you really want a list of lists:
array = [list(line.strip()) for line in f]
That removes the line endings and converts each string to a list.
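The practical difference between the two shapes shows up when a cell has to be changed. Here is a minimal sketch, assuming a tiny hypothetical three-row board held in io.StringIO in place of small.txt:

```python
import io

# Hypothetical stand-in for small.txt so the example runs without a file.
f = io.StringIO("%%%%%\n%PeG%\n%%%%%\n")

rows = [line.rstrip("\n") for line in f]   # list of strings
grid = [list(row) for row in rows]         # list of lists of chars

# Reading a cell works the same way for both shapes.
assert rows[1][1] == grid[1][1] == "P"

# Writing only works on the list of lists, because strings are immutable.
grid[1][2] = "."
try:
    rows[1][2] = "."
except TypeError as exc:
    error = str(exc)

print("".join(grid[1]))  # %P.G%
print(error)
```

So a list of strings is enough for read-only lookups, while a list of lists is the safer choice if the algorithm marks visited cells in place.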

There are a few problems with your code:
you try to split lines into lists of chars using split, but with no argument that only splits at whitespace
assuming your indentation is correct, your second loop only ever processes the last value of chars
that second loop just wraps each of the (unsplit) lines in chars (which, due to the previous issue, is only the last one) into a list
Instead, you can just convert str to list...
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
... and put those into output_list directly. Also, don't forget to strip the \n:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = list(rec.strip())
    output_list.append(chars)
Or using with for autoclosing and a list-comprehension:
with open("small.txt") as f:
    output_list = [list(line.strip()) for line in f]
Note, however, that if you do not want to change the values in that grid, you do not have to convert to a list of lists of chars at all; a list of strings will work just as well.
output_list = list(map(str.strip, f))
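Since the question mentions BFS, here is a rough sketch of how such a grid could be searched once it is loaded. The maze text is the one from the question, inlined so the example is self-contained; the helper names load_board, find and bfs are made up for this illustration, and treating '%' as a wall with 'P' as the start and 'G' as the goal is an assumption about the intended game rules.

```python
from collections import deque

# The maze from the question, inlined so the sketch runs without small.txt.
MAZE = """\
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
"""


def load_board(text):
    """Return the maze as a list of lists of single characters."""
    return [list(line) for line in text.splitlines()]


def find(board, target):
    """Return (row, col) of the first cell equal to target, or None."""
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == target:
                return (r, c)
    return None


def bfs(board, start, goal):
    """Breadth-first search; returns the cells from start to goal, or None."""
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through the parents
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < len(board) and 0 <= nc < len(board[nr])
            if inside and board[nr][nc] != "%" and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


board = load_board(MAZE)
start, goal = find(board, "P"), find(board, "G")
path = bfs(board, start, goal)
print(start, goal, len(path) - 1)  # start cell, goal cell, number of steps
```

Note that indexing is board[row][col], and that this particular file has 11 rows of 23 characters each.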

Related

Nested lists in python containing a single string and not single letters

I need to load text from a file which contains several lines, each line containing letters separated by commas, into a 2-dimensional list. When I run this, I get a 2-dimensional list, but the nested lists contain single strings instead of separate values, and I cannot iterate over them. How do I solve this?
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split()
            matrix.append(line)
    return matrix
result:
[['a,p,p,l,e'], ['a,g,o,d,o'], ['n,n,e,r,t'], ['g,a,T,A,C'], ['m,i,c,s,r'], ['P,o,P,o,P']]
I need each letter in the nested lists to be a single string so I can use them.
thanks in advance
The split() function splits on whitespace by default. You can fix this by passing the string you want to split on. In this case, that would be a comma. The code below should work.
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            matrix.append(line)
    return matrix
The input format you described conforms to CSV format. Python has a library just for reading CSV files. If you just want to get the job done, you can use this library to do the work for you. Here's an example:
Input(test.csv):
a,string,here
more,strings,here
Code:
>>> import csv
>>> lines = []
>>> with open('test.csv') as file:
...     reader = csv.reader(file)
...     for row in reader:
...         lines.append(row)
...
>>>
Output:
>>> lines
[['a', 'string', 'here'], ['more', 'strings', 'here']]
Using the strip() function will get rid of the new line character as well:
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            line[-1] = line[-1].strip()
            matrix.append(line)
    return matrix
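The three approaches from the answers above can be compared side by side. This is only a sketch: the two-line sample is a hypothetical stand-in for the matrix file, and io.StringIO is used so that no file is needed.

```python
import csv
import io

# Hypothetical two-line sample standing in for the matrix file.
sample = "a,p,p,l,e\na,g,o,d,o\n"

# split() with no argument splits on whitespace, so each row stays one string.
by_whitespace = [line.split() for line in io.StringIO(sample)]

# Stripping first and then splitting on ',' gives one string per letter.
by_comma = [line.strip().split(",") for line in io.StringIO(sample)]

# csv.reader handles both the delimiter and the line ending itself.
by_csv = list(csv.reader(io.StringIO(sample)))

print(by_whitespace)       # [['a,p,p,l,e'], ['a,g,o,d,o']]
print(by_comma)            # [['a', 'p', 'p', 'l', 'e'], ['a', 'g', 'o', 'd', 'o']]
print(by_csv == by_comma)  # True
```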

How to get the unique elements of the first column of a text file?

I am processing a text file whose columns are separated by tabs. I want to get all the unique values of the first column.
Text Input e.g:
"a\t\xxx\t..\zzz\n
a\t\xxx\t....\n
b\t\xxx\t.....\n
b\t\xxx\t.....\n
c\t\xxx\t.....\n"
So in this case I would like to get an array: uniques=["a","b","c"]
Code:
def getData(fin):
    input = open(fin, 'r',encoding='utf-16')
    headers=input.readline().split()
    lines=input.readlines()[1:]
    uniques=[(lambda line: itertools.takewhile(lambda char: char!='\t',line))for line in lines]
Instead of the desired values I get a list of:
<function getData.<locals>.<listcomp>.<lambda> at 0x000000000C46DB70>
I have already read the article "Python: Lambda function in List Comprehensions" and I understood that you have to use parentheses to ensure the right execution order. Still, I get the same result.
You can just use split():
def getData(fin):
    input = open(fin, 'r',encoding='utf-16')
    headers=input.readline().split()
    lines=input.readlines()[1:]
    uniques=[line.split('\t')[0] for line in lines]
Note that this will not produce unique values, it will produce every line's value. To make this unique, do:
uniques = list(set(uniques))
Maybe the csv module can simplify your problem (the result comes from a set, so its order is arbitrary):
>>> import csv
>>> with open(fin, newline='', encoding='utf-16') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter='\t')
...     list(set( row[0] for row in spamreader ))
['a', 'c', 'b']
When looking for unique elements, set() is a good solution:
def getData(fin):
    with open(fin, 'r', encoding='utf-16') as input:
        first_cols = list(set([line.split("\t")[0] for line in input.readlines()]))
    return first_cols
You can use regex:
import re
s = "a\txxx\t..\tzzz\na\txxx\t....\nb\txxx\t.....\nb\txxx\t.....\nc\txxx\t.....\n"
# Capture everything before the first tab on each line.
new_data = re.findall(r'^([^\t\n]+)\t', s, flags=re.MULTILINE)
# Keep only the first occurrence of each value, preserving order.
uniques = [a for i, a in enumerate(new_data) if a not in new_data[:i]]
Output:
['a', 'b', 'c']
Try Pandas
import pandas as pd
df = pd.read_csv(filename, sep='\t')
uniques = df[df.columns[0]].unique()
After
lines=input.readlines()[1:] # reads all lines after the header you already read, and skips the first of them
add
uniques = list(set(x.split('\t')[0] for x in lines))
Caveat: This might reorder your uniques, because a set is unordered.
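If the original order matters, one common alternative to set() is dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed from Python 3.7 on). A minimal sketch, with a hypothetical list of rows standing in for the lines read from the file:

```python
# Hypothetical rows standing in for the lines read after the header.
lines = ["a\txxx\t..\n", "a\txxx\t....\n", "b\txxx\t.....\n",
         "b\txxx\t.....\n", "c\txxx\t.....\n"]

first_column = [line.split("\t")[0] for line in lines]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates without reordering.
uniques = list(dict.fromkeys(first_column))

print(uniques)  # ['a', 'b', 'c']
```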
Your list comprehension needs to start with an expression rather than a lambda. Currently your code just creates a list of lambdas (note that the outermost parentheses enclose a lambda, not an expression). You could fix it like this:
import itertools

def getData(fin):
    input = open(fin, 'r',encoding='utf-16')
    headers=input.readline().split()
    lines=input.readlines()[1:]
    # takewhile yields characters lazily, so join them back into a string
    uniques=[''.join(itertools.takewhile(lambda char: char!='\t',line)) for line in lines]
There are still a couple of bugs in this code: (1) by the time you get to readlines(), the first row will already have been removed from the input buffer, so you should probably drop the [1:]. (2) your uniques variable will have all the entries from the first column, including duplicates.
You could fix these bugs and streamline the code a little more like this:
with open(fin, 'r',encoding='utf-16') as input:
    headers=next(input).split('\t')
    uniques = set(line.split('\t')[0] for line in input)
uniques = list(uniques)
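The point about the comprehension producing lambdas rather than values can be checked in isolation. This is a small sketch with a single hypothetical line of input:

```python
import itertools

line = "a\txxx\t....\n"

# A lambda expression is a function object; nothing runs until it is called.
not_called = [(lambda l: l.split("\t")[0]) for _ in [line]]
print(callable(not_called[0]))  # True

# Calling the function (or using the expression directly) gives the value.
called = [(lambda l: l.split("\t")[0])(l) for l in [line]]
print(called)  # ['a']

# takewhile returns a lazy iterator of characters, so join it to get a string.
prefix = "".join(itertools.takewhile(lambda ch: ch != "\t", line))
print(prefix)  # a
```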
If order doesn't matter, then try this approach.
Open the file, split each line on tabs and, since the first column is all you want, keep only the first field and ignore the rest.
with open('file.txt','r') as f:
    print(set([line.split('\t')[0] for line in f]))
output:
{'b', 'a', 'c'}

How to read a txt file as a list of lists and access it easily in Python?

I have a txt file that goes like this:
1 13
#abor# #e#tun###agy#szel#2# #o##h#d#g ##rkasn#o#oka# #a#tunk e####a#akn##$#$#$##$$$$$$####
1 19
ta###t##ertunk ##gy #zel#####ok hide##f#r##sn#omo#at ##ttu## e#y patak#al$#$$$$$###$$$$$$$
6 19
0/# a #a#akon##uli ##mb## ##l#kok jatszo#####del ####l$$$#$$$$$#$#$#$#$$###$##$$##$$$$#$$$
5 17
a pat#k#a###ar##sok #em#j#l#nt##ztek nyomok volt#k$$$$#$$$#$$$#$#$##$$$$###$####$$#$$$$#$#
But it continues for dozens of lines.
I would like to read this txt file and store it in a list of lists, so that I can easily access each character like list_name[x][y] check if it is a #, etc.
What would be the best way to do this?
To ensure the file is closed after you've read it, use with open(). This method gives you a list of strings, one per line, whose characters can be accessed as you requested. You can use:
with open('file.txt', 'r') as f_in:
    my_list = [line for line in f_in]
# do something with my_list
['1 13\n',
'#abor# #e#tun###agy#szel#2# #o##h#d#g ##rkasn#o#oka# #a#tunk e####a#akn##$#$#$##$$$$$$####\n',
'1 19\n',
'ta###t##ertunk ##gy #zel#####ok hide##f#r##sn#omo#at ##ttu## e#y patak#al$#$$$$$###$$$$$$$\n',
...]
Calling my_list[1][0] then returns # - the first character in the second line.
This leaves the end of line character at the end of each line which you can remove by using my_list = [line.strip() for line in f_in]
Try:
list_name = [list(line) for line in open('myfile.txt')]
Then list_name[n] will be a list of characters from the nth line.
However, please note that, as other answerers have pointed out, strings share list syntax for getting the values (for string s you can use s[n] to get the nth element). Slicing is slightly different for list and strings: for string s = 'abcd', s[1:3] is 'bc', but for s = ['a', 'b', 'c', 'd'], s[1:3] is ['b', 'c']. So the other answers may be more fit for your purpose, depending on what you want to achieve specifically.
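The indexing and slicing behaviour described above can be tried on one line of the sample data. In this sketch the line is shortened from the question's file purely for illustration:

```python
# One text line from the question's sample, shortened for illustration.
line = "#abor# #e#tun###agy#szel#2#"

as_string = line
as_list = list(line)

# Single-character access is identical for both.
assert as_string[0] == as_list[0] == "#"

# Slices keep the type of the container.
print(as_string[1:5])  # abor
print(as_list[1:5])    # ['a', 'b', 'o', 'r']

# Positions of '#' can be collected from either form.
hash_positions = [i for i, ch in enumerate(as_string) if ch == "#"]
print(hash_positions[:3])  # [0, 5, 7]
```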
At first glance, I would do this using list comprehension:
>>> matrix = open('myfile.txt').read()
>>> matrix = [list(item) for item in matrix.split('\n')[:-1]]
You could also put this into a function pretty easily to take a file name and return the matrix.
You don't have to cast the string to list to be able to address the character as line[index].
The only thing you need is to read the file and to run the check function.
def check_the_line(line):
    assert line[5] == "#"
with open('file.txt', 'r') as f_in:
    for line in f_in:
        check_the_line(line)

rstrip not working as expected (Python 2.7)

I have the following code:
file = open("file", "r")
array = file.readlines()
stats = [1, 1, 1, 1, 1] # creating an array to fill
print array
sh1 = array[1] # breaking the array extracted from the text file up for editing
sh2 = array[2]
sh3 = array[3]
sh4 = array[4]
stats[0] = string.rstrip(sh1[1])
stats[1] = string.rstrip(sh2[1])
stats[2] = string.rstrip(sh3[1])
stats[3] = string.rstrip(sh4[1])
print stats
I was expecting it to strip the newlines from the array extracted from the text file and place the new data into a separate array. What is instead happening is I'm having a seemingly random amount of characters stripped from either end of my variables. Please could someone explain what I've done wrong?
sh1,sh2,sh3,sh4 are strings, so sh1[1] is the second character from the string.
rstrip will remove trailing whitespace, so you will put either 1 or 0 character strings into your result array.
I suspect you want something like:
stats = []
for line in open("file").readlines():
    line = line.rstrip()
    stats.append(line)
print stats
or all on one line:
print [ l.rstrip() for l in open("file").readlines() ]
Use list-comprehension.
array = file.readlines()
print [i.rstrip() for i in array]
You should open the file using with, you don't need to call readlines first. You can simply iterate over the file object in a list comprehension calling rstrip on each line:
with open("file") as f: # with closes your file automatically
    stats = [line.rstrip() for line in f]
The reason your results look random is that you are not stripping the lines at all: sh1[1] is the second character of the second line, so string.rstrip(sh1[1]) strips whitespace from that single character, and each entry of stats ends up as a one-character or empty string. Called with no argument, rstrip removes any trailing whitespace; you can also pass a string of specific characters to remove:
In [3]: "foobar".rstrip("bar")
Out[3]: 'foo'
In [4]: "foobar \n".rstrip()
Out[4]: 'foobar'
There is also no way you are removing data from the front of the string unless you are completely stripping the string. Lastly if you actually want to skip the first line and start at line 2 you would simply have to call next(f) on the file object before you iterate in the comprehension.
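Two details from this answer are easy to check directly: the argument to rstrip is treated as a set of characters rather than a suffix, and next(f) skips a line before the comprehension runs. The sketch below is written for Python 3, and the file contents are hypothetical, held in io.StringIO so no file is needed:

```python
import io

# rstrip with an argument treats it as a set of characters, not a suffix.
print("foobar".rstrip("bar"))   # foo
print("foobar".rstrip("rab"))   # foo  (order of the characters is irrelevant)
print(repr("barbar".rstrip("bar")))  # ''  (every trailing character is in the set)

# Hypothetical file contents; next(f) consumes the first line before the loop.
f = io.StringIO("header\n10 \n20\n")
next(f)
stats = [line.rstrip() for line in f]
print(stats)  # ['10', '20']
```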

read line from file but store as list (python)

I want to read a specific line in a text file and store the elements in a list.
My text file looks like this:
'item1' 'item2' 'item3'
I always end up with a list with every letter as an element.
What I tried:
line = file.readline()
for u in line:
    #do something
line = file.readline()
for u in line.split():
    # do stuff
This assumes the items are separated by whitespace.
Split the line by spaces and then add the pieces to the list:
# line = ('item1' 'item2' 'item3') example of line
listed = []
line = file.readline()
for u in line.split(' '):
    listed.append(u)
for e in listed:
    print(e)
What you have there will read one whole line in, and then loop through each character that was in that line. What you probably want to do is split that line into your 3 items. Provided they are separated by a space, you could do this:
line = file.readline() # Read the line in as before
singles = line.split(' ') # Split the line wherever there are spaces found. You can choose any character though
for item in singles: # Loop through all items, in your example there will be 3
    pass # Do something
You can reduce the number of lines (and variables) here by stringing the various functions used together, but I left them separate for ease of understanding.
You can try:
for u in line.split():
Which assumes there are whitespaces between each item. Otherwise you'll simply iterate over a str and thus iterate character by character.
You might also want to do:
u = u.strip('\'')
to get rid of the '
I'd use with and re, and basically take anything between apostrophes. This will work for strings that have spaces inside them (e.g. 'item 1' 'item 2'), but obviously nested quotes or string escape sequences won't be caught.
import re
with open('somefile') as fin:
    print(re.findall("'(.*?)'", next(fin)))
# ['item1', 'item2', 'item3']
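As an alternative to a regular expression, the standard library's shlex.split applies shell-like quoting rules, which also keeps quoted items with inner spaces together. This is a sketch with a hypothetical line, not something from the answers above:

```python
import shlex

# Hypothetical line; the third item contains a space inside the quotes.
line = "'item1' 'item2' 'item 3'\n"

# shlex.split follows shell-like quoting rules, so quoted items stay whole
# and the surrounding quotes are removed.
items = shlex.split(line)
print(items)  # ['item1', 'item2', 'item 3']
```

Whether shell-style quoting matches the real file format is an assumption; if items can contain unbalanced apostrophes, shlex.split raises ValueError.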
If you want all the characters of the line in a list, you could try this.
It uses a double list comprehension.
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word ]
    print(charlist)
If you want to get rid of some characters, you can apply a filter. For example, if I don't want the character ' in my list:
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word if(c != "'")]
    print(charlist)
If this double list comprehension looks strange, it is equivalent to this:
with open('stackoverflow.txt', 'r') as file:
    charlist = []
    line = file.readline()
    for word in line.split(' '):
        for c in word:
            if(c != "'"):
                charlist.append(c)
    print(charlist)
