read line from file but store as list (python)

I want to read a specific line in a text file and store the elements in a list.
My text file looks like this:
'item1' 'item2' 'item3'
I always end up with a list with every letter as an element.
What I tried:
line = file.readline()
for u in line:
    # do something

line = file.readline()
for u in line.split():
    # do stuff
This assumes the items are separated by whitespace.
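For the sample line in the question, split() with no argument splits on any run of whitespace and drops the trailing newline; note the quote characters stay attached to each item:

```python
line = "'item1' 'item2' 'item3'\n"

# split() with no argument splits on any whitespace run and
# ignores the trailing newline; the quotes remain part of each item
items = line.split()
print(items)  # ["'item1'", "'item2'", "'item3'"]
```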

Split the line by spaces and then add the pieces to a list:
# line = "'item1' 'item2' 'item3'"  (example of line)
listed = []
line = file.readline()
for u in line.split(' '):
    listed.append(u)
for e in listed:
    print(e)

What you have there will read one whole line in, and then loop through each character that was in that line. What you probably want to do is split that line into your 3 items. Provided they are separated by a space, you could do this:
line = file.readline()  # Read the line in as before
singles = line.split(' ')  # Split the line wherever spaces are found; you can choose any character, though
for item in singles:  # Loop through all items; in your example there will be 3
    # Do something
You can reduce the number of lines (and variables) here by stringing the various functions together, but I left them separate for ease of understanding.
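As a sketch of that shortening (using a literal line instead of a file, so it runs standalone):

```python
line = "'item1' 'item2' 'item3'\n"

# strip() removes the trailing newline, split(' ') cuts at the spaces;
# chaining the calls replaces the loop and the extra variables
singles = line.strip().split(' ')
print(singles)  # ["'item1'", "'item2'", "'item3'"]
```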

You can try:
for u in line.split():
which assumes there is whitespace between the items. Otherwise you'll simply iterate over a str and thus go character by character.
You might also want to do:
u = u.strip("'")
to get rid of the quotes.
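Both steps fit in one list comprehension; a sketch assuming the quoting shown in the question:

```python
line = "'item1' 'item2' 'item3'"

# split on whitespace, then strip the surrounding quotes from each item
items = [u.strip("'") for u in line.split()]
print(items)  # ['item1', 'item2', 'item3']
```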

I'd use with, re, and basically take anything between apostrophes. (This will work for strings that have spaces inside them, e.g. 'item 1' 'item 2', but nested quotes or string escape sequences won't be caught.)
import re
with open('somefile') as fin:
    print(re.findall(r"'(.*?)'", next(fin)))
# ['item1', 'item2', 'item3']

If you want all the characters of the line in a list, you could try this.
It uses a double list comprehension.
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word]
    print(charlist)
If you want to get rid of some character, you can apply a filter. For example, to keep the ' character out of the list:
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word if c != "'"]
    print(charlist)
If the double list comprehension looks strange, it is equivalent to this:
with open('stackoverflow.txt', 'r') as file:
    charlist = []
    line = file.readline()
    for word in line.split(' '):
        for c in word:
            if c != "'":
                charlist.append(c)
    print(charlist)

Related

read words from file, line by line and concatenate to paragraph

I have a really long list of words, one per line. How do I make a program that takes all of them in and prints them side by side?
I tried making each word an element of a list, but I don't know how to proceed.
Here's the code I've tried so far:
def convert(lst):
    return [i for item in lst for i in item.split()]

lst = [''' -The list of words come here- ''']
print(convert(lst))
If you already have the words in a list, you can use the join() function to concatenate them. See https://docs.python.org/3/library/stdtypes.html#str.join
words = [line.strip() for line in open('your_file.txt')]  # strip the trailing newlines
separator = ' '
print(separator.join(words))
Another, slightly more cumbersome method is to print the words using the builtin print() function while suppressing the newline that print() normally adds to the end of its argument.
words = open('your_file.txt').readlines()
for word in words:
    print(word.strip(), end=' ')  # strip the line's own newline, suppress print's
Try this; example.txt just has a list of words going down line by line.
with open("example.txt", "r") as a_file:
    sentence = ""
    for line in a_file:
        stripped_line = line.strip()
        sentence = sentence + f"{stripped_line} "
    print(sentence)
If your input file is really large and you can't fit it all in memory, you can read the words lazily and write them to disk instead of holding the whole output in memory.
# create a generator that yields each individual line
lines = (l for l in open('words'))
with open("output", "w+") as writer:
    # read the file line by line to avoid memory issues
    while True:
        try:
            line = next(lines)
            # add to the paragraph in the out file
            writer.write(line.replace('\n', ' '))
        except StopIteration:
            break
You can check the working example here: https://replit.com/#bluebrown/readwritewords#main.py

How to convert txt file into 2d array of each char

I am trying to read a text file I created, which looks like this:
small.txt
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
I want to create a 2D array board[21][11] for this specific situation.
I want each char to be in its own cell, because I want to implement BFS and other algorithms to find a specific path; it's a kind of Pacman game.
Here is my code:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = rec.split()
    print(chars)
inner_list = []
for each in chars:
    inner_list.append(each)
output_list.append(inner_list)
print(output_list)
As you can see, the output I get now is [['%%%%%%%%%%%%%%%%%%%%%%%']]
You can just do:
with open('small.txt') as f:
    board = f.readlines()
The file.readlines() method will return a list of strings, which you can then use as a 2D array:
>>> board[1][5]
'e'
Note, that with this approach, the newline characters ('\n') will be put into each row at the last index. To get rid of them, you can use str.rstrip:
board = [row.rstrip('\n') for row in board]
As another answer noted, the line strings are already indexable by integer, but if you really want a list of lists:
array = [list(line.strip()) for line in f]
That removes the line endings and converts each string to a list.
There are a few problems with your code:
you try to split lines into lists of chars using split, but that only splits at spaces
assuming your indentation is correct, your second loop only ever sees the last value of chars
that second loop just wraps each of the (unsplit) lines in chars (which, due to the previous issue, is only the last one) into a list
Instead, you can just convert str to list...
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
... and put those into output_list directly. Also, don't forget to strip the \n:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = list(rec.strip())
    output_list.append(chars)
Or using with for autoclosing and a list comprehension:
with open("small.txt") as f:
    output_list = [list(line.strip()) for line in f]
Note, however, that if you do not want to change the values in that grid, you do not have to convert to a list of lists of chars at all; a list of strings will work just as well.
output_list = list(map(str.strip, f))

Get words from first line until first space and without first character in python

I have a text file from which I want to extract the first word, but without the first character, and put it into a list. Is there a way to do this in Python without using regex?
A text example of what I have looks like:
#blabla sjhdiod jszncoied
where I want the first word, in this case blabla, without the #.
If regex is the only choice, what would the regex look like?
This should do the trick:
l = []
for line in open('file'):
    l.append(line.split()[0][1:])
Edit: If you have empty lines, this will throw an error. You will have to check for them. Here is a possible solution:
l = []
for line in open('file'):
    if line.strip():
        l.append(line.split()[0][1:])
Pythonic way:
my_list = [line.split(' ', 1)[0][1:] for line in open('file') if line.startswith('#')]
a textfile where I want to extract the first word, but without the
first character and put it into a list
result = []
with open('file.txt', 'r') as f:
    l = next(f).strip()  # getting the 1st line
    result.append(l[1:l.find(' ')])
print(result)
The output:
['blabla']
Simple enough if your input is so regular:
s = "#blabla sjhdiod jszncoied"
print(s.split()[0].strip('#'))  # blabla
split() splits on whitespace by default. Take the first token and strip away the '#'.

How to convert a list into float for using the '.join' function?

I have to compress a file into a list of words and list of positions to recreate the original file. My program should also be able to take a compressed file and recreate the full text, including punctuation and capitalization, of the original file. I have everything correct apart from the recreation, using the map function my program can't convert my list of positions into floats because of the '[' as it is a list.
My code is:
text = open("speech.txt")
CharactersUnique = []
ListOfPositions = []
DownLine = False
while True:
    line = text.readline()
    if not line:
        break
    TwoList = line.split()
    for word in TwoList:
        if word not in CharactersUnique:
            CharactersUnique.append(word)
        ListOfPositions.append(CharactersUnique.index(word))
    if not DownLine:
        CharactersUnique.append("\n")
        DownLine = True
    ListOfPositions.append(CharactersUnique.index("\n"))
w = open("List_WordsPos.txt", "w")
for c in CharactersUnique:
    w.write(c)
w.close()
x = open("List_WordsPos.txt", "a")
x.write(str(ListOfPositions))
x.close()
with open("List_WordsPos.txt", "r") as f:
    NewWordsUnique = f.readline()
f.close()
h = open("List_WordsPos.txt", "r")
lines = h.readlines()
NewListOfPositions = lines[1]
NewListOfPositions = map(float, NewListOfPositions)
print("Recreated Text:\n")
recreation = " ".join(NewWordsUnique[pos] for pos in NewListOfPositions)
print(recreation)
The error I get is:
Task 3 Code.py", line 42, in <genexpr>
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions))
ValueError: could not convert string to float: '['
I am using Python IDLE 3.5 (32-bit). Does anyone have any ideas on how to fix this?
Why do you want to turn the position values in the list into floats, since they are list indices, and those must be integers? I suspect this might be an instance of what is called the XY Problem.
I also found your code difficult to understand because you haven't followed PEP 8, the Style Guide for Python Code. In particular, many (although not all) of the variable names are CamelCased, which according to the guidelines should be reserved for class names.
In addition, some of your variables had misleading names, like CharactersUnique, which actually [mostly] contained unique words.
So, one of the first things I did was transform all the CamelCased variables into lowercase underscore-separated words, like camel_case. In several instances I also gave them better names to reflect their actual contents or role: for example, CharactersUnique became unique_words.
The next step was to improve the handling of files by using Python's with statement to ensure they all would be closed automatically at the end of the block. In other cases I consolidated multiple file open() calls into one.
After all that I had it almost working, but that's when I discovered a problem with treating newline "\n" characters as separate words of the input text file. This caused a problem when the file was being recreated by the expression:
" ".join(NewWordsUnique[pos] for pos in NewListOfPositions)
because it adds a space before and after every "\n" character, and those spaces aren't in the original file. To work around that, I ended up writing out the for loop that recreates the file instead of using a list comprehension, because doing so allows the newline "words" to be handled properly.
At any rate, here's the resulting rewritten (and working) code:
input_filename = "speech.txt"
compressed_filename = "List_WordsPos.txt"

# Two lists to represent contents of input file.
unique_words = ["\n"]  # preload with newline "word"
word_positions = []

with open(input_filename, "r") as input_file:
    for line in input_file:
        for word in line.split():
            if word not in unique_words:
                unique_words.append(word)
            word_positions.append(unique_words.index(word))
        word_positions.append(unique_words.index("\n"))  # add newline at end of each line

# Write representations of the two data-structures to compressed file.
with open(compressed_filename, "w") as compr_file:
    words_repr = " ".join(repr(word) for word in unique_words)
    compr_file.write(words_repr + "\n")
    positions_repr = " ".join(repr(posn) for posn in word_positions)
    compr_file.write(positions_repr + "\n")

def strip_quotes(word):
    """Strip the first and last characters from the string (assumed to be quotes)."""
    tmp = word[1:-1]
    return tmp if tmp != "\\n" else "\n"  # newline "words" are special case

# Recreate input file from data in compressed file.
with open(compressed_filename, "r") as compr_file:
    line = compr_file.readline()
    new_unique_words = list(map(strip_quotes, line.split()))
    line = compr_file.readline()
    new_word_positions = map(int, line.split())  # using int, not float here

words = []
lines = []
for posn in new_word_positions:
    word = new_unique_words[posn]
    if word != "\n":
        words.append(word)
    else:
        lines.append(" ".join(words))
        words = []

print("Recreated Text:\n")
recreation = "\n".join(lines)
print(recreation)
I created my own speech.txt test file from the first paragraph of your question and ran the script on it with these results:
Recreated Text:
I have to compress a file into a list of words and list of positions to recreate
the original file. My program should also be able to take a compressed file and
recreate the full text, including punctuation and capitalization, of the
original file. I have everything correct apart from the recreation, using the
map function my program can't convert my list of positions into floats because
of the '[' as it is a list.
Per your question in the comments:
You will want to split the input on spaces. You will also likely want to use different data structures.
# we'll map the words to a list of positions
all_words = {}
with open("speech.txt") as f:
    data = f.read()
# since we need to be able to re-create the file, we'll want
# line breaks
lines = data.split("\n")
for i, line in enumerate(lines):
    words = line.split(" ")
    for j, word in enumerate(words):
        if word in all_words:
            all_words[word].append((i, j))  # line and pos
        else:
            all_words[word] = [(i, j)]
Note that this does not yield maximum compression as foo and foo. count as separate words. If you want more compression, you'll have to go character by character. Hopefully now you can use a similar approach to do so if desired.
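Going character by character can be sketched like this (a toy illustration, not tied to any particular file format): keep a table of unique characters and store each character's index in that table.

```python
def compress_chars(text):
    """Toy character-level compressor: returns a table of unique
    characters plus the index of every character in that table."""
    table = []
    positions = []
    for ch in text:
        if ch not in table:
            table.append(ch)
        positions.append(table.index(ch))
    return table, positions

def decompress_chars(table, positions):
    """Rebuild the original text from the table and the index sequence."""
    return ''.join(table[p] for p in positions)

table, positions = compress_chars("foo foo.\n")
print(decompress_chars(table, positions) == "foo foo.\n")  # True
```

This round-trips exactly, and "foo" and "foo." now share characters; real compressors go further with run-length or entropy coding.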

Read a file and create arrays with the words of each column

I have the following file, salida.txt, which has a varying number of columns; in this example, just 2.
cil HUF, M1 NSS,
442, 1123,
20140130, 2014012,
20140131, 2014014,
I want to read the file and add each column to a new array. I want to end up with this:
['cil HUF', '442', '20140130', '20140131']
[' M1 NSS', '1123', '2014012', '2014014']
What I've tried so far:
file = open('salida.txt', 'r')
for line in file:
    # add them to the arrays
I'm having problems handling the number of arrays (it's not always 2; it depends on the number of columns in the file) and taking each word from the line to add to the proper array. If I put print(line[0]) inside the loop, it prints the entire line, and I want to handle it word by word.
arrays = []
with open('salida.txt', 'r') as wordfile:
    for line in wordfile:
        # Split the line on commas.
        words = line.split(',')
        for count, word in enumerate(words):
            # Remove any whitespace.
            word = word.strip()
            # That might leave a blank string, e.g. at the end.
            if word:
                # Do we need to add another array to our list of arrays?
                if count == len(arrays):
                    arrays.append([])
                arrays[count].append(word)
print(arrays)
Strip the trailing comma and newline, then split the line on the middle comma:
list1, list2 = [], []
file = open('salida.txt', 'r')
for line in file:
    w1, w2 = line.strip().rstrip(',').split(', ')
    list1.append(w1)
    list2.append(w2)
import csv
with open('salida.txt') as f:
    whatYouWant = list(zip(*csv.reader(f)))[:-1]
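The zip(*rows) idiom transposes rows into columns; the trailing commas in the file produce an empty last column, which the [:-1] slice drops. A small self-contained sketch with the sample data (in Python 3, the zip result must be materialized with list() before slicing):

```python
import csv
import io

# in-memory stand-in for salida.txt
data = "cil HUF, M1 NSS,\n442, 1123,\n20140130, 2014012,\n20140131, 2014014,\n"
rows = list(csv.reader(io.StringIO(data)))

# zip(*rows) pairs up the nth field of every row, i.e. transposes
# rows into columns; the last "column" is empty strings from the
# trailing commas, so slice it off
columns = list(zip(*rows))[:-1]
print(columns[0])  # ('cil HUF', '442', '20140130', '20140131')
```

Note that csv.reader keeps the space after each comma, so the second column's values carry a leading space unless you strip() them.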
There you go:
file = open('salida.txt', 'r')
lines = file.readlines()
file.close()
arrays = []
words = lines[0].split(",")
for i in range(0, len(words)):
    arrays.append([words[i]])
for i in range(1, len(lines)):
    words = lines[i].split(",")
    for j in range(0, len(words)):
        arrays[j].append(words[j])
