I want to transfer data from text file to array - python

i'm new here and also new with programming with python
as an exercise i have to read data (lat & lon) from a txt file with many rows and convert them into shapefile with QGIS
After reading i find a way to extract data into array, as step1, but i have soem issues..
I use the following code
X=[]
Y=[]
f = open('D:/test_data/test.txt','r')
for line in f:
triplets=f.readline().split() #error
X=X.append(triplets[0])
Y=Y.append(triplets[1])
f.close()
for i in X:
print X[i]
with error:
ValueError: Mixing iteration and read methods would lose data
Propably it's a warning for losing the rest rows but i really don't want them for now.

for line in f: already iterates through the lines in the file, reading as it goes along. As such, it should be:
for line in f:
triplets = line.split()
Alternatively, you could do as below, though I recommend the method above.
with open('D:/test_data/test.txt','r') as f:
content = f.readlines()
for line in content:
triplets = line.split()
# append()
See Reading and Writing Files in python for more info.
Also, append() does what it sounds like, so you don't need assignment.
X.append(triplets[0]) # not X=X.append(triplets[0)

line already is the line. Get the triplets by
triplets = line.split()

Related

How to remove specific lines from a text file, using Python

I have a text file with data in it. I am trying to create a python code that will format this data in a particular way so another code I have can read it as an input. So, I am trying to remove specific lines and columns, etc. The text file contains some information at the top of the file, and then columns and columns of numerical data below it. However, I only want the numerical data. I do not want the first 34 lines of other information written in the text file.
So, my current question is: How can I remove specific lines of a text file? And how do I print this result so I can see if it worked or not?
Before you come at me: Yes, I have looked up my question on StackOverflow. I still don't get it, and I don't know what I'm doing wrong. I need your help! :)
(Image of my text file for reference)
Since each of the lines I want to remove in the text file begin with a '#', maybe it would be possible to tell the code to remove all lines that begin with "#'. Or maybe I should just specify I want the first 34 lines deleted. Either way, I'm confused on how to do this.
Separated below are all my different attempts: (Don't worry, I have imported pandas for my attempts, it's just not written here.)
with open('EBSD_data.txt','r') as f:
lines = f.readlines()
with open('EBSD_data.txt','w') as f:
for lines in lines:
if line.strip("\n").startswith('#'):
f.write(lines)
with open('EBSD_data.txt', 'r') as E:
data = E.read().splitlines(True)
with open('EBSD_data.txt', 'w') as EB:
EB.writelines(data[34:])
print(EB.read)
with open('EBSD_data.txt', 'w') as data:
for lines in data:
if not lines.startswith('#'):
data.write(lines)
with open('EBSD_data.txt', 'r') as data2:
print(data2)
x = open('EBSD_data.txt')
for line in x:
if line.startswith('#'):
del(line)
x.head()
lines = []
with open(r'EBSD_data.txt','r') as fp:
lines = fp.readlines()
with open(r'EBSD_data.txt','w') as data:
for number, line in enumerate(lines):
if number not in [34]:
fp.write(line)
With some of these attempts, it'll look like it runs fine, but then I have issues with printing my results. The typical error I am encountering after trying to print my result is:
<_io.TextIOWrapper name='EBSD_data.txt' mode='w' encoding='UTF-8'>
Any guidance you could give would be great! Thank you! :)
I personally would always write to a new file and keep the original one...\
This should work:
with open('EBSD_data.txt', 'r') as data:
with open('EBSD_data_filtered.txt', 'w') as outfile:
for line in data:
if not line.startswith('#'):
outfile.write(line)
with open('EBSD_data_filtered.txt', 'r') as data2:
for line in data2:
print(line)
Instead of manually reading each line with a Python context manager, you can use read_csv, skipping the first 34 rows of the file (skiprows=34), and without reading any column names from the file (header=None):
import pandas as pd
df = pd.read_csv('EBSD_data.txt', skiprows=34, header=None)
# print first five rows to confirm expected result
print(df.head())
# Optional: write result back to a CSV file, without the default
# pandas integer index
df.to_csv('EBSD_data_filtered.txt', index=None)

When writing to a file, can you be specific about where you want to write?

I have a text file, which has the following:
20
15
10
And I have the following code:
test_file = open("test.txt","r")
n = 21
line1 = test_file.readline(1)
line2 = test_file.readline(2)
line3 = test_file.readline(3)
test_file.close()
line1 = int(line1)
line2 = int(line2)
line3 = int(line3)
test_file = open("test.txt","a")
if n > line1:
test_file.write("\n")
n = str(n)
test_file.write(n)
test_file.close()
This code checks if the variable 'n' is bigger than line 1. What I wanted it to do is if it is bigger than line 1, it should be written in a line before the previous line 1. However this code will write it at the bottom of the file. Is there anything I can do to write something where I want to and not at the bottom of the file?
Any help is appreciated.
You can put your whole data in a variable, edit that variable then overwrite the information in the file.
with open('test.txt', 'r') as file:
# read a list of lines into data
data = file.readlines()
# now change the 2nd line, note that you have to add a newline
data[1] = "42\t\n"
# and write everything back
with open('test.txt', 'w') as file:
file.writelines( data )
This is a short answer, implement your own algorithm to solve your own problem.
As correctly pointed out by Amadan in a comment, the only way to obtain this result is a complete rewrite of the file.
This, clearly depending on how strict your requirements are, is fairly inefficient.
If you want to understand more about inefficiency just imagine the actions you would have to manually take to write a new 1st line in a physical notebook page.
Since the 1st line is already written you would have to turn the page, write the new first line, then copy again all the lines from the old page and, finally, tear the 1st page out and have your perfect notebook with a perfect page again.
You are writing with pen so there is no possibility to delete, only a new page will do the trick.
That is quite some work!
This is - well, more or less - what Python is doing behind the scenes when it is opening for reading (the 'r' part in my examples below) and then opening for writing (the 'w' part) the same file again.
As a general idea imagine that when you see for loops there is a lot of work to do.
I will clumsily over-simplify saying that the more the for loops the slower the code (countless pages of paper have been written by brilliant minds on performances, I suggest you diving dive deeper and searching for "Big O notation" using your preferred search engine. Here's an example: https://www.freecodecamp.org/news/all-you-need-to-know-about-big-o-notation-to-crack-your-next-coding-interview-9d575e7eec4/).
A better solution would be to change your data file and make sure that the last value is also the most recent one.
Rewriting the file is as easy as writing an empty file, code and result are identical.
The trick here is that we have in memory (in the variables data and new_data) everything we need.
In data we store the whole content of the file before the change.
In new_data we can easily apply the needed modification because it is just a list containing a number and a newline (\n) for each list item.
Once new_data contains the data in the desired order all we need to do is write that list into a file.
Here's a possible solution, as close as possible to your code:
n = 21
with open('test.txt', 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (n > first_entry):
new_data = []
new_value = str(n) + "\n"
new_data.append(new_value)
for item in data:
new_data.append(item)
with open('test.txt', 'w') as file:
file.writelines(new_data)
Here's a more portable one:
def prepend_to_file_if_bigger_than_first_line(filename, value):
"""Checks if value is bigger than the one found in the 1st line of the specified file,
if true prepends it to the file
Args:
filename (str): The file name to check.
value (str): The value to check.
"""
with open(filename, 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (value > first_entry):
new_value = "{}\n".format(value)
new_data = []
new_data.append(new_value)
for old_value in data:
new_data.append(old_value)
with open(filename, 'w') as file:
file.writelines(new_data)
prepend_to_file_if_bigger_than_first_line("test.txt", 301)
As bonus some food for thought and exercises to learn:
What if instead of rewriting everything you just add a new line to the end of the page? Wouldn't it be more efficient and effective?
How would you re-implement my function above just to check the last line in file and append a new value?
Try bench-marking the prepend and the append solution, which one is best?

While reading text file some lines are not detected?

text_file.txt
I am getting the output for first print statement but not for second print statement.Please sugget me the correct code is there anything i have to encode or decode? please help me i m new to python3
Here's a more straightforward implementation of what you're trying to achieve. You can read the file into a Python list and reference each line by a Python list index
with open('text_file.txt','r') as f: # automatically closes the file
input_file = f.readlines() # Read all lines into a Python list
for line_num in range(len(input_file)):
if "INBOIS BERCUKAI" in input_file[line_num]:
print(input_file[line_num + 2]) # offset by any number you want
# same for other if statements

Python removing duplicates and saving the result

I am trying to remove duplicates of 3-column tab-delimited txt file, but as long as the first two columns are duplicates, then it should be removed even if the two has different 3rd column.
from operator import itemgetter
import sys
input = sys.argv[1]
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
for line in input.splitlines():
key = ig(line.split())
if key not in seen:
data.append(line)
seen.add(key)
file = open(output, "w")
file.write(data)
file.close()
First, I get error
key = ig(line.split())
IndexError: list index out of range
Also, I can't see how to save the result to output.txt
People say saving to output.txt is a really basic matter. But no tutorial helped.
I tried methods that use codec, those that use with, those that use file.write(data) and all didn't help.
I could learn MatLab quite easily. The online tutorial was fantastic and a series of Googling always helped a lot.
But I can't find a helpful tutorial of Python yet. This is obviously because I am a complete novice. For complete novices like me, what would be the best tutorial with 1) comprehensiveness AND 2) lots of examples 3) line by line explanation that dosen't leave any line without explanation?
And why is the above code causing error and not saving result?
I'm assuming since you assign input to the first command line argument with input = sys.argv[1] and output to the second, you intend those to be your input and output file names. But you're never opening any file for the input data, so you're callling .splitlines() on a file name, not on file contents.
Next, splitlines() is the wrong approach here anyway. To iterate over a file line-by-line, simply use for line in f, where f is an open file. Those lines will include the newline at the end of the line, so it needs to be stripped if it's not supposed to be part of the third columns data.
Then you're opening and closing the file inside your loop, which means you'll try to write the entire contents of data to the file every iteration, effectively overwriting any data written to the file before. Therefore I moved that block out of the loop.
It's good practice to use the with statement for opening files. with open(out_fn, "w") as outfile will open the file named out_fn and assign the open file to outfile, and close it for you as soon as you exit that indented block.
input is a builtin function in Python. I therefore renamed your variables so no builtin names get shadowed.
You're trying to directly write data to the output file. This won't work since data is a list of lines. You need to join those lines first in order to turn them in a single string again before writing it to a file.
So here's your code with all those issues addressed:
from operator import itemgetter
import sys
in_fn = sys.argv[1]
out_fn = sys.argv[2]
getkey = itemgetter(0, 1)
seen = set()
data = []
with open(in_fn, 'r') as infile:
for line in infile:
line = line.strip()
key = getkey(line.split())
if key not in seen:
data.append(line)
seen.add(key)
with open(out_fn, "w") as outfile:
outfile.write('\n'.join(data))
Why is the above code causing error?
Because you haven't opened the file, you are trying to work with the string input.txtrather than with the file. Then when you try to access your item, you get a list index out of range because line.split() returns ['input.txt'].
How to fix that: open the file and then work with it, not with its name.
For example, you can do (I tried to stay as close to your code as possible)
input = sys.argv[1]
infile = open(input, 'r')
(...)
lines = infile.readlines()
infile.close()
for line in lines:
(...)
Why is this not saving result?
Because you are opening/closing the file inside the loop. What you need to do is write the data once you're out of the loop. Also, you cannot write directly a list to a file. Hence, you need to do something like (outside of your loop):
outfile = open(output, "w")
for item in data:
outfile.write(item)
outfile.close()
All together
There are other ways of reading/writing files, and it is pretty well documented on the internet but I tried to stay close to your code so that you would understand better what was wrong with it
from operator import itemgetter
import sys
input = sys.argv[1]
infile = open(input, 'r')
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
lines = infile.readlines()
infile.close()
for line in lines:
print line
key = ig(line.split())
if key not in seen:
data.append(line)
seen.add(key)
print data
outfile = open(output, "w")
for item in data:
outfile.write(item)
outfile.close()
PS: it seems to produce the result that you needed there Python to remove duplicates using only some, not all, columns

Reading from text file into python list

Very new to python and can't understand why this isn't working. I have a list of web addresses stored line by line in a text file. I want to store the first 10 in an array/list called bing, the next 10 in a list called yahoo, and the last 10 in a list called duckgo. I'm using the readlines function to read the data from the file into each array. The problem is nothing is being written to the lists. The count is incrementing like it should. Also, if I remove the loops altogether and just read the whole text file into one list it works perfectly. This leads me to believe that the loops are causing the problem. The code I am using is below. Would really appreciate some feedback.
count=0;
#Open the file
fo=open("results.txt","r")
#read into each array
while(count<30):
if(count<10):
bing = fo.readlines()
count+=1
print bing
print count
elif(count>=10 and count<=19):
yahoo = fo.readlines()
count+=1
print count
elif(count>=20 and count<=29):
duckgo = fo.readlines()
count+=1
print count
print bing
print yahoo
print duckgo
fo.close
You're using readlines to read the files. readlines reads all of the lines at once, so the very first time through your loop, you exhaust the entire file and store the result in bing. Then, every time through the loop, you overwrite bing, yahoo, or duckgo with the (empty) result of the next readlines call. So your lists all wind up being empty.
There are lots of ways to fix this. Among other things, you should consider reading the file a line at a time, with readline (no 's'). Or better yet, you could iterate over the file, line by line, simply by using a for loop:
for line in fo:
...
To keep the structure of your current code you could use enumerate:
for line_number, line in enumerate(fo):
if condition(line_number):
...
But frankly I think you should ditch your current system. A much simpler way would be to use readlines without a loop, and slice the resulting list!
lines = fo.readlines()
bing = lines[0:10]
yahoo = lines[10:20]
duckgo = lines[20:30]
There are many other ways to do this, and some might be better, but none are simpler!
readlines() reads all of the lines of the file. If you call it again, you get empty list. So you are overwriting your lists with empty data when you iterate through your loop.
You should be using readline() instead of readlines()
readlines() reads the entire file in at once, whereas readline() reads a single line from the file.
I suggest you rewrite it like so:
bing = []
yahoo = []
duckgo = []
with open("results.txt", "r") as f:
for i, line in enumerate(f):
if i < 10:
bing.append(line)
elif i < 20:
yahoo.append(line)
elif i < 30:
duckgo.append(line)
else:
raise RuntimeError, "too many lines in input file"
Note how we use enumerate() to get a running count of lines, rather than making our own count variable and needing to increment it ourselves. This is considered good style in Python.
But I think the best way to solve this problem would be to use itertools like so:
import itertools as it
with open("results.txt", "r") as f:
bing = list(it.islice(f, 10))
yahoo = list(it.islice(f, 10))
duckgo = list(it.islice(f, 10))
if list(it.islice(f, 1)):
raise RuntimeError, "too many lines in input file"
itertools.islice() (or it.islice() since I did the import itertools as it) will pull a specified number of items from an iterator. Our open file-handle object f is an iterator that returns lines from the file, so it.islice(f, 10) pulls exactly 10 lines from the input file.
Because it.islice() returns an iterator, we must explicitly expand it out to a list by wrapping it in list().
I think this is the simplest way to do it. It perfectly expresses what we want: for each one, we want a list with 10 lines from the file. There is no need to keep a counter at all, just pull the 10 lines each time!
EDIT: The check for extra lines now uses it.islice(f, 1) so that it will only pull a single line. Even one extra line is enough to know that there are more than just the 30 expected lines, and this way if someone accidentally runs this code on a very large file, it won't try to slurp the whole file into memory.

Categories