Python readlines function not reading first line in file - python

I am trying to search through a list of files and extract the lines that start with "id". This occurs many times in each file, and often on the first line of the file.
The code I have written so far works, however it seems to miss the first line in each file (the first occurrence of 'id').
for file2 in data_files2:
    with open(file2, 'r') as f: # use context manager to open files
        for line in f:
            lines = f.readlines()
            a=0
            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]
Any suggestions on how I can include this first line of text in the readlines? I tried changing a to -1 so it would include the first line of text (where a=0) but this didn't work.
EDIT:
I need to keep 'a' in my code as an index because I use it later on. The code I showed above was truncated. Here is more of the code for example. Any suggestions on how else I can remove "for line in f:"?
for file2 in data_files2:
    with open(file2, 'r') as f: # use context manager to open files
        for line in f:
            lines = f.readlines()
            a=0
            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]
                    for o in range(a+1,a+7,1):
                        if lines[o].rstrip().split(",")[1]== "visteam":
                            awayteam = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1]== "hometeam":
                            hometeam = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1]== "date":
                            date = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1]== "site":
                            site = lines[o].rstrip().split(",")[2]

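The outer "for line in f:" loop reads the first line before f.readlines() is called, so readlines() only returns the remaining lines. Drop readlines() and iterate over the file directly:
for file2 in data_files2:
    with open(file2, 'r') as f: # use context manager to open files
        for line in f:
            temp_array = line.rstrip().split(",")
            if temp_array[0] == "id":
                game_id = temp_array[1]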
The above should work. It can also be made a bit faster by checking the start of each line first, so that a line is only split into a list when it actually begins with "id,":
for file2 in data_files2:
    with open(file2, 'r') as f: # use context manager to open files
        for line in f:
            if line.startswith("id,"):
                temp_array = line.rstrip().split(",")
                game_id = temp_array[1]
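The skipped first line can be reproduced in isolation. The following is a minimal sketch that uses io.StringIO as a stand-in for a real file; the sample content is made up for illustration.

```python
import io

# Made-up sample content standing in for one of the data files.
f = io.StringIO("id,first\ninfo,a\nid,second\n")

consumed_by_loop = []
returned_by_readlines = []
for line in f:
    # The for loop has already read the first line at this point.
    consumed_by_loop.append(line)
    # readlines() continues from the current position, so it returns
    # everything after the first line and leaves nothing for the loop.
    returned_by_readlines = f.readlines()

print(consumed_by_loop)
print(returned_by_readlines)
```

The loop body runs only once, and the list returned by readlines() never contains the line that the loop itself consumed.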
You can use enumerate to keep track of the current line number. Here is another way, having seen your edit to the question:
for file2 in data_files2:
    with open(file2, 'r') as f: # use context manager to open files
        lines = f.readlines()
        for n, line in enumerate(lines):
            if line.startswith("id,"):
                game_id = line.rstrip().split(",")[1]
                for o in range(n + 1, n + 7):
                    linedata = lines[o].rstrip().split(",")
                    spec = linedata[1]
                    if spec == "visteam":
                        awayteam = linedata[2]
                    elif spec == "hometeam":
                        hometeam = linedata[2]
                    elif spec == "date":
                        date = linedata[2]
                    elif spec == "site":
                        site = linedata[2]
You should also consider using the csv library for working with csv files.
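As a rough illustration of the csv suggestion, here is a minimal sketch that collects one dictionary per "id" record. The sample data, the "info" prefix on the detail lines, and the names parse_games and WANTED are assumptions made for this example; only the field positions (name in column 2, value in column 3) come from the question's code.

```python
import csv
import io

# Hypothetical sample in the shape the question's code implies.
SAMPLE = """id,GAME001
info,visteam,AAA
info,hometeam,BBB
info,date,2010/04/05
info,site,PARK01
id,GAME002
info,visteam,CCC
info,hometeam,DDD
"""

WANTED = {"visteam", "hometeam", "date", "site"}


def parse_games(fileobj):
    """Return a list of dicts, one per 'id' line, with the wanted fields."""
    games = []
    current = None
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        if row[0] == "id" and len(row) > 1:
            current = {"id": row[1]}
            games.append(current)
        elif current is not None and len(row) > 2 and row[1] in WANTED:
            current[row[1]] = row[2]
    return games


games = parse_games(io.StringIO(SAMPLE))
for game in games:
    print(game)
```

Because each detail line is attached to the most recent "id" line, this version needs no index arithmetic and does not fail when fewer than six lines follow an "id" line.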

Related

Eliminate specific number in a data file using Python

I have a large file and I want to delete all the values '24' within the data file. I have used this code but it doesn't do what I want. Suggestions please. Thanks
This is the data file
24,24,24,24,24,24,1000,1000,24,24,24,1000,1000,1000,1000,24,24,24,24,24,24,24,24,24,24,1000,1000,1000,1000,1000,1000,1000,1000,24,24,24,24,1000,1000,1000,1000,24,1000,24,24,24,24,1000,1000,1000,1000,1000,24,24,24,24,24,24,1000,24,24,24,24,1000,1000,1000,1000,1000,1000,1000,1000,1000,24,24,24,24,1000,1000,1000,1000,24,1000,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,1000,1000,24,24,24,24,24,24,1000,1000,1000,24,24,24,24,1000,1000,1000,1000,1000,1000,1000,1000,1000,24,24,24,24,24,24,24,24,24,24,24,24,24,1000,1000,24,24,24,24,24,24,24,24,24,1000,1000,1000,24,24,24,1000,24,24,1000,1000,24,24,24,24,1000,1000,1000,1000,1000,1000,1000,24,24,24,1000,1000,1000,1000,1000,1000,24,24,24,1000,1000,1000,1000,1000,1000,1000,24,24,24,24,1000,1000,24,1000,1000,24,24,1000,1000,1000,1000,1000,1000,1000,24,24,24,1000,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,1000,1000,24,24,24,1000,1000,1000,1000,1000,24,24,24,24,24,24,24,24,1000,1000,1000,1000,1000,24,24,24,24,24,24,1000,24,24,24,24,24,24,24,24,24,1000,1000,1000,1000,1000,1000,24,24,24,24,24,24,24,24,24,24,1000,1000,1000,24,1000,1000,1000,1000,24,24,1000,1000,24,24,24,24,24,24,24,1000,24,24,24,24,24,24,1000,1000,1000,1000,1000,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,24,1000,1000,1000,1000,1000
Code
content = open('txt1.txt', 'r').readlines()
cleandata = []
for line in content:
    line = {i:None for i in line.replace("\n", "").split()}
    for value in line.copy():
        if value == "24":
            line.pop(value)
    cleandata.append(" ".join(line) + "\n")
open('txt2.txt', 'w').writelines(cleandata)
This should do it:
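with open('txt1.txt', 'r') as src:
    content = src.readlines()
cleandata = []
for line in content:
    # Split on commas and drop only the fields that are exactly "24"
    values = [v for v in line.rstrip('\n').split(',') if v != '24']
    cleandata.append(','.join(values) + '\n')
with open('txt2.txt', 'w') as dst:
    dst.writelines(cleandata)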
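You could also use a regex to match each 24 and delete it.
import re
# \b keeps the pattern from matching "24" inside a longer number such as 1240.
# A trailing comma can remain when the last value on a line is 24.
regex24 = re.compile(r"\b24,?\b")
with open('txt1.txt', 'r') as f:
    cleans = [regex24.sub("", line) for line in f.readlines()]
with open('txt2.txt', 'w') as out:
    out.writelines(cleans)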
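Whichever approach is used, it helps to compare fields rather than substrings. The sketch below contrasts a plain substring replacement with a field-wise filter; the helper name drop_value and the sample line are made up for illustration.

```python
def drop_value(line, unwanted="24", sep=","):
    """Remove the fields equal to `unwanted` from one delimited line."""
    fields = line.rstrip("\n").split(sep)
    kept = [field for field in fields if field != unwanted]
    return sep.join(kept) + "\n"


sample = "24,1000,24,1240,24\n"

# Substring replacement also alters 1240 and leaves empty fields behind.
naive = sample.replace("24", "")

# Field-wise filtering removes only the values that are exactly 24.
cleaned = drop_value(sample)

print(repr(naive))
print(repr(cleaned))
```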

Copy last line in python file and modify it before adding a new line

I would like to check for the string "rele" in a text file and, if the string is not present, copy the last line of the file and modify it as shown below to add it as a new entry.
Example:
Actual File: Where "rele" is not present
"123456",1,0,"mher",0,"N",01Jan1986 00:00,130:00,
"123456",1,1,"ermt",0,"N",01Jan1986 00:00,100:00,
"123456",1,2,"irbt",0,"N",01Jan1986 00:00,120:00,
Expected Output:
"123456",1,0,"mher",0,"N",01Jan1986 00:00,130:00,
"123456",1,1,"ermt",0,"N",01Jan1986 00:00,0:00,
"123456",1,2,"irbt",0,"N",01Jan1986 00:00,0:00,
"123456",1,3,"rele",0,"0000",01Jan1986 00:00,0:00,
The last entry of the file is similar to the previous one, except for a few changes to its 3rd, 4th and 6th columns.
My code:
fp = open(srcEtab.txt, 'w')
for line in lines:
    if 'rele' in line:
        foundRelOrPickup = True
if not foundRelOrPickup:
    fp1 = open ( 'srcEtab.txt',"w" )
    lineList = fp1.readlines()
    new_line = lineList[len(lineList)-1]
    fp1.write(new_line)
fp.close()
fp1.close()
with open(yourFile) as f:
    lines = f.readlines()
if not any("rele" in line for line in lines):
    # Build the new entry from the last line, changing columns 3, 4 and 6
    last_line_words = lines[-1].split(',')
    last_line_words[2] = len(lines)
    last_line_words[3] = '"rele"'
    last_line_words[5] = '"0000"'
    lines.append(",".join([str(i) for i in last_line_words]))
with open(otherFile, "w") as f1:
    for line in lines:
        f1.write(line)
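The same idea can be written as a small function that works on a list of lines, which makes it easy to try without touching any file. This is only a sketch: the name add_rele_entry is made up, and it assumes, like the code above, that the third column is a zero-based running index, so the new entry gets the current number of lines.

```python
def add_rele_entry(lines):
    """Return a copy of `lines` with a "rele" entry appended if none exists."""
    result = list(lines)
    if not result or any('"rele"' in line for line in result):
        return result
    # Make sure the current last line is terminated before appending.
    if not result[-1].endswith("\n"):
        result[-1] += "\n"
    fields = result[-1].rstrip("\n").split(",")
    fields[2] = str(len(result))  # assumed: index column counts from 0
    fields[3] = '"rele"'
    fields[5] = '"0000"'
    result.append(",".join(fields) + "\n")
    return result


sample = [
    '"123456",1,0,"mher",0,"N",01Jan1986 00:00,130:00,\n',
    '"123456",1,1,"ermt",0,"N",01Jan1986 00:00,100:00,\n',
    '"123456",1,2,"irbt",0,"N",01Jan1986 00:00,120:00,\n',
]
updated = add_rele_entry(sample)
print(updated[-1], end="")
```

Calling the function a second time on its own output leaves the list unchanged, because a "rele" entry is then already present. The time column is copied as-is; the sketch does not attempt the 0:00 change shown in the expected output.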

How to split text file by id in python

I have a bunch of text files containing tab separated tables. The second column contains an id number, and each file is already sorted by that id number. I want to separate each file into multiple files by the id number in column 2. Here's what I have.
readpath = 'path-to-read-file'
writepath = 'path-to-write-file'
for filename in os.listdir(readpath):
    with open(readpath+filename, 'r') as fh:
        lines = fh.readlines()
    lastid = 0
    f = open(writepath+'checkme.txt', 'w')
    f.write(filename)
    for line in lines:
        thisid = line.split("\t")[1]
        if int(thisid) <> lastid:
            f.close()
            f = open(writepath+thisid+'-'+filename,'w')
            lastid = int(thisid)
        f.write(line)
    f.close()
What I get is simply a copy of all the read files with the first id number from each file in front of the new filenames. It is as if
thisid = line.split("\t")[1]
is only done once in the loop. Any clue to what is going on?
EDIT
The problem was my files used \r rather than \r\n to terminate lines. Corrected code (simply adding 'rU' when opening the read file and swapping != for <>):
readpath = 'path-to-read-file'
writepath = 'path-to-write-file'
for filename in os.listdir(readpath):
    with open(readpath+filename, 'rU') as fh:
        lines = fh.readlines()
    lastid = 0
    f = open(writepath+'checkme.txt', 'w')
    f.write(filename)
    for line in lines:
        thisid = line.split("\t")[1]
        if int(thisid) != lastid:
            f.close()
            f = open(writepath+thisid+'-'+filename,'w')
            lastid = int(thisid)
        f.write(line)
    f.close()
If you're dealing with tab delimited files, then you can use the csv module, and take advantage of the fact that itertools.groupby will do the previous/current tracking of the id for you. Also utilise os.path.join to make sure your filenames end up joining correctly.
Untested:
import os
import csv
from itertools import groupby
readpath = 'path-to-read-file'
writepath = 'path-to-write-file'
for filename in os.listdir(readpath):
    with open(os.path.join(readpath, filename)) as fin:
        tabin = csv.reader(fin, delimiter='\t')
        for file_id, rows in groupby(tabin, lambda L: L[1]):
            with open(os.path.join(writepath, file_id + '-' + filename), 'w') as fout:
                tabout = csv.writer(fout, delimiter='\t')
                tabout.writerows(rows)
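To see what groupby does with the rows, here is a small in-memory sketch. The sample rows and the helper name group_rows_by_id are made up for illustration. Note that groupby only merges consecutive rows with the same key, which is why the input has to be sorted by id, as the question says it is.

```python
from itertools import groupby

# Hypothetical rows as csv.reader would yield them; the id is in column 2.
rows = [
    ["r1", "7", "x"],
    ["r2", "7", "y"],
    ["r3", "9", "z"],
]


def group_rows_by_id(rows, id_column=1):
    """Map each id to its list of rows (assumes rows are sorted by id)."""
    return {
        key: list(group)
        for key, group in groupby(rows, key=lambda row: row[id_column])
    }


grouped = group_rows_by_id(rows)
for file_id, group in grouped.items():
    print(file_id, group)
```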

How to compare all lines in some file with another line?

I am new at Python and need some help.
I have a file with x number of lines. I want to compare each line of that file with another line, and write that line to that file if they are different.
I looked for an answer but didn't find anything that I can use. I tried something myself but it doesn't work.
My code:
filename = ...
my_file = open(filename, 'r+')
for line in my_file:
    new_line = ("text")
    print line
    print new_line
    if new_line == line:
        print('same')
    else:
        print('diffrent')
        my_file.write('%s' % (new_line))
I only want my application to write the line to the file if it doesn't already exist there.
contents of filename
====================
text
text1
text2
In the case above, where the new line is "text", the application shouldn't do anything because that line already exists in the file. However, if the new line is "text3", then it should be written to the file as follows:
contents of filename
====================
text
text1
text2
text3
First, let's read the contents of the file so that we can check if the new line is already in there.
with open(filename, 'r') as f:
    existing_lines = [line.rstrip('\n') for line in f]
Let's say you have a separate list named new_lines that contains all lines you'd like to check against the file. You can then check to see which ones are new as follows:
new = [line for line in new_lines if line not in existing_lines]
These are the lines that you'd then like to append to your existing file:
with open(filename, 'a') as f:
    for line in new:
        f.write(line + '\n')
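The three steps above can be combined into one function. This is a minimal sketch under a few assumptions: the name append_missing_lines is made up, the existing file is assumed to end with a newline, and a set is used for the membership test, which is a swapped-in detail rather than part of the answer above. The usage part writes to a temporary directory so it can be run safely.

```python
import os
import tempfile


def append_missing_lines(path, new_lines):
    """Append each line from new_lines that is not yet in the file at path."""
    with open(path, "r") as f:
        existing = {line.rstrip("\n") for line in f}
    added = []
    for line in new_lines:
        if line not in existing:
            existing.add(line)  # also guards against duplicates in new_lines
            added.append(line)
    if added:
        with open(path, "a") as f:
            for line in added:
                f.write(line + "\n")
    return added


# Usage with a throwaway file holding the contents from the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")
    with open(path, "w") as f:
        f.write("text\ntext1\ntext2\n")
    added = append_missing_lines(path, ["text", "text3", "text3"])
    with open(path) as f:
        final_contents = f.read()

print(added)
print(final_contents, end="")
```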
with open('1.txt') as f1, open('2.txt') as f2, open('diff.txt','w') as dst:
    while True:
        l1 = f1.readline()
        l2 = f2.readline()
        if not l1 and not l2:
            break
        if l1 != l2:
            dst.write(l1)
I would suggest creating a new file and writing the differences to it, rather than editing file2.txt in place.
with open("file1.txt", "r") as first_file, open("file2.txt", "r") as second_file:
    file1 = first_file.readlines()
    file2 = second_file.readlines()
length = min(len(file1), len(file2))
for i in range(length):
    if file1[i].strip() != file2[i].strip():
        pass  # Do something here
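A line-by-line comparison like this can also report which lines differ, including the extra lines when one file is longer. The following is a sketch only; the function name differing_lines and the sample lists are made up, and zip_longest is used here in place of the min(len(...)) bound.

```python
from itertools import zip_longest


def _normalize(line):
    """Strip surrounding whitespace; keep None for a missing line."""
    return None if line is None else line.strip()


def differing_lines(lines_a, lines_b):
    """Return (line_number, a, b) for every position where the inputs differ."""
    diffs = []
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        a, b = _normalize(a), _normalize(b)
        if a != b:
            diffs.append((number, a, b))
    return diffs


diffs = differing_lines(["a\n", "b\n", "c\n"], ["a\n", "x\n"])
print(diffs)
```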

Using python to read txt files and answer questions

a01:01-24-2011:s1
a03:01-24-2011:s2
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
I have this list of data in a txt file, where each line has the form animal name : date : location.
I have to read this txt file to answer questions.
So far I have:
text_file=open("animal data.txt", "r") #open the text file and reads it.
I know how to read one line, but since there are multiple lines here, I'm not sure how I can read every line in the file.
Use a for loop.
text_file = open("animal data.txt","r")
for line in text_file:
    line = line.split(":")
    #Code for what you want to do with each element in the line
text_file.close()
Since you know the format of this file, you can shorten it even more over the other answers:
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        # You now have three variables (animal_name, date, and location)
        # This loop will happen once for each line of the file
        # For example, the first time through will have data like:
        #     animal_name == 'a01'
        #     date == '01-24-2011'
        #     location == 's1'
Or, if you want to keep a database of the information you get from the file to answer your questions, you can do something like this:
animal_names, dates, locations = [], [], []
with open('animal data.txt', 'r') as f:
    for line in f:
        animal_name, date, location = line.strip().split(':')
        animal_names.append(animal_name)
        dates.append(date)
        locations.append(location)
# Here, you have access to the three lists of data from the file
# For example:
#     animal_names[0] == 'a01'
#     dates[0] == '01-24-2011'
#     locations[0] == 's1'
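Once the lines are parsed, typical questions can be answered with a few lines of code. The sketch below is only an illustration: the questions asked (visits per animal, which animals were seen at s2) and the name parse_records are assumptions, and the sample is the first four lines of the data, embedded as a string so the example runs without the file.

```python
from collections import Counter

# First four lines of the data from the question.
SAMPLE = """a01:01-24-2011:s1
a03:01-24-2011:s2
a02:01-24-2011:s2
a03:02-02-2011:s2
"""


def parse_records(text):
    """Return a list of (animal_name, date, location) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        animal_name, date, location = line.split(":")
        records.append((animal_name, date, location))
    return records


records = parse_records(SAMPLE)

# How many times does each animal appear?
visits_per_animal = Counter(name for name, _, _ in records)

# Which animals were recorded at location s2, in file order?
seen_at_s2 = [name for name, _, location in records if location == "s2"]

print(visits_per_animal)
print(seen_at_s2)
```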
You can use a with statement to open the file; it ensures the file is closed even if an error occurs while reading it.
with open('data.txt', 'r') as f_in:
    for line in f_in:
        line = line.strip() # remove all whitespace at start and end
        field = line.split(':')
        # field[0] = animal name
        # field[1] = date
        # field[2] = location
You are not closing the file. It is better to use the with statement, which ensures the file gets closed.
with open("animal data.txt","r") as file:
    for line in file:
        line = line.split(":")
        # Code for what you want to do with each element in the line
