Loop list and open file if found - python

Not sure where to start with this... I know how to read in a csv file, but if I have a heap of files in the same directory, how can I read them in according to whether they are in a list? For example, a list such as...
l= [['file1.csv','title1','1'], ['file2.csv','title2','1'],['file3.csv','title3','1']]
How can I get just those 3 files even though I have up to 'file20.csv' in the directory?
Can I somehow loop through the list and use an if-statement to check the filenames and open the file if found?

import csv

for filedesc in l:  # go over each sublist in l
    fname, ftitle, _ = filedesc  # unpack the information contained in it
    with open(fname) as f:  # open the file with the appropriate name
        reader = csv.reader(f)  # create a reader for that file
        # go about business
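
If some of the listed files might be missing from the directory, the loop above can check for each one before opening it. A minimal sketch, assuming Python 3; the function name `read_listed_files` and the `directory` parameter are made up for illustration and are not from the original answer:

```python
import csv
import os


def read_listed_files(file_list, directory="."):
    """Return {filename: rows} for every listed file that exists in directory."""
    found = {}
    for fname, ftitle, _ in file_list:
        path = os.path.join(directory, fname)
        if not os.path.isfile(path):
            continue  # listed but not present on disk: skip it
        with open(path, newline="") as f:
            found[fname] = list(csv.reader(f))  # each row is a list of strings
    return found
```

Files that are in the directory but not in the list (file4.csv up to file20.csv in the question) are never touched, because the loop is driven by the list rather than by the directory contents.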

An updated post because I've gotten so close with this....
lfiles= []
csvfiles=[]
for row in l:
    lfiles= row[0] #This reads in just the filenames from list 'l'
    with open(lfiles) as x:
        inread = csv.reader(x)
        for i in x:
            print i
That prints everything in the files that were read in, but now I want to append a row to 'csvfiles' (an empty list) if a particular column equals something.
Probably like this...????
for i in x:
    for line in i:
        if line= 'ThisThingAppears5Times':
            csvfiles.append(line) # and now the 5 lines are in a 2dlist
Of course that doesn't work but close??
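
One way to get there is to iterate over the csv.reader rather than the raw file object x, and to compare a single column with == (a single = is assignment and is a syntax error inside an if). A sketch, assuming Python 3; `collect_matching_rows`, `column`, and `wanted` are illustrative names, and which column to test is an assumption the caller has to supply:

```python
import csv


def collect_matching_rows(file_list, column, wanted):
    """Collect every row whose value at index `column` equals `wanted`."""
    csvfiles = []
    for row in file_list:
        fname = row[0]  # first item of each sublist is the filename
        with open(fname, newline="") as x:
            for record in csv.reader(x):  # each record is a list of strings
                # guard against short rows before indexing
                if len(record) > column and record[column] == wanted:
                    csvfiles.append(record)
    return csvfiles
```

Calling `collect_matching_rows(l, 1, 'ThisThingAppears5Times')` would return a list of rows (a 2D list) in which the second column matches.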

Related

In Python how can I print the element of a list if that element is present in a file

I am trying to identify if any elements of a list are present in a file. I then wish to print both the element and the file name. In future iterations of this, I would eventually like to run the list against multiple files in a folder to produce a list of elements and the files in which they are present.
I have the following:
#imports
import os
os.system('cls')
z = open('fieldnames.txt', 'r')
f = open('TestSQL.sql', 'r')
for element in z.readlines():
    if element in f.read():
        print(element)
    print(f.name)
I am able to print the file name, but the element that was present won't print.
Apologies for any lack of clarity, I am really new to python and appreciate any guidance I can get.
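Read TestSQL.sql once, up front, and then check each field name against that text. In your loop, f.read() consumes the whole file on the first pass and returns an empty string after that, and every element still carries its trailing newline:
with open('TestSQL.sql', 'r') as f:
    sql_text = f.read()  # read the SQL file once
with open('fieldnames.txt', 'r') as z:
    for line in z:
        element = line.strip()  # drop the trailing newline
        if element and element in sql_text:
            print("{} {}".format(element, f.name))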
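
For the later goal of running the list against every file in a folder, the same idea extends with a loop over the directory. A sketch under assumptions: Python 3, plain-text files, and a hypothetical helper named `find_elements`; the `.sql` suffix filter is only an example:

```python
import os


def find_elements(elements, folder, suffix=".sql"):
    """Return (element, filename) pairs for each element found in each file."""
    hits = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(suffix):
            continue  # ignore files of other types
        with open(os.path.join(folder, name), "r") as f:
            text = f.read()  # read each file once
        for element in elements:
            if element and element in text:  # plain substring match
                hits.append((element, name))
    return hits
```

Because this is a substring test, a short field name such as `id` would also match inside a longer word; a stricter match would need word boundaries.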

Reading a list, splitting it, then making it into 3 new lists

I have a file that includes a list of every zip code, city, and state in the US. When I read a list it looks like " '00501', 'Huntsville', 'NY' ".
So what I'm trying to do in Python is:
Open the file, read every single line, split the lines, then create 3 new lists (Zip, City, State) and place all the data from the original list into the new lists.
So far, I have the following code:
def main():
    zipcode = []
    city = []
    state = []
    file_object = open("zipcodes.txt", "r")
    theList = file_object.readlines()
    splitlist = theList.split(',')
    zipcode.append(splitlist[0])
    city.append(splitlist[1])
    state.append(splitlist[2])
    file_object.close()
You have the basics, you are just missing a loop so the process repeats for each line in the file:
theList = file_object.readlines()
for line in theList:
    splitlist = line.split(',')
    zipcode.append(splitlist[0])
    city.append(splitlist[1])
    state.append(splitlist[2])
Keep in mind that readlines() returns a list containing every line of the file, so theList is a list, and a list has no split() method. You have to loop over it and split each line.
Here is a different version you can try:
def main():
    zips = []
    cities = []
    states = []
    with open('zipcodes.txt', 'r') as f:
        for line in f:
            bits = line.split(',')
            zips.append(bits[0])
            cities.append(bits[1])
            states.append(bits[2])
The with statement is the preferred way to open files. It ensures that the file is closed automatically when the block ends.
The for line in f: loop is what you were missing in your code - this will go through each line.
The body of the for loop is exactly what you have written, so that part you had down well.
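
Splitting on ',' alone leaves the surrounding spaces, quotes, and the trailing newline attached to each piece. A small sketch that cleans them up, assuming every line looks like the sample in the question ('00501', 'Huntsville', 'NY'); the function name `split_zip_lines` is made up for illustration:

```python
def split_zip_lines(lines):
    """Split lines like "'00501', 'Huntsville', 'NY'" into three clean lists."""
    zips, cities, states = [], [], []
    for line in lines:
        # strip whitespace first, then the single quotes around each field
        bits = [bit.strip().strip("'") for bit in line.split(",")]
        if len(bits) != 3:
            continue  # skip blank or malformed lines
        zips.append(bits[0])
        cities.append(bits[1])
        states.append(bits[2])
    return zips, cities, states
```

It accepts any iterable of lines, so an open file object can be passed directly inside a with block.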

How to fix issues with comparing the text of 2 files

I have a script that gets the list of distribution switches from device manager "Master-EDR-List.txt". It then grabs another txt file from a different server "New-EDR-List.txt". Master list is pretty static until New list has additional EDRs that Master list is missing.
I would like to compare these 2 files and save any EDR that is in the New list but not in the Master list. I did write a compare script, but it is not reliable. I added some test EDRs to the New list and I am getting unexpected results depending on where I place them in the list. I always get the new ones, but sometimes I also get an EDR that is in both lists, and sometimes I get two new EDRs on the same line with no space between them.
Here is my code:
old_lines = set((line.strip() for line in open('Master-EDR-List.txt', 'r+')))
file_new = open('New-EDR-List.txt', 'r+')
#file_diff = open('file_diff.txt', 'w')
#Open Master File
with open('Master-EDR-List.txt', 'r') as f:
    d = set(f.readlines())
#Open New File
with open('New-EDR-List.txt', 'r') as f:
    e = set(f.readlines())
#Open Diff files to store differences
open('file_diff.txt','w').close()
with open('file_diff.txt', 'a') as f:
    for line in list(e - d):
        f.write(line)
Here are my lists I am using for testing:
Master List:
rts41d-an28edr1.rt.tst.com
rts41d-an28edr2.rt.tst.com
rts41d-an32edr1.rt.tst.com
rts41d-an32edr2.rt.tst.com
rts41d-as19edr1.rt.tst.com
rts41d-as19edr2.rt.tst.com
rts41d-as21edr1.rt.tst.com
rts41d-as21edr2.rt.tst.com
rts12a-ah46edr2.rt.tst.com
rts12a-al46edr2.rt.tst.com
rts12a-as46edr1.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-aw46edr1.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12b-as46edr1.rt.tst.com
rts12b-ax46edr1.rt.tst.com
New List:
rts41d-an28edr1.rt.tst.com
rts41d-an28edr2.rt.tst.com
rts41d-an32edr1.rt.tst.com
rts41d-an32edr2.rt.tst.com
rts41d-as19edr1.rt.tst.com
rts41d-as19edr2.rt.tst.com
rt511-sps5.rt.tst.com
rts41d-as21edr1.rt.tst.com
rts41d-as21edr2.rt.tst.com
rts12a-ah46edr2.rt.tst.com
rts12a-al46edr2.rt.tst.com
rts12a-as46edr1.rt.tst.com
rts12a-as46edr2.rt.tst.com
rt511-sps6.rt.tst.com
rts12a-as46edr2.rt.tst.com
rts12a-aw46edr1.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12a-aw46edr2.rt.tst.com
rts12b-as46edr1.rt.tst.com
rts12b-ax46edr1.rt.tst.com
rt511-sps7.rt.tst.com
I added the test entries rt511-sps5, 6 and 7 to the list, and instead of getting only these 3 items, I am getting this in my Diff file:
Diff File:
rt511-sps7.rt.tst.comrt511-sps5.rt.tst.com
rt511-sps6.rt.tst.com
rts12b-ax46edr1.rt.tst.com
As you can see, sps7 and sps5 are on the same line for some reason, and "rts12b-ax46edr1" should not be there as it is already in both files.
Does anyone know why this is happening and how I can fix it? The New list can discover any new distribution switch and put it anywhere in the list depending on its name. I would like this script to print out only the new EDRs that the Master list does not have.
Thanks
I modified your script. Please use the below code to fulfil your requirement. Do not forget to close all the open files.
#Open Master File
with open('Master-EDR-List.txt', 'r') as f:
    d = f.readlines()
#Open New File
with open('New-EDR-List.txt', 'r') as f:
    e = f.readlines()
#Open Diff file; the with block closes it when done
with open('file_diff.txt', 'a') as out:
    for newline in e:
        found = False
        for oldline in d:
            if newline.strip(' \t\n\r') == oldline.strip(' \t\n\r'):
                found = True
                break
        if not found and newline.strip():
            # normalise the line ending so entries never run together
            out.write(newline.strip() + '\n')
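
The stray and merged entries most likely come from line endings: readlines() keeps each line's trailing newline, so the last line of a file (which often has no newline) compares as different from the same name followed by a newline, and it is written out without a line break. Sets are also unordered, which would explain why the result changes with placement. Stripping every line before comparing avoids both symptoms. A sketch, assuming Python 3; `new_entries` is a name made up here:

```python
def new_entries(master_lines, new_lines):
    """Return entries present in new_lines but not in master_lines, in order."""
    # a set of stripped names makes each membership test fast
    master = {line.strip() for line in master_lines if line.strip()}
    seen = set()
    result = []
    for line in new_lines:
        name = line.strip()
        # skip blanks, known entries, and repeats within the new list
        if name and name not in master and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

The returned names carry no line endings, so they can be written to file_diff.txt with one '\n' added after each.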

Check if there are new strings in a txt file

I am trying to make a function which will compare two txt files. If it recognizes new lines that are in one file but not in the other, it will add them in a list and also in that file that does not contain those new lines. It fails to do that. Here is my function. What am I doing wrong?
newLinks = []
def newer():
    with open('cnbcNewLinks.txt', 'w') as newL:
        for line in open('cnbcCleanedLinks.txt'):
            if line not in "cnbcNewLinks.txt":
                newLinks.append(line)
                newL.write(line)
            else:
                continue
    cleaned = ''.join(newLinks)
    print(cleaned)
I put what @Alex suggested into Python code.
See the docs for set.
I replaced your text file names with a.txt and b.txt to make it easier to read.
# First read the files and compare them using `set`
with open('a.txt', 'r') as newL, open('b.txt', 'r') as cleanL:
    a = set(newL)
    b = set(cleanL)
add_to_cleanL = list(a - b) # lines in newL that are not in cleanL
add_to_newL = list(b - a) # lines in cleanL that are not in newL
# Then open in append mode to add at the end of the file
with open('a.txt', 'a') as newL, open('b.txt', 'a') as cleanL:
    newL.write(''.join(add_to_newL)) # append the list at the end of newL
    cleanL.write(''.join(add_to_cleanL)) # append the list at the end of cleanL
If the files are not big, read each one into a list, convert both lists to sets, and take the set difference twice (once in each direction).
Then append each difference to the file that is missing it.
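
Note that appending raw lines can glue two entries together when a file does not end with a newline. Below is a variation that strips each line and restores the line endings when writing. It is a sketch, assuming Python 3 and small files, with helper names (`read_entries`, `append_entries`, `sync_files`) invented for illustration:

```python
def read_entries(path):
    """Return the non-blank lines of a file with whitespace stripped."""
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]


def append_entries(path, entries):
    """Append entries to path, one per line, keeping line endings intact."""
    if not entries:
        return
    with open(path, "r") as f:
        text = f.read()
    with open(path, "a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # finish the last line before appending
        f.write("\n".join(entries) + "\n")


def sync_files(path_a, path_b):
    """Append to each file the entries it lacks; return both lists."""
    a, b = read_entries(path_a), read_entries(path_b)
    set_a, set_b = set(a), set(b)
    # dict.fromkeys drops repeats while keeping the original order
    missing_from_b = list(dict.fromkeys(x for x in a if x not in set_b))
    missing_from_a = list(dict.fromkeys(x for x in b if x not in set_a))
    append_entries(path_a, missing_from_a)
    append_entries(path_b, missing_from_b)
    return missing_from_a, missing_from_b
```

Unlike the set-based version, this keeps the original order of the appended lines.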

Python Error When Attempting to Iterate Over a List of File Names

I have a list of file names, all of which have the .csv ending. I am trying to use the linecache.getline function to get 2 parts of each csv - the second row, 5th item and the 46th row, 5th item and comparing the two values (they're stock returns).
import csv
import linecache
d = open('successful_scrapes.csv')
csv = csv.reader(d)
k = []
for row in csv:
    k.append(row)
x =linecache.getline('^N225.csv',2)
y = float(x.split(",")[4])
for c in k:
    g = linecache.getline(c,2)
    t = float(g.split(",")[4])
Everything works until the for loop over the k list. It keeps returning the error "Unhashable type: list." I've tried including quotation marks before and after each .csv file name in the list. Additionally, all the files are included in the same directory. Any thoughts?
Thanks!
You're misusing linecache, which is for working with files. There is no point in using it at all if you are going to pull the whole file into memory first.
In this case, since you have the whole CSV copied into k, just do the comparison:
yourComparisonFunction(k[1][4],k[45][4])
Alternately, you could use linecache instead of csv, and do it like so:
import linecache
file_list = ['file1','file2','file3','etc']
for f in file_list:
    line2 = linecache.getline(f,2)
    line2val = float(line2.split(",")[4])
    line46 = linecache.getline(f,46)
    line46val = float(line46.split(",")[4])
And then, I assume, add some comparison logic.
You can read the file and then just append the values to a list depending on the row number.
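import csv
# text mode with newline="" is for Python 3; on Python 2 use open("C/a.csv", "rb")
with open("C/a.csv", newline="") as f:
    reader = csv.reader(f)
    # keep the 5th field of rows 2 and 46 (indexes 1 and 45)
    lst = [x[4] for i, x in enumerate(reader) if i in (1, 45)]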
Then you can do the comparison with the lst's items
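
As for the error in the question: csv.reader yields each row as a list, so every c in k is a list, while linecache.getline expects a filename string; passing the list is what raises "unhashable type: list". Taking the filename out of the row fixes that. A sketch, assuming Python 3 and that the filename sits in the first column of successful_scrapes.csv (an assumption, since the file layout is not shown); `fifth_values` and `values_for_listed_files` are hypothetical names:

```python
import csv


def fifth_values(path, row_numbers=(2, 46)):
    """Return {row number: 5th field as float} for the given 1-based rows."""
    wanted = set(row_numbers)
    values = {}
    with open(path, newline="") as f:
        for number, row in enumerate(csv.reader(f), start=1):
            if number in wanted:
                values[number] = float(row[4])
    return values


def values_for_listed_files(list_path):
    """Read filenames from the first column of list_path and collect values."""
    results = {}
    with open(list_path, newline="") as f:
        for row in csv.reader(f):
            if row:  # skip blank lines
                results[row[0]] = fifth_values(row[0])  # row[0] is a string
    return results
```

Each entry of the result then holds the two returns for one file, ready for whatever comparison is needed.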
