AttributeError: FileInput instance has no attribute '__exit__' - python

I am trying to read from multiple input files and print the second column from each file next to each other as a table:
import sys
import fileinput

with fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt ')) as f:
    for line in f:
        proc(line)

def proc(line):
    parts = line.split("&")  # split line into parts
    if "&" in line:          # if at least 2 parts/columns
        print parts[1]       # print column 2
But I get "AttributeError: FileInput instance has no attribute '__exit__'".

The problem is that, as of Python 2.7.10, the fileinput module does not support being used as a context manager (i.e., with the with statement), so you have to handle closing the sequence yourself. The following should work:
f = fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt '))
for line in f:
    proc(line)
f.close()
Note that in Python 3 (since 3.2), you can use this module as a context manager.
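For example, on Python 3.2 or later the original approach works as-is; here is a minimal sketch using the file names from the question:
import fileinput

def proc(line):
    parts = line.split("&")
    if "&" in line:
        print(parts[1])  # print() function on Python 3

# fileinput.input is usable as a context manager on Python 3.2+
with fileinput.input(files=('cutflow_TTJets_1l.txt', 'cutflow_TTJets_1l.txt')) as f:
    for line in f:
        proc(line)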
For the second part of the question, assuming that each file is similarly formatted with an equal number of data lines of the form xxxxxx & xxxxx, one can make a table of the data from the second column of each file as follows:
Start with an empty list to be a table where the rows will be lists of second column entries from each file:
table = []
Now iterate over all lines in the fileinput sequence, using fileinput.isfirstline() to check whether we are at a new file, and if so start a new row:
for line in f:
    if fileinput.isfirstline():
        row = []
        table.append(row)
    parts = line.split('&')
    if len(parts) > 1:
        row.append(parts[1].strip())
f.close()
Now table will be the transpose of what you really want, which is each row containing the second-column entries of a given line of each file. To transpose the list, one can use zip, then loop over the rows of the transposed table, using the join string method to print each row with a comma separator (or whatever separator you want):
for row in zip(*table):
    print(', '.join(row))
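Putting the pieces together, here is a complete sketch of the above (same file names as the question; on Python 2 the explicit f.close() is needed, as noted):
import fileinput

table = []
f = fileinput.input(files=('cutflow_TTJets_1l.txt', 'cutflow_TTJets_1l.txt'))
for line in f:
    if fileinput.isfirstline():  # a new file starts -> start a new row of the table
        row = []
        table.append(row)
    parts = line.split('&')
    if len(parts) > 1:           # keep only lines that actually have a second column
        row.append(parts[1].strip())
f.close()

for row in zip(*table):          # transpose: one output row per data line
    print(', '.join(row))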

If an object has a close method but is not a context manager, you can use contextlib.closing:
import sys
import fileinput
from contextlib import closing

with closing(fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt '))) as f:
    for line in f:
        proc(line)

Related

How to filter lines by column in Python

I need to filter some lines of a .csv file:
2017/06/07 10:42:35,THREAT,url,192.168.1.100,52.25.xxx.xxx,Rule-VWIRE-03,13423523,,web-browsing,80,tcp,block-url
2017/06/07 10:43:35,THREAT,url,192.168.1.101,52.25.xxx.xxx,Rule-VWIRE-03,13423047,,web-browsing,80,tcp,allow
2017/06/07 10:43:36,THREAT,end,192.168.1.102,52.25.xxx.xxx,Rule-VWIRE-03,13423047,,web-browsing,80,tcp,block-url
2017/06/07 10:44:09,TRAFFIC,end,192.168.1.101,52.25.xxx.xxx,Rule-VWIRE-03,13423111,,web-browsing,80,tcp,allow
2017/06/07 10:44:09,TRAFFIC,end,192.168.1.103,52.25.xxx.xxx,Rule-VWIRE-03,13423111,,web-browsing,80,tcp,block-url
I want to filter lines containing the string "THREAT" in the second column AND containing the IPs 192.168.1.100 and 192.168.1.101 in the fourth column.
This is my implementation so far:
import csv
file = open(file.log, 'r')
f = open(column, 'w')
lines = file.readlines()
for line in lines:
    input = raw_input()
    col = line.split(',')
    if line.find(col[1]) == "THREAT":
        f.write(line)
    if line.find(col[3] == 192.168.1.100 && 192.168.101:
        f.write(line)
    else:
        pass
f.close()
file.close()
What is wrong with the code? This is the output I'm expecting to get:
2017/06/07 10:42:35,THREAT,url,192.168.1.100,52.25.xxx.xxx,Rule-VWIRE-03,13423523,,web-browsing,80,tcp,block-url
2017/06/07 10:43:35,THREAT,url,192.168.1.101,52.25.xxx.xxx,Rule-VWIRE-03,13423047,,web-browsing,80,tcp,allow
You use the str.find method, which returns the index if the substring is found and -1 otherwise. In your case, if, for example, "THREAT" is in the line, find returns some number, but then you compare that number with a string, which always evaluates to False.
Also, you can combine those if statements into one. Given your expected output, a line should be kept only when both conditions hold, so your if statement should be:
if col[1] == "THREAT" and col[3] in ["192.168.1.100", "192.168.1.101"]:
    f.write(line)
In addition, I don't understand why you call raw_input on each iteration and never use that value.
I suggest this slightly optimized version of your code:
import csv  # not used in the provided snippet, could be deleted

file_log = open("file.log", 'r')    # better to use an absolute path
filtered_log = open("column", 'w')  # same as above
for line in file_log:  # no need to read the entire file; just iterate over it line by line
    col = line.split(',')
    if len(col) > 3 and col[1] == "THREAT" and col[3] in ["192.168.1.100", "192.168.1.101"]:
        filtered_log.write(line)
file_log.close()
filtered_log.close()
Python's csv module provides a reader object which can be used to iterate over the lines of a .csv file.
In each line, you can extract a column by its index and apply some comparison logic before printing the line.
This implementation will filter the file as needed:
import csv

ip_list = ['192.168.1.100', '192.168.1.101']
with open('file.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for line in reader:
        if (line[1] == "THREAT") and (line[3] in ip_list):
            print(','.join(line))
As you can see, this implementation stores the IPs in a list and compares against them using Python's in operator.
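If you want the filtered lines written to a file rather than printed, a small variation of the same idea (the output name 'filtered.csv' is just a placeholder):
import csv

ip_list = ['192.168.1.100', '192.168.1.101']
with open('file.csv', 'r') as src, open('filtered.csv', 'w') as dst:
    writer = csv.writer(dst, lineterminator='\n')
    for line in csv.reader(src):
        if line[1] == "THREAT" and line[3] in ip_list:
            writer.writerow(line)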

Trying to remove rows from a csv file based on a column value

I'm trying to remove duplicated rows in a csv file based on whether a column has a unique value. My code looks like this:
import fileinput

seen = set()
for line in fileinput.FileInput('DBA.csv', inplace=1):
    if line[2] in seen:
        continue  # skip duplicated line
    seen.add(line[2])
    print(line, end='')
I'm trying to get the value of the column at index 2 in every row and check if it's unique. But for some reason my seen set looks like this:
{'b', '"', 't', '/', 'k'}
Any advice on where my logic is flawed?
You're reading your file line by line, so when you pick line[2] you're actually picking the third character of each line.
If you want to capture the value of the column at index 2 for each row, you need to parse your CSV first, something like:
import csv

seen = set()
with open("DBA.csv", "rUb") as f:
    reader = csv.reader(f)
    for line in reader:
        if line[2] in seen:
            continue
        seen.add(line[2])
        print(line)  # this will NOT print valid CSV, it will print a Python list
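If you do want valid CSV on standard output instead of Python lists, a sketch using csv.writer:
import csv
import sys

seen = set()
with open("DBA.csv", "r") as f:
    writer = csv.writer(sys.stdout, lineterminator='\n')
    for line in csv.reader(f):
        if line[2] in seen:
            continue
        seen.add(line[2])
        writer.writerow(line)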
If you want to edit your CSV in place, I'm afraid it will be a bit more complicated than that. If your CSV is not huge, you can load it into memory, truncate the file, and then write back your lines:
import csv

seen = set()
with open("DBA.csv", "rUb+") as f:
    handler = csv.reader(f)
    data = list(handler)
    f.seek(0)
    f.truncate()
    handler = csv.writer(f)
    for line in data:
        if line[2] in seen:
            continue
        seen.add(line[2])
        handler.writerow(line)
Otherwise you'll have to read your file line by line, passing a buffer to csv.reader() to parse each line and check the value of its third column; if the value hasn't been seen, write the line to the live-edited file, and if it has, seek back to the beginning of the previous line before writing the next one, and so on.
Of course, you don't need to use the csv module if you know your line structure well, which can simplify things (you won't need to deal with passing buffers left and right), but for a universal solution it's highly advisable to let the csv module do your bidding.
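A simpler variant of that streaming idea, assuming you can afford a temporary file next to the original (Python 3; run from the directory containing DBA.csv):
import csv
import os
import tempfile

seen = set()
with open('DBA.csv', 'r', newline='') as src, \
        tempfile.NamedTemporaryFile('w', newline='', delete=False, dir='.') as tmp:
    writer = csv.writer(tmp)
    for row in csv.reader(src):
        if row[2] in seen:
            continue  # skip duplicated row
        seen.add(row[2])
        writer.writerow(row)
os.replace(tmp.name, 'DBA.csv')  # atomically swap the filtered file into place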

How do I convert a table in notepad into CSV format?

I have this table of data in Notepad.
But it's not really a table, because there aren't official columns; it just looks like a table, with the data aligned using spaces.
I want to convert it into CSV format. How should I go about doing this?
The pandas Python package I am using for data analysis works best with CSV, as far as I understand.
Here is a hack-job Python script to do exactly what you need. Just save the script as a Python file and run it with the path of your input file as the only argument.
UPDATED: After reading the comments to my answer, my script now uses regular expressions to account for any number of spaces.
import re
from sys import argv

output = ''
with open(argv[1]) as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            line = line.strip()
            line = re.sub(r'\s+', ',', line) + '\n'
        else:
            line = re.sub(r'\s\s+', ',', line)
        output += line
with open(argv[1] + '.csv', 'w') as f:
    f.write(output)
So this goes into a file (call it csvify.py) and is executed as:
python csvify.py <input_file_name>
csvify.py:
from sys import argv
from re import finditer

# Return the fields of a line separated by commas
def comma_delimit(line, ranges):
    return ','.join(get_field(line, ranges))

# Yield each field in an appropriate format
def get_field(line, ranges):
    for span in ranges:  # iterate through the column ranges
        field = line[slice(*span)].strip()  # slice out the field and trim whitespace
        # Use str() if the field doesn't contain commas, otherwise repr() to quote it
        yield (repr if ',' in field else str)(field)

# Open the input text file from the command line (read-only, closed automatically)
with open(argv[1], 'r') as inp:
    # Convert the first line (assumed to be the header) into range indexes.
    # finditer splits the line at word boundaries, including trailing spaces;
    # this assumes no spaces within header names.
    # list() is needed so the ranges can be reused for every line on Python 3.
    columns = list(map(lambda match: match.span(), finditer(r'\b\w+\s*', inp.readline())))
    inp.seek(0)  # reset the file pointer to the beginning to include the header line
    # Create the new CSV based on the input file name
    with open(argv[1] + '.csv', 'w') as txt:
        # Convert each line and join them all with newlines
        txt.write('\n'.join(comma_delimit(line, columns) for line in inp.readlines()))
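Since the question mentions pandas anyway, an alternative sketch is to let pandas infer the column boundaries from the alignment with read_fwf and write the CSV itself ('table.txt' and 'table.csv' are placeholder names):
import pandas as pd

df = pd.read_fwf('table.txt')        # infers fixed-width column boundaries from the alignment
df.to_csv('table.csv', index=False)  # write it back out as CSV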

String Replace in csv

Below, I am trying to replace data in a csv. The code works, but it replaces anything matching stocklevel in the file.
def updatestocklevel(quantity, stocklevel, code):
    newlevel = stocklevel - quantity
    stocklevel = str(stocklevel)
    newlevel = str(newlevel)
    s = open("stockcontrol.csv").read()
    s = s.replace(stocklevel, newlevel)  # be careful - will currently replace any number in the file matching stock level!
    f = open("stockcontrol.csv", 'w')
    f.write(s)
    f.close()
My csv looks like this:
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
Elsewhere in my program, I have a function that searches for the code (8 digits)
Is it possible to say, if the code is in the line of the csv, replace the data in the second column? (data[2])
All the occurrences of stocklevel are getting replaced with the value of newlevel because you are calling s.replace(stocklevel, newlevel).
string.replace(s, old, new[, maxreplace]): Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced. (source: the Python 2 string module documentation)
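To see why that is dangerous here, consider (an illustrative interactive session):
>>> "34512340,1".replace("1", "2")  # every '1' is replaced, including digits inside the code
'34522340,2'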
As you suggested, you need to get the code and use it to locate the stock level to replace.
This is a sample script which takes the 8-digit code and the new stock level as command-line arguments and replaces the value:
import sys
import re

code = sys.argv[1]
newval = int(sys.argv[2])
f = open("stockcontrol.csv")
data = f.readlines()
print data  # contents before the update
for i, line in enumerate(data):
    if re.search(r'%s,\d+' % code, line):     # search for the line with the 8-digit code
        data[i] = '%s,%d\n' % (code, newval)  # replace the stock value with the new value in that line
f.close()
f = open("stockcontrol.csv", "w")
f.write("".join(data))
print data  # contents after the update
f.close()
Another solution using the csv module of Python:
import sys
import csv

data = []
code = sys.argv[1]
newval = int(sys.argv[2])
f = open("stockcontrol.csv")
reader = csv.DictReader(f, fieldnames=['code', 'level'])
for line in reader:
    if line['code'] == code:
        line['level'] = newval
    data.append('%s,%s' % (line['code'], line['level']))
f.close()
f = open("stockcontrol.csv", "w")
f.write("\n".join(data))
f.close()
Warning: keep a backup of the input file while trying out these scripts, as they overwrite the input file.
If you save the script in a file called test.py, invoke it as:
python test.py 34512340 10
This should set the stock value for code 34512340 to 10.
Why not use good old regular expressions?
import re

code, new_value = '11813651', '885'  # e.g., change 85 to 885 for code 11813651
print(re.sub(r'(^%s,).*' % code, r'\g<1>' + new_value,
             open('stockcontrol.csv').read(), flags=re.M))  # re.M so ^ matches the start of each line
Since it's a csv file, I'd suggest using Python's csv module. You will need to write to a new file, since reading from and writing to the same file will turn out badly. You can always rename it afterwards.
This example uses StringIO (Python 2) to embed your csv data in the code and treat it as a file. Normally you would open a file to get the input.
Updated
import csv

# Python 2 and 3
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
CSV = """\
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
"""
def replace(key, value):
    fr = StringIO(CSV)
    with open('out.csv', 'w') as fw:
        r = csv.reader(fr)
        w = csv.writer(fw)
        for row in r:
            if row[0] == key:
                row[1] = value
            w.writerow(row)

replace('99823658', 42)
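To complete the rename step mentioned above, something like this would swap the rewritten file into place (assuming the file names used in this example):
import os

os.replace('out.csv', 'stockcontrol.csv')  # Python 3.3+; on Python 2, use os.rename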

Reading a csv file and comparing objects to a list

I have a .txt file (primary list) with strings like this:
f
r
y
h
g
j
and I have a .csv file (recipes list) with rows like this:
d,g,r,e,w,s
j,f,o,b,x,q,h
y,n,b,w,q,j
My program goes through each row and counts the number of objects that belong to the primary list; for example, in this case the outcome is:
2
3
2
I always get 0. The mistake must be silly, but I can't figure it out:
from __future__ import print_function
import csv

primary_data = open('test_list.txt', 'r')
primary_list = []
for line in primary_data.readlines():
    line.strip('\n')
    primary_list.append(line)

recipes_reader = csv.reader(open('test.csv', 'r'), delimiter=',')
for row in recipes_reader:
    primary_count = 0
    for i in row:
        if i in primary_list:
            primary_count += 1
    print(primary_count)
Here's the bare-essentials pedal-to-the-metal version:
from __future__ import print_function
import csv

with open('test_list.txt', 'r') as f:  # the with statement ensures your file is closed
    primary_set = set(line.strip() for line in f)

with open('test.csv', 'rb') as f:  ### see note below ###
    for row in csv.reader(f):  # delimiter=',' is the default
        print(sum(i in primary_set for i in row))  # i in primary_set has int value 0 or 1
Note: in Python 2.x, always open csv files in binary mode. In Python 3.x, always open csv files with newline=''.
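For instance, the Python 3 equivalent of the csv-reading part above would be:
import csv

with open('test.csv', 'r', newline='') as f:  # newline='' lets the csv module handle line endings
    for row in csv.reader(f):
        print(sum(i in primary_set for i in row))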
The reading into primary_list keeps the trailing \n on each item; you should remove it when appending to primary_list:
for line in primary_data:
    primary_list.append(line.strip())
Note the strip call. Also, as you can see, you don't really need readlines, since for line in primary_data already does what you need when primary_data is a file object.
Now, as a general comment: since you're using the primary list for lookup, I suggest replacing the list with a set. This will make things much faster if the list is large, because Python sets are very efficient for key-based lookup, while lists are not designed for that purpose.
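For example, a one-line change (reusing the primary_list built above):
primary_set = set(primary_list)  # average O(1) membership tests instead of O(n) for a list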
The following code would solve the problem:
from __future__ import print_function
import csv

primary_data = open('test_list.txt', 'r')
primary_list = [line.rstrip() for line in primary_data]

recipies_reader = csv.reader(open('recipies.csv', 'r'), delimiter=',')
for row in recipies_reader:
    count = 0
    for i in row:
        if i in primary_list:
            count += 1
    print(count)
Output
2
3
2
