Below, I am trying to replace data in a csv. The code works, but it replaces anything matching stocklevel in the file.
def updatestocklevel(quantity, stocklevel, code):
    newlevel = stocklevel - quantity
    stocklevel = str(stocklevel)
    newlevel = str(newlevel)
    s = open("stockcontrol.csv").read()
    s = s.replace(stocklevel, newlevel)  # be careful - will currently replace any number in the file matching stock level!
    f = open("stockcontrol.csv", 'w')
    f.write(s)
    f.close()
My csv looks like this;
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
Elsewhere in my program, I have a function that searches for the code (8 digits).
Is it possible to say: if the code is in a line of the csv, replace the data in the second column? (data[2])
All the occurrences of stocklevel are getting replaced with the value of newlevel because you are calling s.replace(stocklevel, newlevel).
string.replace(s, old, new[, maxreplace]): Return a copy of string s
with all occurrences of substring old replaced by new. If the optional
argument maxreplace is given, the first maxreplace occurrences are
replaced.
source
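For example, the optional count argument limits how many occurrences are replaced:

'2,2,2'.replace('2', '42')     # replaces every occurrence -> '42,42,42'
'2,2,2'.replace('2', '42', 1)  # replaces only the first occurrence -> '42,2,2'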
As you suggested, you need to find the code and use it to replace the stock level.
This is a sample script which takes the 8-digit code and the new stock level as command line arguments and replaces the old value:
import sys
import re

code = sys.argv[1]
newval = int(sys.argv[2])

f = open("stockcontrol.csv")
data = f.readlines()
f.close()
print data

for i, line in enumerate(data):
    if re.search(r'%s,\d+' % code, line):     # search for the line with the 8-digit code
        data[i] = '%s,%d\n' % (code, newval)  # replace the stock value with the new value on that line

f = open("stockcontrol.csv", "w")  # overwrite the input file
f.write("".join(data))
print data
f.close()
Another solution using the csv module of Python:
import sys
import csv

data = []
code = sys.argv[1]
newval = int(sys.argv[2])

f = open("stockcontrol.csv")
reader = csv.DictReader(f, fieldnames=['code', 'level'])
for line in reader:
    if line['code'] == code:
        line['level'] = newval
    data.append('%s,%s' % (line['code'], line['level']))
f.close()

f = open("stockcontrol.csv", "w")
f.write("\n".join(data))
f.close()
Warning: Keep a backup of the input file while trying out these scripts, as they overwrite the input file.
If you save the script in a file called test.py, invoke it as:
python test.py 34512340 10
This should change the stock value for code 34512340 to 10.
Why not use good old regular expressions?
import re

code, new_value = '11813651', '885'  # e.g., change 85 to 885 for code 11813651
print(re.sub(r'(^%s,).*' % code, r'\g<1>' + new_value,
             open('stockcontrol.csv').read(), flags=re.M))  # re.M so ^ matches at the start of every line
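To actually update the file rather than just print the result, a minimal sketch (this overwrites stockcontrol.csv, so keep a backup):

import re

code, new_value = '11813651', '885'
with open('stockcontrol.csv') as f:
    text = re.sub(r'(^%s,).*' % code, r'\g<1>' + new_value, f.read(), flags=re.M)
with open('stockcontrol.csv', 'w') as f:
    f.write(text)  # write the substituted text back over the original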
Since it's a csv file I'd suggest using Python's csv module. You will need to write to a new file, since reading from and writing to the same file at the same time will end badly. You can always rename it afterwards.
This example uses StringIO to embed your csv data in the code and treat it as a file. Normally you would open a file to get the input.
Updated
import csv

# Python 2 and 3
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
CSV = """\
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
"""
def replace(key, value):
    fr = StringIO(CSV)
    with open('out.csv', 'w') as fw:
        r = csv.reader(fr)
        w = csv.writer(fw)
        for row in r:
            if row[0] == key:
                row[1] = value
            w.writerow(row)
replace('99823658', 42)
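To make the change land under the original name afterwards (assuming your real input lives in a file such as stockcontrol.csv), you can rename the output over it; a small sketch:

import os

# Python 3: os.replace overwrites atomically; on Python 2 use os.rename
# (which also overwrites on POSIX, but not on Windows).
os.replace('out.csv', 'stockcontrol.csv')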
Related
I have the CSV file format below which I need to convert to YAML (or the example output below).
CSV file format:
CASSANDRA a a
DSE_OPSCENTER
IGNITE a
KAFKA_LEAD b
KAFKA_SMART
OAM
RBM a
I used the code below to convert the file into the expected output:
datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []
for i in datareader:
    new_yaml = open('hosts', 'a')
    yaml_text = ""
    #heading = "["+i[0]+"]"
    #new_yaml.write(heading)
    new_yaml.write('\n')
    for cell in i:
        print cell
        new_yaml.write(cell)
        new_yaml.write('\n')
    new_yaml.close()
csvfile.close()
And I get the output below, which is fine for me:
CASSANDRA
a
a
DSE_OPSCENTER
IGNITE
a
KAFKA_LEAD
KAFKA_SMART
...
I want a small bit of help here with putting CASSANDRA, DSE_OPSCENTER and so on within square brackets, something like below:
[CASSANDRA]
a
a
[DSE_OPSCENTER]
...
Edit
I added a template format, but I don't know how to put the values into their respective groups:
HOST_VAR_TEMPLATE = """
[CASSANDRA]
{cell}
[DSE_OPSCENTER]
[SMART]
[SPARK]
[SPARK_MASTERS]
[ZK]
"""

csvfile = open('hosts.csv', 'r')
datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []
for i in datareader:
    print i[1:]
    with open('hosts', "w") as f:
        for cell in i:
            print cell
            f.write(
                HOST_VAR_TEMPLATE.format(
                    cell=cell,
                )
            )
You are creating a file with a single document that consists of a multi-line scalar string, followed by an explicit end of document marker.
You try to write the file yourself, instead of using a YAML library. In principle that is possible, but there are some corner cases where you have to take special care.
You should first load the end result using a YAML library:
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load("""\
[CASSANDRA]
a
a
[DSE_OPSCENTER]
...
""")
This will give you a ComposerError telling you that a (the first one) starts a new document. That is because starting with [ the parser assumes that the document consists of a single flow-style sequence, and once encountering the corresponding ], the document is done.
If you want to have a single, multi-line, string in your YAML file you're best off using a block style literal scalar. This is correct YAML:
|
[CASSANDRA]
a
a
[DSE_OPSCENTER]
...
If you don't want to run the risk of creating an invalid YAML file, then create a Python string variable data and append each line to it including newlines, then write it to file using the YAML library:
import sys
import ruamel.yaml
from ruamel.yaml.scalarstring import PreservedScalarString
yaml = ruamel.yaml.YAML()
yaml.explicit_end = True
data = ''
data += '[CASSANDRA]' + '\n'
data += 'a' + '\n'
data += 'a' + '\n'
data += '[DSE_OPSCENTER]' + '\n'
yaml.dump(PreservedScalarString(data), sys.stdout)
(The PreservedScalarString type wraps your multiline Python string into something that ruamel.yaml dumps as a block style literal scalar.)
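To connect this back to your CSV input, here is a minimal sketch (assuming the input file is hosts.csv with comma-separated rows, as in your reader) that brackets the first cell of each row, appends the remaining cells, and dumps everything as one literal scalar:

import sys
import csv
import ruamel.yaml
from ruamel.yaml.scalarstring import PreservedScalarString

yaml = ruamel.yaml.YAML()
yaml.explicit_end = True

data = ''
with open('hosts.csv') as csvfile:
    for row in csv.reader(csvfile):
        data += '[' + row[0] + ']\n'  # group name in square brackets
        for cell in row[1:]:
            if cell:                  # skip empty trailing cells
                data += cell + '\n'

yaml.dump(PreservedScalarString(data), sys.stdout)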
I have this table of data in Notepad
But it's not really a table, because there aren't official columns; it just looks like a table, with the data aligned using spaces.
I want to convert it into a CSV format. How should I go about doing this?
The pandas Python package I am using for data analysis works best with CSV, as far as I understand.
Here is a hackjob python script to do exactly what you need. Just save the script as a python file and run it with the path of your input file as the only argument.
UPDATED: After reading the comments to my answer, my script now uses regular expressions to account for any number of spaces.
import re
from sys import argv

output = ''
with open(argv[1]) as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            line = line.strip()
            line = re.sub(r'\s+', ',', line) + '\n'
        else:
            line = re.sub(r'\s\s+', ',', line)
        output += line
with open(argv[1] + '.csv', 'w') as f:
    f.write(output)
So this is put into a file (if you call it csvify.py) and executed as:
python csvify.py <input_file_name>
csvify.py:
from sys import argv
from re import finditer

# Method that returns fields separated by commas
def comma_delimit(line, ranges):
    return ','.join(get_field(line, ranges))

# Method that returns field info in the appropriate format
def get_field(line, ranges):
    for span in ranges:  # iterate through column ranges
        field = line[slice(*span)].strip()  # get field data based on range slice and trim
        # Use str() if the field doesn't contain commas, otherwise repr()
        yield (repr if ',' in field else str)(field)

# Open the input text file from the command line (read-only, closed automatically)
with open(argv[1], 'r') as inp:
    # Convert the first line (assumed to be the header) into range indexes.
    # finditer splits the line at word borders, each match running up to the next word.
    # This assumes no spaces within header names.
    # Materialized with list() so the spans can be reused for every line (map is lazy on Python 3).
    columns = list(map(lambda match: match.span(), finditer(r'\b\w+\s*', inp.readline())))
    inp.seek(0)  # reset the file pointer to the beginning to include the header line
    # Create the new CSV based on the input file name
    with open(argv[1] + '.csv', 'w') as txt:
        # Write all converted lines joined with newlines
        txt.write('\n'.join(comma_delimit(line, columns) for line in inp.readlines()))
I am trying to read from multiple input files and print the second column from each file next to each other as a table.
import sys
import fileinput

with fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt ')) as f:
    for line in f:
        proc(line)

def proc(line):
    parts = line.split("&")  # split line into parts
    if "&" in line:          # if at least 2 parts/columns
        print parts[1]       # print column 2
But I get "AttributeError: FileInput instance has no attribute '__exit__'".
The problem is that as of Python 2.7.10, the fileinput module does not support being used as a context manager (i.e. the with statement), so you have to handle closing the sequence yourself. The following should work:
f = fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt '))
for line in f:
    proc(line)
f.close()
Note that in recent versions of Python 3, you can use this module as a context manager.
For the second part of the question, assuming that each file is similarly formatted with an equal number of data lines of the form xxxxxx & xxxxx, one can make a table of the data from the second column of each file as follows:
Start with an empty list to be a table where the rows will be lists of second column entries from each file:
table = []
Now iterate over all lines in the fileinput sequence, using fileinput.isfirstline() to check if we are at a new file and, if so, start a new row:
for line in f:
    if fileinput.isfirstline():
        row = []
        table.append(row)
    parts = line.split('&')
    if len(parts) > 1:
        row.append(parts[1].strip())
f.close()
Now table will be the transpose of what you really want, which is each row containing the second-column entries of a given line of each file. To transpose the list, use zip, then loop over the rows of the transposed table, using the join string method to print each row with a comma separator (or whatever separator you want):
for row in zip(*table):
    print(', '.join(row))
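For instance, zip(*table) turns each row of per-file entries into a tuple of per-line entries:

table = [['1', '2', '3'],   # second-column entries from file 1
         ['4', '5', '6']]   # second-column entries from file 2
for row in zip(*table):
    print(', '.join(row))   # prints "1, 4" then "2, 5" then "3, 6"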
If something has open/close methods, use contextlib.closing:
import sys
import fileinput
from contextlib import closing

with closing(fileinput.input(files=('cutflow_TTJets_1l.txt ', 'cutflow_TTJets_1l.txt '))) as f:
    for line in f:
        proc(line)
I am still a learner in Python. I was not able to find a specific string and insert multiple strings after it. I want to search for a line in the file and insert the content of the write function after that line.
I have tried the following, which inserts at the end of the file:
line = '<abc hij kdkd>'
dataFile = open('C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml', 'a')
dataFile.write('<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n')
dataFile.close()
You can use fileinput to modify the same file in place and re to search for a particular pattern:
import sys
import fileinput
import re

def modify_file(file_name, pattern, value=""):
    fh = fileinput.input(file_name, inplace=True)
    for line in fh:
        replacement = value + line
        line = re.sub(pattern, replacement, line)
        sys.stdout.write(line)  # with inplace=True, stdout is redirected into the file
    fh.close()
You can call this function something like this:
modify_file("C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml",
            "abc..",
            "!--Delivery Date:")
Python strings are immutable, which means that you wouldn't actually modify the input string; you would create a new one which has the first part of the input string, then the text you want to insert, then the rest of the input string.
You can use the find method on Python strings to locate the text you're looking for:
def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
You could use it like
print insertAfter("Hello World", "lo", " beautiful")  # prints 'Hello beautiful World'
Here is a suggestion for dealing with files. I suppose the pattern you search for is a whole line (there is nothing more on the line than the pattern, and the pattern fits on one line).
line = ...                # What to match
all_data_to_insert = ...  # What to insert after the matching line
encoding = ...            # e.g. "utf-8"
input_filepath = ...      # input full path
output_filepath = ...     # output full path (must be different from input)

with open(input_filepath, "r", encoding=encoding) as fin, \
     open(output_filepath, "w", encoding=encoding) as fout:
    pattern_found = False
    for theline in fin:
        # Write input to output unmodified
        fout.write(theline)
        # if you want to get rid of spaces
        theline = theline.strip()
        # Find the matching pattern
        if pattern_found is False and theline == line:
            # Insert extra data in output file
            fout.write(all_data_to_insert)
            pattern_found = True

# Final check
if pattern_found is False:
    raise RuntimeError("No data was inserted because line was not found")
This code is for Python 3; some modifications may be needed for Python 2, especially the with statement (see contextlib.nested). If your pattern fits on one line but is not the entire line, you may use "line in theline" instead of "theline == line". If your pattern can spread over more than one line, you need a stronger algorithm. :)
To write to the same file, you can write to another file and then move the output file over the input file. I didn't plan to release this code, but I was in the same situation some days ago. So here is a class that inserts content in a file between two tags and supports writing over the input file: https://gist.github.com/Cilyan/8053594
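A minimal sketch of that write-then-move approach (the helper name and arguments here are illustrative, not taken from the gist):

import os
import tempfile

def insert_after_line(path, needle, new_text):
    # Write a modified copy to a temp file in the same directory,
    # then move it over the original.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, 'w') as out, open(path) as src:
        for line in src:
            out.write(line)
            if needle in line:
                out.write(new_text)  # insert right after the matching line
    os.replace(tmp_path, path)  # Python 3.3+; os.rename also works on POSIX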
Frerich Raabe...it worked perfectly for me...good one...thanks!!!
import shutil

def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]

with open(sddraft) as f1:
    tf = open("<path to your file>", 'a+')
    # Read lines in the file and insert the required content
    for line in f1.readlines():
        build = insertAfter(line, "<string to find in your file>", "<new value to be inserted after the string is found in your file>")  # inserts value
        tf.write(build)
    tf.close()

shutil.copy("<path to the source file --> tf>", "<path to the destination where tf needs to be copied with the file name>")
Hope this helps someone:)
I have a .txt file, primary list, with strings like this:
f
r
y
h
g
j
and I have a .csv file, recipes list, with rows like this:
d,g,r,e,w,s
j,f,o,b,x,q,h
y,n,b,w,q,j
My program goes through each row and counts the number of objects which belong to the primary list; for example, in this case the outcome is:
2
3
2
I always get 0; the mistake must be silly, but I can't figure it out:
from __future__ import print_function
import csv

primary_data = open('test_list.txt', 'r')
primary_list = []
for line in primary_data.readlines():
    line.strip('\n')
    primary_list.append(line)

recipes_reader = csv.reader(open('test.csv', 'r'), delimiter=',')
for row in recipes_reader:
    primary_count = 0
    for i in row:
        if i in primary_list:
            primary_count += 1
    print(primary_count)
Here's the bare-essentials pedal-to-the-metal version:
from __future__ import print_function
import csv

with open('test_list.txt', 'r') as f:  # with statement ensures your file is closed
    primary_set = set(line.strip() for line in f)

with open('test.csv', 'rb') as f:  #### see note below ###
    for row in csv.reader(f):  # delimiter=',' is the default
        print(sum(i in primary_set for i in row))  # i in primary_set has int value 0 or 1
Note: In Python 2.x, always open csv files in binary mode. In Python 3.x, always open csv files with newline=''.
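For reference, a Python 3 version of the reading loop above would look like this (same logic, only the open call changes):

with open('test.csv', 'r', newline='') as f:  # newline='' lets the csv module handle line endings
    for row in csv.reader(f):
        print(sum(i in primary_set for i in row))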
Reading into primary_list adds \n to each number, so you should remove it. When appending to primary_list, do:
for line in primary_data:
    primary_list.append(line.strip())
Note the strip call. Also, as you can see, you don't really need readlines, since for line in primary_data already does what you need when primary_data is a file object.
Now, as a general comment: since you're using the primary list for lookup, I suggest replacing the list with a set; this will make things much faster if the list is large. Python sets are very efficient for key-based lookup; lists are not designed for that purpose.
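Concretely, with your data the change is tiny, and each membership test becomes an average O(1) hash lookup instead of a linear scan:

primary_list = ['f', 'r', 'y', 'h', 'g', 'j']
primary_set = set(primary_list)            # build the set once

row = ['d', 'g', 'r', 'e', 'w', 's']       # first row of the recipes csv
print(sum(i in primary_set for i in row))  # -> 2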
The following code would solve the problem:
from __future__ import print_function
import csv

primary_data = open('test_list.txt', 'r')
primary_list = [line.rstrip() for line in primary_data]

recipies_reader = csv.reader(open('recipies.csv', 'r'), delimiter=',')
for row in recipies_reader:
    count = 0
    for i in row:
        if i in primary_list:
            count += 1
    print(count)
Output
2
3
2