I am trying to write unique rows to a text file after scraping certain values from the web. The text file looks like this:
Current date Amount Gained
15/07/2017 660
16/07/2017 -200
17/07/2017 300
Basically, I want to write a script that only allows unique rows. I don't want any duplicates, because the values change daily. If a user accidentally runs the script twice in one day, I don't want a duplicate row in my text file, because it will affect further calculations in my data analysis. This is the function that I currently have. What modifications should I make?
def Cost_Revenues_Difference():
    nrevenue = revenue
    ndifference = difference
    dateoftoday = time.strftime('%d/%m/%Y')
    Net_Result.append(nrevenue)
    with open('Net_Result.txt', 'a') as ac:
        for x in Net_Result:
            ac.write('\n' + dateoftoday + ' ' + str(Net_Result))

Cost_Revenues_Difference()
You can read all the lines of your file into a list first:
with open('Net_Result.txt') as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
Then check whether the line you want to add already exists in the content list. If it does not, append it to the file:
newLine = dateoftoday + ' ' + str(Net_Result)
if newLine not in content:
    with open('Net_Result.txt', 'a') as ac:
        ac.write('\n' + newLine)
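Comparing whole lines only catches a second run if the amount is also identical. Since the question says the values change, a check keyed on the date alone may be closer to what is wanted. A minimal sketch, assuming the "date amount" layout from the question; the function name `append_if_new_date` and its parameters are made up for illustration:

```python
import os
import time


def append_if_new_date(path, amount, date=None):
    """Append 'date amount' to path unless a row for that date already exists.

    Returns True if a row was written, False if the date was already present.
    """
    if date is None:
        date = time.strftime('%d/%m/%Y')

    # Collect the first field of every non-blank line (the date column).
    existing_dates = set()
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if parts:
                    existing_dates.add(parts[0])

    if date in existing_dates:
        return False

    with open(path, 'a') as f:
        f.write(date + ' ' + str(amount) + '\n')
    return True
```

Calling `append_if_new_date('Net_Result.txt', 660)` twice on the same day would write one row and then return False on the second call.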
If the file is small enough to load into RAM and has the structure shown in your example lines, you could dump the data as a Python object into a .pkl file. For example:
import pickle

data = {'15/07/2017': 660,
        '16/07/2017': -200,
        '17/07/2017': 300}
with open('/path/to/the/file.pkl', 'wb') as file:
    pickle.dump(data, file)
Pickle files work well with Python objects, so you can use the dictionary's built-in methods to avoid redundant entries or to make updates.
For more complicated structures, take a look at pandas DataFrames. If your program has to work with languages other than Python, JSON or XML might be better choices.
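The dictionary idea can be sketched as a load, update, dump cycle. This is only an illustration under the assumption that the whole history fits in one pickled dict keyed by date; `record_result` is a hypothetical helper name, not something from the question:

```python
import os
import pickle


def record_result(path, date, amount):
    """Store amount under date in a pickled dict, replacing any earlier value."""
    data = {}
    if os.path.exists(path):
        with open(path, 'rb') as f:
            data = pickle.load(f)

    # Assigning to an existing key overwrites it, so a date can never appear twice.
    data[date] = amount

    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return data
```

Note that a second run on the same day replaces that day's value rather than ignoring it. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.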
There are many ways you can do this. Two alternatives are described below.
1 (this alternative updates the value)
One is to store the data in a dictionary as key/value pairs and use the json library to import and export it (benefit: a very common data structure).
import json

with open("test.json") as f:
    data = json.load(f)

data["18-05-17"] = 123

with open("test.json", "w") as f:
    json.dump(data, f, indent=4)
test.json
{
    "18-05-17": 123,
    "17-05-17": 123
}
Because a dictionary can only hold unique keys, you won't have duplicates.
2 (this alternative will not update the value)
Another solution that comes to mind is to put the current date in the filename:
import datetime
import os

today = datetime.datetime.today().strftime("%y%m%d")
filedate = [i for i in os.listdir() if i.startswith("Net_result")][0]

# If today is different from the date in the filename, continue
if today != os.path.splitext(filedate)[0].split("_")[-1]:
    # code here
    with open(filedate, "a") as f:
        f.write('\n' + dateoftoday + ' ' + str(Net_Result))
    # rename
    os.rename(filedate, "Net_result_{}.csv".format(today))
You could start with a file carrying yesterday's date ("Net_result_170716"). The code would check whether the date in the filename differs from today's (which it does), add the new value, rename the file and save. Running the code again would not do anything (it would not even open the file).
FYI I am new to Python and this website!
I have a csv file:
Product Number,Account Number,Transactions,Year Number,Left Output,Mid Output
43854835,12345,23123,12,12,45
4353454,23456,123213213,4,23,56
7657657,34567,321321,5,34,67
21321312,45678,321321,8,45,78
21312313,56789,2131233,3,56,89
If I want to apply LEFT and MID to column 2 in Python, what is the best approach without libraries? I also want to append the results as the last columns of the data, as seen in the image.
This takes in a .csv file, reads the lines into a list, appends new data using the LEFT and MID functions and saves it to a new file (newFile.csv). This works according to the data in the imgur link.
Note: the script is hardly optimised. It was tested on ~2 million lines and took a couple of minutes and a lot of RAM (2-3 GB), so be careful about running it (back up the original csv file, save your work, close other programs, etc.).
I could modify this to batch process lines so memory is freed as well as maybe some sort of cache, but since I'm assuming it will be used sparingly, it should probably be fine.
filename = "myFile.csv"
# don't want to overwrite original
new_filename = "newFile.csv"

def LEFT(s, length):
    # example: LEFT("apple",3) returns "app".
    return str(s[:length])

def MID(s, start, length):
    # example: MID("apple",2,3) returns "ppl"
    return str(s[start - 1: start - 1 + length])

# read file contents into list
with open(filename, 'r') as file:
    # store file data in a list
    file_data = file.read().splitlines()

# loop and append new data to list
for i, line in enumerate(file_data):
    # ignore header
    if i == 0:
        continue
    # parse 2nd column
    second_column = line.split(",")[1]
    # append 5th and 6th column
    file_data[i] += "," + LEFT(second_column, 2) \
        + "," + MID(second_column, 4, 2)

# write modified list to new file
with open(new_filename, 'w') as file:
    for line in file_data:
        file.write(line + '\n')
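The memory concern mentioned above could be addressed by handling one line at a time instead of holding the whole file in a list. A rough sketch of that idea, still without third-party libraries; `append_left_mid` is a hypothetical name and the column positions are the same assumptions as in the script above:

```python
def LEFT(s, length):
    # LEFT("apple", 3) returns "app"
    return s[:length]


def MID(s, start, length):
    # start is 1-based: MID("apple", 2, 3) returns "ppl"
    return s[start - 1:start - 1 + length]


def append_left_mid(src_path, dst_path):
    """Copy src_path to dst_path, appending LEFT/MID of column 2 to each data row."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for i, line in enumerate(src):
            line = line.rstrip('\n')
            # Leave the header (first line) and blank lines untouched.
            if i > 0 and line:
                second_column = line.split(',')[1]
                line += ',' + LEFT(second_column, 2) + ',' + MID(second_column, 4, 2)
            dst.write(line + '\n')
```

Because only the current line is held in memory, memory use should stay roughly constant regardless of file size.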
Is there a way to do this? Say I have a file that's a list of names that goes like this:
1. Alfred
2. Bill
3. Donald
How could I insert the third name, "Charlie", at line x (in this case 3), and automatically send all others down one line? I've seen other questions like this, but they didn't get helpful answers. Can it be done, preferably with either a method or a loop?
This is one way of doing the trick.
with open("path_to_file", "r") as f:
    contents = f.readlines()

contents.insert(index, value)

with open("path_to_file", "w") as f:
    contents = "".join(contents)
    f.write(contents)
index and value are the line number and the text of your choice, with lines numbered from 0.
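One detail that is easy to miss with this approach is that the inserted text needs its own trailing newline, otherwise it merges with the following line. A small sketch that wraps the same read, insert, write steps in a function and takes care of that; the name `insert_line` is only illustrative:

```python
def insert_line(path, index, value):
    """Insert value as a new line at 0-based position index in the file at path."""
    with open(path, 'r') as f:
        contents = f.readlines()

    # If the file does not end with a newline, add one so lines do not merge.
    if contents and not contents[-1].endswith('\n'):
        contents[-1] += '\n'
    if not value.endswith('\n'):
        value += '\n'

    contents.insert(index, value)

    with open(path, 'w') as f:
        f.writelines(contents)
```

For the file in the question, `insert_line("names.txt", 2, "Charlie")` would place Charlie on the third line.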
If you want to search a file for a substring and add new text on the next line, one elegant way to do it is the following:
import os, fileinput

old = "A"
new = "B"

for line in fileinput.FileInput(file_path, inplace=True):
    if old in line:
        line += new + os.linesep
    print(line, end="")
There is a combination of techniques which I found useful in solving this issue:
with open(file, 'r+') as fd:
    contents = fd.readlines()
    contents.insert(index, new_string)  # new_string should end in a newline
    fd.seek(0)  # readlines consumes the iterator, so we need to start over
    fd.writelines(contents)  # No need to truncate as we are increasing filesize
In our particular application, we wanted to add it after a certain string:
with open(file, 'r+') as fd:
    contents = fd.readlines()
    if match_string in contents[-1]:  # Handle last line to prevent IndexError
        contents.append(insert_string)
    else:
        for index, line in enumerate(contents):
            if match_string in line and insert_string not in contents[index + 1]:
                contents.insert(index + 1, insert_string)
                break
    fd.seek(0)
    fd.writelines(contents)
If you want it to insert the string after every instance of the match, instead of just the first, remove the "else:" (and unindent the loop accordingly) and the "break".
Note also that the "and insert_string not in contents[index + 1]" condition prevents it from adding more than one copy after the match_string, so it's safe to run repeatedly.
You can just read the data into a list and insert the new record where you want.
names = []
with open('names.txt', 'r+') as fd:
    for line in fd:
        names.append(line.split(' ')[-1].strip())

    names.insert(2, "Charlie")  # element 2 will be 3. in your list
    fd.seek(0)
    fd.truncate()

    for i in range(len(names)):
        fd.write("%d. %s\n" % (i + 1, names[i]))
The accepted answer has to load the whole file into memory, which doesn't work nicely for large files. The following solution writes the file contents with the new data inserted into the right line to a temporary file in the same directory (so on the same file system), only reading small chunks from the source file at a time. It then overwrites the source file with the contents of the temporary file in an efficient way (Python 3.8+).
from pathlib import Path
from shutil import copyfile
from tempfile import NamedTemporaryFile

sourcefile = Path("/path/to/source").resolve()
insert_lineno = 152  # The line to insert the new data into.
insert_data = "..."  # Some string to insert.

with sourcefile.open(mode="r") as source:
    destination = NamedTemporaryFile(mode="w", dir=str(sourcefile.parent))
    lineno = 1

    while lineno < insert_lineno:
        destination.file.write(source.readline())
        lineno += 1

    # Insert the new data.
    destination.file.write(insert_data)

    # Write the rest in chunks.
    while True:
        data = source.read(1024)
        if not data:
            break
        destination.file.write(data)

# Finish writing data.
destination.flush()
# Overwrite the original file's contents with that of the temporary file.
# This uses a memory-optimised copy operation starting from Python 3.8.
copyfile(destination.name, str(sourcefile))
# Delete the temporary file.
destination.close()
EDIT 2020-09-08: I just found an answer on Code Review that does something similar to above with more explanation - it might be useful to some.
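A variation on the temporary-file idea is to rename the temporary file over the original with os.replace instead of copying its contents back. This is a sketch of a swapped-in technique, not the code from the answer above; the function name `insert_line_streaming` is hypothetical, and `text` is assumed to end with a newline:

```python
import os
from tempfile import NamedTemporaryFile


def insert_line_streaming(path, lineno, text):
    """Insert text before 1-based line number lineno without loading the file."""
    directory = os.path.dirname(os.path.abspath(path))
    inserted = False
    # delete=False because the temporary file is renamed over the original.
    with open(path, 'r') as source, \
            NamedTemporaryFile('w', dir=directory, delete=False) as tmp:
        for current, line in enumerate(source, start=1):
            if current == lineno:
                tmp.write(text)
                inserted = True
            tmp.write(line)
        if not inserted:
            tmp.write(text)  # lineno is past the end of the file: append
    # Same directory, so this is a rename rather than a copy.
    os.replace(tmp.name, path)
```

One trade-off to be aware of: the replaced file takes on the temporary file's permissions and ownership, which may differ from the original's.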
You don't show us what the output should look like, so one possible interpretation is that you want this as the output:
1. Alfred
2. Bill
3. Charlie
4. Donald
(Insert Charlie, then add 1 to all subsequent lines.) Here's one possible solution:
def insert_line(input_stream, pos, new_name, output_stream):
    inserted = False
    for line in input_stream:
        number, name = parse_line(line)
        if number == pos:
            print(format_line(number, new_name), file=output_stream)
            inserted = True
        print(format_line(number if not inserted else (number + 1), name), file=output_stream)

def parse_line(line):
    number_str, name = line.strip().split()
    return (get_number(number_str), name)

def get_number(number_str):
    return int(number_str.split('.')[0])

def format_line(number, name):
    return add_dot(number) + ' ' + name

def add_dot(number):
    return str(number) + '.'

input_stream = open('input.txt', 'r')
output_stream = open('output.txt', 'w')

insert_line(input_stream, 3, 'Charlie', output_stream)

input_stream.close()
output_stream.close()
Parse the file into a python list using file.readlines() or file.read().split('\n')
Identify the position where you have to insert a new line, according to your criteria.
Insert a new list element there using list.insert().
Write the result to the file.
location_of_line = 0
with open(filename, 'r') as file_you_want_to_read:
    # read the lines of the file into a list
    contents = file_you_want_to_read.readlines()

# find the index of the line where you want to insert
for index, line in enumerate(contents):
    if line.startswith('whatever you are looking for'):
        location_of_line = index

# now you have a list of every line in that file
contents.insert(location_of_line, "whatever you want to append to middle of file")
with open(filename, 'w') as file_to_write_to:
    file_to_write_to.writelines(contents)
That is how I ended up inserting whatever data I wanted into the middle of the file.
This is just pseudocode, as I was having a hard time finding a clear explanation of what is going on.
Essentially, you read the entire file into a list, insert the lines you want into that list, and then rewrite the same file.
I am sure there are better ways to do this, and it may not be efficient, but it makes more sense to me at least. I hope it makes sense to someone else.
A simple but inefficient way is to read the whole content, change it and then rewrite it:
line_index = 2  # 0-based index, so this is the third line
lines = None
with open('file.txt', 'r') as file_handler:
    lines = file_handler.readlines()

lines.insert(line_index, 'Charlie\n')

with open('file.txt', 'w') as file_handler:
    file_handler.writelines(lines)
I am writing this to reuse and correct martincho's answer (the accepted one).
IMPORTANT: This code loads the whole file into RAM and rewrites its content to the file.
The variables index and value can be whatever you need, but make sure value is a string that ends with '\n' if you don't want it to mess with the existing data.
with open("path_to_file", "r+") as f:
    # Read the content into a variable
    contents = f.readlines()
    contents.insert(index, value)

    # Reset the reader's location (in bytes)
    f.seek(0)

    # Rewrite the content to the file
    f.writelines(contents)
See the Python docs for the file.seek method.
Below is a slightly awkward solution for the special case in which you are creating the original file yourself and happen to know the insertion location (e.g. you know ahead of time that you will need to insert a line with an additional name before the third line, but won't know the name until after you've fetched and written the rest of the names). Reading, storing and then re-writing the entire contents of the file as described in other answers is, I think, more elegant than this option, but may be undesirable for large files.
You can leave a buffer of invisible null characters ('\0') at the insertion location to be overwritten later:
num_names = 1_000_000  # Enough data to make storing in a list unideal
max_len = 20  # The maximum allowed length of the inserted line
line_to_insert = 2  # The third line is at index 2 (0-based indexing)

with open(filename, 'w+') as file:
    for i in range(line_to_insert):
        name = get_name(i)  # Returns 'Alfred' for i = 0, etc.
        file.write(F'{i + 1}. {name}\n')

    insert_position = file.tell()  # Position to jump back to for insertion
    file.write('\0' * max_len + '\n')  # Buffer will show up as a blank line

    for i in range(line_to_insert, num_names):
        name = get_name(i)
        file.write(F'{i + 2}. {name}\n')  # Line numbering now bumped up by 1.

# Later, once you have the name to insert...
with open(filename, 'r+') as file:  # Must use 'r+' to write to middle of file
    file.seek(insert_position)  # Move stream to the insertion line
    name = get_bonus_name()  # This lucky winner jumps up to 3rd place
    new_line = F'{line_to_insert + 1}. {name}'
    file.write(new_line[:max_len])  # Slice so you don't overwrite next line
Unfortunately there is no way to delete-without-replacement any excess null characters that did not get overwritten (or in general any characters anywhere in the middle of a file), unless you then re-write everything that follows. But the null characters will not affect how your file looks to a human (they have zero width).
I need to make a program that receives an integer and stores it in a file. When the file has 15 values (or 20, the exact number doesn't matter), the next value overwrites the first one that was written. The values may all be on the same line or each on a new line.
This program reads the temperature from a sensor, and I will then show it on a site with a PHP chart.
I thought about writing a value every half an hour or so. When the file has 15 values and a new one comes in, it overwrites the oldest one.
I'm having trouble saving the values. I don't know how to save the list as a string with new lines; it saves double new lines. I'm new to Python and I get really lost.
This doesn't work, but it is a "sample" of what I want to do:
import sys
import os

if not( sys.argv[1:] ):
    print "No parameter"
    exit()

# If file doesn't exist, create it and save the value
if not os.path.isfile("tempsHistory"):
    data = open('tempsHistory', 'w+')
    data.write( ''.join( sys.argv[1:] ) + '\n' )
else:
    data = open('tempsHistory', 'a+')
    temps = []
    for line in data:
        temps += line.split('\n')
    if ( len( temps ) < 15 ):
        data.write( '\n'.join( sys.argv[1:] ) + '\n' )
    else:
        #Maximum amount reached, save new, delete oldest
        del temps[ 0 ]
        temps.append( '\n'.join( sys.argv[1:] ) )
        data.truncate( 0 )
        data.write( '\n'.join(str(e) for e in temps) )
data.close( )
I'm getting lost with the ''.join and \n etc. I mean, I have to use join so that the list is saved as a string and not as [ '', '']. If I use '\n'.join, it saves double line breaks, I think.
Thank you in advance!
I think what you want is something like this:
import sys

fileTemps = 'temps'

with open(fileTemps, 'r') as fd:
    temps = fd.readlines()

if len(temps) >= 15:
    temps.pop(0)

temps.append(' '.join(sys.argv[1:]) + '\n')

with open(fileTemps, 'w') as fd:
    for l in temps:
        fd.write(l)
First you open the file for reading. The fd.readlines() call gives you the lines in the file. Then you check the size, and if the number of lines is 15 or more, you pop the first value. You then append the new line and write everything back to the file.
In Python, reading from a file (e.g. with readline()) generally gives you each line with a '\n' at the end, which is why you get double line breaks.
Hope this helps.
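The "drop the oldest when full" bookkeeping can also be left to collections.deque, whose maxlen argument discards items from the opposite end automatically. A minimal sketch under the assumption that each reading lives on its own line; `record_reading` and `max_readings` are illustrative names, not part of the question:

```python
import os
from collections import deque


def record_reading(path, value, max_readings=15):
    """Append value to path, keeping only the newest max_readings lines."""
    readings = deque(maxlen=max_readings)  # drops the oldest item when full
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()  # remove the trailing '\n' before storing
                if line:
                    readings.append(line)

    readings.append(str(value))

    # Newlines are added only here, at write time, so none get doubled.
    with open(path, 'w') as f:
        f.write('\n'.join(readings) + '\n')
    return list(readings)
```

Stripping on read and joining on write keeps the stored data free of newline characters, which is what avoids the double line breaks from the question.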
You want something like
import os
import shutil
import tempfile

values = open(target_file, "r").read().split("\n")
# ^ this solves your original problem, as readlines() will keep the \n in returned list items
if len(values) >= 15:
    # keep the values at 15
    values.pop()
values.insert(0, new_value)
# push new value at the start of the list
tmp_fd, tmp_fn = tempfile.mkstemp()
# ^ this part is important
os.write(tmp_fd, "\n".join(values).encode())
os.close(tmp_fd)
shutil.move(tmp_fn, target_file)
# ^ the actual write to the file your web server is reading is then a single rename,
# which is atomic when the temporary file is on the same filesystem as target_file
# this is e.g. how text editors save files
But anyway, I'd suggest you consider using a database, be it PostgreSQL, Redis, SQLite or whatever floats your boat.
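To give an idea of what the database route might look like, here is a small sketch using the sqlite3 module from the standard library. The table name `temps`, its columns and both function names are assumptions made for illustration only:

```python
import sqlite3


def store_reading(conn, value, max_readings=15):
    """Insert value and delete everything but the newest max_readings rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS temps ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, value REAL NOT NULL)"
    )
    conn.execute("INSERT INTO temps (value) VALUES (?)", (value,))
    # Keep only the rows with the highest ids, i.e. the most recent readings.
    conn.execute(
        "DELETE FROM temps WHERE id NOT IN "
        "(SELECT id FROM temps ORDER BY id DESC LIMIT ?)",
        (max_readings,),
    )
    conn.commit()


def latest_readings(conn):
    """Return the stored readings, oldest first."""
    rows = conn.execute("SELECT value FROM temps ORDER BY id").fetchall()
    return [row[0] for row in rows]
```

A connection could be opened with `sqlite3.connect("temps.db")`, where the file name is again just a placeholder.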
You should try not to confuse storing data in lists with formatting it as strings. The data does not need the "\n"s.
So just temps.append(sys.argv[1:]) is enough.
In addition, you should not serialize / deserialize the data on your own. Have a look at pickle. It is much simpler to use than reading / writing lists on your own.
I am trying to find the min and max in a csv file and write them to a text file. Currently my code writes all the data to the output file, and I am unsure how to grab the data from the multiple columns and have it sorted accordingly.
Any guidance would be appreciated, as I don't have a good lead on how to figure this out.
read_file = open("riskfactors.csv", 'r')

def create_file():
    read_file = open("riskfactors.csv", 'r')
    write_file = open("best_and_worst.txt", "w")

    for line_str in read_file:
        read_file.readline()
        print (line_str,file=write_file)

    write_file.close()
    read_file.close()
Assuming your file is a standard .csv file containing only numbers separated by semicolons:
1;5;7;6;
3;8;1;1;
Then it's easiest to use the str.split() method, followed by a type conversion to int.
You could store all the values in a list (or a set, which drops repeated values) and then get the maximum and minimum:
valuelist = []
for line_str in read_file:
    for cell in line_str.strip().split(";"):
        if cell:  # the trailing ";" leaves an empty string behind
            valuelist.append(int(cell))
print(max(valuelist))
print(min(valuelist))
Warning: If your file contains non-numeric entries, you'd have to filter them out. .csv files can also have different delimiters.
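Since the question mentions multiple columns, a per-column variant may be closer to what is needed. The sketch below uses the csv module and assumes that the first row is a header and that the delimiter is known; neither is confirmed by the question, and `column_min_max` is a hypothetical name:

```python
import csv


def column_min_max(path, delimiter=','):
    """Return {column_name: (min, max)} for every fully numeric column."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # assumption: the first row holds column names
        columns = {name: [] for name in header}
        for row in reader:
            for name, cell in zip(header, row):
                columns[name].append(cell)

    result = {}
    for name, cells in columns.items():
        try:
            numbers = [float(cell) for cell in cells]
        except ValueError:
            continue  # skip columns that contain non-numeric entries
        if numbers:
            result[name] = (min(numbers), max(numbers))
    return result
```

The returned dictionary could then be written to best_and_worst.txt one column per line.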
import sys, csv

def risk_key(row):
    # This assumes risk factors are prioritised by key columns 1, 3
    # and that column 1 is numeric while column 3 is textual
    return (int(row[0]), row[2])

l = sorted(csv.reader(sys.stdin), key=risk_key)
# Write out the first and last rows
csv.writer(sys.stdout).writerows([l[0], l[-1]])
Now, I took a shortcut and said the input and output files were sys.stdin and sys.stdout. You'd probably replace these with the file objects you created in your original question (e.g. read_file and write_file).
However, in my case, I'd probably just run it (if I were using Linux) with:
$ ./foo.py <riskfactors.csv >best_and_worst.txt