Modify a DBF file - python

I want to modify one column in a .dbf file using Python with this library http://pythonhosted.org/dbf/. Printing out a column works just fine, but when I try to modify a column, I get this error:
unable to modify fields individually except in with or Process()
My code:
import sys
import dbf

table = dbf.Table('./path/to/file.dbf')
table.open()
for record in table:
    record.izo = sys.argv[2]
table.close()
In the docs, they recommend doing it like:
for record in Write(table):
But I also get an error:
name 'Write' is not defined
And:
record.write_record(column=sys.argv[2])
Also gives me an error that
write_record - no such field in table
Thanks!

My apologies for the state of the docs. Here are a couple options that should work:
table = dbf.Table('./path/to/file.dbf')

# Process will open and close a table if not already opened
for record in dbf.Process(table):
    record.izo = sys.argv[2]
or
# with opens a table and closes it when done
with dbf.Table('./path/to/file.dbf') as table:
    for record in table:
        with record:
            record.izo = sys.argv[2]

I had been trying to make a change to my dbf file for several days and had searched and browsed several websites; this page was the only one that gave me a solution that worked. Just to add a little more information so that whoever lands here understands the piece of code that Ethan Furman shared:
import dbf

table = dbf.Table('your_dbf_filename.dbf')

# Process will open and close a table if not already opened
for record in dbf.Process(table):
    record.your_column_name = 'New_Value_of_that_column'
Because there is no condition here, you end up updating that column in every row. Remember, this statement immediately writes the new value into the column, so the advice is to save a copy of the dbf file before making any edits to it.
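If you only want to change certain rows, a small sketch (not from the answer above; the column name and the old/new values are placeholders) could look like this:
import dbf

table = dbf.Table('your_dbf_filename.dbf')
for record in dbf.Process(table):
    # hypothetical condition: only touch rows that currently hold the old value
    if record.your_column_name == 'Old_Value':
        record.your_column_name = 'New_Value_of_that_column'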
I also tried the 2nd solution that Ethan mentions, but as originally posted (without 'as table') it threw an error that 'table' was not defined.

Related

Storing DataFrame output utilizing Pandas to either csv or MySql DB

A question regarding pandas: say I created a DataFrame and generated output under separate variables rather than printing them. How would I go about combining them back into another DataFrame correctly, either to send as a CSV and then upload to a DB, or to upload to a DB directly?
Everything works fine code-wise; I just haven't really seen, or don't know, the best practice for doing this. I know we can store things in a list, dict, etc.
What I did was:
#imported all modules
object = df.iloc[0,0]
#For loop magic goes here
#nested for loop
#if conditions are met, do this
result = df.iloc[i, k+1]
print(object, result)
I've also stored them into a separate DataFrame trying:
df2 = pd.DataFrame({'object': object, 'result' : result}, index=[0])
df2.to_csv('output.csv', index=False, mode='a')
The only problem with that is that it appends everything to each row, most likely due to the append and perhaps because it was not included in the for loop. Which is odd, because the raw output is EXACTLY how I'm trying to get it into a CSV or into a DB.
That said, I'm looking to combine both values back into a DataFrame for speed. I tried concat etc., but no luck, so I was wondering what the correct format would be? Thanks
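A common pattern for this (a sketch only, not the poster's code; the sample data, the condition, the table name, and the SQLAlchemy engine are assumptions) is to collect each pair in a list inside the loop, build one DataFrame at the end, and write it once:
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300]})

rows = []                                      # accumulate one dict per match
for i in range(len(df)):
    for k in range(len(df.columns) - 1):
        if df.iloc[i, k + 1] > 15:             # placeholder condition
            rows.append({'object': df.iloc[i, 0], 'result': df.iloc[i, k + 1]})

out = pd.DataFrame(rows)                       # one DataFrame, one row per match
out.to_csv('output.csv', index=False)          # single write, no per-row append
# or push straight to a database through a SQLAlchemy engine:
# out.to_sql('results', engine, if_exists='append', index=False)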
So it turns out that after more research and revising, I solved my issue.
Referencing this page and my own revisions, this is the basis of what I did:
Empty space in between rows after using writer in python
import csv

# Had to wrap this in a for loop (not listed) and append to the file while
# clearing it first, to remove the blank line after each row
with open('csvexample.csv', 'w+', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
Additional supporting material:
Confused by python file mode "w+"

Retrieve document '_id' of a GridFS document, by its 'filename'

I am currently working on a project in which I must retrieve a document uploaded to a MongoDB database using GridFS and store it in my local directory.
Up to now I have written these lines of code:
if not fs.exists({'filename': 'my_file.txt'}):
    CRAWLED_FILE = os.path.join(SAVING_FOLDER, 'new_file.txt')
else:
    file = fs.find_one({'filename': 'my_file.txt'})
    CRAWLED_FILE = os.path.join(SAVING_FOLDER, 'new_file.txt')
    with open(CRAWLED_FILE, 'wb') as f:
        f.write(file.read())
        f.close()  # redundant inside the with block, but harmless
I believe find_one doesn't let me write the content of the file previously stored in the database into a new file. f.write(file.read()) writes into the file just created (new_file.txt) the directory in which (new_file.txt) is stored! So I get a txt completely different from the one I uploaded to the database, and the only line in it is: E:\\my_folder\\sub_folder\\my_file.txt
It's kind of weird, I don't even know why it is happening.
I thought it could work if I used the fs.get(ObjectId(ID)) method, which, according to the official documentation of PyMongo and GridFS, provides a file-like interface for reading. However, I only know the name of the txt saved in the database; I have no clue what its ObjectId is, and I cannot use a list or dict to store all the IDs of my documents since it wouldn't be worth it. I have checked many posts here on StackOverflow and everyone suggests using subscription. Basically, you create a cursor using fs.find(), then you can iterate over the cursor, for example like this:
for x in fs.find({'filename': 'my_file.txt'}):
    ID = x['_id']
See, many answers here suggest I do the above; the only problem is that the Cursor object is not subscriptable, and I have no clue how to resolve this issue.
I must find way to get the document '_id' given the filename of the document so I can later use it combined with fs.get(ObjectId(ID)).
Hope you can help me, thank you a lot!
Matteo
You can just access it like this:
ID = x._id
But "_" is a protected member in Python, so I was looking around for other solutions (could not find much). For getting just the ID, you could do:
for ID in fs.find({'filename': 'my_file.txt'}).distinct('_id'):
    ...  # do something with each ID
Since that only gets the IDs, you would probably need to do:
query = fs.find({'filename': 'my_file.txt'}).limit(1)  # equivalent to find_one
content = next(query, None)  # iterate the GridOutCursor; it yields either one element or None
if content:
    ID = content._id
    ...
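Putting it together, here is a minimal sketch (the connection string, database name, filename, and output path are all placeholders) that looks up the _id by filename and then streams the stored content to disk with fs.get():
import os
import gridfs
from pymongo import MongoClient

# Assumed connection details; adjust to your own setup
client = MongoClient('mongodb://localhost:27017')
fs = gridfs.GridFS(client['my_database'])

grid_out = fs.find_one({'filename': 'my_file.txt'})   # returns a GridOut or None
if grid_out is not None:
    file_id = grid_out._id                            # the ObjectId you were after
    target = os.path.join('.', 'new_file.txt')
    with open(target, 'wb') as f:
        f.write(fs.get(file_id).read())               # read the stored bytes by _id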

How to delete rows in an excel file with csv

Sorry for dragging the entire thread out for a millennium and making it way too unsolvable, but I have solved the majority of the problems. It's just that I want to fix my delete code, and I have no idea where to start with it.
def delete_record():
    with open('StudentDetails.csv', 'wb') as csvfile:
        csvFileWriter = csv.writer(csvfile)
        a = delete_student_entry.get()
        b = delete_password_entry.get()
        csvFileWriter.writerow([a, b])
I want the program to take the input from the entry box in the window (I am using Tkinter) and delete the entire matching row. I will welcome any rewrites of the code, as it will be a subroutine for my "delete student" menu (unconventional, I know, but other ways of calling the subroutine resulted in an error, so I had no choice)...
Do you mean deleting the row, which you get from your Entries, from your CSV file?
If so, your answer can be found here:
using Python for deleting a specific line in a file
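For reference, a minimal sketch of that approach (the column layout and the Tkinter entry widgets are assumptions): read every row, drop the one that matches the entry values, and rewrite the file.
import csv

def delete_record(student, password):
    # Keep only the rows that do not match the values from the entry boxes
    with open('StudentDetails.csv', 'r', newline='') as csvfile:
        rows = [row for row in csv.reader(csvfile) if row[:2] != [student, password]]
    # Rewrite the file without the deleted row
    with open('StudentDetails.csv', 'w', newline='') as csvfile:
        csv.writer(csvfile).writerows(rows)

# e.g. delete_record(delete_student_entry.get(), delete_password_entry.get())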

Python program for line search and replacement in a CSV file

I am building a small tool in Python. The function of the tool is the following:
Open a master data file (csv format)
Open a log file (csv format)
Request the user to select the pointer for the field that will need to be compared in both files.
Start comparing record by record.
When the field is not found in the record, proceed to look forward in the log file until the field can be found; in the meantime keep in memory the pointer for when the comparison will be continued.
Once the field is found, proceed to cut the whole record off that line and place it in the right position.
Here is an example:
Data file
"1","1234","abc"
"2","5678","def"
"3","9012","ghi"
log file
"1","1234","abc"
"3","9012","ghi"
"2","5678","def"
final log file:
"1","1234","abc"
"2","5678","def"
"3","9012","ghi"
I was looking at the csv lib in Python and the sqlite3 lib, but there is nothing that seems to really do a swap inside a file, so I was thinking that maybe I should just create a new file with all the records in order.
What could be done in this regard? Is there a library or a command that can move records in an existing file?
I would prefer to modify the existing file instead of creating a new one, but if that's not possible I will just create a new one.
In addition, the code I was planning to use to verify the files was this:
import csv

reader1 = csv.reader(open('data.csv', 'r', newline=''), delimiter=',', quotechar='"')
row1 = next(reader1)
reader2 = csv.reader(open('log.csv', 'r', newline=''), delimiter=',', quotechar='"')
row2 = next(reader2)
if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):
    ...  # here it moves to the next record
else:
    ...  # here it would run a function that replaces the field
Please note that this piece of code was found at this page :
Python: Comparing specific columns in two csv files
(I don't want to take away the glory from another coder.)
I just like it for its simplicity.
Thanks to all for the attention.
Regards
Danilo
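No standard library call will swap records in place inside an existing file; in practice you read everything, reorder in memory, and write the result back out. Here is a minimal sketch of that idea (filenames are taken from the example above; using the first column as the matching key, and assuming every key in the data file also appears in the log, are my assumptions):
import csv

# Read both files; the first column is assumed to be the matching key
with open('data.csv', newline='') as f:
    data_rows = list(csv.reader(f, quotechar='"'))
with open('log.csv', newline='') as f:
    log_rows = {row[0]: row for row in csv.reader(f, quotechar='"')}

# Write the log records back out in the order the data file defines
with open('log_sorted.csv', 'w', newline='') as f:
    writer = csv.writer(f, quotechar='"', quoting=csv.QUOTE_ALL)
    for row in data_rows:
        writer.writerow(log_rows[row[0]])
If the final file must replace log.csv, you can rename log_sorted.csv over it afterwards with os.replace.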

Python DBF module is adding extra rows to a table export

I am doing a bulk import of dbf files into sqlite. I wrote a simple script in Python using the dbf module at http://dbfpy.sourceforge.net/. It works fine and as expected, except in a very small number of cases where the module seems to have added a few erroneous records to the table it was reading.
I know this sounds crazy, right, but it really seems to be the case. I exported the dbase file in question to csv using OpenOffice and imported it directly into sqlite using .import, and the 3 extra records are not there.
But if I iterate through the file using Python and the dbfpy module, the 3 extra records are added.
I am wondering: is it possible that these three records were flagged as deleted in the dbf file and, while invisible to OpenOffice, are being picked up by the dbf module? I could be way off with this possibility, but I am really scratching my head on this one.
Any help is appreciated.
What follows is a sample of my method for reading the dbf file. I have removed the loop and used one single case instead.
import sqlite3 as lite
from dbfpy import dbf

conn = lite.connect('../data/my_dbf.db3')
# used to get rid of the 8 byte string error from sqlite3
conn.text_factory = str
cur = conn.cursor()

rows_list = []
db = dbf.Dbf("../data/test.dbf")
for rec in db:
    if not rec.deleted:  # <-- the fix suggested in the answer below
        row_tuple = (rec["name"], rec["address"], rec["age"])
        rows_list.append(row_tuple)
print(file_name + " processed")  # file_name comes from the loop removed for this example
db.close()

cur.executemany("INSERT INTO exported_data VALUES(?, ?, ?)", rows_list)
# pprint.pprint(rows_list)
conn.commit()
Solution
OK, after about another half hour of testing before lunch, I discovered that my hypothesis was in fact correct: some files had not been packed, and as such they still contained records which had been flagged as deleted. They should not have been in an unpacked state after export, so this caused more confusion.
I manually packed one file and tested it, and it immediately returned the proper results.
A big thanks for the help on this. I added the solution given below to ignore the deleted records. I had searched and searched for this attribute (deleted) in the module but could not find API docs for it; I even looked in the code, but in the fog of it all it must have slipped by. Thanks a million for the solution and help, guys.
If you want to discard records marked as deleted, you can write:
for rec in db:
    if not rec.deleted:
        row_tuple = (rec["name"], rec["address"], rec["age"])
        rows_list.append(row_tuple)
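For anyone hitting the same symptom, a quick way to confirm whether deleted-flagged records are the culprit is to count them before importing (a sketch built on the same dbfpy calls used above; the path is a placeholder):
from dbfpy import dbf

db = dbf.Dbf("../data/test.dbf")
live = deleted = 0
for rec in db:
    if rec.deleted:          # the same flag the fix above checks
        deleted += 1
    else:
        live += 1
print("%d live records, %d flagged as deleted" % (live, deleted))
db.close()
If the deleted count is non-zero, packing the dbf file (or filtering on rec.deleted as shown) will keep the extra rows out of the export.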
