I am doing a bulk import of dbf files to sqlite. I wrote a simple script in Python using the dbfpy module from http://dbfpy.sourceforge.net/. It works as expected except for a small number of cases, where the module seems to add a few erroneous records to the table it is reading.
I know this sounds crazy, but it really seems to be the case. I exported the dbase file in question to CSV using OpenOffice and imported it directly into sqlite using .import, and the 3 extra records are not there.
But if I iterate through the file using Python and the dbfpy module, the 3 extra records are added.
I am wondering: is it possible that these three records were flagged as deleted in the dbf file and, while invisible to OpenOffice, are being picked up by the dbf module? I could be way off here, but I am really scratching my head on this one.
Any help is appreciated.
What follows is a sample of my method for reading the dbf file. I have removed the loop and used a single case instead.
import sqlite3 as lite
from dbfpy import dbf

conn = lite.connect('../data/my_dbf.db3')
# used to get rid of the 8-byte string error from sqlite3
conn.text_factory = str
cur = conn.cursor()

rows_list = []
db = dbf.Dbf("../data/test.dbf")
for rec in db:
    if not rec.deleted:  # skip records flagged as deleted (added after the fix below)
        row_tuple = (rec["name"], rec["address"], rec["age"])
        rows_list.append(row_tuple)
print file_name + " processed"  # file_name comes from the loop I removed for this sample
db.close()

cur.executemany("INSERT INTO exported_data VALUES(?, ?, ?)", rows_list)
#pprint.pprint(rows_list)
conn.commit()
Solution
OK, after about another half hour of testing before lunch, I discovered that my hypothesis was in fact correct: some files had not been packed, and as such still contained records flagged for deletion. They should not have been in an unpacked state after export, which caused even more confusion.
I manually packed one file, tested it, and it immediately returned the proper results.
A big thanks for the help on this. I have added the solution given below to ignore the deleted records. I had searched and searched for this method (deleted) in the module but could not find an API doc for it; I even looked in the code, but in the fog of it all it must have slipped by. Thanks a million for the solution and help, guys.
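For reference, if you ever need to pack a file from Python instead of from dBase itself, the dbf package discussed further down this page has a pack() method on its Table class. Here is a rough sketch (the exact open/pack calls are an assumption on my part, so check that package's docs):
import dbf

table = dbf.Table('../data/test.dbf')
table.open()   # newer versions of the package may require a read-write mode here
table.pack()   # physically removes records flagged as deleted
table.close()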
If you want to discard records marked as deleted, you can write:
for rec in db:
    if not rec.deleted:
        row_tuple = (rec["name"], rec["address"], rec["age"])
        rows_list.append(row_tuple)
Related
I'm new to psycopg2 and I have a question (to which I cannot really find an answer on the Internet): is there any difference (for example in terms of performance) between using the copy_xxx() methods and the combination of execute() + fetchxxx() when trying to write the result of a query to a CSV file?
...
query_str = "SELECT * FROM mytable"
cursor.execute(query_str)
with open("my_file.csv", "w+") as file:
    writer = csv.writer(file)
    while True:
        rows = cursor.fetchmany()
        if not rows:
            break
        writer.writerows(rows)
vs
...
query_str = "SELECT * FROM mytable"
output_query = f"COPY ({query_str}) TO STDOUT WITH CSV HEADER"
with open("my_file.csv", "w+") as file:
    cursor.copy_expert(output_query, file)
And if I have a very complex query (assume it cannot be simplified any further), which method should I use with psycopg2? Do you have any advice?
Many thanks!!!
COPY is faster, but if query execution time is dominant or the file is small, it won't matter much.
You don't show us how the cursor was declared. If it is an anonymous (client-side) cursor, then execute/fetch will read all the query data into memory up front, leading to out-of-memory conditions for very large results. If it is a named (server-side) cursor, then you will individually request every row from the server, leading to horrible performance; this can be overcome by passing a size argument to fetchmany, as the default is bizarrely set to 1.
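For illustration, a rough sketch of the named-cursor variant with an explicit batch size (the table name and connection details are placeholders):
import csv
import psycopg2

conn = psycopg2.connect(dbname="test")  # connection details are placeholders
cur = conn.cursor(name="csv_export")    # named cursor = server-side, streams results
cur.execute("SELECT * FROM mytable")

with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    while True:
        rows = cur.fetchmany(10000)     # fetch in large batches instead of the default 1
        if not rows:
            break
        writer.writerows(rows)

cur.close()
conn.close()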
I am using Python and psycopg2.
If I run the code below, the user-provided CSV file is opened and read, and its contents are transferred to the database.
I want to know if this code is at risk of SQL injection when the CSV file contains unexpected words or symbols.
import psycopg2

conn_config = dict(port="5432", dbname="test", password="test")
with psycopg2.connect(**conn_config) as conn:
    with conn.cursor() as cur:
        with open("test.csv") as f:
            cur.copy_expert(sql="COPY test FROM STDIN", file=f)
I have read some of the psycopg2 and Postgres documentation, but I could not find an answer.
Please bear with me: English is not my native language, and I may make some confusing mistakes.
The command simply copies the data into the table. No part of the copied data can be interpreted as an SQL command, so SQL injection is out of the question. The rigid CSV format adds an extra layer of safety: if a row contains extra (unexpected) fields, the command will simply fail. The only real risk is ending up with strange contents in the table.
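To make that concrete, here is a small sketch (the two-column layout of the test table is an assumption) showing that an injection-looking value ends up stored verbatim rather than executed:
import io
import psycopg2

conn = psycopg2.connect(dbname="test")  # connection details are placeholders
with conn:
    with conn.cursor() as cur:
        # a CSV payload that looks like an injection attempt
        payload = io.StringIO('1,"foo\'); DROP TABLE test; --"\n')
        cur.copy_expert("COPY test FROM STDIN WITH CSV", payload)
        cur.execute("SELECT * FROM test")
        print(cur.fetchall())  # the suspicious string is just data in the table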
I want to modify one column in a .dbf file using Python with this library: http://pythonhosted.org/dbf/. Printing a column's values works just fine, but when I try to modify a column, I get this error:
unable to modify fields individually except in with or Process()
My code:
import sys
import dbf

table = dbf.Table('./path/to/file.dbf')
table.open()
for record in table:
    record.izo = sys.argv[2]
table.close()
The docs recommend doing it like this:
for record in Write(table):
But I also get an error:
name 'Write' is not defined
And:
record.write_record(column=sys.argv[2])
This also gives me an error:
write_record - no such field in table
Thanks!
My apologies for the state of the docs. Here are a couple options that should work:
table = dbf.Table('./path/to/file.dbf')
# Process will open and close a table if not already opened
for record in dbf.Process(table):
    record.izo = sys.argv[2]
or
# with opens a table and closes it when done
with dbf.Table('./path/to/file.dbf') as table:
    for record in table:
        with record:
            record.izo = sys.argv[2]
I have been trying to make a change to my dbf file for several days and have searched and browsed several websites; this page was the only one that gave me a solution that worked. Just to add a little more information so that whoever lands here understands the piece of code that Ethan Furman shared above:
import dbf

table = dbf.Table('your_dbf_filename.dbf')
# Process will open and close a table if not already opened
for record in dbf.Process(table):
    record.your_column_name = 'New_Value_of_that_column'
Now, because there is no condition here, you would end up updating every row of that column. Remember that this statement immediately writes the new value into the file, so the advice is to save a copy of the dbf file before making any edits. If you only want to change certain rows, add a condition inside the loop, as in the sketch below.
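For example, a minimal sketch of a conditional update (the field names city and izo are placeholders, borrowed from the question above):
import dbf

table = dbf.Table('your_dbf_filename.dbf')
for record in dbf.Process(table):
    # only update rows that match some condition
    if record.city == 'Prague':
        record.izo = 'new_value'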
I also tried the second solution that Ethan mentions, but as it was originally posted (without "as table:" on the with line) it throws an error that 'table' is not defined.
I have the following lines as part of a Python script working with a .db SQLite file:
sql = "SELECT * FROM calculations"
cursor.execute(sql)
results = cursor.fetchall()
where "calculations" is a table I previously created during the execution of my code. When I do
print results
I see
[(1,3.56,7,0.3), (7,0.4,18,1.45), (11,23.18,2,4.44)]
What I need to do is save this output as another .db file named "output_YYYY_MM_DD_HH_MM_SS.db", using the datetime module, so that when I connect to "output_YYYY_MM_DD_HH_MM_SS.db" and select everything, I see an output exactly equal to the list above.
Any ideas on how to do this?
Many thanks in advance.
If I remember correctly, sqlite3 creates the database with connect() if it does not already exist in the directory of the Python script, so:
1. connect to the new database, giving it the name you want (use datetime's strftime to build the timestamp part of the filename)
2. create the table and execute multiple inserts on the new db to dump the list you have
3. close the connection
Feel free to ask if something is unclear.
I need some help with my weather station. I would like to save all the results into a MySQL database, but at the moment I've got all the results in txt files.
Can you help me write a script in Python to read from the txt file and save the data into MySQL?
My txt file (temperature.txt) contains a date and a temperature. It looks like:
2013-09-29 13:24 22.60
I'm using this Python script to get the temperature and the current time from the big "result.txt" file:
#!/usr/bin/python
import time

fh = open("/home/style/pomiar/result.txt")
for line in fh:
    pass            # iterate through the whole file...
last = line         # ...so that 'line' ends up holding the last line
items = last.strip().split()
fh.close()

print time.strftime("%Y-%m-%d %H:%M"), items[1]
But I would like to insert that into a MySQL table instead of printing it. I know how to connect, but I don't know how to save the data into the table.
I know I need to use something like:
#!/usr/bin/python
import MySQLdb
# Open database connection
db = MySQLdb.connect("localhost","user","password","weather" )
And I've got my database "weather" with the table "temperature". I'm not sure the table is defined correctly (first column DATETIME, second VARCHAR(5)). Now I need a Python script to read from the file and save the values into MySQL.
Thanks a lot for your support.
Next step is simple:
from contextlib import closing

with closing(db.cursor()) as cur:
    cur.execute("INSERT INTO table1 (`measured_at`, `temp`) VALUES (%s, %s)", (measured_at, temp))
db.commit()
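And a minimal end-to-end sketch that ties this to the temperature.txt format shown above (the table and column names are assumptions, adjust them to your schema):
#!/usr/bin/python
import MySQLdb

db = MySQLdb.connect("localhost", "user", "password", "weather")
cur = db.cursor()

with open("/home/style/pomiar/temperature.txt") as fh:
    for line in fh:
        date_part, time_part, temp = line.strip().split()
        measured_at = date_part + " " + time_part          # e.g. "2013-09-29 13:24"
        cur.execute(
            "INSERT INTO temperature (measured_at, temp) VALUES (%s, %s)",
            (measured_at, temp),
        )

db.commit()
db.close()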
P.S. It looks like you asked this question because you didn't do your homework and didn't read any Python tutorial on how to work with MySQL.