I have started creating a database with PostgreSQL, and I am currently facing a problem when I try to copy the data from my CSV file into my database.
Here is my code:
import psycopg2
connexion = psycopg2.connect(dbname="db_test", user="postgres", password="passepasse")
connexion.autocommit = True
cursor = connexion.cursor()
cursor.execute("""CREATE TABLE vocabulary(
    fname integer PRIMARY KEY,
    label text,
    mids text
)""")
with open(r'C:\mypathtocsvfile.csv', 'r') as f:
    next(f)  # skip the header row
    cursor.copy_from(f, 'vocabulary', sep=',')
connexion.commit()
I wanted to allocate 4 columns to store my CSV data. The problem is that the data in my CSV file is stored like this:
fname,labels,mids,split
64760,"Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music","/m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf",train
16399,"Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music","/m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf",train
...
There are commas inside my label and mids columns, which is why I get the following error:
BadCopyFileFormat: ERROR: additional data after the last expected column
Which alternative should I use to copy the data from this CSV file?
Thank you.
If the file is small, the easiest way is to open it in LibreOffice and save it with a new separator.
I usually use ^.
If the file is large, write a script to replace ," and "," with ^" and "^", respectively.
COPY supports csv as a format, which already does what you want. But to access it via psycopg2, I think you will need to use copy_expert rather than copy_from.
cursor.copy_expert('copy vocabulary from stdin with csv', f)
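As a rough illustration of why CSV-aware parsing helps, the sketch below runs Python's csv module over the sample from the question. It applies the same kind of quoting rule that COPY's csv format uses by default: a comma inside double quotes stays inside its field. Note that each data row still yields four fields (fname, labels, mids, split), so the target table would presumably need a matching fourth column; that is an observation about the sample data, not something the answers state.

```python
import csv
import io

# Sample copied from the question: the header plus one data row.
sample = (
    'fname,labels,mids,split\n'
    '64760,"Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music",'
    '"/m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf",train\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)                       # first line holds the column names
rows = list(reader)                         # remaining lines are data rows
field_counts = [len(row) for row in rows]   # number of fields per data row

print(header)
print(field_counts)
```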
Related
I have some CSV files being exported from an SQL database and transferred to me daily for me to import into my SQL server. The files all have a "title" line in them with 27 characters, the business name and date. I.e. "busname: 08-31-2020". I need a script that can remove those first 27 characters so they aren't imported into the database.
Is this possible? I can't find anything that will let me select a specific number of characters at the beginning of the file.
If your value is in column 1, you can use str[27:] to get the rest of the string after the first 27 characters.
import csv
with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        process_str = row[1][27:]  # drop the first 27 characters of column 1
You can then create a new file from this processed string.
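If the 27-character title sits at the very start of the file rather than inside a column, one possible approach is to skip those characters while copying the file. This is a minimal sketch under that assumption; drop_leading_chars and both path arguments are hypothetical names, not part of any library.

```python
def drop_leading_chars(src_path, dst_path, count=27):
    """Copy src_path to dst_path without its first `count` characters."""
    # newline='' keeps the original line endings untouched.
    with open(src_path, 'r', newline='') as src:
        src.read(count)      # discard the title characters
        rest = src.read()    # everything after the title
    with open(dst_path, 'w', newline='') as dst:
        # Also drop the line break(s) that followed the title, if any.
        dst.write(rest.lstrip('\r\n'))
```

The cleaned file can then be imported as usual. For very large files, reading in chunks would be kinder to memory than a single read().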
I am trying to read a file from GCS, store it in a variable, and create a PostgreSQL table from it. I can connect to GCS from my code and store the data in a variable with this:
result = blob.download_as_string()
result = result.decode('utf8').strip()
The printed result is in the correct format. Then I tried to insert the data from this variable into the table like this:
sql = "COPY tablename FROM STDIN WITH DELIMITER ',' NULL AS '\\N' CSV HEADER;"
cursor.copy_expert(sql, result)
and I got this error:
file must be a readable file-like object for COPY FROM; a writable file-like object for COPY TO
I also tried with another function:
cursor.copy_from(result, 'table' ,sep=',')
but I got this result:
argument 1 must have a .read() method
So how can I load the data from the variable into the table I created? Or do I have to download it to my local machine and use this:
sql = "COPY tablename from STDIN WITH DELIMITER ',' CSV HEADER"
with open('/path/to/csv' , 'r+') as file:
    cursor.copy_expert(sql, file)
This one works, but I don't want to download the file locally. I just want to read it and insert it into my table.
Thank you
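Both error messages point at the same thing: copy_expert and copy_from expect a file-like object with a .read() method, not a plain string. A minimal sketch of one way around this is to wrap the decoded string in io.StringIO. The helper name copy_text_to_table is hypothetical, and cursor is assumed to be a psycopg2 cursor.

```python
import io

def copy_text_to_table(cursor, sql, csv_text):
    """Feed an in-memory CSV string to a COPY ... FROM STDIN statement.

    `cursor` is assumed to be a psycopg2 cursor. Its copy_expert() method
    needs an object with a .read() method, which io.StringIO provides.
    """
    buffer = io.StringIO(csv_text)   # file-like wrapper around the string
    cursor.copy_expert(sql, buffer)
```

With the variables from the question, the call would look like copy_text_to_table(cursor, sql, result), so nothing has to be written to disk.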
In my use case, I have a CSV stored as a string and I want to load it into a MySQL table. Is there a better way than saving the string as a file, using LOAD DATA INFILE, and then deleting the file? I found this answer, but it's for JDBC, and I haven't found a Python equivalent.
Yes what you describe is very possible! Say, for example, that your csv file has three columns:
import csv
import MySQLdb
conn = MySQLdb.connect('your_connection_string')
cur = conn.cursor()
with open('yourfile.csv', newline='') as fin:
    for row in csv.reader(fin):  # parse each line into a list of three fields
        cur.execute('insert into yourtable (col1,col2,col3) values (%s,%s,%s)', row)
conn.commit()  # persist the inserted rows
cur.close(); conn.close()
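Since the question starts from a string rather than a file, the same idea can be sketched without touching the disk: io.StringIO lets csv.reader parse the string directly. The helper rows_from_csv_string and the sample data are made up for illustration.

```python
import csv
import io

def rows_from_csv_string(csv_text):
    """Parse a CSV string into a list of tuples, skipping empty lines."""
    reader = csv.reader(io.StringIO(csv_text))
    return [tuple(row) for row in reader if row]

csv_text = 'a,"b,c",d\n1,2,3\n'
rows = rows_from_csv_string(csv_text)
print(rows)

# With a DB-API cursor (assumed here to be a MySQLdb cursor named cur), the
# rows could then be inserted in a single call:
# cur.executemany('insert into yourtable (col1,col2,col3) values (%s,%s,%s)', rows)
```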
I want to copy CSV data from different files and store it in a table. The problem is that the number of columns differs between the CSV files: some files have 3 columns while others have 4. If a file has 4 columns, I simply want to ignore the fourth column and save only the first three.
Using the following code, I can copy data into the table if there are only 3 columns:
CREATE TABLE ImportCSVTable (
name varchar(100),
address varchar(100),
phone varchar(100));
COPY ImportCSVTable (name , address , phone)
FROM 'path'
WITH DELIMITER ';' CSV QUOTE '"';
But I would like to check each row individually and then store it in the table.
Thank you.
Since you want to read and store it one line at a time, the Python csv module should make it easy to read the first 3 columns from your CSV file regardless of any extra columns.
You can construct an INSERT statement and execute it with your preferred Python-PostgreSQL module. I have used pyPgSQL in the past; I don't know what's current now.
#!/usr/bin/env python
import csv
filesource = 'PeopleAndResources.csv'
with open(filesource, newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='"')
    for row in reader:
        # Keep only the first three columns; let the driver quote the values.
        statement = "INSERT INTO ImportCSVTable " + \
                    "(name, address, phone) " + \
                    "VALUES (%s, %s, %s)"
        params = tuple(row[0:3])
        # execute statement with params (assumes a driver using %s placeholders)
Use a text utility to chop off the fourth column. That way, all your input files will have three columns. Some combination of awk, cut, and sed should take care of it for you, but it depends on what your columns look like.
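If awk, cut, or sed are not at hand, the same trimming can be sketched in Python. The example below assumes the ';' delimiter and '"' quote character from the question; keep_first_columns and the sample rows are hypothetical.

```python
import csv
import io

def keep_first_columns(src, count=3, delimiter=';'):
    """Return an in-memory CSV holding only the first `count` columns of src."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, quotechar='"',
                        lineterminator='\n')
    for row in csv.reader(src, delimiter=delimiter, quotechar='"'):
        writer.writerow(row[:count])   # extra columns are silently dropped
    out.seek(0)                        # rewind so the result can be read
    return out

source = io.StringIO('Ann;1 Main St;555-0100;extra\nBob;"2 Side; Rd";555-0101\n')
trimmed = keep_first_columns(source).getvalue()
print(trimmed)
```

Because the csv module does the parsing, a delimiter inside a quoted field does not shift the columns, which is the main risk with a naive cut.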
You can also just make your input table have a fourth column that is nullable, then after the import drop the extra column.
I'm reading a 6 million entry .csv file with Python, and I want to be able to search through this file for a particular entry.
Are there any tricks to search the entire file? Should you read the whole thing into a dictionary or should you perform a search every time? I tried loading it into a dictionary but that took ages so I'm currently searching through the whole file every time which seems wasteful.
Could I possibly utilize that the list is alphabetically ordered? (e.g. if the search word starts with "b" I only search from the line that includes the first word beginning with "b" to the line that includes the last word beginning with "b")
I'm using import csv.
(A side question: is it possible to make csv go to a specific line in the file? I want the program to start at a random line.)
Edit: I already have a copy of the list as an .sql file as well, how could I implement that into Python?
If the CSV file isn't changing, load it into a database, where searching is fast and easy. If you're not familiar with SQL, you'll need to brush up on that though.
Here is a rough example of inserting from a csv into a sqlite table. Example csv is ';' delimited, and has 2 columns.
import csv
import sqlite3
con = sqlite3.connect('newdb.sqlite')
cur = con.cursor()
cur.execute('CREATE TABLE "stuff" ("one" varchar(12), "two" varchar(12));')
f = open('stuff.csv')
csv_reader = csv.reader(f, delimiter=';')
cur.executemany('INSERT INTO stuff VALUES (?, ?)', csv_reader)
cur.close()
con.commit()
con.close()
f.close()
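Once the data is in the table, searching could look roughly like the sketch below. It uses an in-memory database and three made-up rows so that it runs on its own; the index name idx_stuff_one is an assumption, and an index on the searched column is what lets SQLite avoid scanning every row.

```python
import sqlite3

con = sqlite3.connect(':memory:')   # in-memory database, just for the demo
cur = con.cursor()
cur.execute('CREATE TABLE stuff (one varchar(12), two varchar(12))')
cur.executemany('INSERT INTO stuff VALUES (?, ?)',
                [('apple', 'fruit'), ('bacon', 'meat'), ('basil', 'herb')])
# Index the column that will be searched.
cur.execute('CREATE INDEX idx_stuff_one ON stuff (one)')
con.commit()

def find(term):
    """Return all rows whose first column equals term."""
    cur.execute('SELECT one, two FROM stuff WHERE one = ?', (term,))
    return cur.fetchall()

exact = find('bacon')

# Range query for every entry starting with "b".
cur.execute("SELECT one FROM stuff WHERE one >= 'b' AND one < 'c' ORDER BY one")
starts_with_b = [row[0] for row in cur.fetchall()]
```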
You can use memory mapping for really big files:
import mmap, re
with open("big_file", "rb") as report_file:
    # A length of 0 maps the whole file; ACCESS_READ works on Unix and Windows.
    mapping = mmap.mmap(report_file.fileno(), 0, access=mmap.ACCESS_READ)
    pat = re.compile(rb"b.+", re.M | re.DOTALL)  # compile your pattern here (bytes)
    print(pat.findall(mapping))  # re can search the mapping directly
    mapping.close()
Well, if your words aren't too big (meaning they'll fit in memory), then here is a simple way to do this (I'm assuming that they are all words).
from bisect import bisect_left
words = []
with open('myfile.csv') as f:
    for line in f:
        words.extend(line.strip().split(','))
wordtofind = 'bacon'
ind = bisect_left(words, wordtofind)  # requires words to be sorted
if ind < len(words) and words[ind] == wordtofind:
    print('%s was found!' % wordtofind)
It might take a minute to load all of the values from the file. This uses binary search to find your words, so the values must be in sorted order. In this case I was looking for bacon (who wouldn't look for bacon?). If there are repeated values, you might also want to use bisect_right to find the index one past the rightmost element equal to the value you are searching for. You can still use this if you have key:value pairs. You'll just have to make each object in your words list a list of [key, value].
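The two variations mentioned above can be sketched as follows: counting repeated values with bisect_left and bisect_right, and looking up a value by key in a sorted list of [key, value] pairs. The sample data and the helper names are invented for illustration. A separate list of keys is kept for the search, since the key argument of the bisect functions only exists in newer Python versions.

```python
from bisect import bisect_left, bisect_right

words = sorted(['apple', 'bacon', 'bacon', 'basil', 'cherry'])

def count_occurrences(sorted_words, target):
    """Count how often target appears in an already sorted list."""
    left = bisect_left(sorted_words, target)     # first index of target
    right = bisect_right(sorted_words, target)   # one past the last index
    return right - left

pairs = sorted([['bacon', 3], ['apple', 1], ['cherry', 7]])
keys = [key for key, _ in pairs]   # sorted keys, parallel to pairs

def lookup(key):
    """Return the value stored for key, or None if the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pairs[i][1]
    return None
```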
Side Note
I don't think that you can really go from line to line in a csv file very easily. You see, these files are basically just long strings with \n characters that indicate new lines.
You can't go directly to a specific line in the file because lines are variable-length, so the only way to know when line #n starts is to search for the first n newlines. And it's not enough to just look for '\n' characters because CSV allows newlines in table cells, so you really do have to parse the file anyway.
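For files where no cell contains a newline, one workaround is to scan the file once, remember the byte offset at which each line starts, and then seek straight to any line. This is only a sketch under that assumption; build_line_offsets and read_line_at are hypothetical helpers, and the file is assumed to be UTF-8 encoded.

```python
def build_line_offsets(path):
    """Scan the file once and return the byte offset where each line starts."""
    offsets = []
    position = 0
    with open(path, 'rb') as f:
        for line in f:
            offsets.append(position)
            position += len(line)   # len() of bytes gives the size in bytes
    return offsets

def read_line_at(path, offsets, n):
    """Jump to line n (0-based) and return it without the line ending."""
    with open(path, 'rb') as f:
        f.seek(offsets[n])
        return f.readline().decode('utf-8').rstrip('\r\n')
```

A random starting line would then be read_line_at(path, offsets, random.randrange(len(offsets))). The offsets have to be rebuilt whenever the file changes.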
My idea is to use the Python ZODB module to store the data as a dictionary-type structure and then create a new CSV file from that data structure. Do all your operations at that point.
There is a fairly simple way to do this. Depending on how many columns you want Python to print, you may need to add or remove some of the print lines.
import csv
search = input('Enter string to search: ')
with open('FileName.csv', newline='') as stock:  # open for reading, not 'wb'
    reader = csv.reader(stock)
    for row in reader:
        for field in row:
            if field == search:
                print('Record found! \n')
                print(row[0])
                print(row[1])
                print(row[2])
I hope this managed to help.