Python / PostgreSQL: Ignoring the last column from a CSV file

I have a problem importing a CSV file. I am using PostgreSQL's COPY FROM command to copy a CSV file into a 2-column table.
The CSV file is in the following format:
"1";"A"
"2";"B"
"3";"C";"CAD450"
"4";"D";"ABX123"
I want to import all of these lines into the table, but skip any extra columns.
Currently I skip any lines that contain extra columns, so here "3";"C";"CAD450" and "4";"D";"ABX123" are skipped and only the two-column lines are imported. But I want to copy all four lines into my table. So is there any way to ignore the last column and import all four lines, like this?
"1";"A"
"2";"B"
"3";"C"
"4";"D"

Preprocess the file with awk to strip the extra columns:
awk -F';' '{print $1 ";" $2}' csv_file.csv > new_file.csv

Piping it through cut or awk (as suggested above) is easier than using Python/psycopg2.
cat csv_file.csv | cut -d';' -f1,2 | psql -U USER DATABASE -c "COPY table FROM STDIN WITH DELIMITER ';';"

with open("file.csv","r") as f:
t=[line.strip().split(";")[:2] for line in f]
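If you then want to push those trimmed rows into PostgreSQL without writing an intermediate file, one option is to rebuild an in-memory CSV and hand it to COPY through psycopg2. A minimal sketch, assuming a two-column target table named my_table and placeholder connection details:
import io
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me password=pw")  # placeholder credentials

with open("file.csv", "r") as f:
    # keep only the first two ';'-separated fields of each line
    rows = [line.strip().split(";")[:2] for line in f]

# rebuild a two-column, ';'-delimited stream in memory
buf = io.StringIO("\n".join(";".join(r) for r in rows))

cur = conn.cursor()
# FORMAT csv makes COPY strip the double quotes around each value
cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, DELIMITER ';')", buf)
conn.commit()
This keeps the quoting intact and lets COPY strip it; the naive split on ';' assumes there are no semicolons inside the quoted values.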

Myriad ways to handle the problem.
I'd probably do something like this:
import csv
import psycopg2

# only the columns you care about need names; extra fields are ignored
dr = csv.DictReader(open('test.csv', 'r', newline=''),
                    delimiter=';',
                    quotechar='"',
                    fieldnames=['col1', 'col2'])

CONNSTR = """
host=127.0.0.1
dbname=mydb
user=me
password=pw
port=5432"""

cxn = psycopg2.connect(CONNSTR)
cur = cxn.cursor()

cur.execute("""CREATE TABLE from_csv (
    id serial NOT NULL,
    col1 character varying,
    col2 character varying,
    CONSTRAINT from_csv_pkey PRIMARY KEY (id));""")

# DictReader yields one dict per row, which matches the named placeholders
cur.executemany("""INSERT INTO from_csv (col1, col2)
                   VALUES (%(col1)s, %(col2)s);""", dr)

cxn.commit()

Related

Other way than splitting with a comma to store in a database?

I started creating a database with PostgreSQL and I am currently facing a problem when I want to copy the data from my CSV file to my database.
Here is my code:
import psycopg2

connexion = psycopg2.connect(dbname="db_test", user="postgres", password="passepasse")
connexion.autocommit = True
cursor = connexion.cursor()

cursor.execute("""CREATE TABLE vocabulary(
    fname integer PRIMARY KEY,
    label text,
    mids text
)""")

with open(r'C:\mypathtocsvfile.csv', 'r') as f:
    next(f)  # skip the header row
    cursor.copy_from(f, 'vocabulary', sep=',')

connexion.commit()
I allocated columns to store my CSV data, but the problem is that the data in my CSV file is stored like this:
fname,labels,mids,split
64760,"Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music","/m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf",train
16399,"Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music","/m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf",train
...
There are commas inside my label and mids columns, which is why I get the following error:
BadCopyFileFormat: ERROR: additional data after the last expected column
Which alternative should I use to copy data from this CSV file?
Thank you.
If the file is small, then the easiest way is to open it in LibreOffice and save it with a new separator.
I usually use ^.
If the file is large, write a script that replaces ," with ^" and ", with "^, respectively.
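A minimal sketch of such a replacement script, assuming the input is named data.csv and the result is written to data_caret.csv (both names are placeholders):
# swap the outer delimiter from ',' to '^' so the commas inside the
# quoted label/mids fields no longer collide with it
with open('data.csv', 'r', encoding='utf-8') as src, \
        open('data_caret.csv', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line.replace(',"', '^"').replace('",', '"^'))
The loader would then use sep='^'; note that the double quotes stay in the stored values unless they are stripped too, and that this only works while the quoting looks exactly like the sample rows.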
COPY supports csv as a format, which already does what you want. But to access it via psycopg2, I think you will need to use copy_expert rather than copy_from.
cursor.copy_expert('copy vocabulary from stdin with csv', f)

How to copy from a CSV file to a PostgreSQL table when the CSV headers contain special characters?

I have 500 different CSV files in a folder.
I want to take each CSV file and import it into a Postgres table.
There is an unknown number of columns in each CSV, so I do not want to keep opening a CSV file, creating the table by hand and then importing it with \copy
I know I can do this:
COPY users FROM 'user_data.csv' DELIMITER ';' CSV HEADER
However, the CSV file is something like:
user_id,5username,pas$.word
1,test,pass
2,test2,query
I have to load this into Postgres, but Postgres does not allow a column name to start with a number or to contain special characters like . and $.
I want the postgres table to look something like:
user_id ___5username pas______word
1 test pass
2 test2 query
I want to replace special characters with ___, and if a column name starts with a number, prefix it with ___.
Is there a way to do this? I am open to a Python or Postgres solution.
If pandas is an option for you, try to:
Create data frames from the CSV files using .read_csv()
Save the created data frames into the SQL database with .to_sql()
You can also see my tutorial on pandas IO API.
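A minimal sketch of that approach, including the column-name cleanup the question asks for; the connection string, folder path, and sanitize() helper are assumptions for illustration, not part of pandas:
import re
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://me:pw@localhost:5432/mydb')  # placeholder credentials

def sanitize(name):
    # replace each special character with ___, and prefix names
    # that start with a digit with ___
    name = re.sub(r'[^0-9a-zA-Z_]', '___', name)
    return '___' + name if name[0].isdigit() else name

for path in Path('csv_folder').glob('*.csv'):  # placeholder folder
    df = pd.read_csv(path)
    df.columns = [sanitize(c) for c in df.columns]
    # table name taken from the file name; if_exists='replace' recreates it
    df.to_sql(sanitize(path.stem), engine, index=False, if_exists='replace')
With the sample header, 5username becomes ___5username and pas$.word becomes pas______word, matching the table layout shown in the question.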

How to export a table schema into a CSV file

Basically, I want to export a Hive table's schema into a CSV file. I can create a dataframe and then show its schema, but I want to write that schema to a CSV file. It seems pretty simple, but it won't work.
In case you want to do it within the Hive console, this is how you do it:
hive>
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/user1/file1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * from tablename
And then in Unix
[user1]$
cat file1/* > file1.csv
zip file1 file1.csv
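If it is really the schema (column names and types) that should end up in the CSV rather than the data, and the table is visible to Spark, a rough PySpark sketch along these lines might be closer to the original question; the table name and output path are assumptions:
import csv

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table("tablename")  # placeholder Hive table name

# write one row per column: name, type, nullable
with open("/tmp/tablename_schema.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["column", "type", "nullable"])
    for field in df.schema.fields:
        writer.writerow([field.name, field.dataType.simpleString(), field.nullable])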

Insert data from files into SQLite database

I have an SQLite database of this form:
Table1
Column1 | Column 2 | Column 3 | Column 4
I want to populate this database with data stored in some hundred .out files in this form, where every file has millions of rows:
value1;value2;value3;value4;
2value1;2value2;2value3;2value4;
... etc
Is there a fast way to populate the database with this data? One way would be to read the data in line by line in Python and insert it, but there should probably be a faster way to load a whole file at once?
Bash, SQLite, or Python preferably.
SQLite has a .import command.
.import FILE TABLE Import data from FILE into TABLE
You can use it like this (shell).
for f in *.out
do
sqlite3 -separator ';' my.db ".import $f Table1"
done
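If you would rather stay in Python, a rough sqlite3 sketch using executemany; the database and table names come from the question, while the glob pattern and the handling of the trailing ';' are assumptions:
import glob
import sqlite3

conn = sqlite3.connect("my.db")  # placeholder database file

def rows(path):
    # yield one 4-element row per line, dropping the empty field left by the trailing ';'
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").rstrip(";").split(";")[:4]

for path in glob.glob("*.out"):
    # executemany runs inside one implicit transaction per file, which keeps the inserts fast
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", rows(path))
    conn.commit()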

Reading and storing CSV data line by line in Postgres

I want to copy CSV data from different files and then store it in a table. But the problem is that the number of columns differs between CSV files, so some CSV files have 3 columns while some have 4. If there are 4 columns in a file, I want to simply ignore the fourth column and save only the first three.
Using the following code, I can copy data into the table if there are only 3 columns:
CREATE TABLE ImportCSVTable (
    name varchar(100),
    address varchar(100),
    phone varchar(100));

COPY ImportCSVTable (name, address, phone)
FROM 'path'
WITH DELIMITER ';' CSV QUOTE '"';
But I would like to check each row individually and then store it in the table.
Thank you.
Since you want to read and store it one line at a time, the Python csv module should make it easy to read the first 3 columns from your CSV file regardless of any extra columns.
You can construct an INSERT statement and execute it with your preferred Python-PostgreSQL module. I have used pyPgSQL in the past; I don't know what's current now.
#!/usr/bin/env python
import csv

filesource = 'PeopleAndResources.csv'

with open(filesource, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='"')
    for row in reader:
        # take only the first three columns, however many the row has
        statement = ("INSERT INTO ImportCSVTable "
                     "(name, address, phone) "
                     "VALUES ('%s', '%s', '%s')" % tuple(row[0:3]))
        # execute the statement with your database module here
Use a text utility to chop off the fourth column. That way, all your input files will have three columns. Some combination of awk, cut, and sed should take care of it for you, but it depends on what your columns look like.
You can also just make your input table have a fourth column that is nullable, then after the import drop the extra column.
