I have an SQLite database in this form:
Table1
Column1 | Column2 | Column3 | Column4
I want to populate this database with data stored in a few hundred .out files of this form, where every file has millions of rows:
value1;value2;value3;value4;
2value1;2value2;2value3;2value4;
... etc
Is there a fast way to populate the database with this data? One way would be to read the data line by line in Python and insert it, but there should probably be a faster way to load a whole file at once?
Preferably Bash, SQLite, or Python.
SQLite has a .import command.
.import FILE TABLE Import data from FILE into TABLE
You can use it like this (shell).
for f in *.out
do
sqlite3 -separator ';' my.db ".import $f Table1"
done
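If you would rather do it from Python, a minimal sketch along these lines (assuming the Table1 layout above and that my.db already exists) also works; executemany is fed a generator, so a whole file is never loaded into memory:

import csv
import glob
import sqlite3

conn = sqlite3.connect('my.db')
cur = conn.cursor()
for path in glob.glob('*.out'):
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=';')
        # each line ends with a trailing ';', so keep only the first four fields
        cur.executemany('INSERT INTO Table1 VALUES (?, ?, ?, ?)',
                        (row[:4] for row in reader))
conn.commit()
conn.close()

That said, the .import loop above will usually be faster for files this large.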
I have 500 different CSV files in a folder.
I want to take each CSV file and import it into a Postgres table.
There is an unknown number of columns in each CSV, so I do not want to keep opening a CSV file, creating a table by hand, and then importing it with \copy.
I know I can do this:
COPY users FROM 'user_data.csv' DELIMITER ';' CSV HEADER
However, the CSV file is something like:
user_id,5username,pas$.word
1,test,pass
2,test2,query
I have to convert this for Postgres, but Postgres does not allow a column name to start with a number or to contain special characters like . and $.
I want the postgres table to look something like:
user_id    ___5username    pas______word
1          test            pass
2          test2           query
I want to replace each special character with ___, and if a column name starts with a number, prefix it with ___.
Is there a way to do this? I am open to a Python or Postgres solution.
If pandas is an option for you, try to:
Create data frames from the CSV files using .read_csv()
Save the created data frames to the SQL database with .to_sql()
You can also see my tutorial on pandas IO API.
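A minimal sketch of that approach, assuming SQLAlchemy for the connection, one table per file named after the file, and the renaming rule from the question (every special character becomes ___, and names starting with a digit get a ___ prefix); the connection string is a placeholder:

import glob
import os
import re

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://me:pw@127.0.0.1:5432/mydb')  # placeholder credentials

def clean(name):
    # replace every character that is not a letter, digit or underscore with ___
    name = re.sub(r'[^0-9a-zA-Z_]', '___', name)
    # prefix with ___ if the name starts with a digit
    return '___' + name if name[0].isdigit() else name

for path in glob.glob('*.csv'):
    df = pd.read_csv(path)
    df.columns = [clean(c) for c in df.columns]
    # to_sql creates the table if it does not exist, so the columns
    # never have to be declared by hand
    df.to_sql(clean(os.path.splitext(os.path.basename(path))[0]),
              engine, if_exists='replace', index=False)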
Basically, I want to export a Hive table's schema into a CSV file. I can create a dataframe and then show its schema, but I want to write its schema to a CSV file. It seems pretty simple, but it won't work.
In case you want to do it within the Hive console, this is how you do it:
hive>
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/user1/file1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM tablename;
And then in Unix
[user1]$
cat file1/* > file1.csv
zip file1 file1.csv
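If you would rather stay with the dataframe route mentioned in the question, a minimal PySpark sketch along these lines writes one column name and type per row; a Spark session with Hive support is assumed, and tablename and the output path are placeholders:

import csv
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table("tablename")  # the same Hive table as above

# dump the dataframe's schema (column name and type) rather than its data
with open("/tmp/user1/schema.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["column", "type"])
    for field in df.schema.fields:
        writer.writerow([field.name, field.dataType.simpleString()])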
I have a very short csv file with 3 lines:
45,55,45
45,12,54
45,45,48
I want to copy it into a table of 3 columns in my Postgres database using Python. I created the table of 3 columns (type: character varying), which worked well. Then I open the file and try to copy it as follows:
f = open('/path/to/file.csv')
cur.copy_from(f,'table_name',sep=",")
f.close()
conn.commit()
conn.close()
where cur and conn are already defined as the cursor and connection.
I always get this error (also when I try it with another CSV file):
psycopg2.DataError: missing data for column "b"
CONTEXT: COPY table_name, line 4
What I don't get is that there can't be any missing data: every line has 3 values and I have exactly 3 columns, and the error says it is on line 4, but I only have 3 lines in my CSV file!
HELP :D
I would like to execute this query:
select datetime(date/1000,'unixepoch','localtime') as DATE, address as RECEIVED, body as BODY from sms;
And save its output to a .csv file in a specified directory. Usually, in the Ubuntu terminal, it is far easier to manually give commands to save the output of the above query to a file, but I am not familiar with Python's sqlite3 module. I would like to know how to execute this query and save its output to a custom directory as a .csv file. Please help me out!
Quick and dirty:
import sqlite3

db = sqlite3.connect('database_file')
cursor = db.cursor()
cursor.execute("SELECT ...")
rows = cursor.fetchall()
# iterate over rows and write your CSV
cursor.close()
db.close()
rows will be a list of all matching records, which you can then iterate over and write into your CSV file.
If you just want to make a CSV file, look at the csv module. The following page should get you going: https://docs.python.org/2/library/csv.html
You can also look at the pandas module to help create the file.
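Putting the two pieces together, here is a minimal Python 3 sketch, assuming the sms query from the question; 'database_file' and the output path are placeholders:

import csv
import sqlite3

db = sqlite3.connect('database_file')  # placeholder database path
cursor = db.cursor()
cursor.execute("select datetime(date/1000,'unixepoch','localtime') as DATE, "
               "address as RECEIVED, body as BODY from sms")

# write a header row followed by every result row
with open('/path/to/output/sms.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['DATE', 'RECEIVED', 'BODY'])
    writer.writerows(cursor.fetchall())

cursor.close()
db.close()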
I have a problem importing a CSV file. I am using PostgreSQL's COPY FROM command to copy a CSV file into a 2-column table.
I have a CSV file in the following format:
"1";"A"
"2";"B"
"3";"C";"CAD450"
"4";"D";"ABX123"
I want to import all these lines of the CSV file into the table, but I want to skip any extra columns.
Currently I am skipping any lines that contain extra columns; for example, here the lines "3";"C";"CAD450" and "4";"D";"ABX123" are skipped and only the two-column lines are imported. But I want to copy all four lines into my table. So is there any way to ignore the last column and copy all four lines into my table, like this:
"1";"A"
"2";"B"
"3";"C"
"4";"D"
Preprocess the file with awk to strip the extra columns:
awk -F';' '{print $1 ";" $2}' csv_file.csv > new_file.csv
Piping it through cut or awk (as suggested above) is easier than using Python/psycopg2.
cat csv_file.csv | cut -d';' -f1,2 | psql -U USER DATABASE -c "COPY table FROM STDIN WITH DELIMITER ';';"
with open("file.csv", "r") as f:
    # keep only the first two fields of each line
    t = [line.strip().split(";")[:2] for line in f]
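To get those trimmed rows into Postgres, one option is to rebuild ';'-separated lines in memory and hand them to COPY; a minimal sketch, assuming psycopg2, a two-column table named my_table, and placeholder connection details:

import io
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=mydb user=me password=pw")  # placeholders
cur = conn.cursor()

with open("file.csv") as f:
    # keep only the first two fields of each line and rebuild ';'-separated rows
    buf = io.StringIO("\n".join(";".join(line.strip().split(";")[:2]) for line in f))

# FORMAT csv makes COPY strip the surrounding double quotes from each value
cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, DELIMITER ';')", buf)
conn.commit()
conn.close()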
Myriad ways to handle the problem.
I'd probably do something like this:
import csv
import psycopg2
# read just the first two fields of each row; extra trailing columns are ignored
dr = csv.DictReader(open('test.csv', 'rb'),
                    delimiter=';',
                    quotechar='"',
                    fieldnames=['col1', 'col2'])  # need not specify other cols

CONNSTR = """
host=127.0.0.1
dbname=mydb
user=me
password=pw
port=5432"""

cxn = psycopg2.connect(CONNSTR)
cur = cxn.cursor()

cur.execute("""CREATE TABLE from_csv (
    id serial NOT NULL,
    col1 character varying,
    col2 character varying,
    CONSTRAINT from_csv_pkey PRIMARY KEY (id));""")

cur.executemany("""INSERT INTO from_csv (col1,col2)
                   VALUES (%(col1)s,%(col2)s);""", dr)

cxn.commit()
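This works because DictReader with a fixed fieldnames list stashes any extra values under its restkey (None by default), and the INSERT only references col1 and col2, so rows with a third column are handled without any special casing; passing the reader straight to executemany also streams the file row by row instead of building a list first.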