I am working on an assignment where we were provided a bunch of CSV files to work on and extract information from. I have successfully completed that part. As a bonus question, we have one SQLite file with a .db extension. I wanted to know whether any module exists to convert such files to .csv or to read them directly?
In case such a method doesn't exist, I'll probably insert the file into a database and use the Python sqlite3 module to extract the data I need.
You can use the sqlite3 command-line tool to dump table data to CSV.
To export an SQLite table (or part of a table) as CSV, simply set the "mode" to "csv" and then run a query to extract the desired rows of the table.
sqlite> .header on
sqlite> .mode csv
sqlite> .once c:/work/dataout.csv
sqlite> SELECT * FROM tab1;
In the example above, the ".header on" line causes column labels to be printed as the first row of output. This means that the first row of the resulting CSV file will contain column labels. If column labels are not desired, set ".header off" instead. (The ".header off" setting is the default and can be omitted if the headers have not been previously turned on.)
The line ".once FILENAME" causes all query output to go into the named file instead of being printed on the console. In the example above, that line causes the CSV content to be written into a file named "C:/work/dataout.csv".
http://www.sqlite.org/cli.html
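If you would rather stay in Python, a minimal sketch using only the standard-library sqlite3 and csv modules looks like this (the database path, table name, and output file are placeholders mirroring the CLI example above):
import csv
import sqlite3

# Placeholder paths and table name; adjust to the actual .db file.
conn = sqlite3.connect('mydata.db')
cursor = conn.cursor()
cursor.execute('SELECT * FROM tab1')

with open('dataout.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write column labels first, like ".header on" in the CLI example.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

conn.close()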
I have 500 different CSV files in a folder.
I want to take a CSV file and import it into a Postgres table.
There is an unknown number of columns in each CSV, so I do not want to keep opening a CSV file, creating the table, and then importing with \copy.
I know I can do this:
COPY users FROM 'user_data.csv' DELIMITER ';' CSV HEADER
However, the CSV file is something like:
user_id,5username,pas$.word
1,test,pass
2,test2,query
I have to import this into Postgres, but Postgres does not allow a column name to start with a number or contain special characters like . and $.
I want the postgres table to look something like:
user_id    ___5username    pas______word
1          test            pass
2          test2           query
I want to replace special characters with ___ and, if a column name starts with a number, prefix it with ___.
Is there a way to do this? I am open to a Python or Postgres solution.
If pandas is an option for you, try to:
Create data frames from the CSV files using .read_csv()
Save the created data frames into the SQL database with .to_sql()
You can also see my tutorial on pandas IO API.
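For example, a rough sketch along those lines that also sanitizes the column names the way the question describes (the connection string and file name are placeholders, and SQLAlchemy is assumed to be installed):
import re
import pandas as pd
from sqlalchemy import create_engine

def sanitize(name):
    # Replace anything that is not a letter, digit, or underscore with ___,
    # and prefix names that start with a digit, as the question asks.
    clean = re.sub(r'[^0-9a-zA-Z_]', '___', name)
    return '___' + clean if clean[0].isdigit() else clean

engine = create_engine('postgresql://user:password@localhost/mydb')  # placeholder
df = pd.read_csv('user_data.csv')
df.columns = [sanitize(c) for c in df.columns]
df.to_sql('users', engine, if_exists='replace', index=False)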
Basically, I want to export a Hive table's schema into a CSV file. I can create a dataframe and then show its schema, but I want to write the schema to a CSV file. It seems pretty simple, but it won't work.
In case you want to do it within the Hive console, this is how you do it:
hive>
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/user1/file1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM tablename;
And then in Unix
[user1]$
cat file1/* > file1.csv
zip file1 file1.csv
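If the goal really is the schema rather than the data, a minimal PySpark sketch along the lines the question describes could look like this (the table name and output path are placeholders, and an existing Spark setup with Hive support is assumed):
import csv
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table('tablename')  # placeholder table name

# df.dtypes is a list of (column_name, data_type) pairs.
with open('/tmp/user1/schema.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['column_name', 'data_type'])
    writer.writerows(df.dtypes)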
I have a 175 GB csv that I am trying to pull into MySQL.
The table is set up and formatted.
The problem is, the CSV uses unorthodox delimiters and line separators (both are 3-character strings, #%# and #^#).
After a lot of trial and error I was able to get the process to start in HeidiSQL, but it would freeze up and never actually populate any data.
I would ideally like to use Python, but its csv parser only accepts 1-character line separators, making this tricky.
Does anyone have any tips on getting this to work?
MySQL's LOAD DATA statement will process a CSV file with multi-character delimiters:
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
I'd expect something like this:
LOAD DATA LOCAL INFILE '/dir/my_wonky.csv'
INTO TABLE my_table
FIELDS TERMINATED BY '#%#'
LINES TERMINATED BY '#^#'
( col1
, col2
, col3
)
I'd use a very small subset of the .csv file and do the load into a test table, just to get it working, make necessary adjustments, verify the results.
I would also want to break up the load into more manageable chunks, and avoid blowing out rollback space in the ibdata1 file. I would use something like pt-fifo-split (part of the Percona toolkit) to break the file up into a series of separate loads, but unfortunately, pt-fifo-split doesn't provide a way to specify the line delimiter character(s). To make use of that, we'd have to pre-process the file, to replace existing new line characters, and replace the line delimiter #^# with new line characters.
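A rough sketch of that pre-processing step in Python, streaming the file so it never has to fit in memory (file names are placeholders, and it assumes tab and newline do not occur in the actual data):
with open('my_wonky.csv', 'r') as src, open('my_clean.csv', 'w') as dst:
    tail = ''
    while True:
        chunk = src.read(1024 * 1024)  # 1 MB at a time
        if not chunk:
            dst.write(tail)
            break
        # Turn any real newlines into spaces, then swap the 3-character
        # delimiters for ordinary tab / newline separators.
        buf = tail + chunk.replace('\n', ' ')
        buf = buf.replace('#%#', '\t').replace('#^#', '\n')
        # Keep the last two characters in case a delimiter is split across chunks.
        dst.write(buf[:-2])
        tail = buf[-2:]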
(If I had to load the whole file in a single shot, I'd do that into a MyISAM table, and not an InnoDB table, as a staging table. And I'd have a separate process that copied rows (in reasonably sized chunks) from the MyISAM staging table into the InnoDB table.)
I would like to execute this query:
select datetime(date/1000,'unixepoch','localtime') as DATE, address as RECEIVED, body as BODY from sms;
And save its output to a .csv file in a specified directory. Usually, in the Ubuntu terminal, it is far easier to manually give commands to save the output of the above query to a file, but I am not familiar with Python's sqlite3 module. I would like to know how to execute this query and save its output to a custom directory as a .csv file. Please help me out!
Quick and dirty:
import sqlite3
db = sqlite3.connect('database_file')
cursor = db.cursor()
cursor.execute("SELECT ...")
rows = cursor.fetchall()
# Iterate rows and write your CSV
cursor.close()
db.close()
Rows will be a list with all matching records, which you can then iterate and manipulate into your csv file.
If you just want to make a csv file, look at the csv module. The following page should get you going https://docs.python.org/2/library/csv.html
You can also look at the pandas module to help create the file.
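For example, a minimal pandas sketch that runs the query from the question and writes the result straight to a CSV file (the database path and output directory are placeholders):
import sqlite3
import pandas as pd

conn = sqlite3.connect('database_file')  # placeholder path to the .db file
query = ("select datetime(date/1000,'unixepoch','localtime') as DATE, "
         "address as RECEIVED, body as BODY from sms")
df = pd.read_sql_query(query, conn)
df.to_csv('/path/to/output/sms.csv', index=False)  # placeholder output path
conn.close()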
I am trying to import a .csv file into a Postgres database. I have set up the database with the appropriate number of columns to match the .csv file. I have also taken care to strip all "," (comma characters) from the .csv file.
Here is my command I am typing into psql:
COPY newtable FROM 'path/to/file.csv' CSV HEADER;
I have tried everything I can think of. Any idea how to fix this?