CSV to Postgres database: "Error: extra data after last expected column" - python

I am trying to load a .csv file into a Postgres database. I have set up the table with the appropriate number of columns to match the .csv file. I have also taken care to strip all "," (comma characters) from the .csv file.
Here is the command I am typing into psql:
COPY newtable FROM 'path/to/file.csv' CSV HEADER;
I have tried everything I can think of. Any idea how to fix this?
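That error normally means at least one data row parses to more fields than newtable has columns (a stray delimiter or an unescaped line break are the usual culprits). One way to locate the offending rows before retrying the COPY is to count fields per line with Python's csv module; a minimal sketch, assuming newtable has five columns (adjust EXPECTED_COLUMNS) and using the path from the command above:

import csv

EXPECTED_COLUMNS = 5            # assumption: replace with the column count of newtable
PATH = 'path/to/file.csv'       # same file used in the COPY command

with open(PATH, newline='') as f:
    reader = csv.reader(f)
    next(reader)                # skip the header row, as COPY ... CSV HEADER does
    for line_number, row in enumerate(reader, start=2):
        if len(row) != EXPECTED_COLUMNS:
            print(f'line {line_number}: {len(row)} fields -> {row}')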

Related

How to copy from CSV file to PostgreSQL table with headers (including special characters) in CSV file?

I have 500 different CSV files in a folder.
I want to take each CSV file and import it into a Postgres table.
There is an unknown number of columns in each CSV, so I do not want to keep opening the CSV file, creating the table, and then importing with \copy.
I know I can do this:
COPY users FROM 'user_data.csv' DELIMITER ';' CSV HEADER
However, the CSV file is something like:
user_id,5username,pas$.word
1,test,pass
2,test2,query
I have to convert this to Postgres, but Postgres does not allow a column name to start with a number, or special characters like . and $ in the column name.
I want the Postgres table to look something like:
user_id  ___5username  pas______word
1        test          pass
2        test2         query
I want to replace special characters with ___, and if a column name starts with a number, prefix it with ___.
Is there a way to do this? I am open to a Python or Postgres solution.
If pandas is an option for you, try to:
Create data frames from the CSV files using .read_csv()
Save the created data frames into the SQL database with .to_sql()
You can also see my tutorial on pandas IO API.
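A minimal sketch of that approach, with the column-name cleanup the question asks for folded in (the connection string, file name and table name are placeholders):

import re
import pandas as pd
from sqlalchemy import create_engine

def sanitize(name):
    # replace every character that is not a letter, digit or underscore with '___'
    cleaned = re.sub(r'[^0-9A-Za-z_]', '___', name)
    # prefix with '___' if the name starts with a digit
    return '___' + cleaned if cleaned[0].isdigit() else cleaned

engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder DSN

df = pd.read_csv('user_data.csv')                # pandas infers the columns for you
df.columns = [sanitize(c) for c in df.columns]   # user_id, ___5username, pas______word
df.to_sql('users', engine, index=False, if_exists='replace')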

How to export table schema into a csv file

Basically, I want to export a Hive table's schema into a CSV file. I can create a dataframe and then show its schema, but I want to write its schema to a CSV file. Seems pretty simple, but it won't work.
In case you want to do it within the Hive console, this is how you do it:
hive>
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/user1/file1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM tablename;
And then in Unix
[user1]$
cat file1/* > file1.csv
zip file1 file1.csv
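Note that the Hive statement above exports the table's rows rather than its schema. If what you actually need is just the column names and types from a Spark dataframe, as the question describes, a small PySpark sketch along these lines should work (the table name and output path are placeholders):

import csv
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table('tablename')                    # placeholder Hive table

# df.dtypes is a list of (column_name, type_string) pairs
with open('/tmp/user1/schema.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['column', 'type'])
    writer.writerows(df.dtypes)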

How to fix uploading a CSV file to BigQuery using Python

While uploading a CSV file to BigQuery through Cloud Storage, I am getting the error below:
CSV table encountered too many errors, giving up. Rows: 5; errors: 1. Please look into the error stream for more details.
In the schema, I am setting all parameters as STRING.
In the CSV file, I have the below data:
It's Time. Say "I Do" in my style.
I am not able to upload a CSV file containing the above sentence to BigQuery.
Does the CSV file have the exact same structure as the dataset schema? Both must match for the upload to be successful.
If your CSV file has only one sentence in the first row of the first column, then your schema must have a table with exactly one field as STRING. If there is content in the second column of the CSV, the schema must then have a second field for it, and so on. Conversely, if your schema has, say, 2 fields set as STRING, there must be data in the first two columns of the CSV.
The data location must also match: if your BigQuery dataset is in the US, then your Cloud Storage bucket must be in the US too for the upload to work.
Check here for details of uploading CSV into BigQuery.
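As a rough illustration of that with the Python client library (the bucket, table and field names below are placeholders, not from the question), the load job's schema needs one STRING field per CSV column:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,                                  # if the CSV has a header row
    allow_quoted_newlines=True,                           # tolerate line breaks inside quoted fields
    schema=[bigquery.SchemaField('sentence', 'STRING')],  # one field per CSV column
)

load_job = client.load_table_from_uri(
    'gs://my-bucket/file.csv',                            # placeholder Cloud Storage object
    'my_project.my_dataset.my_table',                     # placeholder destination table
    job_config=job_config,
)
load_job.result()                                         # wait for the job; raises on errors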
Thanks to all for a response.
Here is my solution to this problem:
# Read the file, replace double quotes with single quotes, and write it back in place.
with open('/path/to/csv/file', 'r') as f:
    text = f.read()

converted_text = text.replace('"', "'")
print(converted_text)

with open('/path/to/csv/file', 'w') as f:
    f.write(converted_text)

Importing Large CSV (175 GB) to MySQL Server with Unusual Delimiters

I have a 175 GB csv that I am trying to pull into MySQL.
The table is set up and formatted.
The problem is, the CSV uses unorthodox delimiters and line separators (both are 3-character strings, #%# and #^#).
After a lot of trial and error I was able to get the process to start in HeidiSQL, but it would freeze up and never actually populate any data.
I would ideally like to use Python, but its CSV parser only accepts 1-character line separators, making this tricky.
Does anyone have any tips on getting this to work?
MySQL's LOAD DATA statement will process a CSV file with multi-character delimiters:
https://dev.mysql.com/doc/refman/5.7/en/load-data.html
I'd expect something like this:
LOAD DATA LOCAL INFILE '/dir/my_wonky.csv'
INTO TABLE my_table
FIELDS TERMINATED BY '#%#'
LINES TERMINATED BY '#^#'
( col1
, col2
, col3
)
I'd use a very small subset of the .csv file and do the load into a test table, just to get it working, make the necessary adjustments, and verify the results.
I would also want to break up the load into more manageable chunks, and avoid blowing out rollback space in the ibdata1 file. I would use something like pt-fifo-split (part of the Percona toolkit) to break the file up into a series of separate loads, but unfortunately, pt-fifo-split doesn't provide a way to specify the line delimiter character(s). To make use of that, we'd have to pre-process the file, to replace existing new line characters, and replace the line delimiter #^# with new line characters.
(If I had to load the whole file in a single shot, I'd do that into a MyISAM table, and not an InnoDB table, as a staging table. And I'd have a separate process that copied rows (in reasonably sized chunks) from the MyISAM staging table into the InnoDB table.)
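If that pre-processing step turns out to be necessary, a hedged sketch of it in Python, streaming the 175 GB file instead of reading it into memory (file names and chunk size are placeholders):

LINE_SEP = '#^#'
KEEP = len(LINE_SEP) - 1
CHUNK = 1 << 20                          # read ~1 MiB at a time; tune as needed

with open('my_wonky.csv', 'r', newline='') as src, \
     open('my_wonky_fixed.csv', 'w', newline='') as dst:
    carry = ''
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        # drop any embedded newlines, then turn the '#^#' record separator into '\n'
        buf = (carry + chunk.replace('\r', ' ').replace('\n', ' ')).replace(LINE_SEP, '\n')
        # hold back a short tail in case a '#^#' straddles two chunks
        carry, buf = buf[-KEEP:], buf[:-KEEP]
        dst.write(buf)
    dst.write(carry)

The cleaned file can then be fed to pt-fifo-split, or loaded with LINES TERMINATED BY '\n'.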

Reading a SQLite file using Python

I am working on an assignment wherein we were provided a bunch of CSV files to work on and extract information from. I have successfully completed that part. As a bonus question, we have one SQLite file with a .db extension. I wanted to know if any module exists to convert such files to .csv or to read them directly?
In case such a method doesn't exist, I'll probably insert the file into a database and use the Python sqlite3 module to extract the data I need.
You can use the sqlite3 command-line tool to dump table data to CSV.
To export an SQLite table (or part of a table) as CSV, simply set the "mode" to "csv" and then run a query to extract the desired rows of the table.
sqlite> .header on
sqlite> .mode csv
sqlite> .once c:/work/dataout.csv
sqlite> SELECT * FROM tab1;
In the example above, the ".header on" line causes column labels to be
printed as the first row of output. This means that the first row of
the resulting CSV file will contain column labels. If column labels
are not desired, set ".header off" instead. (The ".header off" setting
is the default and can be omitted if the headers have not been
previously turned on.)
The line ".once FILENAME" causes all query output to go into the named
file instead of being printed on the console. In the example above,
that line causes the CSV content to be written into a file named
"C:/work/dataout.csv".
http://www.sqlite.org/cli.html
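If you would rather stay in Python, the standard-library sqlite3 and csv modules can do the same export directly from the .db file; a minimal sketch (the database file and table name are placeholders):

import csv
import sqlite3

conn = sqlite3.connect('assignment.db')        # placeholder .db file
cur = conn.cursor()

cur.execute('SELECT * FROM tab1')              # placeholder table name
with open('dataout.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(col[0] for col in cur.description)  # header row, like ".header on"
    writer.writerows(cur)                                # the cursor yields the result rows

conn.close()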
