I am trying to convert a .csv file I've downloaded into a .db file so that I can analyze it in DBeaver with SQLite3.
I'm using Anaconda Prompt and Python within it.
Can anyone point out where I'm mistaken?
import pandas as pd
import sqlite3
df = pd.read_csv('0117002-eng.csv')
df.to_sql('health', conn)
And I just haven't been able to figure out how to set up conn appropriately. All the guides I've read have you do something like:
conn = sqlite3.connect("file.db")
But, as I mentioned, I have only the CSV file. And when I did try that, it didn't work either.
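For what it's worth, sqlite3.connect() creates the database file if it does not exist yet, so having only the CSV is not a problem. Below is a minimal sketch, assuming pandas is installed; the helper name csv_to_sqlite and the output file name health.db are made up for illustration.

```python
import sqlite3

import pandas as pd


def csv_to_sqlite(csv_path, db_path, table_name):
    """Load a CSV file into a table of a SQLite database file.

    Hypothetical helper: sqlite3.connect() creates db_path if it is missing.
    """
    df = pd.read_csv(csv_path)
    conn = sqlite3.connect(db_path)
    try:
        # index=False keeps the DataFrame index out of the table;
        # if_exists="replace" overwrites the table when the script is re-run.
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
    return len(df)
```

Calling csv_to_sqlite('0117002-eng.csv', 'health.db', 'health') should then leave a health.db file that can be opened in DBeaver.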
I have a Python script that inserts a CSV file into a MongoDB collection:
import pymongo
import pandas as pd
import json
client = pymongo.MongoClient("mongodb://localhost:27017")
df = pd.read_csv("iris.csv")
data = df.to_dict(orient="records")
db = client["Database name"]
db.CollectionName.insert_many(data)
Here, all the columns of the CSV file are inserted into the Mongo collection. How can I insert only specific columns of the CSV file into the collection?
What changes can I make to the existing code?
Let's say I also have a database already created in my Mongo. Will this command work even if the database is present (db = client["Database name"])?
Have you checked out pymongoarrow? The latest release has write support, so you can import a CSV file into MongoDB with it. Here are the release notes and documentation. You can also use mongoimport to import a CSV file (documentation is here), but I can't see any way to exclude fields the way you can with pymongoarrow.
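If you would rather stay with pandas and pymongo, one option is to restrict the columns when reading the CSV. This is only a sketch: the helper names are invented for illustration, and the column names in the usage note are assumptions about iris.csv. As for the second question, client["Database name"] only returns a handle, and MongoDB creates databases and collections lazily on first write, so the same line works for a database that already exists.

```python
import pandas as pd


def csv_to_records(csv_source, columns):
    """Read only the named columns of a CSV and return a list of dicts."""
    df = pd.read_csv(csv_source, usecols=columns)
    # usecols keeps the file's column order, so reorder to the requested one.
    return df[columns].to_dict(orient="records")


def insert_selected_columns(csv_source, columns, collection):
    """Insert the selected columns into a pymongo collection
    (or any object that exposes insert_many)."""
    records = csv_to_records(csv_source, columns)
    if records:  # insert_many rejects an empty list
        collection.insert_many(records)
    return len(records)
```

With a running server this could be called as insert_selected_columns("iris.csv", ["sepal_length", "species"], client["Database name"]["CollectionName"]), where client is the pymongo.MongoClient from the question.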
I want to connect to an Oracle database from Python, run queries, and create Excel or CSV reports from the results. I have never tried this before and don't know anyone who has done something similar. Do you have any recommendations or ideas for this case?
Regards
You can connect to an Oracle database from Python with the cx_Oracle library, using the connection string syntax below. Note that, to start with, connection_oracle_textfile.txt and the .py file containing your Python code must be in the same folder.
connection_oracle_textfile.txt -> username/password@HOST:PORT/SERVICE_NAME (you can find all of these except the username and password in the tnsnames.ora file)
import cx_Oracle
import pandas as pd
def get_oracle_table_from_dbm(sql_text):
    # reuse a single module-level connection across calls
    global connection_oracle
    if 'connection_oracle' not in globals():
        print('connection does not exist. Try to connect it...')
        # the text file holds username/password@HOST:PORT/SERVICE_NAME
        with open('connection_oracle_textfile.txt', "r") as f:
            fx = f.read().strip()
        connection_oracle = cx_Oracle.connect(fx)
        print('connection established!!')
    else:
        print('Already have connection. Just fetch data!!')
    return pd.read_sql(sql_text, con=connection_oracle)
df = get_oracle_table_from_dbm('select * from dual')
There are other Stack Overflow answers to this, e.g. How to export a table to csv or excel format. Remember to tune cursor.arraysize.
You don't strictly need the pandas library to create CSV files, though you may want it for future data analysis.
The cx_Oracle documentation discusses installation, connection, and querying, amongst other topics.
If you want to read from a CSV file, see Loading CSV Files into Oracle Database.
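To illustrate the point about not needing pandas, here is a rough sketch that streams query results into a CSV file with the standard csv module. It relies only on the generic DB-API cursor interface (execute, description, fetchmany, arraysize), so it should work with a cx_Oracle connection, but the function name and the default arraysize of 1000 are assumptions to tune for your data.

```python
import csv


def export_query_to_csv(connection, sql, out_path, arraysize=1000):
    """Stream the rows of a query into a CSV file using a DB-API cursor."""
    cursor = connection.cursor()
    try:
        # Set before execute(): number of rows fetched per fetchmany() call.
        cursor.arraysize = arraysize
        cursor.execute(sql)
        header = [col[0] for col in cursor.description]
        written = 0
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            while True:
                rows = cursor.fetchmany()
                if not rows:
                    break
                writer.writerows(rows)
                written += len(rows)
        return written
    finally:
        cursor.close()
```

Because rows are written batch by batch, the whole result set never has to fit in memory at once.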
I have a sqlite database that is populated with values from csv files. I would like to create a script that when run:
deletes the old tables
creates new tables with the same schema (with newly updated values)
I noticed that sqlite script files don't accept ".mode csv" or .import "csv". Is there a way to automate this with a script of some sort?
If you want a Python approach, you can use the to_sql method from the pandas package to write to SQLite. Pandas can replace existing tables and automatically generate the schema based on the CSV file it reads.
import sqlite3
import pandas as pd
conn = sqlite3.connect('my.db')
# read the csv file
df = pd.read_csv("my.csv")
# write to SQLite
df.to_sql("my_tbl", conn, if_exists="replace")
conn.close()
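If the schema has to stay exactly the same rather than being inferred by pandas, a variant using only the standard library is to drop the table, recreate it from your own CREATE TABLE statement, and bulk-insert the CSV rows. This is a sketch under assumptions: the function name is invented, the CSV is expected to have a header row with columns in schema order, and the table name is trusted input because it is interpolated into the SQL.

```python
import csv
import sqlite3


def rebuild_table(conn, table, schema_sql, csv_path):
    """Drop `table`, recreate it from `schema_sql`, and reload it from a CSV
    whose first row is a header matching the column order of the schema."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    placeholders = ", ".join("?" for _ in header)
    # `with conn` commits the inserts on success and rolls them back on error.
    with conn:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(schema_sql)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)
```

Values arrive from the CSV as text, and SQLite's column type affinity converts them where the declared column type is numeric.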
I have a multi-million record SQL table that I'm planning to write out to many parquet files in a folder, using the pyarrow library. The data content seems too large to store in a single parquet file.
However, I can't seem to find an API or parameter with the pyarrow library that allows me to specify something like:
file_scheme="hive"
As is supported by the fastparquet python library.
Here's my sample code:
#!/usr/bin/python
import pyodbc
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
conn_str = ('UID=username;PWD=passwordHere;' +
            'DRIVER=FreeTDS;SERVERNAME=myConfig;DATABASE=myDB')
#----> Query the SQL database into a Pandas dataframe
conn = pyodbc.connect( conn_str, autocommit=False)
sql = "SELECT * FROM ClientAccount (NOLOCK)"
df = pd.io.sql.read_sql(sql, conn)
#----> Convert the dataframe to a pyarrow table and write it out
table = pa.Table.from_pandas(df)
pq.write_table(table, './clients/' )
This throws an error:
File "/usr/local/lib/python2.7/dist-packages/pyarrow/parquet.py", line 912, in write_table
os.remove(where)
OSError: [Errno 21] Is a directory: './clients/'
If I replace that last line with the following, it works fine but writes only one big file:
pq.write_table(table, './clients.parquet' )
Any ideas how I can do the multi-file output thing with pyarrow?
Try pyarrow.parquet.write_to_dataset https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L938.
I opened https://issues.apache.org/jira/browse/ARROW-1858 about adding some more documentation about this.
I recommend seeking support for Apache Arrow on the mailing list dev@arrow.apache.org. Thanks!
I have an SQL file stored locally on my PC. I want to open and read it using the pandas library. Here is what I have tried:
import pandas as pd
import sqlite3
my_file = r'C:\Users\me\Downloads\database.sql'
#I am creating an empty database
conn = sqlite3.connect(r'C:\Users\test\Downloads\test.db')
#I am reading my file
df = pd.read_sql(my_file, conn)
However, I am receiving the following error:
DatabaseError: Execution failed on sql 'C:\Users\me\Downloads\database.sql': near "C": syntax error
Try moving the file to D://
Sometimes Python is not granted access to read/write on C:.
So that may be the issue.
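You can also try an alternative method using cursors.
cur = conn.cursor()
# a query has to be executed before fetching; this example one lists the tables
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
r = cur.fetchall()
r would then contain a list of tuples, one per row returned.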
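Note that pd.read_sql expects a SQL query as its first argument, not a file path, which is why SQLite reports a syntax error near "C". Assuming the .sql file contains SQLite-compatible statements (for example a dump with CREATE TABLE and INSERT statements), one way is to execute the script first and query afterwards. The helper name below is invented for illustration.

```python
import sqlite3


def load_sql_script(conn, script_text):
    """Execute every statement of a SQL script and return the names of the
    tables that exist afterwards."""
    conn.executescript(script_text)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Typical use would be to read the file into a string (open(my_file, encoding="utf-8").read()), pass it to load_sql_script(conn, ...), and then call pd.read_sql("SELECT * FROM <one of the returned tables>", conn).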