Using a dataframe to create a database in SQL - Python

How do I insert data stored in a dataframe into an SQL database? I've been told that I should use pandas.
Here is the question:
Get data from Quandl. Store this in a dataframe. (I've done this part)
Insert data into an SQLite database. Create a database in SQLite and insert the data into a table with an appropriate schema. This can be done with pandas, so there is no need to go outside of your program to do this.
I only started Python coding a couple of days ago, so I'm a bit of a noob at this.
What I've got so far:
import quandl
df = quandl.get("ML/AATRI", start_date="2008-01-01")
import pandas as pd
import sqlite3
Thanks!

You can write the dataframe directly into the database with pandas' to_sql. Suppose you have dataframe df and a connection (or SQLAlchemy engine) to your SQLite database; to_sql then creates the table and inserts the rows for you. Note that the first argument is the table name, not a file name.
import pandas as pd
import sqlite3
conn = sqlite3.connect('mydata.db')  # creates the file if it does not already exist
df.to_sql('my_table', conn)
I hope this works for you.
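Putting it together with the Quandl snippet from the question, a minimal sketch might look like this (the database file and table names are just placeholders):
import quandl
import pandas as pd
import sqlite3

# pull the data into a dataframe
df = quandl.get("ML/AATRI", start_date="2008-01-01")

# connect to (and create, if needed) the SQLite database file
conn = sqlite3.connect("quandl_data.db")

# write the dataframe to a table; pandas infers the schema from the column dtypes
df.to_sql("aatri", conn, if_exists="replace")

# quick sanity check: read a few rows back
print(pd.read_sql("SELECT * FROM aatri LIMIT 5", conn))
conn.close()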

Related

Pandas Dataframe (with Date as Header Column) to MySQL

I have been trying to export a particular set of data from an Excel file, through Python, to MySQL.
The data from the Excel file looks like the one in the screenshot below:
[screenshot: Data in Excel]
After using 'iloc' and some other pandas functions, I get it converted into the dataframe below:
[screenshot: Data in Python Pandas]
Now the problem is really with the dataframe header column, which is a date. When exported to MySQL, I want this data to look like:
[screenshot: Data in MySQL]
I have tried converting the dates to both string and datetime.datetime, etc., but so far I have not been able to export them to MySQL the way I want.
Any help would be very much appreciated.
Thanks.
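A minimal sketch of one possible approach, assuming the date headers should become values of a Date column rather than column names (the 'Item' column, the pymysql driver, and the connection details are placeholders for your actual data and setup):
import pandas as pd
from sqlalchemy import create_engine

# assume df has an identifier column 'Item' plus one column per date;
# melt turns the date headers into rows of a single 'Date' column
long_df = df.melt(id_vars=['Item'], var_name='Date', value_name='Value')
long_df['Date'] = pd.to_datetime(long_df['Date'])

# write the reshaped data to MySQL
engine = create_engine('mysql+pymysql://user:password@localhost:3306/mydb')
long_df.to_sql('my_table', engine, if_exists='replace', index=False)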

Create Database using Python on Jupyter Notebook

So I am building a database for a larger program and do not have much experience in this area of coding (mostly embedded system programming). My task is to import a large Excel file into Python. It is large, so I'm assuming I must convert it to a CSV, then truncate it by parsing and partitioning, and then import it to avoid my computer crashing. Once the file is imported, I must be able to extract/search specific information based on the column titles. There are other user-interactive aspects that are simply string based, so not very difficult. As for the rest, I am getting the picture but would like a more efficient and specific design. Can anyone offer me guidance on this?
An Excel or CSV file can be read into Python using pandas. The data is stored as rows and columns and is called a dataframe. To import data into such a structure, you need to import pandas first and then read the CSV or Excel file into the dataframe structure.
import pandas as pd
df1 = pd.read_csv('excelfilename.csv')
# or read the Excel file directly:
# df1 = pd.read_excel('excelfilename.xlsx')
This dataframe structure is similar to tables, and you can perform joins between different dataframes, grouping of data, etc.
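For example, a minimal sketch of filtering and grouping on named columns (the column names here are placeholders for whatever your file actually contains):
import pandas as pd

df1 = pd.read_csv('excelfilename.csv')

# select rows where a column matches a condition
matches = df1[df1['Category'] == 'sensors']

# group by one column and aggregate another
summary = df1.groupby('Category')['Price'].mean()
print(summary)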
I am not sure if this is what you need, let me know if you need any further clarifications.
I would recommend actually loading it into a proper database such as MariaDB or PostgreSQL. This will let you access the data from other applications, and it takes the load of writing your own database layer off of you. You can then use an ORM if you would like to interact with the data, or simply use plain SQL via Python.
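A minimal sketch of that route, assuming PostgreSQL with the sqlalchemy and psycopg2 packages installed (table name and connection details are placeholders):
import pandas as pd
from sqlalchemy import create_engine

df1 = pd.read_csv('excelfilename.csv')

# write the dataframe into a PostgreSQL table
engine = create_engine('postgresql+psycopg2://user:password@localhost:5432/mydb')
df1.to_sql('my_table', engine, if_exists='replace', index=False)

# query it back with plain SQL
result = pd.read_sql('SELECT * FROM my_table WHERE some_column > 10', engine)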
Read the CSV:
import pandas as pd
import sqlite3
df = pd.read_csv('sample.csv')
Connect to a database:
conn = sqlite3.connect("Any_Database_Name.db")  # if the db does not exist, this creates an Any_Database_Name.db file in the current directory
Store your table in the database:
df.to_sql('Some_Table_Name', conn)
Read a SQL query out of your database into a pandas dataframe:
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)

Zeppelin: What is the best way to query data with SQL and work with it?

I want to use Zeppelin to query databases. I currently see two possibilities, but neither of them is sufficient for me:
1. Configure a database connection as an "interpreter", name it e.g. "sql1", use it in a paragraph, run an SQL query and use the nice built-in plotting tools. It seems that all the tutorials and tips deal with this, but then the documentation suddenly stops! I want to do more with the data: I want to filter and process it. If I want to plot it again (with other restrictions), I have to run the query (which may take some seconds or minutes) again (see my other question Zeppelin SQL: reuse data of query without another interpreter or a new query).
2. Use Spark with Python, Scala or similar. But the documentation only seems to show loading CSV data, putting it into a dataframe and then accessing that dataframe with SQL. There is no accessing of the SQL data in the first place. How do I best access the SQL data? Can I use an already configured "interpreter" (database connection)?
You can use Zeppelin API to retrieve paragraph data:
val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString
val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")
df.first.getAs[String](0)
These Spark Scala lines will retrieve the SQL query used by a paragraph. You could do the same thing to get the results, I think.
I cannot find a solution for 1., but I have made a short solution for 2. that works within Zeppelin with Python (2.7), sqlalchemy (SQL wrapper), mysqldb (MySQL implementation) and pandas (make sure you have these packages installed; all of them are in Debian 9). I wonder why I have not found such a solution before...
%python
from sqlalchemy import create_engine
import pandas as pd
sql = "select col1, col2 from table limit 10"
df = pd.read_sql(sql,
                 create_engine('mysql+mysqldb://user:password@host:3306/database').connect())
z.show(df)
If you want to connect to another database such as DB2 or Oracle, you have to use other Python packages and adjust the first part of the create_engine string.
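For example, a couple of common create_engine URL formats (the driver packages and connection details here are assumptions; adjust them to your setup):
from sqlalchemy import create_engine

# PostgreSQL via psycopg2
engine_pg = create_engine('postgresql+psycopg2://user:password@host:5432/database')

# Oracle via cx_Oracle
engine_ora = create_engine('oracle+cx_oracle://user:password@host:1521/dbname')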

How do you upload a .py file into mongodb through pymongo

I have a lot of data from an Excel sheet that I read with xlrd, and I am now outputting all of that data from Python. My question is: how do I take that data and upload it to MongoDB? I understand that pymongo must be used, but I am not quite sure how to do it. Any help is greatly appreciated.
Let's assume you've read the tutorials but still don't get it...
You'll need to convert your xlrd data into a list of dictionaries, one dictionary for each row in your spreadsheet. Here's a clue: Python Creating Dictionary from excel data
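A minimal sketch of that conversion, assuming the first row of the sheet holds the column names (the file name and sheet index are placeholders):
import xlrd

book = xlrd.open_workbook('data.xls')
sheet = book.sheet_by_index(0)

# first row = keys, remaining rows = values
headers = sheet.row_values(0)
list_of_rows = [dict(zip(headers, sheet.row_values(i)))
                for i in range(1, sheet.nrows)]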
Once you have the list of dictionaries/rows, make sure you have MongoDB running on your machine, then:
from pymongo import MongoClient
db = MongoClient().mydb  # use (and lazily create) a database called 'mydb'
for row_dict in list_of_rows:
    db.rows.insert_one(row_dict)  # inserts each row into a collection called "rows"

Can I import a table from SQL Server (=MS SQL) into a Python / Pandas data frame?

I am using Matlab and its 'datasets', and R and its 'dataframes'.
I am thinking of using Python but I need an equivalent data-storing format.
The pandas extension for Python has a class called DataFrame which is similar.
Now I would like to be able to send a query to SQL Server and store the result of that query in a pandas DataFrame,
e.g.: newDataFrame = GetDataFrameFromSQLServer('SELECT * from schema.table', sqlConnection)
I had the impression that pandas only talks to SQLite. Is that the case?
I believe pandas can read from any DB API v2.0 compliant data source. Have a look at pandas.io.sql for a bunch of functions that facilitate this.
One thing to note is that writing to a database requires a "flavor", which defaults to sqlite; in the write_frame definition the possible values are {'sqlite', 'mysql', 'oracle'}.
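For the reading direction the question asks about, a minimal sketch using pyodbc as the DB API driver (the ODBC driver name and connection details are assumptions for your environment):
import pandas as pd
import pyodbc

# connect to SQL Server through ODBC
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=myserver;DATABASE=mydb;UID=user;PWD=password'
)

# run the query and load the result into a dataframe
newDataFrame = pd.read_sql('SELECT * FROM schema.table', conn)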
