Pandas to process table - python

I have a table loaded in Jupyter Notebook. I am using Pandas to prepare the data before later analysis.
What I want to do is get the client with the highest revenue in each household. The result should be a table that includes all columns.
Can someone tell me how to write this with Pandas? Thanks.
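One possible approach is sketched below. The table in the question is not shown, so the column names household, client, revenue and region are assumptions, as is the sample data. The idea is to find, per household, the index label of the row with the largest revenue, and then select those rows so that all columns are kept.

```python
import pandas as pd

# Hypothetical sample data; the real column names are not given in the question.
df = pd.DataFrame({
    "household": ["H1", "H1", "H2", "H2", "H3"],
    "client": ["A", "B", "C", "D", "E"],
    "revenue": [100, 250, 300, 50, 75],
    "region": ["N", "N", "S", "S", "E"],
})

# For each household, idxmax gives the index label of the highest-revenue row.
top_idx = df.groupby("household")["revenue"].idxmax()

# Selecting those labels with .loc keeps every column of the winning rows.
top_clients = df.loc[top_idx].reset_index(drop=True)
print(top_clients)
```

If two clients in a household tie on revenue, idxmax keeps only the first one it encounters.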

Related

Pandas Dataframe (with Date as Header Column) to MySQL

I have been trying to export a particular set of data from an Excel file to Python and then to MySQL.
The data in the Excel file looks like the screenshot below:
Data in Excel
After using 'iloc' and some other pandas functions, I converted it into the one below:
Data in Python Pandas
The problem is with the DataFrame header, which consists of dates. When exported to MySQL, I want the data to look like this:
Data in MySQL
I have tried converting the dates to strings and to datetime.datetime, but so far I have not been able to export the data to MySQL the way I want.
Any help would be very much appreciated.
Thanks.
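The screenshots are not available, so the target layout is unknown. If the goal is one row per item and date, which is an assumption, a wide table with dates as column headers can be reshaped with melt before it is written out. The column names item, date and value and the data below are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per date, similar to the layout described.
wide = pd.DataFrame({
    "item": ["A", "B"],
    "2020-01-01": [10, 20],
    "2020-01-02": [11, 21],
})

# Turn the date headers into values of a single "date" column.
long = wide.melt(id_vars="item", var_name="date", value_name="value")

# Convert the former headers to real datetimes so they map to a date-like SQL type.
long["date"] = pd.to_datetime(long["date"])
long = long.sort_values(["item", "date"]).reset_index(drop=True)
print(long)
```

The reshaped frame could then be passed to DataFrame.to_sql with a SQLAlchemy engine that points at the MySQL server.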

Read Impala tables and column names from Spark job

I have tables in different Impala databases, stored as Parquet files and structured as below. I'm trying to figure out a good way to scan all the table names and column names across all databases. From there, I want to check whether the table or column names contain certain values and, if so, read the values.
I understand that there are Impala queries such as describe database.tablename, but given all the other processing involved, I'd like to do this in a Spark job. Could someone please shed some light on this? Many thanks.
database1.tableOne
database1.tableTwo
database2.tableThree
....
You can connect to Impala from Spark through the JDBC data source, passing the describe statement in the query option. Reading the dataframe returned by the JDBC reader then gives you the column information.

Create Database using Python on Jupyter Notebook

I am building a database for a larger program and do not have much experience in this area of coding (I mostly do embedded systems programming). My task is to import a large Excel file into Python. Because it is large, I assume I must convert it to CSV, cut it down by parsing and partitioning it, and then import it, so that my computer does not crash. Once the file is imported, I must be able to extract and search for specific information based on the column titles. There are other user-interactive aspects that are simply string based, so they are not very difficult. As for the rest, I am getting the picture but would like a more efficient and specific design. Can anyone offer me guidance on this?
An Excel or CSV file can be read into Python using pandas. The data is stored as rows and columns in a structure called a dataframe. To load data into such a structure, import pandas first and then read the CSV or Excel file into a dataframe.
import pandas as pd
df1 = pd.read_csv('excelfilename.csv')
This dataframe structure is similar to tables and you can perform joining of different dataframes, grouping of data etc.
I am not sure if this is what you need, let me know if you need any further clarifications.
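If the file is too large to load comfortably in one go, read_csv can also return it in pieces through its chunksize parameter, and usecols limits which columns are parsed. The sketch below scans a CSV chunk by chunk and keeps only the rows where a given column matches a value. The function name, its parameters and the default chunk size are illustrative assumptions.

```python
import pandas as pd

def read_matching_rows(csv_path, column, value, usecols=None, chunksize=100_000):
    """Scan a CSV in chunks and return only the rows where `column` equals `value`."""
    matches = []
    # With chunksize set, read_csv yields one dataframe per chunk instead of the whole file.
    for chunk in pd.read_csv(csv_path, usecols=usecols, chunksize=chunksize):
        matches.append(chunk[chunk[column] == value])
    if not matches:
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)
```

Only the matching rows are kept in memory, so peak usage depends on the chunk size and the number of matches rather than on the size of the file.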
I would recommend loading it into a proper database such as MariaDB or PostgreSQL. This lets you access the data from other applications and saves you from writing a database yourself. You can then use an ORM to interact with the data, or simply use plain SQL from Python.
import the libraries and read the CSV
import sqlite3
import pandas as pd
df = pd.read_csv('sample.csv')
connect to a database
conn = sqlite3.connect("Any_Database_Name.db")  # if the db does not exist, this creates an Any_Database_Name.db file in the current directory
store your table in the database:
df.to_sql('Some_Table_Name', conn)
read a SQL query out of your database and into a pandas dataframe
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
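For a large file, the same steps can be combined with chunked reading so that the whole CSV never has to sit in memory at once. The sketch below appends each chunk to a SQLite table and then looks rows up by column. The function names, table name and chunk size are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

def load_csv_to_sqlite(csv_path, conn, table, chunksize=50_000):
    """Append a CSV to a SQLite table chunk by chunk and return the number of rows written."""
    total = 0
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        # if_exists="append" creates the table on the first chunk and adds to it afterwards.
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total

def query_by_column(conn, table, column, value):
    """Return all rows of `table` where `column` equals `value`."""
    # Identifiers cannot be bound as parameters, so they are quoted in the SQL text;
    # the value itself is passed as a bound parameter.
    sql = f'SELECT * FROM "{table}" WHERE "{column}" = ?'
    return pd.read_sql(sql, conn, params=(value,))
```

Because the table and column names are placed into the SQL string directly, they should come from trusted code and not from user input.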

Using a dataframe to create a database in sql

How do I insert data stored in a dataframe into a SQL database? I've been told that I should use pandas.
Here is the question:
Get data from Quandl. Store this in a dataframe. (I've done this part)
Insert data into a sqlite database. Create a database in sqlite and insert the data into a table with an appropriate schema. This can be done with pandas so there is no need to go outside of your program to do this.
I only started coding in Python a couple of days ago, so I'm a bit of a noob at this.
What I've got so far:
import quandl
df = quandl.get("ML/AATRI", start_date="2008-01-01")
import pandas as pd
import sqlite3
Thanks!
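DataFrame.to_sql does not produce a .sql file; it writes the dataframe straight into a table over a database connection, so no separate import step is needed. Its first argument is the table name. Suppose you have a dataframe df:
import sqlite3
conn = sqlite3.connect('data.db')  # example file name; the file is created if it does not exist
df.to_sql('table_name', conn)
I hope this works for you.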
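A fuller sketch of the task follows, under some assumptions: the Quandl result is replaced by a small hand-made, date-indexed dataframe with made-up values, and the table name prices and its two-column schema are hypothetical. The table is created explicitly first, which is one way to get "an appropriate schema", and pandas then appends into it.

```python
import sqlite3
import pandas as pd

# Stand-in for the Quandl result: a date-indexed frame with made-up values.
df = pd.DataFrame(
    {"value": [1.5, 1.7, 1.6]},
    index=pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"]),
)
df.index.name = "date"

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the database

# Assumed schema: ISO date strings as the primary key, one numeric column.
conn.execute("CREATE TABLE prices (date TEXT PRIMARY KEY, value REAL)")

# Move the index into a regular column and store dates as YYYY-MM-DD text.
out = df.reset_index()
out["date"] = out["date"].dt.strftime("%Y-%m-%d")
out.to_sql("prices", conn, if_exists="append", index=False)

# Read the table back, parsing the text dates into datetimes again.
back = pd.read_sql("SELECT * FROM prices ORDER BY date", conn, parse_dates=["date"])
print(back)
```

With if_exists="append", pandas inserts into the existing table and leaves its schema as defined.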

Data Validation in Hive

In our application we are migrating a huge volume of data from Teradata to Hive.
We need to validate the data between source and target. We are planning to do it using Python and pandas dataframes.
My questions are:
1. Can a pandas dataframe handle around 15 million rows?
2. Is there any other way to do it?
What is the best way to achieve this using Python?
Thanks in advance.
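Whether 15 million rows fit in memory depends on the number and types of columns and on the machine, so the comparison may need to run on one key range or partition at a time. As a rough sketch of the comparison itself, assuming both extracts are already loaded as dataframes with identical columns and a single unique key column, an outer merge with indicator=True shows which keys are missing on either side and which rows differ. The function name and the column names in the usage example are hypothetical.

```python
import pandas as pd

def compare_tables(source, target, key):
    """Return (keys missing in target, keys missing in source, keys whose values differ)."""
    merged = source.merge(
        target, on=key, how="outer", suffixes=("_src", "_tgt"), indicator=True
    )
    missing_in_target = merged.loc[merged["_merge"] == "left_only", key].tolist()
    missing_in_source = merged.loc[merged["_merge"] == "right_only", key].tolist()

    # For keys present on both sides, compare every non-key column.
    both = merged[merged["_merge"] == "both"]
    mismatched = pd.Series(False, index=both.index)
    for col in (c for c in source.columns if c != key):
        a, b = both[f"{col}_src"], both[f"{col}_tgt"]
        # Treat two missing values as equal; any other difference is a mismatch.
        mismatched |= ~((a == b) | (a.isna() & b.isna()))
    return missing_in_target, missing_in_source, both.loc[mismatched, key].tolist()

# Usage with made-up data
src = pd.DataFrame({"id": [1, 2, 3], "amt": [10, 20, 30]})
tgt = pd.DataFrame({"id": [2, 3, 4], "amt": [20, 31, 40]})
print(compare_tables(src, tgt, "id"))
```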
