Python - How to create a pandas Dataframe directly from Smartsheets? - python

I don't understand how to import a Smartsheet sheet and convert it to a pandas DataFrame. I want to manipulate the data from Smartsheet. Currently I export the sheet to CSV from Smartsheet and import the CSV in Python, but I want to eliminate this step so that the script can run on a schedule.
import smartsheet
import pandas as pd
access_token ='#################'
smartsheet = Smartsheet(access_token)
sheet = smartsheet.sheets.get('Sheet 1')
pd.DataFrame(sheet)

Here is a simple method to convert a sheet to a dataframe:
def simple_sheet_to_dataframe(sheet):
    col_names = [col.title for col in sheet.columns]
    rows = []
    for row in sheet.rows:
        cells = []
        for cell in row.cells:
            cells.append(cell.value)
        rows.append(cells)
    data_frame = pd.DataFrame(rows, columns=col_names)
    return data_frame
The only issue with creating a dataframe from smartsheets is that for certain column types cell.value and cell.display_value are different. For example, contact columns will either display the name or the email address depending on which is used.
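To make the difference between cell.value and cell.display_value concrete, here is a minimal sketch that picks one or the other per column. It does not call the Smartsheet API: the sheet below is a stand-in built from SimpleNamespace objects that only mimic the attributes used in the function above, and the names, emails and the value_cols parameter are made up for illustration.

```python
from types import SimpleNamespace

import pandas as pd


def make_cell(column_id, value, display_value):
    # Stand-in for a Smartsheet cell; only the attributes used below exist.
    return SimpleNamespace(column_id=column_id, value=value,
                           display_value=display_value)


# Hypothetical sheet with one text column and one contact column.
sheet = SimpleNamespace(
    columns=[SimpleNamespace(id=1, title="Task"),
             SimpleNamespace(id=2, title="Assigned User")],
    rows=[
        SimpleNamespace(cells=[make_cell(1, "Write report", "Write report"),
                               make_cell(2, "ana@example.com", "Ana")]),
        SimpleNamespace(cells=[make_cell(1, "Review", "Review"),
                               make_cell(2, "bo@example.com", "Bo")]),
    ],
)


def sheet_to_dataframe(sheet, value_cols=()):
    """Use cell.value for columns named in value_cols, display_value elsewhere."""
    titles = {col.id: col.title for col in sheet.columns}
    records = []
    for row in sheet.rows:
        record = {}
        for cell in row.cells:
            title = titles[cell.column_id]
            record[title] = cell.value if title in value_cols else cell.display_value
        records.append(record)
    return pd.DataFrame(records, columns=list(titles.values()))


df_display = sheet_to_dataframe(sheet)
df_value = sheet_to_dataframe(sheet, value_cols={"Assigned User"})
```

With this stand-in data, df_display holds the display names in the contact column and df_value holds the email addresses.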
Here is a snippet of what I use when needing to pull in data from Smartsheet into Pandas. Note, I've included garbage collection as I regularly work with dozens of sheets at or near the 200,000 cell limit.
import smartsheet
import pandas as pd
import gc
configs = {'api_key': '0000000',  # placeholder; use your API access token (a string)
           'value_cols': ['Assigned User']}
class SmartsheetConnector:
    def __init__(self, configs):
        self._cfg = configs
        self.ss = smartsheet.Smartsheet(self._cfg['api_key'])
        self.ss.errors_as_exceptions(True)
    def get_sheet_as_dataframe(self, sheet_id):
        sheet = self.ss.Sheets.get_sheet(sheet_id)
        col_map = {col.id: col.title for col in sheet.columns}
        # rows = sheet id, row id, cell values or display values
        data_frame = pd.DataFrame([[sheet.id, row.id] +
                                   [cell.value if col_map[cell.column_id] in self._cfg['value_cols']
                                    else cell.display_value for cell in row.cells]
                                   for row in sheet.rows],
                                  columns=['Sheet ID', 'Row ID'] +
                                          [col.title for col in sheet.columns])
        del sheet, col_map
        gc.collect()  # force garbage collection
        return data_frame
    def get_report_as_dataframe(self, report_id):
        # first request only fetches metadata such as the total row count
        rprt = self.ss.Reports.get_report(report_id, page_size=0)
        page_count = int(rprt.total_row_count/10000) + 1
        col_map = {col.virtual_id: col.title for col in rprt.columns}
        data = []
        for page in range(1, page_count + 1):
            rprt = self.ss.Reports.get_report(report_id, page_size=10000, page=page)
            data += [[row.sheet_id, row.id] +
                     [cell.value if col_map[cell.virtual_column_id] in self._cfg['value_cols']
                      else cell.display_value for cell in row.cells] for row in rprt.rows]
            del rprt
        data_frame = pd.DataFrame(data, columns=['Sheet ID', 'Row ID']+list(col_map.values()))
        del col_map, page_count, data
        gc.collect()
        return data_frame
This adds additional columns for sheet and row IDs so that I can write back to Smartsheet later if needed.

Sheets cannot be retrieved by name, which is what your example code attempts. Several sheets can share the same name, so you must retrieve a sheet by its sheetId number.
For example:
sheet = smartsheet_client.Sheets.get_sheet(4583173393803140) # sheet_id
http://smartsheet-platform.github.io/api-docs/#get-sheet
Smartsheet sheets have a lot of properties associated with them. You'll need to go through the rows and columns of your sheet to retrieve the information you're looking for, and construct it in a format your other system can recognize.
The API docs contain a listing of properties and examples. As a minimal example:
for row in sheet.rows:
    for cell in row.cells:
        # Do something with cell.object_value here
        pass
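If you only know a sheet's name, one option is to resolve it to an ID first and refuse to guess when the name is ambiguous. The sketch below assumes you already have a listing of sheets with id and name attributes (for instance from whatever sheet-listing call your SDK version provides); the listing here is a hand-made stand-in, and the helper names are hypothetical.

```python
from types import SimpleNamespace


def find_sheet_ids(sheet_index, name):
    """Return the IDs of every sheet in the listing with exactly this name."""
    return [entry.id for entry in sheet_index if entry.name == name]


def resolve_sheet_id(sheet_index, name):
    """Return the single matching ID, or raise if the name is missing or ambiguous."""
    matches = find_sheet_ids(sheet_index, name)
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} sheets named {name!r}; pass a sheet ID instead")
    return matches[0]


# Stand-in listing: two different sheets share the name "Sheet 1".
sheet_index = [
    SimpleNamespace(id=101, name="Sheet 1"),
    SimpleNamespace(id=202, name="Budget"),
    SimpleNamespace(id=303, name="Sheet 1"),
]

budget_id = resolve_sheet_id(sheet_index, "Budget")
```

The resolved ID can then be passed to get_sheet as in the example above.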

Get the sheet as a csv:
(https://smartsheet-platform.github.io/api-docs/?python#get-sheet-as-excel-pdf-csv)
smartsheet_client.Sheets.get_sheet_as_csv(
1531988831168388, # sheet_id
download_directory_path)
Read the csv into a DataFrame:
(https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
pandas.read_csv
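A minimal sketch of the second step, assuming the download has already written a CSV file into download_directory_path. To keep the example runnable without Smartsheet access, it writes a small stand-in file itself; the file name and contents are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as download_directory_path:
    # Stand-in for the file that get_sheet_as_csv would have saved here.
    csv_path = Path(download_directory_path) / "Sheet 1.csv"  # hypothetical name
    csv_path.write_text("Task,Status\nWrite report,Done\nReview,Open\n",
                        encoding="utf-8")

    # Read the downloaded CSV into a DataFrame.
    df = pd.read_csv(csv_path)
```

Using a temporary directory means the intermediate CSV is removed once the DataFrame is loaded, which suits a scheduled job.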

You can use this library
Very easy to use and allows Sheets or Reports to be delivered as a Dataframe.
pip install smartsheet-dataframe
Get a report as df
from smartsheet_dataframe import get_as_df, get_report_as_df
df = get_report_as_df(token='smartsheet_auth_token',
report_id=report_id_int)
Get a sheet as df
from smartsheet_dataframe import get_as_df, get_sheet_as_df
df = get_sheet_as_df(token='smartsheet_auth_token',
sheet_id=sheet_id_int)
replace 'smartsheet_auth_token' with your token (numbers and letters)
replace sheet_id_int with your sheet/report id (numbers only)

Related

Read CSV sheet data and create a new one

I have a CSV file which has multiple sheets in it. I want to read it sheet by sheet, filter some data, and create a CSV file in the same format. How can I do that? Please suggest. I was trying it through pandas.ExcelReader but it's not working for a CSV file.
You can use the following code; it may help:
import pandas as pd
def read_excel_sheets(xls_path):
    """Read all sheets of an Excel workbook and return a single DataFrame"""
    print(f'Loading {xls_path} into pandas')
    xl = pd.ExcelFile(xls_path)
    frames = []
    columns = None
    for idx, name in enumerate(xl.sheet_names):
        print(f'Reading sheet #{idx}: {name}')
        sheet = xl.parse(name)
        if idx == 0:
            # Save column names from the first sheet to match for concatenation
            columns = sheet.columns
        sheet.columns = columns
        frames.append(sheet)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    return pd.concat(frames, ignore_index=True)
The source for this code is the link below:
click here
To convert the result back to CSV, you can follow the post whose link is
attached here
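For the "filter, then write back to CSV" part, a minimal sketch could look like this. The frames stand in for sheets that have already been read, and the column names and the filter threshold are made up; an in-memory buffer is used so the example runs without touching disk, but a file path works the same way with to_csv.

```python
import io

import pandas as pd

# Stand-ins for two sheets with the same columns.
sheets = [
    pd.DataFrame({"name": ["ann", "bob"], "score": [70, 40]}),
    pd.DataFrame({"name": ["cy"], "score": [90]}),
]

# Combine the sheets, then keep only the rows of interest.
combined = pd.concat(sheets, ignore_index=True)
filtered = combined[combined["score"] >= 50]

# Write the result as CSV; index=False keeps the original column layout.
buffer = io.StringIO()
filtered.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```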

Convert excel file with many sheets (with spaces in the name of the sheet) in pandas data frame

I would like to convert an Excel file to a pandas dataframe. All the sheet names contain spaces, for instance ' part 1 of 22', ' part 2 of 22', and so on. In addition, the first column is the same for all the sheets.
I would like to convert this Excel file to a single dataframe. However, I don't know what happens to the names in Python. I was able to import the sheets, but I do not know the names of the resulting dataframes.
After this I would like to use another 'for' loop and pd.merge() in order to create a single dataframe.
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(sheet_name)
Or, even better, using list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
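If you also want to remember which sheet each row came from, you can keep the frames in a dict keyed by sheet name (this is also what pd.read_excel returns when called with sheet_name=None). A minimal sketch, with in-memory frames standing in for the sheets and a made-up source_sheet column:

```python
import pandas as pd

# Stand-ins for sheets whose names contain spaces.
sheets = {
    " part 1 of 2": pd.DataFrame({"id": [1, 2], "a": [10, 20]}),
    " part 2 of 2": pd.DataFrame({"id": [3], "a": [30]}),
}

# Tag every frame with the (stripped) name of the sheet it came from.
frames = [df.assign(source_sheet=name.strip()) for name, df in sheets.items()]

# Stack all sheets into one DataFrame with a fresh 0..n-1 index.
final_df = pd.concat(frames, ignore_index=True, sort=False)
```

The sheet names are only dictionary keys here, so spaces in them cause no problems.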
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd
# file_path is the path to your workbook
wb = load_workbook(filename=file_path, read_only=True)
# Assuming every sheet has the same header in its first row
header = None
records = []
for ws in wb.worksheets:
    rows = ws.iter_rows(values_only=True)
    first_row = next(rows, None)  # header row of this sheet
    if first_row is None:
        continue  # skip empty sheets
    # Make sure you don't duplicate the header
    if header is None:
        header = list(first_row)
    records.extend(list(row) for row in rows)
# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once and keep the result. Passing a list of sheet names returns a dict that maps each sheet name to its DataFrame.
So, the first step would look like this:
dfs = pd.read_excel(Matrix, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names you use in the list should be the same as those in the Excel file, including any spaces. Then, if you wanted to vertically concatenate these sheets, you would just call:
final_df = pd.concat(list(dfs.values()), axis=0, ignore_index=True)
Note that this solution would result in a final_df that includes the column headers from all three sheets, so ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
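Since the question mentions that the first column is the same in every sheet, here is one possible sketch of the merge variant next to plain stacking. The frames, the shared key name "id" and the other column names are all made up; the real key would be whatever your first column is called.

```python
from functools import reduce

import pandas as pd

# Stand-ins for three sheets that share the key column "id".
sheets = [
    pd.DataFrame({"id": [1, 2, 3], "height": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"id": [1, 2, 3], "width": [4.0, 5.0, 6.0]}),
    pd.DataFrame({"id": [1, 2, 3], "depth": [7.0, 8.0, 9.0]}),
]

# Merge pairwise on the shared key: one row per id, one column per measurement.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="id", how="outer"),
    sheets,
)

# For comparison, axis=0 stacks the sheets vertically and leaves NaN
# wherever a sheet does not have a given column.
stacked = pd.concat(sheets, axis=0, ignore_index=True)
```

Here merged has 3 rows and 4 columns, while stacked has 9 rows and the same 4 columns.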
I hope this helps!

Programming Data to CSV using Pandas

I am trying to write data to CSV and Excel files. I followed an online aid, but it does not work and raises KeyError: 'Teflon'. Any thoughts why?
Here is the aid I was following: Aid
import pandas as pd
import os
def sort_data_frame_by_Teflon_column(dataframe):
    dataframe = dataframe.sort_values(by= ['Teflon'])
def sort_data_frame_by_LacticAcid_column(dataframe):
    dataframe = dataframe.sort_values(by= ['Lactic Acid'])
def sort_data_frame_by_ExperimentalWeight_column(dataframe):
    dataframe = dataframe.sort_values(by= ['Experimental Weight'])
def sort_data_frame_by_Ratio_column(dataframe):
    dataframe = dataframe.sort_values(by= ['Ratio of Lactic Acid to Experimental Weight'])
def get_data_in_Teflon(dataframe):
    dataframe = dataframe.loc[dataframe['Teflon']]
    dataframe = dataframe.sort_values(by=['Teflon'])
def get_data_in_LacticAcid(dataframe):
    dataframe = dataframe.loc[dataframe['Lactic Acid']]
    dataframe = dataframe.sort_values(by= ['Lactic Acid'])
def get_data_in_ExperimentalWeight(dataframe):
    dataframe = dataframe.loc[dataframe['Experimental Weight']]
    dataframe = dataframe.sort_values(by= ['Experimental Weight'])
def get_data_in_Ratio(dataframe):
    dataframe = dataframe.loc[dataframe['Ratio of Lactic Acid to Experimental Weight']]
    dataframe = dataframe.sort_values(by= ['Ratio of Lactic Acid to Experimental Weight'])
path = 'C:\\Users\\Light_Wisdom\\Documents\\Spyder\\Mass-TeflonLacticAcidRatio.csv'
#output_file = open(path,'x')
#text = input("Input Data: ")
#text.replace('\\n', '\n')
#output_file.write(text. replace('\\', ''))
#output_file.close()
csv_file = 'C:\\Users\\Light_Wisdom\\Documents\\Spyder\\Mass-TeflonLacticAcidRatio.csv'
dataframe = pd.read_csv(csv_file)
dataframe = dataframe.set_index('Teflon')
sort_data_frame_by_Teflon_column(dataframe)
sort_data_frame_by_LacticAcid_column(dataframe)
sort_data_frame_by_ExperimentalWeight_column(dataframe)
sort_data_frame_by_Ratio_column(dataframe)
get_data_in_Teflon(dataframe)
get_data_in_LacticAcid(dataframe)
get_data_in_ExperimentalWeight(dataframe)
get_data_in_Ratio(dataframe)
write_to_csv_file_by_pandas("C:\\images\\Trial1.csv", dataframe)
write_to_excel_file_by_pandas("C:\\images\\Trial1.xlsx", dataframe)
#data_frame.to_csv(csv_file_path)
#excel_writer = pd.ExcelWriter(excel_file_path, engine = 'xlsxwriter')
#excel_writer.save()
Here is the CSV:
Teflon,Lactic Acid,Experimental Weight,Ratio of Lactic Acid to Experimental Weight
1.973,.2201,1.56,.14
2.05,.15,.93,.16
1.76,.44,1.56,.28
Edit New Question 7/24/19
I am trying to automate an answer with functions and I was on the attempt when I got this error.
def get_Data():
    check = 'No'
    while(check == 'Yes'):
        row_name = input("What is the row number? ")
        row_name = []
        data = float(input("Teflon, Lactic_Acid, Expt_Wt, LacticAcid_to_Expt1_Wt: "))
        dataframe = []
        check = input("Add another row? ")
    return row_name,data, dataframe
def row_inputter(row_name,data,dataframe):
    row_name.append(data)
    dataframe.append(row_name)
    return row_name, dataframe
# Define your data
#row1 = [ 1.973, .2201, 1.56, .14]
#row2 = [2.05, .15, .93, .16]
#row3 = [1.76, .44, 1.56, .28]
row_name,data, dataframe = get_Data()
row, df = row_inputter()
I can tell that you are a Pandas beginner. No worries... Here's how you do the first few operations.
The AID that you reference is doing things the old fashioned way, and not leveraging many fine tools already created for working with CSV and XLSX data in and out of Pandas and Python.
XlsxWriter is a fabulous library that writes Pandas data to Excel files easily.
XlsxWriter: https://xlsxwriter.readthedocs.io/working_with_pandas.html
# Do necessary imports
import pandas as pd
import os
import xlsxwriter
# Define your data
expt_data = ["Teflon", "Lactic_Acid", "Expt_Wt", "LacticAcid_to_Exptl_Wt"]
row1 = [ 1.973, .2201, 1.56, .14]
row2 = [2.05, .15, .93, .16]
row3 = [1.76, .44, 1.56, .28]
# Create dataframe using constructor method
df1 = pd.DataFrame([row1, row2, row3], columns=expt_data)
# Output dataframe
df1
# Sort dataframe by Teflon column values and output it
Teflon_Sorted = df1.sort_values(by=["Teflon"])
Teflon_Sorted
# Sort dataframe by Lactic_Acid column values and output it
Lactic_Acid_Sorted = df1.sort_values(by=["Lactic_Acid"])
Lactic_Acid_Sorted
# Sort dataframe by Expt_Wt column values and output it
Expt_Wt_sorted = df1.sort_values(by=["Expt_Wt"])
Expt_Wt_sorted
# Sort dataframe by LacticAcid_to_Exptl_Wt column values and output it
LacticAcid_to_Exptl_Wt_sorted = df1.sort_values(by=["LacticAcid_to_Exptl_Wt"])
LacticAcid_to_Exptl_Wt_sorted
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("Trial1.xlsx", engine='xlsxwriter')
# Convert all dataframes to XlsxWriter Excel objects and then write each to a different worksheet in the workbook created above named "Trial1.xlsx".
Teflon_Sorted.to_excel(writer, sheet_name='Teflon_Sorted')
Lactic_Acid_Sorted.to_excel(writer, sheet_name='Lactic_Acid_Sorted')
Expt_Wt_sorted.to_excel(writer, sheet_name='Expt_Wt_sorted')
LacticAcid_to_Exptl_Wt_sorted.to_excel(writer, sheet_name='LacticAcid_to_Exptl_Wt_sorted')
# Close the Pandas Excel writer and output the Excel file.
# (ExcelWriter.save() was removed in pandas 2.0; close() also saves the file.)
writer.close()
# now go to your current directory in your file system where the Jupyter Notebook or Python file is executing and find your file.
# Type !dir in Jupyter cell to list current directory on MS-Windows
!dir
Sorry this is not a complete application that does everything you want, but I have limited time. I showed you how to write out your final results. I left it as a learning exercise for you to learn how to read in your data file, rather than creating it "on the fly" inline in your Python program.
My recommendation is to use XlsxWriter for writing Excel files from Pandas. Follow the fabulous tutorial on the XlsxWriter website. XlsxWriter is probably the best and easiest Python-Pandas-Excel toolkit right now. It does everything programmatically that someone would normally have to do manually ("interactively").
You already set Teflon as the index with
dataframe = dataframe.set_index('Teflon')
so your dataframe no longer contains that column. Depending on your pandas version, sort_values(by=['Teflon']) in
sort_data_frame_by_Teflon_column()
may still resolve the index name, but any column lookup such as dataframe['Teflon'] in get_data_in_Teflon() fails and throws that error.
Also, the other functions like:
def get_data_in_LacticAcid(dataframe):
    dataframe = dataframe.loc[dataframe['Lactic Acid']]
    dataframe = dataframe.sort_values(by= ['Lactic Acid'])
will likely fail or turn your dataframe into an empty one because of the first line. What exactly are you trying to achieve with those functions?
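A small sketch of the effect, using the three rows from the question built inline instead of read from the CSV file. It shows that a column lookup fails once the column has become the index, and two ways around it. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Teflon": [1.973, 2.05, 1.76],
    "Lactic Acid": [0.2201, 0.15, 0.44],
    "Experimental Weight": [1.56, 0.93, 1.56],
    "Ratio of Lactic Acid to Experimental Weight": [0.14, 0.16, 0.28],
})

# After set_index, 'Teflon' is the index and no longer a column.
indexed = df.set_index("Teflon")

try:
    indexed["Teflon"]  # column lookup, not an index lookup
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Option 1: sort by the index directly.
by_index = indexed.sort_index()

# Option 2: turn the index back into a column, then sort as before.
restored = indexed.reset_index().sort_values(by=["Teflon"])
```

Note also that sort_values returns a new DataFrame, so a function that only assigns the result to its local dataframe variable and does not return it leaves the caller's DataFrame unchanged.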

Import Excel Tables into pandas dataframe

I would like to import excel tables (made by using the Excel 2007 and above tabulating feature) in a workbook into separate dataframes. Apologies if this has been asked before but from my searches I couldn't find what I wanted. I know you can easily do this using the read_excel function however this requires the specification of a Sheetname or returns a dict of dataframes for each sheet.
Instead of specifying sheetname, I was wondering whether there was a way of specifying tablename or better yet return a dict of dataframes for each table in the workbook.
I know this can be done by combining xlwings with pandas but was wondering whether this was built-into any of the pandas functions already (maybe ExcelFile).
Something like this:-
import pandas as pd
xls = pd.ExcelFile('excel_file_path.xls')
# to read all tables to a map
tables_to_df_map = {}
# table_names is the attribute I wish existed; pandas does not provide it
for table_name in xls.table_names:
    tables_to_df_map[table_name] = xls.parse(table_name)
Although not exactly what I was after, I have found a way to get table names with the caveat that it's restricted to sheet name.
Here's an excerpt from the code that I'm currently using:
import pandas as pd
import openpyxl as op
wb=op.load_workbook(file_location)
# Connecting to the specified worksheet
ws = wb[sheetname]
# Initialising an empty list where the excel tables will be imported
# into
var_tables = []
# Importing table details from excel: Table_Name and Sheet_Range
# (recent openpyxl versions expose the tables as the dict-like ws.tables)
for table in ws.tables.values():
    sht_range = ws[table.ref]
    data_rows = []
    i = 0
    j = 0
    for row in sht_range:
        j += 1
        data_cols = []
        for cell in row:
            i += 1
            data_cols.append(cell.value)
            if (i == len(row)) and (j == 1):
                # header row: add a column that will hold the table name
                data_cols.append('Table_Name')
            elif i == len(row):
                data_cols.append(table.name)
        data_rows.append(data_cols)
        i = 0
    var_tables.append(data_rows)
# Creating an empty list where all the dfs will be appended
# into
var_df = []
# Appending each table extracted from excel into the list
for tb in var_tables:
    df = pd.DataFrame(tb[1:], columns=tb[0])
    var_df.append(df)
# Merging all in one big df
df = pd.concat(var_df,axis=1) # This merges on columns

Copy unique row to pandas dataframe?

I have an excel workbook with multiple sheets with some sales data. I am trying to sort them so that each customer has a separate sheet(different workbook), and has the item details. I have created a dictionary with all customernames.
for name in cust_dict.keys():
    cust_dict[name] = pd.DataFrame(columns=cols)
for sheet in sheets:
    ws = sales_wb.sheet_by_name(sheet)
    code = ws.cell(4, 0).value #This is the item code
    df = pd.read_excel(sales_wb, engine='xlrd', sheet_name=sheet, skiprows=7)
    df = df.fillna(0)
    count = 0
    for index,row in df.iterrows():
        print('rotation '+str(count))
        count+=1
        if row['Particulars'] != 0 and row['Particulars'] not in no_cust:
            cust_name = row['Particulars']
            # try:
            cust_dict[cust_name] = cust_dict[cust_name].append(df.loc[df['Particulars'] == cust_name],ignore_index=False)
            cust_dict[cust_name] = cust_dict[cust_name].drop_duplicates()
            cust_dict[cust_name]['Particulars'] = code
Right now I have to drop duplicates because Particulars contains the client name more than once, and hence the code appends the data, say, x number of times.
I would like to avoid this but I can't seem to figure out a good way to do it.
The second problem is that the code value changes to the code from the last sheet for all rows, but I want it to remain the same for the rows pulled from a particular sheet.
I can't seem to figure out a way around either of these problems.
Thanks
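One possible direction, sketched under several assumptions: tag each sheet's rows with that sheet's item code before combining anything, then split the combined rows by customer with groupby instead of appending inside iterrows. The frames below stand in for sheets already read with pd.read_excel, the "Item Code" column is a made-up name (the question overwrites Particulars instead), and no_cust is a small stand-in set.

```python
import pandas as pd

# Stand-ins for two sheets; each key is the item code read from that sheet.
sheets_by_code = {
    "ITEM-A": pd.DataFrame({"Particulars": ["Acme", "Bolt", "Acme"],
                            "Qty": [1, 2, 3]}),
    "ITEM-B": pd.DataFrame({"Particulars": ["Bolt", "Total"],
                            "Qty": [4, 4]}),
}
no_cust = {"Total"}  # values in Particulars that are not customers

tagged = []
for code, df in sheets_by_code.items():
    # Drop non-customer rows, then record which item code these rows belong to.
    keep = df[~df["Particulars"].isin(no_cust)]
    tagged.append(keep.assign(**{"Item Code": code}))

all_rows = pd.concat(tagged, ignore_index=True)

# One DataFrame per customer; every source row appears exactly once.
cust_dict = {
    name: group.reset_index(drop=True)
    for name, group in all_rows.groupby("Particulars")
}
```

Because each row is visited once, there is nothing to de-duplicate afterwards, and the code assigned to a row always comes from the sheet the row was read from.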
