Python not reading all the information from Excel properly - python

I am trying to open a few Excel files inside a directory and then work with the data (for example, take the average of one row across three files).
My main goal right now is just to display the information in each Excel file. I used the code below to do so, but when I display the data, it prints element 0 through element 29, then skips 30-50, and then prints 51-80.
Here is the code:
import numpy as np
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import os
import pandas as pd
from tkinter import filedialog
from tkinter import *
import matplotlib.image as image
import xlsxwriter
import openpyxl
import xlwt
import xlrd
#GUI
root=Tk()
root.withdraw() #closes tkinter window pop-up
path=filedialog.askdirectory(parent=root,title='Choose a file')
path=path+'/'
print('Folder Selected',path)
files=os.listdir(path)
length=len(files)
print('Files inside the folder',files)
Files=[]
for s in os.listdir(path):
    Files.append(pd.read_excel(path+s))
print (Files)

I'm quite sure your data is being read correctly. The dots between rows 29 and 51 indicate that there is more data there. pandas elides these rows so that the console output stays readable. If you want to see all the rows, you could use the solution from this answer:
with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
    print(Files)
Here None removes the limit on the number of displayed rows, and 3 limits the display to three columns. The pandas documentation on options has more details.
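To convince yourself that only the display is shortened, here is a minimal sketch. The DataFrame is made up (81 rows, to mirror the 0-80 range in the question), and the row limit of 10 is set explicitly only to make the truncation reproducible:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the Excel files: 81 rows (index 0-80).
df = pd.DataFrame({"value": np.arange(81) * 0.5})

# With a row limit in place, the middle rows are replaced by an ellipsis row.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

# With the limit removed, every row is rendered.
with pd.option_context("display.max_rows", None):
    full = repr(df)

# The data itself is identical in both cases; only the text output differs.
print(len(df))
print(len(truncated.splitlines()), len(full.splitlines()))
```

The row count reported by len(df) does not depend on the display options, so any calculation on the DataFrame sees all rows.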

This is actually the standard way pandas prints the data. Notice the ellipsis between 29 and 51:
29 7.8000 [cont.]
...
51 12.19999 [cont.]
You can still operate on every row. To get the number of rows in a dataframe, you can call
len(df.index)

Related

What command can fix visual output (layout)

Can someone tell me why the columns are arranged like that, or tell me how to fix it?
Thanks
# import libraries
import numpy as np
import pandas as pd
from time import time
import mysql.connector
from IPython.display import display # Allows the use display() for dataframes
data = pd.read_csv("car_dataset.csv", delimiter = ";")
# Display result (example (5))
display(data.head(n=5))
I don't know what else to try.
If you look closely, your data is delimited by commas (,), not semicolons (;). Remove the delimiter parameter:
data = pd.read_csv("car_dataset.csv")
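A small sketch of what the wrong delimiter does. The CSV text below is invented for illustration and is not the asker's car_dataset.csv:

```python
import io
import pandas as pd

# Hypothetical comma-separated content standing in for car_dataset.csv.
raw = "make,model,year\nFord,Focus,2012\nFiat,Panda,2015\n"

# Wrong delimiter: no ';' is found, so each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw), delimiter=";")

# Default delimiter (','): the fields are split into three columns.
right = pd.read_csv(io.StringIO(raw))

print(wrong.shape, right.shape)
print(list(wrong.columns), list(right.columns))
```

When a whole line shows up as one column name, a delimiter mismatch is a likely cause.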

Python Script saves files one directory above user input directory

I am writing a Python script which uses tkinter to take a .xlsx file as user input, group the data in it by location, and then export an individual csv file for each unique value of location, keeping only the columns I tell it to keep. The issue is that, although the user selects the directory to store the files in, the script saves the files one directory above it.
For example, let's say the user selects \Desktop\XYZ\Test as the directory for the files to be saved in. The script saves the exported file one directory above it, i.e. in \Desktop\XYZ, while adding the name of the subdirectory Test to the exported file name. The code I'm using is attached below.
This is probably a simple issue, but being a newbie I'm at my wits' end trying to resolve this, so any help is appreciated.
Code:
import pandas as pd
import csv
import locale
import os
import sys
import unicodedata
import tkinter as tk
from tkinter import simpledialog
from tkinter.filedialog import askopenfilename
from tkinter import *
from tkinter import ttk
ROOT = tk.Tk()
ROOT.withdraw()
data_df = pd.read_excel(askopenfilename())
grouped_df = data_df.groupby('LOCATION')
folderpath = filedialog.askdirectory()
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
    f=pd.read_csv(folderpath+filename+".csv", sep=',')
    #print f
    keep_col = ['ID','NAME','DATA1','DATA4']
    new_f = f[keep_col]
    new_f.to_csv(folderpath+data[0]+".csv", index=False)
Sample data
P.S. There will be data in the DATA3 and DATA4 columns, but I just didn't enter it here.
How the Script is giving the output:
Thanks in Advance!
It seems that the return value of filedialog.askdirectory() is the folder the user selected, without a trailing slash, i.e.:
\Desktop\XYZ\Test
Your full path, created by folderpath+data[0]+".csv", with an example value of "potato" for data[0], will be
\Desktop\XYZ\Testpotato.csv
You need to at least append the \ manually:
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+"\\"+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
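As an alternative to appending the separator by hand, os.path.join can build the path. This is only a sketch: the folder and the name "potato" are example values, not output from the asker's script.

```python
import os

# Hypothetical values standing in for the dialog result and a LOCATION group name.
folderpath = os.path.join("Desktop", "XYZ", "Test")
location = "potato"

# Plain concatenation glues the file name onto the last folder name.
concatenated = folderpath + location + ".csv"

# os.path.join inserts the platform's path separator between the parts.
joined = os.path.join(folderpath, location + ".csv")

print(concatenated)
print(joined)
```

With the joined path, the file lands inside Test instead of next to it, and the same code works regardless of which separator the platform uses.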

Write excel Function FILTER with Python

I am trying to write functions into my Excel sheet, because after I run the code all existing functions break (they start with {=FILTER instead of =FILTER( ).
That is why I write the function directly into the cell.
from tkinter import filedialog, scrolledtext, messagebox, ttk
from tkinter import *
import pandas
from decimal import Decimal
import openpyxl
import os
import re
book = openpyxl.load_workbook(filename,keep_vba=True,data_only=True)
book['POF Workbook']['Q33']= '=COUNTIF(B2:B10, ">0")'  # => OK, works
book['POF Workbook']['D33']= 'FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")"  # => does not work
In Excel I then have:
FILTER(Table1,Table1[POF "#]='POF Workbook'!A3,"No data")
When I try to write: book['POF Workbook']['D33']= '=FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")" nothing happens.
I have not been able to find a solution in 3 days :(

Pyviz panel: can't work with FileInput widget

I must be missing something basic about how FileInput widget works in pyviz panel.
In the following code, I let the user select a csv file and the number of rows to display. If a file isn't selected, I generate some random data.
import pandas as pd; import numpy as np; import matplotlib.pyplot as plt
import panel as pn
import panel.widgets as pnw
pn.extension()
datafile = pnw.FileInput()
head = pnw.IntSlider(name='head', value=3, start=1, end=60)
@pn.depends(datafile, head)
def f(datafile, head):
    if datafile is None:
        data = pd.DataFrame({'x': np.random.rand(10)})
    else:
        data = pd.read_csv(datafile)
    return pn.Column(f'## {head} first rows', data.head(head))
widgets = pn.Column(datafile, head)
col = pn.Column(widgets, f)
col
Here's the problem. If I don't select a file and play with the head widget, the pane acts as expected: the number of displayed rows changes as I change the head widget, and I can see that the data is different after each update.
However, once I select a file, two problems occur. First, the data isn't loaded. Secondly, the column stops reacting to my interactions.
Can anybody tell me what my problem is?
The problem in the code above is that the datafile variable in function f is not a file name but the file contents, as a bytes string. Due to the error, the function throws an unhandled exception that, unfortunately, isn't registered anywhere.
Thus, with io imported, the data reading line should be
data = pd.read_csv(io.BytesIO(datafile))
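The same idea can be tried outside of panel. In this sketch the bytes literal is a made-up payload that stands in for whatever the upload widget passes to the function:

```python
import io
import pandas as pd

# Hypothetical bytes payload standing in for uploaded file contents.
payload = b"x,y\n1,2\n3,4\n"

# pd.read_csv expects a path or a file-like object, so wrap the bytes in a buffer.
data = pd.read_csv(io.BytesIO(payload))

print(data)
```

io.BytesIO gives the raw bytes a file-like interface, which is what read_csv needs here.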

CSV data structure printing in messy format

I've exported an Excel file into a CSV where all the columns and entries look correct and normal. However, when I put it into a data frame and print the head, the structure becomes very messy and unreadable because the columns are not aligned.
As you can see in the image, the values are not neatly under user_id.
https://imgur.com/a/gbWaTwi
I'm using the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
then
df1 = pd.read_csv('../doc.csv', low_memory=False)
df1.head
Call head() and print the result. Writing .head without parentheses only refers to the method and does not call it, so you don't get the first rows as a DataFrame.
print(df1.head())
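The difference is easy to see with a toy DataFrame. The column names and values below are invented and only loosely follow the user_id column mentioned in the question:

```python
import pandas as pd

# Hypothetical data; the real doc.csv is not available here.
df1 = pd.DataFrame({"user_id": [101, 102, 103], "score": [0.5, 0.7, 0.9]})

method = df1.head      # no parentheses: the method object itself
preview = df1.head(2)  # called: a new DataFrame with the first 2 rows

print(type(method))
print(preview)
```

Only the called version returns a DataFrame, which is what prints as a neatly aligned table.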
