Flattening Table From Excel into Csv with Pandas - python

I'm trying to take the data from a table in Excel and write it to a CSV file as a single row. I have imported the data from Excel into a DataFrame using pandas. Is this possible, and if so, what would the syntax generally look like for flattening a 50-row, 3-column table into a 1-row, 150-column CSV? My code so far is below:
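import pandas as pd
# A list of sheet names makes read_excel return a dict of DataFrames keyed by sheet name.
sheets = pd.read_excel('filelocation.xlsx',
                       sheet_name=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'],
                       skiprows=8, usecols="B:D", keep_default_na=False, na_values=['NULL'], header=3)
df = sheets['pnl1 Data ']  # pick one sheet to work with
df.to_csv("outputFile.csv")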
Another question that would help me understand how to transform this data: is there any way to select a piece of data from a specific row and column?

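You can set line_terminator to a comma instead of a newline, so that every row is written onto the same line (pandas 1.5 and later spell this argument lineterminator, and pandas 2.0 removed the old spelling):
df.to_csv('outputfile.csv', line_terminator=',', index=False, header=False)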
Or you can translate your dataframe into a numpy array and use the reshape function:
import numpy as np
import pandas as pd
arr = df.values.reshape(1,-1)
You can then use numpy.savetxt() to save as CSV.
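A minimal sketch of the reshape-and-savetxt route, assuming a small made-up table in place of the Excel data. The column names and values are hypothetical, and an in-memory buffer stands in for the output file so the result can be inspected.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 3-row, 2-column table standing in for the Excel data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Row-major flattening: all of row 0, then all of row 1, and so on.
arr = df.to_numpy().reshape(1, -1)

# Write the single row. fmt="%s" formats each value with str-style
# formatting, which also copes with mixed or object columns.
buffer = io.StringIO()  # replace with a path such as "outputFile.csv"
np.savetxt(buffer, arr, delimiter=",", fmt="%s")
flat_csv = buffer.getvalue()
```

Unlike the line-terminator trick, this writes a normal newline-terminated row without a trailing comma.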

Try this:
df.to_csv("outputFile.csv", line_terminator=',')
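The follow-up question about picking a value from a specific row and column is not covered by the answers above. A minimal sketch, assuming a small made-up DataFrame (the labels r0..r2 and columns x, y are hypothetical): iloc selects by integer position, loc by label, and at is a label-based accessor for a single scalar.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]},
    index=["r0", "r1", "r2"],
)

by_position = df.iloc[1, 0]     # second row, first column (0-based positions)
by_label = df.loc["r2", "y"]    # row labelled "r2", column "y"
single_value = df.at["r0", "x"]  # scalar lookup by label
```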

Related

How to store tuples in a pandas dataframe cell?

I have a CSV import of data stored in this fashion:
username;groups
alice;(admin,user)
bob;(user)
I want to do some data analysis on it and import it into a pandas DataFrame so that the first column is stored as a string and the second as a tuple.
I tried mydataframe = pd.read_csv('file.csv', sep=';') and then converting the groups column with the astype method, mydataframe['groups'].astype('tuple'), but it doesn't work.
How can I store objects other than strings/ints/floats in DataFrames?
Thanks.
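Untested, but try the following (apply returns a new Series, so assign it back to the column):
mydataframe['groups'] = mydataframe['groups'].apply(lambda text: tuple(text[1:-1].split(',')))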
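A self-contained sketch of the same idea, assuming the file contents shown in the question are supplied as an in-memory string. The helper name parse_groups is hypothetical; it strips the parentheses and splits on commas.

```python
import io

import pandas as pd

# The sample data from the question, held in memory instead of file.csv.
raw = "username;groups\nalice;(admin,user)\nbob;(user)\n"
df = pd.read_csv(io.StringIO(raw), sep=";")


def parse_groups(text):
    """Turn a string like '(admin,user)' into the tuple ('admin', 'user')."""
    inner = text.strip()[1:-1]  # drop the surrounding parentheses
    return tuple(part.strip() for part in inner.split(",") if part.strip())


# apply returns a new Series, so assign it back to the column.
df["groups"] = df["groups"].apply(parse_groups)
```

The column ends up holding ordinary Python tuples, so each cell can be used with tuple operations such as membership tests.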

How to adjust table header when saving dataframe to excel using Pandas?

The objective is to save a DataFrame in xlsx format using the code below.
import pandas as pd
from pandas import DataFrame
list_me = [['A','A','A','A','A','B','C','D','D','D','D'],
           ['TT','TT','UU','UU','UU','UU','UU','TT','TT','TT','TT'],
           ['5','2','1','1','1','40','10','2','2','2','2'],
           ['1','1','1','2','3','3','1','2','2','2','1']]
df = DataFrame(list_me).transpose()
df.columns = ['Name','Activity','Hour','Month']
df_tab = pd.crosstab(df.Name, columns=[df.Month, df.Activity], values=df.Hour, aggfunc='sum').fillna(0)
df_tab.reset_index(level=0, inplace=True)
df_tab.to_excel("output.xlsx")
The code works fine and outputs the xlsx file shown below:
However, I notice that adding the index as the first column splits the labels Month, Activity, and Name into separate columns.
Is there a built-in setting in pandas that can produce the output below?
Thanks in advance.
P.S.: Please ignore the yellow line; it just indicates that there should be a blank row.

Is there a way to merge Excel rows for duplicate cells in a column with Python?

I am still new to Python; could you please help me with this?
I have this Excel sheet
and I want it to look like this
You can convert the CSV data to a pandas DataFrame like this:
import pandas as pd
df = pd.read_csv("Input.csv")
Then do the data manipulation as follows:
df = df.groupby(['Name'])['Training'].apply(', '.join).reset_index()
Finally, create an output csv file:
df.to_csv('Output.csv', sep='\t')
You could use pandas to create a DataFrame for manipulating the Excel sheet's data. First, load the file with read_excel (which returns a DataFrame), then use groupby and apply to concatenate the strings.
import pandas as pd
# Read the Excel File
df = pd.read_excel('tmp.xlsx')
# Group by the column(s) that you need.
# Finally, use the apply function to arrange the data
df = df.groupby(['Name'])['Training'].apply(','.join).reset_index()
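To see what the groupby and join step produces, here is a minimal sketch on made-up data. The column names Name and Training come from the answers above; the rows themselves are hypothetical, since the original sheet is not shown.

```python
import pandas as pd

# Hypothetical rows: "Ann" appears twice and should collapse into one row.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Ann", "Bob"],
        "Training": ["Safety", "Excel", "Safety"],
    }
)

# One row per Name, with that person's Training values joined into one string.
merged = df.groupby("Name")["Training"].apply(", ".join).reset_index()
```

This yields one string cell per name rather than visually merged Excel cells.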

python read excel data, filling missing values using pandas

table 1: first table
table 2: second table in single sheet.
I want to read an Excel file and fill in its missing values, but I have several tables in a single sheet. How can I split them and fill in the missing values separately for each table?
Here's my code:
# read excel files
import pandas as pd
import numpy as np
stations_data = pd.read_excel('filename', sheet_name=0, skiprows=6)
# get a data frame with selected columns
FORMAT = ['S.No.', 'YEAR', 'JUNE']
df_selected = stations_data[FORMAT]
for col in FORMAT:
    for idx, rows in df_selected.iterrows():
        if pd.isnull(df_selected.loc[idx, col]):
            df_selected = df_selected.fillna(df_selected.mean())
print(df_selected)
You could use pd.read_excel with the keyword argument skiprows to start at the correct row for a specific table and skipfooter to stop at the correct row. Of course, this may not be practical if the number of rows in the tables changes in the future. It may be easier to restructure the Excel file to have one table per sheet and then use the sheet_name keyword argument. See the documentation.
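A rough sketch of the skiprows/skipfooter idea. To keep it runnable without a workbook, it swaps in pd.read_csv on an in-memory string; pd.read_excel accepts skiprows and skipfooter as well, while the engine="python" argument is only needed for read_csv. The layout (two stacked tables, each with a title line) and the values are assumptions.

```python
import io

import pandas as pd

# Hypothetical sheet layout: two tables stacked vertically, each with a title.
raw = (
    "Table 1\n"
    "YEAR,JUNE\n"
    "2001,10\n"
    "2002,\n"
    "2003,14\n"
    "Table 2\n"
    "YEAR,JUNE\n"
    "2001,20\n"
    "2002,22\n"
)

# Table 1: skip its title line, and drop the 4 trailing lines that belong to table 2.
table1 = pd.read_csv(io.StringIO(raw), skiprows=1, skipfooter=4, engine="python")

# Table 2: skip everything up to and including its title line.
table2 = pd.read_csv(io.StringIO(raw), skiprows=6)

# Fill the missing values of one table with that table's own column means.
table1_filled = table1.fillna(table1.mean())
```

Because each table is read into its own DataFrame, the means used for filling never mix values from different tables.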

Force Python Pandas DataFrame( read_csv() method) to avoid/not consider first row of my csv/txt file as header

I am reading a txt file (data.txt) using the pandas read_csv method. The file has 16 columns and 600 rows. However, after reading it into a DataFrame, I observed that the first row of data.txt has been taken as the column headings. This reduces my DataFrame to 599 rows instead of the 600 in the text file. How can I force pandas not to use the first row as the header?
I am using this code to read the file:
import pandas as pd
df = pd.read_csv("C:\<my_directory_path>\data.txt")
Just add header=None:
import pandas as pd
df = pd.read_csv(r"C:\<my_directory_path>\data.txt", header=None)
You can use the parameter header=None to read your data in with integers as the column labels. Alternatively, if you know the names of your columns, you can pass in something like names=['col1','col2','col3'].
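A small sketch comparing the three behaviours, assuming a tiny headerless file held in memory (the values and the names col1..col3 are placeholders):

```python
import io

import pandas as pd

# Hypothetical headerless data: two rows, three columns.
raw = "1,2,3\n4,5,6\n"

# Default: the first line is consumed as the header, leaving one data row.
default = pd.read_csv(io.StringIO(raw))

# header=None: every line is data, and columns are labelled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# names=[...]: every line is data, and columns get the supplied labels.
named = pd.read_csv(io.StringIO(raw), names=["col1", "col2", "col3"])
```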
