How to store tuples in a pandas dataframe cell? - python

I have a CSV file with data stored in the following fashion:
username;groups
alice;(admin,user)
bob;(user)
I want to do some data analysis on it, so I'd like to import it into a pandas dataframe so that the first column is stored as a string and the second as a tuple.
I tried mydataframe = pd.read_csv('file.csv', sep=';') and then converting the groups column with the astype method, mydataframe['groups'].astype('tuple'), but it doesn't work.
How to store other objects than strings/ints/floats in dataframes?
Thanks.

Untested, but try
mydataframe['groups'].apply(lambda text: tuple(text[1:-1].split(',')))
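If the conversion should happen at read time instead of afterwards, read_csv also accepts a converters mapping. A minimal sketch, assuming the file matches the sample above; parse_groups is just an illustrative helper:
import pandas as pd

def parse_groups(text):
    # Strip the surrounding parentheses and split on commas.
    return tuple(text.strip('()').split(','))

mydataframe = pd.read_csv('file.csv', sep=';', converters={'groups': parse_groups})
print(mydataframe['groups'].iloc[0])  # ('admin', 'user')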

Related

Function to take a list of spark dataframe and convert to pandas then csv

import pyspark
dfs = [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12, df13, df14, df15]
for x in dfs:
    y = x.toPandas()
    y.to_csv("D:/data")
This is what I wrote, but I actually want a function that takes this list, converts every df into a pandas df, writes each one to CSV in the order it appears in the dfs list, and saves the files to a particular directory with names in that same order. Is there a possible way to write such a function?
PS D:/data is just an imaginary path and is used for explanation.
When you convert each dataframe to a CSV, you still need to give each one its own file name in df.to_csv. So, try:
for x in dfs:
    y = x.toPandas()
    y.to_csv(f"D:/data/df{dfs.index(x) + 1}.csv")
I set it as df{dfs.index(x) + 1} so that the file names will be df1, df2, ... etc.
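Wrapping this in a function as the question asks, here is a sketch that uses enumerate rather than dfs.index (which returns the first match, so it would misnumber any dataframe that appears twice in the list); the directory is still just a placeholder, and df1..df3 are assumed to be existing Spark dataframes:
import os

def spark_dfs_to_csv(dfs, out_dir="D:/data"):
    # Convert each Spark dataframe to pandas and write it to its own CSV,
    # numbering the files by their position in the list.
    for i, sdf in enumerate(dfs, start=1):
        sdf.toPandas().to_csv(os.path.join(out_dir, f"df{i}.csv"), index=False)

spark_dfs_to_csv([df1, df2, df3])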

Is there a possible way to merge excel rows for duplicate cells in a column with python?

I am still new to Python; could you please help me with this?
I have this excel sheet
and I want it to be like this
You can convert the csv data to a panda dataframe like this:
import pandas as pd
df = pd.read_csv("Input.csv")
Then do the data manipulation as such:
df = df.groupby(['Name'])['Training'].apply(', '.join).reset_index()
Finally, create an output csv file:
df.to_csv('Output.csv', sep='\t')
You could use pandas to create a DataFrame and manipulate the Excel sheet information. First, load the file using the read_excel function (this creates a DataFrame), and then use groupby and apply to concatenate the strings.
import pandas as pd
# Read the Excel File
df = pd.read_excel('tmp.xlsx')
# Group by the column(s) that you need.
# Finally, use the apply function to arrange the data
df = df.groupby(['Name'])['Training'].apply(','.join).reset_index()
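As a quick, self-contained illustration of what the groupby/apply step does, with made-up values (your real column names and data will differ):
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Bob'],
    'Training': ['Excel', 'Python', 'SQL'],
})

merged = df.groupby('Name')['Training'].apply(', '.join).reset_index()
print(merged)
# Alice's two rows collapse into one row with Training 'Excel, Python';
# Bob keeps his single row with 'SQL'.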

read csv to pandas retaining values as it is

I am trying to read a CSV into a dataframe, but I want to retain the column values exactly as they appear.
For example, my first column has values like 001234 and 003462 in the CSV file, but the dataframe interprets them as 1234, 3462, etc. How do I retain the '00' at the front?
Please help! Thanks.
Try this:
df = pd.read_csv(file_path, dtype=str)
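If only one column needs to stay text, you can pass a dict instead, so the remaining columns still get normal type inference. A sketch, assuming the column is named 'code':
import pandas as pd

# Only 'code' is forced to string, which preserves the leading zeros;
# every other column is parsed as usual.
df = pd.read_csv(file_path, dtype={'code': str})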

read json object and create csv string from it in python

I have an object array in string format:
'[{"date":"2014-10-05T01:12:00.000Z","count":56.4691}, {"date":"2014-10-05T01:14:00.000Z","count":23.4691}, ...]'
I want to transform the string into CSV format like this:
"","date","count"
"1",2014-09-25 14:01:00,182.478
"2",2014-09-25 14:01:00,182.478
To be able to do it, I first read the string with the read_json function in the pandas library, but it sorted the columns, and the count column comes before the date column. How can I get this transformation in Python?
Use the columns parameter of df.to_csv.
Ex:
import pandas as pd
s = '[{"date":"2014-10-05T01:12:00.000Z","count":56.4691}, {"date":"2014-10-05T01:14:00.000Z","count":23.4691}]'
df = pd.read_json(s)
df.to_csv(r"PATH\B.csv", columns=["date", "count"])

Flattening Table From Excel into Csv with Pandas

I'm trying to take the data from a table in Excel and put it into a CSV as a single row. I have the data imported from Excel into a dataframe using pandas, but now I need to write this data to a CSV in a single row. Is this possible, and if so, what would the syntax generally look like for taking a 50-row, 3-column table and flattening it into a 1-row, 150-column CSV table? My code so far is below:
import pandas as pd
df = pd.read_excel('filelocation.xlsx',
                   sheetname=['pnl1 Data ', 'pnl2 Data', 'pnl3 Data', 'pnl4 Data'],
                   skiprows=8, parse_cols="B:D", keep_default_na=False,
                   na_values=['NULL'], header=3)
df.to_csv("outputFile.csv")
Another question that would help me understand how to transform this data is: is there any way to select a piece of data from a specific row and column?
You can simply set the line_terminator to a comma instead of a newline, like so:
df.to_csv('outputfile.csv', line_terminator=',', index=False, header=False)
Or you can translate your dataframe into a numpy array and use the reshape function:
import numpy as np
import pandas as pd
arr = df.values.reshape(1,-1)
You can then use numpy.savetxt() to save as CSV.
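Putting the reshape route together, here is a sketch assuming df is a single DataFrame (note that read_excel returns a dict of DataFrames when given a list of sheet names); wrapping the flattened array back in a DataFrame keeps to_csv available, so numpy.savetxt is optional:
import pandas as pd

# Flatten the 50x3 table into one row of 150 values, then write it out.
flat = pd.DataFrame(df.values.reshape(1, -1))
flat.to_csv('outputFile.csv', index=False, header=False)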
Or try this:
df.to_csv("outputFile.csv", line_terminator=',')
