How to write a pandas DataFrame into a csv file? - python

I don't get how to do this. I'm trying the following:
OutputData_tmp = pd.DataFrame(columns=('frame_len', 'frame_transport_protocol', 'ip_len', 'ip_ttl', 'ip_src', 'ip_dst', 'src_port', 'dst_port', 'payload_len', 'data_len'))
to create an empty dataframe, and then, inside a for loop I do:
OutputData_tmp.loc(line)
with 'line' being a list of float values.
Then:
OutputData_tmp.to_csv('TrainingSet\\TrainingFeatures.csv')
to save the dataframe as csv.
But when I open TrainingFeatures.csv it is empty: it only has the header (column names).
What am I doing wrong?

You are adding rows to the dataframe incorrectly.
Refer to the following link for adding a row:
add one row in a pandas.DataFrame
Rather than doing OutputData_tmp.loc(line), assign through .loc with square brackets, where i is the row label:
OutputData_tmp.loc[i] = line
Hope this helps.
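Putting the whole loop together, here is a minimal sketch; the `lines` variable is illustrative and stands in for whatever iterable of float lists you are looping over:
import pandas as pd

columns = ('frame_len', 'frame_transport_protocol', 'ip_len', 'ip_ttl', 'ip_src',
           'ip_dst', 'src_port', 'dst_port', 'payload_len', 'data_len')
OutputData_tmp = pd.DataFrame(columns=columns)

# `lines` is assumed to be your iterable of rows, one list of floats per row
for i, line in enumerate(lines):
    OutputData_tmp.loc[i] = line   # square brackets, not parentheses

OutputData_tmp.to_csv('TrainingSet\\TrainingFeatures.csv')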

Related

How to read and modify csv files in a function in a loop and save them as separate DataFrames in Python Pandas?

I am trying to create a function in Python Pandas where I:
read 5 csv files
make some aggregations on each csv I read (to keep it simple, we can just delete one column)
save each modified csv as a separate DataFrame
Currently I have something like the code below; nevertheless it returns only one DataFrame as output, not 5. How can I change it?
def xx():
    #1. read 5 csv
    for el in [col for col in os.listdir("mypath") if col.endswith(".csv")]:
        df = pd.read_csv(f"path/{el}")
        #2. making aggregations
        df = df.drop("COL1", axis=1)
        #3. saving each modified csv to separated DataFrames
        ?????
Finally I need to have 5 separate DataFrames after the modifications. How can I modify my function to achieve that in Python Pandas?
You can create an empty dictionary and fill it gradually with the five processed dataframes.
Try this:
def xx():
    dico_dfs = {}
    for el in [file for file in os.listdir("mypath") if file.endswith(".csv")]:
        #1. read 5 csv
        df = pd.read_csv(f"path/{el}")
        #2. making aggregations
        df = df.drop("COL1", axis=1)
        #3. saving each modified csv to separated DataFrames
        dico_dfs[el] = df
    return dico_dfs  # return the dict so the caller can use it
You can access each dataframe by using the filename as a key, e.g. dico_dfs["file1.csv"].
If needed, you can build a single dataframe using pandas.concat: pd.concat(dico_dfs).
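A short usage sketch, assuming xx() returns the dictionary as above and that "file1.csv" is one of the files in the folder:
dico_dfs = xx()

# one DataFrame per input file, keyed by filename
df_first = dico_dfs["file1.csv"]

# optional: combine everything into one DataFrame; the filenames become the outer index level
combined = pd.concat(dico_dfs)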

How to export a dictionary to excel using Pandas

I am trying to export some data from Python to Excel using Pandas, and I am not succeeding. The data is a dictionary, where the keys are tuples of 4 elements.
I am currently using the following code:
df = pd.DataFrame(data)
df.to_excel("*file location*", index=False)
and I get an exported 2-column table as follows:
I am trying to get an excel table where the first 3 elements of the key are split into their own columns, and the 4th element of the key (Period in this case) becomes a column name, similar to the example below:
I have tried different additions to the above code, but I'm a bit new to this and nothing has worked so far.
Based on what you show us (which is not reproducible), you need pandas.MultiIndex:
df_ = df.set_index(0)  # `0` since your tuples seem to be located in the first column
df_.index = pd.MultiIndex.from_tuples(df_.index)  # convert the plain index into an N-dimensional index
# `~.unstack` does the job of moving your periods into columns
df_.unstack(level=-1).droplevel(0, axis=1).to_excel(
    "file location", index=True
)
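For reference, here is a minimal, self-contained sketch of the same idea with made-up keys and values (the names and numbers are illustrative, not from the question):
import pandas as pd

# hypothetical data: keys are 4-element tuples whose last element is the period
data = {
    ("Widget", "North", 2023, "Q1"): 10,
    ("Widget", "North", 2023, "Q2"): 12,
    ("Gadget", "South", 2023, "Q1"): 7,
    ("Gadget", "South", 2023, "Q2"): 9,
}

idx = pd.MultiIndex.from_tuples(list(data.keys()),
                                names=["Product", "Region", "Year", "Period"])
s = pd.Series(list(data.values()), index=idx)

# move the last key element (Period) into columns, then export
# (writing .xlsx requires an Excel writer engine such as openpyxl)
s.unstack(level=-1).to_excel("output.xlsx")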
You could also try exporting to a csv instead:
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index=False)
which can then be converted to an Excel file easily.

Is there a way to edit columns in CSV file with python?

I'm trying to standardize data in a large CSV file. I want to replace a string "Greek" with a different string "Q35497" but only in a single column (I don't want to replace every instance of the word "Greek" to "Q35497" in every column but just in a column named "P407"). This is what I have so far
data_frame = pd.read_csv('/data.csv')
data_frame["P407"] = data_frame['P407'].astype(str)
data_frame["P407"].str.replace('Greek', 'Q35497')
But what this does is just create a single column "P407" with a list of strings (such as 'Q35497') and I can't append it to the whole csv table.
I tried using DataFrame.replace
data_frame = data_frame.replace(
    to_replace={"P407": {'Greek': 'Q35497'}},
    inplace=True
)
But this just creates an empty set. I also can't figure out why data_frame["P407"] creates a separate series that cannot be added to the original csv file.
Your approach is correct, but you forgot to store the modified dataframe.
data_frame = pd.read_csv('/data.csv')
data_frame["P407"] = data_frame["P407"].str.replace('Greek', 'Q35497')

There is an extra id column in dataFrame read from csv [duplicate]

I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.
I tried:
pd.read_csv('C:/Path to file to edit.csv', index_col = False)
And to save the file...
pd.to_csv('C:/Path to save edited file.csv', index_col = False)
However, I still got the unwanted index column. How can I avoid this when I save my files?
Use index=False.
df.to_csv('your.csv', index=False)
There are two ways to handle the situation where we do not want the index to be stored in the csv file.
As others have stated, you can use index=False while saving your dataframe to the csv file:
df.to_csv('file_name.csv', index=False)
Or you can save your dataframe as it is, with an index, and while reading just drop the 'Unnamed: 0' column that contains your previous index:
df.to_csv('file_name.csv')
df_new = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)
If you want no index, read the file using:
import pandas as pd
df = pd.read_csv('file.csv', index_col=0)
and save it using:
df.to_csv('file.csv', index=False)
As others have stated, if you don't want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)
However, since the data you will usually work with has some sort of index of its own, say a 'timestamp' column, I would keep the index and load the data using it.
So, to save the indexed data, first set the index and then save the DataFrame:
df = df.set_index('timestamp')
df.to_csv('processed.csv')
Afterwards, you can either read the data with the index:
pd.read_csv('processed.csv', index_col='timestamp')
or read the data and then set the index:
df = pd.read_csv('filename.csv')
df = df.set_index('column_name')
Another solution, if you want to keep this column as the index:
pd.read_csv('filename.csv', index_col='Unnamed: 0')
If you want a well-formatted file, the following statement works well:
dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)
In this case you get a csv file with ',' as the separator between columns and utf-8 encoding.
In addition, the numerical index won't appear.
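Pulling these answers together, here is a minimal round-trip sketch; the column and file names are illustrative:
import pandas as pd

df = pd.DataFrame({'timestamp': [1, 2, 3], 'value': [10.0, 20.0, 30.0]})

# option 1: never write the index at all
df.to_csv('no_index.csv', index=False)

# option 2: keep a meaningful column as the index and round-trip it explicitly
df = df.set_index('timestamp')
df.to_csv('with_index.csv')
restored = pd.read_csv('with_index.csv', index_col='timestamp')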

