Moving multiple columns of pandas dataframe to csv - python

I have a dataframe with two columns that I imported using pandas.read_csv. I manipulated the data to compute a third column, and now would like to save all three columns as a .csv file. I have been able to save one column at a time, but am unable to get all three (df.Time, df.Range, and df.Velocity). Here is what I'm working with.
`import pandas as pd
df=pd.read_csv('/Users/path/file.csv', delimiter=',', usecols=['A', 'B'])
df.columns = ['Time', 'Range']
df.Time = df['Time'].round(14)
df.Range = df['Range'].round(14)
df.Velocity = (df.Range.shift(1) - df.Range) / (df.Time.shift(1) -df.Time)
df2 = [df.Time, df.Range, df.Velocity]
df2.to_csv('test5.csv', columns = header)`

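Your assignment makes df2 a list, not a dataframe (df2 = [df.Time, df.Range, df.Velocity]), and a list has no to_csv method. Also, df.Velocity = ... only sets an attribute on the object; to create the column, assign with df['Velocity'] = ... instead.
You probably want:
df[['Time', 'Range', 'Velocity']].to_csv('test5.csv')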

You can select the columns you want and write them out like this:
import pandas as pd
data = pd.read_csv('filename.csv')
data[['column1','column2','column3',...]].to_csv('fileNameWhereYouwantToWrite.csv')
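To make the first question's fix concrete, here is a minimal sketch. The sample values are made up, and an in-memory buffer stands in for 'test5.csv' so the snippet runs without touching the file system. It assumes the third column has to be created with bracket assignment, since attribute assignment does not add a column.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the two columns read from the CSV.
df = pd.DataFrame({"Time": [0.0, 1.0, 2.0], "Range": [0.0, 2.0, 6.0]})

# Bracket assignment creates a real column; `df.Velocity = ...` would only
# set a Python attribute on the DataFrame object.
df["Velocity"] = (df["Range"].shift(1) - df["Range"]) / (
    df["Time"].shift(1) - df["Time"]
)

# Write all three columns. A StringIO buffer is used here instead of a path.
buffer = io.StringIO()
df[["Time", "Range", "Velocity"]].to_csv(buffer, index=False)

# Read the text back to confirm that three columns were written.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
```

With a real file, passing a path such as 'test5.csv' in place of the buffer should behave the same way.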

Related

Filtering a pandas column which contains a list in other column?

I have a pandas dataframe pd like this
https://imgur.com/a/6TM3B3o
I want to filter on the df_participants column using pd['df_pair'][0]. The expected result is a new dataframe pd1 containing only the rows where pd['df_pair'][0] is a subset of df_participants, like this: https://imgur.com/EzCcuh3
I have no idea how to do that. I have tried .isin() and pd1 = pd[pd['df_participants'].str.contains(pd['df_pair'][0])], but neither works. Any ideas?
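I think pd is not a good variable name for a DataFrame, because it shadows the usual alias for the pandas module; it is better to use df:
# remove rows where df_participants is NaN
df = df.dropna(subset=['df_participants'])
# keep rows where the first df_pair value is a subset of df_participants
df = df[df.df_participants.map(set(df['df_pair'][0]).issubset)]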
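Since the original data is only available as screenshots, here is a small sketch of the same subset filter on made-up values. The contents of both columns are assumptions; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data: each cell holds a list, and one participants cell is missing.
df = pd.DataFrame(
    {
        "df_pair": [["a", "b"], ["c", "d"], ["a", "c"]],
        "df_participants": [["a", "b", "c"], ["a", "c"], None],
    }
)

# The pair from the first row, as a set, so that issubset can be used.
target = set(df["df_pair"].iloc[0])

# Drop rows without participants, then keep rows whose list contains the pair.
kept = df.dropna(subset=["df_participants"])
kept = kept[kept["df_participants"].map(target.issubset)]
```

Here set.issubset accepts any iterable, so the list cells do not need to be converted to sets first.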

How to plot similarly named columns using pandas?

I've read in some csv files using pandas. For now it's only two files, but in a few weeks I'll be working with several hundred csv files with the same data variables.
I've used a for loop to read in the files and appended the dataframes to a single list, and then used this for loop to differentiate the names some:
for i, df in enumerate(separate_data, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
My question is this, how can I compare the variables between the files using a bar plot? For example, one of the common variables is temperature, so after differentiating the column names I now have temp_df1 and temp_df2. How would I go about calling all temperature columns to compare them in a bar plot?
I tried using this, but could not get it to work:
for df in separate_data:
    temp_comp = separate_data.plot.bar(y='temp*')
Let's say you have the three dataframes below, each with a temp column. Here is how you iteratively combine the temp columns into a single new dataframe and plot them:
import matplotlib.pyplot as plt
import pandas as pd
df1 = pd.DataFrame({'temp':[100,150,200], 'pressure': [10,20,30]})
df2 = pd.DataFrame({'temp':[50,70,100], 'pressure': [10,25,40]})
df3 = pd.DataFrame({'temp':[110,80,120], 'pressure': [8,20,50]})
df_list = [df1,df2,df3]
df_combined = pd.DataFrame()
for i, df in enumerate(df_list):
    df_combined[f'df{i+1}'] = df['temp']
print('Combined Dataframe\n', df_combined)
df_combined.plot(kind = 'bar')
plt.ylabel('Temp')
plt.show()
#Combined Dataframe
   df1  df2  df3
0  100   50  110
1  150   70   80
2  200  100  120
Note that this assumes that all your dataframes have the same length. If they do not, you can read only the first n (e.g. 50) rows of each file to ensure equal lengths:
df = pd.read_csv('sample.csv', nrows=50)
If you can, I would read into one data frame with an identifier to make the aggregation easier. For example:
import pandas as pd
filenames = ["file_1.csv", "file_2.csv"]
df = pd.concat(
    [
        pd.read_csv(filename).assign(filename=filename.split(".")[0])
        for filename in filenames
    ]
)
df.groupby("filename")["column_to_plot"].mean().plot.bar()
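If the columns have already been renamed as in the question (temp_df1, temp_df2, ...), another possible route is to concatenate the frames side by side and select the temperature columns by pattern with DataFrame.filter. This is a sketch on made-up data; the regex assumes that every temperature column starts with "temp_".

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the csv files.
df1 = pd.DataFrame({"temp": [100, 150, 200], "pressure": [10, 20, 30]})
df2 = pd.DataFrame({"temp": [50, 70, 100], "pressure": [10, 25, 40]})
separate_data = [df1, df2]

# Same renaming step as in the question: temp -> temp_df1, temp_df2, ...
for i, df in enumerate(separate_data, 1):
    df.columns = [f"{col_name}_df{i}" for col_name in df.columns]

# Put the frames side by side, then keep only columns starting with "temp_".
combined = pd.concat(separate_data, axis=1)
temp_only = combined.filter(regex=r"^temp_")
```

Calling temp_only.plot.bar() should then draw one group of bars per row, with one bar per file (this needs matplotlib installed).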

Read from a .csv the first n rows and store the column in to a list

I am having trouble implementing Python code that reads the first n rows from a .csv file and stores the values of the columns in a list. The length of the list has to be 2000, and the list will be used to create a plot.
The columns in the .csv file are not labeled.
You can use pandas to do this:
import pandas as pd
df = pd.read_csv("test.csv", nrows=2000, header=None)  # header=None stops the first row from being read as column names
df_list = df.values.tolist()
This might help; for future reference, see the pandas documentation.
import pandas as pd
df = pd.read_csv(<full_path_to_file>,nrows=2000,sep=<file_seperator>)
col_list = list(df.columns)  # note: this gives the column labels, not the row values
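If what is needed is a flat list of 2000 values from a single column, a sketch along these lines may be closer to the goal. The CSV content is generated in memory as a stand-in for the real file, and picking column 0 is an assumption about which column is to be plotted.

```python
import io

import pandas as pd

# Hypothetical stand-in for an unlabeled CSV file with 3000 rows and two columns.
csv_text = "\n".join(f"{i},{i * 2}" for i in range(3000))

# With header=None the columns are labeled 0, 1, ... by position.
df = pd.read_csv(io.StringIO(csv_text), nrows=2000, header=None)

# Values of the first column as a plain Python list of length 2000.
first_column = df[0].tolist()
```

By contrast, df.values.tolist() returns one inner list per row, which is useful when every column of each row is needed together.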

Need help to solve the Unnamed and to change it in dataframe in pandas

How do I set my indexes from "Unnamed" to the first line of my dataframe in Python?
import pandas as pd
df = pd.read_excel('example.xls','Day_Report',index_col=None ,skip_footer=31 ,index=False)
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
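Both suggestions can be sketched on a small made-up table. The frame below imitates a file whose real header sits in the first data row; the values and the names Time and Range are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical frame where the real header ended up in the first data row.
raw = pd.DataFrame(
    [["Time", "Range"], [0.0, 10.0], [1.0, 12.5]],
    columns=["Unnamed: 0", "Unnamed: 1"],
)

# Option 1: promote the first row to column names, then drop that row.
promoted = raw.copy()
promoted.columns = promoted.iloc[0].values
promoted = promoted.drop(promoted.index[0]).reset_index(drop=True)

# Option 2: when reading a CSV, skip the unwanted first line up front.
csv_text = "Unnamed: 0,Unnamed: 1\nTime,Range\n0.0,10.0\n1.0,12.5\n"
from_csv = pd.read_csv(io.StringIO(csv_text), skiprows=1)
```

One difference worth noting: after option 1 the columns keep the object dtype of the mixed original data, whereas option 2 lets the parser infer numeric dtypes.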

Is it possible to add new columns to DataFrame in Pandas (python)?

Consider the following code:
import datetime
import pandas as pd
import numpy as np
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')
columns = ['A','B', 'C']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, index=index, columns=columns)
df
Here we create an empty DataFrame in Python using Pandas and then fill it to any extent. However, is it possible to add columns dynamically in a similar manner, i.e., for columns = ['A','B', 'C'], it must be possible to add columns D,E,F etc till a specified number.
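The pandas.DataFrame.append method is not quite what you are after: it appends rows rather than columns, and it was removed in pandas 2.0. To add columns, you can use pandas.concat with axis=1, e.g.
output_frame = pd.concat([input_frame, appended_frame], axis=1)
There are additional examples in the pandas "Merge, join, concatenate" documentation.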
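As a rough sketch of adding columns up to a specified number: the fixed start date, the target count n_total, the fill value 0, and the use of consecutive uppercase letters as names are all assumptions chosen for illustration.

```python
import string

import numpy as np
import pandas as pd

# Same shape as the question's frame, with a fixed start date for reproducibility.
index = pd.date_range("2024-01-01", periods=10, freq="D")
data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, index=index, columns=["A", "B", "C"])

# Hypothetical target: grow the frame to n_total columns named by letter.
n_total = 6
new_names = list(string.ascii_uppercase[df.shape[1]:n_total])

# Build the extra columns (filled with 0) and attach them side by side.
extra = pd.DataFrame(0, index=df.index, columns=new_names)
wider = pd.concat([df, extra], axis=1)
```

For a single column, plain assignment such as df['D'] = 0 does the same job; building the extra frame once and concatenating avoids repeated insertions when many columns are added.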
