I have a dataframe that I imported using pandas.read_csv with two columns. I manipulated one column to create a third, and now would like to save all three columns as a .csv file. I have been able to save one column at a time, but am unable to get all three (df.Time, df.Range, and df.Velocity) into one file. Here is what I'm working with.
```python
import pandas as pd
df = pd.read_csv('/Users/path/file.csv', delimiter=',', usecols=['A', 'B'])
df.columns = ['Time', 'Range']
df.Time = df['Time'].round(14)
df.Range = df['Range'].round(14)
df.Velocity = (df.Range.shift(1) - df.Range) / (df.Time.shift(1) - df.Time)
df2 = [df.Time, df.Range, df.Velocity]
df2.to_csv('test5.csv', columns = header)
```
Your assignment makes df2 a list, not a DataFrame (df2 = [df.Time, df.Range, df.Velocity]), and a plain Python list has no to_csv method. Note also that df.Velocity = ... on a column that does not exist yet sets an attribute on the DataFrame rather than creating a column, so use df['Velocity'] = ... instead.

You probably want:

```python
df['Velocity'] = (df.Range.shift(1) - df.Range) / (df.Time.shift(1) - df.Time)
df[['Time', 'Range', 'Velocity']].to_csv('test5.csv')
```
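A minimal sketch of the corrected flow, with made-up Time/Range values standing in for the CSV from the question:

```python
import pandas as pd

# Made-up Time/Range values stand in for the CSV from the question.
df = pd.DataFrame({'Time': [0.0, 1.0, 2.0], 'Range': [10.0, 8.0, 5.0]})

# Create Velocity as a real column, then write all three columns at once.
df['Velocity'] = (df['Range'].shift(1) - df['Range']) / (df['Time'].shift(1) - df['Time'])
df[['Time', 'Range', 'Velocity']].to_csv('test5.csv', index=False)
```

The first Velocity value is NaN because shift(1) has nothing to shift into the first row.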
You can do it like this:

```python
import pandas as pd
data = pd.read_csv('filename.csv')
data[['column1', 'column2', 'column3', ...]].to_csv('fileNameWhereYouwantToWrite.csv')
```
I have a pandas dataframe pd like this: https://imgur.com/a/6TM3B3o

I want to filter the df_participants column against pd['df_pair'][0]; the expected result is a new pd1 containing only the rows where pd['df_pair'][0] is a subset of df_participants, like this: https://imgur.com/EzCcuh3

I have no idea how to do that. I have tried .isin() and pd1 = pd[pd['df_participants'].str.contains(pd['df_pair'][0])], but neither works. Does anyone have an idea?
I think pd is not a good variable name for a DataFrame, since it shadows the pandas module itself; better to use df:

```python
# remove NaN rows
df = df.dropna(subset=['df_participants'])
# keep rows whose participants contain the pair as a subset
df = df[df.df_participants.map(set(df['df_pair'][0]).issubset)]
```
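A minimal sketch with made-up data (the column names come from the question; the list values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: df_participants holds lists of names,
# df_pair holds the pair to test against each row.
df = pd.DataFrame({
    'df_participants': [['a', 'b', 'c'], ['a', 'd'], None, ['a', 'b']],
    'df_pair': [['a', 'b']] * 4,
})

# Drop rows with missing participants, then keep rows where the pair
# is a subset of that row's participants.
df = df.dropna(subset=['df_participants'])
pair = set(df['df_pair'].iloc[0])
result = df[df['df_participants'].map(pair.issubset)]
print(result['df_participants'].tolist())
```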
I've read in some csv files using pandas. For now it's only two files, but in a few weeks I'll be working with several hundred csv files with the same data variables.
I've used a for loop to read in the files and appended the dataframes to a single list, and then used this for loop to differentiate the names some:
```python
for i, df in enumerate(separate_data, 1):
    df.columns = [col_name + '_df{}'.format(i) for col_name in df.columns]
```
My question is this: how can I compare the variables between the files using a bar plot? For example, one common variable is temperature, so after differentiating the column names I now have temp_df1 and temp_df2. How would I go about selecting all the temperature columns to compare them in a bar plot?
I tried using this, but could not get it to work:
```python
for df in separate_data:
    temp_comp = separate_data.plot.bar(y='temp*')
```
Let's say you have the three dataframes below, each with a temp column. Here is how you iteratively combine the temp columns into a single new dataframe and plot them:
```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({'temp': [100, 150, 200], 'pressure': [10, 20, 30]})
df2 = pd.DataFrame({'temp': [50, 70, 100], 'pressure': [10, 25, 40]})
df3 = pd.DataFrame({'temp': [110, 80, 120], 'pressure': [8, 20, 50]})
df_list = [df1, df2, df3]

df_combined = pd.DataFrame()
for i, df in enumerate(df_list):
    df_combined[f'df{i+1}'] = df['temp']
print('Combined Dataframe\n', df_combined)

df_combined.plot(kind='bar')
plt.ylabel('Temp')
plt.show()
```
```
Combined Dataframe
    df1  df2  df3
0  100   50  110
1  150   70   80
2  200  100  120
```
Note that this assumes all your dataframes have the same length. If that is not true, you can read just the first n (e.g. 50) rows from each file to ensure equal lengths with `df = pd.read_csv('sample.csv', nrows=50)`.
If you can, I would read into one data frame with an identifier to make the aggregation easier. For example:
```python
import pandas as pd

filenames = ["file_1.csv", "file_2.csv"]
df = pd.concat(
    [
        pd.read_csv(filename).assign(filename=filename.split(".")[0])
        for filename in filenames
    ]
)
df.groupby("filename")["column_to_plot"].mean().plot.bar()
```
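A sketch of the same concat-and-groupby pattern with in-memory frames instead of files; the 'source' column and the temp values are made up for illustration:

```python
import pandas as pd

# Two frames stand in for the CSV files; 'source' plays the role of
# the filename identifier column added via assign().
frames = {
    'file_1': pd.DataFrame({'temp': [100, 150, 200]}),
    'file_2': pd.DataFrame({'temp': [50, 70, 100]}),
}
df = pd.concat(
    [frame.assign(source=name) for name, frame in frames.items()]
)

# One mean per original file, ready for .plot.bar()
means = df.groupby('source')['temp'].mean()
print(means)
```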
I am facing a problem implementing Python code that reads the first n rows from a .csv file and stores the values of the columns in a list. The length of the list has to be 2000, and the list will be used to create a plot. The columns in the .csv file are not labeled.
You can use pandas to do this:
```python
import pandas as pd

# header=None keeps the first row from being read as column names
df = pd.read_csv("test.csv", nrows=2000, header=None)
df_list = df.values.tolist()
```
This might help, but for future reference, read the pandas documentation:

```python
import pandas as pd

df = pd.read_csv(<full_path_to_file>, nrows=2000, sep=<file_seperator>)
col_list = list(df.columns)
```
How do I set my indexes from "Unnamed" to the first line of my dataframe in Python?
```python
import pandas as pd

df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skip_footer=31, index=False)
df = df.dropna(how='all', axis=1)
df = df.dropna(how='all')
df = df.drop(2)
```
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use

```python
df.columns = df.loc[0, :].values
```

Following that, if you want to drop the first row, you can use

```python
df.drop(0, inplace=True)
```
Edit: as coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
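The first approach can be sketched with a small frame built in memory (the values and column names are made up; the pattern matches the answer above):

```python
import pandas as pd

# A frame whose real column names landed in the first data row,
# as happens when a header row is read as data.
df = pd.DataFrame([['Time', 'Range'], [1, 10], [2, 20]])

# Promote the first row to column names, then drop it.
df.columns = df.loc[0, :].values
df.drop(0, inplace=True)
print(list(df.columns))
```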
Consider the following code:
```python
import datetime
import pandas as pd
import numpy as np

todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(10), periods=10, freq='D')
columns = ['A', 'B', 'C']

df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0)  # with 0s rather than NaNs

data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, index=index, columns=columns)
df
```
Here we create an empty DataFrame using pandas and then fill it to any extent. However, is it possible to add columns dynamically in a similar manner? I.e., for columns = ['A', 'B', 'C'], it should be possible to add columns D, E, F, etc. up to a specified number.
I think the pandas.DataFrame.append method is what you are after, e.g.

```python
output_frame = input_frame.append(appended_frame)
```

(DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pandas.concat is the modern replacement.) There are additional examples in the pandas merge, join and concatenate documentation.
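Since the question asks about columns rather than rows, here is a minimal sketch of adding columns dynamically; the names D, E, F follow the question's pattern:

```python
import pandas as pd

# Start from the question's three columns, filled with zeros.
df = pd.DataFrame(0, index=range(3), columns=['A', 'B', 'C'])

# Assigning to a new label appends a column, so columns can be
# added in a loop up to any specified number.
for name in ['D', 'E', 'F']:
    df[name] = 0
print(list(df.columns))
```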