Here below is the CSV file that I'm working with:
I'm trying to access the enj coin: (United States) column. However, when I print all of the columns of the DataFrame, it doesn't appear to be treated as a column.
Code:
import pandas as pd
df = pd.read_csv("/multiTimeline.csv")
print(df.columns)
I get the following output:
Index(['Category: All categories'], dtype='object')
I've tried accessing the column with df['Category: All categories']['enj coin: (United States)'], but unfortunately it doesn't work.
Question:
Could someone explain how I could transform this DataFrame (which has only one column, Category: All categories) into a DataFrame with two columns, Time and enj coin: (United States)?
Thank you very much for your help
Try using the parameter skiprows=2 when reading in the CSV. I.e.
df = pd.read_csv("/multiTimeline.csv", skiprows=2)
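As a runnable sketch with inline data (assuming the Google Trends export puts the category line and a blank line above the real header):

```python
from io import StringIO

import pandas as pd

# Inline stand-in for multiTimeline.csv: metadata line, blank line, real header
csvdata = StringIO("""Category: All categories

Time,enj coin: (United States)
2019-04-10T19,7
2019-04-10T20,20""")

# Skip the two metadata lines so the third line becomes the header row
df = pd.read_csv(csvdata, skiprows=2)
print(df.columns.tolist())  # ['Time', 'enj coin: (United States)']
```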
The CSV itself looks fine; the extra line at the top is metadata rather than data.
One option is to skip past the complex header at the top:
pd.read_csv(csvdata, header=[1])
The entire header can be taken in as well, even though it is not delimited the way the data is.
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas versions
print(pd.__version__)
csvdata = StringIO("""Category: All categories
Time,enj coin: (United States)
2019-04-10T19,7
2019-04-10T20,20""")
df = pd.read_csv(csvdata, header=[0,1])
print(df)
0.24.2
              Category: All categories
             enj coin: (United States)
Time
2019-04-10T19                        7
2019-04-10T20                       20
Related
Well, I am trying to create a DataFrame in pandas by reading a CSV file and print it, however it is not displayed properly.
This is the code:
import pandas as pd
df = pd.read_csv("weather.csv")
print(df)
And this is my output:
What can I do?
A sample of weather.csv would help but I believe that this will solve the issue:
import pandas as pd
df = pd.read_csv("weather.csv", sep=';')
print(df)
Next time, try to provide your data as text. You need to change the separator; the default is ','. So try this:
df = pd.read_csv('weather.csv', sep=';')
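A minimal sketch with hypothetical inline data (the real weather.csv column names are unknown):

```python
from io import StringIO

import pandas as pd

# Hypothetical semicolon-separated sample standing in for weather.csv
csvdata = StringIO("""day;temperature;rain
2021-01-01;3.5;0.0
2021-01-02;1.2;2.4""")

# With sep=';' each field lands in its own column
# instead of one wide string column
df = pd.read_csv(csvdata, sep=';')
print(df.shape)  # (2, 3)
```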
I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified DataFrame into data.csv. How would I be able to get the expected output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
That unnamed column is the index. Use index=False in to_csv:
data.to_csv('data.csv', index=False)
Alternatively, set the first column as the index when reading, df = pd.read_csv('data.csv', index_col=0), and set index=False when writing the results.
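A sketch of that round trip on a shortened version of the sample data:

```python
from io import StringIO

import pandas as pd

# First rows of the sample data.csv, with the unnamed leading column
csvdata = StringIO(""",Unix Timestamp,Date,Symbol
0,1635686220,2021-10-31 13:17:00,BTCUSD
1,1635686160,2021-10-31 13:16:00,BTCUSD""")

# index_col=0 absorbs the unnamed column into the index ...
data = pd.read_csv(csvdata, index_col=0)

# ... and index=False keeps that index out of the output
out = data.to_csv(index=False)
print(out.splitlines()[0])  # Unix Timestamp,Date,Symbol
```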
You can follow the code below. It takes the columns from the second position onward, and then you can save that df to CSV without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)
I have a CSV of data I've loaded into a dataframe that I'm trying to massage: I want to create a new column that contains the difference from one record to another, grouped by another field.
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
all_counties = pd.read_csv(url, dtype={"fips": str})
all_counties.date = pd.to_datetime(all_counties.date)
oregon = all_counties.loc[all_counties['state'] == 'Oregon']
oregon.set_index('date', inplace=True)
oregon.sort_values('county', inplace=True)
# This is not working; I was hoping to find the differences from one day to another on a per-county basis
oregon['delta'] = oregon.groupby(['state','county'])['cases'].shift(1, fill_value=0)
oregon.tail()
Unfortunately, I'm getting results where the delta is always the same as the cases.
I'm new at Pandas and relatively inexperienced with Python, so bonus points if you can point me towards how to best read the documentation.
Let's try:
oregon['delta']=oregon.groupby(['state','county'])['cases'].diff().fillna(0)
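To see why diff() fixes it where shift() did not, here is a toy frame with made-up numbers:

```python
import pandas as pd

# Hypothetical cumulative case counts for two counties
df = pd.DataFrame({
    'county': ['A', 'A', 'A', 'B', 'B'],
    'cases':  [1, 3, 6, 2, 5],
})

# shift() merely copies the previous row's total within each group;
# diff() subtracts it, giving the per-day change
df['delta'] = df.groupby('county')['cases'].diff().fillna(0)
print(df['delta'].tolist())  # [0.0, 2.0, 3.0, 0.0, 3.0]
```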
I want to filter a rather large Pandas dataframe (about 3 million rows) by date.
For some reason the drop method when used with boolean criteria does not work at all. It just returns the same old dataframe. Dropping single rows is no problem though.
This is the code is used initially, which essentially does nothing at all:
import pandas as pd
from datetime import datetime
#open the file
df = pd.read_csv('examplepath/examplefile.csv', names=['File Name','FileSize','File Type','Date Created','Date Last Accessed','Date Last Modified','Path'],\
delimiter=';', header=None, encoding="ISO-8859-1",)
#convert to german style date
df['Date Created'] = pd.to_datetime(df['Date Created'], dayfirst=True)
#drop rows and assign new dataframe
df_filtered = df.drop(df[df['Date Created'] > datetime(2010,1,1)])
I then came up with this code, which seemingly works like a charm:
import pandas as pd
from datetime import datetime
#open the file
df = pd.read_csv('examplepath/examplefile.csv', names=['File Name','FileSize','File Type','Date Created','Date Last Accessed','Date Last Modified','Path'],\
delimiter=';', header=None, encoding="ISO-8859-1",)
#convert to german style date
df['Date Created'] = pd.to_datetime(df['Date Created'], dayfirst=True)
#select rows and assign new dataframe
df_filtered = df[df['Date Created'] < datetime(2010,1,1)]
Both codes in theory should do the same thing, right?
Is one of the codes to be preferred? Can I just work with my second code? In the future I may have to add a second filterdate.
I hope someone can help me.
Thanks and best regards,
Stefan
You've got to give drop a list of index labels or column names to drop rows or columns, respectively.
Read the docs and the examples given there.
Your second approach works because that is the way you filter a dataframe.
You may use it at will.
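To make drop work with a date condition, pass it the index of the rows to remove; this is a sketch on hypothetical dates:

```python
from datetime import datetime

import pandas as pd

# Hypothetical file-listing frame with only the relevant column
df = pd.DataFrame({
    'Date Created': pd.to_datetime(['2009-05-01', '2011-06-15', '2008-12-31'])
})

# drop() wants index labels, so take .index of the boolean selection ...
dropped = df.drop(df[df['Date Created'] > datetime(2010, 1, 1)].index)

# ... which keeps the same rows as the plain boolean filter
filtered = df[df['Date Created'] <= datetime(2010, 1, 1)]
print(len(dropped), len(filtered))  # 2 2
```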
I am trying to drop some useless columns in a dataframe but I am getting the error: "too many indices for array"
Here is my code :
import pandas as pd
def answer_one():
    energy = pd.read_excel("Energy Indicators.xls")
    energy.drop(energy.index[0,1], axis = 1)
answer_one()
Option 1
Your syntax is wrong when slicing: it should be the columns, not the index.
import pandas as pd
energy = pd.read_excel("Energy Indicators.xls")
energy.drop(energy.columns[[0,1]], axis=1)
Option 2
I'd do it like this
import pandas as pd
energy = pd.read_excel("Energy Indicators.xls")
energy.iloc[:, 2:]
I think it's better to skip unneeded columns when parsing/reading Excel file:
energy = pd.read_excel("Energy Indicators.xls", usecols='C:ZZ')
If you're trying to drop columns, you need to change the syntax. You can refer to them by name or by index. Here is how you would refer to them by name.
import pandas as pd
energy = pd.read_excel("Energy Indicators.xls")
energy.drop(['first_column', 'second_column'], axis=1, inplace=True)
Another solution would be to exclude them in the first place. Note that usecols does not accept slice syntax like [2:], but it does accept a callable on the column name (placeholder names again):
energy = pd.read_excel("Energy Indicators.xls", usecols=lambda name: name not in ('first_column', 'second_column'))
This will help speed up the import as well.
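The same usecols idea can be sketched with read_csv and hypothetical column names, since it also accepts a callable that is applied to each column name:

```python
from io import StringIO

import pandas as pd

# Hypothetical sheet with two junk columns in front
csvdata = StringIO("""junk1,junk2,Country,Energy Supply
x,y,Brazil,100
x,y,Chile,50""")

# Keep only columns whose name does not start with 'junk'
df = pd.read_csv(csvdata, usecols=lambda name: not name.startswith('junk'))
print(df.columns.tolist())  # ['Country', 'Energy Supply']
```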