I am trying to create a DataFrame in pandas by reading a CSV file and then print it, but it is not displayed nicely.
This is the code:
import pandas as pd
df = pd.read_csv("weather.csv")
print(df)
And this is my output:
What can I do?
A sample of weather.csv would help, but I believe this will solve the issue:
import pandas as pd
df = pd.read_csv("weather.csv", sep=';')
print(df)
Next time, try to provide your data as text. You need to change the separator; the default is ','. So try this:
df = pd.read_csv('weather.csv', sep=';')
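If you are not sure which delimiter a file uses, pandas can also sniff it for you. A minimal sketch, assuming weather.csv sits in the working directory:
import pandas as pd
# Let pandas detect the delimiter via csv.Sniffer (requires the python engine).
df = pd.read_csv("weather.csv", sep=None, engine="python")
# Quick sanity check that the columns were split correctly.
print(df.columns)
print(df.head())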
I have a problem with our code for comparing every two rows between different Excel sheets:
We have code to compare every single row:
import pandas as pd
import numpy as np
old_df = pd.read_excel('Test.xlsx', sheet_name="Best Practice Config", names="A", header=None)
new_df = pd.read_excel('Test.xlsx',sheet_name="Existing Config", names="B", header=None)
compare = old_df[~old_df["A"].isin(new_df["B"])]
but I need to compare two rows at a time. Please advise on the best way to do that in pandas.
Try the pandas.DataFrame.compare method. The documentation is available here.
old_df.compare(new_df)
I hope it will be useful for you.
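For illustration, here is a minimal sketch with made-up data. compare requires pandas 1.1+ and two identically labelled DataFrames, so with the names="A" / names="B" arguments from your snippet you would have to rename one column first so the labels match.
import pandas as pd
# Hypothetical stand-ins for the two Excel sheets, with matching column names.
old_df = pd.DataFrame({"A": ["hostname sw1", "ip route 0.0.0.0", "ntp server 1.1.1.1"]})
new_df = pd.DataFrame({"A": ["hostname sw1", "ip route 0.0.0.0", "ntp server 2.2.2.2"]})
# Requires pandas >= 1.1; both frames must have identical row/column labels.
diff = old_df.compare(new_df)
print(diff)  # shows 'self' vs 'other' values only for the rows that differ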
I have just started to learn to use Jupyter notebook. I have a data file called 'Diseases'.
Opening data file
import pandas as pd
df = pd.read_csv('Diseases.csv')
Choosing data from a column named 'DIABETES', i.e. choosing subject IDs that have diabetes (yes is 1 and no is 0):
df[df.DIABETES >1]
Now I want to export this cleaned data (that has fewer rows)
df.to_csv('diabetes-filtered.csv')
This exports the original data file, not the filtered df with fewer rows.
I saw in another question that the inplace argument needs to be used. But I don't know how.
You forgot to assign the filtered DataFrame back, here to df1:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df1 = df[df.DIABETES >1]
df1.to_csv('diabetes-filtered.csv')
Or you can chain filtering and exporting to file:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df[df.DIABETES >1].to_csv('diabetes-filtered.csv')
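If you do not want the row index written as an extra first column of the output file, you can also pass index=False to to_csv; a small variation of the same sketch:
import pandas as pd
df = pd.read_csv('Diseases.csv')
# Write only the filtered rows, without the DataFrame index column.
df[df.DIABETES >1].to_csv('diabetes-filtered.csv', index=False)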
I have attached a screenshot of my Excel sheet. I want to store the length of every string in the SUPPLIER_ID LENGTH column, but when I run my code, the CSV columns are blank.
And when I use the same code on a different CSV, it works fine.
I am using the following code but am not able to print the data.
I have attached a snippet of the CSV. Can somebody tell me why this is happening?
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why you created the DataFrame df; it was unnecessary in my opinion.
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
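As a self-contained illustration (with made-up supplier IDs, since the real nba.csv is not shown), working on an explicit copy and then assigning the new column in the usual way avoids the warning entirely:
import pandas as pd
# Hypothetical stand-in for the real file.
data = pd.DataFrame({"SUPPLIER_ID": [1001, 52, 730], "ACTION": ["ADD", "ADD", "REMOVE"]})
# Work on an explicit copy so later assignments are unambiguous.
data = data.dropna().copy()
data['SUPPLIER_ID'] = data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH'] = data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID'] = data['SUPPLIER_ID'].astype(float)
print(data)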
Below is the CSV file that I'm working with:
I'm trying to get my hands on the enj coin: (United States) column. However, when I print all of the columns of the DataFrame, it doesn't appear to be treated as a column.
Code:
import pandas as pd
df = pd.read_csv("/multiTimeline.csv")
print(df.columns)
I get the following output:
Index(['Category: All categories'], dtype='object')
I've tried accessing the column with df['Category: All categories']['enj coin: (United States)'] but sadly it doesn't work.
Question:
Could someone explain how I could transform this DataFrame (which has only one column, Category: All categories) into a DataFrame with two columns, Time and enj coin: (United States)?
Thank you very much for your help.
Try using the parameter skiprows=2 when reading in the CSV, i.e.
df = pd.read_csv("/multiTimeline.csv", skiprows=2)
The CSV looks good. Ignore the complex header at the top:
pd.read_csv(csvdata, header=[1])
The entire header can be taken in as well, although it is not delimited the way the data is.
import pandas as pd
from pandas.compat import StringIO  # removed in newer pandas; use: from io import StringIO
print(pd.__version__)
csvdata = StringIO("""Category: All categories
Time,enj coin: (United States)
2019-04-10T19,7
2019-04-10T20,20""")
df = pd.read_csv(csvdata, header=[0,1])
print(df)
0.24.2
              Category: All categories
             enj coin: (United States)
Time
2019-04-10T19                        7
2019-04-10T20                       20
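To end up with the plain two-column frame the question asks for (Time and enj coin: (United States)), the single-row header variant is the simplest; a minimal sketch reusing the same sample data with a current pandas import:
from io import StringIO
import pandas as pd
csvdata = StringIO("""Category: All categories
Time,enj coin: (United States)
2019-04-10T19,7
2019-04-10T20,20""")
# Treat the second line as the header; everything above it is ignored.
df = pd.read_csv(csvdata, header=1)
print(df.columns)                      # Index(['Time', 'enj coin: (United States)'], dtype='object')
print(df['enj coin: (United States)'])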
Using Python 3.6 and pandas 0.19.2: how do you read in an Excel file and convert a column to datetime directly from read_excel? This is similar to this question about converters and dtypes, but I want to read in a certain column as datetime.
I want to change this:
import pandas as pd
import datetime
import numpy as np
file = 'PATH_HERE'
df1 = pd.read_excel(file)
df1['COLUMN'] = pd.to_datetime(df1['COLUMN']) # <--- Line to get rid of
into something like:
df1 = pd.read_excel(file, dtypes= {'COLUMN': datetime})
The code does not error, but in my example COLUMN still has a dtype of int64 after calling print(df1['COLUMN'].dtype).
I have tried using np.datetime64 instead of datetime. I have also tried using converters= instead of dtypes=, but to no avail. This may be nit-picky, but it would be a nice feature to have in my code.
Typically, reading Excel sheets will use the dtypes defined in the sheet, but you cannot specify the dtypes as you can in read_csv, for example. You can provide a converters arg, to which you can pass a dict mapping each column to a function that is called to convert it:
df1 = pd.read_excel(file, converters= {'COLUMN': pd.to_datetime})
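A quick way to confirm the conversion took effect, reusing the placeholder file and column names from the question:
import pandas as pd
file = 'PATH_HERE'  # placeholder path from the question
# pd.to_datetime is applied to the column while the file is being read.
df1 = pd.read_excel(file, converters={'COLUMN': pd.to_datetime})
print(df1['COLUMN'].dtype)  # expected: datetime64[ns]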
Another way to read in an Excel file and convert a column to datetime directly from read_excel is as follows:
import pandas as pd
file = 'PATH_HERE'
df1 = pd.read_excel(file, parse_dates=['COLUMN'])
For reference, I am using Python 3.8.3.
read_excel supports dtype, just as read_csv does, as of this writing:
import datetime
import pandas as pd
xlsx = pd.ExcelFile('path...')
df = pd.read_excel(xlsx, dtype={'column_name': datetime.datetime})
https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html