Complete noob here. I've been trying the following instructions to access a data set on Kaggle and to read the first 5 rows.
https://towardsdatascience.com/simple-and-multiple-linear-regression-with-python-c9ab422ec29c
I'm using Spyder, and when I run the following code, the only thing I get in the console is a runfile wdir= line.
Following is the Code:
import pandas as pd
df=pd.read_csv('weight-height.csv')
df.head(5)
Output: [screenshot of the code and console output]
The Medium post is probably using a Jupyter notebook, which takes the value of the last line in a cell and displays it as formatted output below the cell, with no print needed. In a regular Python script, IDLE, or another IDE, you need to call the print function explicitly to send output to the terminal/console.
import pandas as pd
df = pd.read_csv('weight-height.csv')
print(df.head(5))
I am new to data, so after a few lessons on importing data in Python, I tried the following code in my Jupyter notebook but keep getting an error saying df is not defined. I need help.
The code I wrote is as follows;
import pandas as pd
url = "https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv"
df = pd.read_csv(https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv)
After running the third line, I got a series of messages in the Jupyter notebook, but the one that stood out was "df not defined".
The problem here is that your data is a ZIP file containing multiple CSV files. You need to download the data, unpack the ZIP file, and then read one CSV file at a time.
If you can give more details on the problem (e.g. screenshots), debugging will be easier.
One possibility for the error is that the response returned by the URL (https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv) is a ZIP file, which may prevent pandas from processing it further.
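The download, unpack, and read steps described above could be sketched roughly as follows, using only the standard library plus pandas. The helper name read_csvs_from_zip and the in-memory demo archive are made up for illustration. The real World Bank archive may need extra read_csv options that are not shown here, so treat this as a starting point rather than a finished solution.

```python
import io
import zipfile

import pandas as pd


def read_csvs_from_zip(zip_bytes, **read_csv_kwargs):
    """Return {member_name: DataFrame} for every CSV inside a ZIP held in memory.

    Hypothetical helper, not part of pandas.
    """
    frames = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # Read one CSV member at a time, as the answer suggests.
                with archive.open(name) as handle:
                    frames[name] = pd.read_csv(handle, **read_csv_kwargs)
    return frames


def make_demo_zip():
    """Build a small ZIP with two made-up CSV files so the sketch runs offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.csv", "country,value\nA,1\nB,2\n")
        archive.writestr("meta.csv", "country,region\nA,X\nB,Y\n")
    return buffer.getvalue()


demo_frames = read_csvs_from_zip(make_demo_zip())
print(sorted(demo_frames))

# With network access, the bytes could come from the URL in the question, e.g.:
#   import urllib.request
#   zip_bytes = urllib.request.urlopen(url).read()
#   frames = read_csvs_from_zip(zip_bytes)
```

Returning a dictionary keyed by member name makes it easy to inspect which CSV files the archive contains before choosing one to work with.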
I'm new to this site, so please be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter notebook for a student project. I want to use data from specific columns of this file. The problem is that after import, the file appears as a table with 5286 rows (which represent the dates and hours of the measurements) in a single column (which contains all the variables I want to use, separated by ;).
I don't know how to turn this into a regular table.
I used this code to import my csv from my board :
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Output:
Desired output: the same data in multiple columns, separated on ;.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
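To see why sep=';' matters, here is a small self-contained sketch. The column names and values are made up for illustration and are not taken from the asker's weather file.

```python
import io

import pandas as pd

# Made-up sample mimicking a semicolon-separated export.
raw = (
    "date;temperature;humidity\n"
    "1998-01-01 00:00;3.5;81\n"
    "1998-01-01 01:00;3.1;83\n"
)

# Default separator is a comma, so each line ends up in a single column.
single_column = pd.read_csv(io.StringIO(raw))

# Telling pandas the separator is ';' splits the line into proper columns.
split_columns = pd.read_csv(io.StringIO(raw), sep=";")

print(single_column.shape)
print(split_columns.shape)
```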
I'm trying to capture AND present data in a table format after the script is finished. The website I am using is http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records and the workflow is as follows:
I run the command, it opens to the URL
I then go to the URL http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records
I proceed to copy any selected rows/columns from the Table/chart
I then go back to my IDE (Jupyter Notebook) and it takes the captured data and spits it out
I can select the data on that particular webpage and copy it using my cursor by highlighting and selecting “copy”. It will then spit out all that I have selected and copied to my clipboard.
So far, the script I wrote only captures the data and spits it back out as is (unformatted).
PROBLEM: I would like the data I captured to be presented in a table format once I have finished selecting it and copied it to my clipboard.
I realize I probably need to write the logic that formats the captured data. What would be the best approach for accomplishing this?
Here is the code I have written so far:
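import webbrowser
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
website='http://en.wikipedia.org/wiki/NFL_win_loss_records'
# open the page in the default browser so the table can be copied by hand
webbrowser.open(website)
# read the tab-separated text currently on the clipboard
nfl_frame = pd.read_clipboard(sep='\t')
nfl_frame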
You can read your data directly into a DataFrame with pandas.read_html:
import pandas as pd
WIKI_URL = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
df = pd.read_html(WIKI_URL,header=0)[1]
df.head() # in jupyter or print(df.head()) to show a table with first 5 rows
pd.read_html returns a list of DataFrames, one for each table found at that HTML/URL. I set the header to the first row and selected the second element of the list, which is the table you are looking for.
I'm an entry-level Python user who just started teaching myself to use Python for data analytics. These days I'm practicing with a Global Suicide Rate data set in a Jupyter Notebook on Kaggle. I ran into some problems formatting my result. I'm wondering how to turn a result that comes as several lists into a well-formatted table.
The data set I'm using is the Global Suicide Rate data. For the following section of the code, I want to retrieve all countries present in min_year (which is 1985) and in max_year (which is 2016).
So what I expected as my output is something like this: (just an example)
Following is my code.
country_1985 = data [(data['year']==min_year)].country.unique()
country_2016 = data [(data['year']==max_year)].country.unique()
print ([country_1985],[country_2016])
The result shows like this:
However, I don't want those in a list. I'd like it to be shown in a table format something like this:
I tried using pandas.DataFrame, but the result didn't make sense either... Could anyone help me solve my problem?
Updated:
Thanks to @Code Pope for the code! Thank you for your explanation and patience!
import pandas as pd
import numpy as np
country_1985 = data [(data['year']==min_year)].country.unique()
country_2016 = data [(data['year']==max_year)].country.unique()
country_1985 = pd.DataFrame(country_1985.categories)
country_2016 = pd.DataFrame(country_2016.categories)
# The following code is from @Code Pope
from IPython.display import display_html
def display_side_by_side(dataframe1, dataframe2):
    modified_HTML=dataframe1.to_html() + dataframe2.to_html()
    display_html(modified_HTML.replace('table','table style="display:inline"'),raw=True)
display_side_by_side(country_1985,country_2016 )
Then it looks like this:
Updated Output
As you are saying that you are using Jupyter Notebook, you can change the html of your dataframes before displaying it. Use the following function:
from IPython.display import display_html
def display_side_by_side(dataframe1, dataframe2):
    modified_HTML=dataframe1.to_html() + dataframe2.to_html()
    display_html(modified_HTML.replace('table','table style="display:inline"'),raw=True)
# then call the function with your two dataframes
display_side_by_side(country_1985,country_2016 )
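If a single table is acceptable instead of two tables rendered next to each other, a different approach (not the one from the answer above) is to put both country lists into one DataFrame with pd.concat. This is only a sketch: the sample rows and the column names countries_1985 and countries_2016 are made up, and the shorter column is padded with NaN.

```python
import pandas as pd

# Made-up stand-in for the Kaggle data set: only 'country' and 'year' matter here.
data = pd.DataFrame(
    {
        "country": ["Austria", "Brazil", "Austria", "Brazil", "Cyprus"],
        "year": [1985, 1985, 2016, 2016, 2016],
    }
)
min_year = data["year"].min()
max_year = data["year"].max()

# One named Series of unique countries per year.
countries_min = pd.Series(
    data.loc[data["year"] == min_year, "country"].unique(),
    name=f"countries_{min_year}",
)
countries_max = pd.Series(
    data.loc[data["year"] == max_year, "country"].unique(),
    name=f"countries_{max_year}",
)

# Align the two Series by position; missing entries become NaN.
country_table = pd.concat([countries_min, countries_max], axis=1)
print(country_table)
```

Because the result is an ordinary DataFrame, it renders as a table in Jupyter and can also be printed in a plain console.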
I have a data set taken from kaggle, and I want to get the result shown here
So, I took that code, changed it a bit and what I ran is this:
# get titanic & test csv files as a DataFrame
titanic_df = pd.read_csv("./input/train.csv")
test_df = pd.read_csv("./input/test.csv")
# preview the data
print titanic_df.head()
This works, as it outputs the right data, but not as neatly as in the tutorial... Can I make it right?
Here is my output (Python 2, Spyder):
Try using a Jupyter notebook if you have not used one before. The IPython console wraps wide output and shows it across multiple lines. What you are seeing on Kaggle is itself a Jupyter notebook.
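If switching to a notebook is not an option, the wrapping in a plain console can usually be reduced with pandas' own display settings. The sketch below is written for Python 3 and uses a made-up wide DataFrame rather than the Titanic data. Whether the result looks tidy still depends on how wide the console window is.

```python
import pandas as pd

# Made-up wide frame: 30 columns is enough to trigger wrapping at default settings.
wide_df = pd.DataFrame({f"col_{i}": list(range(3)) for i in range(30)})

# Option 1: to_string() puts every column on one line per row by default.
one_line_per_row = wide_df.head().to_string()
print(one_line_per_row)

# Option 2: temporarily raise the display limits used when printing a DataFrame.
with pd.option_context("display.width", 1000, "display.max_columns", None):
    widened_repr = repr(wide_df.head())
    print(widened_repr)
```

Using option_context keeps the change local to the with block, so other output in the session keeps the default settings.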