Web Scraping Data Visualization - python

I'm trying to capture data AND present it in a table format after the script finishes. The website I am using is http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records and the logic works as follows:
I run the script, and it opens the URL in my browser
I go to http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records
I select any rows/columns of the table with my cursor, highlight them, and select "copy" so they land on my clipboard
I go back to my IDE (Jupyter Notebook), and the script takes the captured data and spits it out
So far, the script I wrote only captures the data and spits it back out as-is (unformatted).
PROBLEM: I would like the captured data to be presented in a table format after I have finished selecting it and copying it to my clipboard.
I realize I probably need to write logic to format the captured data. What would be the best approach for accomplishing this?
Below is my code that I have written so far:
import webbrowser
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

website = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
webbrowser.open(website)
nfl_frame = pd.read_clipboard(sep='\t')
nfl_frame

You can read your data directly into a DataFrame with pandas.read_html:
import pandas as pd
WIKI_URL = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
df = pd.read_html(WIKI_URL,header=0)[1]
df.head() # in jupyter or print(df.head()) to show a table with first 5 rows
pd.read_html returns a list of every table found in that HTML page. I set header to the first row, and selected the second element of the list, which is the table you are looking for.
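If you want to keep the clipboard workflow instead, note that pd.read_clipboard uses the same parser as pd.read_csv, so the formatting step can be sketched with an in-memory buffer standing in for the clipboard (the team rows below are made-up sample data, not the real Wikipedia figures):

```python
import io
import pandas as pd

# Table cells copied from a browser are typically tab-separated,
# one row per line. This hypothetical string stands in for the clipboard.
clipboard_text = "Team\tWon\tLost\nDallas Cowboys\t550\t420\nChicago Bears\t790\t630\n"

# pd.read_clipboard(sep='\t') does exactly this under the hood:
nfl_frame = pd.read_csv(io.StringIO(clipboard_text), sep="\t")
print(nfl_frame)
```

Once the copied text is parsed this way, displaying nfl_frame in Jupyter already renders it as a formatted table.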

Related

Problem with csv data imported on jupyter notebook

I'm new on this site, so be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter Notebook for a student project. I want to use data from a specific column of this file. The problem is that after the import, the file appears as a table of 5286 lines (which represent dates and hours of measurements) in a single column that contains all of the variables I want to use, separated by ;.
I don't know how to turn this into a regular table.
I used this code to import my CSV:
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Output: all of the fields end up in a single column, joined by ;.
Desired output: the same data in multiple columns, split on ;.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
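To illustrate what sep=';' changes, here is a minimal sketch using an in-memory buffer in place of the file (the column names and values are invented, since the actual file isn't shown):

```python
import io
import pandas as pd

# Hypothetical sample mimicking a ';'-separated weather file.
raw = "date;temperature;humidity\n1998-01-01 00:00;2.5;81\n1998-01-01 01:00;2.1;83\n"

# Without sep=';' pandas assumes commas and puts each whole line in
# one column; with it, each line is split into proper columns.
data = pd.read_csv(io.StringIO(raw), sep=";")
print(data.columns.tolist())
```

After this, individual columns can be accessed normally, e.g. data['temperature'].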

How do you read rows from a csv file and store it in an array using Python codes?

I have a CSV file, diseases_matrix_KNN.csv, which was exported from an Excel table.
Now, I would like to store all the numbers from a row like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this, even though I have looked. Please let me know how I can read this type of data into the chosen form using Python.
The most common way to work with tabular data like this is to use pandas. Since the file is a CSV, read it with pd.read_csv rather than pd.read_excel, and select the row by its label with .loc (not .iloc, which takes integer positions).
Here is an example:
import pandas as pd
df = pd.read_csv('diseases_matrix_KNN.csv', index_col=0)
print(df.loc['Hypothermia'].tolist())  # gives you the row as a plain list
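A self-contained sketch of that row extraction, assuming the disease names sit in the first column (the file contents below are a made-up stand-in for diseases_matrix_KNN.csv):

```python
import io
import pandas as pd

# Hypothetical stand-in: first column holds the disease name,
# remaining columns hold the matrix values.
raw = "disease,f1,f2,f3\nHypothermia,0,-1,0\nFever,1,0,0\n"

# index_col=0 makes the disease names the row labels.
df = pd.read_csv(io.StringIO(raw), index_col=0)

# .loc selects a row by label; .tolist() turns it into a Python list.
hypothermia = df.loc["Hypothermia"].tolist()
print(hypothermia)
```

The same pattern yields the full 14-element list from the real file.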

What is the correct way to convert json data (which is undefined/messy) into a DataFrame?

I am trying to understand how JSON data which is not parsed/extracted correctly can be converted into a (Pandas) DataFrame.
I am using Python (3.7.1) and have tried the usual way of reading the JSON data. The code runs if I use transpose or the axis=1 syntax, but doing that completely ignores a large number of values and variables in the data, so I am fairly sure the code is running without giving the desired results.
import pandas as pd
import numpy as np
import csv
import json
sourcefile = open(r"C:\Users\jadil\Downloads\chicago-red-light-and-speed-camera-data\socrata_metadata_red-light-camera-violations.json")
json_data = json.load(sourcefile)
#print(json_data)
type(json_data)
dict
## this code works but is not loading/reading complete data
df = pd.DataFrame.from_dict(json_data, orient="index")
df.head(15)
#This is what I am getting for the first 15 rows
0
createdAt 1407456580
description This dataset reflects the daily volume of viol...
rights [read]
flags [default, restorable, restorePossibleForType]
id spqx-js37
oid 24980316
owner {'type': 'interactive', 'profileImageUrlLarge'...
newBackend False
totalTimesRated 0
attributionLink http://www.cityofchicago.org
hideFromCatalog False
columns [{'description': 'Intersection of the location...
displayType table
indexUpdatedAt 1553164745
rowsUpdatedBy n9j5-zh
As you have seen, pandas will attempt to create a DataFrame out of JSON data even if it is not parsed or extracted correctly. If your goal is to understand exactly what pandas does when presented with a messy JSON file, you can look inside the code for pd.DataFrame.from_dict() to learn more. If your goal is to get the JSON data to convert correctly to a pandas DataFrame, you will need to provide more information about the JSON data, ideally by providing a sample of the data as text in your question. If your data is sufficiently complicated, you might try the json_normalize() function as described here.
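As a sketch of the json_normalize approach, with made-up nested data standing in for the actual metadata file (which isn't shown in the question):

```python
import pandas as pd

# Hypothetical nested JSON resembling dataset metadata.
json_data = {
    "id": "spqx-js37",
    "owner": {"type": "interactive", "displayName": "City of Chicago"},
    "columns": [{"name": "intersection"}, {"name": "violations"}],
}

# json_normalize flattens nested dicts into dotted column names
# (e.g. 'owner.type'), instead of leaving raw dicts in the cells.
df = pd.json_normalize(json_data)
print(df.columns.tolist())
```

Lists of records (like the "columns" key here) stay as object columns; you can normalize them separately with record_path if needed.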

Import data tables in Python

I am new to Python, coming from MATLAB. In MATLAB, I used to create a variable table (copied from Excel into MATLAB), save it as a .mat file, and whenever I needed the data, import it using:
A = importdata('Filename.mat');
[Filename is a 38x5 table, see the attached photo]
Is there a way I can do this in Python? I have to work with about 35 such tables, and loading them from Excel every time is not the best way.
In order to import Excel tables into your Python environment, you have to install pandas.
Check out the detailed guideline.
import pandas as pd
xl = pd.ExcelFile('myFile.xlsx')
df = xl.parse(xl.sheet_names[0])  # read the first sheet into a DataFrame
I hope this helps.
Use pandas:
import pandas as pd
dataframe = pd.read_csv("your_data.csv")
dataframe.head() # prints out first rows of your data
Or from Excel:
dataframe = pd.read_excel('your_excel_sheet.xlsx')
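If what you miss is the MATLAB save-once/load-fast .mat workflow specifically, pandas can serialize a DataFrame to disk and reload it on later runs without re-parsing Excel. A minimal sketch (the table contents and file name are invented):

```python
import os
import tempfile

import pandas as pd

# A small stand-in for one of the 38x5 tables.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Save once (analogous to writing a .mat file)...
path = os.path.join(tempfile.gettempdir(), "Filename.pkl")
df.to_pickle(path)

# ...then reload instantly on later runs, no Excel parsing needed.
restored = pd.read_pickle(path)
print(restored.equals(df))
```

For ~35 tables, doing the Excel read once and keeping pickles around makes each subsequent load nearly instantaneous.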

How to Read a WebPage with Python and write to a flat file?

Very novice at Python here.
Trying to read the table presented at this page (with the current filters set as-is) and then write it to a CSV file.
http://www65.myfantasyleague.com/2017/options?L=47579&O=243&TEAM=DAL&POS=RB
I tried the approach below. It creates the CSV file but does not fill it with the actual table contents.
Appreciate any help in advance. Thanks.
import requests
import pandas as pd
url = 'http://www65.myfantasyleague.com/2017/optionsL=47579&O=243&TEAM=DAL&POS=RB'
csv_file='DAL.RB.csv'
pd.read_html(requests.get(url).content)[-1].to_csv(csv_file)
Generally, try to describe your problem more precisely, try to debug, and don't put everything in one line. With that said, your specific problems here were the table index and the missing ? in the URL (after options):
import requests
import pandas as pd
url = 'http://www65.myfantasyleague.com/2017/options?L=47579&O=243&TEAM=DAL&POS=RB'
# -^-
csv_file='DAL.RB.csv'
pd.read_html(requests.get(url).content)[1].to_csv(csv_file)
# -^-
This yields a CSV file with the table in it.
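The final write step can be sketched independently of the live URL; here a locally built DataFrame (with invented player data) stands in for the scraped table, and a string buffer shows exactly what to_csv would put into DAL.RB.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for one table scraped from the page.
table = pd.DataFrame({"Player": ["E. Elliott", "D. McFadden"], "Pts": [210, 95]})

# to_csv writes any DataFrame to a flat file; writing to a buffer
# here makes the resulting CSV text visible.
buf = io.StringIO()
table.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Passing index=False drops the extra unnamed index column that to_csv writes by default.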
