How to load and plot csv file using altair package in python? - python

I have a csv file which has 200000 lines and I would like to plot the data files using altair packagae. Documentation states that for large files, data needs to be passed as URL. This is what I have till now.
import altair
alt.data_transformers.enable('csv')
url = 'path/to/data'
chart = alt.chart(url).mark_line.encode(x= 'time:T', y = 'current:Q')
chart.save('name.html')
But this does not seem to work. Am I missing something obvious here?

When you pass a dataset by URL and save the chart to HTML, the important thing is that the URL is valid for the web browser you use to view the HTML file.
So if you are viewing the chart locally and want to load a local file, use an appropriate file:// URL. If you plan to view the file within a web server that supports relative URLs for loading resources, pass the relative URL between the location of the HTML file and the location of the data file.
But, as a side-note, you mention your data has 200,000 rows: no matter how you pass the data to the Vega-Lite renderer, it's unlikely to perform well with that much data. My personal rule-of-thumb is to avoid Altair/Vega-Lite for datasets with more than ~10,000 rows or so.

Apart from what #jakevdp said, one more thing that I noticed was that in defining your plot, you missed out on the brackets "()" after mark_line in your code.
I tried implementing the code on a smaller dataset with this modification, and it worked great.

Related

How to embed an XLSX local file into HTML page with Python and Django

For a Python web project (with Django) I developed a tool that generates an XLSX file. For questions of ergonomics and ease for users I would like to integrate this excel on my HTML page.
So I first thought of converting the XLSX to an HTML array, with the xlsx2html python library. It works but since I can’t determine the desired size for my cells or trim the content during conversion, I end up with huge cells and tiny text..
I found an interesting way with the html tag associated with OneDrive to embed an excel window into a web page, but my file being in my code and not on Excel Online I cannot import it like that. Yet the display is perfect and I don’t need the user to interact with this table.
I have searched a lot for other methods but apart from developing a function to browse my file and generate the script of the html table line by line, I have the feeling that I cannot simply use a method to convert or display it on my web page.
I am not accustomed to this need and wonder if there would not be a cleaner method to display an excel file in html.
Does it make sense to develop a function that builds my html table script in str? Or should I find a library that does it? Maybe there is a specific Django library ?
Thank you for your experience

comparing PDF report to Database

I have one use-case .Lets say there is pdf report which has data from testing of some manufacturing components
and this PDF report is loaded in DB using some internally developed software.
We need to develop some reconciliation program wherein the data needs to be compared from PDF report to Database. We can assume pdf file has a fixed template.
If there are many tables and some raw text data in pdf then how mysql save this pdf data..in One table or in many tables .
Please suggest some approach(preferably in python) for comparing data
Finding and extracting specific text from URL PDF files, without downloading or writing (solution) have a look at this example and see if it will help. I found it worked efficiently for me, this is if the pdf is URL based, but you could simply change the input source to be your DB. In your case you can remove the two if statements under the if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal): line. You mention having PDFs with the same template, if you are looking to extract text from one specific area of the template, use the print statement that has been commented out to find coordinates of desired data. Then as is done in the example, use those coordinates in if statements.

What does include_plotlyjs="cdn" in plotly python do to the generated HTML?

My goal is to generate an interactive html file from plotly fig and embed this html in my website. I was previously using fig.write_html('name.html'), but in the generated HTML, there are some unwanted symbols like ampersands &&.
Now, I tried adding cdn like fig.write_html('name.html', include_plotlyjs="cdn"), which solves the && problem but I have some questions about this:
On using cdn, is my data still secured/private, and can there be some possible complications on embedding this html to my website?
Is there any better/alternate way of removing the && symbols/cleaning the initial html file generated by plotly?
TIA
The include_plotlyjs="cdn" parameter causes a script tag to be included in the HTML output that references the Plotly CDN ("content delivery network"). This offloads the work of providing the necessary javascript from your server to a more scalable one. A browser will typically cache this making subsequent page loads faster.
Your data security/privacy is not affected by this option.
If the unwanted text you refer to is part of the Plotly JavaScript, it must be loaded however this solution will keep it from appearing in your HTML.
See the documentation for to_html for more information.

pd.read_table for .dat fills null values

I am trying to learn data analysis using "Python for Data analysis" by WesMcKinney.
There is a .dat file with the following data :
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
I'm trying to import them using :
unames=['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('D:/INSOFE/Python_practice/users.dat', sep='::', header=None,names=unames,engine='python')
But, it shows nulls
Please let me know what I'm doing wrong.
The read_table method expects relatively clean data; if you've simply saved the web page containing the table (cf. the clarifying comments), you will end up with a file full of HTML which pandas will not know what to do about.
Instead, you will want to get the raw contents of the file. In principle you could simply copy the 6040 lines from GitHub into your favorite text editor and save the contents as users.dat.
GitHub makes your life a bit simpler than that by supplying a view of the raw data as well.
With that, if you choose to save the file, most browsers (including e.g. Firefox) will produce a proper users.dat with only the data. Command line tools such as wget or curl allow you to get at the same data without having to use a fully-fledged browser.

Accessing Hovertext with html

I am trying to access hover text found on graph points at this site (bottom):
http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH1/1002200043?gameHash=b98e62c1bcc887e4&tab=overview
I have the full site html but I am unable to find the values displayed in the hover text. All that can be seen when inspecting a point are x and y values that are transformed versions of these values. The mapping can be determined with manual input taken from the hovertext but this defeats the purpose of looking at the html. Additionally, the mapping changes with each match history so it is not feasible to do this for a large number of games.
Is there any way around this?
thank you
Explanation
Nearly everything on this webpage is loaded via JSON through JavaScript. We don't even have to request the original page. You will, however, have to repiece together the page via id's of items, mysteries and etc., which won't be too hard because you can request masteries similar to how we fetch items.
So, I went through the network tab in inspect and I noticed that it loaded the following JSON formatted URL:
https://acs.leagueoflegends.com/v1/stats/game/TRLH1/1002200043?gameHash=b98e62c1bcc887e4
If you notice, there is a gameHash and the id (similar to that of the link you just sent me). This page contains everything you need to rebuild it, given that you fetch all reliant JSON files.
Dealing with JSON
You can use json.loads in Python to load it, but a great tool I would recomend is:
https://jsonformatter.curiousconcept.com/
You copy and paste JSON in there and it will help you understand the data structure.
Fetching items
The webpage loads all this information via a JSON file:
https://ddragon.leagueoflegends.com/cdn/7.10.1/data/en_US/item.json
It contains all of the information and tool tips about each item in the game. You can access your desired item via: theirJson['data']['1001']. Each image on the page's file name is the id (or 1001) in this example.
For instance, for 'Boots of Speed':
import requests, json
itemJson = json.loads(requests.get('https://ddragon.leagueoflegends.com/cdn/7.10.1/data/en_US/item.json').text)
print(itemJson['data']['1001'])
An alternative: Selenium
Selenium could be used for this. You should look it up. It's been ported for several programming languages, one being Python. It may work as you want it to here, but I sincerely think that the JSON method (describe above), although a little more convoluted, will perform faster (since speed, based on your post, seems to be an important factor).

Categories