What's the best way to avoid building a page for a specific Pelican article?
I'm asking because I use some articles mainly as metadata collections, but I don't want them to have a specific page you can access.
For example, I have a bunch of articles for portfolio clients that include metadata such as a link to their logo, the client's name, the category of the job we did, etc., but there's no actual article I want to link to.
I tried giving them the Draft or Hidden metadata, but then they don't show up when I iterate over them in the relevant page.
I would like to be able to iterate over them and use their metadata without building a web page for them.
Thanks for your help.
It's unclear to me what you're after; it sounds like what you really want is to be able to include data from a file in other articles?
If so, you could use the reStructuredText include directive. Save the metadata files with an extension that won't be picked up by Pelican and then include them in all the pages where you want to use them.
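For example, here's a minimal sketch of that approach (the file name, extension, and field names are just placeholders): keep a client's data in clients/acme.inc, an extension Pelican's readers won't process, and pull it into whichever page should display it:

    .. In portfolio.rst (a normal Pelican page), pull in the shared snippet:

    Portfolio
    #########

    .. include:: clients/acme.inc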
For a project I've been using an API to get information from Instagram. However, I would like to get info from posts using keywords (words included in the post description). This is a feature available in the app (see here); however, I have only been able to search by hashtag, which is not what I want.
I would like to know if any of you know of an API/tool that can accomplish this task.
Here is a way to search Instagram posts by words or phrases (not hashtags):
First, you would need to use tools for scraping Google search results. Check out these answers for guidance. The URL that you make the request to would be something like:
https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fwww.instagram.com%2Fp%2F+**put+the+phrase+here**
Once you have the URLs of the posts that contain those words, you can use an API (e.g. from RapidAPI), write your own scraping code, or use Python packages such as instagramy to get metadata from the Instagram posts you've found.
Usually the information comes as JSON when using an API, so it is not very difficult to extract the data and put it into a pandas DataFrame if you want to.
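As a rough sketch of the first step (the phrase is illustrative, and Google actively rate-limits automated queries, so a dedicated scraping tool or API may be needed in practice):

    # Rough sketch: build the site-restricted Google query and collect any
    # result links that point at an Instagram post. Not production code.
    import urllib.parse

    import requests
    from bs4 import BeautifulSoup

    phrase = "adopt a puppy"  # hypothetical search phrase
    query = 'site:https://www.instagram.com/p/ "%s"' % phrase
    url = "https://www.google.com/search?q=" + urllib.parse.quote(query)

    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    post_urls = set()
    for a in soup.find_all("a", href=True):
        if "instagram.com/p/" in a["href"]:
            post_urls.add(a["href"])

    print(post_urls)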
Is there something like Stimulsoft or Crystal Reports for Django? I am not talking about a report viewer that just exports some Excel data; I am talking about the whole package: some text with variables, tables, pages with headers and footers and watermarks, and so on.
I want a footer on every page, and tables whose length I don't know in advance; they may run onto a second or third page, and each page must be generated with the footer for the new data, just like Stimulsoft Reports.
You can use ReportLab, which contains such features (read its documentation!). However, I haven't found a full package that connects Django models to report generation. In ReportLab you can make page templates and fill them with data. For the Persian language, you should use external packages for RTL reshaping.
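As an illustration of those features, here is a minimal, hedged ReportLab sketch (not a full Django integration; the file name and data are made up) in which a table of unknown length flows across pages and every page gets a footer:

    # Minimal ReportLab sketch: multi-page table plus a footer drawn on
    # every page via the onFirstPage/onLaterPages callbacks.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.units import cm
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle


    def draw_footer(canvas, doc):
        # Called by ReportLab once per page.
        canvas.saveState()
        canvas.setFont("Helvetica", 8)
        canvas.drawString(2 * cm, 1 * cm, "My report footer - page %d" % doc.page)
        canvas.restoreState()


    # Hypothetical data; in Django you would build this from a queryset.
    rows = [["ID", "Name", "Amount"]] + [
        [str(i), "Item %d" % i, "10.00"] for i in range(200)
    ]

    table = Table(rows, repeatRows=1)  # repeat the header row on every page
    table.setStyle(TableStyle([("GRID", (0, 0), (-1, -1), 0.25, colors.grey)]))

    doc = SimpleDocTemplate("report.pdf", pagesize=A4, bottomMargin=2 * cm)
    doc.build([table], onFirstPage=draw_footer, onLaterPages=draw_footer)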
Check out ReportBro
Commercial use requires a license. I'm not affiliated, but I'm currently evaluating it for use in my own project. It seems to offer everything you're looking for.
Sorry if this is not a valid question; I personally feel it kind of borders on the edge.
Assuming the website involved has given full permission:
How could I download the ENTIRE contents (HTML) of that website using a Python scraper? By entire contents I mean not only the current page you are on, but any other directory that branches off that main website. E.g.
Using the link:
https://www.dogs.com
could I pull info from:
https://www.dogs.com/about-us
and any other directory attached to "https://www.dogs.com/"?
(I have no idea if dogs.com is a real website or not; it's just an example.)
I have already made a scraper that will pull info from a single link (nothing further than that), but I want to improve it so I don't have to supply heaps of links. I understand I could use an API, but if this is possible I would rather do it this way. Cheers!
While there is Scrapy to do this professionally, you can use requests to get the URL data and bs4 to parse the HTML and look into it; that's also easier for a beginner, I guess.
Whichever way you go, you need a starting point; then you just follow the links in the page, and then the links within those pages.
You might need to check whether a URL links to another website or is still within the targeted website. Find the pages one by one and scrape them.
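Here's a rough beginner-level sketch of that loop (the start URL is just the example domain from the question; a real crawler should also respect robots.txt and rate limits):

    # Simple breadth-ish crawl: fetch a page, keep its HTML, follow only
    # links that stay on the same domain, never revisit a URL.
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    start_url = "https://www.dogs.com/"  # example domain from the question
    domain = urlparse(start_url).netloc

    to_visit = [start_url]
    visited = set()
    pages = {}  # url -> html

    while to_visit:
        url = to_visit.pop()
        if url in visited:
            continue
        visited.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        pages[url] = resp.text

        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # Only follow links that stay on the target website.
            if urlparse(link).netloc == domain and link not in visited:
                to_visit.append(link)

    print("Fetched %d pages" % len(pages))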
I need to create a web scraper for this website.
However, I need to get the links for the counties, which are stored in the interactive map.
Unfortunately, for some reason, their search engine doesn't return all the results that the interactive map does.
My question:
Could anyone tell me how to get all the links for all the counties, without manually accessing them?
Thanks
Technically you can use a decompiler to do this job. There are free (e.g. ActionScript Extractor) and paid (e.g. Sothink SWF Decompiler) tools out there.
You can refer to this answer.
Edit:
Most SWF content loads external data from either a .xml or a .json file.
Without decompiling, and just using the browser's Developer Tools, we can see that an XML file is indeed accessed (maybe it contains what you want):
http://www.allpetservices.co.uk/uk_ir_locator.xml.
Put view-source: in front of the link to read it (if there's an error message).
In that XML you want to extract the contents (the xyz) of each and every <link>xyz</link> tag. This will give you the links of every entry on the map.
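A short sketch of that extraction, assuming the XML really does keep each URL in a <link> element (adjust the tag name if the file is structured differently):

    # Fetch the XML file spotted in the browser's dev tools and pull out
    # the text of every <link> tag.
    import xml.etree.ElementTree as ET

    import requests

    resp = requests.get("http://www.allpetservices.co.uk/uk_ir_locator.xml", timeout=10)
    root = ET.fromstring(resp.content)

    # The exact structure of the XML is assumed here.
    links = [el.text.strip() for el in root.iter("link") if el.text]
    print(links)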
The short answer to your question: There's no way to get the links from the site.
The solution: the structure of the links you are trying to retrieve is very predictable. They all follow the same pattern:
http://www.allpetservices.co.uk/search_map.asp?ccounty={COUNTY_NAME}
So, if you can use another site or data source to get the names of each of the counties, you can formulate each of the links that you need.
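A quick sketch of that idea (the county names below are just examples; you would substitute the real list from whatever source you find, and whether the site wants "+" or "%20" for spaces is an assumption):

    # Build each county URL from a list of county names.
    from urllib.parse import quote_plus

    counties = ["Kent", "Greater London", "West Yorkshire"]  # assumed list

    base = "http://www.allpetservices.co.uk/search_map.asp?ccounty={}"
    county_urls = [base.format(quote_plus(name)) for name in counties]

    for url in county_urls:
        print(url)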
Hi guys: Is there a way to improve Trac wiki quality using a plugin that deals with artifacts such as obsolete pages, pages that refer to code which doesn't exist anymore, pages that are unlinked, or pages that have a low update rate? I think there are several heuristics that could be used to prevent wiki rot:
Number of recent edits
Number of recent views
Whether or not a page links to a source file
Whether a wiki page's last update is older or newer than the source files it links to
Whether entire directories in the wiki have been used/edited/ignored over the last "n" days
etc. etc. etc.
If nothing else, just these metrics alone would be useful for each page and each directory from an administrative standpoint.
I don't know of an existing plugin that does this, but everything you mentioned certainly sounds do-able in one way or another.
You can use the trac-admin CLI command to get a list of wiki pages and to dump the contents of a particular wiki page (as plain text) to a file or stdout. Using this, you can write a script that reads in all of the wiki pages, parses the content for links, and generates a graph of which pages link to what. This should pinpoint "orphans" (pages that aren't linked to), pages that link to source files, and pages that link to external resources. Running external links through something like wget can help you identify broken links.
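A rough sketch of such a script, assuming trac-admin is on the PATH, the environment path is a placeholder, the output parsing of wiki list is approximate, and the regex only catches explicit [wiki:...] links:

    # Dump every wiki page via trac-admin, collect the wiki links each page
    # contains, and report pages that nothing links to ("orphans").
    import re
    import subprocess

    TRAC_ENV = "/path/to/trac/env"  # placeholder


    def trac_admin(*args):
        out = subprocess.run(["trac-admin", TRAC_ENV, *args],
                             capture_output=True, text=True, check=True)
        return out.stdout


    raw = trac_admin("wiki", "list").splitlines()
    # Crude parsing: skip the header and separator lines.
    pages = [line.split()[0] for line in raw
             if line.strip() and not line.startswith(("Title", "-"))]

    link_re = re.compile(r"\[wiki:([\w/.]+)")  # explicit [wiki:PageName] links only
    graph = {}
    for page in pages:
        text = trac_admin("wiki", "export", page)
        graph[page] = set(link_re.findall(text))

    linked_to = set().union(*graph.values()) if graph else set()
    orphans = [p for p in pages if p not in linked_to]
    print("Possible orphans:", orphans)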
To access last-edited dates, you'll want to query Trac's database. The query you'll need will depend on the particular database type that you're using. For playing with the database in a (relatively) safe and easy manner, I find the WikiTableMacro and TracSql plugins quite useful.
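For example, with the default SQLite backend (and Trac >= 0.12, where the time column stores microseconds since the epoch; the database path below is a placeholder), a quick last-edited report could look like this sketch:

    # Query Trac's wiki table for each page's most recent edit time.
    import sqlite3
    from datetime import datetime, timezone

    DB_PATH = "/path/to/trac/env/db/trac.db"  # placeholder

    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT name, MAX(time) FROM wiki GROUP BY name ORDER BY MAX(time)"
    ).fetchall()
    conn.close()

    for name, ts in rows:
        last_edit = datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc)
        print(f"{last_edit:%Y-%m-%d}  {name}")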
The hardest feature in your question to implement would be the one regarding page views. I don't think that Trac keeps track of page views; you'll probably have to parse your web server's logs for that sort of information.
How about these:
BadLinksPlugin: This plugin logs bad local links found in wiki content.
It's quite a new one. It mainly deals with dangling links, though from the source code it looks like it catches any bad local link. This is at least one building block for what you're asking for.
VisitCounterMacro: a macro that displays how many times a wiki page has been viewed.
This is a rather old one. You only get the statistic per page, and an administrative overview is missing, but that could be built rather easily, e.g. as a custom PageIndex.