Want to add URL links to a .csv datafeed using Python

I've looked through the current related questions but have not managed to find anything similar to my needs.
I'm in the process of creating an affiliate store using Zen Cart. One of the issues is that Zen Cart is not designed for redirects and affiliate stores, but it can be done. I will be changing the store so it acts like a showcase store that shows prices.
There is a mod called Easy Populate which allows me to upload datafeeds. This is all well and good, however my affiliate link will not be in each product. I could do it manually after uploading the data feed, going to each product and adding the link as an image with a redirect. However, with over 500 items that is going to be a long, repetitive and time-consuming job.
I have been told that I can add the links to the data feed before uploading it to Zen Cart, and that this should be done using Python. I've been reading about Python for several days now and feel I'm looking for the wrong things. I was wondering if someone could please advise the simplest way for me to get this done.
I hope the question makes sense.
Thanks,
abs

You could craft a Python script using the csv module, like this:
>>> import csv
>>> cartWriter = csv.writer(open('yourcart.csv', 'w', newline=''))
>>> cartWriter.writerow(['Product', 'yourinfo', 'yourlink'])
You need to know how the link should be formatted; hopefully it can be composed from the other fields already present in the CSV file.
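For the question as asked, here is a minimal sketch of such a script, assuming the datafeed is a plain CSV with a header row and that the affiliate link can be built from one of the existing columns. The filenames, the product_url column and the redirect prefix are made-up placeholders; adjust them to match your actual feed and affiliate programme:

import csv

# Hypothetical affiliate redirect prefix; replace with your real one.
AFFILIATE_PREFIX = 'http://www.example-affiliate.com/redirect?url='

with open('datafeed.csv', newline='') as infile, \
     open('datafeed_with_links.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['affiliate_link']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Build the affiliate link from a column that already exists in the feed.
        row['affiliate_link'] = AFFILIATE_PREFIX + row['product_url']
        writer.writerow(row)

Whether Easy Populate picks up the extra column depends on the column layout it expects, so check its documentation before uploading.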

First, use the csv module as systempuntoout told you; secondly, you will want to change your headers to:
mimetype='text/csv'
Content-Disposition = 'attachment; filename=name_of_your_file.csv'
The way to do it depends very much on your website implementation. In pure Python you would build the response and set those headers yourself; in Django you would do it with an HttpResponse object, and there are some shortcuts.
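If the site does turn out to be Django, a minimal sketch of that kind of view could look like the following (the view name and the row contents are placeholders, not anything from the original question):

import csv
from django.http import HttpResponse

def download_cart(request):
    # Hypothetical Django view that returns the CSV as a file download.
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=name_of_your_file.csv'
    writer = csv.writer(response)
    writer.writerow(['Product', 'yourinfo', 'yourlink'])
    return response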
You can find a video demonstrating how to create CSV files with Python on Showmedo. It's not free, however.
Now, to provide a link to download the CSV: this depends on your website. What is the technology behind it: pure Python, Django, Pylons, TurboGears?
If you can't answer that question, you should ask your boss for training on your infrastructure before trying to make changes to it.

Related

Internet Shortcut in Python

I have a problem. Let's say I have a website (e.g. www.google.com). Is there any way to create a file with a .url extension linking to this website in Python? (I am currently looking for a flat, and I am trying to save shortcuts on my hard drive to apartment offers posted online that match my expectations.) I've tried to use the os and requests modules to create such files, but with no success. I would really appreciate the help. (I am using Python 3.9.6 on Windows 10.)
This is pretty straightforward. I had no idea what .URL files were before seeing this post, so I decided to drag its URL to my desktop. It created a file with the following contents which I viewed in Notepad:
[InternetShortcut]
URL=https://stackoverflow.com/questions/68304057/internet-shortcut-in-python
So, you just need to write out the same thing via Python, except replace the URL with the one you want:
test_url = r'https://www.google.com/'
with open('Google.url', 'w') as f:
    f.write(f"""[InternetShortcut]
URL={test_url}
""")
With regards to your current attempts:
I've tried to use os and requests module to create such file
It's not clear what you're using requests or os for, since you didn't provide a Minimal Reproducible Example of what you'd tried so far; so, if there's a more complex element to this that you didn't specify, such as automatically generating the file while you're in your browser, or something like that, then you need to update your question to include all of your requirements.

Is pandas a local-only library?

I recently started coding, but took a brief break. I have started a new job and I'm under some confidentiality restrictions. I need to make sure Python and pandas are secure before I do this; I'll also be talking with IT on Monday.
I was wondering whether pandas in Python is a local library, or does the data get sent to or from elsewhere? If I write something in pandas, will the data be stored somewhere under pandas?
The best example of what I'm doing is a Medium article about scraping data out of tables that don't have CSV exports:
https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so; e.g., if you have lxml installed, read_html will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
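To make that distinction concrete, a small illustration (the filenames and URL are just placeholders):

import pandas as pd

# Purely local: the file is read from disk and the data never leaves your machine.
local_df = pd.read_csv('local_file.csv')

# This one makes an HTTP request, because a URL was passed instead of a path.
remote_df = pd.read_csv('https://example.com/some_data.csv')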
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
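To spell out the hypothetical feature being described, the dangerous part would be something along these lines; this is an illustration of the pattern, not anything pandas does by itself:

import pandas as pd

def read_csv_with_includes(path_or_url):
    # Hypothetical recursive include: any row whose first column starts with '#'
    # is treated as a reference to another CSV, and read_csv will happily fetch
    # that reference even if it turns out to be an HTTP URL.
    df = pd.read_csv(path_or_url, header=None)
    frames = []
    for _, row in df.iterrows():
        first = str(row[0])
        if first.startswith('#'):
            frames.append(read_csv_with_includes(first[1:]))
        else:
            frames.append(row.to_frame().T)
    return pd.concat(frames, ignore_index=True)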
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid, clamped down security on browsers, and written an application-level sniffer to detect suspicious follow-up requests from HTML, and hasn't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")

How to extract financial statements only from XBRL files using Arelle's Python API?

Somehow, despite the broken documentation on Arelle's Python API as of this date, I managed to make the API work and successfully load an XBRL file.
Anyways, my question is:
How do I extract only the STATEMENTS from the XBRL file?
Below is a screenshot from Arelle's Windows App.
URL used in this example: https://www.sec.gov/Archives/edgar/data/101984/000010198416000062/ueic-20151231.xml
I tried experimenting with the API and here's my code
from arelle import Cntlr
xbrl = Cntlr.Cntlr().modelManager.load('https://www.sec.gov/Archives/edgar/data/101984/000010198416000062/ueic-20151231.xml')
for fact in xbrl.facts:
    print(fact)
but after executing this snippet, I'm just bombarded with a long list of printed fact objects.
I tried getting the keys available per modelFact and it's a mixture of contextRef, id, decimals and unitRef, which is not helpful for what I want to extract. With no documentation to help further with this, I'm at a loss. Can someone enlighten me on how to extract only the statements?
I am doing something similar and have so far made some progress, which I can share:
Going through the Python source files of Arelle, you can see which properties you can access on the different classes, such as ModelFact, ModelContext, ModelUnit, etc.
To extract the individual data, you can, for example, put them in a pandas DataFrame as follows:
import pandas as pd

factData = pd.DataFrame(data=[(fact.concept.qname,
                               fact.value,
                               fact.isNumeric,
                               fact.contextID,
                               fact.context.isStartEndPeriod,
                               fact.context.isInstantPeriod,
                               fact.context.isForeverPeriod,
                               fact.context.startDatetime,
                               fact.context.endDatetime,
                               fact.unitID) for fact in xbrl.facts])
Now it is easier to work with all the data, filter the facts you want to use, etc. If you want to reproduce the statement tables, you will also need to incorporate the links for each of the facts and then order and sort them, but I haven't gotten that far either.
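Building on that, once the facts are in a DataFrame it becomes easier to slice them; a small sketch, assuming the DataFrame above and naming its columns first (the column names below are arbitrary labels, not anything prescribed by Arelle):

factData.columns = ['qname', 'value', 'isNumeric', 'contextID',
                    'isStartEndPeriod', 'isInstantPeriod', 'isForeverPeriod',
                    'startDatetime', 'endDatetime', 'unitID']

# For example, keep only the numeric facts reported for an instant in time.
numeric_instant = factData[factData['isNumeric'] & factData['isInstantPeriod']]
print(numeric_instant[['qname', 'value', 'endDatetime']].head())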

Python and downloading Google Sheets feeds

I'm trying to download a spreadsheet from Google Drive inside a program I'm writing (so the data can be easily updated across all users), but I've run into a few problems:
First, and perhaps foolishly, I only want to use the basic Python distribution, so I'm not requiring people to download extra modules to run it. The urllib.request module seems to work well enough for basic downloading, specifically the urlopen() function, when I've tested it on normal webpages (more on why I say "normal" below).
Second, most questions and answers on here deal with retrieving a .csv from the spreadsheet. While this might work even better than trying to parse the feeds (and I have actually gotten it to work), using only the basic address means only the first sheet is downloaded, and I need to add a non-obvious gid to get the others. I want to have the program independent of the spreadsheet, so I only have to add new data online and the clients are automatically updated; trying to find a gid programmatically gives me trouble because:
Third, I can't actually get the feeds (interface described here) to be downloaded correctly. That does seem to be the best way to get what I want—download the overview of the entire spreadsheet, and from there obtain the addresses to each sheet—but if I try to send that through urlopen(feed).read() it just returns b''. While I'm not exactly sure what the problem is, I'd guess that the webpage is empty very briefly when it's first loaded, and that's what urlopen() thinks it should be returning. I've included what little code I'm using below, and was hoping someone had a way of working around this. Thanks!
import urllib.request as url
key = '1Eamsi8_3T_a0OfL926OdtJwLoWFrGjl1S2GiUAn75lU'
gid = '1193707515'
# Single sheet in CSV format
# feed = 'https://docs.google.com/spreadsheets/d/' + key + '/export?format=csv&gid=' + gid
# Document feed
feed = 'https://spreadsheets.google.com/feeds/worksheets/' + key + '/private/full'
csv = url.urlopen(feed).read()
(I don't actually mind publishing the key/gid, because I am planning on releasing this if I ever finish it.)
Requires OAuth2 or a password.
If you log out of Google and try again with your browser, it fails (it failed when I logged out). It looks like it requires a Google account.
I did have it working with an application password a while ago, but I now use OAuth2. Both are quite a bit of messing about compared to CSV.
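For what it's worth, if the spreadsheet can be shared so that anyone with the link can view it (or is published to the web), the per-sheet CSV export URL that is commented out in the question works without any OAuth2 dance, still using only the standard library. A minimal sketch, reusing the key and gid from the question:

import csv
import io
import urllib.request

key = '1Eamsi8_3T_a0OfL926OdtJwLoWFrGjl1S2GiUAn75lU'
gid = '1193707515'
feed = ('https://docs.google.com/spreadsheets/d/' + key
        + '/export?format=csv&gid=' + gid)

# urlopen() returns bytes; decode before handing the text to the csv module.
raw = urllib.request.urlopen(feed).read().decode('utf-8')
rows = list(csv.reader(io.StringIO(raw)))
print(rows[:5])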
This sounds like a perfect use case for a wrapper library I once wrote. Let me know if you find it useful.

Migrating data from Tomcat .dbx files

I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python and Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing it in the new one.
The problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx files, and I haven't found any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx files use?
Is there a Python module for directly reading the WEB-INF/data/*.dbx files?
Is there some external tool for dumping the WEB-INF/data/*.dbx files to an ASCII format that Python can parse?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (Assuming that all the important data can be scraped.)
Thanks!
The ".dbx" suffix has been used by various softwares over the years so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy java app (or the relevant doc or ask the author etc).
wrt/ scraping, it's probably going to be a lot of a pain for not much results, depending on the app.
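One cheap first step before digging through the Java source: look at the first bytes of one of the files, which is sometimes enough to spot a known magic number or at least to tell binary from text. A rough sketch (the path is just an example):

# Quick and dirty probe of one of the mystery files.
with open('WEB-INF/data/example.dbx', 'rb') as f:
    header = f.read(64)

print(header)            # raw bytes; look for a recognisable magic number
print(header[:8].hex())  # the same bytes as hex, easier to search for online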
