migrating data from tomcat .dbx files - python

I want to migrate data from an old Tomcat/Jetty website to a new one that runs on Python and Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing it in the new one.
The problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx files, and I haven't found any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx files use?
Is there a Python module for directly reading the WEB-INF/data/*.dbx files?
Is there some external tool for dumping the WEB-INF/data/*.dbx files to an ASCII format that can then be parsed by Python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (Assuming that all important data can be scraped.)
Thanks!

The ".dbx" suffix has been used by various softwares over the years so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy java app (or the relevant doc or ask the author etc).
wrt/ scraping, it's probably going to be a lot of a pain for not much results, depending on the app.

Related

How do I automatically replace a large .xml database file from a website with a newer file each month?

I am working on a project right now that uses data from a large XML database file (usually around 8 GB) pulled from a website. The website updates this database file monthly, so every month there's a newer and more accurate database file.
I started my project about a year ago, so it is using a database file from February 2019. For the sake of people using my program, I would like for the database file to be replaced with the new one from each month when that gets rolled out.
How could I go about implementing this in my project so I don't have to manually go and replace the file with a newer one each month? Is it something I should write into the program? But, if that's the case, it would only update when the program is run. Or, is there a way to have some script do this that automatically checks once a month?
Note: this project is not being used by people yet, it has got a long way to go, but I am trying to figure out how to implement these features earlier on before I get to a point where I can publish it.
I would first find out if there is an API built on top of that XML data that you could leverage, instead of downloading the XML into your own website. That way you always get the latest version of the data, since you're pulling it on-demand.
However, an on-demand integration wouldn't be a good idea if you would be hitting the API with any kind of heavy frequency, or if you would be pulling large datasets from said API. In that case, you need an ETL integration. Look into open-source ETL tools (just Google it) to help move that data in an automated fashion; I would recommend importing the XML into MongoDB or some other DB, and pull the data from there instead of reading it from a flat file.
And if you absolutely have to have it as a flat file, look into using Gatsby; it's a framework for static websites that need to be rebuilt every once in a while.
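If you do end up pulling the flat file yourself, here is a rough sketch of a refresh script you could schedule (via cron or Windows Task Scheduler) to run once a month; the URL and local path below are placeholders, not the real ones:

import shutil
import urllib.request

SOURCE_URL = "https://example.com/data/monthly_dump.xml"  # placeholder URL
LOCAL_PATH = "data/monthly_dump.xml"                      # placeholder local path

def refresh_database_file():
    # Download the latest XML dump and overwrite the previous copy.
    with urllib.request.urlopen(SOURCE_URL) as response, open(LOCAL_PATH, "wb") as out:
        shutil.copyfileobj(response, out)

if __name__ == "__main__":
    # Example cron entry to run this on the 1st of every month at 03:00:
    # 0 3 1 * * /usr/bin/python3 /path/to/refresh_dump.py
    refresh_database_file()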

Extract webscraped python data to SQLite, excel or xml?

I'm kinda new to Python and web scraping, but I'm currently at a point where I need to extract data to a database. Can someone tell me the pros and cons of using SQLite, Excel or XML?
I've read that SQLite should be the fastest, so I may go for that database structure, but can someone then tell me what IDE you use to handle SQLite data after I've extracted it from Python?
Edit: I hope my post makes sense. I'm currently trying to use a web scraper from here: https://github.com/gingeleski/odds-portal-scraper
Thanks in advance.
For the short term, Excel is a good way to examine your data and prototype analysis and visualizations. It gets old using it for very large datasets, or multiple similar datasets. Basically, as soon as you start doing the same thing more than twice or writing VB code, you should switch to a pandas/matplotlib solution.
It looks like the scraper you are using already puts the results in an SQLite database, but if you have your data in a list or dictionary, I'd suggest using pandas to do calculations and matplotlib for visualizations, as that will give you a robust, extensible solution over the long term. It is very easy to read and write data between an SQLite database and pandas.
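As a rough illustration of that round trip (the database file, table name, and columns here are made up):

import sqlite3
import pandas as pd

conn = sqlite3.connect("odds.db")  # hypothetical SQLite file

# Write scraped data (e.g. a list of dicts) into a table...
df = pd.DataFrame([{"match": "A vs B", "odds": 1.85}])
df.to_sql("results", conn, if_exists="append", index=False)

# ...and read it back later for calculations or plotting.
results = pd.read_sql("SELECT * FROM results", conn)
conn.close()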
A good way of viewing the data in the DB is a must. I'm currently using SQLiteStudio.
When you say IDE, I'm assuming you're looking for a way to view the SQLite data? If so, DBeaver is a free, open source SQL client. You could use this to view the data quite easily.

Python and creating Excel-like tables of data

I am relatively new to (Python) programming and would like some direction for my current project.
I have been learning to web scrape and parse data. I am now sitting on a program that can create lots of information in the form of lists and dictionaries.
I now need a way to create formatted tables and output to some sort of web-based system. I am looking at tables of about 10-40 rows and 20 columns of alphanumeric (names and numbers) data. If I can produce basic bar/line charts that would be useful. It also needs to be entirely automated - the program will run once a day and download information. I need it to output seamlessly in report form to something like Dropbox that I can access on-the-go. The table template will always be the same and will be heavily formatted (colouring mostly, like Excel conditional formatting).
I am also keen to learn to create web apps and I'm wondering if this is something I can do with this project? I'm not sure what I'll need to do and I'd like some direction. I'm pretty new to programming and the jargon is tough to wade through. Is it possible to create a website that takes automated input and creates good-looking data tables? Is this efficient? What are the available tools for this? If not efficient what are the other available options?

Use Python to load data into Mysql

Is it possible to set up tables for MySQL in Python?
Here's my problem: I have a bunch of .txt files which I want to load into a MySQL database. Instead of creating the tables in phpMyAdmin manually, is it possible to do the following things all in Python?
Create tables, including data type definitions.
Load many files one by one. I only know the LOAD DATA LOCAL INFILE command, which loads one file.
Many thanks
Yes, it is possible. You'll need to read the data from the files using the csv module.
http://docs.python.org/library/csv.html
And then inject the data using a Python MySQL binding. Here is a good starter tutorial:
http://zetcode.com/databases/mysqlpythontutorial/
If you already know Python it will be easy.
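A minimal sketch of that approach, using the csv module together with the mysql-connector-python driver (the table layout, column types, file pattern, and credentials below are invented for illustration, so adapt them to your data):

import csv
import glob
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cur = conn.cursor()

# Create the table once, including data type definitions.
cur.execute("""
    CREATE TABLE IF NOT EXISTS measurements (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100),
        value DOUBLE
    )
""")

# Load every .txt file, one by one.
for path in glob.glob("data/*.txt"):
    with open(path, newline="") as f:
        rows = [(r[0], float(r[1])) for r in csv.reader(f)]
    cur.executemany("INSERT INTO measurements (name, value) VALUES (%s, %s)", rows)

conn.commit()
conn.close()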
It is. Typically what you want to do is use an Object-Relational Mapping (ORM) library.
Probably the most widely used in the Python ecosystem is SQLAlchemy, but there is a lot of magic going on in it, so if you want to keep tighter control over your DB schema, or if you are learning about relational DBs and want to follow along with what the code does, you might be better off with something lighter like Canonical's Storm.
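For comparison, a tiny SQLAlchemy (Core) sketch of the same table-plus-insert idea; the connection string, table, and columns are invented:

from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float)

engine = create_engine("mysql+mysqlconnector://me:secret@localhost/mydb")
metadata = MetaData()

measurements = Table(
    "measurements", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100)),
    Column("value", Float),
)
metadata.create_all(engine)  # emits CREATE TABLE for tables that don't exist yet

with engine.begin() as conn:  # transaction, committed on success
    conn.execute(measurements.insert(),
                 [{"name": "a", "value": 1.0}, {"name": "b", "value": 2.0}])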
EDIT: Just thought to add. The reason to use ORMs is that they provide a very handy way to manipulate data and interface with the DB. But if all you will ever want to do is write a script to convert textual data to MySQL tables, then you might get along with something even easier. Check the tutorial linked from the official MySQL website, for example.
HTH!

want to add url links to .csv datafeed using python

I've looked through the current related questions but have not managed to find anything similar to my needs.
I'm in the process of creating an affiliate store using Zen Cart - now one of the issues is that Zen Cart is not designed for redirects and affiliate stores, but it can be done. I will be changing the store so it acts like a showcase store showing prices.
There is a mod called Easy Populate which allows me to upload datafeeds. This is all well and good, however my affiliate link will not be in each product. I can do it manually after uploading the data feed by going to each product and adding it as an image with a redirect link - however, when there are over 500 items it's going to be a long, repetitive and time-consuming job.
I have been told that I can add the links to the data feed before uploading it to Zen Cart, and that this should be done using Python. I've been reading about Python for several days now and feel I'm looking for the wrong things. I was wondering if someone could please advise the simplest way for me to get this done.
I hope the question makes sense
thanks
abs
You could craft a Python script using the csv module like this:
>>> import csv
>>> cartWriter = csv.writer(open('yourcart.csv', 'w', newline=''))
>>> cartWriter.writerow(['Product', 'yourinfo', 'yourlink'])
You need to know how the link should be formatted, hoping that it can be composed from the other parameters present in the CSV file.
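A slightly fuller sketch of that idea, assuming the datafeed is a CSV file with a header row and that the affiliate link can be built from an existing product-URL column (the file names, column names, and redirect prefix below are all invented):

import csv

AFFILIATE_PREFIX = "https://affiliate.example.com/redirect?url="  # hypothetical redirect base

with open("datafeed.csv", newline="") as src, \
     open("datafeed_with_links.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames + ["affiliate_link"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Build the affiliate link from the product URL already present in the feed.
        row["affiliate_link"] = AFFILIATE_PREFIX + row["product_url"]
        writer.writerow(row)

You would then upload the resulting datafeed_with_links.csv through Easy Populate instead of the original feed.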
First, use the csv module as systempuntoout told you; secondly, you will want to change your headers to:
mimetype='text/csv'
Content-Disposition = 'attachment; filename=name_of_your_file.csv'
The way to do it depends very much on your website's implementation. In pure Python you would probably do that with an HttpResponse-style object; in Django as well, but there are some shortcuts.
You can find a video demonstrating how to create CSV files with Python on showmedo. It's not free however.
Now, to provide a link to download the CSV, this depends on your website. What is the technology behind it: pure Python, Django, Pylons, TurboGears?
If you can't answer that question, you should ask your boss for training on your infrastructure before trying to make changes to it.
