Extract webscraped python data to SQLite, excel or xml? - python

I'm kinda new to Python and webscraping but I'm currently at a point where I need to extract data to a database. Can someone tell me the pros and cons by using sqlite, excel or xml?
I've read that sqlite should be the fastest, so I may go for that database structure, but can someone then tell me what IDE you use to handle sqlite data after I've extracted it from python?
Edit: I hope my post makes sense. I'm currently trying to use a web scraper from here: https://github.com/gingeleski/odds-portal-scraper
Thanks in advance.

For the short term, Excel is a good way to examine your data and prototype analysis and visualizations. It gets old using it for very large datasets, or multiple similar datasets. Basically as soon as you start doing the same thing more than twice or writing VB code you should switch to the pandas/matplotlib solution.
It looks like the scraper you are using already puts the results in an SQLITE database, but if you have your data in a list or dictionary, I'd suggest using pandas to do calculations and matplotlib for visualizations, as that will give you a robust, extensible solution over the long term. It is very easy to read and write data between an SQLITE database and pandas.
A good way of viewing the data in the DB is a must. I'm currently using SQLiteStudio.

When you say IDE, I'm assuming you're looking for a way to view the SQLite data? If so, DBeaver is a free, open source SQL client. You could use this to view the data quite easily.

Related

Python and creating Excel-like tables of data

I am relatively new to (Python) programming and would like some direction for my current project.
I have been learning to web scrape and parse data. I am now sitting on a program that can create lots of information in the form of lists and dictionaries.
I now need a way to create formatted tables and output to some sort of web-based system. I am looking at tables of about 10-40 rows and 20 columns of alphanumeric (names and numbers) data. If I can produce basic bar/line charts that would be useful. It also needs to be entirely automated - the program will run once a day and download information. I need it to output seamlessly in report form to something like dropbox that I can access on-the-go. The table template will always be the same and will be heavily formatted (colouring mostly, like Excel conditional formatting).
I am also keen to learn to create web apps and I'm wondering if this is something I can do with this project? I'm not sure what I'll need to do and I'd like some direction. I'm pretty new to programming and the jargon is tough to wade through. Is it possible to create a website that takes automated input and creates good-looking data tables? Is this efficient? What are the available tools for this? If not efficient what are the other available options?

Using Data in MongoDB through Pymongo(Python)

I am interested in but am ignorant to the best method of extracting quantitative data fast and efficiently that I have inserted into MongoDB.
I will explain my process. I used MongoDB to hold a variety of quantiative data that I inserted from multiple .log files.
Now that the information is inserted, I would like to extract certain data through queries, format it into an array, and display it in a form of a GUI (matplotlib).
I am confused on how to go about the best method of extracting the data. Thank you.
There are some good tutorials on using the python and mongodb such as this http://api.mongodb.org/python/current/tutorial.html
Also there is some more information on SO on matplotplib with mongodb such as this one Mongodb data statistics visualization using matplotlib
It's probably better to start trying some things and then asking specific questions on SO when you get stuck.
use the standard query API
use Map-Reduce
use the new MongoDB aggregation framework

migrating data from tomcat .dbx files

I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python & Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing them in the new one.
Problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx and I didn't find any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx use?
Is there a python module for directly reading from the WEB-INF/data/*.dbx files?
Is there some external tool for dumpint the WEB-INF/data/*.dbx to an ascii format that will be parsable by python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (assuming that all important data can be scraped)
Thanks!
The ".dbx" suffix has been used by various softwares over the years so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy java app (or the relevant doc or ask the author etc).
wrt/ scraping, it's probably going to be a lot of a pain for not much results, depending on the app.

streamlining spreadsheet to DB copying Python

A friend of mine has asked me to write a quick Python script.
He has a small SQLite database (3 tables) and has to copy a bunch of data to it from an excel spreadsheet. The spreadsheet only has 2 data fields but alot of rows of data.
He asked if I would write a quick Python script to transfer the ss data to the db to so he doesn't have to spend a crapload of time manually copying it. I told him that I would do my best.
My question is: Where do I start? what do I need to research for this? Does anyone know if there is a pre-existing module to do this? Im trying to research this myself, but haven't come up with anything concrete yet and am not sure of any other search terms to use to narrow down my search.
Im just hoping someone wont mind giving my some guidance in the right direction.
Blessings and thanks
F
Do you really need Python?
SQLite Database Browser is a freeware, public domain, open source visual tool used to create, design and edit database files compatible with SQLite. Controls and wizards are available for users to:
Import and export tables from/to CSV files
If the file is simple enough (no commas in the content), you can import directly from SQLite:
For simple CSV files, you can use the SQLite shell to import the file into your SQLite database.
If you need to import a complex CSV file and the SQLite shell doesn't handle it, you may want to try a different front end, such as SQLite Database Browser.

Use Python to load data into Mysql

is it possible to set up tables for Mysql in Python?
Here's my problem, I have bunch of .txt files which I want to load into Mysql database. Instead of creating tables in phpmyadmin manually, is it possible to do the following things all in Python?
Create table, including data type definition.
Load many files one by one. I only know this LOAD DATA LOCAL INFILE command to load one file.
Many thanks
Yes, it is possible, you'll need to read the data from the CSV files using CSV module.
http://docs.python.org/library/csv.html
And the inject the data using Python MySQL binding. Here is a good starter tutorial:
http://zetcode.com/databases/mysqlpythontutorial/
If you already know python it will be easy
It is. Typically what you want to do is use an Object-Retlational Mapping library.
Probably the most widely used in the python ecosystem is SQLAlchemy, but there is a lot of magic going on in it, so if you want to keep a tighter control on your DB schema, or if you are learning about relational DB's and want to follow along what the code does, you might be better off with something lighter like Canonical's storm.
EDIT: Just thought to add. The reason to use ORM's is that they provide a very handy way to manipulate data / interface to the DB. But if all you will ever want to do is to do a script to convert textual data to MySQL tables, than you might get along with something even easier. Check the tutorial linked from the official MySQL website, for example.
HTH!

Categories