Extracting data from several XML files with Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I just started learning Python for my new job, so everything is quite difficult for me, even if the task sounds pretty straightforward.
I would like to extract several nodes from multiple XML files, ideally putting the information into an Excel file at the end. Every row should contain the information from one XML file, and the columns should represent the specific nodes I am looking for, like "Zip-code" and "town". Not all XML files contain all nodes, so it would be perfect if, when the node "Zip-code" doesn't exist, the cell is just left blank.
Could someone please give me a few hints on how to start with this, or, alternatively, suggest a special program that is easy to learn and use? My company and I only need to do this once, for about 2000 files.
Thank you very much =)

For opening the files and reading their contents, you can use Python's built-in functions such as open() (see the Python documentation).
For XML parsing, I always use Beautiful Soup. It's an HTML/XML parser with good documentation that mostly "just works".
For creating the Excel file, you can use XlsxWriter.
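The three steps above (open each file, parse it, write one row per file) can be combined into a short script. This is a minimal sketch that, unlike the answer, sticks to the standard library so it runs without installing anything: xml.etree.ElementTree stands in for Beautiful Soup, and the csv module stands in for XlsxWriter (Excel opens CSV files directly). The folder name, the "Zip-code"/"town" tag names, and the helper names are assumptions for illustration. A tag is matched anywhere in the tree, and a missing tag leaves an empty cell, as the question asks.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# Tag names taken from the question; replace them with the real node names.
FIELDS = ["Zip-code", "town"]


def extract_row(path, fields=FIELDS):
    """Return {field: text} for one XML file, using "" for missing nodes."""
    root = ET.parse(path).getroot()
    row = {"file": os.path.basename(path)}
    for field in fields:
        # ".//tag" finds the first matching element anywhere below the root.
        node = root.find(".//" + field)
        row[field] = node.text.strip() if node is not None and node.text else ""
    return row


def xml_folder_to_csv(folder, out_path, fields=FIELDS):
    """Write one CSV row per *.xml file in folder; return the number of files."""
    paths = sorted(glob.glob(os.path.join(folder, "*.xml")))
    # "utf-8-sig" adds a byte-order mark so Excel detects the UTF-8 encoding.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["file"] + list(fields))
        writer.writeheader()
        for path in paths:
            writer.writerow(extract_row(path, fields))
    return len(paths)
```

Usage would be something like `xml_folder_to_csv("xml_files", "addresses.csv")`. If the XML files declare namespaces, the tag names passed to `find` need the `{namespace-uri}` prefix, otherwise every cell comes out blank.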

Related

How can I implement a Word to PDF conversion in Python without importing any libraries? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 months ago.
First-time poster here. I'm trying to convert one or more .docx files to PDF, but I can't figure out how to do it without importing any libraries/modules beyond what is available in Python 3.3.
I've read through the package documentation, but nothing stood out as a solution. I also don't know what I am looking for, as I am pretty new to Python. I found plenty of articles and resources that explain how to do it with an imported library, but not without one.
Is it possible to accomplish this without importing a library?
Any advice/resources are welcome.
Code it from scratch. If you're not going to use an external library, that is by definition pretty much your only option.
You'll want to become an expert in the formal specifications for both PDF and MS Word. Given the complexity and history of each of those, I expect a senior developer will want 6-12 months of experience with each to obtain the necessary understanding.
You should also have 6-12 months' experience with Python, since you'll likely need to be familiar with the language in order to define and use all the functions you'll need. But in just a few years of dedication, you should be able to write the necessary code.
MORE REALISTICALLY, import Python libraries for managing PDFs and MS Word. That should only take a week or two.
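To make the "from scratch" point concrete: a .docx file is a ZIP archive whose main text lives in the XML part word/document.xml, so modules that ship with Python (zipfile and xml.etree.ElementTree) can at least pull out the paragraph text. This is a minimal sketch, assuming a well-formed .docx; the function name is illustrative, it ignores formatting, images, tables layout, headers and footnotes, and it does nothing about the much harder job of writing a PDF.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the tags inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the plain text of each paragraph (<w:p>) in a .docx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; each run holds <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

Even this small piece shows why the answer suggests a library: getting from these strings to a correctly laid-out PDF means implementing fonts, line breaking, pagination and the PDF object format yourself.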

Is Python a suitable tool for automating data scraping? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I am working on a project that involves a large amount of data. Essentially, there is a large repository of downloadable Excel files on a website. The site has several different lists of filters, and I have several different parameters that I filter on and then collect data from. Overall, this process requires me to download more than 1,000 Excel files and copy and paste them together.
Does Python have the functionality to automate this process? Essentially what I am doing is setting Filter 1 = A, Filter 2 = B, Filter 3 = C, download file, and then repeat with different parameters and copy and paste files together. If Python is suitable for this, can anyone point me in the direction of a good tutorial or starting point? If not, what language would be more suitable for this for someone with little background?
Thanks!
Personally, I would use Python for this. In particular, I would look at pandas, a powerful data analysis library whose DataFrame object can be used like a headless spreadsheet. I use it for a small number of spreadsheets and it's been very quick. Perhaps take a look at this person's website for more guidance: https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/
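As a rough illustration of the pandas route, here is a minimal sketch that stacks every downloaded spreadsheet in a folder into one DataFrame, replacing the manual copy-and-paste step. It assumes the files share the same column layout; the folder name and the `combine_spreadsheets` helper are hypothetical, and `pd.read_excel` needs an Excel engine such as openpyxl installed to read .xlsx files. The `reader` parameter lets you pass `pd.read_csv` instead if the site also offers CSV downloads.

```python
import glob
import os

import pandas as pd


def combine_spreadsheets(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Stack every file matching pattern in folder into one DataFrame.

    A "source_file" column records which file each row came from.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        df = reader(path)
        df["source_file"] = os.path.basename(path)
        frames.append(df)
    if not frames:
        return pd.DataFrame()  # pd.concat raises on an empty list
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index.
    return pd.concat(frames, ignore_index=True)
```

Usage might look like `combined = combine_spreadsheets("downloads")` followed by `combined.to_excel("combined.xlsx", index=False)`, which again relies on openpyxl being available.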
I'm not 100% sure whether your question was only about spreadsheets; my first paragraph was really about working on the files once you have downloaded them. If you're interested in actually fetching the files or "scraping" the data, look at the Requests library for the HTTP side of things; this might be what you need if there is a RESTful way of doing it. Or look at Scrapy (https://scrapy.org) for web scraping.
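If the site exposes its filters as URL query parameters (an assumption; many sites use forms or JavaScript instead, in which case Requests sessions or Scrapy are the better fit), the "set Filter 1 = A, Filter 2 = B, Filter 3 = C, download, repeat" loop from the question can be sketched with the standard library alone. It uses `itertools.product` to enumerate every filter combination, and `urllib.request` in place of Requests so nothing extra has to be installed. `BASE_URL`, the parameter names, and the values are placeholders.

```python
import itertools
import os
import time
import urllib.parse
import urllib.request

# Hypothetical URL and parameter names: use the ones the real site sends
# (visible in the address bar or the browser's network tab after filtering).
BASE_URL = "https://example.com/export"
FILTERS = {
    "filter1": ["A", "B"],
    "filter2": ["C"],
    "filter3": ["D", "E"],
}


def build_urls(base_url, filters):
    """Yield (params, url) for every combination of filter values."""
    names = list(filters)
    for values in itertools.product(*(filters[name] for name in names)):
        params = dict(zip(names, values))
        yield params, base_url + "?" + urllib.parse.urlencode(params)


def download_all(base_url, filters, out_dir, delay=1.0):
    """Download one file per filter combination into out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for params, url in build_urls(base_url, filters):
        # Assumes filter values are safe to use in file names.
        name = "_".join(f"{k}-{v}" for k, v in params.items()) + ".xlsx"
        path = os.path.join(out_dir, name)
        urllib.request.urlretrieve(url, path)
        saved.append(path)
        time.sleep(delay)  # pause between requests to go easy on the server
    return saved
```

Calling `download_all(BASE_URL, FILTERS, "downloads")` would fetch 2 × 1 × 2 = 4 files here; the resulting folder can then be merged with pandas as described in the first paragraph.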
Sorry if I misunderstood in parts.

How do large static sites make their content effectively searchable? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
One of the most popular tools for generating static sites is Sphinx, which is widely used in the Python community to document code. It converts .rst files into other formats such as HTML and PDF. But how can static documentation made of plain HTML files be searchable without losing performance?
I guess it's done by creating an index (a JSON file, for example) that is loaded via AJAX and interpreted by something like lunr.js. But many major projects in the Python world have huge documentation sets (the Python docs themselves, for example), so how is it possible to build such a good search without creating a gigantic index file that has to be loaded?
You can use a Google search engine to bring Google's power to your site. It is difficult to customize, but powerful. Another reference is in this question.
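The index-file approach described in the question stays manageable because an inverted index stores each distinct word once, with a list of the pages that contain it, rather than the pages' full text. Sphinx's HTML builder, for example, writes a searchindex.js file along these lines (with stemming and extra metadata). A toy sketch of such an index, assuming plain-text pages and no stemming or ranking; the function names are illustrative:

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs: {doc_id: text}. Return {term: sorted list of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def search(index, query):
    """Return the doc_ids that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return sorted(results)
```

`json.dumps(build_index(docs))` produces the kind of file a client-side script would download once and then query locally, so no server is needed at search time.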

Python convention for separating files [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm not sure about the terminology here, which may explain why searching on my own yielded no results.
I was just curious whether there is a widely accepted, general approach to writing a module in Python. Obviously, people prefer splitting things into separate .py files, importing them when needed, and packaging everything into a folder.
What I want to know: is there a general rule for how, when, and why we stop writing things together in one .py file and begin a new one? (And I mean other than obvious things like one .py script for the main job and a preferences.py to handle reading/writing prefs.)
You should split your code into multiple modules when it becomes unwieldy to keep it all in one module. This is to some extent a matter of taste. Note that it may be unwieldy for the code author (e.g., the file is too big to navigate easily) or for the user of the library (e.g., too many unrelated functions/classes jammed together in the same namespace, making them hard to keep track of).

What language do I need to write macros in LibreOffice Calc? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I've written a bunch of VBA code for various things in Excel, and I'm looking at migrating to LibreOffice. Under Tools->Macros->Organize Macros, the two choices are LibreOffice Basic and Python.
Should I learn one of those, both, or something else? Am I wasting my time altogether? Any suggestions appreciated.
Python is the way to go.
Start here: http://wiki.python.org/moin/BeginnersGuide
And no, you're not wasting time.
You'll look back and say, "Why didn't I do it sooner?"
Python's a great skill to learn; I use it for everything. It's the glue language for virtually every tool out there (you can even use it with .NET).
However, documentation for Python + LibreOffice is currently a bit sketchy, although I don't have much experience with Calc.
There is some work-in-progress documentation at http://documenthacker.wordpress.com (or soon www.documenthacker.com). It has examples for working with Writer, rather than Calc, but you might still find it useful.
