LOAD XML INFILE save nested childs as plain


I did some research online, and it seems that LOAD XML INFILE cannot store nested child elements, whether they share the same name or have different names.
imported XML sample here
But is there an option that keeps the whole content of the parent element as plain text? Parsing that content line by line afterwards is not a problem for me.
Please do not tell me to parse it with PHP: it is too slow, and I have many XML files to load, so the terminal is the best solution for me.
So if importing it as plain text is not possible, is there some kind of shell or Python script that could do this instead?
Thanks in advance

Thank you all for correcting the grammar mistakes; it is very useful, and you should earn another badge for helping the community.
Since nobody came up with a solution, I did the following, which helped me:
1) Create a file script.py with these contents:
#!/usr/bin/python3
# coding: utf-8
# Strip the <Image> tags, spaces, and newlines so the nested values become one ';'-separated string.
replacements = {'<Image>': '', '</Image>': ';', ' ': '', '\n': ''}
with open('/var/www/html/XX/data/xml/products.xml') as infile, open('/var/www/html/XXX/data/xml/products_clean.xml', 'w') as outfile:
    for line in infile:
        # items() works in both Python 2 and 3; iteritems() exists only in Python 2
        for src, target in replacements.items():
            line = line.replace(src, target)
        outfile.write(line)
2) Run it from the terminal:
python /var/www/html/script.py
3) Then load the cleaned XML file into MySQL with LOAD XML INFILE as usual, or transform that column into JSON for easier use.
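As an alternative to plain string replacement, the nested children could be flattened with Python's standard xml.etree.ElementTree module before running LOAD XML INFILE. This is only a minimal sketch: the element names Products, Product, Images, and Image, the sample data, and the helper name flatten_children are all assumptions, because the original XML sample is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's element names may differ.
SAMPLE = """<Products>
  <Product>
    <Name>Chair</Name>
    <Images>
      <Image>a.jpg</Image>
      <Image>b.jpg</Image>
    </Images>
  </Product>
</Products>"""


def flatten_children(xml_text, parent_tag="Images", child_tag="Image", sep=";"):
    """Replace the children of every <parent_tag> with one joined text value."""
    root = ET.fromstring(xml_text)
    # list() takes a snapshot so the tree can be modified while looping.
    for parent in list(root.iter(parent_tag)):
        values = [(child.text or "").strip() for child in parent.findall(child_tag)]
        # Drop the nested elements and keep their text as plain content.
        for child in list(parent):
            parent.remove(child)
        parent.text = sep.join(values)
    return ET.tostring(root, encoding="unicode")


flat = flatten_children(SAMPLE)
print(flat)
```

Unlike the replacement table, this leaves spaces and line breaks elsewhere in the document untouched, at the cost of parsing each file fully in memory.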

Related

Need a push to start with a function about text files, I can't figure this out on my own

I don't need the entire code, just a push in the right direction. I've been searching the internet for clues on how to start writing a function like this, but I haven't gotten any further than the name of the function.
I don't have the slightest clue how to start, because I don't know how to work with text files. Any tips?

These text files are CSV (comma-separated values) files, a simple format for storing tabular data.
You may want to explore Python's built-in csv module.
The following snippet is an example of loading a .csv file in Python:
import csv
filename = 'us_population.csv'
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    # each row is a list of strings, one per column
    for row in csvreader:
        print(row)
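As a possible starting point for the function itself, here is a minimal Python 3 sketch. The function name load_rows is hypothetical, and the sketch assumes the first line of the file holds the column names.

```python
import csv


def load_rows(filename):
    """Return the rows of a CSV file as a list of dictionaries keyed by header."""
    # newline='' lets the csv module handle line endings itself.
    with open(filename, newline="") as csvfile:
        return list(csv.DictReader(csvfile))
```

For example, load_rows('us_population.csv') would return one dictionary per data row, which the rest of the function can then filter or sum as needed.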

python cannot open and edit a .reg file

I am trying to edit a .reg file in Python to replace strings in it. I can do this for any other file type, such as .txt.
Here is the Python code:
with open("C:/Users/UKa51070/Desktop/regFile.reg", "r") as myfile:
    data = myfile.read()
print data
It prints an empty string.

I am not sure why you are not seeing any output. Perhaps you could try:
print len(data)
Depending on your version of Windows, your REG file will be saved using UTF-16 encoding, unless you specifically export it using the Win9x/NT4 format.
You could try using the following script:
import codecs
with codecs.open("C:/Users/UKa51070/Desktop/regFile.reg", encoding='utf-16') as myfile:
    data = myfile.read()
print data
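If it is not known in advance which export format a file uses, the byte-order mark at the start of the file can be checked first. The following is a minimal Python 3 sketch, not a definitive implementation: the function name read_reg_text is hypothetical, and falling back to cp1252 for files without a UTF-16 mark is an assumption that may not match every system's ANSI code page.

```python
import codecs


def read_reg_text(path):
    """Read a .reg export, choosing the codec from the byte-order mark if present."""
    with open(path, "rb") as raw:
        head = raw.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The 'utf-16' codec consumes the mark and picks the byte order from it.
        encoding = "utf-16"
    else:
        # Assumption: older-format exports use a single-byte ANSI code page.
        encoding = "cp1252"
    with open(path, encoding=encoding) as reg_file:
        return reg_file.read()
```

The returned string can then be edited with str.replace and written back with the same encoding.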

It's probably not a good idea to edit .reg files manually. My suggestion is to look for a Python package that handles the registry for you. I think the built-in _winreg module (renamed winreg in Python 3) is what you are looking for; note that it works on the registry itself rather than on .reg files.

How do I use ParaView's CSVReader in a Python Script?

How do I use ParaView's CSVReader in a Python Script? An example would be appreciated.

If you have a .csv file that looks like this:
x,y,z,attribute
0,0,0,0
1,0,0,1
0,1,0,2
1,1,0,3
0,0,1,4
1,0,1,5
0,1,1,6
1,1,1,7
then you can import it with a command that looks like this (note the raw string, so that \f in the path is not read as a form-feed escape):
myReader = CSVReader(FileName=r'C:\foo.csv', guiName='foo.csv')
Also, if you don't add the guiName parameter, you can change the name later using the RenameSource command, like this:
RenameSource(proxy=myReader, newName='MySuperNewName')
Credit for the renaming part of this answer goes to Sebastien Jourdain.

Unfortunately, I don't know ParaView at all. But I found "... simply record your work in the desktop application in the form of a python script ..." on their site. If you import a CSV that way, the recorded script might give you a hint.

Improving on @GregNash's answer: if you want to include only a single file (called foo.csv):
outcsv = CSVReader(FileName='foo.csv')
Or, if you want to include all files matching a certain pattern, use glob. For example, if the file names start with the string foo (such as foo.csv.0, foo.csv.1, foo.csv.2):
myreader = CSVReader(FileName=glob.glob('foo*'))
To use glob, you need import glob in the preamble. In general, FileName accepts strings generated with Python, which can describe more complex file patterns and paths.
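One caveat: glob.glob returns matches in arbitrary order, so if the order of the numbered files matters, it may help to sort them by their numeric suffix first. This is a minimal sketch using only the standard library; the helper name numbered_files is hypothetical, and it has not been tested against ParaView itself.

```python
import glob
import re


def numbered_files(pattern):
    """Return the files matching pattern, ordered by their trailing number."""
    def suffix_number(name):
        match = re.search(r"(\d+)$", name)
        # Files without a numeric suffix sort first.
        return int(match.group(1)) if match else -1

    return sorted(glob.glob(pattern), key=suffix_number)
```

The resulting list could then be passed in place of glob.glob('foo*'), for example as FileName=numbered_files('foo.csv.*'), so that foo.csv.2 comes before foo.csv.10.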

Reading command Line Args

I am running a Python script from the prompt like this:
python gp.py /home/cdn/test.in..........
Inside the script, I need to take the path of the input file test.in, and the script should read the file and print its content. The code below was working fine, but the file path is hard-coded in the script. Now I want to pass the path as a command-line argument.
Working Script
#!/usr/bin/python
import sys
inputfile='home/cdn/test.in'
f = open (inputfile,"r")
data = f.read()
print data
f.close()
Script Not Working
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
data = argv[1:].read()
print data
f.close()
What change do I need to make?

While Brandon's answer is a useful solution, the reason your code is not working also deserves explanation.
In short, a list of strings is not a file object. In your first script, you open a file and operate on that object (which is a file object). But writing ['foo','bar'].read() does not make any sense: lists aren't read()able, nor are strings, so 'foo'.read() is clearly nonsense. It would be similar to writing inputfile.read() in your first script.
To make things explicit, here is an example of getting all of the content from all of the files specified on the command line. This does not use fileinput, so you can see exactly what happens.
import sys
# iterate over the filenames passed on the command line
for filename in sys.argv[1:]:
    # open the file, assigning the file object to the variable 'f'
    with open(filename, 'r') as f:
        # print the content of this file
        print f.read()
# Done.
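The same loop can be wrapped in small functions, which makes the handling of a missing argument explicit. This is a minimal Python 3 sketch; the names read_files and main are hypothetical, and the usage message assumes the script is called gp.py as in the question.

```python
def read_files(paths):
    """Return the content of each named file, in the order given."""
    contents = []
    for path in paths:
        with open(path, "r") as handle:
            contents.append(handle.read())
    return contents


def main(argv):
    """Print every file named in argv[1:]; argv[0] is the script name."""
    if len(argv) < 2:
        print("usage: python gp.py FILE [FILE ...]")
        return 1
    for text in read_files(argv[1:]):
        print(text)
    return 0
```

In a real script, main would be called with sys.argv (after import sys), and its return value could be passed to sys.exit.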

Check out the fileinput module: it interprets command line arguments as filenames and hands you the resulting data in a single step!
http://docs.python.org/2/library/fileinput.html
For example:
import fileinput
for line in fileinput.input():
    print line

In the script that isn't working for you, you are simply not opening the file before reading it. So change it to
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
# sys.argv[1] is the first argument; open() needs a single path, not a list
f = open(sys.argv[1], "r")
data = f.read()
print data
f.close()
Also, f.close() in your script would raise an error because f has not been defined. The changes above take care of that.
BTW, some coding standards recommend variable names that are at least 3 characters long.

Error with urlopen: new-line character seen in unquoted field

I am using urllib.urlopen with Python 2.7 to read csv files located on an external webserver:
# Try & Except statements removed for clarity
import urllib
import csv
url = ...
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    do_something()
All 100+ files can be read fine, except one that has been updated recently and that returns:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The file is accessible here. According to my text editor, its mode is Mac (CR), as opposed to Windows (CRLF) for the other files.
Based on this thread, I found that Python's urlopen handles all newline formats correctly. Therefore, the problem is likely to come from somewhere else, but I have no clue where. The file opens fine in all my text editors and spreadsheet editors.
Does anyone have any idea how to diagnose the problem?
* EDIT *
The creator of the file informed me by email that I was not the only one to experience such issues, so he decided to generate the file again. The code above now works fine. Unfortunately, the new file also means that the issue can no longer be reproduced and the proposed solutions cannot be tested properly.
Before closing the question, I want to thank everyone who dedicated some of their time to working out a solution and posting it here.

It might be a corrupt .csv file. Otherwise, this code runs perfectly:
#!/usr/bin/python
import urllib
import csv
url = "http://www.football-data.co.uk/mmz4281/1213/I1.csv"
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    print row
Credits to J.F. Sebastian for the .csv file.
Although, you might want to consider sharing the specific .csv file with us, so we can try to re-create the error.

The following code runs without any error:
#!/usr/bin/env python
import csv
import urllib2
r = urllib2.urlopen('http://www.football-data.co.uk/mmz4281/1213/I1.csv')
for row in csv.reader(r):
    print row

I was having the same problem with a downloaded csv.
I know the fix would be to use open with 'rU'. But I would rather not save the file to disk just to read it back into a variable. That seems unnecessary.
file = open(filepath,'rU')
mydata = csv.reader(file)
So if someone has a better solution, that would be nice. Stack Overflow links that got me this far:
CSV new-line character seen in unquoted field error
Open the file in universal-newline mode using the CSV Django module
I found what I actually wanted with StringIO, or cStringIO, or io:
Using Python, how do I to read/write data in memory like I would with a file?
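I ended up keeping the download in memory. Note that io.open expects a file name rather than file content, so the downloaded string cannot be passed to it; splitting the string into lines and handing those to csv.reader works instead:
import csv
import urllib2
# warning: it's a 20MB csv
url = 'http://poweredgec.com/latest_poweredge-11g.csv'
urlRead = urllib2.urlopen(url).read()
# splitlines() splits on CR, LF, and CRLF, so no temporary file is needed
csvCurrent = csv.reader(urlRead.splitlines())
csvTuple = map(tuple, csvCurrent)
print csvTuple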
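In Python 3, the in-memory idea can be sketched with io.StringIO, which wraps a string in a file-like object without touching the disk. This is a minimal sketch under stated assumptions: the function name rows_from_bytes is hypothetical, the sample bytes are made up, and UTF-8 is only an assumed default encoding for the download.

```python
import csv
import io


def rows_from_bytes(raw, encoding="utf-8"):
    """Parse downloaded CSV bytes in memory, whatever the line-ending style."""
    text = raw.decode(encoding)
    # newline='' passes CR, LF, and CRLF through untranslated,
    # so csv.reader can split the records itself.
    return [tuple(row) for row in csv.reader(io.StringIO(text, newline=""))]


# Made-up sample using Mac-style (CR only) line endings.
print(rows_from_bytes(b"Div,Date\rI1,25/08/12\r"))
```

The bytes would normally come from something like urllib.request.urlopen(url).read(); they are hard-coded here so the example runs without network access.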
