Remove quotes inside XML text tag - python

I asked this question a couple of days ago, and I am now facing an issue with some XML files after iterating through all of them.
I have found that some values contain quotes, like <restaurant> L'amour </restaurant>, and when I try to parse such a value into a dictionary it raises an error because of the single quote character inside the value. Is there a way to add double quotes to, preferably, all the values inside the XML so that the error can be avoided, and then remove the double quotes after the list of dictionaries is generated?
Or, perhaps there is another approach to this issue? Thank you very much.
Edit:
This is an example of the string I am having trouble with:
s1 = "{'uno': 'l'ebe'}"
ast.literal_eval(mydict(s1))
This throws an "invalid syntax" error.

Have you tried replacing the single quotes in the value?
value.replace("'", "\"")
You can then revert the replacement when displaying the value, or you can escape the single quote when saving into the dictionary, so that it is stored escaped.
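A minimal sketch of the "store it escaped" idea, assuming the dictionary string is built by concatenation somewhere in the asker's code (the question does not show how). The names value, broken and safe are made up for illustration. Instead of swapping quote characters by hand, repr() can choose a quoting style that keeps the apostrophe valid:

```python
import ast

value = "l'ebe"

# Pasting the raw value between single quotes reproduces the failing string.
broken = "{'uno': '" + value + "'}"
try:
    ast.literal_eval(broken)
    parse_failed = False
except SyntaxError:
    parse_failed = True

# repr() emits a valid Python string literal, so the apostrophe survives.
safe = "{'uno': " + repr(value) + "}"
parsed = ast.literal_eval(safe)
print(parsed["uno"])
```

With this approach nothing has to be reverted afterwards, because the parsed value is the original text.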

Related

How to write Json/dict into file with pretty print along with single quotes?

I'm new to stack overflow. It's my first question here.
I have data (JSON/dict). I'm writing it to a file, prefixed with its corresponding dictionary name, as below:
with open(file_name, 'w') as fh:
    fh.write("dict_name = " + json.dumps(dict_values, indent=2))
This works fine, but the output uses double quotes. I need single quotes instead, along with pretty printing or some other readable format.
I tried json.dumps(dict_values,indent=2).replace("\"","'"), which does replace the double quotes with single quotes, but it is neither a good idea nor a correct way to handle a dictionary. Could you please help me?
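One possible direction, offered as a sketch rather than a definitive answer: the standard library's pprint.pformat formats a dictionary as a Python literal, which uses single quotes for strings that contain no apostrophes. The sample dict_values below is invented for illustration, and the result is Python syntax, not JSON.

```python
import pprint

# Hypothetical sample data standing in for the asker's dictionary.
dict_values = {"name": "example", "tags": ["a", "b"], "nested": {"count": 2}}

# pformat returns the pretty-printed Python literal as a string.
text = "dict_name = " + pprint.pformat(dict_values, indent=2, width=40)
print(text)
```

A string value that itself contains an apostrophe would still be written with double quotes, since that is how Python's repr keeps the literal valid.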

How can I load a string like this( {u'facebook': {u'identifier': u'http://www.facebook.com/71'}} ) into JSON

I got this string from another python crawler program.
{u'facebook': {u'identifier': u'http://www.facebook.com/71'}}
I have read most of the questions about JSON, and I know the problem is the single quote marks, but how do I convert them into double quote marks?
I have tried json.dump(), but it only adds a pair of double quote marks around the string.
"{u'facebook': {u'identifier': u'http://www.facebook.com/71'}} "
I have also tried to use demjson, but the result is the same as above.
Actually, I only need the string after "identifier". How can I get that? Thanks in advance.
Your string "{u'facebook': {u'identifier': u'http://www.facebook.com/71'}} " doesn't look like JSON. First it has single quotes (as you already found out) and it also has unicode prefixes: u.
That string actually looks like valid python, so you can use the ast.literal_eval function to parse it into a dict
from ast import literal_eval
dictionary = literal_eval("{u'facebook': {u'identifier': u'http://www.facebook.com/71'}}")
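As a short sketch of the remaining steps, assuming the crawler output always has the shape shown in the question: once the string is parsed, the identifier is an ordinary nested lookup, and json.dumps produces real JSON with double quotes if that is still needed. The names raw, identifier and as_json are illustrative.

```python
import json
from ast import literal_eval

raw = "{u'facebook': {u'identifier': u'http://www.facebook.com/71'}}"

# Parse the Python literal into a real dict.
data = literal_eval(raw)

# The value the asker wanted is a plain nested lookup.
identifier = data['facebook']['identifier']
print(identifier)

# Serialising the dict gives valid JSON with double quotes.
as_json = json.dumps(data)
print(as_json)
```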

Remove duplicates from a list in Python

I have a python script which parses an xml file and then gives me the required information. My output looks like this, and is 100% correct:
output = ['77:275,77:424,77:425,77:426,77:427,77:412,77:413,77:414,77:412,77:413,77:414,77:412,77:413,77:414,77:412,77:413,77:414,77:431,77:432,77:433,77:435,77:467,77:470,77:471,77:484,77:485,77:475,77:476,77:437,77:438,77:439,77:440,77:442,77:443,77:444,77:445,77:446,77:447,77:449,77:450,77:451,77:454,77:455,77:456,77:305,77:309,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:317,77:321,77:346,77:349,77:350,77:351,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:362,77:367,77:369,77:374,77:370,77:372,77:373,77:387,77:388,77:389,77:392,77:393,77:394,77:328,77:283,77:284,77:285,77:288,77:289,77:290,77:292,']
It is all fine, but I want to remove the duplicate entries within an element, as in the case above. I tried using the OrderedDict package and also a simple list(set(output)), but obviously neither worked. Does anyone have a tip for me on how to solve this problem?
You have one element in a list. If you expected it to be treated as separate elements, you need to explicitly split it.
You could split the string on the ',' comma character into a list with str.split():
separate_elements = output[0].split(',')
after which you can use set() (unordered) or OrderedDict (maintaining order) and re-join the string if you still need just the one string object:
','.join(set(separate_elements))
You can put that back into a list with just one element, but there is little point if all you ever handle is that one string.
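Putting those steps together, here is a small sketch using a shortened version of the list from the question. It uses OrderedDict.fromkeys to keep the first occurrence of each entry in order, and it drops the empty piece that the trailing comma leaves behind after splitting. The names parts, unique and deduped are illustrative.

```python
from collections import OrderedDict

# Shortened sample in the same shape as the question's output.
output = ['77:275,77:424,77:412,77:413,77:412,77:413,77:431,']

# Split the single string and discard the empty piece after the last comma.
parts = [part for part in output[0].split(',') if part]

# fromkeys keeps the first occurrence of each entry, in original order.
unique = list(OrderedDict.fromkeys(parts))

# Re-join into a one-element list, matching the original structure.
deduped = [','.join(unique)]
print(deduped)
```

On Python 3.7 and later a plain dict.fromkeys(parts) should behave the same way, since regular dictionaries preserve insertion order there.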

Where to post code to be checked

I apologise if this is an obvious question. I would like to know where to post code to be error-checked in the future, as I am teaching myself Python and am constantly hitting stumbling blocks in my code. The cause is usually blindingly obvious, as with the first dictionary error, for which I apologise.
original dictionary error sorted
Missed out quote marks on 2nd line of code
I am making a dictionary variable, but there appears to be a problem with it.
charAttr = {'Power':'5','Health':'5','Wisdom':'5','Dexterity':'5'}
basePow=int(charAttr[Power])
I am then given "NameError: name 'Power' is not defined."
Either use single quotes ('Power') or double quotes ("Wisdom") to make a string literal. Double quotes are not the same as two single quotes.
As to your more general question: StackOverflow is indeed a place for such things, but in general, you should provide more information with your question. The code you posted creates an error message: so you should post that error message. There's lots of information here on what makes a good question; I definitely recommend you read up on it.
Two single quote != one double quote.
So not
''
but:
"
or you can use single quote as... single quote :P
Correct form is:
charAttr = {'Power':'5','Health':'5','Wisdom':'5','Dexterity':'5'}
or
charAttr = {"Power":"5","Health":"5","Wisdom":"5","Dexterity":"5"}
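For completeness, a small sketch of the corrected lookup from the question: the key has to be a quoted string, otherwise Python looks for a variable called Power and raises the NameError.

```python
charAttr = {'Power': '5', 'Health': '5', 'Wisdom': '5', 'Dexterity': '5'}

# Quoting the key makes it a string literal instead of a variable name.
basePow = int(charAttr['Power'])
print(basePow)
```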

Python CSV module - quotes go missing

I have a CSV file that has data like this
15,"I",2,41301888,"BYRNESS RAW","","BYRNESS VILLAGE","NORTHUMBERLAND","ENG"
11,"I",3,41350101,2,2935,2,2008-01-09,1,8,0,2003-02-01,,2009-12-22,2003-02-11,377016.00,601912.00,377105.00,602354.00,10
I am reading this and then writing different rows to different CSV files.
However, in the original data there are quotes around the non-numeric fields, as some of them contain commas within the field.
I am not able to keep the quotes.
I have researched this a lot and discovered quoting=csv.QUOTE_NONNUMERIC; however, this now results in quote marks around every field and I don't know why.
If I try one of the other quoting options, like MINIMAL, I end up with an error message about the date value, 2008-01-09, not being a float.
I have tried creating a dialect and adding the quoting option to the csv reader and writer, but nothing I have tried gives an exact match to the original data.
Has anyone had this same problem and found a solution?
When writing, quoting=csv.QUOTE_NONNUMERIC keeps values unquoted as long as they're numbers, i.e. if their type is int or float (for example), which means it will write what you expect.
Your problem could be that, when reading, a csv.reader will turn every row it reads into a list of strings (if you read the documentation carefully enough, you'll see a reader does not perform automatic data type conversion).
If you don't perform any kind of conversion after reading, then when you write you'll end up with everything in quotes, because everything you write is a string.
Edit: of course, date fields will be quoted, because they are not numbers, meaning you cannot get the exact expected behaviour using the standard csv.writer.
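A rough sketch of the conversion step described above, using the first sample row from the question. The helper to_number is hypothetical (it is not part of the csv module); it simply tries int and then float and otherwise leaves the field as a string.

```python
import csv
import io

def to_number(field):
    """Hypothetical helper: return an int or float if the text parses, else the text."""
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field

source = '15,"I",2,41301888,"BYRNESS RAW","","BYRNESS VILLAGE","NORTHUMBERLAND","ENG"\r\n'

# The reader yields strings only, so convert numeric-looking fields by hand.
rows = [[to_number(field) for field in row] for row in csv.reader(io.StringIO(source))]

# With real ints in the rows, QUOTE_NONNUMERIC quotes only the strings.
out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
print(out.getvalue())
```

This round-trips the first sample row, but it would not reproduce the second one exactly: the dates and the empty unquoted field would come back quoted, and a value like 377016.00 would be written as 377016.0.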
Are you sure you have a problem? The behavior you're describing is correct: The csv module will enclose strings in quotes only if it's necessary for parsing them correctly. So you should expect to see quotes only around strings containing a comma, newlines, etc. Unless you're getting errors reading your output back in, there is no problem.
Trying to get an "exact match" of the original data is a difficult and potentially fruitless endeavor. quoting=csv.QUOTE_NONNUMERIC put quotes around everything because every field was a string when you read it in.
Your concern that some of the "quoted" input fields could have commas is usually not that big a deal. If you added a comma to one of your quoted fields and used the default writer, the field with the comma would be automatically quoted in the output.
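To illustrate that last point, here is a tiny sketch with the default writer settings. The row is invented for illustration, with a comma added to one field:

```python
import csv
import io

buffer = io.StringIO()

# The default quoting mode quotes only the fields that need it.
csv.writer(buffer).writerow([15, "I", "BYRNESS, RAW", "ENG"])
line = buffer.getvalue()
print(line)
```

Only the field containing the comma is quoted in the output; the other strings are written bare, and the file still reads back into the same four fields.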
