I have to extract from several sources and save the result to csv files. When the data received for a field is None, I want it represented as empty string. (I realize I'll lose the distinction between empty string and NULL-type values in the source.)
My underlying requirement is to preserve the distinction between the string "None" and nothingness in a plain delimited file, without using quotes.
I hope to avoid calling a function that checks whether a value is null for every nullable field I write, or at least to keep that call out of the explicit code. For example, I want to just write f.write(row['LastName']) and get "None" if LastName is the string "None", but an empty string if it is of NoneType.
I haven't yet investigated whether the standard csv library can do what I need. I will do that, since it seems likely to be the easiest approach.
But is there anything I can override, so that if I write None to a file, I get an empty string (or something besides the string "None") in the output file?
It seems to me I'd have to change either 1) the built-in write method of _io.TextIOWrapper or 2) the __str__ method of the NoneType.
Testing row['LastName'] is None checks whether your value is of NoneType.
Something like:
if row['LastName'] is None:
    f.write("")
else:
    f.write(row['LastName'])
would get the job done.
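The standard csv module mentioned in the question already writes None as an empty field, so the explicit check can often be dropped. A minimal sketch, assuming the rows are dictionaries; the field names, the sample rows, and the in-memory buffer are made up for illustration:

```python
import csv
import io

# Hypothetical rows: one real "None" string, one missing value.
rows = [
    {"FirstName": "Ann", "LastName": "None"},
    {"FirstName": "Bob", "LastName": None},
]

# StringIO stands in for a real file opened with newline="".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["FirstName", "LastName"])
writer.writeheader()
writer.writerows(rows)  # None becomes an empty field, "None" stays as text

output = buffer.getvalue()
print(output)
```

The string "None" and the None object stay distinguishable in the output, and no quoting is needed for these values.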
I'm working with an external compiled API that does some XML manipulation prior to screen rendering. I am passing data to and from the API, and I want it to break cleanly so I can see where it is broken. This means I need to be able to send it text, rather than Python's internal formatting for types.
So I need to debug what types I'm sending and receiving. type() hands back xml?, and isinstance() would require testing every possible type.
So is there an alternative that will give me the stringified type that is suitable for inline evaluation?
Example:
mystr = str("")
print(type(mystr))
returns <class 'str'>
If I am passing angle brackets into a binary API, I have no idea what it is doing with the data, or whether it will render in the UI at all. It does its own string parsing based on code I don't have, or even want to know.
The only part I care about is: "str".
So I have:
mystr = str("")
blackbox.String(mystr)
Where the contents and even the type of mystr are unknown (the string value is also handed to me by a black box). The API is rendering nothing, and I don't know whether that is because of bad string formatting, a bad type, or a string that is simply empty. I know it is SUPPOSED to be a text string with length, but I don't know if it is. If I use isinstance() I have to know what I'm testing for, which tells me nothing if I'm getting something weird. If I use type() I am sending something weird to the rendering engine. So I am screwed both ways.
What I need is:
blackbox.String(type(mystr))
where the result of type is a plain ASCII type name without punctuation, so that I can reasonably assess that the black box is giving and getting a plain, untainted text string.
The angle brackets are a representational convention of the output of type() rather than XML as such.
The output of type() is an object, so you can access its __name__ attribute to get the name of the type.
>>> type('')
<class 'str'>
>>> type('').__name__
'str'
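As a small sketch of how this could be used for debugging, the helper below (type_name is a made-up name, and the sample values are arbitrary) returns the bare type name for any value, so nothing with angle brackets is ever passed along:

```python
def type_name(value):
    """Return the bare class name of value, e.g. 'str' rather than <class 'str'>."""
    return type(value).__name__


# Arbitrary sample values, including the "weird" cases such as None and bytes.
samples = ["", 42, None, b"raw", 3.5, [1, 2]]
names = [type_name(v) for v in samples]
print(names)
```

Because no isinstance() check is involved, an unexpected type simply shows up under its own name.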
In a dictionary, I have the following value containing equals signs:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace each = with its Unicode escape '\u003d' (basically the reverse of what json.loads() does). How can I put that escape into a variable without ending up with two backslashes (\\u003d)?
I've tried different ways, including encode/decode, repr(), unichr(61), etc., and even after a lot of searching I couldn't find anything that does this. Every approach gives me the following final result (or the original value):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, it shows the value of the variable with two escapes. The program then takes this value, extra escape included, and uses it in the following actions. I'm using this code to construct JSON with json.dumps(), and the result returned is a unicode string with two escapes.
Below is a print of the final result after the JSON construction. I need to find a way to store the value in the variable with just one escape.
I don't know if it makes a difference, but I'm doing this for a custom Burp plugin that manipulates some selected requests.
Here is an image of my POC, getting the value of the variable.
The extra backslash is not actually added. The Python interpreter displays the repr() of the string, which doubles the backslash to show that it is a literal backslash rather than the start of an escape such as \t or \n. Printing the string shows the single backslash:
I hope this helps:
>>> t['appVersion'] = t["appVersion"].replace('=', '\u003d')
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == 'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True
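The session above relies on Python 2 byte strings, where '\u003d' is kept as six literal characters. In Python 3 the same literal is simply '=', so the backslash has to be written as "\\u003d" (or a raw string). A minimal Python 3 sketch, assuming the goal is a JSON body in which each = appears as \u003d; it serializes first and escapes afterwards, so json.dumps() never sees the backslash and cannot double it:

```python
import json

payload = {"appVersion": "o0u5jeWA6TwlJacNFnjiTA=="}

# Serialize first, then escape: each "=" becomes the six characters \u003d.
body = json.dumps(payload).replace("=", "\\u003d")

print(body)        # displays single backslashes
print(repr(body))  # displays doubled backslashes; same string, different display

# A JSON parser turns the escapes back into "=".
decoded = json.loads(body)
```

Escaping after serialization is a design choice for this sketch, not something the original code is known to do.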
I'm trying to insert a Unix timestamp into a web service using REST. When I convert the dictionary I get the value 1392249600000L. I need this value to be an integer.
So I tried int(1392249600000L) and I get 1392249600000L, still a long value.
The reason I need this is that the JSON web service only accepts timestamps with milliseconds in them, but when I pass the JSON value with the 'L' in it I get an "invalid JSON primitive" error for the value 1392249600000L.
Can someone please help me resolve this? It seems like it should be so easy, but it's driving me crazy!
You should not be using Python representations when you are sending JSON data. Use the json module to represent integers instead:
>>> import json
>>> json.dumps(1392249600000L)
'1392249600000'
The L is only part of the string representation, there to make debugging easier by making it clear you have a long rather than an int value. Don't use Python string representations for network communications.
For example, if you have a list of Python values, the str() representation of that list will also use repr() representations of the contents of the list, resulting in L postfixes for long integers. json.dumps() handles such cases properly, and handles other types correctly too (Python None to JSON null, Python True to JSON true, etc.):
>>> json.dumps([1392249600000L, True, None])
'[1392249600000, true, null]'
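The sessions above are Python 2. In Python 3, int and long are a single type and no L suffix exists, but the advice is the same: let json.dumps() produce the text. A short sketch, where the "timestamp" key and the chosen date are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

# 13 February 2014, midnight UTC, as milliseconds since the epoch.
moment = datetime(2014, 2, 13, tzinfo=timezone.utc)
millis = int(moment.timestamp() * 1000)

# json.dumps() emits a plain JSON number, whatever the integer's size.
body = json.dumps({"timestamp": millis})
print(body)
```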
I am using Python and XMLBuilder, a module I downloaded from PyPI. It returns an object that works like a string (I can do print(x)), but when I use file.write(x) it crashes and throws an error in the XMLBuilder module.
I am just wondering how I can convert the object it returns into a string?
I have confirmed that I am writing to the file correctly.
I have already tried, for example, x = y, although, as I thought, it just creates a pointer, and also x = x + " ", but I still get an error. It also returns a string-like object with "\n".
Any help on the matter would be greatly appreciated.
file.write(str(x))
will likely work for you.
Background information: Most types have a function __str__ or __repr__ (or both) defined. If you pass an object of such a type to print, it'll recognize that you did not pass a str and try to call one of these functions in order to convert the object to a string.
However, not all functions are as smart as print and will fail if you pass them something that is not a string. Also string concatenation does not work with mixed types. To work with these functions you'll have to convert the non-string-type objects manually, by wrapping them with str(). So for example:
x = str(x)+" "
This will create a new string and assign it to the variable x, which held the object before (you lose that object now!).
The library has __str__ defined:
def __str__(self):
    return tostring(~self, self.__document()['encoding'])
So you just need to use str(x):
file.write(str(x))
I'm not quite sure what your question is, but print automatically calls str on all of its arguments. So if you want the same output as print to be put into your file, then myfile.write(str(x)) will put the same text in myfile that print(x) would have produced (minus the trailing newline that print adds).
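To make the difference between print and write concrete, here is a minimal sketch. Node is a hypothetical stand-in for whatever object the library returns, and io.StringIO stands in for an open text file:

```python
import io


class Node:
    """Hypothetical stand-in for an object returned by a third-party library."""

    def __init__(self, tag):
        self.tag = tag

    def __str__(self):
        return "<%s />" % self.tag

    def __repr__(self):
        return "Node(%r)" % self.tag


x = Node("root")
out = io.StringIO()  # stands in for an open text file

error = None
try:
    out.write(x)  # write() accepts only str, so this raises TypeError
except TypeError as exc:
    error = exc

out.write(str(x))  # explicit conversion uses __str__, just as print(x) does
written = out.getvalue()
print(written)
```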
When you write:
print myObject
the object's __str__ method is called, falling back to __repr__ if the class does not define __str__.
So for example you could do x += str(myXMLObject) if you want to append the string representation of that object to your x variable.
I have a dictionary of dictionaries in Python:
d = {"a11y_firesafety.html":{"lang:hi": {"div1": "http://a11y.in/a11y/idea/a11y_firesafety.html:hi"}, "lang:kn": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn"}}}
I have this in a JSON file and I encoded it using json.dumps(). Now when I decode it using json.loads() in Python I get a result like this:
temp = {u'a11y_firesafety.html': {u'lang:hi': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi'}, u'lang:kn': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn'}}}
My problem is with the "u" that appears in front of every item in temp (my dictionary of dictionaries) and signifies a Unicode string. How do I get rid of that "u"?
Why do you care about the 'u' characters? They're just a visual indicator; unless you're actually using the result of str(temp) in your code, they have no effect on your code. For example:
>>> test = u"abcd"
>>> test == "abcd"
True
If they do matter for some reason, and you don't care about consequences like not being able to use this code in an international setting, then you could pass in a custom object_hook (see the json docs) to produce dictionaries with string contents rather than unicode.
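A short sketch of the "visual indicator" point, written for Python 3, where every str is Unicode and no prefix is displayed at all. Under Python 2 the u appears only in the repr of the data, never in printed values or in re-serialized JSON. The JSON text is a shortened copy of the data in the question:

```python
import json

text = '{"a11y_firesafety.html": {"lang:hi": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi"}}}'

temp = json.loads(text)

# Individual values are ordinary text; printing one shows no prefix.
url = temp["a11y_firesafety.html"]["lang:hi"]["div1"]
print(url)

# Serializing back to JSON yields plain JSON text, again with no prefix.
round_trip = json.dumps(temp)
print(round_trip)
```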
You could also use this:
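import fileinput
fout = open("out.txt", 'a')
for line in fileinput.input("in.txt"):
    # Strip the u prefix from u"..." and u'...' in the dumped text.
    # Note: this also alters any data that happens to contain u" or u'.
    cleaned = line.replace("u\"", "\"").replace("u\'", "\'")
    fout.write(cleaned)
fout.close()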
Typical JSON responses from standard websites, once decoded and printed in Python, show these two prefix forms: u' and u"
This snippet gets rid of both of them. It may not be required, since the prefix doesn't hinder any logical processing, as mentioned by the previous commenter.
There is no "unicode" encoding, since unicode is a different data type. I don't really see any reason unicode would be a problem, since you can always convert it to a string with e.g. foo.encode('utf-8').
However, if you really want to have string objects up front, you should probably create your own decoder class and use it while decoding JSON.