removing double quote from json.dumps of string data - python

I have some data that I'm retrieving from a data feed as text. For example, I receive the data like the following:
1105488000000, 34.1300, 34.5750, 32.0700, 32.2800\r\n
1105574400000, 32.6750, 32.9500, 31.6500, 32.7300\r\n
1105660800000, 36.8250, 37.2100, 34.8650, 34.9000\r\n
etc.
(This is stock data, where the first column is the timestamp, the next columns are the open, high, low, and close price for the time period.)
I want to convert this into a json such as the following:
[
[1105488000000, 34.1300, 34.5750, 32.0700, 32.2800],
[1105574400000, 32.6750, 32.9500, 31.6500, 32.7300],
[1105660800000, 36.8250, 37.2100, 34.8650, 34.9000],
...
The code that I'm using is:
lines = data.split("\r\n")
output = []
for line in lines:
    currentLine = line.split(",")
    currentLine = [currentLine[0], currentLine[1], currentLine[2], currentLine[3], currentLine[4]]
    output.append(currentLine)
jsonOutput = json.dumps(output)
However, when I do this, I'm finding that the values are:
[
["1105488000000", "34.1300", "34.5750", "32.0700", "32.2800"],
["1105574400000", "32.6750", "32.9500", "31.6500", "32.7300"],
["1105660800000", "36.8250", "37.2100", "34.8650", "34.9000"],
Is there any way for me to get the output without the double quotes?

Pass the data through the int() or float() constructors before outputting in order to turn them into numbers.

...
currentLine = [float(i) for i in currentLine]
output.append(currentLine)
...

Change
currentLine = [currentLine[0] , currentLine[1] , currentLine[2], currentLine[3], currentLine[4]]
output.append(currentLine)
to
currentData = list(map(lambda num: float(num.strip()), currentLine))
output.append(currentData)
Whenever you initialize currentLine with
currentLine = line.split(",")
all the elements of currentLine are strings. So, whenever you write this to JSON, you get JSON strings throughout. By converting all the strings to numbers, you get something without quotes. Also, I added the strip() calls to handle leading and trailing whitespace as is shown in your data example.
P.S. Please don't use the same variable name for two completely different things. It's clearer to use currentLine for the list of strings, and currentData for the list of numbers.
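Putting both answers together, a minimal sketch might look like the following. It assumes every non-empty line holds exactly five comma-separated fields, and the name feed_to_json is made up for illustration. The timestamp is converted with int() so it is not written out as 1105488000000.0. Note that JSON numbers do not keep trailing zeros, so 34.1300 comes out as 34.13.

```python
import json


def feed_to_json(data):
    """Convert the comma-separated feed text into a JSON array of numeric rows."""
    rows = []
    for line in data.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing "\r\n"
        fields = [field.strip() for field in line.split(",")]
        # First column is the timestamp (int), the next four are prices (float).
        rows.append([int(fields[0])] + [float(f) for f in fields[1:5]])
    return json.dumps(rows)


sample = (
    "1105488000000, 34.1300, 34.5750, 32.0700, 32.2800\r\n"
    "1105574400000, 32.6750, 32.9500, 31.6500, 32.7300\r\n"
)
print(feed_to_json(sample))
# [[1105488000000, 34.13, 34.575, 32.07, 32.28], [1105574400000, 32.675, 32.95, 31.65, 32.73]]
```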

Related

How to split a comma-separated line if the chunk contains a comma in Python?

I'm trying to split the current line into 3 chunks.
The title column contains a comma, which is also the delimiter:
1,"Rink, The (1916)",Comedy
My current code does not work:
id, title, genres = line.split(',')
Expected result
id = 1
title = 'Rink, The (1916)'
genres = 'Comedy'
Any thoughts how to split it properly?
Ideally, you should use a proper CSV parser and specify that double quote is the quote character. If you must proceed with the current string as the starting point, here is a regex trick which should work:
import re
inp = '1,"Rink, The (1916)",Comedy'
parts = re.findall(r'".*?"|[^,]+', inp)
print(parts) # ['1', '"Rink, The (1916)"', 'Comedy']
The regex pattern works by first trying to find a term "..." in double quotes. That failing, it falls back to finding a CSV term which is defined as a sequence of non comma characters (leading up to the next comma or end of the line).
Let's talk about why your code does not work:
id, title, genres = line.split(',')
Here line.split(',') returns 4 values (since the line contains 3 commas), but you are unpacking into 3 names, hence you get:
ValueError: too many values to unpack (expected 3)
My advice would be to use a delimiter other than a comma:
"1#\"Rink, The (1916)\"#Comedy"
and then
id, title, genres = line.split('#')
Use the csv package from the standard library:
>>> import csv, io
>>> s = """1,"Rink, The (1916)",Comedy"""
>>> # Load the string into a buffer so that csv reader will accept it.
>>> reader = csv.reader(io.StringIO(s))
>>> next(reader)
['1', 'Rink, The (1916)', 'Comedy']
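As a rough sketch of how this extends to several lines with the unpacking from the question, assuming every non-empty row has exactly three fields. The function name parse_movies and the second sample row are made up for illustration.

```python
import csv
import io


def parse_movies(text):
    """Return (id, title, genres) tuples from CSV text with quoted titles."""
    movies = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        movie_id, title, genres = row
        movies.append((int(movie_id), title, genres))
    return movies


# The second row is invented sample data, not from the question.
sample = '1,"Rink, The (1916)",Comedy\n2,"Kid, The (1921)",Comedy|Drama\n'
for movie_id, title, genres in parse_movies(sample):
    print(movie_id, title, genres)
```

The csv reader removes the surrounding double quotes itself, so the title needs no further clean-up.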
Well, you can do it by making it a tuple:
line = (1, "Rink, The (1916)", "Comedy")
id, title, genres = line

Efficiently create list of list of list with varying amount of input

I have a .txt file with floating point numbers inside. This file always contains an even number of values which need to be formatted as follows: [[[a,b],[c,d],[e,f]]]
The values always need to be in pairs of two, even when there are fewer or more values: [[[a,b], ... [y,z]]]
So it needs to go from this:
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this:
[[[3.31497114423,50.803721015],[7.09205325687,50.803721015],[7.09205325687,53.5104033474],[3.31497114423,53.5104033474],[3.31497114423,50.803721015]]]
I have the feeling this can be done fairly easily and efficiently. The code I have so far works, but is far from efficient...
with open(filename) as f:
    for line in f:
        footprint = line.strip()
        splitted = footprint.split(' ')
        list_str = []
        for coordinate in splitted:
            list_str.append(coordinate.replace(',', ''))
        list_floats = [float(x) for x in list_str]
        footprint = [list_floats[x:x+2] for x in range(0, len(list_floats), 2)]
        return [footprint]
Any help is greatly appreciated!
The split function is very useful in scenarios such as these.
with open(filename) as f:
    # Split the string of numbers into a list of "x y" pairs at each comma
    new_list = f.read().split(", ")
# Split every pair at the space and convert the two strings into floats
for i in range(len(new_list)):
    new_list[i] = list(map(float, new_list[i].split(" ")))
new_list = [new_list]
The first split converts the text from this
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this
['3.31497114423 50.803721015', '7.09205325687 50.803721015', '7.09205325687 53.5104033474', '3.31497114423 53.5104033474', '3.31497114423 50.803721015']
The second split converts that to this
[['3.31497114423', '50.803721015'], ['7.09205325687', '50.803721015'], ['7.09205325687', '53.5104033474'], ['3.31497114423', '53.5104033474'], ['3.31497114423', '50.803721015']]
Then the mapping of the float function converts it to this (the list converts the map object to a list object)
[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]
The last brackets place the whole thing into another list
[[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]]
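The same steps can be collapsed into one nested list comprehension. This is only a sketch, assuming the pairs are separated by commas and the two values in a pair by whitespace; parse_footprint is a made-up name.

```python
def parse_footprint(text):
    """Turn 'a b, c d, ...' into [[[a, b], [c, d], ...]] with float values."""
    pairs = [
        [float(value) for value in pair.split()]  # "a b" -> [a, b]
        for pair in text.split(",")               # one chunk per pair
        if pair.strip()                           # ignore empty chunks
    ]
    return [pairs]


sample = "3.31497114423 50.803721015, 7.09205325687 50.803721015\n"
print(parse_footprint(sample))
# [[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015]]]
```

Calling split() without an argument splits on any run of whitespace, so stray spaces or a trailing newline do not need separate handling.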

I can't remove whitespace

I have json file which is inserted in a sqlite database.
After inserting, all non-breaking spaces are automatically converted to whitespace, which is good!
The json file looks like: [{'john' : "6\u00a0500\u00a0\u20ac" , 'dams' : "7\u00a0500\u00a0\u20ac"}, {'john' : "10\u00a0900\u00a0\u20ac" , 'dams' : "13\u00a0980\u00a0\u20ac"}]
sqlite file looks like:
My goal is to remove whitespace, '€' and cast the value to integer.
I used trim, ltrim, rtrim, replace and combinations of trim and replace to remove whitespace, but it doesn't work.
First off, I would suggest that you ensure that you're using double quotes throughout your JSON files. This is the standard for JSON syntax and moreover, not having things consistent will cause more of a headache later on.
With that out of the way, here's my solution:
import json
with open(jsonFile, "r") as file:
    # json.load parses the whole file into a list of dicts
    jsonDicts = json.load(file)
cleanJsonLines = []
for jsonDict in jsonDicts:
    for key in jsonDict:
        almostCleanJson = jsonDict[key].replace("\u00a0", "")
        cleanJson = almostCleanJson.replace("\u20ac", "")
        cleanJsonLines.append({key: cleanJson})
print(cleanJsonLines)
Output:
[{'john': '6500'}, {'dams': '7500'}, {'john': '10900'}, {'dams': '13980'}]
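Since the stated goal also includes casting to integer, here is a possible sketch that does the clean-up and the cast in one helper. It assumes the JSON uses double quotes throughout and that each value is a whole number of euros; clean_amount is a made-up name.

```python
import json


def clean_amount(text):
    """Drop non-breaking spaces, plain spaces and the euro sign, then cast to int."""
    for unwanted in ("\u00a0", "\u20ac", " "):
        text = text.replace(unwanted, "")
    return int(text)


# Raw string, so the \u escapes are decoded by the JSON parser, not by Python.
raw = r'''[{"john": "6\u00a0500\u00a0\u20ac", "dams": "7\u00a0500\u00a0\u20ac"},
          {"john": "10\u00a0900\u00a0\u20ac", "dams": "13\u00a0980\u00a0\u20ac"}]'''
records = json.loads(raw)
cleaned = [{key: clean_amount(value) for key, value in record.items()} for record in records]
print(cleaned)
# [{'john': 6500, 'dams': 7500}, {'john': 10900, 'dams': 13980}]
```

Unlike the loop above, this keeps each record as one dict instead of splitting it into one dict per key.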

Can't get rid of hex characters

This program makes an array of verbs which come from a text file.
file = open("Verbs.txt", "r")
data = str(file.read())
table = eval(data)
num_table = len(table)
new_table = []
for x in range(0, num_table):
    newstr = table[x].replace(")", "")
    split = newstr.rsplit("(")
    numx = len(split)
    for y in range(0, numx):
        split[y] = split[y].split(",", 1)[0]
        new_table.append(split[y])
num_new_table = len(new_table)
for z in range(0, num_new_table):
    print(new_table[z])
However the text itself contains hex characters such as in
('a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]', ':', 17.6044921875)('A\\xc4\\x9fr\\xc4\\xb1[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]', ':', 11.5615234375)
I'm trying to get rid of those. How am I supposed to do that?
I've looked up pretty much everywhere and decode() returns an error (even after importing codecs).
You could use parse, a python module that allows you to search inside a string for regularly-formatted components, and, from the components returned, you could extract the corresponding integers, replacing them from the original string.
For example (untested alert!):
import parse
# Parse all hex-like items
list_of_findings = parse.findall("\\x{:w}", your_string)
# For each item
for hex_item in list_of_findings:
    # Keep only the two hex digits that follow "\x"
    digits = hex_item[0][:2]
    # Replace the whole escape (backslash, "x" and digits) in the string
    your_string = your_string.replace(
        "\\x" + digits,
        # Read the digits as a base-16 number,
        # then convert the int to a string again
        str(int(digits, 16))
    )
Obs: instead of str, you could convert the found hex-like values to characters, using chr, as in:
chr(int(digits, 16))
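A different approach that needs only the standard library is to decode the escapes rather than delete them. The sample looks like UTF-8 bytes written out as literal \xNN sequences, so the sketch below turns each escape back into a byte value and decodes the result as UTF-8. This rests on two assumptions: the escapes really are UTF-8, and the rest of the text contains only Latin-1 characters. The name decode_hex_escapes is made up.

```python
import re

HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text):
    """Replace literal \\xNN sequences with the UTF-8 characters they spell out."""
    # Step 1: turn each \xNN into the single character with that code point.
    as_code_points = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: code points 0-255 map one-to-one onto bytes via Latin-1,
    # and those bytes can then be decoded as UTF-8.
    return as_code_points.encode("latin-1").decode("utf-8")


sample = "a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]"
print(decode_hex_escapes(sample))
```

If the goal really is to drop the escapes, HEX_ESCAPE.sub("", text) removes them instead.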

How do I avoid errors when parsing a .csv file in python?

I'm trying to parse a .csv file that contains two columns: Ticker (the company ticker name) and Earnings (the corresponding company's earnings). When I read the file using the following code:
f = open('earnings.csv', 'r')
earnings = f.read()
The result when I run print earnings looks like this (it's a single string):
Ticker;Earnings
AAPL;52131400000
TSLA;-911214000
AMZN;583841600
I use the following code to split the string by the break line character (\n), followed by splitting each resulting line by the semi-colon character:
earnings_list = earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
The result is a list of lists where each list contains the company's ticker at index[0] and its earnings at index[1], like this:
[['Ticker', 'Earnings\r\r'], ['AAPL', '52131400000\r\r'], ['TSLA', '-911214000\r\r'], ['AMZN', '583841600\r\r']]
Now, I want to convert the earnings at index[1] of each list (which are currently strings) into integers. So I first remove the first list containing the column names:
headless_earnings = string_earnings[1:]
Afterwards I try to loop over the resulting list to convert the values at index[1] of each list into integers with the following:
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
I get the following error:
num = int(i[1])
IndexError: list index out of range
How is that index out of range?
You are most likely mishandling the line endings.
If I try your code with this string: "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600" it works.
But with this one: "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n" it doesn't.
Explanation: when the string ends with '\n', split('\n') produces a final empty string '', and splitting that by ';' gives ['']. So at the end, python tries to access [''][1], hence the error.
So a very simple workaround would be to remove the last '\n' (if you're sure it's a '\n', otherwise you might have surprises).
You could write this:
earnings_list = earnings[:-1].split('\n')
this will fix your error.
If you want to be sure you remove a last '\n', you can write:
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
EDIT: test code:
#!/usr/bin/env python2
earnings = "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
headless_earnings = string_earnings[1:]
#print(headless_earnings)
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
print(numerical)
Output:
nico#ometeotl:~/temp$ ./test_script2.py
[52131400000, -911214000, 583841600]
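A somewhat more defensive variant, sketched here under the assumption that every non-empty line has exactly two fields, skips blank lines instead of trimming the end of the string. That way it does not matter how many line-ending characters the file carries. The name parse_earnings is made up.

```python
def parse_earnings(text):
    """Map each ticker to its earnings as an int, ignoring blank lines."""
    earnings = {}
    # splitlines() splits on \r, \n and \r\n alike; [1:] drops the header row.
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue  # stray "\r" characters produce empty lines
        ticker, value = line.split(";")
        earnings[ticker] = int(value)
    return earnings


sample = "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
print(parse_earnings(sample))
# {'AAPL': 52131400000, 'TSLA': -911214000, 'AMZN': 583841600}
```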
