How to split GPS data in python?

I'm trying to split some data from a GPS module. The module prints GPS sentences of several types. I need to separate the field that starts with $ from the integers/other strings later in that row.
import pandas as pd

# read in data
data = pd.read_fwf('/home/harry/Desktop/catTest')
# convert to csv file
data.to_csv('GPS.csv')
X = pd.read_csv('GPS.csv')
# keep all values
GPS = X.iloc[:].values
# test on a random string
Test_string = GPS[5,:]
# separate string and int
result = [x.strip() for x in Test_string.split(',')]
print(Test_string)
print(result)
AttributeError: 'numpy.ndarray' object has no attribute 'split'
I want to print each item in the row on a separate line.
How can I fix this?
This is what the 5th row item looks like when printed.
[5 '$GPTXT,01,01,02,LLC FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFD*2C']

import pandas as pd

data = pd.read_fwf('/home/harry/Desktop/catTest')
data.to_csv('GPS.csv')
X = pd.read_csv('GPS.csv')
# keep row 5 as a pandas Series instead of a NumPy ndarray, so .apply is available
Test_string = X.iloc[5]
# separate string and int: split only the string values
result = Test_string.apply(lambda x: x.split(',') if isinstance(x, str) else x)
print(Test_string)
print(result)
You are calling split on a NumPy ndarray, which has no split method. Select the data as a pandas Series (or a single string) and split it with apply and a lambda, as above.
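If the goal is to print each comma-separated item on its own line, a minimal sketch could look like this (assuming, as in the printed example above, that the NMEA sentence sits in the second column of the row):
import pandas as pd

X = pd.read_csv('GPS.csv')
row = X.iloc[5]           # row 5 as a pandas Series
sentence = row.iloc[1]    # the '$GPTXT,...' string (assumed to be the second column)
for field in sentence.split(','):
    print(field.strip())  # one item per line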

Related

How do I fix this error in python { LAT.append(float(row[6].strip())) ValueError: could not convert string to float: '\\N' }?

I am creating a function to read in all the data from a .csv file and append each column to a list. However, when I try to convert the values in a certain column to a float, it gives me this error {LAT.append(float(row[6].strip())) ValueError: could not convert string to float: '\N'}. Here is what I am working with:
import pandas as pd

ID = []
NAME = []
ADDRESS = []
POSTAL = []
EAST = []
NORTH = []
LAT = []
LON = []
AUTH = []

# read in data
def read_data():
    data = pd.read_csv('Pubs in England.csv', delimiter=',')
    for row in data.values:
        ID.append(row[0])
        NAME.append(row[1])
        ADDRESS.append(row[2])
        POSTAL.append(row[3])
        EAST.append(row[4])
        NORTH.append(row[5])
        LAT.append(float(row[6].strip()))
        LON.append(float(row[7].strip()))
        AUTH.append(row[8])
The LAT list is for all the latitude values in the .csv file and the LON list is for all the longitude values in the .csv file. Everything else works as intended.
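For reference, a minimal sketch of one way around this, assuming the '\N' cells simply mark missing values: telling read_csv to treat them as NaN means the float conversion never sees the literal string.
import pandas as pd

# treat literal '\N' cells as missing values (NaN) while reading
data = pd.read_csv('Pubs in England.csv', na_values=[r'\N'])
LAT = data.iloc[:, 6].astype(float).tolist()  # latitude column
LON = data.iloc[:, 7].astype(float).tolist()  # longitude column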

How to put JSON chart data into pandas dataframe?

I have downloaded the har file of an interactive chart and have the datapoints in the following format:
'{"x":"2022-03-28T00:00:00Z"', '"value":0.2615}',
'{"x":"2022-03-29T00:00:00Z"', '"value":0.2573}',
'{"x":"2022-03-30T00:00:00Z"', '"value":0.272}', ...
What would be the easiest way to convert this into a pandas dataframe?
Both the date and the value should be columns of the dataframe.
The first problem is that every element is wrapped in ' ', so it is treated as two items/columns when it should be a single item/dictionary. You may need to replace "', '" with "," to get a normal string with JSON, which you can convert to a Python dictionary using the json module:
text = open(filename).read()
text = text.replace("', '", ",")
Later you can use io.StringIO() to load it from the text. read_csv needs quotechar="'" to read it correctly:
df = pd.read_csv(io.StringIO(text), names=['data', 'other'], quotechar="'")
Next you can convert every JSON string to a Python dictionary:
df['data'] = df['data'].apply(json.loads)
and then convert each dictionary to a pd.Series, which you can split into columns:
df[['x','value']] = df['data'].apply(pd.Series)
Finally, you can remove the columns data and other:
del df['data']
del df['other']
Full working example
text = """'{"x":"2022-03-28T00:00:00Z"', '"value":0.2615}',
'{"x":"2022-03-29T00:00:00Z"', '"value":0.2573}',
'{"x":"2022-03-30T00:00:00Z"', '"value":0.272}',"""
import pandas as pd
import io
import json
#text = open(filename).read()
text = text.replace("', '", ",")
#print(text)
# read from string
df = pd.read_csv(io.StringIO(text), names=['data', 'other'], quotechar="'")
# convert string to dictionary
df['data'] = df['data'].apply(json.loads)
# split dictionary in separated columns
df[['x','value']] = df['data'].apply(pd.Series)
# remove some columns
del df['data']
del df['other']
print(df)
Result:
x value
0 2022-03-28T00:00:00Z 0.2615
1 2022-03-29T00:00:00Z 0.2573
2 2022-03-30T00:00:00Z 0.2720
You can also write part of it in one line:
df[['x','value']] = df['data'].apply(lambda item: pd.Series(json.loads(item)))
or split it separately (using .str[key] on the dictionary column):
df['data'] = df['data'].apply(json.loads)
df['x'] = df['data'].str['x']
df['value'] = df['data'].str['value']
BTW:
you may also need to convert x from string to datetime
df['x'] = pd.to_datetime(df['x'])

Write JSON file with X and Y axis

I am working on a requirement to write my JSON output as [{"x": "MaxTemp", "y": "Temp3pm"}], but my current output looks like [MaxTemp, Temp3pm]. The logic here is, as per the screenshot, that the first word is the x axis and the word after the comma (,) is the y axis. Below is my code; I have attached a screenshot of the input data.
x_y_data = list(selected_ri['index'])
x_y_data
ini_string = {'Imp_features_selected_x_y':x_y_data}
# printing initial json
ini_string = json.dumps(ini_string)
# converting string to json
final_dictionary = json.loads(ini_string)
You could use str.split to split the text on ',' and expand it into two columns, for example:
df = df['index'].str.split(',', expand=True)
# then rename the columns to x and y
df.columns = ['x', 'y']
Then you can convert it into a list of dicts and dump it as JSON:
data = df.to_dict('records')
ini_string = json.dumps(data)
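Putting those pieces together, a minimal end-to-end sketch (the sample values in selected_ri below are assumptions based on the desired output):
import json
import pandas as pd

# hypothetical sample; the real selected_ri comes from the question's data
selected_ri = pd.DataFrame({'index': ['MaxTemp,Temp3pm', 'Rainfall,Temp9am']})

df = selected_ri['index'].str.split(',', expand=True)  # split into two columns
df.columns = ['x', 'y']                                # rename the columns

data = df.to_dict('records')          # [{'x': 'MaxTemp', 'y': 'Temp3pm'}, ...]
ini_string = json.dumps(data)
print(ini_string)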

Python: error reading and manipulating DataFrame data

I have a DataFrame variable called "obsData", which has the structure:
I then use this variable as input to some code (with much help from Stack Overflow) that sorts all hourly data into one row for each day:
f = obsData
data = {}
for line in f:
    if 'Date' not in line or 'Temp' not in line:
        k, v, = line.split()  # split line in 2 parts, v and k
        temperature = v.split(';')[1]
        if k not in data:
            data[k] = [temperature]
        else:
            data[k].append(temperature)

for k, v in data.items():
    outPut = "{} ;{}".format(k, ";".join(v))
My issue is that the variable "line" never manages to get past the first row of the data in "obsData". It only manages to read 'Date' but not the second column 'Temp'. As a consequence, the split function tries to split 'Date', but since it's only one value I get the error:
ValueError: not enough values to unpack (expected 2, got 1)
I have tried to redefine "f" (i.e. "obsData") from a DataFrame into an ndarray or string to make it easier for the code to work with the data:
f = f.values # into ndarry
f = f.astype(str) # into string try 1
f[['Date', 'Temp']] = f[['Date', 'Temp']].astype(str) # into string try 2
But for some reason I don't understand, I can't convert it. What am I doing wrong? Any help is much appreciated!
EDIT for clarification: I get the error at the line with
k, v, = line.split()
When importing CSV data it's best to use pandas:
import pandas as pd
df = pd.read_csv('obsData.csv')
If you still need to loop, check itertuples.
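A minimal sketch of that loop, assuming obsData is read from a CSV and has 'Date' and 'Temp' columns as described in the question:
import pandas as pd

obsData = pd.read_csv('obsData.csv')

# collect all hourly temperatures for each date, one list per day
data = {}
for row in obsData.itertuples(index=False):
    data.setdefault(row.Date, []).append(str(row.Temp))

for date, temps in data.items():
    print("{} ;{}".format(date, ";".join(temps)))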

Removing Unicode from Pandas Column Text

I am attempting to determine whether the data inside a list is within a dataframe column. I am new to pandas and have been struggling with this, so at the moment I am turning the dataframe column of interest into a list. However, when I call df.tolist(), the list contains a slew of unicode markers around each string. As I am attempting to compare this with text from the other list, which is not in unicode, I am running into issues.
I attempted to turn the other list into unicode, but then the list had items that read like u'["item"]', which didn't help. I have also tried to remove the unicode from the dataframe but only get errors. I cannot iterate, as pandas tells me the dataframe is too long to iterate over. Below is my code:
import csv
import pandas as pd

SDC_wb = pd.ExcelFile('C:\ BLeh')
df = SDC_wb.parse(SDC_wb.sheet_names[1], header = 1)

def Follower_count(filename):
    filename = open(filename)
    reader = csv.reader(filename)
    handles = df['things'].tolist()
    print handles
    dict1 = {}
    for item in reader:
        if item in handles:
            user = api.get_user(item)
            dict1[item] = user.Follower_count
    newdf = pd.DataFrame(dict1)
    newdf.to_csv('test1.csv', encoding='utf-8')
Here is what the list from the dataframe looks like:
[u'#Mastercard', u'#Visa', u'#AmericanExpress', u'#CapitalOne']
Here is what x = [unicode(s) for s in some_list] looks like:
u"['#HomeGoods']", u"['#pier1']", u"['#houzz']", u"['#InteriorDesign']", u"['#zulily']"]
Naturally these don't align to check the "in" requirement. Thus, I need a method of converting the .tolist() object from:
[u'#Mastercard', u'#Visa', u'#AmericanExpress', u'#CapitalOne']
to:
[#Mastercard, #Visa, #AmericanExpress, #CapitalOne]
so that the 'item in handles' check will see matching handles.
Thanks for your help.
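For what it's worth, a minimal sketch of the comparison itself, assuming (as in the question) one list holds plain handles and the other holds stringified one-element lists: the u'' prefix is only how Python 2 displays unicode strings, so unwrapping the stringified lists is the part that actually matters.
import ast

handles = [u'#Mastercard', u'#Visa', u'#AmericanExpress', u'#CapitalOne']
wrapped = [u"['#HomeGoods']", u"['#pier1']", u"['#Mastercard']"]

# ast.literal_eval turns "['#pier1']" back into a real list; take its element
unwrapped = [ast.literal_eval(s)[0] for s in wrapped]

for item in unwrapped:
    if item in handles:
        print(item)  # '#Mastercard' matches here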
