How to plot nested/multi layer json dictionaries - python

I've spent hours researching this seemingly simple issue, and I'm not sure why an answer has been so hard to find. I'm trying to plot stock data for AAPL. When I extract the data from Ameritrade, it's a nested JSON dictionary. I came from MATLAB, where I found this very simple, but I'm not sure how to extract the nested JSON in Python. I used pd.read_json to parse the outer JSON, but there is still one level left inside the DataFrame that holds the data I need to plot. Any help would be greatly appreciated. Below is what it looks like:
df = pd.read_json(aapldata)
The df looks like this; I'm trying to extract the data within the 'candles' column.
Dataframe Picture Showing Candle Column

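As long as there is only one level of nesting, you should be able to do this with json_normalize. Since pandas 1.0 it is available as pd.json_normalize, and the older pandas.io.json import is deprecated. Note that it expects parsed JSON (a dict or a list of dicts), not a raw JSON string:
import pandas as pd
df = pd.json_normalize(aapldata)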
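If the candle records sit in a list under a 'candles' key, pd.json_normalize (pandas 1.0+) can flatten that list directly through its record_path argument. The following is only a minimal sketch: the payload below is made up for illustration, and the key names (open, high, low, close, volume, datetime, symbol) and the millisecond timestamps are assumptions about what the Ameritrade response looks like, not something confirmed by the question.

```python
import pandas as pd

# Hypothetical payload; the real response may use different keys.
aapldata = {
    "symbol": "AAPL",
    "empty": False,
    "candles": [
        {"open": 100.0, "high": 102.0, "low": 99.5, "close": 101.0,
         "volume": 1000, "datetime": 1577946600000},
        {"open": 101.0, "high": 103.0, "low": 100.5, "close": 102.5,
         "volume": 1200, "datetime": 1578033000000},
    ],
}

# One row per candle; "symbol" is copied onto every row via meta.
df = pd.json_normalize(aapldata, record_path="candles", meta=["symbol"])

# Assumption: "datetime" holds epoch milliseconds.
df["datetime"] = pd.to_datetime(df["datetime"], unit="ms")
```

With matplotlib installed, df.plot(x="datetime", y="close") would then draw the closing prices.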

Related

Manipulating data for network analysis

I am trying to manipulate my dataframe before I conduct network analysis using networkx.
Here is a sample of the data I have:
sample data
I am trying to use the title and cast columns and turn them into something like this:
ideal format
The ideal result is to have one column for each individual actor and the movie/show that he/she is in. If the actor has more than 1 show/movie, I want to have different rows for that actor as well.
Could someone please advise me on how to make it happen? Thank you!!
To use pandas, first load the data into a DataFrame. Let's call it "f".
import pandas
f = pandas.read_csv('path/to/csv')
After that, you can access individual columns with:
f['title']
This works much like indexing a dictionary. If you want both columns in the same DataFrame, pass in a list of column names like so:
f[['title', 'cast']]
That is as much as I can provide without knowing the extent of the project.
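To get one row per actor and title, the selected columns can be split and exploded. This is a rough sketch only: it assumes the cast column holds comma-separated names in a single string, which the question's screenshot suggests but does not confirm, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical sample; the real data would come from pandas.read_csv.
f = pd.DataFrame({
    "title": ["Movie A", "Show B"],
    "cast": ["Actor One, Actor Two", "Actor Two, Actor Three"],
})

pairs = (
    f[["title", "cast"]]
    .dropna(subset=["cast"])                            # skip titles with no cast listed
    .assign(cast=lambda d: d["cast"].str.split(","))    # string -> list of names
    .explode("cast")                                    # one row per (title, name)
    .assign(cast=lambda d: d["cast"].str.strip())       # trim stray spaces
    .rename(columns={"cast": "actor"})
    .reset_index(drop=True)
)
```

An actor who appears in several titles ends up on several rows, which is the edge-list shape that networkx can consume.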

Why Am I getting two values while indexing Pandas Dataframe?

Here is an image of my data and the index value:
As shown in the screenshot, the pandas DataFrame is returning two values. What could possibly be wrong? I am a beginner; sorry for the bad editing.
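I think I see the issue. Try something like this:
data['Title'].iloc[0]
I think the .head() portion of the code is causing the issue: .head() returns a Series, which is displayed with its index label next to the value, rather than a single value.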
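The difference is easy to see on a tiny example. The DataFrame below is made up for illustration; only the 'Title' column name comes from the answer.

```python
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame.
data = pd.DataFrame({"Title": ["First title", "Second title"]})

# head(1) keeps the result as a Series: printing it shows the index
# label (0) next to the value, which looks like "two values".
first_as_series = data["Title"].head(1)

# iloc[0] pulls out the single element itself.
first_as_scalar = data["Title"].iloc[0]
```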

Convert timeseries csv in Python

I want to convert a CSV file of time-series data with multiple sensors.
This is what the data currently looks like:
The different sensors are described by numbers and have different numbers of axes. If a new activity is labeled, everything below belongs to this new label. The label is in the same column as the first entry of each sensor.
This is the way I would like the data to be:
Each sensor axis has its own column, and the corresponding label is added in the last column.
So far, I have created a DataObject class to access the timestamp, sensortype, sensorvalues, and the associated parent_label for each row in the CSV.
I thought the most convenient way to solve this would be to use a pandas DataFrame, but simply calling pd.DataFrame(timestamp, sensortype, sensorvalues, label) won't work.
Any ideas/hints? Maybe other ways to solve this problem?
I am fairly new to programming, especially Python, so I have already run out of ideas.
Thanks in advance
Try creating a NumPy matrix of the columns you require, then convert it to a pandas DataFrame.
Otherwise, you can import the CSV with pandas from the start, using pd.read_csv.
As for the following:
pd.DataFrame(timestamp, sensortype, sensorvalues, label)
take a look at the pd.concat function as well. You would need to convert each array to a DataFrame, put them in a list, and then concatenate them with pandas.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
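The call pd.DataFrame(timestamp, sensortype, sensorvalues, label) fails because the positional parameters of pd.DataFrame are data, index, columns, and dtype, not one argument per column. A small sketch of two ways around that follows; the sample values are invented, and it does not attempt the per-axis reshaping shown in the question's images.

```python
import pandas as pd

# Hypothetical values standing in for what the DataObject class yields.
timestamp = [0.00, 0.01, 0.02]
sensortype = [1, 1, 2]
sensorvalues = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [9.8]]
label = ["walking", "walking", "walking"]

# Option 1: pass a dict that maps column names to equal-length lists.
df = pd.DataFrame({
    "timestamp": timestamp,
    "sensortype": sensortype,
    "sensorvalues": sensorvalues,
    "label": label,
})

# Option 2 (the pd.concat route): one single-column DataFrame per
# array, collected in a list and joined side by side with axis=1.
parts = [
    pd.DataFrame({"timestamp": timestamp}),
    pd.DataFrame({"sensortype": sensortype}),
    pd.DataFrame({"sensorvalues": sensorvalues}),
    pd.DataFrame({"label": label}),
]
df_concat = pd.concat(parts, axis=1)
```

Both options give the same four columns; the dict form is usually the shorter one when all the lists are already in memory.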

Take dates and times from multiple columns to one datetime object with Python

I've got a dataset with multiple time values as below.
Area,Year,Month,Day of Week,Time of Day,Hour of Day
x,2016,1,6.0,108,1.0
z,2016,1,6.0,140,1.0
n,2016,1,6.0,113,1.0
p,2016,1,6.0,150,1.0
r,2016,1,6.0,158,1.0
I have been trying to transform this into a single datetime object to simplify the dataset and be able to do proper time series analysis against it.
For some reason I have been unable to get the right outcome using the datetime library from Python. Would anyone be able to point me in the right direction?
Update - Example of stats here.
https://data.pa.gov/Public-Safety/Crash-Incident-Details-CY-1997-Current-Annual-Coun/dc5b-gebx/data
I don't think there is a week column. Hmm. I wonder if I've missed something?
Any suggestions would be great. I'm really just looking to simplify this dataset, maybe even by creating another table/sheet for the causes of the crashes, as there are a lot of superfluous columns taking up a lot of space that could be labeled with simple ints.
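One possible starting point, offered as a sketch under stated assumptions rather than a definitive answer: pd.to_datetime can assemble timestamps from a DataFrame whose columns are named year, month, day, hour, and minute. The sample has no day-of-month column, so the code below uses day 1 as a placeholder, and it assumes that "Time of Day" is encoded as HHMM (108 meaning 01:08), which is consistent with "Hour of Day" being 1.0 in the sample rows but is not confirmed by the question.

```python
import io
import pandas as pd

csv_text = """Area,Year,Month,Day of Week,Time of Day,Hour of Day
x,2016,1,6.0,108,1.0
z,2016,1,6.0,140,1.0
n,2016,1,6.0,113,1.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: "Time of Day" is HHMM stored as an integer.
hhmm = df["Time of Day"].astype(int)

parts = pd.DataFrame({
    "year": df["Year"],
    "month": df["Month"],
    "day": 1,                 # placeholder: the data has no day of month
    "hour": hhmm // 100,
    "minute": hhmm % 100,
})
df["timestamp"] = pd.to_datetime(parts)
```

Because the day is a placeholder, the result is only meaningful at month and time-of-day resolution.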

Extract Data from CSV

I have a file which consists of nearly 10000 rows of data. I want to extract some data corresponding to some element. For example
I want to extract values of xErr for say, x>22.1 and x<22.3.
The data are in a CSV file.
How can I do this?
I have tried using np.where() but I'm unsuccessful.
Use pandas and it will be very simple. Read the CSV with pd.read_csv, then filter the DataFrame with a boolean condition on the x column.
There are a lot of other posts on here that cover this problem.
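A minimal sketch of that filter follows. It assumes the file has columns literally named x and xErr, as the question implies, and the inline sample values are invented to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical sample; in practice use pd.read_csv("path/to/file.csv").
csv_text = """x,xErr
22.05,0.010
22.15,0.020
22.25,0.030
22.35,0.040
"""
df = pd.read_csv(io.StringIO(csv_text))

# Combine both conditions with & (each side needs its own parentheses).
mask = (df["x"] > 22.1) & (df["x"] < 22.3)

# Select the xErr values of the rows where the mask is True.
x_err = df.loc[mask, "xErr"]
```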
