I have a JSON file with 12 columns; however, I only want to read columns 2 and 5, which are named "name" and "score".
Currently, the code I have is:
import pandas as pd
df = pd.read_json("path", orient='columns', lines=True)
print(df.head())
What that does is display every column, as would be expected.
After reading through the documentation here:
https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
I can't find any real way to parse only certain columns from JSON, unlike read_csv, where you can select specific columns with usecols=[].
Pass a list of column names for indexing:
df[["name","score"]]
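For example, a minimal sketch, assuming a line-delimited JSON file at "path" that contains "name" and "score" columns:
import pandas as pd

# Read everything, then keep only the two columns of interest
df = pd.read_json("path", orient='columns', lines=True)
df = df[["name", "score"]]
print(df.head())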
I want to read a file 'tos_year.csv' into a Pandas dataframe, such that all values are in one single column. I will later use pd.concat() to add this column to an existing dataframe.
The CSV file holds 80 entries in the form of years, i.e. "... 1966,1966,1966,1966,1967,1967,... "
What I can't figure out is how to read the values into one column with 80 rows, instead of 80 columns with one row.
This is probably quite basic but I'm new to this. Here's my code:
import pandas as pd
tos_year = pd.read_csv('tos_year.csv').T
tos_year.reset_index(inplace=True)
tos_year.columns = ['Year']
As you can see, I tried reading it in and then transposing the dataframe, but when it is read in initially, the year numbers are interpreted as column names, and since there apparently cannot be several columns with identical names, I end up with a dataframe holding string values like
...
1966
1966.1
1966.2
1966.3
1967
1967.1
...
which is not what I want. So clearly, it's preferable to read it in correctly from the start.
Thanks for any advice!
Add header=None to avoid parsing the years as column names, then transpose and rename the column, e.g. with DataFrame.set_axis:
tos_year = pd.read_csv('tos_year.csv', header=None).T.set_axis(['Year'], axis=1)
Or:
tos_year = pd.read_csv('tos_year.csv', header=None).T
tos_year.columns = ['Year']
I am trying to export some data from Python to Excel using pandas, and not succeeding. The data is a dictionary where each key is a tuple of 4 elements.
I am currently using the following code:
df = pd.DataFrame(data)
df.to_excel("*file location*", index=False)
and I get an exported two-column table, with the tuple keys in one column and the values in another.
I am trying to get an Excel table where the first 3 elements of the key are split into their own columns, and the 4th element of the key (Period in this case) becomes a column name.
I have tried different additions to the above code, but I'm a bit new to this, and nothing has worked so far.
Based on what you show us (which is not reproducible), you need pandas.MultiIndex:
import pandas as pd

df_ = df.set_index(0)  # `0` since your tuples seem to be located in the first column
df_.index = pd.MultiIndex.from_tuples(df_.index)  # convert the plain index of tuples into an N-dimensional index
# `.unstack` does the job of placing your periods as columns
df_.unstack(level=-1).droplevel(0, axis=1).to_excel(
    "file location", index=True
)
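Since the original data is not reproducible, here is a minimal self-contained sketch of the same idea; the dictionary contents and output path are made up for illustration:
import pandas as pd

# Hypothetical data: each key is a 4-element tuple whose last element is the period
data = {
    ('A', 'X', 'north', 'Q1'): 10,
    ('A', 'X', 'north', 'Q2'): 12,
    ('B', 'Y', 'south', 'Q1'): 7,
    ('B', 'Y', 'south', 'Q2'): 9,
}

s = pd.Series(data)          # tuple keys become a MultiIndex automatically
out = s.unstack(level=-1)    # the last level ('Q1'/'Q2') becomes the columns
out.to_excel("output.xlsx")  # the remaining 3 key levels stay as index columns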
You could try exporting to a CSV instead:
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index=False)
which can then be converted to an Excel file easily.
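For instance, a small sketch of that round trip with pandas itself; the paths are placeholders, and to_excel needs an engine such as openpyxl installed:
import pandas as pd

# Read the exported CSV back and write it out as an Excel file
pd.read_csv('exported.csv').to_excel('exported.xlsx', index=False)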
I'm trying to select a subset of the object-type column cells with str.split(pat="'"):
dataset['pictures'].str.split(pat=",")
I want to get the numbers 40092 and 39097 and the two picture dates as two columns, ID and DATE, but as a result I get one column consisting of NaNs.
'pictures' column:
{"col1":"40092","picture_date":"2017-11-06"}
{"col1":"39097","picture_date":"2017-10-31"}
...
Here's what I understood from your question:
You have a pandas DataFrame with one of the columns containing JSON strings (or any other strings that need to be parsed into multiple columns).
E.g.
import pandas as pd

df = pd.DataFrame({'pictures': [
    '{"col1":"40092","picture_date":"2017-11-06"}',
    '{"col1":"39097","picture_date":"2017-10-31"}',
]})
You want to parse the two elements ('col1' and 'picture_date') into two separate columns for further processing (or perhaps just one of them)
Define a function for parsing the row:
import json
def parse_row(r):
    j = json.loads(r['pictures'])
    return j['col1'], j['picture_date']
Then use the pandas DataFrame.apply() method as follows:
df1 = df.apply(parse_row, axis=1, result_type='expand')
The result is a new dataframe with two columns - each containing the parsed data:
       0           1
0  40092  2017-11-06
1  39097  2017-10-31
If you need just one column, you can return a single element from parse_row (instead of a two-element tuple as in the above example) and use df.apply(parse_row, axis=1).
If the values are not in json format, just modify parse_row accordingly (Split, convert string to numbers, etc.)
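For instance, a sketch of such a variant, assuming a hypothetical plain "id,date" string format instead of JSON:
# Hypothetical variant for plain comma-separated strings like "40092,2017-11-06"
def parse_row_split(r):
    id_, date = r['pictures'].split(',')
    return int(id_), date

df1 = df.apply(parse_row_split, axis=1, result_type='expand')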
Thanks for the replies but I solved it by loading the 'pictures' column from the dataset into a list:
picturelist = dataset['pictures'].values.tolist()
And afterwards creating a dataframe from the list made from the 'pictures' column and concatenating it with the original dataset without the 'pictures' column:
two_new_columns = pd.DataFrame(picturelist)
new_dataset = pd.concat([dataset.drop(columns='pictures'), two_new_columns], axis=1)
I took data from an .xlsx file and stored it in a dataframe. The dataframe is called df, and its shape is (51, 3): 51 rows, 3 columns. The columns are unnamed and numbered 0, 1, 2, and the rows are indexed 0-50. What syntax would I use to extract data from the dataframe with pandas and put it into a CSV? I know I would use DataFrame.to_csv("outputFile.csv"), but I'm not sure how to identify a specific piece of data (a row/column pair), so I can put it in a new location in the CSV table in comparison to the old Excel table.
You can use integer-based indexing with iloc: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
In your case, the row/col value you are looking for can be retrieved by:
df.iloc[row_id, col_id]
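For instance, a small sketch; the frame and values here are made up to mirror your 51x3 shape:
import pandas as pd

# Stand-in for the 51x3 dataframe read from the .xlsx file
df = pd.DataFrame([[0, 1, 2]] * 51)

value = df.iloc[10, 2]  # read the cell at row 10, column 2 (both zero-based)
df.iloc[10, 2] = 99     # iloc also works for assignment
df.to_csv("outputFile.csv", index=False, header=False)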
I have turned a list of dicts into a dataframe using this:
import pandas as pd
d = pd.read_csv('/path/to/file.csv')
res_df = pd.DataFrame(d)
res_df.head()
Pandas did not interpret the format successfully, I'm guessing because there were no quotes around the keys of the dicts. It looked like this:
[{location:'playroom',state:'NY',zip:10011},{..}]
As a workaround, I stripped out "'", "{}", and "[]" to make the file standard CSV. However, when I pass the names argument to pd.read_csv, I have two issues: (1) the columns given in names are blank, and (2) I end up with a dataframe that is one row with thousands of columns. res_df.transpose() did not work.
If my csv has no header row, and assuming it has the same number of fields for each record, why is it that I can't give pandas my column names, and create new dataframe rows based on these arguments/instructions?
What is the quicker/better way to do this?
Update: here is a snippet of the csv file:
websitedotcomcom/,Jim,jim#testdotcom,777-444-5555,Long City, NY,1,http://document-url,,another_field,,,true,12 Feb 2015 (18:17),http://website.com/,Jim,jim#test.com,777-444-5555,Long City, NY,1,http://document-url,,another_field,,,true,12 Feb 2015 (18:17)
This looks like JSON rather than CSV. You should use pandas read_json method.
df = pd.read_json('/path/to/file.json')
Note that it requires valid JSON, so you may have to do some string manipulation first (e.g. replacing ' with ").
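For example, a minimal sketch of that repair, assuming the keys are at least single-quoted (the bare keys in your snippet would need an extra step):
import pandas as pd
from io import StringIO

# Hypothetical single-quoted pseudo-JSON; a naive quote swap makes it valid JSON
raw = "[{'location': 'playroom', 'state': 'NY', 'zip': 10011}]"
df = pd.read_json(StringIO(raw.replace("'", '"')))
print(df)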