Send data from all collections to Dataframe (MongoDB, Python)

I have tried to use this code:
```python
preDF = pd.DataFrame
for ticker in tickers:
    df_forpre = read_ticker(ticker, '2017-01-22')
    df_forpre['ticker'] = ticker
    preDF = pd.concat([preDF,df_forpre], axis=0)
```
But I receive:
cannot concatenate a non-NDFrame object
How can I get the data from all collections into a single DataFrame, with a column holding each collection's name?

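The mistake is in the first line: `preDF = pd.DataFrame` (without parentheses) assigns the DataFrame class itself, not an empty DataFrame, so the first pd.concat call receives a non-NDFrame object. Changing the concat to:
```python
preDF = pd.concat([df_forpre], axis=0)
```
instead of:
```python
preDF = pd.concat([preDF,df_forpre], axis=0)
```
removes the error, but it overwrites preDF on every pass, so only the last ticker is kept. To keep all tickers, start with `preDF = pd.DataFrame()`, or collect the frames in a list and call pd.concat once after the loop.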
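A minimal sketch of the list-then-concat approach for this question, assuming `read_ticker(ticker, start)` returns one DataFrame per collection. The `read_ticker` below is a hypothetical stand-in with made-up data so the example runs without a database; with pymongo, the list of names could come from something like `db.list_collection_names()`.

```python
import pandas as pd


def read_ticker(ticker, start):
    # Hypothetical stand-in for the question's read_ticker(); the real one
    # would query the MongoDB collection named `ticker`
    return pd.DataFrame({"date": [start], "close": [float(len(ticker))]})


def load_all(tickers, start):
    frames = []
    for ticker in tickers:
        df = read_ticker(ticker, start)
        df["ticker"] = ticker  # tag every row with its source collection
        frames.append(df)
    # One concat at the end instead of one per loop iteration
    return pd.concat(frames, ignore_index=True)


all_data = load_all(["AAPL", "MSFT"], "2017-01-22")
```

Because the list starts empty rather than as a placeholder DataFrame, there is no special first iteration to get wrong.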

Related

How to append results from a for loop in a DataFrame

I have created a function that manipulates a couple of datasets and outputs a merged DataFrame. I pass an array of variables to it in a loop, which outputs a merged DataFrame for each one. Now I want all the results appended to a single DataFrame:
Function:
```python
def backtest(ticker, data):
    fin = si.get_data(ticker)
    fin.index.rename('date', inplace=True)
    fin = fin.reset_index(level=0)
    fin = fin.drop(columns=['high', 'low', 'volume'])
    fin['intraday_ch_usd'] = fin['close'] - fin['open']
    fin['intraday_pct_ch'] = fin['intraday_ch_usd'] / fin['open'] * 100
    fin['3d_pr'] = fin['close'].shift(-3)
    fin['3d_del'] = fin['3d_pr'] - fin['open']
    fin['3d_pct_ch'] = fin['3d_del'] / fin['open'] * 100
    data = data[data['awardee_parent_ticker_symbol'].notna()]
    data = data.rename(columns={'date_of_news_dispatch': 'date', 'awardee_parent_ticker_symbol': 'ticker'})
    data["date"] = pd.to_datetime(data["date"])
    data = data.merge(fin, on=['date','ticker'])
    data = pd.DataFrame(data=data)
```
Loop:
```python
output=pd.DataFrame()
for ticker in tickers:
    try:
        backtest(ticker, data)
    except:
        pass
    output=output.append(data,ignore_index=True)
output
```
I can't figure out how to append the results to a single DataFrame.

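Long story short, you should not grow a DataFrame by appending to it inside a loop: every append copies all the rows collected so far (and DataFrame.append was removed in pandas 2.0). Instead, collect the results in a list and call pd.concat once at the end. Two more things must change for this to work: backtest has to return its result (add `return data` as its last line), and the loop has to keep that return value rather than the unchanged input `data`. Something like:
```python
output = []
for ticker in tickers:
    try:
        output.append(backtest(ticker, data))
    except Exception:
        # skip tickers whose download or merge fails
        pass
output = pd.concat(output, ignore_index=True)
```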

How to create a dataframe?

```python
df4 = []
for i in (my_data.points.values.tolist()[0]):
    df3 = pd.json_normalize(j)
    df4.append(df3)
df5 = pd.DataFrame(df4)
df5.head()
```
When I run this code I get this error: Must pass 2-d input. shape=(16001, 1, 3)

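pd.json_normalize turns JSON data into a table, so each call in your loop returns a one-row DataFrame and df4 ends up as a list of DataFrames. pd.DataFrame cannot build a 2-D table from that list, hence the 3-D shape in the error. What pd.DataFrame can take directly is a list of dictionaries, one per row.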
For example
```python
dict_list = [
    {"id": 1, "name": "apple", "price": 10},
    {"id": 1, "name": "orange", "price": 20},
    {"id": 1, "name": "pineapple", "price": 15},
]
df = pd.DataFrame(dict_list)
```
In your case
```python
df4 = []
for i in (my_data.points.values.tolist()[0]):
    # The structure of my_data isn't shown, so this assumes each "i" is
    # already a dictionary holding one row; no json_normalize call needed
    df4.append(i)
df5 = pd.DataFrame(df4)
df5.head()
```
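If the items are nested dictionaries (the question does not show their structure), pd.json_normalize can flatten the whole list in a single call. Alternatively, if you already have one small DataFrame per item, pd.concat stacks them row-wise into the 2-D table that pd.DataFrame could not build. The records below are hypothetical.

```python
import pandas as pd

# Hypothetical records standing in for my_data.points.values.tolist()[0]
points = [
    {"id": 1, "name": "apple", "price": 10, "meta": {"origin": "NZ"}},
    {"id": 2, "name": "orange", "price": 20, "meta": {"origin": "ES"}},
]

# Option 1: normalize the whole list at once; nested keys become dotted columns
df_a = pd.json_normalize(points)

# Option 2: one small DataFrame per item, stacked row-wise with pd.concat
frames = [pd.json_normalize(p) for p in points]
df_b = pd.concat(frames, ignore_index=True)
```

Option 1 is usually simpler; option 2 is the smallest change to the original loop.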

Pandas, assign object name using index strings

Using sample data:
Product = [Galaxy_8, Galaxy_Note_9, Galaxy_Note_10, Galaxy_11]
I would like to create 4 DataFrames, each containing the sales information for one product.
I would like to create them from the names in this list instead of writing each one by hand. The expected output is:
```python
Galaxy_8 = pd.DataFrame()
Galaxy_Note_9 = pd.DataFrame()
Galaxy_Note_10 = pd.DataFrame()
Galaxy_11 = pd.DataFrame()
```
If the product list grows beyond 200 items, what is the most efficient way to achieve this?
Thank you

If the sample list holds the names as strings, like:
```python
Product = ['Galaxy_8', 'Galaxy_Note_9', 'Galaxy_Note_10','Galaxy_11']
```
then you can try:
```python
for var in Product:
    globals()[var] = pd.DataFrame()
```
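A commonly recommended alternative to writing into globals() is a dictionary keyed by product name: the names stay data, 200+ products need no extra code, and each frame is reached as `frames['Galaxy_8']`. If the sales figures already sit in one table, groupby can split it per product in a single pass. The `sales` table and its column names below are hypothetical.

```python
import pandas as pd

products = ['Galaxy_8', 'Galaxy_Note_9', 'Galaxy_Note_10', 'Galaxy_11']

# Hypothetical sales table; the real column names aren't given in the question
sales = pd.DataFrame({
    "product": ["Galaxy_8", "Galaxy_11", "Galaxy_8", "Galaxy_Note_9"],
    "units": [5, 3, 2, 7],
})

# One empty DataFrame per product name, looked up by key
frames = {name: pd.DataFrame() for name in products}

# Or split the existing table into one DataFrame per product
by_product = {name: group for name, group in sales.groupby("product")}
```

Note that groupby only creates entries for products that actually appear in the table.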

Read Shapefiles into Dataframe

I have a shapefile that I would like to convert into a DataFrame in Python 3.7. I have tried the following code:
```python
import pandas as pd
import shapefile
sf_path = r'data/shapefile'
sf = shapefile.Reader(sf_path, encoding = 'Shift-JIS')
fields = [x[0] for x in sf.fields][1:]
records = sf.records()
shps = [s.points for s in sf.shapes()]
sf_df = pd.DataFrame(columns = fields, data = records)
```
But I got this error message:
```
TypeError: Expected list, got _Record
```
So how should I convert the _Record objects to lists, or is there a way around it? I have tried GeoPandas too, but had some trouble installing it. Thanks!

```python
import pandas as pd

def read_shapefile(sf_shape):
    """
    Read a shapefile into a Pandas dataframe with a 'coords'
    column holding the geometry information. This uses the pyshp
    package
    """
    # [1:] skips pyshp's leading DeletionFlag field
    fields = [x[0] for x in sf_shape.fields][1:]
    # y[:] copies each _Record into a plain list, which pandas accepts;
    # passing sf_shape.records() directly raised the _Record TypeError
    records = [y[:] for y in sf_shape.records()]
    shps = [s.points for s in sf_shape.shapes()]
    df = pd.DataFrame(columns=fields, data=records)
    df = df.assign(coords=shps)
    return df
```

I had the same problem. As the error says, each item returned by sf.records() is a _Record object, while pandas expects a plain list for each row when building the DataFrame. Try changing the records line to:
```python
records = [y[:] for y in sf.records()]
```
I hope this works!

Trying to iterate and join Pandas DFs: AttributeError: 'Series' object has no attribute 'join'

I'm looking to pull the historical data for ~200 securities in a given index. I import the list of securities from a csv file then iterate over them to pull their respective data from the quandl api. That dataframe for each security has 12 columns, so I create a new column with the name of the security and the Adjusted Close value, so I can later identify the series.
I get an AttributeError when I try to join all the new columns into an empty DataFrame:
```python
'''
Print output data
'''
grab_constituent_data()
```
```
AttributeError: 'Series' object has no attribute 'join'
```
Below is the code I have used to arrive here thus far.
```python
'''
Import the modules necessary for analysis
'''
import quandl
import pandas as pd
import numpy as np
'''
Set file paths and API keys
'''
ticker_path = ''
auth_key = ''
'''
Pull a list of tickers in the IGM ETF
'''
def ticker_list():
    df = pd.read_csv('{}IGM Tickers.csv'.format(ticker_path))
    # print(df['Ticker'])
    return df['Ticker']
'''
Pull the historical prices for the securities within Ticker List
'''
def grab_constituent_data():
    tickers = ticker_list()
    main_df = pd.DataFrame()
    for abbv in tickers:
        query = 'EOD/{}'.format(str(abbv))
        df = quandl.get(query, authtoken=auth_key)
        print('Completed the query for {}'.format(query))
        df['{} Adj_Close'.format(str(abbv))] = df['Adj_Close'].copy()
        df = df['{} Adj_Close'.format(str(abbv))]
        print('Completed the column adjustment for {}'.format(str(abbv)))
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df)
    print(main_df.head())
```

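It seems that in your line
```python
df = df['{} Adj_Close'.format(str(abbv))]
```
you're getting a Series, not a DataFrame: selecting a single column this way returns a Series. On the first pass main_df is set to that Series, so on the second pass main_df.join fails because Series has no join method. To convert the Series to a DataFrame, use to_frame():
```python
df = df['{} Adj_Close'.format(str(abbv))].to_frame()
```
I didn't check whether your code could be simplified, but this should fix your issue.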
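A self-contained sketch of the fix, with two hypothetical tickers and made-up prices standing in for the quandl.get results so it runs offline. It applies to_frame() inside the loop, and also shows pd.concat with axis=1, which aligns a list of Series on their shared date index in one call and avoids the empty-frame special case.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, standing in for quandl.get()
dates = pd.date_range("2020-01-01", periods=3)
raw = {
    "AAPL": pd.DataFrame({"Adj_Close": [1.0, 2.0, 3.0]}, index=dates),
    "MSFT": pd.DataFrame({"Adj_Close": [4.0, 5.0, 6.0]}, index=dates),
}

main_df = pd.DataFrame()
for abbv, df in raw.items():
    # Selecting one column gives a Series; to_frame() makes it a one-column
    # DataFrame, so main_df stays a DataFrame and .join is available
    col = df["Adj_Close"].rename("{} Adj_Close".format(abbv)).to_frame()
    main_df = col if main_df.empty else main_df.join(col)

# Alternative: build all the Series first, then align them on the index at once
combined = pd.concat(
    [df["Adj_Close"].rename("{} Adj_Close".format(abbv)) for abbv, df in raw.items()],
    axis=1,
)
```

Both approaches produce one column per ticker indexed by date; the concat version also scales better for ~200 securities since it avoids a join per ticker.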

To turn a Series into a pandas DataFrame, you can also use the constructor:
```python
df = pd.DataFrame(df)
```
After this line, df is a one-column DataFrame, so the join calls in your loop will work.
