Converting a single row into two - python

I have daily stats churned out from a system which outputs total sales and units sold per region group. For my analysis, I want to break down the entries into regions instead of region groups. I'm looking for a way to split each row into one row per region, with the respective measures.
I have historical percentages on the market share per region which I'll use to come up with the estimated sales and units sold.
I can do this manually in Excel, but given that I'll be doing this on a weekly basis, I'm looking for a way to automate it via Python.
My data: https://imgur.com/a/pBr3y4D
Goal: https://imgur.com/a/Uc56PVR

Well, first of all, when you're doing DS research, try to find the most appropriate approach for your particular case. There's nothing wrong with solving your issue with plain Excel functionality, scripting, etc.
However, if you really do want to use pandas, what I would do in your case is merge the group-level rows with your table of per-region shares and multiply out the measures, or write a function with a for loop that .append()s one row per region.
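For instance, a minimal sketch of the merge approach, assuming the stats have columns region_group, sales, and units, and your historical shares live in a second table with columns region_group, region, and share (all names are hypothetical, since I can only see your screenshots):

import pandas as pd

# Daily stats per region group (hypothetical example values)
stats = pd.DataFrame({
    'region_group': ['North', 'South'],
    'sales': [1000.0, 2000.0],
    'units': [100, 200],
})

# Historical market share of each region within its group (hypothetical)
shares = pd.DataFrame({
    'region_group': ['North', 'North', 'South', 'South'],
    'region': ['N1', 'N2', 'S1', 'S2'],
    'share': [0.6, 0.4, 0.7, 0.3],
})

# The merge duplicates each group row once per region; then scale the measures
result = stats.merge(shares, on='region_group')
result['est_sales'] = result['sales'] * result['share']
result['est_units'] = result['units'] * result['share']

Rerunning this weekly is then just re-reading the new stats file and repeating the merge.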

Related

How to forecast data based on variables from different datasets?

For a university project I'm trying to see the relationship oil production/consumption and crude oil price have with certain oil stocks, and I'm a bit confused about how to sort this data.
I basically have 4 datasets:
- Oil production
- Oil consumption
- Crude oil price
- Historical price of certain oil company stock
If I am trying to find a way these 4 tables relate, what is the recommended way of organizing the data? Should I manually combine all this data into a single Excel sheet (seems like the most straightforward way), or is there a more efficient way to go about this?
I am brand new to PyTorch and data, so I apologise if this is a very basic question. Also, the data can basically get infinitely larger, by adding data from additional countries, other stock indexes, etc. So is there a way I can organize the data so it’s easy to add additional related data?
Finally, I have the month-to-month values for certain data (eg: oil production), and day-to-day values for other data (eg: oil price). What is the best way I can adjust the data to make up for this discrepancy?
Thanks in advance!
You can use pandas.DataFrame to create one dataframe per dataset, then combine them into a single dataframe using merge.
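A rough sketch of that, plus one way to handle the monthly-vs-daily discrepancy you mention by resampling the daily series down to monthly means (file and column names here are placeholders):

import pandas as pd

# Load each dataset (file names are placeholders)
production = pd.read_csv('oil_production.csv', parse_dates=['date'])
price = pd.read_csv('crude_oil_price.csv', parse_dates=['date'])

# Daily series can be resampled to monthly means to match monthly data
price_monthly = (price.set_index('date')
                      .resample('MS')        # month-start frequency
                      .mean()
                      .reset_index())

# Merge on the shared date column; repeat for the other two datasets
combined = production.merge(price_monthly, on='date', how='inner')

Adding data for more countries or more indexes then just means another read_csv plus another merge, which keeps the whole thing extensible.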

Arbitrage algorithm for multiple exchanges, multiple currencies, and multiple amounts

I'm searching for a way to apply an arbitrage algorithm across multiple exchanges, multiple currencies, and multiple trading amounts. I've seen examples using Bellman-Ford and Floyd-Warshall, but the ones I've tried all seem to assume the graph data set is made up of prices for multiple currencies on one single exchange. I've tried tinkering to make it support prices across multiple exchanges, but I haven't had any success.
One article I read said to use Bellman-Ford and simply put only the best exchange's price in the graph (as opposed to all the exchanges' prices). While it sounds like that should work, I feel like I could be missing out on value that way. Is this the right way to go about it?
And regarding multiple amounts, should I just make one graph per trade amount? Say I want to run the algorithm for $100 and for $1000: do I literally populate the graph twice, once for each set of data? The prices will be different at $100 than at $1000, so the exchange with the best price at the $100 amount may be different from the one at the $1000 amount.
Examples:
The graph would look like this:
rates = [
    [1, 0.23, 0.26, 17.41],
    [4.31, 1, 1.14, 75.01],
    [3.79, 0.88, 1, 65.93],
    [0.057, 0.013, 0.015, 1],
]
currencies = ('PLN', 'EUR', 'USD', 'RUB')
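(For context, a hedged sketch of the standard single-exchange Bellman-Ford setup on a graph like this; this is my guess at what the referenced code does, not the actual code. With -log(rate) edge weights, a profitable loop shows up as a negative cycle.)

import math

n = len(rates)

# Edge weight -log(rate): a cycle with negative total weight is a
# multiplicative gain, i.e. an arbitrage loop
edges = [(i, j, -math.log(rates[i][j]))
         for i in range(n) for j in range(n) if i != j]

# Bellman-Ford with all distances starting at 0, so any cycle is reachable
dist = [0.0] * n
for _ in range(n - 1):
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w

# If any edge still relaxes after n-1 rounds, a negative cycle exists
arbitrage = any(dist[u] + w < dist[v] for u, v, w in edges)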
REFERENCES:
Here is the code I've been using, but this assumes one exchange and one single trade quantity
Here is where someone mentions you can just include the best exchange's price in the graph in order to support multiple exchanges
Trying for accuracy over speed, there's a way to represent the whole order book of each exchange inside of a linear program. For each bid, we have a real-valued variable that represents how much we want to sell at that price, ranging between zero and the amount bid for. For each ask, we have a real-valued variable that represents how much we want to buy at that price, ranging between zero and the amount asked for. (I'm assuming that it's possible to get partial fills, but if not, you could switch to integer programming.)

The constraints say, for each currency aside from dollars (or whatever you want more of), the total amount bought equals the total amount sold. You can strengthen this by requiring detailed balance for each (currency, exchange) pair, but then you might leave some opportunities on the table. Beware counterparty risk and slippage.
For different amounts of starting capital, you can split dollars into "in-dollars" and "out-dollars" and constrain your supply of "in-dollars", maximizing "out-dollars", with a one-to-one conversion with no limit from in- to out-dollars. Then you can solve for one in-dollars constraint, adjust the constraint, and use dual simplex to re-solve the LP faster than from scratch.
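To make that concrete, here's a toy sketch using scipy.optimize.linprog for a single pair (EUR/USD) quoted on two exchanges; all the numbers and variable names are made up for illustration:

import numpy as np
from scipy.optimize import linprog

# Toy order book across two exchanges (made-up numbers).
# Asks: we may BUY EUR at this price, up to this size.
asks = [(1.10, 500.0),   # exchange A: buy up to 500 EUR at $1.10
        (1.11, 300.0)]   # exchange B: buy up to 300 EUR at $1.11
# Bids: we may SELL EUR at this price, up to this size.
bids = [(1.13, 400.0),   # exchange B: sell up to 400 EUR at $1.13
        (1.12, 600.0)]   # exchange A: sell up to 600 EUR at $1.12

n, m = len(asks), len(bids)

# Variables: x[i] = EUR bought on ask i, y[j] = EUR sold on bid j.
# Objective: maximize USD received - USD paid; linprog minimizes, so negate.
c = np.concatenate([[p for p, _ in asks],      # pay p dollars per EUR bought
                    [-q for q, _ in bids]])    # receive q dollars per EUR sold

# Balance constraint: total EUR bought == total EUR sold.
A_eq = np.concatenate([np.ones(n), -np.ones(m)]).reshape(1, -1)
b_eq = [0.0]

# Each fill is bounded by the size posted at that price level.
bounds = [(0, size) for _, size in asks] + [(0, size) for _, size in bids]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print('profit in USD:', -res.fun)
print('fills:', res.x)

The in-dollars/out-dollars trick above would show up as an extra inequality constraint capping total USD spent; tightening or loosening that cap and re-solving is where dual simplex pays off.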

How to visualize aggregate VADER sentiment score values over time in Python?

I have a Pandas dataframe containing tweets from the period July 24, 2019 to October 19, 2019. I have applied the VADER sentiment analysis method to each tweet and added the sentiment scores as new columns.
Now, my hope was to visualize this in some kind of line chart in order to analyse how the averaged sentiment scores per day have changed over this three-month period. I therefore need the dates on the x-axis, and the averaged negative, positive and compound scores (three different lines) on the y-axis.
I have an idea that I need to somehow group or resample the data in order to show the aggregated sentiment value per day, but since my Python skills are still limited, I have not succeeded in finding a solution that works yet.
If anyone has an idea as to how I can proceed, that would be much appreciated! I have attached a picture of my dataframe as well as an example of the type of plot I had in mind :)
Cheers,
Nicolai
You should have a look at the groupby() method:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
Simply create a day column which contains a timestamp/datetime object/string ... representing the day of the tweet rather than its exact time. Then use the groupby() method on this column.
If you don't know how to create this column, an easy way of doing it is using https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
Keep in mind that the groupby method doesn't return a DataFrame but a DataFrameGroupBy object, so you'll have to choose a way of aggregating the data in your groups (you should probably do groupby().mean() in your case; see the groupby method documentation for more information).
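A minimal sketch, assuming the dataframe has a datetime column named created_at and VADER score columns neg, pos, and compound (those names are guesses from your screenshot):

import matplotlib.pyplot as plt

# df: one row per tweet; 'created_at' must be datetime64
df['day'] = df['created_at'].dt.floor('D')   # drop the time-of-day part

# Average each sentiment score per day
daily = df.groupby('day')[['neg', 'pos', 'compound']].mean()

# Dates on the x-axis, one line per score
daily.plot()
plt.ylabel('average VADER score')
plt.show()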

Is there any way to calculate Market Beta from the Yahoo Finance DataReader in Python?

I'm currently trying to compute market betas for tickers pulled through the yahoo finance DataReader. I was wondering if there is a way to calculate each stock's market beta and put it in a dataframe?
This is what I have for my code so far:
import pandas_datareader.data as pdr

# Tickers and date range to download
Tickers = ['SBUX', 'TLRY']
SD = '2005-01-31'
ED = '2018-12-31'

# Note: the method is DataReader (capitalized); pdr.datareader raises AttributeError
TickerW = pdr.DataReader(Tickers, 'yahoo', SD, ED)
TickerW.head()
Okay, to make sure we're on the same page, we use the formula and definition of market beta from here: https://www.investopedia.com/terms/b/beta.asp
Beta = Covariance(Stock Returns, Market Returns) / Variance(Market Returns)
So first of all, we need the ticker for the market as well as the tickers for the stocks. Which ticker you use here depends a lot on what market you want to compare against: Total stock market? Just the S&P 500? Maybe some other international equity index? There's no 100% right answer here, but a good way to pick is to think about who the "movers" of your stock are, and what other stocks they hold. (Check out Damodaran's course on valuation, free on the interwebs if you google it.)
So now your question becomes: How do I compute the covariance and variance of stock returns?
First, the pandas tickers have a bunch of information. The one we want is the "Adjusted Close". That's the daily closing price of the stock, retroactively adjusted for any "special" events like stock splits, reverse splits, and dividends. Because let's say a stock trades for $1000 a pop one day, but then undergoes a 2-for-1 stock split, so now instead of 1 share for $1000, you have 2 shares for $500 each. In a "raw" price chart, it would appear as if your stock just lost 50% of its value in a single day, when in reality nothing happened. The Adjusted Close time series takes care of that to make sure that only "real" changes to the stock's value are reflected.
You can get that by calling prices = TickerW['Adj. Close'] or whatever key yahoo finance uses these days. By just looking at the TickerW dataframe you should be able to figure that out on your own :)
Next, we'd be changing prices into returns. That's just prices / prices.shift(1) - 1, or equivalently prices.pct_change() (consult the documentation and try it out yourself). (Nerd note: instead of these simple returns, it is mathematically more sound to use logarithmic returns, because they have certain reasonable properties. If you want, use np.log(prices / prices.shift(1)) instead.)
Finally, we now have a series of returns (or log returns). One series for the stock returns, one for the market returns (e.g. from SPY, for the S&P 500). Now we just need to use them in the formula for beta.
Well, the way to go here is to do what I just did: Hit up google for "pandas covariance between two series" and that gets us to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cov.html
So basically, cov = stock_returns.cov(market_returns) and var = market_returns.var() and then beta = cov / var.
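Put together, a sketch of the whole thing (using SPY as the market proxy and 'Adj Close' as the column key, both of which are my assumptions; check what your pandas-datareader version actually returns):

import pandas_datareader.data as pdr

# Stock and market proxy over the same window
prices = pdr.DataReader(['SBUX', 'SPY'], 'yahoo',
                        '2005-01-31', '2018-12-31')['Adj Close']

# Daily simple returns
returns = prices.pct_change().dropna()

# Beta = Cov(stock, market) / Var(market)
beta = returns['SBUX'].cov(returns['SPY']) / returns['SPY'].var()
print('SBUX beta vs SPY:', beta)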
I'd say that should be enough info to send you on your way. Good luck.

Best way to work with hierarchical python data that needs to be aggregated at many levels

I have a very complex data-set that I need to easily aggregate and work with values at multiple levels.
For example, assume I have data on population and crime rate for each city in the US. Each city should roll up to a state, so the state population is the SUM of each city within it, and the crime rate is the AVERAGE of the crime rates of each city below it. Then I need each state to roll up to the US overall, maintaining the same calculation logic.
What is the best data structure to accomplish complex aggregations of hierarchically organized data in python?
Ideally I would be able to select a node, and then using some method feed the node an argument on what data to aggregate, and the logic to aggregate it with.
Two words: use pandas.
Link to tutorial: http://pandas.pydata.org/pandas-docs/stable/cookbook.html
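As a starting point, here's a minimal sketch of the city → state → US roll-up with groupby and a shared aggregation spec (the columns and numbers are made up):

import pandas as pd

# One row per city (hypothetical values)
cities = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY'],
    'city': ['Los Angeles', 'San Francisco', 'New York', 'Buffalo'],
    'population': [3_900_000, 870_000, 8_300_000, 278_000],
    'crime_rate': [4.5, 6.1, 3.2, 5.0],
})

# Same logic at every level: population sums, crime rate averages
agg = {'population': 'sum', 'crime_rate': 'mean'}

states = cities.groupby('state').agg(agg)   # city -> state
us = states.agg(agg)                        # state -> US, same spec

The agg dict is the "argument on what data to aggregate, and the logic to aggregate it with" that you describe, and you can reuse it unchanged at each level of the hierarchy.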
