The Five Steps to Determine Returns from Historical Asset Price Data Using Python | by Bruce H. Cottman, Ph.D. | May 2022


A step-by-step approach to finding and analyzing the returns of a market trading asset

Figure 1. The black solid market capitalization of listed US companies by year (adjusted for inflation in US dollars). The red dotted line is an exponential function that corresponds to US market capitalization. Source: screenshot by author and World Bank data

Disclaimer: The following content is provided for informational purposes only and does not constitute financial advice. Do not make any decisions based solely on my writings.

I need a method to measure the returns of a market trading asset. Using the return trendline, I compare assets competitively. In the case of ETFs or portfolio funds, I must include the fees.

Why I do not compare the risk? Because I invest in broad-based stock indices or ETFs. Broad-based ETFs tend to have small risks that disappear if held for twenty years or more.

To note: An unproven but excepted “rule” is that higher risk is associated with higher turn gain. However, any stock in a specific company is mathematically certain to reach zero value during its lifetime. EFT’s risk is proportional to the risk of the market indices in which it holds assets. Market indices remove assets if their value becomes too low. The market index will live as long as the market exists.

What is an ETF?

The ETF (Electronic Transfer Fund) has gained wide acceptance over the past ten years, a relatively short period in the history of investments. ETFs are made up of a basket of stocks. Each stock is generally weighted in the EFT by market capitalization.

Although ETFs are similar to stock indices, they have one important additional feature. ETFs can be bought or sold on a stock exchange just like a stock market asset.

EFTs are bought and sold whenever the forex market is open. Indices are bought or sold only at market close.

More importantly, holding specific assets in ETFs is changing. Some assets are added to the ETF, while others are removed. Companies go bankrupt, but ETFs are wired to keep the winners.

For example, a “large cap” ETF includes the 100 most capitalized stocks of publicly traded companies. The largest capitalized share was do not APPL (Apple) twenty years ago. APPL is currently the largest capitalized stock. APPL will probably do not to be the largest capitalized action in twenty years.

However, this is DO NOT a blog post on how to invest; instead, I focus on how to create a trend line of returns suitable for historical prices of the EFT, index or any market trading asset. An adjustment to the trend line of returns reveals the average periodic return of that asset.

The first step shows how to download historical price data from market exchanges for free in a CSV (comma separated values) formatted file. The second step details two methods of reading asset data into a python-pandas data structure. Steps three through five show how the pandas dataframe of historical financial price data is fitted, plotted, and analyzed to derive trendlines for returns.

There are several places where you can download pricing data for any asset traded on a public market. You can even download price data for assets that don’t yet appear on the well-known market exchanges NYEX and NASDAQ. An example of an asset class is crypto coins.

The example used in Figures 2-7 is the “NASDAQ 100 Technology Sector” index with the ticker symbol ˆNDXT or “NASDAQ Composite” index with stock symbol ˆIXIC.

List of some of the locations I download data from:

Figure 2: Displays NXDT EFT price data downloaded from The first image shows the date dropdown and a click on “Max” to select the full date range available. The second image shows the “daily”, “weekly” or “monthly” period drop-down list and clicking “weekly” to select data by week interval. The transition from the second frame to the third frame shows the “Apply” button running the “Max” date range and “Weekly” choices. Source: Screenshot of the author’s desktop.

The resulting data snapshot is downloaded and then viewed by uploading it to macOS Numbers (the macOS spreadsheet app is familiar with Microsoft Excel).

Picture 3. NDXT price data is downloaded and then uploaded to macOS Numbers (macOS version of Microsoft Excel) for viewing.

To note: Any text editor on any operating system should be able to view the NDXT.csv data file.

You read a CSV deposit in a pandas dataframeobject using pandas.read_csv() a function. I assume you are using pandas, a valuable library for financial data in python. If you want to know more about pandas, search “pandas python” on Google.

import pandas as pd
NDXT_wk = pd.read_csv('../data/NDXT.csv', index_col=0, parse_dates=True)
Figure 4. Five frame video showing using pandas to read and display the NDXT.csv file. Screenshot of the author’s desktop.

Alternatively, you can load a historical stock market price data asset, for example, the “NASDAQ composite” index – ^IXICdirectly in your python environment using:

!pip install pandas_datareader #if not already installed
from import DataReader
start_date = '1999/06/01'
end_date = '2022/4/21'
tickers = '^IXIC'
EFT_df = DataReader(tickers, 'yahoo', start = start_date, end = end_date)
EFT_df.index = pd.to_datetime(EFT_df.index, format ='%Y-%m-%d')
Figure 5. IXIC data. Screenshot of the author’s desktop.
Many other different price histories for assets readers as well as “awesome” financial analysis packages are listed at:

You can research the future value formula with starting principle, periodic deposit, withdrawal period and average periodic return.

A period return is a future value of an asset at the end of a period minus the future value at the end of a period.

R = F[2] - F[1]

We gain a better understanding of the future value formula by manual derivation. Additionally, we can use whatever formula we need to adjust the price data to better match an average periodic return.

However, we gain a better understanding of the future value formula. Also, we can modify the formula we need to fit the price data to best fit an average periodic return.
The future value (F) for one period is:
F[1] = P+ PR + D + DR,
where P is priniple, R the average periodic return, D is periodic deposit and np is the number of periods.
For case P = 0F[1]= D(1 + R)F[2] =F[1](1 + R) +D(1 + R) = D(1 + R)ˆ2+ D(1 + R)F[3] =F[2](1 + R) +D(1 + R)), = D(1 + R))ˆ3+ D(1 + R)ˆ2 + D(1 + R)F(np) =D(1 + R) + ... + D(1 + R)Yˆ(np-1) = sum(1,np,D(1 + Rˆn)
For case P>0, D=0F[1] = P+ PR + = P(1 + R)F[2] =  P+ PR + F[1]R =  P + 2PR + PRˆ2 = P(1 + R)ˆ2F[np] = P + R + F[np-1]R = P(1 + R)ˆnp
for case P>0, D > 0F[1] = P+ PR + D + DR = (P+D)(1 + R)F[2] = D + DR + F[1](1+R) = D + DY + P(1+R)ˆnF[np] = P + F[np-1] = P(1+R)ˆnp + sum(1,np,D(1 + Y)ˆn)

The resulting python function for future value,FV,for all principal and deposit cases, are:

def FV(principle=0, deposit=0, withdrawal=0, fyield=0, nperiod=1):
if ((principle + deposit+ withdrawal) == 0): return(0)
if (fyield <= 0): return (principle + deposit+ withdrawal) #default is end of period
amount = (deposit- withdrawal)
cumamount = 0
for n in range(1,nperiod+1):
cumamount = cumamount + deposit*(1+fyield)**n
return(principle*(1+fyield**nperiod) + cumamount)

The yield function is determined by the principal of the first price, with the future value varying according to the yield per period. In the next function returnfit python type hint was added.

def returnfit(nperiod: int=1, principle: float =0.0, return_:
float=0.0)-> float:
if ((principle) == 0): return(0)
if (return_ <= 0): return (principle) #default is end of period

The type indication is explored in:

The future value with a principle of $100 and the first ten periods of a return per period of 10% are:

Picture 7. : The future value with a principle of $100 and the first ten periods of return of 10%. Screenshot of the author’s desktop.

To adjust historical price data, I use scipy.optimize.curfit which uses nonlinear least squares to fit a nonlinear function, returnfit, to historical price data. For function curfit, I use the method trf: Trust Region Reflective algorithm, particularly suitable for large sparse problems with bounds.

from scipy.optimize import curve_fitdf = NDXT_wk
column_name = 'Adj Close'
nppy = 52
low_bounds = [1000,.05/nppy]
high_bounds = [1300, .5/nppy]
nper = NDXT_wk.index.size
xdata = np.linspace(1,nper,num=nper).astype(int)
NDXT_wk['ID'] = xdata
y_data = NDXT_wk['Adj Close'].to_numpy()
popt, pcov = curve_fit(returnfit, xdata, y_data, method='trf',
bounds=(low_bounds, high_bounds))
popt[0],stderr[0],popt[1]*nppy, stderr[1]*nppy
Picture 8. : The adjustment resulting from return using scipy.optimize.curfit .The 1st value in the list is the main adjustment result. The 2nd value in the list is the yield per year of the adjustment result. The 3rd value in the list is the main standard deviation of error between the actual and the predicted. The 4th value in the list is the standard deviation of yield error per year between the actual yield and the predicted yield. Screenshot of the author’s desktop.

In Figure 3., we saw that the starting price was around $1000. I use a principal equal to the first price as the first value of low_bounds. This pinpoints the rotating principle curfit in a 1-parameter fit of the yield, the second value of low_boundsand high_bounds.

Adjustment plot

We can plot the actual price and the resulting adjustment price:

sns.set(rc={'figure.figsize':(12, 6)})
plt.plot(xdata, yieldfit(xdata, *popt))
plt.plot(xdata, y_data)
Picture 9.: Plot of the resulting yield adjustmentfit of NDXT using scipy.optimize.curfit . The actual price is indicated by an orange line. The price resulting from an optimal yield adjustment is represented by a blue line. Screenshot of the author’s desktop.

Below is a plotting function peb to show the resulting return adjustment of the NASDAQ Composite Index using curfit.

def peb(df, ticker, column_name, nppy, low_bounds, high_bounds,nstd):
# nppy: number of periods per year
# low_bounds: [low_priniple, low_yield ]
# high_bounds: [high_priniple, high_yield ]
#nstd: number of standard deviation = 2*width of error banner
xddata = df.index #dates array for axis
nper = df.index.size # number of periods
xdata = np.linspace(1,nper,num=nper).astype(int)
df['ID'] = xdata
y_data = df[column_name].to_numpy()
# Compute upper and lower bounds using chosen uncertainty measure: here
# it is a fraction of the standard deviation of measurements at each
# time point based on the unbiased sample variance
popt, pcov = curve_fit(returnfit, xdata, y_data, method='trf',
bounds=(low_bounds, high_bounds))
# determine std of errors
yfit_data= returnfit(xdata, *popt)
# trend line, high error, low error
df['y_fit'] = yfit_data
yfit_data_low = returnfit(xdata, *(popt-stderr*nstd))
df['y_fit_low'] = yfit_data_low
yfit_data_high = returnfit(xdata, *(popt+stderr*nstd))
df['y_fit_high'] = yfit_data_high
sns.set(rc={'figure.figsize':(12, 6)})
fig, ax = plt.subplots(figsize=(12,6))
ntm = 19
interv = nper//ntm # interval between 25 tick marks
if interv < 1: interv = 1
# weekly instead of yearly x axis labels
if (nppy == 52):
# monthly instead of yearly x axis labels
elif (nppy == 12):
# We change the fontsize of minor ticks label

ax.tick_params(axis='both', which='major', labelsize=14)
# plot raw data
ax.plot(xddata, y_data, color='black', label='yield')
# plot fitted yield trend line
ax.plot(xddata, yfit_data, color='tab:blue', label='yield fit')
# plot fitted yield lowabs error trend line
ax.plot(xddata, yfit_data_low, color='tab:blue', alpha=0.1)
# plot fitted yield high error trend line
ax.plot(xddata, yfit_data_high, color='tab:blue', alpha=0.1)
# plot error band
ax.fill_between(xddata, yfit_data_low, yfit_data_high, alpha=0.2)
### Annotate
astr = "TICKER= {:}n PRINCIPAL= ${:.2f})nRETURN= {:.2%}/year".format(ticker, popt[0], popt[1]*nppy)
xy=(0.5, 0.7), xycoords=ax.transAxes,
xytext=(-60, 18), textcoords='offset points',
bbox=dict(boxstyle="round4,pad=.5", fc="0.8"),) = NDXT_wk
ticker = '^NXDT'
column_name = 'Adj Close'
nppy = 52
low_bounds = [1000,.05/nppy]
high_bounds = [1500, .5/nppy]
nstd = 3
peb(df, ticker, column_name, nppy, low_bounds, high_bounds, nstd)

The plot resulting from curfit is shown in Figure 10.

Figure 10.: Fancy graph using peb() of the adjustment resulting from returnfit of NDXT using scipy.optimize.curfit. The blue band with the blue line dividing the band is the standard deviation of yield error per year between actual yield minus predicted yield multiplied by a factor of three. These 3 standard deviations of return error mean that 99 out of 100 results of the adjusted return values ​​are within band. Screenshot of the author’s desktop.

We define a new function that grabs a stock symbol and outputs a price data plot, an adjusted yield trend line. and the error band of three standard deviations.

def ticket_read(start_date, end_date, ticker):
EFT_df = DataReader(ticker, 'yahoo', start = start_date, end = end_date)
EFT_df.index = pd.to_datetime(EFT_df.index, format ='%Y-%m-%d')
# determine nummber of peruods per year from
delta_days = (EFT_df.index[1] - EFT_df.index[0]).days
if delta_days > 365:
nppy = 1
elif delta_days > 27:
nppy = 52
nppy = 365
column_name = 'Adj Close'
lp = EFT_df[column_name][0]
hp = 1.1*lp
ly =0.05
hy = 0.50
low_bounds = [lp,ly/nppy]
high_bounds = [hp, hy/nppy]
nstd = 3
peb(EFT_df, ticker, column_name, nppy, low_bounds, high_bounds, nstd)
start_date = '1900/01/01'
end_date = '2022/04/25'
ticker = '^IXIC'
ticket_read(start_date, end_date, ticker)
Figure 11. The data plot ˆIXIC. Screenshot of the author’s desktop.

Where you place the optimized fit window determines the yield. For example, in Figure 12 the period windows are:

tickers = ‘^IXIC’#Image 1 - the full period range
start_date = '1971/02/05'
end_date = '2022/4/21'
#Image 2- after 2003 surge # early period
start_date = '1971/02/05'
end_date = '2010/4/21'
#Image 3
start_date = '2004/01/01' #late windw
end_date = '2022/4/21'
#Image 4 - latest bull market
start_date = '2012/01/01'
end_date = '2022/4/21'
Figure 12.:Four frame video showing four different period windows resulting in four different return adjustments for the EFT ÎXIC (Total Market Composite Index). Screenshot of the author’s desktop.

In the figure above, the return adjustments vary between 12% and 24% depending on where you place the window over the past fifty-two years.

To note: These prices are not adjusted for inflation. How would you modify the code to account for price inflation?

Try adjusting the return and plot on different assets.

What has been done to find the return of a TEF from historical price data is called technical analysis.

You should be warned that past prices are not a good predictor of future prices.

However, as I explained in:

The growth of the US capital market is due to productivity growth. The main driver of productivity is to turn technological innovation into assets

The return of any broad-based asset (more than 30 stocks, bonds, and other assets) is driven by increasing wealth, primarily by new technology products. For example, imagine your life without vaccines, antibiotics, electricity, lights, automobiles, refrigerators, cell phones, personal computers, the Internet, online shopping, and other technological products.

Technology-based assets generate higher returns than less technology-dependent assets. In future blog posts, I will discuss emerging technologies that I am watching to invest in, both as stocks and ETFs.

All code and images for this blog post are at You might want to start with

Consider becoming a Medium member for $5 per month. You will have unlimited access to my articles and thousands of articles by other authors on data science, programming, relationships, humor, health, lifestyle and much more.


About Author

Comments are closed.