Obtaining and Visualizing Financial Data Using Python: Part 1
Oct 7, 2023
One of the first sources for historical daily price and volume data on stocks is Yahoo Finance. You can fetch the data with the pandas_datareader or yfinance module and then save it to a CSV file with the pandas DataFrame.to_csv method.
#import libraries
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import statistics
import warnings
warnings.filterwarnings('ignore')
# daily price-volume data between a start date and an end date
data1 = yf.download(tickers='MSFT', start='2020-01-01', end='2023-10-07')  # dates must be in YYYY-MM-DD format
data1.tail()  # with no argument, the last 5 rows are displayed
[*********************100%***********************] 1 of 1 completed
Date Open High Low Close Adj Close Volume
2023-10-02 316.279999 321.890015 315.179993 321.799988 321.799988 20570000
2023-10-03 320.829987 321.390015 311.209991 313.390015 313.390015 21033500
2023-10-04 314.029999 320.040009 314.000000 318.959991 318.959991 20720100
2023-10-05 319.089996 319.980011 314.899994 319.359985 319.359985 16965600
2023-10-06 316.549988 329.190002 316.299988 327.260010 327.260010 25645500
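The introduction mentions storing the downloaded data in a CSV file. Here is a minimal sketch of that round trip. To keep it runnable without a network call, it rebuilds a small frame from three rows of the output above; the file name msft_daily.csv and the temporary folder are placeholders of my own. With live data you would call data1.to_csv(...) directly.

```python
import os
import tempfile

import pandas as pd

# Three rows copied from the data1.tail() output above (Close and Volume only),
# so the example runs offline; with live data, use data1 instead of sample.
sample = pd.DataFrame(
    {
        "Close": [318.959991, 319.359985, 327.260010],
        "Volume": [20720100, 16965600, 25645500],
    },
    index=pd.to_datetime(["2023-10-04", "2023-10-05", "2023-10-06"]),
)
sample.index.name = "Date"

# 'msft_daily.csv' is a placeholder name; it is written to a temporary folder here.
csv_path = os.path.join(tempfile.mkdtemp(), "msft_daily.csv")
sample.to_csv(csv_path)

# Read the file back and restore the Date column as a DatetimeIndex.
restored = pd.read_csv(csv_path, index_col="Date", parse_dates=True)
print(restored)
```

Passing index_col and parse_dates when reading is what turns the Date column back into a proper date index; without them it comes back as plain text.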
# plot the closing price in a simple chart
data1['Close'].plot(figsize=(8,6))
plt.title("Close Price of Microsoft, Jan 2020 to Oct 2023",fontsize=12)
plt.xlabel("Time Period",fontsize=10)
plt.ylabel("Price",fontsize=10)
plt.grid(linewidth=0.45,linestyle = '-.',color='k')
plt.show()
# data with different periods and frequencies can be obtained
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
data2 = yf.download(tickers = 'MSFT', interval = '1m', period = '7d')
data2.head()
# limitation of yfinance: only 7 days' worth of 1-minute data can be fetched
[*********************100%***********************] 1 of 1 completed
Datetime Open High Low Close Adj Close Volume
2023-09-28 09:30:00-04:00 310.989990 311.260010 310.269989 310.351196 310.351196 830111
2023-09-28 09:31:00-04:00 310.250000 310.320007 309.952209 310.170013 310.170013 158030
2023-09-28 09:32:00-04:00 310.170013 310.200012 309.769989 309.769989 309.769989 82994
2023-09-28 09:33:00-04:00 309.750000 309.920013 309.450012 309.850006 309.850006 122423
2023-09-28 09:34:00-04:00 309.880005 310.450012 309.799988 310.369995 310.369995 119957
data3 = yf.download(tickers= 'MSFT', interval = '5m', period = '60d')
data3.head()
# only 60 days' worth of 5-minute data can be fetched
[*********************100%***********************] 1 of 1 completed
Datetime Open High Low Close Adj Close Volume
2023-07-14 09:30:00-04:00 347.589996 348.910004 346.670013 346.760010 346.760010 2793026
2023-07-14 09:35:00-04:00 346.049988 348.230011 346.029999 347.809998 347.809998 899202
2023-07-14 09:40:00-04:00 347.835602 348.350006 347.049896 347.049896 347.049896 551767
2023-07-14 09:45:00-04:00 347.054993 348.019989 346.559998 347.489990 347.489990 474803
2023-07-14 09:50:00-04:00 347.519989 348.410004 347.519989 348.160004 348.160004 381701
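The intervals above are related: a coarser bar can be built from finer ones. As a rough sketch of that idea, the snippet below aggregates the five 1-minute bars printed earlier into a single 5-minute bar with pandas resample. The aggregation rule (first open, highest high, lowest low, last close, summed volume) is the usual convention, not something taken from yfinance. The time zone offset is dropped for brevity, and the result is not guaranteed to match the 5-minute bars Yahoo Finance serves.

```python
import pandas as pd

# The five 1-minute MSFT bars from the data2.head() output (time zone dropped).
bars_1m = pd.DataFrame(
    {
        "Open": [310.989990, 310.250000, 310.170013, 309.750000, 309.880005],
        "High": [311.260010, 310.320007, 310.200012, 309.920013, 310.450012],
        "Low": [310.269989, 309.952209, 309.769989, 309.450012, 309.799988],
        "Close": [310.351196, 310.170013, 309.769989, 309.850006, 310.369995],
        "Volume": [830111, 158030, 82994, 122423, 119957],
    },
    index=pd.date_range("2023-09-28 09:30", periods=5, freq="min"),
)

# Aggregate into 5-minute bars: first open, max high, min low, last close, total volume.
bars_5m = bars_1m.resample("5min").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(bars_5m)
```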
Fetching data for multiple tickers
tickers_list = ['AAPL', 'TSLA', 'MSFT', 'KO']
# to add tickers later, call tickers_list.append('TICKER') before creating the DataFrame
data4 = pd.DataFrame(columns=tickers_list)
for ticker in tickers_list:
    # keep only the closing price of each ticker
    data4[ticker] = yf.download(ticker, start='2020-01-01', end='2023-01-01')['Close']
data4
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
Date AAPL TSLA MSFT KO
2020-01-02 75.087502 28.684000 160.619995 54.990002
2020-01-03 74.357498 29.534000 158.619995 54.689999
2020-01-06 74.949997 30.102667 159.029999 54.669998
2020-01-07 74.597504 31.270666 157.580002 54.250000
2020-01-08 75.797501 32.809334 160.089996 54.349998
... ... ... ... ...
2022-12-23 131.860001 123.150002 238.729996 63.820000
2022-12-27 130.029999 109.099998 236.960007 64.209999
2022-12-28 126.040001 112.709999 234.529999 63.570000
2022-12-29 129.610001 121.820000 241.009995 63.950001
2022-12-30 129.929993 123.180000 239.820007 63.610001
756 rows × 4 columns
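The loop above fills an empty DataFrame one column at a time. Another way to assemble the same table, sketched here under the assumption that you already hold one closing-price Series per ticker, is to collect them in a dictionary and let pd.concat line them up on the date index. The values are the first three rows of the output above, so the snippet runs offline.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])

# First three closes per ticker, copied from the data4 output above.
closes = {
    "AAPL": pd.Series([75.087502, 74.357498, 74.949997], index=dates),
    "TSLA": pd.Series([28.684000, 29.534000, 30.102667], index=dates),
    "MSFT": pd.Series([160.619995, 158.619995, 159.029999], index=dates),
    "KO": pd.Series([54.990002, 54.689999, 54.669998], index=dates),
}

# With axis=1, each dictionary key becomes a column and rows are aligned by date.
combined = pd.concat(closes, axis=1)
print(combined)
```

Because the rows are aligned on the index, a ticker with a missing trading day would simply show NaN for that date.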
# plot the closing price in a simple chart
data4.plot(figsize=(8,6))
plt.title("Close Price over 3 Years",fontsize=12)
plt.xlabel("Time Period",fontsize=10)
plt.ylabel("Price",fontsize=10)
plt.grid(linewidth=0.45,linestyle = '-.',color='k')
plt.show()
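In the chart above the four stocks trade at very different price levels, so the cheaper ones look flat next to the more expensive ones. One optional refinement, which is my suggestion and not part of the original workflow, is to rebase every series to 100 on the first date before plotting. The sketch below shows the arithmetic on a few rows taken from the table above; with the full data you would apply the same line to data4 and then call .plot().

```python
import pandas as pd

# A few closes for two of the tickers, copied from the data4 output above.
prices = pd.DataFrame(
    {
        "AAPL": [75.087502, 74.357498, 74.949997],
        "KO": [54.990002, 54.689999, 54.669998],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Divide each column by its own first value, then scale so every series starts at 100.
rebased = prices / prices.iloc[0] * 100
print(rebased.round(2))
```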
Simple calculations on Price data
# summary statistics of the closing price
standard_deviation = statistics.stdev(data1['Close'])  # sample standard deviation
skewness = data1['Close'].skew()
kurtosis = data1['Close'].kurt()  # excess kurtosis: 0 for a normal distribution
# plot the 20-day rolling mean of the closing price
data1['Close'].rolling(window=20).mean().plot()
plt.title("MSFT 20 Day Rolling Mean",fontsize=12)
plt.ylabel('Price')
plt.grid(linewidth=0.25,linestyle = '-.',color='k')
print('Standard Deviation for the dataset is', standard_deviation)
print('Skewness for the dataset is', skewness)
print('Kurtosis for the dataset is', kurtosis)
Standard Deviation for the dataset is 50.42030072399359
Skewness for the dataset is -0.14999511059432513
Kurtosis for the dataset is -0.7719545867211601
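To make the calculations above a little more concrete, here is a small offline sketch using the five closing prices from the first table. It checks that statistics.stdev and the pandas std method give the same sample standard deviation, and it shows why a rolling mean starts with empty values. A window of 3 replaces the 20-day window only because the sample has five rows.

```python
import statistics

import pandas as pd

# MSFT closes for 2023-10-02 to 2023-10-06, copied from the data1.tail() output.
close = pd.Series([321.799988, 313.390015, 318.959991, 319.359985, 327.260010])

# Both use the sample formula with an (n - 1) denominator.
sample_std = statistics.stdev(close.tolist())
pandas_std = close.std()  # pandas defaults to ddof=1
print(sample_std, pandas_std)

# A rolling mean needs a full window, so the first (window - 1) values are NaN.
rolling_mean = close.rolling(window=3).mean()
print(rolling_mean)
```

The same logic explains why the 20-day rolling mean line in the chart begins 19 trading days after the start of the data.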
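# daily percentage change of the MSFT closing price
data1['Close'].pct_change().plot()
plt.title('Daily percentage change in close price: MSFT', fontsize =12)
plt.grid(linewidth=0.25, linestyle='-.')
plt.show()
# daily percentage change of the closing price for all four tickers
data4.pct_change().plot(figsize = (9,5))
plt.title('Daily percentage change in close price', fontsize =12)
plt.grid(linewidth=0.25, linestyle='-.')
plt.show()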
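For reference, pct_change computes each day's close divided by the previous day's close, minus one. The short sketch below reproduces that by hand on the five MSFT closes from the first table, so you can see that the first value is always empty and that the two calculations agree.

```python
import pandas as pd

# MSFT closes copied from the data1.tail() output above.
close = pd.Series(
    [321.799988, 313.390015, 318.959991, 319.359985, 327.260010],
    index=pd.to_datetime(
        ["2023-10-02", "2023-10-03", "2023-10-04", "2023-10-05", "2023-10-06"]
    ),
)

# Built-in daily return: (today - yesterday) / yesterday.
returns = close.pct_change()

# The same calculation written out with shift(), which moves each value down one row.
manual = close / close.shift(1) - 1

# Show the returns as percentages; the first row is NaN because it has no previous day.
print((returns * 100).round(2))
```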