## Archive for the 'quant' Category

July 31, 2010
- 5922 rows
- 610 columns
- timestamp : 40182 – 40290 | 108 samples | 9 hours of trading data | no actual date information
- most of the variables should represent stock prices as (open, high, low, last_price) values in 5-min intervals
- (OPEN = value at timestamp, HIGH = highest traded value in 5-min interval, LOW = lowest traded value in interval, LAST = last traded value at end of interval)
- for stock prices the following should hold: open(p(t+1)) = last_price(p(t)) + delta
- however, that is not always the case: the dataset is filled with missing data in order to reflect the real-world trading scenario (missed measurements, data loss, etc.)
- some variables are categorical/logical (what they actually represent is open to interpretation); an additional question is whether they should be added to the model
- data might not be available for each variable at each time sample
- a first look at the data suggests starting with basic time-series analysis
- the big question is price formation dynamics due to price correlation (any-to-all stocks regression modeling, or a model over any-to-(time?)-correlated stocks)
- handling missing data will be essential for getting high-performance predictions (which methods?)
- off-topic: financial time series similarity detection? pattern matching etc., “find similar stocks”, which metrics?
- clustering algorithms for time series datasets?
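
A quick way to see how often the open/last relation from the notes breaks is to flag each bar. This is only a sketch on made-up numbers: the `bars` values, the `check_bars` name and the 2% `max_rel_gap` tolerance standing in for "delta" are all assumptions, not something taken from the dataset.

```python
# Hypothetical 5-minute bars for one stock: (open, high, low, last_price).
# None marks a missing measurement, as described in the notes.
bars = [
    (10.00, 10.10, 9.95, 10.05),
    (10.06, 10.20, 10.01, 10.15),
    (None, 10.30, 10.10, 10.25),   # open is missing
    (10.80, 10.90, 10.70, 10.85),  # open is far from the previous last_price
    (10.84, 10.95, None, 10.90),   # low is missing
]


def check_bars(bars, max_rel_gap=0.02):
    """Label each bar as 'ok', 'missing', 'bad_range' or 'gap'.

    max_rel_gap is an assumed tolerance for |open(t+1) - last_price(t)|,
    expressed as a fraction of last_price(t).
    """
    flags = []
    prev_last = None
    for open_, high, low, last in bars:
        if any(v is None for v in (open_, high, low, last)):
            flag = "missing"
        elif not (low <= open_ <= high and low <= last <= high):
            # open and last must lie inside the [low, high] range of the bar
            flag = "bad_range"
        elif prev_last is not None and abs(open_ - prev_last) > max_rel_gap * prev_last:
            flag = "gap"
        else:
            flag = "ok"
        flags.append(flag)
        # If last is missing, the next bar simply skips the gap test.
        prev_last = last
    return flags


print(check_bars(bars))
```

Counting the flags per variable would give a rough picture of how much of the data is missing versus merely inconsistent, before deciding on any imputation method.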
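
For the "find similar stocks" question, one possible starting metric is the correlation of returns computed only over the samples where both stocks have data. The sketch below uses invented prices, and the function names (`returns`, `pairwise_corr`, `most_similar`) are hypothetical; Pearson correlation is just one candidate metric, not necessarily the right one for this dataset.

```python
import math

# Hypothetical last_price series per stock; None marks a missing sample.
prices = {
    "A": [10.0, 10.1, None, 10.3, 10.2, 10.4],
    "B": [20.0, 20.2, 20.1, 20.6, 20.4, 20.8],
    "C": [5.0, 4.9, 5.0, 4.8, 4.9, 4.7],
}


def returns(series):
    """Simple returns; None wherever either neighbouring price is missing."""
    return [
        None if prev is None or cur is None else cur / prev - 1.0
        for prev, cur in zip(series, series[1:])
    ]


def pairwise_corr(x, y, min_pairs=3):
    """Pearson correlation over the samples present in both series."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_pairs:
        return None  # too little overlap to say anything
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def most_similar(name, prices):
    """Return (other_name, correlation) for the best match, or None."""
    rets = {k: returns(v) for k, v in prices.items()}
    scores = {}
    for other, r in rets.items():
        if other == name:
            continue
        c = pairwise_corr(rets[name], r)
        if c is not None:
            scores[other] = c
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]


print(most_similar("A", prices))
```

A distance such as 1 - correlation built from the same pairwise scores could then be fed into a standard clustering algorithm, which touches the clustering question as well.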
