ATCS Projects 2023 Spring
Below, we list some tentative topics for your project. You are more than welcome
to propose your own ideas.
Your project can be related to your own research as well.
If you are not sure if the topic is appropriate, please contact the instructor.
You are encouraged to think about some cool new applications based on the learning
techniques we covered in
class (feel free to use other techniques as well, but your project has to
relate to some theoretical elements of learning, optimization, and prediction).
Simply applying an existing algorithm to a small dataset does not constitute a final
project. An arbitrary heuristic for an arbitrary formulation, without any
insight, does not suffice either.
If you have an idea but are not really sure whether it is feasible,
talk to the instructor (but you are advised to do some preliminary
search on Google yourself beforehand).
If you need computing resources beyond your own PC and laptop, contact us and we
will help you.
Mid-term report (May 12):
A team should have at most 2 persons.
By that time, you should already have started
(you have fixed your topic, your team, and your idea of how to pursue it).
You need to start early by reading relevant papers, collecting and processing
the data, and/or starting with some preliminary code etc.
You need to submit a brief mid-term report covering: what the topic is, what
other people have done on
this topic, what your new idea is, how you plan to pursue it, and any
preliminary results.
Final report deadline: TBA. No late submissions
accepted.
You need to submit your final project report, your
code, your experimental results, and any plots generated.
Final presentation (date TBA, should be in the exam
week, i.e., the 17th or 18th week):
Each team needs to present their results in class. You should prepare some slides.
Each project has 5-10 minutes. The slides should be in English. The presentation can
be in either English or Chinese.
WARNING:
There is plenty of open-source code for various ML tasks online.
If you use any open-source code or package, you should cite it properly (in your
report and slides)! This is very important!
You may modify existing open-source code. If you do so, you need to be
very explicit in your report about which code you are using, which part is your
modification, and what your modification does.
Failing to do so is considered plagiarism (you
will get 0 for the project).
The project can be either theoretical or empirical (or both).
For a theoretical work, you can do the following:
(1) Be creative: design a new algorithm with some theoretical guarantee, or prove
some new theorems. This may be a difficult direction. Note that a very
small and uninteresting result would not constitute a course project. If you are
not sure whether your result (or direction) is interesting, contact the
instructor.
For empirical work, you can do the following:
(1) Be creative: find an interesting new problem and solve it, and/or design a
new algorithm for an existing problem, with or without theoretical guarantees.
(2) Implement others' methods: in this case, you need to survey a direction. The
survey needs to be very detailed. You also need to implement a few of the
most popular algorithms, compare them experimentally, and write down the
experimental results in the survey as well. I would expect you to obtain at least
some insights into, or potential improvements of, existing methods (not just run
existing open-source code).
In either case, your final report should be in the format of a NeurIPS paper (NeurIPS
format, including references). It should contain a title, an abstract, an
introduction (this is where you tell others why you did this, i.e., the
motivation, and summarize what you did), a related-work section (you have to
mention what others have done; it is important), the main text (the details
of your method), and an experimental section. It should be at least 6 pages
long (excluding the references). In sum, it should look like a paper.
Using ideas from online learning/bandits for hyperparameter optimization
(try to provide better modeling of the training process and better allocation of the resources)
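One concrete starting point is successive halving, a bandit-style scheme that allocates more training budget to configurations that look promising early. The sketch below is a minimal toy version under my own assumptions; the function names and the toy objective are illustrative, not from any particular paper:

```python
def successive_halving(configs, evaluate, budget=1, eta=2, rounds=3):
    """Bandit-style hyperparameter search: evaluate every config on a small
    budget, keep the best 1/eta fraction, and multiply the budget by eta."""
    survivors = list(configs)
    for _ in range(rounds):
        # evaluate(cfg, budget) returns a validation loss (lower is better)
        scores = [(evaluate(cfg, budget), cfg) for cfg in survivors]
        scores.sort(key=lambda s: s[0])
        survivors = [cfg for _, cfg in scores[: max(1, len(scores) // eta)]]
        budget *= eta
        if len(survivors) == 1:
            break
    return survivors[0]

# toy objective: loss has a config-dependent floor plus a budget-dependent term
def toy_eval(cfg, budget):
    return abs(cfg["lr"] - 0.1) + 1.0 / budget

best = successive_halving(
    [{"lr": lr} for lr in (0.001, 0.01, 0.1, 1.0)], toy_eval
)
```

The point of the project would be to replace the naive elimination rule with a better model of how validation loss evolves with budget.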
Theory and Applications of Diffusion models
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
Statistical Efficiency of Score Matching: The View from Isoperimetry
Diffusion Models are Minimax Optimal Distribution Estimators
Theory of Deep Learning
There is already a large body of literature on this topic.
Recent active research topics:
Talk to the instructor if you want to do something in this domain
Langevin Dynamics
Convergence/hitting time:
Generalization:
Direction: analyze SGLD in the NTK or mean-field regime
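For reference, the basic Langevin update is only a few lines. The sketch below uses exact gradients of a known log-density (true SGLD would use stochastic minibatch gradients); the step size, target, and function name are illustrative assumptions:

```python
import numpy as np

def sgld_sample(grad_log_p, x0, step=0.05, n_steps=20000, rng=None):
    """Unadjusted Langevin / SGLD iteration:
    x_{t+1} = x_t + (step/2) * grad log p(x_t) + sqrt(step) * N(0, I)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * noise
        samples[t] = x
    return samples

# target: standard Gaussian, so grad log p(x) = -x; after burn-in the
# iterates should have mean ~0 and variance ~1 (up to discretization bias)
samples = sgld_sample(lambda x: -x, x0=[3.0])
```

Convergence/hitting-time questions for this topic are precisely about how fast such chains reach the target distribution.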
Theory of Semi-supervised/self-supervised learning
Combining Labeled and Unlabeled Data with Co-Training
Co-Training and Expansion: Towards Bridging Theory and Practice
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data
Predicting What You Already Know Helps: Provable Self-Supervised Learning.
A theoretical analysis of contrastive unsupervised representation learning
Possible direction: generalize the common-representation assumption in [4]. For example, one may want to model the scenario where learning the pretext task helps build a partial (or approximate) representation for the downstream task; one still needs to do some fine-tuning for the downstream task. This is more realistic.
Theory of Meta-learning (multi-task, transfer learning)
Optimization in NN
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability (a very interesting recent empirical paper; direction: explain the empirical findings of the paper)
On the global landscape of neural networks: an overview (a good survey)
Convex? NN
Direction: the above-mentioned papers are mainly about optimization. What about generalization (using the new convex formulation)?
Neural Tangent Kernel / Wide NN
Mean Field Regime of wide NN
Robustness
Check out this blog as well https://gradientscience.org
Explanation in ML
SHAP
LIME
Integrated Gradients
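As a concrete example of the last item: Integrated Gradients attributes a prediction by integrating the gradient along the straight line from a baseline to the input, which can be approximated with a Riemann sum. A minimal NumPy sketch on a toy function (names and parameters are illustrative, not a real model):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline=None, steps=50):
    """Approximate IG_i(x) = (x_i - b_i) * integral_0^1 of
    df(b + a*(x - b))/dx_i da, via a midpoint Riemann sum."""
    x = np.asarray(x, dtype=float)
    b = np.zeros_like(x) if baseline is None else np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(b + a * (x - b))
    return (x - b) * total / steps

# sanity check on f(x) = x0^2 + 3*x1, whose gradient is [2*x0, 3]
ig = integrated_gradients(lambda z: np.array([2.0 * z[0], 3.0]), x=[1.0, 2.0])
# completeness: attributions sum to f(x) - f(baseline) = 7 - 0
```

A good project here would compare such axiomatic attributions with SHAP and LIME on a real model.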
Pricing Derivative and Machine Learning
(idea: connect the fundamental theorem of finance and statistical learning theory)
Apply online learning algorithms to trade stocks/futures - 1
Talk to the instructor for the idea and the data
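To give a flavor of the idea, here is a minimal multiplicative-weights (Hedge) portfolio run on price relatives (price ratio from one period to the next). The learning rate, toy data, and function name are illustrative assumptions, not a recommended strategy:

```python
import numpy as np

def hedge_portfolio(price_relatives, eta=0.1):
    """Multiplicative weights (Hedge) over n assets: each period, weight
    assets by the exponential of eta times their cumulative log-returns."""
    T, n = price_relatives.shape
    log_w = np.zeros(n)
    wealth = 1.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                      # portfolio weights for period t
        wealth *= p @ price_relatives[t]  # realized growth this period
        log_w += eta * np.log(price_relatives[t])
    return wealth

# toy data: asset 0 gains 1% every period, asset 1 loses 1%
x = np.tile([[1.01, 0.99]], (200, 1))
w = hedge_portfolio(x)
# final wealth lies between the worst and best single-asset outcomes
```

Online-learning theory gives regret bounds for such schemes against the best single asset (or best constant-rebalanced portfolio) in hindsight.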
Statistical Arbitrage
Statistical arbitrage is a major class of methods in quantitative trading.
You can read the following material.
Continuous processes in HFT (some papers also require knowledge of stochastic control, such as the Hamilton–Jacobi–Bellman (HJB) equation)
Algorithmic Trading of Co-integrated Assets
Buy Low Sell High: a High Frequency Trading Perspective
Refer to the following lecture notes for background knowledge (very well written): Stochastic Calculus, Filtering, and Stochastic Control
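To make the idea concrete: the simplest form of statistical arbitrage trades the rolling z-score of a (presumed cointegrated) spread between two assets. The window, thresholds, and function name below are illustrative assumptions:

```python
import numpy as np

def zscore_signal(spread, window=20, entry=2.0, exit=0.5):
    """Mean-reversion signal on a spread series: short when the rolling
    z-score is very high, long when very low, flat once it reverts."""
    pos = np.zeros(len(spread))
    state = 0
    for t in range(window, len(spread)):
        hist = spread[t - window:t]
        z = (spread[t] - hist.mean()) / (hist.std() + 1e-12)
        if state == 0:
            if z > entry:
                state = -1      # spread looks rich: short it
            elif z < -entry:
                state = 1       # spread looks cheap: long it
        elif abs(z) < exit:
            state = 0           # spread reverted: close the position
        pos[t] = state
    return pos
```

The papers above replace this heuristic with explicit continuous-time models of the spread and optimal-control solutions for entry/exit.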
Multi-factor Models in Finance
Please contact the instructor for the data.
Asset Allocation
Risk parity
Hierarchical Risk Parity
Value-at-Risk (VaR) and Conditional VaR (CVaR)
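Historical VaR and CVaR are simple to compute from a sample of returns, which makes them a natural building block for an asset-allocation project. A minimal sketch (the confidence level and toy data are illustrative):

```python
import numpy as np

def var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR (expected shortfall), reported as positive
    loss numbers at confidence level alpha."""
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)
    cvar = losses[losses >= var].mean()  # average loss in the tail beyond VaR
    return var, cvar

# toy sample: 95 quiet days plus a bad tail of 5 losing days
r = np.concatenate([np.full(95, 0.001), [-0.02, -0.03, -0.04, -0.05, -0.06]])
v, c = var_cvar(r, alpha=0.95)
```

CVaR is always at least as large as VaR, and unlike VaR it is coherent (subadditive), which matters for portfolio optimization.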
Extend coding assignment 1
1. try to capture both the time series and cross-sectional dependency of the assets.
2. dynamic correlation model
3. problems with estimating the covariance matrix