Projects

Water Detection

Setting: Given a glass of water, detect how much water is in the glass.

Requirement: You should collect a dataset yourself (it can be a toy dataset), describe the task, define evaluation metrics, and provide a baseline for this task.
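As one possible starting point, the metric and baseline could be sketched as below. This assumes the label for each image is a fill level between 0 and 1, which the task description does not specify; `mean_absolute_error` and `constant_baseline` are hypothetical names chosen for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted fill levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def constant_baseline(train_levels):
    """Trivial baseline: ignore the image and always predict the mean training level."""
    mean_level = sum(train_levels) / len(train_levels)
    return lambda image: mean_level


# Toy usage: fill levels are assumed to lie in [0, 1]; images are placeholders.
predict = constant_baseline([0.25, 0.5, 0.75])
test_levels = [0.0, 1.0]
predictions = [predict(None) for _ in test_levels]
print(mean_absolute_error(test_levels, predictions))
```

Any learned model should beat this constant predictor on the same metric; if it does not, the features or labels probably need another look.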

Contact: huangbb@shanghaitech.edu.cn

Video Completion

Setting: Given a video in which some frames are masked, recover the full video.

Requirement: Please design an algorithm to solve the problem. To show the effectiveness of your algorithm, you should also propose a baseline for the task. In addition, please run experiments on at least 4 different videos. You may divide the videos into 2 difficulty levels, easy and hard. You should describe the criteria for difficulty and the metrics for performance so that your algorithm can be evaluated properly.
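A minimal sketch of one possible baseline and metric follows. It assumes whole frames are missing, that each frame is a flat list of grayscale values in 0-255, and that PSNR is an acceptable performance metric; none of these choices is prescribed by the task, and the function names are illustrative only.

```python
import math


def interpolate_masked(frames, masked):
    """Fill masked frames by linear interpolation between the nearest known frames.

    frames: list of frames; each known frame is a flat list of pixel values,
            masked entries may be None.
    masked: set of indices of the missing frames.
    """
    known = [i for i in range(len(frames)) if i not in masked]
    if not known:
        raise ValueError("at least one frame must be known")
    out = [None if i in masked else list(f) for i, f in enumerate(frames)]
    for i in sorted(masked):
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:            # missing at the start: copy the next known frame
            out[i] = list(frames[nxt])
        elif nxt is None:           # missing at the end: copy the previous known frame
            out[i] = list(frames[prev])
        else:                       # blend the two neighbours by temporal distance
            w = (i - prev) / (nxt - prev)
            out[i] = [(1 - w) * a + w * b
                      for a, b in zip(frames[prev], frames[nxt])]
    return out


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames (higher is better)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


video = [[0, 0], None, [20, 20]]
restored = interpolate_masked(video, {1})
print(restored[1], psnr([10, 10], restored[1]))
```

Interpolation tends to work on slow, smooth motion and to fail on fast motion or scene cuts, which is one possible way to separate easy cases from hard ones.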

Contact: zhaozb@shanghaitech.edu.cn

Jigsaw puzzle

Setting: Given a set of unordered pieces from a single image, can you output the correct order to assemble the complete picture?

Requirement: We provide an online jigsaw puzzle at this url. Enter your id in the upper right corner and pick any image you like. You can start with 20 pieces. See if you can design an algorithm that solves the task, and report the results for different numbers of pieces.

Contact: jinlei@shanghaitech.edu.cn

Fake talking video

Setting: Given a piece of audio, generate a plausible video that matches it by video clipping.

Requirement: You should collect a short video containing a talking head as your dataset (e.g. BV1HJ411L7DP), then design and implement a pipeline for this task. Split the video into training and test sets; at test time, only the audio is given as input.

Details:

1) Don't pay much attention to the audio features. You can use any package to extract them, such as librosa or pyAudioAnalysis, and simply treat it as a black box. Common audio features: Mel, MFCC.

2) Put simply, the task is to learn a mapping from audio to video.

3) The main criterion for judging the generated video is whether the mouth in the video matches the audio, so you don't need to take the smoothness of the generated video into consideration. You can, however, try retrieving video intervals instead of single video frames to make the result smoother.

Example pipeline: for each new input audio feature, use kNN to find the nearest video frame and append it to a list; finally, generate the corresponding video from that list. You may use k-means, a mask on the mouth, SVM, or other techniques to make your pipeline robust. Enjoy this project.
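The example pipeline can be sketched roughly as follows, using a 1-nearest-neighbour lookup with Euclidean distance. It assumes the audio features have already been extracted (for instance as MFCC vectors) and aligned one-to-one with the training video frames; the function names and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_frame_indices(train_audio, test_audio):
    """For each test audio feature, find the index of the closest training feature.

    train_audio[i] is assumed to be the audio feature aligned with training frame i.
    """
    indices = []
    for query in test_audio:
        best = min(range(len(train_audio)),
                   key=lambda i: squared_distance(train_audio[i], query))
        indices.append(best)
    return indices


def assemble_video(train_frames, indices):
    """Build the output frame sequence by copying the retrieved training frames."""
    return [train_frames[i] for i in indices]


# Toy data: 2-D "audio features" and string placeholders standing in for frames.
train_audio = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_frames = ["mouth_closed", "mouth_half_open", "mouth_open"]
test_audio = [[0.9, 1.2], [4.0, 6.0], [0.1, -0.1]]

indices = nearest_frame_indices(train_audio, test_audio)
print(assemble_video(train_frames, indices))
```

Retrieving one frame at a time can produce jittery output; retrieving short intervals of consecutive frames, as suggested in detail 3, is one way to reduce that.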

Contact: zhiyh@shanghaitech.edu.cn

Skateboarding state

Setting: Given a photo of a skateboard, tell its state (static/sliding/overhead).

Requirement: Later you will be provided with a toy dataset of pictures, each containing a skateboard in one of the three states. You are encouraged to enlarge this dataset to make your model more robust. You should submit a zip file including the code and a report, which should contain your baseline and results; discussion is encouraged. The method is not limited. Have fun with it. :)

Contact: qianych@shanghaitech.edu.cn

A taste of CNN-based gaze estimation

Setting: Given a video (e.g., of someone talking or working), follow one person's gaze in the video.

Reference: https://arxiv.org/pdf/1907.02364.pdf

Requirement:

Please design a CNN-based gaze estimation model.

You can choose any CNN backbone.

You can use the same training/testing setup as in the paper above and report your results.

Please test your model on a video you captured yourselves and make a demo.

Contact: liandz@shanghaitech.edu.cn

Industrial inspection and defect anomaly detection

Setting: Given an image from industrial production, detect the defect area.

Reference:

Paper: https://www.mvtec.com/fileadmin/Redaktion/mvtec.com/company/research/mvtec_ad.pdf

Dataset: https://www.mvtec.com/company/research/datasets/mvtec-ad/

Requirement:

Determine whether the input image is normal or abnormal.

If the input image is abnormal, then detect the fault area.

Both CNN-based methods and image-processing-based methods can be used.
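As an illustration of the image-processing route, the sketch below fits per-pixel statistics on defect-free images and flags pixels that deviate strongly from them. This is a simple assumed baseline, not a method taken from the referenced paper; it presumes aligned grayscale images stored as flat lists, and the names and thresholds are placeholders.

```python
import statistics


def fit_normal_model(normal_images):
    """Per-pixel mean and standard deviation over defect-free images."""
    columns = list(zip(*normal_images))          # one tuple of values per pixel
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def defect_mask(image, means, stds, z_threshold=3.0, min_std=1.0):
    """Mark pixels whose deviation from the normal mean exceeds z_threshold sigmas.

    min_std guards against division by (near) zero on constant pixels.
    """
    return [abs(p - m) / max(s, min_std) > z_threshold
            for p, m, s in zip(image, means, stds)]


def is_abnormal(mask):
    """An image is called abnormal if at least one pixel is flagged."""
    return any(mask)


normal_images = [[100, 100, 100, 100],
                 [102, 98, 100, 101],
                 [98, 102, 100, 99]]
means, stds = fit_normal_model(normal_images)
mask = defect_mask([100, 101, 160, 99], means, stds)
print(is_abnormal(mask), mask)
```

The mask answers both requirements at once: `any(mask)` gives the normal/abnormal decision, and the flagged pixels give the fault area. This only holds for well-aligned objects; textured categories usually need a stronger model.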

Contact: liuwen@shanghaitech.edu.cn

Apple Counting

Setting: Given a picture of apples, return the number of apples.

Requirement:

If you want, you can count other objects.

For the apple counting dataset, you should download some apple images from Baidu or Google.

Any method is accepted, including machine learning and deep learning methods.

Do not worry about your grades. You do not have to do a tough project. Do something interesting. Enjoy your project.
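One simple classical approach, sketched here under strong assumptions, is to threshold red-dominant pixels and count connected regions. It assumes red apples that do not touch each other on a non-red background, with the image given as a 2-D list of (r, g, b) tuples; the `margin` and `min_area` values are arbitrary placeholders to tune on real data.

```python
from collections import deque


def red_mask(image, margin=40):
    """True where the red channel clearly dominates green and blue."""
    return [[r - max(g, b) > margin for (r, g, b) in row] for row in image]


def count_blobs(mask, min_area=2):
    """Count 4-connected True regions with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            seen[y][x] = True
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:      # ignore tiny regions as noise
                count += 1
    return count


# Toy image: 'R' is a red pixel, '.' is green background.
pattern = ["RR....",
           "RR....",
           "......",
           "...RR.",
           "R..RR."]
colors = {"R": (200, 30, 30), ".": (30, 120, 30)}
image = [[colors[ch] for ch in row] for row in pattern]
print(count_blobs(red_mask(image)))
```

Touching or overlapping apples merge into one region under this scheme, which is where learning-based counting methods tend to help.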

Reference:

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7780439

Contact: zhongzm@shanghaitech.edu.cn
