OUR PAPER WAS PUBLISHED at ECCV 2020
It is part of the computational work I did during my summer internship in the Lichtman Lab and Hanspeter Pfister's lab.
It is part of the big synapse project, and specifically Challenge 2 of CREMI.
We now rank No. 1 in the CREMI contest, and we will submit a paper to CVPR this November.
For a summary of the whole work, please see here.
I also finished another NMJ project during my summer internship in the Lichtman Lab.
Codes
weekly report
First
Second
Third
Fourth
First two weeks
summarizing and rewriting data augmentation repos
The three main repos are gunpowder from the Funke lab, and EM-data and EM-seglib from Sebastian's lab.
Gunpowder is quite complicated, since it includes a whole data-preprocessing pipeline: it first defines batch providers and batch filters, which makes the data augmentation steps in the middle harder to understand. Since it is written in Python 2 and designed for Caffe, I rewrote and summarized gunpowder's four augmentation methods and compared them with Sebastian's code.
Funke said in his paper about CREMI Challenge 2 that data augmentation is the most important step in the whole pipeline. I rewrote the four kinds of data augmentation methods, combining Funke's and Sebastian's code.
gunpowder
gunpowder on cremi
- simple augmentation
- intensity augment
- ElasticAugment
- DefectAugment
simple augmentation
class gunpowder.SimpleAugment(mirror_only=None, transpose_only=None)
Randomly mirror and transpose all Arrays and Points in a batch.
Simple augmentation can be achieved with EM-segLib/em_segLib/transform.py.
In this repo, only transpose_only_xy=True is used.
intensity augment
class gunpowder.IntensityAugment(array, scale_min, scale_max, shift_min, shift_max, z_section_wise=False)
Randomly scale and shift the values of an intensity array.
- array (ArrayKey) – The intensity array to modify.
- scale_min (float), scale_max (float), shift_min (float), shift_max (float) – The min and max of the uniformly randomly drawn scaling and shifting values for the intensity augmentation. Intensities are changed as:

```
a = a.mean() + (a - a.mean()) * scale + shift
```
- z_section_wise (bool) – Perform the augmentation z-section wise. Requires 3D arrays and assumes that z is the first dimension.
Randomly scale and shift the values of an intensity array:
- can augment z-sections independently; the first dimension should be z
- expects a normalized array
- the min and max of the scale and shift are set by the user, and values are drawn uniformly:

```
np.random.uniform(low=self.shift_min, high=self.shift_max)
```

- for each array: a = a.mean() + (a - a.mean()) * scale + shift
- finally np.clip(a, 0, 1)
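The scale-and-shift rule above can be sketched in a few lines of numpy (the function name and default ranges here are mine, not gunpowder's; its real node additionally supports per-z-section application):

```python
import numpy as np

def intensity_augment(a, scale_min=0.9, scale_max=1.1,
                      shift_min=-0.1, shift_max=0.1):
    """Randomly scale and shift a normalized (0-1) intensity array."""
    scale = np.random.uniform(scale_min, scale_max)
    shift = np.random.uniform(shift_min, shift_max)
    # mean-centered scaling keeps the overall brightness stable
    a = a.mean() + (a - a.mean()) * scale + shift
    # clip back to the normalized range, as gunpowder does
    return np.clip(a, 0, 1)
```

For z-section-wise augmentation, the same call would be applied to each `a[z]` slice independently.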
Similar implementations:
- the scaling is similar to DataProvider/python/data_augmentation.py class GreyAugment(DataAugment)
- DataProvider/python/data_augmentation.py class GreyAugment(DataAugment) may be better:

```
img *= 1 + (np.random.rand() - 0.5) * self.CONTRAST_FACTOR
```

- it borrows from ELEKTRONN (http://elektronn.org/), which is used for large-scale 2D/3D image processing, including segmentation.
ElasticAugment
The author used ElasticAugment([4, 40, 40], [0, 2, 2], [0, math.pi/2.0], prob_slip=0.05, prob_shift=0.05, max_misalign=25).

```
class gunpowder.ElasticAugment(control_point_spacing, jitter_sigma, rotation_interval, prob_slip=0, prob_shift=0, max_misalign=0, subsample=1)
```
The steps are:
- reshape the array data into (channels,) + spatial dims
- first call create_identity_transformation (from Funke's separate augment repo): it adds one channel per spatial dimension, built with np.meshgrid
- if sum(jitter_sigma) > 0, call create_elastic_transformation:

```
augment.create_elastic_transformation(
    target_shape,
    self.control_point_spacing,
    self.jitter_sigma,
    subsample=self.subsample)
```

  It first draws control_point_offsets, then interpolates to upscale, generating a 3-channel displacement image.
- then draw a rotation from rotation_interval [0, math.pi/2.0]: rotation_start = rotation_interval[0], rotation_max_amount = rotation_interval[1] - rotation_interval[0], and rotation = random.random() * self.rotation_max_amount + self.rotation_start (less than 90 degrees). If rotation > 0, call create_rotation_transformation.
- then, if prob_slip + prob_shift > 0, apply __misalign(transformation), which randomly shifts sections according to the thresholds.
- finally, if subsample > 1, the elastic transformation is computed on a subsampled grid before augmentation to speed things up, and upscaled afterwards. From the docstring:
Instead of creating an elastic transformation at the full resolution, create one subsampled by the given factor, and linearly interpolate to obtain the full-resolution transformation. This can significantly speed up this node, at the expense of visible piecewise-linear deformations for large factors. Usually, a factor of 4 can safely be used without noticeable changes. However, the default is 1 (i.e., no subsampling).
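The subsampling trick in the docstring can be sketched as: build the random displacement field on a coarser grid, then linearly interpolate back to full resolution (here scipy.ndimage.zoom stands in for gunpowder's own upscaling; function name and jitter model are mine):

```python
import numpy as np
from scipy.ndimage import zoom

def upscaled_transformation(shape, subsample=4, jitter=2.0):
    """Create a random 2D displacement field at reduced resolution,
    then interpolate it back to full resolution."""
    small = tuple(max(1, s // subsample) for s in shape)
    # coarse random jitter field, one channel per spatial dimension
    field = np.random.normal(0, jitter, (len(shape),) + small)
    factors = (1,) + tuple(s / t for s, t in zip(shape, small))
    # order=1 -> piecewise-linear interpolation, as the docstring notes
    return zoom(field, factors, order=1)
```

For large subsample factors the piecewise-linear seams become visible, which is exactly the trade-off the docstring warns about.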
DefectAugment
```
class gunpowder.DefectAugment(intensities, prob_missing=0.05, prob_low_contrast=0.05, prob_artifact=0.0, prob_deform=0.0, contrast_scale=0.1, artifact_source=None, artifacts=None, artifacts_mask=None, deformation_strength=20, axis=0)
```
Augment intensity arrays section-wise with artifacts like missing sections, low-contrast sections, by blending in artifacts drawn from a separate source, or by deforming a section.
- intensities (ArrayKey) – The key of the array of intensities to modify.
- prob_missing (float), prob_low_contrast (float), prob_artifact (float), prob_deform (float) – Probabilities of having a missing section, a low-contrast section, an artifact (see artifact_source) or a deformed slice. The sum should not exceed 1. Values in missing sections will be set to 0.
- contrast_scale (float, optional) – By how much to scale the intensities for a low-contrast section; used if prob_low_contrast > 0.
- artifact_source (BatchProvider, optional) – A gunpowder batch provider that delivers intensities (via ArrayKey artifacts) and an alpha mask (via ArrayKey artifacts_mask); used if prob_artifact > 0.
- artifacts (ArrayKey, optional) – The key to query artifact_source for to get the intensities of the artifacts.
- artifacts_mask (ArrayKey, optional) – The key to query artifact_source for to get the alpha mask of the artifacts to blend them with intensities.
- deformation_strength (int, optional) – Strength of the slice deformation in voxels, used if prob_deform > 0. The deformation models a fold by shifting the section contents towards a randomly oriented line in the section. The line itself will be drawn with a value of 0.
- axis (int, optional) – Along which axis sections are cut.
Data is read from a special artifact source and used for defect augmentation:

```
artifact_source = (
```
Thresholds:

```
DefectAugment(
    prob_missing=0.03,
    prob_low_contrast=0.01,
    prob_artifact=0.03,
    artifact_source=artifact_source,
    contrast_scale=0.1)
```

The missing, low-contrast, artifact and deform thresholds cascade:
- prob_missing_threshold = self.prob_missing (0.03)
- prob_low_contrast_threshold = prob_missing_threshold + self.prob_low_contrast (0.04 = 0.03 + 0.01)
- prob_artifact_threshold = prob_low_contrast_threshold + self.prob_artifact (0.07 = 0.04 + 0.03)
- prob_deform_slice = prob_artifact_threshold + self.prob_deform (0.07 = 0.07 + 0)

For each slice, generate a random value r in [0, 1); then:
- if r < prob_missing_threshold: 'zero_out' – the section values are set to 0 (nothing else to prepare)
- elif r < prob_low_contrast_threshold: 'lower_contrast' – mean = section.mean(); section -= mean; section *= self.contrast_scale; section += mean
- elif r < prob_artifact_threshold: 'artifact' – raw.data[section_selector] = section * (1.0 - artifact_alpha) + artifact_raw * artifact_alpha, where artifact_alpha and artifact_raw come from artifact_source's mask and raw; this amounts to alpha blending
- elif r < prob_deform_slice: 'deformed_slice' – a deformed slice needs a bigger upstream ROI

In each branch the section is first extracted as:

```
section = raw.data[section_selector].squeeze()
```
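The cascading thresholds above amount to a single uniform draw per section; a minimal sketch (the function name is mine, not gunpowder's):

```python
import numpy as np

def choose_defect(prob_missing=0.03, prob_low_contrast=0.01,
                  prob_artifact=0.03, prob_deform=0.0):
    """Pick at most one defect type per section from one uniform draw."""
    r = np.random.random()
    if r < prob_missing:
        return 'zero_out'
    if r < prob_missing + prob_low_contrast:
        return 'lower_contrast'
    if r < prob_missing + prob_low_contrast + prob_artifact:
        return 'artifact'
    if r < prob_missing + prob_low_contrast + prob_artifact + prob_deform:
        return 'deformed_slice'
    return None  # leave the section untouched
```

Because the thresholds cascade, the probabilities do not overlap and their sum (here 0.07) is the total chance of any defect per section.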
cremi example
gunpowder/examples/cremi at master · funkey/gunpowder · GitHub
Docker has problems (MALIS, pycaffe).
Install Docker on the server: Get Docker CE for Ubuntu | Docker Documentation

```
docker pull funkey/gunpowder
```

I encountered problems installing nvidia-docker and with docker run.
It seems we should rewrite some of the code to incorporate it directly, without the hassle of Docker, Python 2, Caffe, MALIS and other settings.
resources
GitHub - donglaiw/EM-data: dataset loader
DataProvider/python/dataprovider at refactoring · torms3/DataProvider · GitHub
1706.00120, Superhuman Accuracy on the SNEMI3D Connectomics Challenge (Sebastian's group)
Then I incorporated the data augmentation methods into the synapse prediction model for further use.
augmentation on cremi
Model test and redesign
I tried zudi’s synapse prediction model using 3D U-net, since it is based on pytorch, I spent some time to read the pytorch documentation. We have discussed about the model improvement issue and I have tried to use other architecture for better prediction results.
synapse prediction model
improvement on U net
There is a lot of fine-tuning and redesign to try on U-Net and 3D U-Net, drawing on Kaggle top solutions and GitHub.
Week 3
In week 3, apart from the NMJ project, I concentrated mainly on the synapse prediction project and made a lot of progress. Since Zudi has come back, it is more efficient to discuss and collaborate, and we got a lot done this week.
short term plan
Our short-term plan is to add all the new tricks (the dilated CNN block from DeepGlobe's D-LinkNet, the combined DICE+BCE loss, and all the data augmentation methods) to the model. We will train the whole model for three days on all of samples A, B and C, fine-tune it for one more day, and then submit the result to see if it is better.
In my estimation, the previous model has a non-trivial gap to the state of the art: about a one-fold difference in error score. So there is a lot for us to do. I have added dilated convolutions, the combined loss, and several very useful data augmentation methods that took many days to implement (everyone agrees that proper augmentation is one of the most important steps toward better results). This week I also read other people's papers, project summaries and code on similar tasks (the dilated CNN, DICE+BCE loss and data augmentation all originate from these). I believe there is more to implement; the state-of-the-art model and pipeline look really strong, so I should use more strategies to improve our results.
Model improvement and training
settings and training
Since I have done several deep learning and machine learning projects (Emaize, CardiacAI, 3D CT images, Deepshape), I am quite familiar with Linux, the Python deep learning environment and the extra tooling (Jupyter, TensorboardX, etc.), so I can run the previous model designed by Zudi quite efficiently.
training settings
create envs
```
conda env create -f bin/synapse_pytorch/envs/py3_pytorch.yml
```
training parameter setting
```
CUDA_VISIBLE_DEVICES=0,1 python3 -u bin/synapse_pytorch/train.py -t data/cremi/ -dn images/im_A_v2_200.h5@images/im_B_v2_200.h5@images/im_C_v2_200.h5 -ln gt-syn/syn_A_v2_200.h5@gt-syn/syn_B_v2_200.h5@gt-syn/syn_C_v2_200.h5 -o outputs/cremi0719mixloss -lr 0.001 --volume-total 400000 --volume-save 20000 -mi 24,256,256 -g 2 -c 6 -b 2 -l mix
```
TensorBoard monitoring setup
It provides real-time monitoring:

```
# http://140.247.107.75:6006
```

```
# kill jupyter and release the port
```
network visualization
I visualized our current model; it is a very big one.

```
import torch
```
Current model:
Scripts: Summer_Intern/pytorch_synapse.ipynb at master · james20141606/Summer_Intern · GitHub
model structure and loss function improvement
I read D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction and other challenge winners' write-ups, and implemented more model structures and loss function designs.
dilated CNN block
It further enlarges the receptive field of the model, which helps it learn more context efficiently. The dilation rate is adaptive according to the layer depth. The maximum is 8; we may reduce it later.
loss function
I read the DeepGlobe challenge solution and changed the loss function to DICE + BCE, like this:
P: predicted result, GT: ground truth, N: batch size
It adds a DICE term to the previous BCE loss, which is better since DICE is commonly used as the loss function for U-Net; the combined loss is also used by some challenge winners.
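Written out (my reconstruction from the description above; the exact weighting in the code may differ), with P the prediction, GT the ground truth and N the batch size:

```latex
L = L_{\mathrm{BCE}} + L_{\mathrm{DICE}}, \qquad
L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i}\left[ GT_i \log P_i + (1 - GT_i)\log(1 - P_i) \right], \qquad
L_{\mathrm{DICE}} = 1 - \frac{2\sum_i P_i\, GT_i + \epsilon}{\sum_i P_i + \sum_i GT_i + \epsilon}
```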
loss function implementation:
Summer_Intern/loss.py at master · james20141606/Summer_Intern · GitHub
Summer_Intern/loss_dlink_net.ipynb at master · james20141606/Summer_Intern · GitHub
```
class dice_bce_loss(nn.Module):
```
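A framework-agnostic numpy sketch of the combined loss (the real dice_bce_loss is a PyTorch nn.Module; this just mirrors the math, and the epsilon is my choice):

```python
import numpy as np

def dice_bce_loss(pred, gt, eps=1e-7):
    """Combined soft-DICE + BCE loss on predicted probabilities in (0, 1)."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    bce = -np.mean(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))
    # soft DICE: 1 - overlap ratio, so perfect overlap gives 0
    dice = 1 - (2 * np.sum(pred * gt) + eps) / (np.sum(pred) + np.sum(gt) + eps)
    return bce + dice
```

A good prediction drives both terms toward 0, so the combined loss stays interpretable.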
I use TensorboardX to monitor the training procedure, tracking the combined loss, the DICE loss and the BCE loss simultaneously. To my relief, the loss decreases not only because BCE decreases; the DICE loss decreases as well. Since 1 - DICE reflects the overlap ratio of the predicted and ground-truth areas, this means our model is learning well.
Train loss:
BCE loss:
DICE loss:
We should further improve the loss function: BCE is divided by the batch size, so we may do the same with DICE.
We think BCE punishes FN, since we set a weight for it, and DICE punishes FP, since what remains is mainly FP. This suits our strategy: remove FN first, then prune FP with other methods, so the DICE loss may become one of those methods.
We can also balance FP and FN by adjusting the weights of DICE and BCE. Decreasing FP is essential, since it is the main gap to the state-of-the-art model: we retain too many FP in prediction.
Data augmentation
I spent a lot of time on this part because I believe it is the most important part besides the model itself; it may also be what separates the better pipeline among many good models.
Last week we discussed augmentation details at length. This week I rewrote several of the methods and learned many different augmentation approaches from many sources (mature pipelines, APIs, challenge winners and papers). I compared their results by rewriting their code and visualizing the outputs. Some of the code (like gunpowder) is hard to understand, since it contains a whole pipeline's complex machinery.
I have implemented several of them in the training and testing pipelines.
The code for final use is: Summer_Intern/augmentation.py at master · james20141606/Summer_Intern · GitHub
And the implementation and visualization code:
Summer_Intern/cremi_augmentation_implementation.ipynb at master · james20141606/Summer_Intern · GitHub
Summer_Intern/augment_cremi.ipynb at master · james20141606/Summer_Intern · GitHub
training
I apply the transformations to the image and the mask at the same time. For simple augmentation this is trivial; for the elastic transformation I use a fixed random seed to reproduce the same field on both, and re-binarize the mask to {0, 1} afterwards. For intensity augmentation I do not transform the mask, since only pixel values change. For the defect transformation there is a lot to take care of, again including random seeds for reproducibility and re-binarization.
I did many visualizations to benchmark every augmentation.
train
simple augmentation
Apply X, Y, Z mirroring and an XY transpose, so in total there are $2^4 = 16$ combinations. Generate four random 0/1 values to decide whether to apply each of the four operations, and apply the same transformation to image and mask.
Usage: simpleaug_train_produce()

```
# produce 16 binary arr
```
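A sketch of generating all $2^4$ variants (the function name is mine): each of the four bits toggles one flip or the XY transpose, and the same bits would be applied to image and mask.

```python
import itertools
import numpy as np

def simpleaug_variants(vol):
    """Yield all 16 mirror/transpose variants of a (z, y, x) volume."""
    for fz, fy, fx, txy in itertools.product([0, 1], repeat=4):
        out = vol
        if fz:
            out = out[::-1, :, :]        # z mirror
        if fy:
            out = out[:, ::-1, :]        # y mirror
        if fx:
            out = out[:, :, ::-1]        # x mirror
        if txy:
            out = out.transpose(0, 2, 1)  # xy transpose (needs square xy)
        yield out
```

During training, one of the 16 bit patterns would be drawn at random instead of enumerating all of them.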
intensity augmentation
Usage: IntensityAugment()
Use mix mode to randomly choose 2D or 3D intensity augmentation. I do not transform the mask, because only pixel intensities shift.

```
class IntensityAugment():
```
elastic augmentation
Usage: apply_elastic_transform(img, mask)
I compared gunpowder's version with the Kaggle kernel "Elastic Transform for Data Augmentation"; gunpowder's elastic has more features, but if we do not use rotation the simpler one is fine.
The output range of create_transformation is 0 to width, so we can normalize the results. I use a fixed random seed to reproduce the same transform on images and masks.

```
def create_identity_transformation(shape, subsample=1):
```
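The identity grid that the elastic transform starts from can be built with np.meshgrid; without subsampling, the stub above likely reduces to something like this sketch of mine:

```python
import numpy as np

def create_identity_transformation(shape):
    """Return a (ndim,) + shape array where channel d holds the coordinate
    of each voxel along dimension d (i.e., no displacement yet)."""
    axes = [np.arange(s, dtype=np.float64) for s in shape]
    # indexing='ij' keeps channel 0 as the first (e.g. z) coordinate
    grids = np.meshgrid(*axes, indexing='ij')
    return np.stack(grids, axis=0)
```

The elastic and rotation steps then add displacements on top of this grid, which explains why the output range is 0 to the volume width.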
defect augmentation
Usage: transformedimgs, transformedmasks = apply_deform(testdatraw/256., testdatseg, 0, 20, 0.08)
I use gunpowder's defect augmentation, but I changed the fill value of the blacked-out region for better batch-normalization behavior. I use a fixed random seed to reproduce the same defects on images and masks. The block size and the ratio of missing sections are configurable.

```
def prepare_deform_slice(slice_shape, deformation_strength, iterations, randomseed):
```
test
For each test sample, I produce 16 transformed images and reverse the predicted masks afterwards.
Usage: simpleaug_test_produce()
Usage: simpleaug_test_reverse()
At test time it is common to use only the 16 simple augmentations: X, Y, Z mirroring and an XY transpose, giving $2^4 = 16$ images in all.

```
class simpleaug_test_produce():
```
further useful augmentation
I have found many other augmentations from many sources. Since our main aim now is to get a better result quickly, I will first see whether the current augmentation strategy is sufficient; if not, I will implement more later. If the augmentation is good enough, we can spend more effort on task 3. Since the model is really hard to train and evaluate (feedback may take a week), time is precious and we cannot test just one method at a time.
I have tried some here: Summer_Intern/further_aug_imgaug.ipynb at master · james20141606/Summer_Intern · GitHub, but I will decide what to do first next week.
- 2018 annual datascience bowl top1: segment cells
https://www.leiphone.com/news/201804/qfus8zALhZLoA8Ai.html
There are many data augmentation methods:
https://github.com/selimsef/dsb2018_topcoders/blob/7a87c07e1fb8e090186a3914a1443469f5107962/albu/src/augmentations/transforms.py
It seems clear, and they used the augmentation library imgaug.
imgaug
Apart from gunpowder's methods, there is a lot more to use:
- random zoom, rotate, flip
- contrast and brightness
- heavy geometric transform: Elastic Transform, Perspective Transform, Piecewise Affine transforms, Pincushion Distortion
- contrast limited adaptive histogram equalization (CLAHE) ,Sharpen,Emboss
- Gaussian noise
- Blur、Median Blur、Motion Blur
- HSV
- rearrange channel
- duplicating cell nuclei
future work
comparison with other work, and thoughts
- For randomly choosing samples, we use a pixel ratio to determine whether a sample is used for training, while Funke uses a probability; we should consider which is better.
- Funke uses regression whereas we use binary classification. It is simple to switch to regression, but we should compare our loss function and strategy first; for example, Funke's STED loss may not be good.
- Funke uses an auxiliary loss for better pruning, but we have visualized the results and found that the proposed synapses match the membrane well, so it may not be necessary.
future plan
As mentioned above, I believe there is more to implement. The state-of-the-art model and pipeline look really strong, so I should use more strategies to improve our results.
- read and consider V-net’s structure. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
vnet.pytorch
GitHub - mattmacy/vnet.pytorch: A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
GitHub - mattmacy/torchbiomed: Datasets, Transforms and Utilities specific to Biomedical Imaging
- if the submitted result differs a lot from Funke's, I may try to reproduce their work.
- read the complete task 3 code, beyond GitHub - paragt/EMSynConn: One algorithm to detect synaptic location AND connectivity, both dyadic and polyadic, in Electron Microscopy volume.
- figure out how to fine-tune: apart from learning-rate adjustment, we should consider some pruning work. For example, from the paper Stacked U-Nets with Multi-Output for Road Extraction:
postprocessing: Various post-processing techniques for road extraction have been proposed in the literature, e.g., centerline extraction using structured SVM or Markov random field, handling noisy data using a special CNN, recovering lines by sampling junction-points, and bridging road gaps by heuristic search. Here we develop a novel post-processing technique by linking broken roads through shortest-path search with decreasing confidence thresholds. More specifically, we first convert the raster road prediction image to vector format so we can bridge gaps and trim spurious roads, then we render a raster image from the road vectors and merge it with the original prediction, because the challenge needs raster images for IoU calculation.
- examine the results carefully to understand the difference between FP and TP, and gain biological intuition from it for better model design; for example, the density of membrane and vesicles.
Week 4
This week I mainly focused on synaptic partner prediction and the NMJ pipeline, while waiting for our current model's result on the CREMI synapse prediction challenge.
Loss adjustment
Last week we added the DICE loss; this week we changed the BCE loss to focal loss, which adapts the per-sample weighting.
P: predicted result, GT: ground truth, N: batch size
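In this notation, focal loss (Lin et al.) reads as follows (my reconstruction; the α and γ used in the code may differ):

```latex
L_{\mathrm{focal}} = -\frac{1}{N}\sum_{i}\left[ \alpha\,(1-P_i)^{\gamma}\, GT_i \log P_i
 + (1-\alpha)\, P_i^{\gamma}\,(1-GT_i)\log(1-P_i) \right]
```

The $(1-P_i)^{\gamma}$ factor down-weights easy, already-confident samples, so the loss focuses on hard ones.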
Augmentation improvement
This week I added and improved two augmentation methods: deformation and elastic augmentation.
Summer_Intern/cremi_augmentation_implementation.ipynb at master · james20141606/Summer_Intern · GitHub
deformation augmentation
Removed the random seeds, and no longer apply the deformation to the masks.
Wrote the function so that deformation is never applied to adjacent sections; we hope this forces the network to pay attention to the 3D characteristics of the image.

```
def prepare_deform_slice(slice_shape, deformation_strength, iterations):
```
elastic augmentation
```
from scipy.ndimage.interpolation import map_coordinates
```
Elastic augmentation is important and should be treated very carefully. I use the idea from Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis.
The main ingredients are a Gaussian filter and interpolation. There is a version of elastic augmentation based on affine transformations; I tested it, but its effect is hard to control.
So I use a Gaussian filter plus interpolation for a gentle elastic transform, then normalize to 0-1, re-binarize the label, and make sure the transformation is identical for images and masks.

```
apply_elastic(img, mask)
```
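A sketch of the gentle, Simard-style elastic transform described above (Gaussian-smoothed random displacement field plus interpolation), applied with one shared field to image and mask; the parameter names and defaults are mine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def apply_elastic(img, mask, alpha=10.0, sigma=4.0, seed=None):
    """Elastic-deform a 2D image and its binary mask with one shared field."""
    rng = np.random.RandomState(seed)
    shape = img.shape
    # smoothed random displacement field, scaled by alpha
    dx = gaussian_filter(rng.rand(*shape) * 2 - 1, sigma) * alpha
    dy = gaussian_filter(rng.rand(*shape) * 2 - 1, sigma) * alpha
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing='ij')
    coords = [y + dy, x + dx]
    warped_img = map_coordinates(img, coords, order=1)
    # re-binarize the mask after interpolation
    warped_mask = (map_coordinates(mask, coords, order=1) > 0.5).astype(mask.dtype)
    return warped_img, warped_mask
```

Larger sigma gives a smoother, gentler field; alpha controls the displacement magnitude.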
task2 reverse predicting
I used alignment, padding and shift scripts to process the raw data, and this must be reversed after prediction. I tested the code to make sure it works.
Codes: Summer_Intern/T_align.m at master · james20141606/Summer_Intern · GitHub
Week 5 & 6
align location volume
I aligned the annotations of pre- and post-synaptic partner locations for later training, and analyzed the section distance between pre and post partners in the three volumes.
The volume C annotations have some apparent outliers: it is impossible for two partners to be that far apart, so I removed outliers with a distance of more than 300.
The final processing includes two more changes: do not replace bad slices with good slices, and remove partners with a distance of more than 300. The three volumes now have 217, 634 and 722 synaptic partners, respectively.
data augmentation
For the elastic transformation, it looks wrong to apply the Gaussian filter first and then warp; warping first and then applying the Gaussian filter works well.
- elastic: smoothed random motion field=random vector+gaussian filter
- warping: random global affine matrix
loss function and new design
- Intensity influences the result severely; besides decreasing contrast and brightness, see http://cs231n.stanford.edu/reports/2017/pdfs/300.pdf
- we can also try group normalization and multi-GPU batch normalization (32 channels, 8 groups of 4 channels each)
- use a distance transform to assign weights to the negative class, similar to tanh(D/S)
- multi-resolution and multi-task learning

Multi-task here means using multiple losses at different resolution levels and adding them together; it acts a bit like attention, but the combined losses make the different levels hard to control precisely.
We may also consider Multi-Scale Dense Networks for Resource Efficient Image Classification, which outputs results at different resolution levels and beats ResNet-based models.
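The tanh(D/S) idea above, weighting each negative voxel by its distance D to the nearest foreground voxel with a scale S, can be sketched with scipy's Euclidean distance transform (my formulation of the description; the function name and the choice to down-weight near-boundary background are assumptions):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def negative_class_weights(mask, scale=10.0):
    """Weight map for background voxels: near-boundary background gets
    low weight, far background saturates toward 1 via tanh(D / S)."""
    dist = distance_transform_edt(mask == 0)  # distance to nearest foreground
    weights = np.tanh(dist / scale)
    weights[mask > 0] = 1.0  # foreground keeps full weight
    return weights
```

This softens the penalty on ambiguous voxels right next to the synapse boundary while keeping clear background at full weight.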
Loss
The DICE loss is hard to make converge.
I formulated some equations to change the DICE loss, and also implemented some other DICE loss adjustments.
generalized DICE loss
Change the DICE denominator to squared terms, so that points at different distances from the GT have different derivatives.
changed denominator:
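My reconstruction of the squared-denominator variant described above (symbols as defined earlier: P predicted, GT ground truth):

```latex
L_{\mathrm{DICE}}^{\mathrm{sq}} = 1 - \frac{2\sum_i P_i\, GT_i + \epsilon}{\sum_i P_i^2 + \sum_i GT_i^2 + \epsilon}
```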
Using this formulation we do not need to assign weights to samples of different classes to establish the right balance between foreground and background voxels, and we obtain results that we experimentally observed are much better than the ones computed through the same network trained optimising a multinomial logistic loss with sample re-weighting
DICE Loss
consider negative?
sensitivity-specificity
weighted dice
These are still based only on pairwise comparisons of probabilities associated with the same label, and do not take inter-class relationships into account.
reverse prediction and deal with cracks
For the CREMI contest we do several preprocessing steps, like aligning the dataset and handling two specific sections with cracks (this is very important, because the cracks severely influence the result).
I will realign the predicted result to the original space and submit the prediction, and I will handle the two cracked sections separately.
Do crack alignment on those two sections
- [ ] script align_new
Detect the crack region and split the picture into two regions:
- align the two parts with neighboring sections, using warping; the crack will look bigger
- interpolate across the crack using optical flow

To reverse the predicted crack sections back:
- delete the interpolated region
- find the connected regions and extract them
- reverse-warp the two parts
```
imwarp(im, affine2d(tmp(:,:,2)), 'FillValues', 0, 'OutputView', imref2d(sz));
tform = affine2d([2 0.33 0; 0 1 0; 0 0 1])
```
For reverse warp, we can use invert function
Invert geometric transformation - MATLAB invert
- add the two parts back together
- get the reversed images and test that they are right

It is good enough to transform the synapse prediction result; no synapse will lie on the border, so we do not need the padding work.
For the predicted data:
- [ ] use a Jupyter notebook to get the connected components from the EM volume, and apply invert on them to get the reversed version
8.6 reverse all predictions
Reverse the A+, B+, C+ predictions.
Test that the algorithm is right: transform gt-syn, reverse it, and check that it matches the original.
- input: gt-syn/syn
- output: reverse/syn
Check that output = images/volumes/labels/clefts
Rewrite the reverse alignment code
After a long time it finally works, with Donglai's help. It turns out to be really hard to reverse back to the original alignment.
- The forward strategy: read the alignment shift distances, cumsum them to get continuous shifts, and get each section's shift. Allocate a very big temporary array, place the small original array into it, then crop a bigger ROI to get the medium-sized volume:

```
syn_o = zeros([1250+sum(ww([1,3])), 1250+sum(ww([2,4])), 153], 'uint16');
```

- The reverse strategy: read the alignment shift distances, cumsum them to get continuous shifts, and get each section's shift. Allocate a very big temporary array, place the predicted (medium-sized) region into it, then crop the small original ROI to recover the original volume.
Through testing, there is no problem.
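The forward/reverse round trip above can be sketched in numpy (a 1D-per-section toy of my own; the real scripts are MATLAB on 1250x1250 sections): cumulative shifts place each section into a padded canvas, and the reverse step crops the original ROI back out.

```python
import numpy as np

def forward_align(vol, shifts, pad):
    """Place each section of a (z, n) volume into a padded canvas,
    shifted by the cumulative sum of per-section shifts."""
    z, n = vol.shape
    out = np.zeros((z, n + 2 * pad), vol.dtype)
    offsets = np.cumsum(shifts)  # continuous per-section shift
    for i in range(z):
        start = pad + offsets[i]
        out[i, start:start + n] = vol[i]
    return out, offsets

def reverse_align(aligned, offsets, pad, n):
    """Crop the original ROI of each section back out of the canvas."""
    z = aligned.shape[0]
    out = np.empty((z, n), aligned.dtype)
    for i in range(z):
        start = pad + offsets[i]
        out[i] = aligned[i, start:start + n]
    return out
```

The key property being tested is exactness: forward then reverse must reproduce the input bit for bit.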
For A+: apply the reverse crack processing to sections 15 and 48 (counting from one, not zero). These are sections 14 and 47 of the predicted A+ syn:
data/prediction/im015_reverse0.png, data/prediction/im048_reverse0.png
Last three weeks
Week 7,8,9 (10)
8.13
We added a squeeze-and-excitation block to our model, which further improves our performance in the CREMI contest: we reached 2nd place by the end of August. This architecture also works for the synapse polarity prediction task.
8.29
Reverse back, new version:
- get the ROI, delete the interpolated region
- inverse-warp
- get reverse0
- merge
- put back into predicted A+ and reversed A+
/n/coxfs01/xupeng/projects/synapse/data/reverse