OpenCV Learning (English Version)

Aug 5, 2022

Learning site: https://learnopencv.com/

Read, Display and Write an Image#

cv2.IMREAD_UNCHANGED or -1
cv2.IMREAD_GRAYSCALE or 0
cv2.IMREAD_COLOR or 1

import cv2

# The second argument selects how the image is decoded (0 is the same as cv2.IMREAD_GRAYSCALE).
img_color = cv2.imread('test.jpg', cv2.IMREAD_COLOR)
img_grayscale = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
img_unchanged = cv2.imread('test.jpg', cv2.IMREAD_UNCHANGED)

cv2.imshow('color image', img_color)
cv2.imshow('grayscale image', img_grayscale)
cv2.imshow('unchanged image', img_unchanged)

# Keep the windows open until a key is pressed, then close them.
cv2.waitKey(0)
cv2.destroyAllWindows()

cv2.imwrite('grayscale.jpg', img_grayscale)

Reading and Writing Videos#

reading#

From a file#

import cv2

vid_capture = cv2.VideoCapture('1.mp4')

if not vid_capture.isOpened():
    print("Error opening the video file")
else:
    # 5 is the value of cv2.CAP_PROP_FPS; either form works
    fps = vid_capture.get(5)
    print('Frames per second : ', fps, 'FPS')

    # 7 is the value of cv2.CAP_PROP_FRAME_COUNT; either form works
    frame_count = vid_capture.get(7)
    print('Frame count : ', frame_count)

while vid_capture.isOpened():
    # vid_capture.read() returns a tuple: a bool success flag, then the frame
    ret, frame = vid_capture.read()
    if ret:
        cv2.imshow('Frame', frame)
        # Wait 20 ms between frames; try a larger value such as 50 and observe
        key = cv2.waitKey(20)

        if key == ord('q'):
            break
    else:
        break

vid_capture.release()
cv2.destroyAllWindows()

cv2.VideoCapture – Creates a video capture object, which is used to stream or display the video.
cv2.VideoWriter – Saves the output video to a file.
Other functions used here are cv2.imshow(), cv2.waitKey(), and the get() method, which reads video metadata such as frame height, width, and fps.

get()

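cv2.VideoCapture.get(0): current playback position in the video file, in milliseconds
cv2.VideoCapture.get(1): 0-based index of the frame to be captured or decoded next
cv2.VideoCapture.get(2): relative playback position in the video file (0 = start of the film, 1 = end of the film)
cv2.VideoCapture.get(3): width of the frames in the video stream
cv2.VideoCapture.get(4): height of the frames in the video stream
cv2.VideoCapture.get(5): frame rate
cv2.VideoCapture.get(7): number of frames in the video file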

From Image-sequence#

vid_capture = cv2.VideoCapture('Resources/Image_sequence/Cars%04d.jpg')

# e.g. Cars0001.jpg, Cars0002.jpg, Cars0003.jpg, etc
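The %04d in the path is a printf-style placeholder for a zero-padded, four-digit frame number. As a quick illustration, the same pattern can be expanded in plain Python to see which file names it stands for; starting at index 1 is an assumption taken from the comment above.

```python
# Expand the printf-style pattern for the first three frame numbers.
pattern = 'Resources/Image_sequence/Cars%04d.jpg'
expected_files = [pattern % index for index in range(1, 4)]
print(expected_files)
```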

From a webcam#

vid_capture = cv2.VideoCapture(0, cv2.CAP_DSHOW)

You might be wondering about the CAP_DSHOW flag. It is an optional argument and is therefore not required. CAP_DSHOW is just another video-capture API preference; the name is short for DirectShow (via videoInput).

writing#

Steps:
- Retrieve the image frame height and width, using the get() method.
```
# Obtain frame size information using get() method
frame_width = int(vid_capture.get(3))
frame_height = int(vid_capture.get(4))
frame_size = (frame_width, frame_height)
fps = 20
```
- Initialize a video capture object (as discussed in the previous sections), to read the video stream into memory, using any of the sources previously described.
- Create a video writer object.
```
output = cv2.VideoWriter('Resources/output_video_from_file.avi', cv2.VideoWriter_fourcc('M','J','P','G'), 20, frame_size)
```
  - filename: pathname for the output video file
  - apiPreference: API backends identifier
  - fourcc: 4-character code of codec, used to compress the frames (fourcc)
    
    AVI: cv2.VideoWriter_fourcc('M','J','P','G')
    
    MP4: cv2.VideoWriter_fourcc(*'XVID')
  - fps: Frame rate of the created video stream
  - frame_size: Size of the video frames
  - isColor: If not zero, the encoder will expect and encode color frames. Else it will work with grayscale frames (the flag is currently supported on Windows only).
- Use the video writer object to save the video stream to disk.
```
while vid_capture.isOpened():
    # read() returns a success flag and the frame
    ret, frame = vid_capture.read()
    if ret:
        output.write(frame)
    else:
        print('Stream disconnected')
        break

# Release both objects so the output file is finalized
vid_capture.release()
output.release()
```

Errors#

reading#

Reading frames can produce an error if the path is wrong, the file is corrupted, or a frame is missing.

writing#

The most common problems are frame-size errors and API-preference errors.
If the frame size passed to the writer does not match the size of the frames, a video file still appears in the output directory, but it will be blank.
If you use the NumPy shape attribute to retrieve the frame size, remember to reverse the order, because OpenCV images are stored as height x width x channels.
If an API-preference error is raised, you may need to pass the CAP_ANY flag as an argument to VideoCapture(). The webcam example does something similar, passing CAP_DSHOW to avoid warnings.
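The frame-size pitfall is easy to show with a stand-in array. This sketch uses a blank NumPy array in place of a real frame (the 640x480 size is an arbitrary choice) and builds the (width, height) tuple that VideoWriter expects.

```python
import numpy as np

# Stand-in for a frame returned by VideoCapture.read(): 480 rows x 640 columns.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

height, width = frame.shape[:2]   # NumPy order: (height, width, channels)
frame_size = (width, height)      # VideoWriter expects (width, height)
print(frame_size)
```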

Resizing#

When resizing an image:

It is important to keep in mind the original aspect ratio of the image (i.e. width by height), if you want to maintain the same in the resized image too.
Reducing the size of an image will require resampling of the pixels.
Increasing the size of an image requires reconstruction of the image. This means you need to interpolate new pixels.

Width and Height#

import cv2
import numpy as np

image = cv2.imread('img/000.png')
cv2.imshow('Original Image', image)

# dsize is given as (width, height)
resized_down = cv2.resize(image, (400, 300), interpolation=cv2.INTER_LINEAR)
resized_up = cv2.resize(image, (1200, 900), interpolation=cv2.INTER_LINEAR)

# image.shape is a tuple: (height, width, channels)
h, w, c = image.shape
print("Original Height and Width:", h, "x", w)

cv2.imshow('Resized Down', resized_down)
cv2.waitKey()
cv2.imshow('Resized Up', resized_up)
cv2.waitKey()
cv2.destroyAllWindows()

Scaling factor#

scaled_f_up = cv2.resize(image, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_LINEAR)

scaled_f_down = cv2.resize(image, None, fx=0.6, fy=0.6, interpolation=cv2.INTER_LINEAR)

Interpolation Methods#

INTER_AREA: INTER_AREA uses pixel area relation for resampling. This is best suited for reducing the size of an image (shrinking). When used for zooming into the image, it uses the INTER_NEAREST method.
INTER_CUBIC: This uses bicubic interpolation for resizing the image. While resizing and interpolating new pixels, this method acts on the 4×4 neighboring pixels of the image. It then takes the weighted average of the 16 pixels to create the new interpolated pixel.
INTER_LINEAR: This method is somewhat similar to the INTER_CUBIC interpolation. But unlike INTER_CUBIC, this uses 2×2 neighboring pixels to get the weighted average for the interpolated pixel.
INTER_NEAREST: The INTER_NEAREST method uses the nearest neighbor concept for interpolation. This is one of the simplest methods, using only one neighboring pixel from the image for interpolation.

# Assumed scale factor; the original snippet does not define scale_down
scale_down = 0.6

res_inter_nearest = cv2.resize(image, None, fx=scale_down, fy=scale_down, interpolation=cv2.INTER_NEAREST)
res_inter_linear = cv2.resize(image, None, fx=scale_down, fy=scale_down, interpolation=cv2.INTER_LINEAR)
res_inter_area = cv2.resize(image, None, fx=scale_down, fy=scale_down, interpolation=cv2.INTER_AREA)

# Stack the three results vertically for a side-by-side comparison
vertical = np.concatenate((res_inter_nearest, res_inter_linear, res_inter_area), axis=0)
cv2.imshow('Inter Nearest :: Inter Linear :: Inter Area', vertical)

Cropping#

Basic Cropping#

import cv2
import numpy as np

img = cv2.imread('img/000.png')
# img.shape: (h = 480, w = 640, c = 3)
cropped_image = img[80:280, 150:330]
# cropped = img[start_row:end_row, start_col:end_col]  (rows = height, columns = width)

cv2.imshow("original", img)
cv2.imshow("cropped", cropped_image)

cv2.imwrite("Cropped Image.jpg", cropped_image)

cv2.waitKey(0)
cv2.destroyAllWindows()
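Because cropping is plain NumPy slicing, the crop is a view onto the original pixels rather than a copy. The sketch below uses a blank stand-in array with the same slice indices to show the shape of the crop and what happens when the view is modified.

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for the loaded image

view = img[80:280, 150:330]                  # rows 80..279, columns 150..329
independent = img[80:280, 150:330].copy()    # separate buffer

view[:] = 255                                # writes through to img
print(view.shape, img[80, 150], independent[0, 0])
```

Calling .copy() is the safer choice when the crop will be drawn on or otherwise edited.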

Dividing Into Small Patches#

import cv2
import numpy as np

img = cv2.imread('img/000.png')

image_copy = img.copy()
imgheight = img.shape[0]
imgwidth = img.shape[1]

M = 160
N = 160
x1 = 0
y1 = 0

for y in range(0, imgheight, M):
    for x in range(0, imgwidth, N):
        if (imgheight - y) < M or (imgwidth - x) < N:
            break

        y1 = y + M
        x1 = x + N

        # check whether the patch width or height exceeds the image width or height
        if x1 >= imgwidth and y1 >= imgheight:
            x1 = imgwidth - 1
            y1 = imgheight - 1
            # Crop into patches of size MxN
            tiles = image_copy[y:y + M, x:x + N]
            # Save each patch into file directory
            cv2.imwrite('saved_patches/' + 'tile' + str(x) + '_' + str(y) + '.jpg', tiles)
            cv2.rectangle(img, (x, y), (x1, y1), (0, 255, 0), 1)

        elif y1 >= imgheight:  # when patch height exceeds the image height
            y1 = imgheight - 1
            # Crop into patches of size MxN
            tiles = image_copy[y:y + M, x:x + N]
            # Save each patch into file directory
            cv2.imwrite('saved_patches/' + 'tile' + str(x) + '_' + str(y) + '.jpg', tiles)
            cv2.rectangle(img, (x, y), (x1, y1), (0, 255, 0), 1)

        elif x1 >= imgwidth:  # when patch width exceeds the image width
            x1 = imgwidth - 1
            # Crop into patches of size MxN
            tiles = image_copy[y:y + M, x:x + N]
            # Save each patch into file directory
            cv2.imwrite('saved_patches/' + 'tile' + str(x) + '_' + str(y) + '.jpg', tiles)
            cv2.rectangle(img, (x, y), (x1, y1), (0, 255, 0), 1)

        else:
            # Crop into patches of size MxN
            tiles = image_copy[y:y + M, x:x + N]
            # Save each patch into file directory
            cv2.imwrite('saved_patches/' + 'tile' + str(x) + '_' + str(y) + '.jpg', tiles)
            cv2.rectangle(img, (x, y), (x1, y1), (0, 255, 0), 1)

# Save full image into file directory
cv2.imshow("Patched Image", img)
cv2.imwrite("patched.jpg", img)

cv2.waitKey()
cv2.destroyAllWindows()
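The core of the patch loop can also be written as a small generator, which separates the tiling logic from saving and drawing. This is a simplified sketch, not the code above: iter_tiles is a hypothetical helper, it only yields full-size tiles (partial tiles at the edges are skipped), and it runs on a blank stand-in image.

```python
import numpy as np


def iter_tiles(image, tile_h, tile_w):
    """Yield (x, y, tile) for every full tile; partial edge tiles are skipped."""
    height, width = image.shape[:2]
    for y in range(0, height - tile_h + 1, tile_h):
        for x in range(0, width - tile_w + 1, tile_w):
            yield x, y, image[y:y + tile_h, x:x + tile_w]


image = np.zeros((480, 640, 3), dtype=np.uint8)   # synthetic stand-in image
tiles = list(iter_tiles(image, 160, 160))
print(len(tiles), tiles[-1][:2])
```

Each yielded tile could then be passed to cv2.imwrite with a name built from x and y, as in the loop above.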

Rotation and Translation#

Rotation#

import cv2

image = cv2.imread('img/000.png')

height, width, _ = image.shape
center = (width/2, height/2)

rotate_matrix = cv2.getRotationMatrix2D(center=center, angle=45, scale=0.5)
rotated_image = cv2.warpAffine(src=image, M=rotate_matrix, dsize=(int(width*0.9), int(height*0.9)))

cv2.imshow('Original image', image)
cv2.imshow('Rotated image', rotated_image)

cv2.waitKey(0)
cv2.imwrite('rotated_image.jpg', rotated_image)

The getRotationMatrix2D() function takes the following arguments:

center: the center of rotation for the input image
angle: the angle of rotation in degrees
scale: an isotropic scale factor which scales the image up or down according to the value provided

The following are the arguments of warpAffine() function:

src: the source image
M: the transformation matrix
dsize: size of the output image
dst: the output image
flags: combination of interpolation methods such as INTER_LINEAR or INTER_NEAREST
borderMode: the pixel extrapolation method
borderValue: the value to be used in case of a constant border, has a default value of 0

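Note that the image width and height must be integers, so wrap the dsize values in int() to make sure they have the right type. Otherwise the following error occurs.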

cv2.error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'warpAffine'
> Overload resolution failed:
>  - Can't parse 'dsize'. Sequence item with index 0 has a wrong type
>  - Can't parse 'dsize'. Sequence item with index 0 has a wrong type

Translation#

Translation shifts an image along the x and y axes.

import cv2
import numpy as np

image = cv2.imread('img/000.png')

height, width, _ = image.shape
center = (width/2, height/2)

tx, ty = width / 4, height / 4

translation_matrix = np.array([
    [1, 0, tx],
    [0, 1, ty]
], dtype=np.float32)

translated_image = cv2.warpAffine(src=image, M=translation_matrix, dsize=(width, height))

cv2.imshow('Original image', image)
cv2.imshow('Translated image', translated_image)
cv2.waitKey(0)
cv2.imwrite('translated_image.jpg', translated_image)

Annotating#

Color Line#

imageLine = img.copy()

pointA = (200, 80)
pointB = (450, 80)
cv2.line(imageLine, pointA, pointB, (255, 255, 0), thickness=3)

cv2.imshow('Image Line', imageLine)
cv2.waitKey(0)

point(x, y):

The x-axis represents the horizontal direction or the columns of the image.
The y-axis represents the vertical direction or the rows of the image.

So the line drawn above is horizontal: both points share the same y value.

Outlined Circle#

imageCircle = img.copy()

circle_center = (415, 190)
radius = 100
cv2.circle(imageCircle, circle_center, radius, (0, 0, 255), thickness=3, lineType=cv2.LINE_AA)

cv2.imshow("Image Circle", imageCircle)
cv2.waitKey(0)

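The lineType parameter does not choose between solid, dashed, or dash-dotted lines; what it actually changes is the algorithm used to generate the line.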

Filled Circle#

cv2.circle(..., thickness=-1, ...)

Rectangles#

In the rectangle() function, you provide the starting point (top left) and ending point (bottom right) for the corners of the rectangle.

imageRectangle = img.copy()

start_point = (300, 115)
end_point = (475, 225)

cv2.rectangle(imageRectangle, start_point, end_point, (0, 0, 255), thickness=3, lineType=cv2.LINE_8)

cv2.imshow('imageRectangle', imageRectangle)
cv2.waitKey(0)

Ellipses#

pass

Half-Ellipses#

pass

Text#

imageText = img.copy()

text = 'I am a Happy dog!'
org = (50, 350)
# write the text on the input image
cv2.putText(imageText, text, org, fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=1.5, color=(250, 225, 100))
# display the output image with text over it
cv2.imshow("Image Text", imageText)
cv2.waitKey(0)
cv2.destroyAllWindows()

org specifies the location of the bottom-left corner of the text string in the image.

OpenCV supports several font-face styles from the Hershey font collection, and an italic font as well.

Color spaces#

RGB Color Space#

The RGB colorspace has the following properties

It is an additive colorspace where colors are obtained by a linear combination of Red, Green, and Blue values.
The three channels are correlated by the amount of light hitting the surface.

The RGB color space has these inherent problems:

significant perceptual non-uniformity.
mixing of chrominance ( Color related information ) and luminance ( Intensity related information ) data.

LAB Color-Space#

L – Lightness ( Intensity ).
a – color component ranging from Green to Magenta.
b – color component ranging from Blue to Yellow.

properties

Perceptually uniform color space which approximates how we perceive color.
Independent of device ( capturing or displaying ).
Used extensively in Adobe Photoshop.
Is related to the RGB color space by a complex transformation equation.

img = cv2.imread('cube1.jpg')
imgLAB = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)

YCrCb Color-Space#

pass

HSV Color Space#

H – Hue ( Dominant Wavelength ).
S – Saturation ( Purity / shades of the color ).
V – Value ( Intensity ).

img = cv2.imread('cube1.jpg')
imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

There is a drastic difference between the hue values of the red piece in the outdoor and indoor images. This is because hue is represented as an angle on a circle and red sits at the starting angle, so red can take values in both [300, 360] and [0, 60].

segmentation#

simplest way#

Data Analysis for a Better Solution#

Filtering#

Image Filtering Using Convolution in OpenCV | LearnOpenCV #

pass

Thresholding#

Binary Thresholding#

Binary Thresholding ( THRESH_BINARY )

# Binary Threshold (pseudocode)
# if src(x,y) > thresh
#   dst(x,y) = maxValue
# else
#   dst(x,y) = 0

# ---------------
import cv2

src = cv2.imread("threshold.png", cv2.IMREAD_GRAYSCALE)

# Set threshold and maxValue
thresh = 0
maxValue = 255

# Basic threshold example
th, dst = cv2.threshold(src, thresh, maxValue, cv2.THRESH_BINARY)

Inverse-Binary Thresholding#

Inverse-Binary Thresholding ( THRESH_BINARY_INV )

# Inverse Binary Threshold (pseudocode)
# if src(x,y) > thresh
#   dst(x,y) = 0
# else
#   dst(x,y) = maxValue

# ---------------
th, dst = cv2.threshold(src, thresh, maxValue, cv2.THRESH_BINARY_INV)

Truncate Thresholding#

Truncate Thresholding ( THRESH_TRUNC )

# Truncate Threshold (pseudocode)
# if src(x,y) > thresh
#   dst(x,y) = thresh
# else
#   dst(x,y) = src(x,y)

# ---------------
th, dst = cv2.threshold(src, thresh, maxValue, cv2.THRESH_TRUNC)

Threshold to Zero#

Threshold to Zero ( THRESH_TOZERO )

# Threshold to Zero (pseudocode)
# if src(x,y) > thresh
#   dst(x,y) = src(x,y)
# else
#   dst(x,y) = 0

# ---------------
th, dst = cv2.threshold(src, thresh, maxValue, cv2.THRESH_TOZERO)

Inverted Threshold to Zero#

Inverted Threshold to Zero ( THRESH_TOZERO_INV )

# Inverted Threshold to Zero (pseudocode)
# if src(x,y) > thresh
#   dst(x,y) = 0
# else
#   dst(x,y) = src(x,y)

# ---------------
th, dst = cv2.threshold(src, thresh, maxValue, cv2.THRESH_TOZERO_INV)

Blob Detection#

pass

Edge Detection#

pass

Mouse and Trackbar#

pass

Contour Detection#

Simple Background Estimation#

Deep Learning with OpenCV DNN#

InputImg

Input.float().to(Device)

ValDataLoader

ValDataLoader = PipeDatasetLoader(FolderPath, 1)

ValDataLoader = DataLoader(ValDataset, batch_size=1, shuffle=False, drop_last=False, num_workers=0, pin_memory=True)

Author Junyao Hu

Published Aug 5, 2022

Link https://junyaohu.github.io/blog/opencv-learning/
