[Python NumPy] Random sampling and random number generation

Python Analysis and Programming / Python Data Preprocessing 2017. 1. 21. 22:39

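This post covers random number generation and random sampling. You need these when time or cost rules out a full census and you have to survey a sample instead, when you split a dataset into training, validation, and test sets for machine learning, or when you run a simulation on data drawn at random from various probability distributions.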

NumPy provides the numpy.random module, which can generate whole arrays of random samples in a single call, quickly and efficiently.

Let's import NumPy and draw 5 random samples (size=5) from a normal distribution (np.random.normal). Note that the values differ every time you draw.

In [1]: import numpy as np

In [2]: np.random.normal(size=5)

Out[2]: array([-0.02030555, 0.38279633, -1.02369692, 1.48083476, -0.44058273])

In [3]: np.random.normal(size=5) # array with different random numbers

Out[3]: array([ 1.11942454, -1.03486318, 1.69015608, -0.43601241, -1.52195043])

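We start with the seed and the size parameter.

seed : sets the initial value of the random number generator

If you want the same random numbers no matter who runs the code or when (that is, reproducibility), rather than different values on every call, set the seed.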

# seed : setting the seed number for random number generation for reproducibility

In [4]: np.random.seed(seed=100)

In [5]: np.random.normal(size=5)

Out[5]: array([-1.74976547, 0.3426804 , 1.1530358 , -0.25243604, 0.98132079])

# exactly the same random numbers as above

In [6]: np.random.seed(seed=100)

In [7]: np.random.normal(size=5) # identical to the result above

Out[7]: array([-1.74976547, 0.3426804 , 1.1530358 , -0.25243604, 0.98132079])
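As a side note that goes beyond the original post: since version 1.17, NumPy also offers np.random.default_rng, which returns a Generator object that carries its own seed instead of relying on the global state set by np.random.seed. The sketch below assumes NumPy 1.17 or later, and the variable names are only illustrative.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(seed=100)
rng_b = np.random.default_rng(seed=100)

sample_a = rng_a.normal(size=5)
sample_b = rng_b.normal(size=5)

# Same seed, same sequence of draws
print(np.array_equal(sample_a, sample_b))

# Drawing again from the same generator advances its state,
# so the next 5 values are different
sample_c = rng_a.normal(size=5)
print(np.array_equal(sample_a, sample_c))
```

The values will not match the np.random.seed(100) output shown above, because the Generator uses a different underlying bit generator than the legacy functions.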

size : sets the number of samples to draw and the shape of the array

Another strength of the NumPy random module is that it can generate random samples directly as multi-dimensional arrays.

# size : int or tuple of ints for setting the shape of random number array

In [8]: np.random.normal(size=2)

Out[8]: array([ 0.51421884, 0.22117967])

In [9]: np.random.normal(size=(2, 3))

Out[9]:

array([[-1.07004333, -0.18949583,  0.25500144],
       [-0.45802699,  0.43516349, -0.58359505]])

In [10]: np.random.normal(size=(2, 3, 4))

Out[10]:

array([[[ 0.81684707,  0.67272081, -0.10441114, -0.53128038],
        [ 1.02973269, -0.43813562, -1.11831825,  1.61898166],
        [ 1.54160517, -0.25187914, -0.84243574,  0.18451869]],

       [[ 0.9370822 ,  0.73100034,  1.36155613, -0.32623806],
        [ 0.05567601,  0.22239961, -1.443217  , -0.75635231],
        [ 0.81645401,  0.75044476, -0.45594693,  1.18962227]]])
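To make the role of size concrete, the following minimal sketch checks the shape and element count of arrays drawn with an int and with tuples. The seed value is arbitrary.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed, only for reproducibility

sizes = [2, (2, 3), (2, 3, 4)]
shapes = []
for size in sizes:
    arr = np.random.normal(size=size)
    shapes.append(arr.shape)
    # arr.size is the total number of random values generated
    print(size, "->", arr.shape, arr.size)

# With size left at its default (None), a single scalar is returned
# instead of an array
single = np.random.normal()
print(type(single))
```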

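Now let's generate random numbers from various probability distributions. We begin with discrete probability distributions, which yield integers: (1-1) the binomial distribution, (1-2) the hypergeometric distribution, and (1-3) the Poisson distribution.

Explaining each distribution here would make the post too long, so the links below point to posts that cover the theory.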

- Binomial Distribution : http://rfriend.tistory.com/99

- Hypergeometric Distribution : http://rfriend.tistory.com/100

- Poisson Distribution : http://rfriend.tistory.com/101

(1-1) Random sampling from the Binomial Distribution : np.random.binomial(n, p, size)

Here we toss a coin 20 times (size=20). Each toss is a single trial (n=1) that comes up heads or tails with probability 50% each (p=0.5).

# (1) Discrete Probability Distribution
# (1-1) Binomial Distribution : np.random.binomial(n, p, size)
# : sampling with replacement
# : n is an integer >= 0 and p is in the interval [0, 1]

In [11]: np.random.binomial(n=1, p=0.5, size=20)

Out[11]: array([0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

In [12]: sum(np.random.binomial(n=1, p=0.5, size=100) == 1)/100

Out[12]: 0.46999999999999997
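With n greater than 1, each drawn value is the number of successes in n trials rather than a single 0/1 outcome. The sketch below, with an arbitrary seed and sample count, compares the sample mean and variance against the theoretical values n*p and n*p*(1-p).

```python
import numpy as np

np.random.seed(100)  # arbitrary seed

n, p = 10, 0.5
# Each value is the number of heads in 10 tosses of a fair coin
heads = np.random.binomial(n=n, p=p, size=100_000)

# Every value lies between 0 and n
print(heads.min() >= 0, heads.max() <= n)

# Sample statistics next to the theoretical ones
print(heads.mean(), n * p)
print(heads.var(), n * p * (1 - p))
```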

(1-2) Random sampling from the Hypergeometric Distribution : np.random.hypergeometric(ngood, nbad, nsample, size)

From a population with 5 good and 20 bad items, we draw 5 items at random without replacement, repeat this 100 times, then build a frequency table of the number of good items per draw and show it as a bar chart.

# (1-2) Hypergeometric Distribution
# : sampling without replacement
# : np.random.hypergeometric(ngood, nbad, nsample, size=None)

In [13]: np.random.seed(seed=100)

In [14]: rand_hyp = np.random.hypergeometric(ngood=5, nbad=20, nsample=5, size=100)

In [15]: rand_hyp

Out[15]:

array([1, 1, 1, 1, 1, 1, 0, 3, 0, 1, 2, 0, 1, 1, 2, 0, 1, 1, 0, 0, 0, 3, 2,
       0, 0, 0, 1, 3, 1, 1, 0, 0, 1, 0, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2,
       1, 0, 1, 1, 2, 1, 2, 1, 2, 0, 1, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 2, 0,
       2, 0, 1, 2, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 2, 1, 1, 0, 0, 2, 1, 1,
       1, 1, 1, 1, 1, 3, 1, 0])

# frequency table of the 100 simulations

In [16]: unique, counts = np.unique(rand_hyp, return_counts=True)

In [17]: np.asarray((unique, counts)).T

Out[17]:

array([[ 0, 27],
       [ 1, 53],
       [ 2, 16],
       [ 3,  4]], dtype=int64)

# bar plot

In [18]: import matplotlib.pyplot as plt

In [19]: plt.bar(unique, counts, width=0.5, color="blue", align='center')

Out[19]: <Container object of 4 artists>
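To see how close such a simulation comes to theory, the sketch below compares empirical frequencies with the hypergeometric probability C(ngood, k) * C(nbad, nsample - k) / C(ngood + nbad, nsample). It assumes Python 3.8 or later for math.comb, and it uses 100,000 draws instead of 100 so that the comparison is stable; the seed is arbitrary.

```python
import numpy as np
from math import comb

ngood, nbad, nsample = 5, 20, 5

# Theoretical probability of getting k good items in one draw of nsample items
pmf = np.array([
    comb(ngood, k) * comb(nbad, nsample - k) / comb(ngood + nbad, nsample)
    for k in range(nsample + 1)
])

np.random.seed(100)  # arbitrary seed
draws = np.random.hypergeometric(ngood, nbad, nsample, size=100_000)

# minlength keeps a slot for every k from 0 to nsample, even if never drawn
empirical = np.bincount(draws, minlength=nsample + 1) / draws.size

for k in range(nsample + 1):
    print(k, round(pmf[k], 4), round(empirical[k], 4))
```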

(1-3) Random sampling from the Poisson Distribution : np.random.poisson(lam, size)

We generate 100 random numbers from a Poisson distribution with λ (lambda) = 20, where λ is the average number of events that occur at random per fixed unit of time or space. Then we count the frequencies and plot the distribution as a bar chart.

# (1-3) Poisson Distribution
# np.random.poisson(lam=1.0, size=None)
# Poisson distribution is the limit of the binomial distribution as N grows large with N*p = lam held fixed

In [20]: np.random.seed(seed=100)

In [21]: rand_pois = np.random.poisson(lam=20, size=100)

In [22]: rand_pois

Out[22]:

array([21, 19, 22, 14, 26, 15, 25, 25, 19, 25, 15, 24, 21, 13, 26, 23, 21,
       16, 24, 17, 18, 18, 15, 18, 22, 28, 21, 18, 17, 31, 23, 13, 20, 19,
       24, 17, 20, 13, 19, 16, 16, 21, 16, 21, 19, 20, 20, 19, 19, 20, 13,
       29,  9, 13, 20, 29, 15, 15, 21, 20, 21, 18, 16, 20, 23, 18, 22, 14,
       19, 20, 18, 17, 20, 24, 20, 15, 19, 19, 25, 17, 19, 27, 20, 17, 12,
       22, 16, 23, 17, 11, 15, 19, 16, 21, 21, 25, 26, 23, 15, 25])

In [23]: unique, counts = np.unique(rand_pois, return_counts=True)

In [24]: np.asarray((unique, counts)).T

Out[24]:

array([[ 9,  1],
       [11,  1],
       [12,  1],
       [13,  5],
       [14,  2],
       [15,  8],
       [16,  7],
       [17,  7],
       [18,  7],
       [19, 12],
       [20, 12],
       [21, 10],
       [22,  4],
       [23,  5],
       [24,  4],
       [25,  6],
       [26,  3],
       [27,  1],
       [28,  1],
       [29,  2],
       [31,  1]], dtype=int64)

In [25]: plt.bar(unique, counts, width=0.5, color="red", align='center')

Out[25]: <Container object of 21 artists>
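A defining property of the Poisson distribution is that its mean and its variance both equal λ. The following sketch checks this numerically; the seed and the sample size of 100,000 are arbitrary choices.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed

lam = 20
samples = np.random.poisson(lam=lam, size=100_000)

# Both statistics should land close to lam
print(samples.mean())
print(samples.var())

# Poisson draws are non-negative integers
print(samples.dtype.kind, samples.min() >= 0)
```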

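Next come continuous probability distributions: (2-1) the normal distribution, (2-2) the t-distribution, (2-3) the uniform distribution, (2-4) the F-distribution, and (2-5) the chi-square distribution.

For the theory behind each distribution, see the posts linked below.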

- Normal Distribution : http://rfriend.tistory.com/102

- Student's t-distribution : http://rfriend.tistory.com/110

- Uniform Distribution : http://rfriend.tistory.com/106

- F-distribution : http://rfriend.tistory.com/111

- Chi-square distribution : http://rfriend.tistory.com/112

(2-1) Random sampling from the Normal Distribution : np.random.normal(loc, scale, size)

We generate 100 random numbers from a normal distribution with mean 0 and standard deviation 3, then draw histograms to look at the distribution both as per-bin frequency and as a normalized density.

# (2) Continuous probability distribution
# (2-1) Random numbers from the normal distribution
# Draw random samples from a normal (Gaussian) distribution
# np.random.normal(loc=0.0, scale=1.0, size=None)
# loc : Mean (“centre”) of the distribution (mu below)
# scale : Standard deviation (spread or “width”) of the distribution (sigma below)
# size : Output shape

In [26]: np.random.seed(100)

In [27]: mu, sigma = 0.0, 3.0

In [28]: rand_norm = np.random.normal(mu, sigma, size=100)

In [29]: rand_norm

Out[29]:

array([-5.24929642, 1.02804121, 3.45910741, -0.75730811, 2.94396236,
        1.54265652, 0.66353901, -3.21012999, -0.56848749, 0.76500433,
       -1.37408096, 1.30549046, -1.75078515, 2.45054122, 2.01816242,
       -0.31323343, -1.59384113, 3.08919806, -1.31440687, -3.35495474,
        4.85694498, 4.62481552, -0.75563742, -2.52730721, 0.55355607,
        2.8112466 , 2.19300103, 4.08466838, -0.97871418, 0.16702804,
        0.66719883, -4.32965099, -2.26905692, 2.44936203, 2.25133428,
       -1.36784078, 3.5688668 , -5.07185048, -4.06919715, -3.69730354,
       -1.63331749, -2.00451521, 0.02194369, -1.83881621, 3.89924422,
       -5.19928687, -2.9499303 , 1.07252326, -4.84073551, 4.4121416 ,
       -3.56405279, -1.64923858, -2.82013848, -2.48379709, 0.3265904 ,
        1.52342877, -2.58668204, 3.74840923, -0.23883374, -2.66919444,
       -2.64539517, 0.05591685, 0.71353387, 0.04064565, -4.9065882 ,
       -3.13262963, 1.83911665, 2.20861564, 3.08076432, -4.29657183,
       -5.5235649 , 1.09827968, -0.99533141, -2.06765393, 6.10382268,
       -1.65214324, 2.25135999, -3.92097702, 1.74172001, -3.31356928,
        2.07036441, 2.0606702 , -4.70006259, 2.71492236, 2.3364672 ,
        1.28469861, 0.32661597, 0.0848509 , -1.73647747, -3.5983536 ,
       -5.11785602, 1.10749187, 5.62972028, -1.13071005, 5.49580825,
        0.0090523 , -0.2280704 , 0.01187278, -0.55504233, -7.46145461])

In [30]: import matplotlib.pyplot as plt

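# histogram with frequency (counts per bin)
# note: the old `normed` argument was removed in Matplotlib 3.1; use `density`

In [31]: count, bins, ignored = plt.hist(rand_norm, density=False)

# histogram normalized to a probability density (bar areas sum to 1)
In [32]: count, bins, ignored = plt.hist(rand_norm, density=True)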
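The difference between a frequency histogram and a normalized one can be checked without drawing anything. The sketch below uses np.histogram, which computes bin counts and edges without plotting, to show that the normalized heights are the counts divided by (total count × bin width), so that the bar areas sum to 1.

```python
import numpy as np

np.random.seed(100)
rand_norm = np.random.normal(0.0, 3.0, size=100)

# Frequencies per bin and the bin edges
counts, edges = np.histogram(rand_norm, bins=10)

# Same bins, but heights rescaled to a probability density
density, _ = np.histogram(rand_norm, bins=10, density=True)

widths = np.diff(edges)
print(counts.sum())               # total number of samples
print((density * widths).sum())   # total bar area

# density = counts / (n * bin width)
print(np.allclose(density, counts / (counts.sum() * widths)))
```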

(2-2) Random sampling from the t-distribution : np.random.standard_t(df, size)

We generate 100 random numbers from a t-distribution with 3 degrees of freedom and draw a histogram.

# (2-2) Random numbers from Student's t-distribution
# Draw samples from a standard Student’s t distribution with df degrees of freedom
# np.random.standard_t(df, size=None)
# df : Degrees of freedom
# size : Output shape

In [33]: np.random.seed(100)

In [34]: rand_t = np.random.standard_t(df=3, size=100)

In [35]: rand_t

Out[35]:

array([-1.70633623, 0.61010003, 0.45753218, -0.85709656, -0.42990712,
       -0.7437467 , 0.8444005 , -0.4040428 , 2.13905276, -0.10844638,
        0.67238716, 1.88720362, -2.57340231, -0.69724955, -3.40107659,
       -0.57745433, -0.36487447, 3.95862541, 2.34665412, -0.94310449,
        0.81852816, -0.48391289, 0.01380029, -0.43003718, -2.25784604,
       -0.18216847, -1.21433582, 0.46347964, 0.50024665, -1.1595865 ,
        0.02358778, -1.18879826, -0.38767689, 2.24289791, -2.80798472,
       -2.838893 , -0.39222432, -1.61499121, -1.78498184, 0.44618923,
       -1.5181203 , 5.44389927, 4.17743903, -0.49617121, -0.02996529,
        0.89595015, 1.14860485, -3.16541308, 0.14279246, 0.83121743,
       -0.32403947, 0.59297222, -0.39750861, 0.57634934, 0.81587478,
       -1.29367024, -0.28580516, -0.48422765, -0.83697192, 0.50702557,
       -1.98915687, 2.92965716, -1.19522074, 0.65511251, 2.12055605,
       -0.03640814, -0.41931018, 3.31199804, -0.61725596, 0.79681204,
        1.86805014, -0.54345259, 3.11909936, 0.86410458, 2.66353682,
        0.23735454, -0.76306875, 0.24471792, -0.13515045, 0.26402784,
        4.68946895, 0.70573709, -0.17783758, 1.85205955, -0.18352788,
       -0.65713104, -0.73674278, 2.16549569, 1.22326388, -0.5112858 ,
       -1.54451989, -1.73428432, 0.46947115, 1.66594804, 0.51687137,
        1.51361314, -2.22193709, 0.89557421, 0.56222653, -0.55564416])

# histogram

In [36]: import matplotlib.pyplot as plt

In [37]: count, bins, ignored = plt.hist(rand_t, bins=20, density=True)
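With few degrees of freedom the t-distribution has much heavier tails than the standard normal, which is why values beyond ±3 show up in the sample above. A small sketch, with an arbitrary seed and sample size, comparing the share of draws whose absolute value exceeds 3:

```python
import numpy as np

np.random.seed(100)  # arbitrary seed
n = 100_000

t_samples = np.random.standard_t(df=3, size=n)
z_samples = np.random.normal(size=n)  # standard normal for comparison

# Fraction of draws farther than 3 from zero
tail_t = np.mean(np.abs(t_samples) > 3)
tail_z = np.mean(np.abs(z_samples) > 3)

print(tail_t, tail_z)
```

The t fraction should come out an order of magnitude larger than the normal one.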

(2-3) Random sampling from the Uniform Distribution : np.random.uniform(low, high, size)

We generate 100 random numbers from a uniform distribution on the interval from 0 to 10 and draw a histogram to check the distribution.

# (2-3) Random numbers from the Uniform Distribution
# Draw samples from a uniform distribution
# np.random.uniform(low=0.0, high=1.0, size=None)
# low : Lower boundary of the output interval
# high : Upper boundary of the output interval
# [low, high) : includes low, excludes high

In [38]: np.random.seed(100)

In [39]: rand_unif = np.random.uniform(low=0.0, high=10.0, size=100)

In [40]: rand_unif

Out[40]:

array([ 5.43404942, 2.78369385, 4.24517591, 8.44776132, 0.04718856,
        1.21569121, 6.70749085, 8.25852755, 1.3670659 , 5.75093329,
        8.91321954, 2.09202122, 1.8532822 , 1.0837689 , 2.19697493,
        9.78623785, 8.11683149, 1.71941013, 8.16224749, 2.74073747,
        4.31704184, 9.4002982 , 8.17649379, 3.3611195 , 1.75410454,
        3.72832046, 0.05688507, 2.52426353, 7.95662508, 0.15254971,
        5.98843377, 6.03804539, 1.05147685, 3.81943445, 0.36476057,
        8.90411563, 9.80920857, 0.59941989, 8.90545945, 5.76901499,
        7.42479689, 6.30183936, 5.81842192, 0.20439132, 2.10026578,
        5.44684878, 7.69115171, 2.50695229, 2.8589569 , 8.52395088,
        9.75006494, 8.84853293, 3.59507844, 5.98858946, 3.54795612,
        3.40190215, 1.7808099 , 2.37694209, 0.44862282, 5.0543143 ,
        3.76252454, 5.92805401, 6.29941876, 1.42600314, 9.33841299,
        9.46379881, 6.02296658, 3.8776628 , 3.63188004, 2.04345277,
        2.76765061, 2.46535881, 1.73608002, 9.66609694, 9.570126 ,
        5.97973684, 7.31300753, 3.40385223, 0.92055603, 4.63498019,
        5.08698893, 0.88460173, 5.28035223, 9.92158037, 3.95035932,
        3.35596442, 8.05450537, 7.54348995, 3.13066442, 6.34036683,
        5.40404575, 2.96793751, 1.10787901, 3.12640298, 4.5697913 ,
        6.5894007 , 2.54257518, 6.41101259, 2.00123607, 6.57624806])

# histogram

In [41]: import matplotlib.pyplot as plt

In [42]: count, bins, ignored = plt.hist(rand_unif, bins=10, density=True)
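The half-open interval [low, high) and the flat shape of the distribution can both be checked numerically. The sketch below uses a larger sample than the 100 values above; the seed is arbitrary. For a uniform distribution the theoretical mean is (low + high) / 2 and the variance is (high - low)^2 / 12.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed

low, high = 0.0, 10.0
rand_unif = np.random.uniform(low=low, high=high, size=100_000)

# low is included, high is excluded
print(rand_unif.min() >= low, rand_unif.max() < high)

# Sample statistics next to the theoretical ones
print(rand_unif.mean(), (low + high) / 2)
print(rand_unif.var(), (high - low) ** 2 / 12)
```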

Random integers from a discrete uniform distribution : np.random.randint(low, high, size)

To draw integer random numbers from a discrete uniform distribution, use np.random.randint. The parameters mirror those of np.random.uniform(), and high is again excluded, so to draw integers from 0 through 10 (10 included) pass high=10+1, adding 1 to the upper bound.

# np.random.randint : Discrete uniform distribution, yielding integers

In [43]: np.random.seed(100)

In [44]: rand_int = np.random.randint(low=0, high=10+1, size=100)

In [45]: rand_int

Out[45]:

array([ 8, 8, 3, 7, 7, 0, 10, 4, 2, 5, 2, 2, 2, 1, 0, 8, 4,
       10, 0, 9, 6, 2, 4, 1, 5, 3, 4, 4, 3, 7, 1, 1, 7, 7,
        0, 2, 9, 9, 3, 2, 5, 8, 1, 0, 7, 6, 2, 0, 8, 2, 5,
       10, 1, 8, 10, 1, 5, 4, 2, 8, 3, 5, 0, 9, 10, 3, 6, 3,
        4, 10, 7, 6, 3, 9, 0, 4, 4, 5, 7, 6, 6, 2, 10, 4, 2,
        7, 1, 10, 6, 10, 6, 0, 10, 7, 2, 3, 5, 4, 2, 4])

# histogram

In [46]: import matplotlib.pyplot as plt

In [47]: count, bins, ignored = plt.hist(rand_int, bins=11, density=True) # 11 bins: one per integer from 0 to 10
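The effect of the excluded upper bound is easy to verify. The sketch below draws with and without the +1, and also shows an alternative that is not in the original post: Generator.integers (NumPy 1.17 or later) accepts endpoint=True to include high directly. Seeds and sample sizes are arbitrary.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed

# high is excluded: values run from 0 to 9
exclusive = np.random.randint(low=0, high=10, size=10_000)

# high=10+1: values run from 0 to 10
inclusive = np.random.randint(low=0, high=10 + 1, size=10_000)

print(exclusive.min(), exclusive.max())
print(inclusive.min(), inclusive.max())

# Newer Generator API: endpoint=True makes high inclusive
rng = np.random.default_rng(100)
closed = rng.integers(low=0, high=10, size=10_000, endpoint=True)
print(closed.min(), closed.max())
```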

(2-4) Random sampling from the F-distribution : np.random.f(dfnum, dfden, size)

We generate 100 random numbers from an F-distribution with numerator degrees of freedom 5 and denominator degrees of freedom 10, and draw a histogram to check the distribution.

# (2-4) Random numbers from the F-distribution
# Draw samples from an F-distribution (Fisher distribution)
# numpy.random.f(dfnum, dfden, size=None)
# dfnum : degrees of freedom in numerator
# dfden : degrees of freedom in denominator

In [48]: np.random.seed(100)

In [49]: rand_f = np.random.f(dfnum=5, dfden=10, size=100)

In [50]: rand_f

Out[50]:

array([ 0.17509245, 1.34830314, 0.7250835 , 0.55013536, 1.49183341,
        1.19802261, 1.24949706, 0.70015548, 0.71890936, 0.37020715,
        4.70371284, 0.86726338, 5.12146941, 0.12848202, 0.68237285,
        0.79663258, 1.36935299, 1.08005188, 0.99311831, 0.15607878,
        3.7778542 , 2.35609305, 0.16850985, 0.98599364, 1.12567067,
        3.21579679, 0.87982087, 0.38319493, 0.96834789, 1.00428004,
        1.65589171, 1.2581278 , 1.71881244, 0.11251552, 1.65949951,
        1.15809569, 1.33210756, 0.37989215, 0.252446 , 1.22409406,
        1.86571485, 0.42345727, 3.52740557, 1.32989807, 2.0095314 ,
        1.20016474, 3.5067706 , 0.67232354, 2.79268109, 0.38115844,
        1.3978449 , 0.7089553 , 2.12685211, 0.73462708, 2.03686026,
        0.50287078, 0.31183315, 1.66994305, 5.36906534, 1.55708073,
        2.66826698, 1.31701804, 0.66126086, 0.19123589, 0.58223398,
        0.41897952, 2.17842598, 0.98481411, 0.46953552, 0.99266818,
        0.4463218 , 0.43809118, 0.37791494, 2.46417893, 0.91230902,
        0.50247167, 1.0960922 , 0.61328846, 2.07107491, 0.65524443,
        4.00311763, 1.61430287, 0.16159395, 2.42851301, 1.38124899,
        0.33750889, 1.93776135, 1.55612023, 0.59284748, 0.56785228,
        1.09259657, 1.22611626, 0.0744978 , 0.10373193, 1.95616674,
        2.29130443, 0.62968361, 0.67477008, 0.60981642, 0.58408102])

# histogram

In [51]: import matplotlib.pyplot as plt

In [52]: count, bins, ignored = plt.hist(rand_f, bins=20, density=True)
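An F-distributed variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom, and for dfden > 2 its mean is dfden / (dfden - 2). The sketch below builds the ratio by hand and compares it with np.random.f; the seed and sample size are arbitrary.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed
dfnum, dfden, n = 5, 10, 200_000

# Direct draws from the F-distribution
direct = np.random.f(dfnum=dfnum, dfden=dfden, size=n)

# Same distribution built from two independent chi-square draws
chi_num = np.random.chisquare(df=dfnum, size=n)
chi_den = np.random.chisquare(df=dfden, size=n)
ratio = (chi_num / dfnum) / (chi_den / dfden)

theoretical_mean = dfden / (dfden - 2)  # 10 / 8 = 1.25
print(direct.mean(), ratio.mean(), theoretical_mean)
```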

(2-5) Random sampling from the Chi-square distribution : np.random.chisquare(df, size)

We generate 100 random numbers from a chi-square distribution with 2 degrees of freedom and draw a histogram to check the distribution.

# (2-5) Random numbers from the Chi-square distribution
# Draw samples from a chi-square distribution
# np.random.chisquare(df, size=None)
# df : Number of degrees of freedom

In [53]: np.random.seed(100)

In [54]: rand_chisq = np.random.chisquare(df=2, size=100)

In [55]: rand_chisq

Out[55]:

array([ 1.56791674e+00,   6.52483769e-01,   1.10509323e+00,
         3.72577379e+00,   9.46005029e-03,   2.59236110e-01,
         2.22187032e+00,   3.49570821e+00,   2.94001314e-01,
         1.71177147e+00,   4.43873095e+00,   4.69425742e-01,
         4.09939940e-01,   2.29423517e-01,   4.96147209e-01,
         7.69095282e+00,   3.33925872e+00,   3.77341773e-01,
         3.38808346e+00,   6.40613699e-01,   1.13022638e+00,
         5.62781567e+00,   3.40364791e+00,   8.19283487e-01,
         3.85739072e-01,   9.33081811e-01,   1.14094971e-02,
         5.81844910e-01,   3.17596456e+00,   3.07450508e-02,
         1.82680669e+00,   1.85169520e+00,   2.22193172e-01,
         9.62350625e-01,   7.43158820e-02,   4.42204683e+00,
         7.91831906e+00,   1.23627383e-01,   4.42450081e+00,
         1.72030053e+00,   2.71331337e+00,   1.98949905e+00,
         1.74379278e+00,   4.13018033e-02,   4.71511953e-01,
         1.57353105e+00,   2.93167254e+00,   5.77218949e-01,
         6.73452471e-01,   3.82643217e+00,   7.37827846e+00,
         4.32309651e+00,   8.91036809e-01,   1.82688432e+00,
         8.76376262e-01,   8.31607381e-01,   3.92226832e-01,
         5.42815006e-01,   9.17994841e-02,   1.40813894e+00,
         9.44019133e-01,   1.79692816e+00,   1.98819039e+00,
         3.07702183e-01,   5.43139774e+00,   5.85166185e+00,
         1.84409785e+00,   9.81282349e-01,   9.02561614e-01,
         4.57179904e-01,   6.48042320e-01,   5.66147763e-01,
         3.81372088e-01,   6.79897935e+00,   6.29369648e+00,
         1.82247546e+00,   2.62832513e+00,   8.32198571e-01,
         1.93144279e-01,   1.24537005e+00,   1.42139617e+00,
         1.85239984e-01,   1.50170184e+00,   9.69653206e+00,
         1.00517243e+00,   8.17731091e-01,   3.27413768e+00,
         2.80768685e+00,   7.51035408e-01,   2.01044435e+00,
         1.55481738e+00,   7.04210092e-01,   2.34838981e-01,
         7.49795080e-01,   1.22121505e+00,   2.15139414e+00,
         5.86749873e-01,   2.04942998e+00,   4.46596145e-01,
         2.14369617e+00])

# histogram

In [56]: import matplotlib.pyplot as plt

In [57]: count, bins, ignored = plt.hist(rand_chisq, bins=20, density=True)
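A chi-square variable with df degrees of freedom is the sum of df squared independent standard normal variables, with mean df and variance 2 * df. The sketch below compares np.random.chisquare with that hand-built sum; the seed and sample size are arbitrary.

```python
import numpy as np

np.random.seed(100)  # arbitrary seed
df, n = 2, 100_000

# Direct draws from the chi-square distribution
direct = np.random.chisquare(df=df, size=n)

# Sum of df squared standard normal draws, one row per sample
z = np.random.normal(size=(n, df))
by_hand = (z ** 2).sum(axis=1)

print(direct.mean(), by_hand.mean())  # both close to df
print(direct.var(), by_hand.var())    # both close to 2 * df
```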

I hope this was helpful.

For random sampling from a pandas DataFrame, see https://rfriend.tistory.com/602.

License: Attribution, NonCommercial, NoDerivatives

Posted by Rfriend

R Friend: a friend for analysis and programming with R and Python (by R Friend)
