R, Python 분석과 프로그래밍의 친구 (by R Friend)

r
Python
파이썬
pandas
PostgreSQL
numpy
판다스
ggplot2
GreenPlum
langchain
랭체인
Greenplum Database
data.table
greenplum db
인테리어쇼
인쇼
인쇼랑 인테리어
postgresql db
군집분석
matplotlib
인테리어Show
pytorch
dplyr 패키지
HIVE
지리공간 데이터 분석
PostgreSQL database
dplyr package
Cluster Analysis
낮은 수준의 그래프 함수
dataframe
tensorflow
Inner Join
PL/Python
데이터 재구조화
데이터 변환
PostGIS
kubeflow
sf package
pandas DataFrame
PL/R
넘파이
max()
min()
차원 축소
기계학습
github
선형대수
Left Join
chatGPT
ipython-sql
dplyr
Seaborn
Python pandas
window function
Keras
과적합
다변량 그래프
low level graphic function
R graphics
Graphical Parameters
그래프 모수
scatter plot
일원분산분석
list comprehension
연관규칙
산점도
len()
히스토그램
역행렬
분산
list
scipy
SQL
ChatOpenAI
ChatPromptTemplate
.SD
tokenization
linear interpolation
apache madlib
Rfriend
raster data
GPDB
matplotlib.pyplot
분산병렬처리
로지스틱 회귀분석
apply()
subset
범용함수
ufuncs
Universal Functions
missing value
right join
Dimensionality reduction
FULL JOIN
for loop
agglomerative hierarchical clustering
응집형 계층적 군집화
맨하탄 거리
association rule
특이값 분해
eigenvector
ginv()
invertible matrix
chisq.test()
correlation coefficient
quantile()
Cleveland dot plot
클리브랜드 점 그래프
geom_point()
standardization
binarization
이항변수화
ifelse()
mean()
var()
median()
t-test
히트맵
heatmap
중앙값
논리 연산자
gdal
eigenvalue
filter()
count()
비교 연산자
고유벡터
고유값
variance
union all
Clustering
표준편차
Else
평균
dictionary
조건문
최소값
최대값
Array
하이브
PCA
점
Copy
선
리스트
Few-shot prompting
Configure chain internals at runtime
Retrieval-Augmented Generation
text embedding
recall rate
분류 모델 성능 평가 지표
tf.constant()
element-wise product
행렬 곱
인테리어 Show
statsmodels
레스터 데이터
이차 인덱스
secondary indices
좌표 참조 시스템
Coordinate Reference Systems
지리공간 벡터 데이터
st_sfc()
sf 패키지
density based spatial clustering of applications with noise
plotly
.SDcols
DT[i, j, by]
Google colab
tensorflow 2
np.matmul()
seasonality
시계열 분해
time series decomposition
pd.to_datetime()
토큰화
DBSCAN
python numpy
ImportError: cannot import name 'urlopen'
TypeError: the JSON object must be str
import json
json.loads()
python
R Shiny
sns.distplot()
corr()
predict whether breast cancer or not
데이터 전처리
유방암 예측
likelihood
Spectral Analysis
파이토치
Data Preprocessing
grouped.agg()
dict.get()
groupby()
np.dot()
np.argmax()
jupyter notebook
yaml
lower()
python string methods
openAI
Semantic Search
w3schools.com/sql
assign()
R proxy package
central limit theorem
universal function
Unary ufuncs
dtype
data reshape
np.where()
결측값
pd.concat()
pd.read_csv()
파이썬 판다스
group_by()
mutate()
distinct()
select()
Kubernetes
matrix multiplication
오미자 생과
장수 오미자
dbeaver
Sankey Diagram
R 시각화
비유사성 측도
vectorization
표준화 거리
유클리드 거리
sequence analysis
순차분석
연관규칙 분석
diagonalization
backpropagation
3차원 산점도
범례 추가
title()
col.main
hist()
arrange()
map()
inner product
벡터의 곱
the inverse of a matrix
unit matrix
항등행렬
diagonal matrix
범주형 자료분석
categorical data analysis
two-way ANOVA
다중비교
aov()
one-way ANOVA
quartile
IQR
일변량 연속형 자료 요약
contingency table
교차표
xtabs()
table()
Wilcoxon signed rank test
wilcox.test()
prop.test()
t.test()
args=list(df=1))
stat_function(fun=dunif)
Uniform Distribution
균등분포
stat_function(fun=df)
F-분포
Exponential Distribution
지수분포
t-Distribution
geom_vline()
Bubble Chart
버블 그래프
산점도 행렬
correlation matrix
aggregate()
melt()
annotate()
Line Graph
Mosaic Chart
Pie Chart
geom_bar()
Bar Chart
dummy variable
factor analysis
dimension reduction
qqline()
qqnorm()
범주화
t()
rbind()
cbind()
within()
%*%
lapply()
diff()
IQR()
sd()
exp()
log()
sqrt()
attach()
names()
cat()
capture.output()
깃헙
Overfitting
psycopg2
summary()
마할라노비스 거리
이원분산분석
join()
standard deviation
Confusion matrix
단위행렬
도수분포표
산술 연산자
Principal Component Analysis
sort()
str()
elif
identity matrix
신수정
상관계수
dot product
Fast fourier transform
문자열 매칭
대각행렬
전치행렬
벡터공간
스펙트럼 분석
다각형
막대 그래프
cross join
정밀도
SVD
outlier
outer join
주성분분석
PROJ
Loc
polygon
length()
요인분석
Histogram
합집합
Median
이동평균
except
튜플
Machine Learning
Tuple
fft
병렬처리
목공
graphviz
right outer join
left outer join
precision
geos
urllib
order by
group by
정규 표현식
파이프라인
HAVING
최빈값
차분
Broadcasting
Dimension
crs
행렬
graphics
min
Serialization
Pipeline
표준화
내적
문자열
벡터
json
반복문
vector
rank
regular expression
우도
split
size
시각화
RAG
UNION
trend
Mode
AUC
색깔
If
MAX
인테리어
like
Point
Break
리눅스
차원
sort
Linux
posterior probability
Prior probability
Bayesian Statistic
베이지언 통계
subword fertility
토크나이저 성능평가 지표
Unigram Language Model
SentencePiece
WordPiece
BPE (Byte Pair Encoding)
서브워드 토큰화 알고리즘
subword tokenization algorithms
number of branches
분기 수
확률의 역수
퍼플렉서티
혼란도
언어 모델 성과 평가 지표
CUSUM (Cumulative Sum Control Chart)
변화점 감지 알고리즘
통계적 공정 관리 SPC
시계열의 평균 또는 분산의 갑작스러운 변화 Shift
점진적인 시계열 데이터 변동
RecursiveCharacterTextSplitter
PyPDF Loader
Chat History Memory
ConversationalRetrievalChain
RetrievalQA
재귀적으로 모든 URL의 데이터 로드하기
load all URLs under a root directory
Root directory 밑의 모든 URL에 있는 Text 가져오기
langchain.document_loaders.recursive_url_loader
RecursiveUrlLoader
Recursive URL
SystemMessagePromptTemplate
SystemMessage
HumanMessage
input messages
input prompt
OpenAI Chat Models
total cost
completion tokens
prompt tokens
total tokens
tracking OpenAI API token usage
OpenAI API 호출에 대한 토큰 사용 트래킹
get_openai_callback()
langchain.cache
set_llm_cache
SQLite Cache
InMemoryCache
astream_log
비동기 인터페이스
async interface
표준 인터페이스
standard interface
Chain-of-Thought Prompting이란 무엇인가?
CoT Prompting
Chain-of-Thought 프롬프팅
Chain-of-Thought Prompting elicits reasoning in Large Language Models
Chain-of-Thought Prompting
프롬프트 파이프라인
기존 프롬프트를 재활용
여러개의 프롬프트를 조합
Pipeline prompts
Final prompt
PipelinePromptTemplate
부분적 프롬프트 템플릿
부분적 변수 초기화
partial_variables
prompt.partial()
문자열 값을 반환하는 함수를 가진 부분적 프롬프트 템플릿
partial prompt templates with functions
Partial prompt templates
Semantic Similarity Example Selector
one-shot prompting
Dynamic few-shot prompting
Chat Models
Example selector types
SemanticSimilarityExampleSelector
FewShotPromptTemplate
demonstration through examples
예시를 통한 시연
퓨샷 프롬프팅
최신 정보를 웹 검색해서 LLM 모델에 제공하여 답변 생성하기
지식단절
knowledge cutoff
Bing API
langchain.tools.DuckDuckGoSearchRun
adding web search tools
웹 검색 툴 추가
how to install postgresql and pgvector in Google Colab?
IVFFlat
vector indexing
create extension vector
Google Colab에 pgvector 설치하는 방법
Google Colab에 PostgreSQL 설치하는 방법
pgvector extension
여러개 템플릿에서 질문과 가장 유사한 템플릿 검색
질문과 템플릿 유사도 검색
구문 검색으로 템플릿 라우팅하기
Routing templates dynamically
Python Code generator
langchain_experimental.utilities.PythonREPL
Python Read-Eval-Print Loop
PythonREPL
Writing Python codes
Python code 생성
SQL Query in natural language
자연어로 질의응답
자연어로 질의해서 SQL Query 하기
Querying a SQL DB
WebBaseLoader()
웹사이트 문서 로딩
RecursiveCharacterTextSplitter()
Vector DB 시장 경쟁 현황
Vector DB Landscape
Chroma Vector DB
langchain.memory.chat_message_histories
MessagesPlaceholder
RedisChatMessageHistory
RunnableWithMessageHistory
Add Memory
Add message history
이전 대화 내역 기억 추가
LLM과 ChatModel의 차이
실행 중에 chain의 내부 대안 설정
Prompt 설정
LLM 설정
실행 중에 특정 Runnable의 대안 설정
configurable_alternatives()
실행 중 HubRunnables 내 Prompts 설정하기
실행 중 LLM 내 특정 파라미터 설정하기
runnable의 특정 fields 설정하기
ConfigurableField()
configurable_fields()
인풋에 조건에 따라 라우팅하기
동적으로 라우팅하기
LangChain Expression Language
RunnableBranch
%%timeit
chain 개별 처리 시간 vs. 병렬 처리 시간 비교
독립적인 체인을 병렬 처리하기
run independent chains in parallel
RunnableParallel
print("\r")
순환문
if elif else 조건절
embeddings
대규모 언어모델 답변 생성
정보 증강
LangChain
검색-증강 생성 방법
ChatModel
output parser
prompt template
LangChain Expression Language (LCEL)
Plugins and Connectors
LangServe
LangChain 템플릿
AI 앱 Orchestration
LLM 애플리케이션 개발 프레임워크
시맨틱커널
Semantic Kernel
model.state_dict()
model.load_state_dict()
torch.load()
torch.save()
general checkpoint 다시 로딩하기
general checkpoint 저장하기
모델 로딩하기
모델 가중치 저장하기
ToTensor()
transforms.Compose()
target_transform
datasets
원소 간 곱
ValueError: Expected more than 1 value per channel when training
스킵 연결
Xavier/ Glorot Initialization
ReLU 활성화함수
기울기 소실 문제
Access the weights and biases in a trained PyTorch model
계산 그래프
수치 미분
backward propagation of errors
bayesian updating
켤레 사전 분포
bayesian statistics
대안적 설명의 제거
원인과 효과의 공변성
시간적 우선순위
CLT simulation using python
표본 평균 분포
무작위 표본추출
중심극한의 정리
빈도론자와 베이지언의 p-value 해석 차이
independent two-sample t-test
stats 메소드
독립된 두 표본 t-검정
p-값
dict.keys()
split a python dictionary by proportions
Dictionary를 특정 비율로 분할하기
Dictionary에서 특정 값을 가진 sub Dictionary 만들기
document loader
FewShotChatMessagePromptTemplate
PromptTemplate
LangSmith
텍스트를 벡터로 표현
text clustering
텍스트 임베딩
sentence-transformers model
Transform each element of a list-like to a row.
리스트 내 여러개 원소를 각 행으로 변환하기
리스트를 각 행으로 풀어서 변환하기
pandas.Series.explode()
ChatGPT 채팅 API
"text-embedding-ada-002"
ChatGPT API 사용하기
OpenAI API 사용하기
OpenAI ChatGPT API 사용
정수 인덱스와 문자열 레이블 매핑
레이블과 정수 인덱스 매핑
고유의 문자열 키에 정수를 값으로 가지는 매핑 사전
고유의 정수 키에 문자열을 값으로 가지는 매핑 사전
정수에 문자열을 매핑하기
문자열에 정수를 매핑하기
리스트의 숫자를 문자열로 변환하기
리스트의 문자열을 숫자로 변환하기
python-pptx
pdf 파일에서 텍스트 추출
extract text from a PDF file
extract text from a PowerPoint file
PowerPoint 파일에서 텍스트 추출하기
Python으로 PDF 파일에서 텍스트 추출하기
Python으로 파워포인트에서 텍스트 추출하기
create OpenAI API key
챗GPT API key 생성
ChatGPT API key 생성
OpenAI API key 발급
Python packages 한꺼번에 설치하기
installing Python modules all at once
Python 모듈 한꺼번에 설치
pip install -r requirements.txt
Python 패키지 한꺼번에 설치
파이썬 패키지 한꺼번에 설치
파이썬 모듈 한꺼번에 설치
vector similarity search
벡터 유사도 검색
large scale AI-powered search in Greenplum
대규모 AI 기반 검색 구축
pgvector
max() over(partition by group_col)
max() over(partition by null)
전체 행을 대상으로 요약 통계량 계산
OVER(PARTITION BY NULL)
OVER(PARTITION BY)
OVER()
그룹별 요약 통계량 계산
3D surface plot
3D scatter plot
3차원 표면도
array_agg_array
서로 다른 차원의 배열 묶기
차원이 다른 array 를 concatenation 할 때 에러 발생
Arrays with differing element dimensions are not compatible for concatenation.
cannot concatenate incompatible arrays
2차원으로 Array 묶기
array append
2D Array Aggregation
DataFrame에 대해 element-wise 로 함수 적용
Series 에 대해 element-wise 로 함수 적용
함수 적용하기
applymap()
dictionary 와 list comprehension
eval() 과 list comprehension
중첩 순환문과 list comprehension
조건절을 포함한 list comprehension
for loop 순환문
eval() SyntaxError
표현식(expressions)
eval() 메소드
SyntaxError: unexpected EOF while parsing
evaluate expressions dynamically
동적으로 문자열 표현식을 평가하여 실행
동적으로 표현식 평가
eval() function
dict.items()
PL/Python UDF for getting multiple elements in an array in PostgreSQL
how to get multiple values in an array using position in Greenplum
how to get multiple elements in an array in PostgreSQL
PL/Python in Greenplum
PL/Python in PostgreSQL
Array 에서 여러개 위치의 원소 값을 PL/Python 사용자 정의함수를 사용해서 가져오기
Array 에서 시작~끝 위치의 원소 값을 Slicing 해서 가져오기
Array 에서 특정 위치의 하나의 원소 값 가져오기
string_agg() 함수의 반대
the opposite of STRING_AGG()
문자열 Array를 나누어서 개별 문자열 행으로 풀기
문자열을 구분자를 기준으로 나누어서 여러개의 원소를 가진 Array 로 만들기
개별 문자열을 하나의 문자열로 합치기
unnest(string_to_array())
string_to_array()
string_agg()
SQL Query 실행 순서
order of execution of a SQL Query
gppkg -q --all
plcontainer image-list
plcontainer runtime-show
PL/Python UDF
PL/Python 사용자정의함수
설치된 PL/Container Runtime ID 확인
설치된 Docker Container Image 이름 확인
설치된 Package 이름 확인
SQL Queries sequence
SQL 쿼리 처리 순서
SQL Query 처리 순서
하위 폴더 내 모든 파일 이름 가져오기
폴더 내 모든 파일 이름 가져오기
unix style filename pattern matching
unix style pathname pattern expansion
유닉스 스타일 파일명 패턴 매칭
유닉스 스타일 경로명 확장
fnmatch
인덱스 반복자
다차원 인덱스 반복자
iterable objects
반복 가능한 객체
np.ndenumerate()
jupyter notebook cell all output clear
주피터 노트북 커널 새로 시작하고 모든 셀 실행하기
kernel restart
주피터노트북 모든 셀 결과 지우기
run all cells
주피터 노트북 모든 셀 실행하기
주피터 노트북 새로 시작하기
주피터 노트북 셀의 모든 결과 지우기
cell all output clear
모델 저장하기
문서 검색
에드워드 버거 감독
그 명예는 어디서 찾느냐?
내 아들이 전쟁에서 죽었는데
전쟁은 열병 같아. 아무도 원하지 않지만 갑자기 들이닥쳤지. 신은 가만히 보고만 있어
1918년 12월 전쟁이 끝났을 때는 오직 서부 전선에 적은 이동만 있었을 뿐이었다.
전쟁은 참호전의 양상으로 굳어졌고
전쟁은 늙은이들이 일으키고 피는 젊은이들이 흘린다
토니 발레롱가
돈 셜리
It takes courage to change people's hearts.
Because genius is not enough.
사람의 마음을 움직이려면 용기가 필요해요
왜냐면 천재성만으론 부족하거든
Torchvision 내장 데이터
plt.subplots()
image visualization
이미지 시각화
torchvision.datasets
All quiet on the western front
영화 서부 전선 이상없다
영화 그린 북
prompt engineering
프롬프트 엔지니어링
torch.hsplit()
torch.vsplit()
텐서 세로로 나누기
텐서 가로로 나누기
torch.split()
텐서 나누기
텐서 세로로 합치기
텐서 가로로 합치기
PyTorch 텐서 합치기
torch.stack()
torch.row_stack()
torch.vstack()
torch.column_stack()
torch.hstack()
torch.concat()
ValueError: only one element tensors can be converted to Python scalars
원하는 위치의 부분집합 가져오기
numpy array를 텐서로 변환하기
torch.empty()
torch.ones()
torch.zeros()
0으로 채워진 텐서
텐서 만들기
tensor objects
행렬 곱셈 (matrix multiplication)
GPU로 디바이스 등록
CPU로 디바이스 등록
GPU vs. CPU 성능 비교
텐서 객체 등록
디바이스 정의
Numpy ndarray 대비 PyTorch tensor 의 성능 비교
numpy ndarray
단어 임베딩
Brier score
evaluation metrics for the imbalanced data classification model
F0.5 Score
F2 Score
Geometric-Mean
불균형 데이터 분류모델의 성과평가 지표
불균형 데이터 분류 모델링 방법
적정한 분류 모델 성능 평가 지표
확률 튜닝 알고리즘
비용 가중치 조정
불균형 데이터 샘플링 방법
불균형 데이터 처리방법
불균형 데이터
F-1 점수
evaluation metrics for classification model
interpolation in parallel
결측값 선형 보간 병렬 처리
시계열 데이터 선형 보간
특정 문자를 포함하지 않는 모든 칼럼 선택
특정 문자로 끝나는 모든 칼럼 선택
특정 문자로 시작하는 모든 칼럼 선택
DataFrame.filter(regex)
DataFrame.filter(like)
DataFrame.filter(items)
특정 조건에 맞는 칼럼 선택해서 가져오기
특정 조건에 맞는 행 선택해서 가져오기
DataFrame.filter()
지에엠컨설팅
직장 생활의 착각과 진실
직장 생활 심리학 참고서
심리학 직장 생활을 도와줘
칼럼 유형별 칼럼 제외
칼럼 유형별 칼럼 선택
select_dtypes()
pandas.DataFrame
sort elements in a list
mapping elements in a list using dictionary
convert character elements in list into numeric elements
리스트의 숫자형 원소를 문자형 원소로 변환
리스트 원소 정렬
리스트에서 다른 리스트의 원소 빼기
리스트의 원소를 사전형의 Key:Value 기준으로 매핑하여 변환
리스트의 문자형 원소를 숫자형 원소로 변환
안수연 옮김
세르주 블로크 그림
다비드 칼리 글
나는 기다립니다...
string replacement using key:value mapping dictionary
키 값 매핑 사전형
key value mapping
칼럼 재정렬
칼럼 순서 바꾸기
reverse DataFrame column
칼럼 순서를 뒤집기
칼럼을 역순으로 바꾸기
칼럼 순서 재정렬
text generation
frequency domain analysis in parallel
spectral analysis in parallel
spectrum analysis in parallel
PL/Container
perplexity
깨어나게 하고 행동하게 하는 555개의 통찰
통찰의 시간
시계열 그래프
취업자 수 증가율
짝을 이룬 t-test 가정사항
매칭된 짝 t-test
반복측정 표본 t-test
짝을 이룬 차이 t-test
종속 표본 t-test
scipy.stats.ttest_rel
paired samples t-test
짝을 이룬 t-test
등분산 가정
정규성 가정
t-test 가정사항
scipy.stats
t 통계량
두 집단 간 평균 차이 검정
LLE 알고리즘
매너폴드 학습
비선형 차원 축소
unroll swiss roll using LLE
manifold learning approach
Locally Linear Embedding
LLE
dimensionality reduction via projection approach
투사를 통한 차원 축소
precision recall curve
trim() 공백없애기
테이블 생성해서 데이터 추가하기(insert)
random_normal()
generate random number from normal distribution
generate_series()
샘플 테이블 만들기
정규분포에서 난수 생성
의사결정 트리 그래프
tree plot
dot language
sklearn.tree.export_graphviz
sklearn.tree.export_text
sklearn.tree.plot_tree
의사결정나무 시각화
decision tree visualization
pandas DataFrame에 tokenization 함수 적용하기
pandas DataFrame에 텍스트 전처리 함수 적용하기
텍스트 특수문자 제거
re package
텍스트 토큰화
텍스트 데이터 전처리
~'(a|b|c)'
any() array{}
how to match multiple string matching in PostgreSQL
OR 조건으로 여러개의 문자열 매칭하는 SQL
여러개의 문자열 매칭
matching multiple conditions
api 사용방법
코사인 유사도
conjugate prior
Bayes' Theorem
Out of Vocabulary problem
sklearn TfidfVectorizer
TF-IDF score
단어의 중요도
TF-IDF values for corpus
Inverse Document Frequency
Term Frequency
Mary Pipher
Letters to a young therapist
Regular Expression Matching
도커 이미지 vs. 도커 컨테이너
도커 컨테이너가 가상 머신보다 좋은 장점
왜 도커인가
docker container vs. docker image
도커 컨테이너 vs. 가상 머신
docker container vs. virtual machine
docker container 란 무엇인가
PyPDF2
유클리디언 거리
원소 간 짝을 이루어서 계산한 거리를 행렬로 반환
distance.cdist()
distance.pdist()
scipy.spatial.distance
원소 간 거리 계산
pairwise distance calculation
distance
HNSW
morphemes & lexemes
phonemes
형태소와 어휘항목
언어 구조의 구성요소
언어란 무엇인가
NLP Tasks
자연어 처리 응용 분야
Text Analytics
텍스트 데이터 마이닝
np.fill_diagonal()
np.diagonal()
creating a diagonal matrix
대각행렬 만들기
fill main diagonal
flip by main diagonal
return main diagonal
주대각성분 채우기
주대각성분 기준으로 뒤집기
주대각성분 가져오기
THE PIT
콘서트 ONLY
한세대학교 공연예술학과 교수
내 마음의 풍금 작곡
THE PIT ORCHESTRA
김문정씨의 첫번째 에세이
뮤지컬 음악감독
이토록 찬란한 어둠
G-mean
ValueError: Cannot mask with non-boolean array containing NA / NaN values
pandas DataFrame에서 문자열 매칭해서 데이터 가져오기
indexing using pattern matching in pandas DataFrame
문자열 패턴 매칭해서 데이터 가져오기
문자열 Series 에서 패턴 매칭
pandas.Series.str.contains()
pandas Series
lead() over (partition by order by)
avg() over (partition by order by)
rank() over (partition by order by)
row_number() over (partition by order by)
ntile() over (partition by order by)
first_value() over(partition by order by)
lag() over(partition by order by)
시계열 데이터
상처받은 치유자
자유로운 영혼의 달콤 쌉싸름한 사랑이야기
LLE (Locally-Linear Embedding)
projection based dimensionality reduction
다중공선성 문제 해결
과적합 방지
add arrow
add circle
add ellipse
add rectangle
그래프에 다각형 추가하기
그래프에 화살표 추가하기
그래프에 직사각형 추가하기
그래프에 원 추가하기
adding shapes in matplotlib's plot
시인 이소호
이소호 에세이
시키는 대로 제멋대로
interactive bubble chart using plotly
애니메이션 버블 그래프
animated bubble chart
plotly bubble chart
python matplotlib
sankey element using holoviews
flow visualization
holoviews.extension('bokeh')
holoviews
생키 다이어그램
Extract substring
Split string on delimiter and return the given field
문자열에서 구분자로 나누어서 일부분 가져오기
문자열에서 위치 기반으로 일부분 가져오기
split_part() 함수
substr() 함수
substring() 함수
SimpleImputer
StandardScaler
scikit learn pipeline
column transformer with mixed types
숫자형과 범주형 변수가 있는 데이터 전처리 파이프라인
범주형 변수 데이터 전처리 파이프라인
숫자형 변수 데이터 전처리 파이프라인
sklearn.pipeline
2D array를 1D array로 풀기
unnest 2D array into 1D array in PostgreSQL
1D array에서 원소 indexing 하기
2D array를 1D array로 unnest 하기
madlib.array_unnest_2d_to_1d()
numpy array 소수점 반올림 np.round()
numpy array 열 기준 백분율
numpy array 행 기준 백분율
matrix colormap
percentage by column in numpy array
percentage by row in numpy array
특정 칼럼 이름을 선별적으로 변경하고 싶은 경우
data.frame의 칼럼 이름을 매핑 테이블을 이용해서 변경하기
nvidia-smi -q -d memory
NVIDIA GPU 모니터링
NVIDIA GPU 장치 관리
NVIDIA System Management Interface
GPU 이름 확인
GPU 드라이버 확인
GPU name check
GPU driver check
GPU monitoring
Outlier detection
changing the order of x-axis ticks in matplotlib
matplotlib 그래프의 X축 범주형 항목 순서 바꾸기
막대그래프의 X축 범주형 항목 순서 바꾸기
pandas DataFrame의 index 순서 바꾸기
changing the order of index in pandas DataFrame
reordering the x-axis label in matplotlib
changing the order of x-axis label
AWS S3에 파이썬 객체를 직렬화해서 저장하기
s3.put_object
serialize a python object and save it to AWS S3
conditions or boolean array
iloc 정수 기반 위치에 의한 데이터 선택
loc 레이블 기반 데이터 선택
integer-based indexing
label-based selection
loc vs. iloc
y 라벨이 원핫 인코딩되어 있으면 categorical_crossentropy() 손실함수 사용
y label onehot encoded
y label integer
categorical_crossentropy()
sparse_categorical_crossentropy()
Keras loss function for multiclass classification
상수는 값 변경 불가능
변수는 값 변경 가능
변수를 상수로 변환
to.convert_to_constat()
tf.Variable()
tf.reshape() 에서 -1의 의미는?
reshape a Tensor
tf.reshape([-1])
텐서 행과 열 바꾸기
텐서 전치
tf.transpose()
tf.reshape()
텐서 형태 변환
tf.Tensor.shape
tf.Tensor.dtype
tf.convert_to_tensor()
tf.stack()
텐서는 어떻게 만드나
텐서는 무엇인가
what is a tf.Tensor
how to make a tf.Tensor
Hadamard product
행렬 원소 간 곱
tf.matmul()
tf.math.multiply()
percentile_disc()
identifying outliers
Interquartile Range
그룹별로 결측값을 이전 값으로 채우기
over(partition by order by)
결측값을 이후 값으로 채우기
backward filling NULL values with the next non-null value
NULL 값을 이전 NON-NULL 값으로 채우기
first_value() window function
결측값을 이전 값으로 채우기
forward filling NULL values with the previous non-null value
reshape table from long to wide format
세로로 긴 테이블을 가로로 긴 테이블로 변환
세로로 긴 테이블을 피봇하기
alter table rename
PostgreSQL tablefunc extension crosstab() function
Apache MADlib pivot() function
converting long-format to wide-format table
여러개의 칼럼을 가진 테이블을 key value 의 테이블로 변환하기
옆으로 긴 테이블을 세로로 긴 테이블로 변환하기
unnest()
옆으로 넓은 테이블을 세로로 긴 테이블로 변환하기
여러개의 칼럼을 행으로 변환하기
reshape horizontally wide into vertically long format table
transpose columns into rows
enumerate()
list all schemas
list all columns
list all tables
모든 Column 조회
모든 View 조회
모든 Table 조회
모든 Schema 조회
모든 Database 조회
범례 글자 크기 설정
legend(labels)
legend(handles)
범례 그림자 설정
범례 배경 색 설정
범례 위치 설정
범례 설정하기
범례 추가하기
matplotlib.pyplot.legend()
set_title()
set_xlabel()
set_xticklabels()
set_xticks()
제목 설정하기
축 이름 설정하기
눈금 이름 설정하기
눈금 설정하기
TIMESTAMP에서 월 요일 이름 가져오기
localized weekday name
localized month name
날짜 시간 포맷 정해서 문자열로 바꾸기
'YYYY-MM-DD HH12:MI:SS'
to_char(expression, format)
TIMESTAMP 데이터 유형을 포맷을 가진 문자열로 변환
converts a TIMESTAMP data type to a string
greenplum to_char()
postgresql to_char()
day of year
ISODOW
extract(field from source)
INTERVAL 데이터 유형에서 날짜 시간 가져오기
TIMESTAMP 데이터 유형에서 날짜 시간 가져오기
extract() function
서브 플롯 간 여백 없애기
plt.tight_layout()
sharey=True
sharex=True
하위 플롯간 Y축 공유하기
하위 플롯 간 X축 공유하기
plt.subplots_adjust()
하위 플롯 간 간격 조절하기
pandas DataFrame에서 적어도 하나의 칼럼이 조건을 만족하는 행 가져오기
pandas.DataFrame.all()
pandas.DataFrame.any()
파이썬 코드 실행시간 측정
%timeit
datetime.now()
measuring the time elapsed
코드 소요시간 측정
코드 실행시간 측정
Dictionary에서 값 상위 n개 가져오기
리스트에서 원소별 개수 세기
그룹별로 DataFrame 정렬하기
pandas.Series.sort_values()
pandas.Series.cumsum()
pandas.Series.searchsorted(p)
그룹별로 특정 분위수 위치 구하기
그룹별로 누적 비율 구하기
Python pandas DataFrame
pandas DataFrame에 그룹별 첫번째 값 칼럼 추가
정렬 후 그룹별 마지막 값 추가
정렬 후 그룹별 첫번째 값 추가
pd.DataFrame.transform()
pd.DataFrame.sort_values().groupby()
pd.merge(how, on)
그룹별 마지막 값 DataFrame에 추가하기
그룹별 첫번째 값 DataFrame에 추가하기
pandas DataFrame transform('last')
pandas DataFrame transform('first')
pandas DataFrame plot 의 선 종류와 색깔 스타일 정의하기
pandas DataFrame 시계열 도표 그리기
pandas DataFrame 각 원소를 나누기
pandas DataFrame 칼럼 축으로 합하기
pandas.DataFrame.div()
pandas.DataFrame.sum(axis=1)
attach images by dragging & dropping
GitHub에 파일 첨부
GitHub에 다양한 요소를 삽입하여 문서 작성하기
GitHub Flavored Markdown Spec
GitHub에 그림 첨부
CommonMark
GFM
GitHub Flavored Markdown
git 기본설정 확인
git config --global
Git 출력 색깔 설정
Git 이메일 주소 설정
Git 사용자 이름 설정
git config color.ui
git config user.email
git config user.name
Git 기본설정
seasonal differencing
계절 차분
stationarity
differencing
variance stabilization transformation
분산 안정화 변환
배치 정규화
fun.aggregation
문자열 재구조화할 때 처음 값만 가져오기
문자열 재구조화할 때 콤마로 구분해서 합치기
how to concatenate strings after dcast aggregation
dcast() for string character
defaulting to 'length'
Aggregate function missing
data.table dcast()
rolling windows vs. expanding windows
window functions in R using zoo package
moving average of time series using R
시계열 데이터 누적 평균
시계열 데이터 이동평균
expanding windows
rolling windows
rollapply() window function
zoo package
YAML language extension by Red Hat
Kubernetes YAML 언어 확장
settings.json 에서 편집
K8s IDE
Kubernetes Extension
checking stationary time series
stationarity test
KPSS 통계적 가설 검정
ADF 가설 검정
KPSS test
ADF test
statistical hypothesis test
정상성 여부 확인 통계적 가설 검정
정상성 여부 확인
OOV
pandas.to_csv()
pandas.read_csv()
tarfile
DataFrame을 파일로 다운로드
압축 해제
웹사이트에서 압축파일 다운로드
plt.axvline()
plt.vlines()
plt.hlines()
adding vertical lines
adding horizontal lines
수직선 추가
수평선 추가
stochastic process
ARIMA모형 가정사항 정상성
정상시계열의 시도표
stationary process
random walk process
white noise process
정상확률과정
확률보행과정
백색잡음과정
spectrum analysis
시계열 데이터로 부터 주파수 분해
고속 푸리에 변환
주파수 분석
inverse FFT
spectrum analysis
오차역전파법
y축 색깔 지정하기
2번째 y축 만들기
2중축 그래프
ax1.twinx()
스케일이 다른 y값 이중축 그래프
이중축 그래프
2 axes plot
여러개 웹 페이지 크롤링하여 DataFrame으로 만들기
웹 스크랩핑
web scraping
Run and Recurring Run
파이프라인 구성요소
Pipeline components
Workflow 관리
인공지능은 사람처럼 될 수 있는가?
인간은 특별한가?
노벨 문학상 수상
인공지능 로봇 친구
에이프렌드
AI Friend
Klara and The Sun
클라라와 태양
깃 토큰 발급
git token
generate new tokens
개인 토큰 발급 방법
토큰으로 깃 인증
비밀번호 인증 종료
support for password authentication was removed
a personal access token
download MiniKF kubeconfig file
set KUBECONFIG
install kubectl
쿠버네티스 커맨드라인 툴
kubernetes tool kubectl
모든 고민은 인간관계에서 비롯된다
트라우마를 부정하라
The Courage To Be Disliked
이전 버전의 R package 설치
특정 버전의 R 패키지 설치
R 패키지 설치
R package installation with old version
install.packages()
시계열 데이터 보간
감독 라세 할스트롬
줄리엣 비노쉬
영화 초콜렛
pandas DataFrame 칼럼 위치 바꾸기
how to change the order of DataFrame column
마지막 칼럼을 제일 앞으로 순서 바꾸기
DataFrame 칼럼 순서 바꾸기
sort values by column in DataFrame
renaming MultiIndex Columns
from wide-format to long-format
MultiIndex Column DataFrame
존재하고자 하는 용기
선택할 대안의 탐색
아야!선 영역
신뢰 경청 명료화
일치와 나눔의 인간관계를 위하여
how to have a creative relationship instead of a power struggle
William V Pietsch
윌리암 피취
Human BE-ing
인간 만남 그리고 창조
자기 알기
심리 치유 에세이
MiniKF on MacOS
미니 큐브플로우
Mini-Kubeflow
MiniKF
installing Kubeflow on laptop or desktop
맥북에 Kubeflow 설치하기
프랭크 다라본트 감독
감사 표현하기
있는 그대로 관찰하기
자신의 느낌과 욕구 인식하고 표현하기
삶을 풍요롭게 하기 위해 부탁하기
공감으로 듣기
model serving
pipelines
create and manage jupyter notebooks
machine learning model deployment
모델 배포
큐브플로우
일상에서 쓰는 평화의 언어 삶의 언어
마셜 B. 로젠버그 박사
Nonviolent Communication
Python statsmodels.tsa.api
시계열데이터 모델 평가 지표
추세와 계절성이 있는 시계열 데이터 지수평활법으로 예측
holt-winters method in python
onehotencoder
지수평활법
everyone wants to be found
Scarlett Ingrid Johansson
성숙한 삶
성공하는 조직
성장하는 나
일의 격
mean absolute percentage error
평균오차
평균오차제곱합
오차제곱합
전체제곱합
goodness-of-fit
모형 적합도
시계열 자료 분석
xticks
y축 눈금값 추가
x축 눈금값 추가
cosine curve plot
sine curve plot
사인 곡선 그래프
코사인 곡선 그래프
인쇼랑
누적이동평균
cumulative moving average
simple moving average
curse of dimensionality
ERROR: UNION types integer and text cannot be matched
단순이동평균
두 테이블 JOIN 하기
여러개의 테이블 JOIN 하기
두 테이블 연결하기
pivot table
any 연산자
exists 연산자
all 연산자
한정술어
DBeaver SQL 포맷 설정
DBeaver 글꼴 설정
DBeaver 테마 설정
DBeaver 폰트 유형 및 폰트 크기 설정
DBeaver 대문자로 자동 바꿔주기 설정
DBeaver 행번호 표시 설정
폰트 크기 설정
대문자 기본 설정
분배기 이동
가스 배관 철거
집 예쁘게 인테리어 하기
ANOVA in parallel using PL/Python on Greenplum database
one-way ANOVA in parallel
복수개의 칼럼에 대해 일원분산분석
여러개의 변수에 대해 일원분산분석 검정하기
statsmodels.api.stats.anova_lm()
one-way ANOVA for multiple variables
assumptions of ANOVA test
분산분석 검정의 가정사항
scipy.stats.f_oneway()
샘플 크기가 다른 2개 이상 그룹간 평균 비교
ANOVA with different sized samples
WGS84 좌표계를 KATECH 좌표계로 변환하기
spatial_ref_sys 테이블에 신규 좌표계 등록하기
PostGIS_Transform_Geometry()
ST_Transform()
KATECH 좌표계
distribute replicated
spatial_ref_sys table
ST_Transform
ERROR: function cannot execute on QE slice because it accesses relation spatial_ref_sys
결측값을 기계학습 모형 예측값으로 대체하기
DataFrame의 결측값을 선형회귀모형 추정값으로 채우기
DataFrame의 결측값 채우기
pandas DataFrame 칼럼 이름 바꾸기
pandas DataFrame 에서 특정 칼럼 제외하기
pandas DataFrame 에서 특정 칼럼 선택하기
pandas DataFrame 에 특정 칼럼 포함 여부 확인하기
pandas DataFrame 의 칼럼 이름 확인 하기
task parallelism
random forest variable importance
random forest feature importance
massively parallel processing
잔차 연결
cellStats()
descriptive statistics for raster objects
summary() function
레스터 객체 요약 통계량
summarizing raster objects
geospatial data
raster objects
subsetting layers from multi-layers raster objects
다층 레스터 객체에서 일부 층 가져오기
레스터 객체 데이터 일부 가져오기
raster::subset()
subsetting raster objects
레스터 객체 조작하기
manipulating raster object
str_subset()
setdiff() 로 두 데이터 프레임의 차이 집합 구하기
stringr
sf 지리 벡터 데이터 합치기
nest join
anti join
semi join
dplyr로 두개 테이블 join 하기
dplyr 패키지로 지리 벡터 속성 집계하기
data.table aggregation by group
dplyr group_by() summarise()
Base R aggregate(by)
group by summarization
지리 벡터 데이터의 속성을 그룹별로 집계하기
지리 벡터 데이터에서 속성 요약정보 계산
지리공간 벡터 데이터의 속성 집계하기
geographic vector attribute aggregation
select geographic vector data
subset geographic vector data using dplyr
지리공간 벡터 데이터를 dplyr 로 전처리하기
지리공간 벡터 데이터에서 열 가져오기
지리공간 벡터 데이터에서 행 가져오기
st_drop_geometry()
vector attributes subset
geographic vector data subset
imbalanced data
정수 키를 자동 생성하여 넘파이 배열을 사전으로 변환하기
python numpy array를 dict 로 변환하기
넘파이 배열을 사전으로 변환하기
converting numpy array to dict
배열 내 최대값의 위치 인덱스 가져오기
사전의 키에 해당하는 값 가져오기
배열의 여러개 원소에 한꺼번에 함수 적용하기
np.vectorize()
transforming numpy array elements by mapping dict(key: value)
배열의 원소값을 dict 의 (key: value)를 매핑해서 변환하기
python dictionary 사전 값 기준 정렬
pandas value_counts()
return_counts=True
np.unique()
1D 배열 안의 고유한 원소 집합
배열 안의 고유한 원소 별 개수
unique elements set in array
display max rows in jupyter notebook
jupyter notebook 출력 최대 행의 개수 설정
투영 좌표계
set_units()
st_area()
R units package
좌표계의 지리공간 단위
CRS spatial units
지리 좌표 참조 시스템
Projected Coordinate Reference Systems
Geographic coordinate systems
RasterStack Class
RasterBrick Class
RasterLayer Class
레스터 클래스
Raster Classes
geospatial data analysis
system.time()
R 속도 측정
object.size(DT)
R 파일 크기 확인
자동 인덱싱
DT[i
빠른 이진 탐색 기반 부분집합 가져오기
fast binary subset
key vs. secondary indices
키와 이차 인덱스 비교
auto indexing
Manifold Learning
linear regression for each group
linear regression by groups
그룹별로 회귀계수 구하기
그룹별로 선형회귀모형 적합하기
grouped regression
which.max()
which.min()
dynamic indexing by maximum value
dynamic indexing by minimum value
group optima
그룹별 최대값 행 가져오기
그룹별 최소값 행 가져오기
split by group
그룹별로 무작위로 행 가져오기
그룹별로 마지막 행 가져오기
그룹별로 첫번째 행 가져오기
그룹별로 부분집합 가져오기
group subsetting
data.table에서 left join 하기
Key를 기준으로 데이터셋 병합하기
조건이 있는 데이터셋 조인
조건이 있는 상태에서 데이터셋 합치기
conditional joins
x의 m개 원소 간 모든 가능한 조합 만들기
모든 가능한 조합 만들기
combn
controlling a model's right-hand side
모델의 오른쪽 부분 조절하기
칼럼 유형 변환
.SDcols 로 일부 칼럼만 가져오기
.SD는 무엇인가?
column type conversion
column subsetting
data.table 칼럼 선별하기
data.table 조건에 따라 관측치 선별하기
data.table 그룹별 관측치 개수 세기
data.table 패키지를 사용해서 그룹별 관측치 개수 별로 data.table을 구분해서 생성하기
그룹별로 관측치 개수가 2개 이상인 관측치만 구분해서 DataFrame 만들기
그룹별로 관측치 개수가 1개인 관측치만 DataFrame 만들기
그룹별로 행의 개수 세기
그룹별 관측치 개수 별로 DataFrame을 구분해서 생성하기
Sys.sleep()
레스터 데이터 분석
GeoSpatial data analysis
앞으로 예상 소요시간 출력
현재까지 총 소요 시간 출력
진행 개수 출력
진행 상태 퍼센트 출력
진행 상태 막대 출력
GitHub에 SSH key 등록하기
ssh -T git@github.com
GitHub에 SSH key 등록이 성공했는지 확인하는 방법
공개 키 GitHub에 등록하기
generate SSH Key
SSH Key 생성하기
DataFrame 으로 부터 임의 표본 추출
ValueError: Replace has to be set to `True` when upsampling the population `frac` > 1.
ValueError: Please enter a value for `frac` OR `n`, not both
ValueError: Cannot take a larger sample than population when 'replace=False'
random sampling using a DataFrame column as weights
가중치를 부여한 무작위 표본추출
random sampling by fraction
random sampling by number
fraction of random sampling
비율로 무작위 표본 추출
random sampling with replacement
무작위 복원 표본 추출
DataFrame.sample() method
random sample from pandas DataFrame
DataFrame에서 무작위 표본 추출
DataFrame의 열에 대해 반복 순환하여 값 가져오기
DataFrame의 행에 대해 반복 순환하여 값 가져오기
DataFrame의 행과 열에 대해 반복 순환하여 값 가져오기
DataFrame.iterrows() iterate over DataFrame rows
itertuples() iterate over pandas DataFrame rows as namedtuples
pandas DataFrame 튜플 (인덱스, 열) 반복
pandas DataFrame 열 반복
DataFrame 행 반복
압축 수준
compresslevel
allowZip64
압축 옵션
download and extract ZIP file
웹에서 ZIP 파일 다운로드하여 압축 해제하기
Python zipfile 모듈로 여러개의 파일을 ZIP 파일로 압축하기
FileExistsError: [Errno 17] File exists
extract a zip file using python zipfile module
read a zip file using python
write a zip file using python
1개 파일 압축 해제 하기
모든 파일 압축 해제하기
ZipFile().extractall()
ZipFile.extract()
mode='r'
mode='a'
mode='x'
mode='w'
압축 파일 닫기
압축 파일 열기
압축 파일 해제하기
압축 파일 읽기
압축 파일 쓰기
ZipFile() 클래스
zipfile 모듈
spatial vector data encoding
as(Class="Spatial")
st_as_sf()
sf 클래스를 sp 클래스로 변환하기
sp 클래스를 sf 클래스로 변환하기
sp package
sf object
simple feature columns
simple feature geometry
geometry + non-geometric attributes
data.frame()
st_sf()
단순 지리특성 칼럼 + data.frame 속성
the sf class
World Geodetic System 1984
European Petroleum Survey Group
공간 참조 시스템
specify CRS using EPSG definition
check CRS
st_crs() function
st_sfc() function
GeoSpatial Data Analysis
vector data geometry
st_geometry_type
combining simple feature geometry into a simple feature column
combining sfg into a sfc
R sf 패키지
단순 지리특성 기하(sfg)를 단순 지리특성 칼럼(sfc)으로 합치기
ST_ stands for the standard Spatial Type
simple feature geometries
st_geometrycollection()
st_multipoint()
st_multilinestring()
st_multipolygon()
st_polygon()
st_linestring()
st_point()
기하집합
다중면
다중선
다중점
geometrycollection
multilinestring
multipoint
vector data
geometry types
기하 유형
벡터 데이터
geospatial data visualization
R sf package
vector data visualization
벡터 데이터 시각화
world map visualization
세계지도 시각화
geography markup language
Keyhole markup language
well-known text
WKT
항공 지도 사진
인공위성 지도 사진
raster package
레스터 데이터 모델
벡터 데이터 모델
raster data model
vector data model
save interactive map as web page
상호작용하는 동적 지도를 웹 페이지로 저장하기
지도 스타일 추가하기 (add map styles)
동적 지도
addProviderTiles("Tile Name Here")
leaflet() 에 다른 외부 타일 제공자의 타일 지도를 추가
Leaflet-providers preview
add map style tiles
맵 스타일 타일 추가하기
map styles
R leaflet 패키지
interactive map
정렬된 k-거리 그래프
elbow method using sorted k-dist plot
지리공간 데이터
DBSCAN 알고리즘 절차 및 R 예제
k-means 군집화와 DBSCAN 군집화 알고리즘 비교
k-means vs. DBSCAN
DBSCAN 알고리즘의 Eps MinPts 입력모수 결정 방법
Determining the Parameters Eps and MinPts
대용량 공간 데이터 군집분석
공간 데이터에 대한 군집분석
clustering for spatial data
대한민국 언론의 신뢰도
대한민국 사법부의 신뢰도
문서 위조에 대한 검찰과 법원의 이중성
대한민국 검찰의 권한
검찰 사법부 언론 카르텔
groupby().pct_change()
Percentage change between the current and a prior element by Group
Percentage change between the current and a prior element
pct_change()
그룹별 전년 동분기 대비 성장률
그룹별 전분기 대비 성장률
그룹별로 전년 동분기 대비 변동률 구하기
그룹별로 전분기 대비 변동률 구하기
eval() 함수
interpretation of clustering results
interpretation of cluster analysis
군집해의 해석
군집분석 결과 해석
밀도기반 군집분석
hierarchical clustering dendrogram
the silhouette method
the elbow method
팔꿈치 방법
실루엣 방법
계층적 군집분석 덴드로그램
how to select the number of clusters
selecting the number of clusters k
determining the number of clusters
how to determine the number of clusters k?
군집의 개수 선택 방법
군집의 개수 결정 방법
cluster analysis
그룹별 선형회귀분석
대용량 데이터 선형회귀분석
Linear Regression using Apache MADlib
Linear Regression using SQL
MADlib을 통한 선형 회귀분석
SQL 을 활용한 선형 회귀분석
선형 회귀분석
madlib.correlation()
그룹별 상관계수
상관관계 분석
correlation coefficients using Apache MADlib in Greenplum DB
correlation coefficients using SQL in Greenplum
correlation coefficients using Apache MADlib in PostgreSQL
correlation coefficients using SQL in PostgreSQL
madlib.summary()
MADlib summary 함수
그룹별 요약통계량
행 개수
3사분위수
1사분위수
summary statistics
quantile
서열형 변수 혼합형의 유사성 측정
서열형 변수 혼합형의 거리 측정
명목형
이분형
연속형
연속형과 범주형 혼합 데이터의 비유사성 측정
연속형과 범주형 혼합 데이터의 거리 측정
연속형과 범주형 혼합 데이터의 유사성 측정
distance measures for categorical data
similarity measures for categorical data
asymmetric binary attributes distance
jaccard coefficient distance
simple matching distance
hamming distance
명목형 자료의 거리 측정
서열형 자료의 거리 측정
이분형 자료의 거리 측정
범주형 자료의 거리 측정
명목형 자료의 유사성 측정 척도
서열형 자료의 유사성 측정 척도
이분형 자료의 유사성 측정 척도
범주형 자료의 유사성 측정 척도
Jupyter Notebook에서 Greenplum DB에 SQL query한 결과를 pandas DataFrame으로 가져오기
Jupyter Notebook에서 PostgreSQL DB에 SQL query 한 결과를 pandas DataFrame으로 가져오기
{variable_name}
$variable_name
named style :variable_name
dynamic variable substitution
Jupyter Notebook에서 Greenplum DB에 로컬변수로 변환해서 query 하기
Jupyter Notebook에서 PostgreSQL DB에 로컬변수로 변환해서 query 하기
Mac OS에서 숨김 파일 보기
Windows에서 숨김 파일 보기
DSN connections
Jupyter Notebook에서 Greenplum database에 접속하는 방법
Jupyter Notebook에서 PostgreSQL database에 접속하는 방법
DB credentials 별도 보관
jupyter notebook에서 SQL meta-command 하기
jupyter notebook에서 SQL query 하기
jupyter notebook에서 Greenplum database 접속하기
jupyter notebook에서 PostgreSQL database 접속하기
pgspecial
패턴이 있는 여러개의 칼럼을 재구조화하기
dcast
casting multiple value.vars simultaneously
melt multiple columns simultaneously
여러개의 칼럼을 동시에 재구조화 하기 (cast)
여러개의 칼럼을 동시에 녹이기 (melt)
TO_CHAR()
fun.agg
fun.aggregate
:= 로 data.table 얕은 복사한 data.table 참조하여 새로운 칼럼 추가하기
copy() function
옆으로 펼쳐진 데이터를 세로로 길게 재구조화하기
세로로 긴 데이터를 옆으로 펼쳐서 재구조화하기
Limitations in current melt/dcast approaches
dcast()
dcasting data.tables (long to wide)
melting data.tables (wide to long)
efficient reshaping using data.table
효율적인 데이터 재구조화
R data.table 구문 소개 포스팅 링크
R data.table 참고 사이트 링크
copy() function for deep copy
깊은 복사하는 방법
side effect of shallow copy
얕은 복사의 부작용
shallow copy vs. deep copy in R
얕은 복사 vs. 깊은 복사
':='(colA = valA)functional form
LHS := RHS
칼럼 삭제하기
칼럼 갱신하기
칼럼 추가하기
reference semantics
참조 의미론
':='
복수개의 칼럼에 칼럼 이름과 값 할당
여러개의 칼럼에 값 할당
칼럼 이름에 값 할당
:= 연산자
벡터 스캔 방식과 이진 탐색 방식의 속도 비교
To remove a key that has been set, use setkey(DT, NULL)
Key 값이 매칭되는 값이 존재하는 행만 가져오기
Key 값이 매칭되는 마지막 행 가져오기
Key 값이 매칭되는 첫번째 행 가져오기
Key 값이 매칭되는 모든 행 가져오기
nomatch = NULL
mult = "last"
mult = "first"
mult = "all"
mult argument
nomatch argument
by를 사용하여 그룹별 집계하기
keyby를 사용하여 그룹별 집계하고 정렬 후 키 설정하기
setkey()
keyby
연산 결과 키 재설정
data.table 키에 대해 부분집합 가져와서 연산하고 연산 결과에 대해 키 재설정하기
키 설정
how to get key in data.table?
key()
data.table 의 키 특성 (Keys properties)
data.table에서 키 사용하여 부분집합 가져오기
data.table에서 키 확인하기
data.table에서 키 설정하기
fast subset
fast binary search
special symbol .SD in R data.table stands for Subset of Data
lapply in R data.table?
How to standardize or transform the multiple numeric columns using .SD
여러개의 숫자형 변수를 R lapply 함수를 써서 한꺼번에 변환하기
여러개 숫자형 변수를 한꺼번에 표준화 하기
특정 칼럼을 지정하는 data.table .SDcols
다수의 모든 칼럼을 대상으로 하는 data.table .SD
data.table 그룹별 집계 후 group by key 기준으로 정렬하기
data.table 체인 연산
data.table chaining of operations
aggregation by group
그룹별 집계하기
data.table 변수 이름 변경하기 (rename)
data.table order()
data.table 내림차순 정렬하기
data.table 오름차순 정렬하기
.() 로 data.table 반환하기
list() 로 data.table 반환하기
.N 행 개수 세기
구글 코랩
랜선 집들이
data.table compute or do in j
data.table select column(s) in j
data.table subset rows in i
data.table indexing 인덱싱
data.table 슬라이싱
data.table 기본 구문 사용법
data.table 정렬 order
data.table 그룹별 요약통계량
data.table 구문과 SQL 구문 비교
general form of data.table syntax
data.table 을 csv 파일로 쓰기
csv 파일을 빠르게 읽어와서 data.table 만들기
fread() 함수
fwrite() 함수
여러개로 쪼개져 있는 파일들을 하나의 data.table로 합치기
rbindlist()
how to read csv file and make it R data.table?
how to convert R data.frame to data.table
data.frame을 data.table로 변환하기
데이터를 읽어와서 data.table 만들기
by: group by what?
j: what to do?
i: on which rows
data.table 의 기본 구문
R data.table 설치
왜 R data.table 인가?
R data.table 은 무엇인가?
R data.table 패키지
FileNotFoundError: [Errno 2] No such file or directory:
mounting google drive to Colab
loading a file from the local folder to Colab
files.upload()
drive.mount()
from google.colab import drive
from google.colab import files
How to connect Google Drive to Colab
Colab에 파일 업로드 하는 방법
Colaboratory
Gradient Tapes을 이용한 자동 미분
tape.gradient(loss, x)
with tf.GradientTape() as tape:
records relevant operations onto a tape
연관된 연산을 테이프에 기록
reverse mode automatic differentiation
후진 모드 자동 미분
Compressed Sparse Row(CSR) 형태(format) 자료구조의 장점과 단점
csr_matrix.data
csr_matrix.indices
csr_matrix.indptr
how to make term-document matrix using compressed sparse row (CSR) matrix
Compressed Sparse Row 행렬을 이용하여 Term-Document 행렬 만들기
Document-Term Matrix
Term-Document Matrix
단어 문서 행렬
문서 단어 행렬
converting SciPy compressed sparse row matrix to NumPy array matrix
converting NumPy sparse matrix to SciPy compressed sparse row matrix
SciPy CSR 행렬을 NumPy 행렬로 변환하기
NumPy 배열을 SciPy CSR 행렬로 변환하기
압축 희소 행 행렬을 희소행렬로 변환하기
희소행렬을 압축 희소 행 행렬로 변환하기
todense()
toarray()
SciPy compressed sparse row matrix
scipy csr_matrix()
tf.data.Dataset.list_files()
image load
이미지 파일 라벨 파싱
이미지 파일 시각화
이미지 파일 불러오기
이미지 파일 압축 풀기
이미지 파일 다운로드
print python dict key and value with a fixed space
파이썬 사전형을 일정 간격을 두고 키와 값 인쇄하기
파이썬 사전형 인쇄
print options
인쇄 옵션
The Green Mile
비쥬얼 스튜디오 코드
How to use the np.pad() function
pad_sequences() 함수 사용법
tensorflow.keras.preprocessing.sequence.pad_sequences() 로 padding 하기
넘파이 배열의 빈 형상을 0으로 채우기
padding array
MAPE
전라북도 장수 오미자 직거래 판매
오미자 당절임
리스트 컴프리헨션
except Exception as e
Try Except 절에서 에러 발생 시 Exception Error 이름과 Error message 출력
tf.GradientTape()
automatic differentiation
오차역전파
Gradient Tape
tf.function decorator
파이썬 코드를 텐서플로우 그래프로 변환
graph optimization
그래프 최적화
tf.function 데코레이터
TensorFlow 2.x AutoGraph
웹 기반 텐서플로우 시각화 툴 텐서보드
web-based visualization tool
TensorFlow visualization framework
텐서보드로 딥러닝 모델 시각화하기
Keras에서 MNIST 데이터 로딩하기
Keras 모델 구축
tf.keras.utils.plot_model()
Keras 모델 시각화
Keras model 컴파일 compile
model evaluation
Keras 모델 평가
Keras 모델 예측
Keras 모델 가중치 HDF5 파일로 저장하기
Keras 저장된 모델 가중치 불러오기
tf.keras.models.load_model()
keras fit verbose 설정 값
keras model 시각화 plot_model() 메소드
DNN 모델
MNIST 분류 모델
tf.convert_to_tensor()로 numpy array를 tensor로 변환
numpy() 메소드로 tensor를 numpy array로 변환
TensorFlow 1.x placeholder
TensorFlow 1.x 변수 Variable
TensorFlow 1.x 상수 constant
TensorFlow 2.x placeholder 미사용
TensorFlow 2.x 변수 Variable
tensorflow 2.0 상수 constant
TensorFlow 2.x 즉시 실행 모드 (eager execution)
Tensorflow 1.x 지연 실행 모드 (lazy execution)
deep neural network
직방의 부동산 설문조사 표본설계의 오류
np.repeat()
plot the model's decision boundary
기계학습 모델의 의사결정 경계 시각화하기
확률표본 추출 시 초기값 seed number 설정하기
비복원 임의 표본 추출
복원 임의 표본 추출
특정 확률을 지정하여 임의표본추출하기
임의 샘플링
무작위 샘플링
np.random.choice() 메소드
Word Similarity
경로 거리 기반 단어 유사성 측정
단어의 네트워크 구조
단어의 상 하 구조
파이썬으로 자연어 처리 하기
a large lexical DB of English
WordNet corpus
Natural Language Toolkit
contour plot
등고선 그래프
f-1 score
vectorization operation
A배열의 idx 위치에 B배열의 원소를 순서대로 더하기
np.add.at()
요약통계량
specificity
LIKE operator
string pattern matching
character pattern matching
문자열을 특정 구분자를 기준으로 분리하고 부분 가져오기 함수 split_part()
문자열의 n개 자리 중 모자라는 부분 채우기 함수 rpad()
문자열의 왼쪽의 n개 자리 중 빈 자리 채우기 lpad()
문자열의 오른쪽부터 처음 n개 문자열 함수 right()
문자열의 왼쪽부터 처음 몇개 문자열 가져오기 함수 left()
문자열의 오른쪽 왼쪽 양쪽 특정 문자 잘라내기 함수 trim(leading trailing both from)
문자열 오른쪽의 특정 문자 잘라내기 함수 rtrim()
문자열 왼쪽 의 특정 문자 잘라내기 함수 ltrim()
문자열의 특정 위치 부분 자르기 함수 substr()
문자열 위치 인덱싱 함수 position()
문자열을 특정 숫자만큼 반복 함수 repeat()
문자열을 뒤집기 함수 reverse()
문자열을 다른 문자열로 대체 replace()
String function overlay(): overwrite part of a string at a given position with another string
문자열 타이틀 형식으로 변환 함수 initcap()
문자열 대문자로 변환 upper()
문자열 소문자로 변환 함수 lower()
문자열 바이트 길이 octet_length
문자열 비트 길이 bit_length()
문자열 길이 length
구분자 포함해 문자열 합치기 함수 concat_ws()
문자열 합치기 함수 concat()
문자열 합치기 연산자 ||
무제한 길이 문자열 text
문자열 처리 연산자
고정 길이 문자열 char(n)
가변 길이 문자열 varchar(n)
varchar(n)
character()
character type function in postgresql
character type operators in postgresql
fillna(dictionary)
how to fill missing values differently for each column in python
여러개의 칼럼별로 결측값을 다르게 대체하는 방법
그린플럼
포스트그레스에스퀴엘
other operators
comparison operators
postgresql arithmetic operators
Python 으로 YAML 파일 읽어서 정렬하기
Python으로 YAML 파일 쓰기
Python 으로 YAML 파일로 직렬화하기
Python 으로 YAML 파싱해서 읽기
PyYAML
reset_index()
reindex()
set_index()
모든 조합 Cartesian Product의 MultiIndex
pd.MultiIndex.from_product()
TimeStamp와 ID의 모든 조합 MultiIndex
모든 데이터 삭제
테이블 비우기
데이터 갱신
데이터 수정
데이터 정의 언어 (DDL)
DDL (Data Definition Language)
테이블에 데이터 등록하고 가져오기
테이블 데이터 제약조건 설정
postgresql server에 연결하기
insert into table values
Table에 값 입력하기
View 생성
Table 생성
Schema 생성
create schema
pytorch tensor
boto3
infer_datetime_format
dayfirst
date_parser
파일에서 데이터 읽어올 때 날짜 시간 파싱하는 방법
파일에서 데이터 읽어올 때 날짜 파싱하기
datetime parsing
심리학자 아들러
차원의 저주
vanishing gradient problem
PL/R 오류 처리 tryCatch
오류나 경고 메시지 처리 R tryCatch
R exception handler tryCatch()
R 코드 예외 처리
tryCatch() 를 이용한 오류에 견고한 R 코드
PivotalR on HAWQ
PivotalR 구조
wrapper of MADlib
PivotalR package
Apache MADlib Functions
Apache MADlib을 활용한 훈련 검증 셋 분할 (split training test set using MADlib)
Apache MADlib 의 역사
Apache MADlib을 활용한 선형회귀모형 적합 및 예측
returns table
returns bytea
returns setof
3 ways to return PL/R results on Greenplum and PostgreSQL DB
PL/R 결과를 반환받는 3가지 방법
Procedural Language R extension
string aggregation
array aggregation in 2D array
array aggregation by column
PL/R 을 위한 데이터셋 array aggregation 준비하는 3가지 방법
Parallel Processing using Procedural Language R
무문선
PL/R을 이용해서 모형 적합하기
PL/R 함수를 사용해서 예측하기
composite data type 정의
parallelized linear regression modeling per groups using PL/R on greenplum
Error on receive from seg0 slice1 pid
server closed the connection unexpectedly
PL/R error
PL/R을 이용한 그룹별 선형회귀모형 병렬 학습 및 예측
Parallel training and prediction of per-group linear regression models using MADlib
배열 반복
자기상관계수의 95% 신뢰구간
autocorrelation plot
자기상관그림
correlation vs. autocorrelation
상관계수 vs. 자기상관계수 비교
autocorrelation coefficients
자기상관계수
text datasets
translate datasets
video datasets
object detection datasets
텐서플로우 데이터 허브
공개 오디오 데이터셋
공개 동영상 데이터셋
공개 음성 데이터셋
공개 텍스트 데이터셋
공개 이미지 데이터셋
딥러닝을 위한 공개 데이터셋
open text dataset
open audio dataset
open image dataset
tensorflow data hub
기존 함수의 매개변수 값을 고정해 새로운 함수 만들기
기존 함수를 재활용하여 새로운 함수 만들기
python functools library partial() function
functools.partial()
pickle.loads()
pickle.dumps()
파이썬 버전별로 Pickle Protocol Version
역직렬화하여 읽기
직렬화하여 저장
F1 score
문자열로 이루어진 리스트를 한 줄씩 텍스트 파일에 쓰기
문자열로 이루어진 리스트를 한꺼번에 모든 줄을 텍스트 파일에 쓰기
writelines() 메소드
writeline() 메소드
readline() 메소드
readlines() 메소드
파이썬으로 텍스트 파일 쓰기
파이썬으로 텍스트 파일 읽기
python open error 처리 옵션
python open mode 종류
file.write()
file.read()
with open(file) as object:
파이썬으로 파일에 데이터 읽기
파이썬으로 파일에 데이터 쓰기
파이썬으로 파일 닫기
파이썬으로 파일 열기
Error: must be a character vector, not a logical vector Call `rlang::last_error()` to see a backtrace
연속형 변수를 몇 개의 범주로 구분하기
여러개의 if else if 조건절을 벡터화해서 처리
case_when()
Categorizing a continuous variable into multiple intervals using np.digitize(X, bins)
Categorizing a continuous variable into multiple intervals using pd.cut(X, bins, labels)
연속형 변수를 범주형 변수로 변환하기
categorization of continuous variable using pd.cut(bins)
categorization of continuous variable using np.digitize(bins)
sklearn model_selection train_test_split()
train and test set split by stratified random sampling
훈련 검증 데이터셋 분할
sklearn train_test_split
train set split
test set split
층화 무작위 추출
Chaining
np.random.shuffle() 함수로 train test set 분할하기
numpy random choice 함수를 이용한 train test set 분할
numpy random permutation 함수를 사용한 train test set 분할
무작위 층화추출을 통한 train test set 분할
무작위 샘플링을 통한 train test set 분할
scikit learn model_selection 클래스의 train_test_split 함수를 사용하여 train test set 분할하기
자동 미분
how to reverse 2D numpy array using np.flip()
how to reverse 1D numpy array
파이썬 넘파이 배열 순서 거꾸로 뒤집기
how to reverse numpy array elements order
numpy 배열 원소 뒤집기
numpy 배열 뒤집기
how to reverse numpy array
Postgresql 서버 종료
Postgresql 서버 시작
how to stop postgresql server
how to start postgresql server
그룹별 집계된 변수에 접두사 추가하기 add_prefix()
Postgresql 데이터 유형 확인
Postgresql 칼럼 이름 확인
GPDB 데이터 유형 확인
GPDB 칼럼 이름 확인
사스 vs. 메르스 vs. 신종 코로나 바이러스 감염율과 치사율 비교
비행기 사고와 자동차 사고의 사망율 비교
전염병에 대한 역대 정부별 방역 관리 대응
신종 코로나 바이러스에 대한 자유한국당의 문제점
신종 코로나 바이러스에 대한 언론의 문제점
기본 클래스
child class
parent class
class inheritance
class in python
object oriented programming (OOP)
객체지향프로그래밍과 클래스
문자열 분할
getting dummy variables in R manually using ifelse() condition
getting dummy variables in R using caret library
범주형 변수에 대해 가변수 만들기
z-transformation
[0-1] transformation
여러개의 변수를 가진 Data Frame을 표준화하기
[0-1] 변환
R을 사용한 무작위 샘플링
R을 사용한 무작위 층화 샘플링
여러개의 변수를 가진 DataFrame을 train test set 으로 분할하기
split dataframe with multiple variables into train and test set
random sampling using R
stratified random sampling using R
split into train and test set using R
random sampling
복수의 그래프에서 Y 축 이름 공유
복수 그래프에서 X 축 공유
how to share x axis scale
how to share y axis scale
how to fix x axis scale
how to fix y axis scale
축 단위 고정
복수 그래프
여러개의 수직 막대 그래프
여러개의 수평 막대그래프
multiple horizontal bar plots with fixed and shared x axis scale
multiple vertical bar plots with fixed and shared y axis scale
multiple bar plots
sensitivity analysis of logistic regression model per each observation and variable
로지스틱 회귀모형에 대한 관측치별 변수별 민감도 분석
로지스틱 회귀모형에 대한 관측치별 변수별 기여도 분석
선형회귀모형에 대한 각 관측치별 변수별 기여도 분석
선형회귀모형에 대한 각 관측치별 변수별 민감도 분석
Sensitivity analysis of linear regression per observation
sensitivity analysis
Cell 전체에 한꺼번에 코멘트 부호 '#' 넣기 : Ctrl + /
도움말 살펴보기 : Shift + Tab
Python 함수의 옵션
조국 전 법무부 장관에 대한 검찰 수사 비용 추정
multiplicative winters' method with quadratic trend
additive winters' method with quadratic trend
three parameter exponential smoothing
multiplicative winters' method with linear trend
additive winters' method with linear trend
two parameter exponential smoothing
multiplicative winters' method with no trend
additive winters' method with no trend
simple exponential smoothing
exponential smoothing by time series patterns
irregular noise
승법 모형
가법 모형
timeseries multiplicative model
timeseries additive model
irregular factor
seasonal factor
cycle factor
trend factor
time series component factors
backward filling missing value
forward filling missing value
결측값 채우기
결측값 선형 보간
fill missing value
upsampling
resampling
라벨 이름(label) 설정하기
Downsampling 으로 시계열 데이터 집계 시 좌/우의 포함여부(closed)
resample(label='right')
resample(closed='right')
downsampling
분기 기간 단위 집계 (quarterly period group by aggregation)
Converting to the quarterly starting business date and ending business date
분기별 시작 날짜(starting date)와 끝 날짜(ending date)
pandas.asfreq()
convert period to desired frequency
conversion from timestamp to period
quarterly period conversion to timestamp
quarterly period dates range
pandas.period_range() 로 날짜 기간(period of time) 만들기
timezone names of Asia in python
파이썬 아시아 지역 시간대 이름
시간대를 포함해서 날짜-시간 범위 만들기
convert from naive to localized time zone
시간대가 없는 naive 상태에서 지역 시간대 설정하기
아시아 시간대 이름 리스트
Python에서 pytz 라이브러리로 시간대 확인하기
time zone information using pytz library
convert from a time zone to another time zone
localize time zone
generate time zone
offset.rollback()
offset.rollforward()
가족 관계
pandas DataFrame에 칼럼별로 lambda 함수를 적용해서 행 단위 함수를 적용하여 for loop 문을 사용하지 않고 새로운 칼럼 만들기
lag column in pandas DataFrame using shift()
shift()
DataFrame 칼럼 한칸씩 밑으로 내리기
칼럼 재배열
특정 주기의 날짜 데이터 생성하기
월별 공휴일이 아닌 시작날짜 business month begin
월별 시작 날짜 month begin
월별 공휴일이아닌 마지막 날짜 business month end
월별 마지막 날짜 month end
주별 특정 요일 날짜 week
월별 특정 순번째의 요일 날짜 week of month
분기별 공휴일이 아닌 마지막 날짜 business quarter end
분기별 마지막 날짜 quarter end
분기별 공휴일이 아닌 시작 날짜 business quarter begin
분기별 시작 날짜 quarter begin
년별 공휴일이 아닌 마지막 날짜 business year end
년별 마지막 날짜 Year End
pandas time series frequency and date offsets
추적이동평균
중심이동평균
trailing moving average
centered moving average
차수 m인 이동평균 구하기
moving average with order m
sequential order
no redundancy or gaps in time series
equally spaced time interval time series
fixed frequency time series
일정한 주기의 시계열 데이터를 가진 DataFrame 만들기
일정한 주기의 시계열 데이터를 가진 series 만들기
시계열 데이터 index 중복 확인 및 처리
pandas DataFrame with time series
pandas Series with time series
how to deal with duplicated indices in time series
시계열 데이터를 index로 가지는 pandas Series에서 날짜-시간 데이터 잘라내기 (Truncate)
pandas.date_range()
시계열 데이터 조회
시계열 데이터 슬라이싱
시계열 데이터 인덱싱
time series data selection
DataFrame time series data slicing
Series time series data slicing
DataFrame time series data indexing
Series time series data indexing
python dateutil 라이브러리의 parser.parse() 함수를 이용하여 문자열을 datetime 객체로 변환하기
parser.parse()
from dateutil.parser import parse
datetime.strptime(ts_str, '%Y-%m-%d %H:%M:%S')
ts.strftime('%Y-%m-%d %H:%M:%S')
문자열을 pandas Timestamp로 변환하기
문자열을 Python datetime으로 변환하기
convert string to pandas Timestamp
convert string to Python datetime
convert Python datetime to string
convert pandas Timestamp to string
convert pandas Timestamp to Python standard datetime object
pandas Timestamp를 Python standard datetime 으로 변환하기
문자열을 Timestamp로 변환
pandas Timestamp을 문자열(string)로 변환
Converting between pandas Timestamp and Strings
Checking date and time information with pandas Timestamp attributes
Checking date and time information with pandas Timestamp methods
pandas 현재 날짜-시간 가져오기
pandas Timestamp를 native Python datetime으로 변환하기
pandas Timestamp를 native Python
convert pandas Timestamp to strings
pandas Timestamp를 문자열로 변환하기
날짜-시간 문자열을 pandas Timestamp로 변환하기
시간 데이터 정보 추출
날짜 데이터 정보 추출
시간 데이터 입력
날짜 데이터 입력
pandas Timestamp
data types in datetime module in pandas
datetime module
postgresql percentile_disc(0.25) within group (order by value)
10분 단위 구간별로 수량 가중 평균 가격 구하기 (amount-weighted average of price)
10분 단위 구간별 평균(mean) 구하기
10분 단위 구간별로 표본 표준편차(sample standard deviation) 구하기
표본 분산(sample variance) 구하기
10분 단위 구간별로 범위(range) 구하기
10분 단위 구간별로 최대값(max) 구하기
10분 단위 구간별로 3사분위수(3rd quantile) 구하기
10분 단위 구간별로 중위수(median) 구하기
10분 단위 구간별로 1사분위수(q1) 구하기
10분 단위 구간별로 최소값(min) 구하기
10분 단위 구간별로 누적합 구하기(cumulative sum)
10분 단위 구간별로 마지막 행 값(last row value) 구하기
10분 단위 구간별로 첫번째 행 값(first row value)
10분 단위 구간별로 합(sum)
1시간 단위 구간별로 시계열 데이터 집계 요약하기
10분 단위 구간별로 시계열 데이터 집계 요약하기
시계열 데이터를 1달 단위 구간별로 집계 요약하기
시계열 데이터를 1일 단위 구간별로 집계 요약하기
시계열 데이터를 1시간 단위 구간별로 집계 요약하기
시계열 데이터를 10분 단위 구간별로 집계 요약하기
resample() method
그룹별로 행을 올리기
그룹별로 행을 내리기
lag a row by group
lead a row by group
GPDB 병렬처리 현황 확인하는 방법
GPDB performance 확인 하는 방법
GPDB master segment 확인하는 방법
GPDB Command Center
터미널에서 현재 수행중인 query 강제 종료 (kill)시키는 방법
터미널에서 현재 수행 중인 query 조회하는 방법
reshape dataframe from long to wide format table using R tidyverse spread() function
reshape dataframe from long to wide format using R reshape
긴 형태의 데이터프레임을 옆으로 긴 형태의 데이터프레임으로 재구조화 변환 하기
reshape2 패키지 acast() 함수
tidyverse spread
Green Book
컬럼별 데이터 유형 설정 dtype
pandas 사용자 정의 결측값 기호 na_values = []
factor()
pscl package zeroinfl()
stats package
glm()
Zero-Inflated model R 패키지
음이항 회귀모형 R 패키지
포아송 회귀모형 R package
Negative Binomial Distribution
ZIP Regression Model
Zero-Inflated Poisson Regression Model
Excess Zeros Problem
과대영 문제
Negative Binomial Regression Model
음이항 회귀 모델
과대산포 포아송 회귀모형
Overdispersed Poisson Regression
과대산포
Overdispersion
Poisson Regression Model
포아송 회귀모형
소스 코드 들여다 보기
R 패키지 함수 소스 코드
getAnywhere()
how to see function's source codes in R package?
if else 조건문
for loop 반복문
시계열 정수의 순차 3개 묶음 패턴 별 개수 구하기
median absolute deviation
중위값 절대 편차
repeat()
jupyter notebook에서 matplotlib plot 기본 크기 설정
jupyter notebook에서 DataFrame 소수점 자리수 설정
jupyter notebook에서 왼쪽으로 텍스트 정렬
jupyter notebook에서 pandas DataFrame 칼럼 최대 너비 설정
jupyter notebook cell 너비 설정
Cell 전체에 한꺼번에 코멘트 부호 '#' 넣기
jupyter notebook에서 R 사용할 수 있도록 kernel 생성하기
jupyter notebook에서 R 사용하기
여러개의 중첩된 커널밀도곡선에서 peak 값 위치 구하기
여러개의 하위그룹의 중첩된 커널밀도곡선에서 최대 피크값 구하기
adding a vertical line at the peak point of kernel density plot
커널 밀도 곡선의 최대 피크값에 수직선 추가하기
커널 밀도 곡선의 최대 피크값의 좌표 구하기
kernel density plot
우리가 조국이다
검찰개혁 사법개혁
검사스럽다 신조어 뜻
remove 2 rows above and below with condition in R DataFrame
DataFrame에서 특정 조건을 만족하는 값의 행과 앞 뒤 2개 행을 동시에 제거하기
np.var()
np.std()
np.mean()
np.eye(n)
pd.DataFrame.from_dict([{'key': value}])
pd.DataFrame.from_records([{'key': value}])
ValueError: if using all scalar values you must pass an index 원인 및 해결 방안
ValueError: If using all scalar values, you must pass an index
groupby().apply(UDF)
분포 영어 표기법
데이터셋 영어 표기법
함수 영어 표기법
확률과 정보 이론 영어 표기법
미적분 영어 표기법
선형대수 연산 영어 표기법
인덱싱 영어 표기법
집합과 그래프 영어 표기법
수와 배열 수식 영어 표기법
how to read mathematical notation in English
수식 표기법을 영어로 읽는 방법
수식을 영어로 읽기
조국 법무부장관에 대한 언론사별 뉴스 건수
조국 법무부장관에 대한 날짜별 뉴스 건수
조국 법무부장관에 대한 뉴스 건수
네이버에서 조국 법무부장관 후보자 뉴스 건수가 줄어듦
조국 법무부장관 후보자 뉴스 건수
조국 후보자 기자간담회 질문 주제
언론사별 기자 질문 현황
조국 법무부장관 후보자 기자간담회
how to read Extensible Markup Language in python
node.find("TITLE").text
urlopen(url).read()
xml.etree.ElementTree
웹사이트에서 XML 데이터 읽어오는 방법
XML 데이터 파싱하기
XML을 읽어서 DataFrame으로 만들기
convert XML to pandas DataFrame
read XML
not 'builtin_function_or_method'
urlopen()
import urllib
웹으로부터 JSON 포맷 데이터 읽어와서 pandas DataFrame으로 만들기
AttributeError: 'str' object has no attribute 'read'
not 'TextIOWrapper'
TypeError: dump() missing 1 required positional argument: 'fp'
decode JSON to python object
deserialize JSON to python object
encode python object to JSON
serialize python object to JSON
json.dumps()
read JSON to python
write python object to JSON
JSON 데이터 포맷으로 데이터 내보내기
Python으로 JSON데이터 읽기
key=lambda item: item[1]
reverse=True
사전 자료형을 값 기준으로 오름차순 정렬
사전 자료형을 값 기준으로 내림차순 정렬
사전 자료형을 키 기준으로 오름차순 정렬
사전 자료형을 키 기준으로 내림차순 정렬
sort a dictionary in descending order
sort a dictionary in ascending order
sort a dictionary by value
sort a dictionary by key
조국 후보자와 황교안 뉴스 건수 비교
조국 법무부 장관 후보
Former President Roh Moo-hyun
뉴스 기사 건수
법무부 장관 후보
greenplum split_part()
greenplum 문자열을 구분자로 나누기
postgresql 문자열을 구분자로 나누기
postgresql split_part()
split(separator)
한개의 문자열을 구분자로 나누어서 여러개의 칼럼을 가진 DataFrame 만들기
df.apply(to_numeric)
DataFrame.astype() method
pd.to_numeric()
DataFrame의 문자열 칼럼을 숫자형으로 바꾸기
convert string to numeric data type in pandas DataFrame
graph analytics with apache MADlib on Greenplum DB in parallel
what is graph analytics
why graph analytics
network analytics
graph analytics
replace missing values by group average in R
fill missing values by group mean in R
결측값을 그룹 별 평균으로 대체하기
how to change the levels of factor type variable in R?
how to check levels of factor type variable
levels()
요인형 변수의 수준을 확인하고 변경하기
np.concatenate(axis=1)
np.concatenate(axis=0)
how to change index name in pandas DataFrame
pandas DataFrame의 index 이름 바꾸기
lambda 함수를 사용해서 DataFrame 칼럼 이름에 접두사 붙여서 칼럼 이름 바꾸기
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
DataFrame 칼럼 이름 변경
df.rename(columns = {})
df.columns = []
how to change column name in python pandas DataFrame?
DataFrame의 칼럼 이름 바꾸기
how to change decimal point format in pandas DataFrame
pandas DataFrame의 숫자형 변수의 소수점 자리수 변경 및 지정하는 방법
how to delete points in a character using R
how to remove a point in a string using R character function
How to remove the dots contained in a string
Python Built-in Exceptions Hierarchy
python exceptions
nvidia-smi
jupyter notebook 코드 행번호 표시하기
jupyter notebook edit mode 단축키
jupyter notebook command mode 단축키
jupyter notebook shortcuts
jupyter notebook 단축키
PermissionError: [Errno 13] Permission denied:
두개 이상의 DataFrame을 여러개의 Excel sheets 에 나누어서 쓰기
freeze panes in Excel by python
DataFrame을 엑셀로 내보낼 때 틀 고정 시키는 방법
to_excel()
write pandas DataFrame to Excel file
DataFrame을 Excel로 내보내기
가변 매개변수 위치에 따른 일반 매개변수의 올바른 사용법
SyntaxError: positional argument follows keyword argument
TypeError: function missing 1 required keyword-only argument:
pandas read_excel()
from excel to python DataFrame
파이썬으로 엑셀 데이터 불러오기
how to read excel sheet using python pandas
구분자 tab 은 sep='\t'
greenplum DB에서 천 단위 구분 기호 콤마를 없애는 방법
postgresql 에서 천 단위 구분 기호 콤마를 없애는 방법
astype('int64')
str.replace(',', '')
pd.read_csv(thousands=',')
csv 파일을 불러올 때 천 단위 구분 기호 콤마를 없애는 방법
text 파일을 불러올 때 천 단위 구분 기호 콤마를 없애는 방법
DataFrame에서 천 단위 숫자의 자리 구분 기호 콤마 없애는 방법
how to set data type in sqlalchemy to_sql()
using SQLalchemy engine
write data in a python DataFrame to a Postgresql and Greenplum database
python DataFrame을 Greenplum DB에 쓰기
python DataFrame을 PostgreSQL DB에 쓰기
rank over the columns
칼럼을 기준으로 순위 구하기
rank by group
그룹별 순위 구하기
순위를 구하는 rank() 함수
Series rank
DataFrame rank
flatten()
TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
Series에서 조건에 맞는 값이 들어있는 행 indexing 하기
get rows that contain specific values from a pandas DataFrame or Series using the isin() method
python에서 object를 times 만큼 반복하기
리스트 원소를 n번 반복하여 새로운 리스트 만들기
chain.from_iterable()
itertools
conda 버전 업데이트
conda update
conda 버전 확인하기
conda info -e
conda env list
가상환경 목록 확인하기
가상환경 만들 때 SSL Error 발생 시 대처 방법
activate a virtual environment
가상환경 활성화 하기
delete a virtual environment
가상환경 제거하기
windows 10에서 python 가상환경 만들기
create a virtual environment with conda prompt at Windows 10
anaconda prompt를 이용해서 가상환경 만들기
copy a csv file to postgresql database using python
copy a csv file to greenplum database using python
csv file
DataFrame을 CSV 파일로 내보내기
os.rmdir()
os.remove()
폴더 존재 여부 확인하기
os.path.isdir()
shutil.rmtree(path)
폴더 안의 파일 전부 삭제하기
파일 삭제하기
경로 삭제하기
df.reset_index().rename(columns={"index": "name"})
df.rename_axis().reset_index()
assign a name to pandas DataFrame index
rename index
DataFrame reset index and assign a name
For Loop 반복문 진척율 콘솔창에 한줄로 출력하기
sort pandas DataFrame by group and select top N rows by group
그룹 별로 DataFrame을 정렬한 후에 상위 N개 행 가져오기
'DataError: No numeric types to aggregate' 에러 대응방법
aggfunc='first'
pivot_table()
python spyder 사용 중인 객체 전부 지우기
df.groupby(...).size()
df.count()
python pandas DataFrame의 그룹별 Null 값이 아닌 행 개수 세기
python pandas DataFrame의 그룹별 행 개수 세기
python pandas Series의 그룹별 Null 값이 아닌 행 개수 세기
python pandas Series의 그룹별 행 개수 세기
python pandas Series의 행 개수 세기
python pandas DataFrame의 열 개수 세기
python pandas DataFrame의 행 개수 세기
spyder 편집 창 전부 지우기 %reset -sf
error: Microsoft Visual C++ 14.0 is required
get query result as a pandas DataFrame
Hive DB connect in Python
Presto DB Connect in Python
MySQL DB connect in Python
IBM DB2 connect in Python
Greenplum DB connect in Python
PostgreSQL DB connect in Python
Python 문자열 일부분으로 새로운 칼럼 만들기
split a string
문자열 칼럼 분할
두 집단 간 모평균 차이 t-검정 신뢰구간
R Shiny 웹 애플리케이션
t-검정
linear regression line
선형회귀선
두 연속형 변수 간 상관계수
RShiny
R Shiny 웹 애플리케이션 만들기
R Shiny summary statistics
R Shiny로 변수 별 요약 통계량 조회 화면
반의어
나는 심리치료사입니다
console 창 화면 지우기 ctrl + l
shift + ctrl + m
%>% chain operator 단축 키
shift + ctrl + c
코멘트 부호 # 넣기
Rstudio 에서 자주 사용하는 단축 키
token_index
텍스트를 파싱하기
one-hot encoding
parsing text
normalization using python
standardization using python
reading text file using python string methods
reading csv file using string methods
문자열 메소드를 사용하여 텍스트 파일을 읽어서 파싱하기
GPDB에 R 패키지 수동 설치하는 방법
How to install external R packages on Greenplum database
how to install PL/R on Greenplum database
Procedural Language R
R 의존성 있는 패키지들 한꺼번에 다운로드 하기
download.packages()
package_dependencies() function from tools package
download R package with dependencies of package
공간지리 백업 테이블 다시 불러오기
공간지리 테이블 백업하기
create or restore a geospatial backup table
pgrestore
importing raster data using raster2pgsql
raster2pgsql
공간지리 벡터 데이터 #Google Earth
LineString
MultiPolygon
Spatial Reference System
KML (Keyhole Markup Language)
MIF (MapInfo formats)
GML (Geography Markup Language)
import spatial vector data
공간지리 shape files를 Greenplum DB에 import하는 방법
공간지리 shape files을 PostgreSQL DB에 import하는 방법
shp2pgsql
importing PRJ
import DBF
importing SHX
importing SHP
importing shape files
위경도가 있는 csv파일을 greenplum db에 import 하기
위경도가 있는 데이터에서 공간정보 뽑아내기
extracting spatial information from flat data
importing flat data using DBeaver
importing flat data through psql
지리정보 분석
Geospatial data preprocessing
Geospatial analysis
python pandas 막대 그래프
pandas bar plot
DataFrame.plot.bar()
Greenplum DB에 Python 으로 DB 연동해서 그래프 그리기
동일 간격 범위별로 관측치 개수 세기
python pandas bar plot
width_bucket()
extract day
extract month
extract year
날짜에서 연도 월 일 추출하기
substring으로 문자열 자르기
조건문으로 변수 만들기
대문자를 소문자로 변경하기
행 번호 입력하기
filling missing value with average
결측값을 평균으로 채우기
황교안 전 법무부 장관
abalone dataset from UC Irvine Machine Learning Repository
DBeaver import csv file
DataFrame.to_sql()
pd.
insert into values()
웹에서 데이터 가져와서 테이블 생성하기
create external table
30개의 이미지 데이터를 6*5 격자에 나누어서 시각화하기
한개의 이미지 파일의 array 를 시각화하기
visualizing an image array data
visualizing 30 image data at 6*5 grid layout
how to upload, preprocess and visualize images?
이미지 파일을 업로드하고 전처리하여 시각화하는 방법
DBeaver에 PostgreSQL 연결 설정하기
DBeaver에 Greenplum 연결 설정하기
Greenplum Database query tool
PostgreSQL query tool
오픈소스 database tool DBeaver 설치 방법
파일이나 경로 이름 바꾸기
두 개의 경로를 합쳐서 새로운 경로 만들기
os library: miscellaneous operating system interfaces
how to copy files from src to dst directory using shutil
shutil 라이브러리를 써서 소스 폴더로부터 타겟 폴더로 파일을 복사하기
how to list all files in a directory
경로/폴더 안에 있는 파일 이름 확인하기
how to remove the directory
경로 제거하기
how to change the folder name
폴더 이름 바꾸기
폴더 이름 만들기
how to check working directory
작업경로 확인
numpy.newaxis
numpy.reshape()
numpy.expand_dims()
adding dimensions to numpy array
Adding a dimension to a multidimensional array
Selecting the rows whose row sum is the maximum per group using R
그룹 별 행 합이 최대인 전체 행만 선별하기
Python Numpy 배열에서 0보다 작은 수를 0으로 변환하는 방법
np.clip()
np.where(): np.where(a < 0, 0, a)
Indexing: a[a < 0] = 0
List Comprehension: [0 if i < 0 else i for i in a]
작업관리자 성능
print(device_lib.list_local_devices())
K.tensorflow_backend._get_available_gpus()
How to check if Keras is using GPU?
How to check if the code is running on GPU or CPU?
DirectX 진단도구
dxdiag
NVIDIA Graphics Card 인지 확인하는 방법
컴퓨터 그래픽카드 확인하는 방법
Keras가 GPU를 사용하고 있는지 확인하는 방법
Tensorflow가 GPU를 사용하고 있는지 확인하는 방법
get all combinations of col_list with length 2
R ggplot2로 bin 범위가 다른 히스토그램 그리기
how to get all combinations of col_list with length 2 using python
그룹별로 두 개 변수 간 상관계수를 구하는 사용자 정의 함수
a user-defined function of correlation coefficients with paired variables by groups
correlation coefficients with multiple columns by groups
그룹별 다수의 변수 간 상관관계 분석
tensorflow version check
tf.VERSION
텐서플로우 버전 확인
python -V
파이썬 버전 확인
python version check
numpy upgrade
tensorflow upgrade
numpy RuntimeError Traceback (most recent call last) RuntimeError
TypeError: softmax() got an unexpected keyword argument 'axis' 에러 대응 방법
오프라인에서 jupyter notebook에 Plotly로 그래프 그리기
setting the global theme with offline = True
plotly and cufflinks in offline mode
cufflinks wrapper on plotly
plotly offline
R ggplot2 함수를 Python에서 사용하는 방법
PlotNine library for plotting R ggplot2 in python
파이썬에서 R ggplot2 함수로 그래프 그리는 방법
R ggplot2 plotting in Python by PlotNine library
heatmap by python pandas
heatmap by python seaborn
heatmap by python matplotlib
mosaic()
statsmodels.graphics.mosaicplot
titanic dataset
visualization of multiple categorical variables
다변량 범주형 변수의 시각화
모자이크 그래프
line width
line style
line graph by pandas
line graph by seaborn
line graph by matplotlib
선 그래프
산점도 행렬의 대각원소를 커널밀도곡선으로 그리기
산점도 행렬의 대각원소를 박스 그림으로 그리기
산점도 행렬의 대각원소 히스토그램으로 그리기
여러 개의 변수 간 관계를 한꺼번에 볼 때 유용한 산점도 행렬
scatterplot matrix by pandas
scatterplot matrix by plotly
scatterplot matrix by seaborn
scatterplot matrix
산점도의 색과 크기 설정
4개 연속형 변수를 사용한 산점도
AttributeError: module 'seaborn' has no attribute 'scatterplot'
scatter plot by groups
그룹별 산점도 점 색깔과 모양 다르게 하기
df.plot.scatter()
sns.scatterplot()
sns.regplot()
adding a line in scatter plot
adding a circle in scatter plot
