166 results for "Jan-Ming Ho"
Search Results
2. Effective Fuzzy System for Qualifying the Characteristics of Stocks by Random Trading
- Author
-
Jerry Chun-Wei Lin, Mu-En Wu, Jan-Ming Ho, and Jia-Hao Syu
- Subjects
Computational Theory and Mathematics, Artificial Intelligence, Control and Systems Engineering, Computer science, Applied Mathematics, Econometrics, Fuzzy control system - Published
- 2022
3. Time-Critical Data Dissemination Under Flash Crowd Traffic
- Author
-
Jan-Ming Ho and Chi-Jen Wu
- Subjects
General Medicine - Published
- 2022
4. Fuzzy-Based Stock Selection System through Suitability Index and Position Sizing
- Author
-
Jia-Hao Syu, Jerry Chun-Wei Lin, Chi-Jen Wu, and Jan-Ming Ho
- Published
- 2022
5. Portfolio management system in equity market neutral using reinforcement learning
- Author
-
Jia-Hao Syu, Mu-En Wu, Jerry Chun-Wei Lin, and Jan-Ming Ho
- Subjects
050208 finance, Computer science, Sharpe ratio, Financial risk, 05 social sciences, 02 engineering and technology, Market neutral, Artificial Intelligence, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, Resource allocation, Portfolio, Reinforcement learning, Drawdown (economics), 020201 artificial intelligence & image processing, Project portfolio management - Abstract
Portfolio management involves position sizing and resource allocation. Traditional and generic portfolio strategies require forecasts of future stock prices as model inputs, which is not a trivial task since those values are difficult to obtain in real-world applications. To overcome these limitations and provide a better solution for portfolio management, we developed a Portfolio Management System (PMS) using reinforcement learning with two neural networks (CNN and RNN). A novel reward function involving the Sharpe ratio is also proposed to evaluate the performance of the developed systems. Experimental results indicate that the PMS with the Sharpe ratio reward function exhibits outstanding performance, increasing return by 39.0% and decreasing drawdown by 13.7% on average compared to the reward function of trading return. In addition, the CNN-based model is more suitable for the construction of a reinforcement learning portfolio, but has 1.98 times more drawdown risk than the RNN-based model. Among the datasets tested, the PMS outperforms the benchmark strategies on TW50 and traditional stocks, but is inferior to a benchmark strategy on the financial dataset. The PMS is profitable, effective, and offers lower investment risk on almost all datasets. The novel reward function involving the Sharpe ratio enhances performance and well supports resource allocation for empirical stock trading.
- Published
- 2021
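The Sharpe-ratio reward function described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation; the window of per-period returns and the zero risk-free rate are my assumptions:

```python
import math

def sharpe_reward(returns, risk_free=0.0):
    """Sharpe-ratio-style reward over a window of per-period trading
    returns: mean excess return divided by the standard deviation of
    returns. A hypothetical sketch of such a reward signal."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    std = math.sqrt(var)
    if std == 0.0:
        return 0.0  # no volatility observed; define reward as neutral
    return (mean - risk_free) / std

# A steady return series is rewarded more than a volatile one with a
# similar mean, which is the point of a risk-adjusted reward.
steady = [0.01, 0.012, 0.011, 0.009, 0.01]
volatile = [0.05, -0.03, 0.04, -0.02, 0.01]
```

Compared with a raw trading-return reward, this penalizes strategies that achieve the same average return with larger swings, which is consistent with the drawdown reduction the abstract reports.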
6. GABOLA: A Reliable Gap-Filling Strategy for de novo Chromosome-Level Assembly
- Author
-
Yi-Chen Huang, Jan-Ming Ho, Chung-Yen Lin, Ping-Heng Hsieh, Wei-Hsuan Chuang, Pao-Yin Fu, Shu-Hwa Chen, Hsueh-Chien Cheng, and Yu-Jung Chang
- Subjects
Gap filling, Exon, Source code, Base pair, Chromosome, Sequence assembly, Computational biology, Biology, Gene, Genome - Abstract
We propose a novel method, GABOLA, which utilizes long-range genomic information provided by accurate linked short reads jointly with long reads to improve the integrity and resolution of whole-genome assemblies, especially in complex genetic regions. We validated GABOLA on human and Japanese eel genomes. On the two human samples, we filled in more bases, spanning 23.3 Mbp and 46.2 Mbp, than the Supernova assembler, covering over 3,200 functional genes including 8,500 exons and 15,000 transcripts. Among them, multiple genes related to various types of cancer were identified. Moreover, we discovered an additional 11,031,487 base pairs of repeat sequences and 218 exclusive repeat patterns, some of which are known to be linked to disorders such as neurodegenerative diseases. As for the eel genome, we successfully raised the genetic benchmarking score to 94.6% while adding 24.7 million base pairs. These results demonstrate the capability of GABOLA to optimize whole-genome assembly and its potential for precise disease diagnosis and high-quality non-model organism breeding. Availability: The Docker image and source code of the GABOLA assembler are available at https://hub.docker.com/r/lsbnb/gabola and https://github.com/lsbnb/gabola, respectively.
- Published
- 2021
7. Portfolio Management System with Reinforcement Learning
- Author
-
Jan-Ming Ho, Mu-En Wu, and Jia-Hao Syu
- Subjects
Computer science, Financial risk, Sharpe ratio, 05 social sciences, Equity (finance), 02 engineering and technology, Market neutral, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, Portfolio, Drawdown (economics), Reinforcement learning, 020201 artificial intelligence & image processing, Project portfolio management, Total return, 050203 business & management - Abstract
Portfolio management involves position sizing and resource allocation. Traditional and generic portfolio strategies require forecasts of future stock prices as model inputs, which is not a trivial task in real-world applications. To overcome these limitations and provide investors with a better solution for portfolio management, we develop a portfolio management system (PMS) with an equity-market-neutral strategy based on reinforcement learning. A novel reward function involving the Sharpe ratio is also designed to evaluate the performance of the developed systems. Experimental results indicate that the PMS with the Sharpe ratio reward function has outstanding performance, increasing return by 39.0% and decreasing drawdown by 13.7% on average compared with the reward function of trading return. In addition, the developed PMS_CNN model is more suitable and more profitable for constructing the RL portfolio, but has 1.98 times more drawdown risk than the PMS_RNN. Overall, the proposed PMS outperforms the benchmark strategies in total return and Sharpe ratio. The PMS is profitable and effective with lower investment risk, and the novel Sharpe ratio reward function clearly enhances performance and well supports resource allocation in empirical stock trading.
- Published
- 2020
8. Neural Network-based ORB Strategies for Threshold Classification on Taiwan Futures Market
- Author
-
Jan-Ming Ho, Hsiang-Chi Chen, and Jia-Hao Syu
- Subjects
Multi-label classification, 0209 industrial biotechnology, Breakout, Artificial neural network, Computer science, Sharpe ratio, Deep learning, Futures market, 02 engineering and technology, Perceptron, Convolutional neural network, Profit (economics), 020901 industrial engineering & automation, Technical analysis, Statistics, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Orb (optics) - Abstract
Opening Range Breakout (ORB) is a renowned technical analysis strategy in which two pre-determined thresholds are set in the early stage after market opening to determine the direction of investment for the day. In the literature, ORB has been shown to produce significant profits on several stock markets. In this paper, we present novel deep-learning-based ORB algorithms which take into account historical trends and price movement during the early market interval. The proposed scheme uses a multi-label classification framework with multi-layer perceptron and convolutional neural network models to predict the most profitable thresholds. To generate ground-truth values for evaluation, we compared two labeling methods (one-hot encoding and distributed encoding). In experiments on an empirical dataset, the proposed strategy earned profits of 8126, an annual return of 14.003%, and a Sharpe ratio of 1.376. The proposed scheme outperformed the original ORB strategy by nearly 2 times on the above metrics and decreased maximum drawdown by more than 60%. Experimental results revealed that the uppermost and lowermost threshold classes accounted for the majority of the predicted results. In other words, taking a long position at a lower boundary and a short position at a higher boundary increased the likelihood of generating higher profits, while reducing exposure to risk.
- Published
- 2020
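The breakout mechanism that the predicted thresholds plug into can be sketched as below. The function and its offset parameters are a hypothetical illustration of the generic ORB rule, not the paper's neural model:

```python
def orb_signal(open_high, open_low, price, upper_offset=0.0, lower_offset=0.0):
    """Classic opening-range-breakout rule: go long when price breaks
    above the (possibly shifted) opening-range high, go short when it
    breaks below the shifted opening-range low, otherwise stay flat.
    The offsets stand in for the thresholds a model would predict."""
    upper = open_high + upper_offset
    lower = open_low - lower_offset
    if price > upper:
        return "long"
    if price < lower:
        return "short"
    return "flat"
```

Widening the offsets makes the rule more conservative: a price that would trigger a trade under the raw opening range stays "flat" until the shifted threshold is crossed, which is the degree of freedom the classifier in the abstract is choosing.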
9. Reinforcement Learning Control for Six-Phase Permanent Magnet Synchronous Motor Position Servo Drive
- Author
-
Wei-Lun Peng, Ray-I Chang, Yung-Wen Lan, Shih-Gang Chen, Faa-Jeng Lin, and Jan-Ming Ho
- Subjects
Tracking error ,Control theory ,Computer science ,Trajectory ,Reinforcement learning ,Torque ,Servo drive ,Servomotor ,Synchronous motor - Abstract
Since the permanent magnet synchronous motor (PMSM) has nonlinear dynamic behavior characteristics, it is difficult to develop an ideal controller. In this paper, we develop a novel method for the six-phase PMSM (6PPMSM) position servo drive based on deep reinforcement learning (RL). Comparison studies between the proposed controller and the recurrent fuzzy neural cerebellar model articulation network (RFNCMAN) controller are presented. The results show that our controller can follow the reference trajectories more precisely in general cases, where the average tracking error obtained is 90% smaller than that of RFNCMAN.
- Published
- 2020
10. Threshold-Adjusted ORB Strategies with Genetic Algorithm and Protective Closing Strategy on Taiwan Futures Market
- Author
-
Jia-Hao Syu, Mu-En Wu, Jan-Ming Ho, and Chun-Hao Chen
- Subjects
Mathematical optimization, 050208 finance, 021103 operations research, Computer science, Sharpe ratio, 05 social sciences, 0211 other engineering and technologies, Futures market, 02 engineering and technology, Technical analysis, 0502 economics and business, Genetic algorithm, Drawdown (economics), Trading strategy, Profitability index, Selection (genetic algorithm), Orb (optics) - Abstract
Opening range breakout (ORB) is a well-known intraday trading strategy that generates trading signals through technical analysis; however, ORB does not make full use of market characteristics and does not define a closing strategy. These problems make the ORB strategy insufficiently stable and robust. In this paper, we adjust thresholds using historical data to enhance profitability, and design a protective closing strategy to prevent unacceptable losses. However, there are numerous parameter combinations, and the solution space is approximately 2^14. Therefore, we implement a genetic algorithm to improve the efficiency and rationality of parameter selection. We found that the performance of GAORB_Sharpe with a stop-loss mechanism is outstanding. The proposed strategy generates a 9.303% annual return and a 15.716% Sharpe ratio, which are 2.5% and 5% more than the original strategy, and cuts the maximum drawdown by half. Furthermore, the genetic algorithm saves 90% of the computation. In summary, we recommend adjusting the thresholds and adding a stop-loss mechanism to the ORB strategy, and selecting parameters through a genetic algorithm to improve overall performance.
- Published
- 2020
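A genetic-algorithm search over a 2^14 parameter space of the kind described can be sketched as follows. This is a generic minimal GA, not the paper's implementation, and the fitness function below is a stand-in (a real run would decode the bit-string into thresholds and return a backtested Sharpe ratio):

```python
import random

def evolve(fitness, n_bits=14, pop_size=20, generations=30, seed=7):
    """Minimal GA over bit-strings encoding strategy parameters:
    tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Tournament of two: keep the fitter individual.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:              # occasional bit flip
                i = rng.randrange(n_bits)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Stand-in fitness: count of set bits. The GA evaluates only a small
# fraction of the 2^14 candidates, which is the computational saving
# the abstract refers to.
best = evolve(lambda bits: sum(bits))
```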
11. Additional file 1 of EpiMOLAS: an intuitive web-based framework for genome-wide DNA methylation analysis
- Author
-
Sheng-Yao Su, I-Hsuan Lu, Wen-Chih Cheng, Wei-Chun Chung, Pao-Yang Chen, Jan-Ming Ho, Shu-Hwa Chen, and Chung-Yen Lin
- Abstract
Additional file 1. This supplementary file describes the usage of DocMethyl and EpiMOLAS_web, including installation steps, how to execute the workflow in DocMethyl, and guides for analyzing WGBS data through the modules in EpiMOLAS_web.
- Published
- 2020
- Full Text
- View/download PDF
12. On the Design of Profitable Index Based on the Mechanism of Random Tradings
- Author
-
Shin-Huah Lee, Mu-En Wu, Jia-Hao Syu, and Jan-Ming Ho
- Subjects
050208 finance, Correlation coefficient, Computer science, 05 social sciences, Financial market, Contrarian, 02 engineering and technology, Positive correlation, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, 020201 artificial intelligence & image processing, Profitability index, Trading strategy, Stock (geology) - Abstract
Designing profitable trading strategies is an issue of interest to many experts and scholars. There are thousands of strategies in financial markets, and almost all of them can be divided into two types: momentum and contrarian. However, there is no formal way to determine which type of strategy is suitable for a given stock. This study proposes a method to quantify and classify momentum-type and contrarian-type stocks for investors, which makes trading strategies more quantitative. Our approach uses the technique of random trading and the proposed profitable index to quantify stock attributes. We take the constituent stocks of the Taiwan 50 (TW50) as research objects. According to the experimental results, 8 stocks in the TW50 are suitable for contrarian-type trading strategies, and the other 42 stocks are suitable for momentum-type trading strategies. We also use simple momentum and contrarian strategies to evaluate the effectiveness of the proposed algorithms and index. The results show a positive correlation between the momentum-type (contrarian-type) profitable index and trading performance, with a correlation coefficient of 77.3% (80.3%). In conclusion, the scale of the momentum-type and contrarian-type profitability index represents both the profitability and the attributes of a stock.
- Published
- 2020
13. A Framework of Applying Kelly Stationary Index to Stock Trading in Taiwan Market
- Author
-
Jan-Ming Ho, Jia-Hao Syu, and Mu-En Wu
- Subjects
Stock trading, Correlation coefficient, Computer science, 05 social sciences, Kelly criterion, Strategy development, 0502 economics and business, Econometrics, Expected return, 050207 economics, Project portfolio management, Money management, 050203 business & management, Stock (geology) - Abstract
Portfolio management and money management have always been important issues for investors and researchers in the financial field. The Kelly criterion is a theoretical approach to money management: a mathematical method for optimizing long-term expected return. The Kelly criterion requires the distribution of future outcomes as input, which can be predicted using machine learning (ML) techniques. With the revolutionary growth in the amount of information, big data is the key to boosting ML prediction; therefore, we introduce a general Kelly framework combining the strengths of Kelly, ML, and big data. In addition, we propose the Kelly stationary index (KSI) to quantify the stationarity of a stock's outcome distribution, which affects the trading period and forecasting frequency. We calculate the KSI of each constituent stock of the Taiwan 50, and apply the Kelly criterion strategy to verify the effectiveness of the KSI. The experimental results show a moderate negative relationship between strategy performance and KSI, with a correlation coefficient of -0.591. They also indicate that the closer the estimated distribution is to the actual distribution, the higher the expected profit. In the future, we will use the KSI for money management and strategy development, and incorporate it into the general Kelly framework.
- Published
- 2019
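For readers unfamiliar with the Kelly criterion mentioned above, its simplest binary-outcome form computes the optimal fraction of capital to stake; the discrete-distribution generalization used for stock outcomes is analogous. A minimal sketch:

```python
def kelly_fraction(p, b):
    """Kelly fraction for a bet won with probability p that pays
    b-to-1 odds: f* = p - (1 - p) / b. This maximizes the long-term
    expected logarithmic growth of capital; a negative value means
    the edge is unfavourable and nothing should be staked."""
    return p - (1.0 - p) / b

# A 60% win rate at even odds suggests staking 20% of capital.
f = kelly_fraction(0.6, 1.0)
```

Because the optimal fraction depends entirely on the assumed outcome distribution (here just p and b), a mis-estimated distribution directly degrades realized growth, which is the sensitivity the KSI in the abstract is designed to measure.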
14. Predictive Scheduling for DASH Video Streaming in an Underground Subway System
- Author
-
Shin-Hung Chang, Jan-Ming Ho, Meng-Huang Lee, and Chia-Hsien Chou
- Subjects
Computer science, Server, Dash, Real-time computing, Quality of experience, Service provider, Streaming algorithm, Mobile device, Heterogeneous network, Scheduling (computing) - Abstract
With the rapid increase in network bandwidth, watching videos online has become popular. To cope with dynamic and heterogeneous network conditions, video service providers use dynamic adaptive streaming over hypertext transfer protocol (DASH) technology to serve content. DASH servers dynamically adjust the streaming rate according to the available bandwidth. In this study, we investigated DASH streaming to passengers on an underground subway system, the Taipei Mass Rapid Transit (MRT). We developed a mobile app to measure the bandwidth available to mobile devices on subway trips and compiled our measurements with archive data. We observed that the available bandwidth while entering and leaving stations is usually much higher than while traveling through tunnels. Hence, the DASH streaming algorithm must adapt the video resolution to a non-deterministic network bandwidth. We also defined an M-Low optimization problem, in which minimizing the number of low-resolution segments in DASH streaming provides an acceptable watching experience. Assuming deterministic bandwidth throughout a subway trip, we developed an M-Lowo algorithm for M-Low optimal DASH video scheduling. Since the available bandwidth is generally non-deterministic, we also designed a predictive scheduling algorithm, M-Lowp, which uses archive data to predict network bandwidth in subways and adjusts video resolution automatically to optimize the video-watching experience. To demonstrate the quality-of-experience improvement achievable using our M-Lowp algorithm, we employed the bandwidth data measured in the Taipei MRT as benchmarks. The results demonstrate that M-Lowp scheduling is more effective than previously popular algorithms.
- Published
- 2019
15. A Fund Selection Robo-Advisor with Deep-learning Driven Market Prediction
- Author
-
Chen-Sheng Gu, Ray-I Chang, Chung-Shu Wu, Hong-Po Hsieh, and Jan-Ming Ho
- Subjects
Investment strategy, Deep learning, 05 social sciences, 02 engineering and technology, Investment (macroeconomics), Market research, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, Economics, Portfolio, Capital asset pricing model, 020201 artificial intelligence & image processing, Artificial intelligence, 050207 economics, Portfolio optimization, Mutual fund - Abstract
This paper proposes a new investment strategy with deep-learning market prediction for mutual fund portfolio optimization. Our strategy uses the capital asset pricing model (CAPM) with macroeconomic factors to predict whether the market is bull or bear. We then develop a robo-advisor (RA) to predict the future market, optimize the portfolio, and automate investment. Experiments use 22 years of data on the S&P 500 and U.S. mutual funds to validate our strategy. Results show that the accuracy of our market prediction method reaches 84.3% and the rate of return of our RA is 13.57%. Our model is more accurate and profitable than the other algorithms.
- Published
- 2019
16. Exploring the Persistent Behavior of Financial Markets
- Author
-
William Cheung, Jan-Ming Ho, Chin-Laung Lei, Yi-Cheng Tsai, Chuan-Ju Wang, and Chung-Shu Wu
- Subjects
Transaction cost, 050208 finance, Financial economics, 05 social sciences, Financial market, Monetary economics, Stock market index, 0502 economics and business, Economics, Market price, Trading strategy, 050207 economics, Investor behavior, Futures contract, Finance, Stock (geology) - Abstract
This paper presents the persistent behavior hypothesis for financial markets, which is tested statistically on five stock indices from 2001 to 2014. We find significant results in all five stock markets for the full sample period as well as for subperiods. A persistent behavior strategy (PBS) on index futures is also presented, whose net annual returns, after transaction costs, are significantly higher than 15% in all futures markets. The best performance, about 27%, occurs in E-mini NASDAQ 100 and TAIEX futures. We also study the impact of investor behavior on the market price of TAIEX futures.
- Published
- 2018
17. Optimal File Dissemination Scheduling Under a General Binary Tree of Trust Relationship
- Author
-
Wing-Kai Hon, Wei-Chen Lin, Yun-Ping Tien, and Jan-Ming Ho
- Subjects
Discrete mathematics, Schedule, Asymptotically optimal algorithm, Binary tree, Computer science, Parallel algorithm, Network topology, Time complexity, Sequential algorithm, Scheduling (computing) - Abstract
Ku et al. (GLOBECOM 2012) first studied the file dissemination problem under a hierarchical trust relationship. They showed that when the trust relationship is defined as a rooted full binary tree, there exists an optimal schedule for file dissemination taking ⌈log_2 n⌉ rounds, where n is the total number of nodes, including the source and the destinations of broadcasting. Furthermore, they devised a linear-time algorithm to compute such a schedule. In this paper, we extend the file dissemination problem to trust relationships in the form of a general binary tree, i.e., each internal node is not restricted to having exactly two children. We show that an optimal schedule for file dissemination still takes ⌈log_2 n⌉ rounds, irrespective of the tree topology, and that such a schedule can be computed in linear time. While we extend Ku et al.'s results, our algorithm is based on a completely different approach. We also consider finding such an optimal schedule in the parallel setting, and propose an algorithm with parallel time complexity O(h log^2 n), where h denotes the height of the tree. Furthermore, we show that the sequential and parallel algorithms work when the trust relationship is a rooted DAG in which the out-degree of each node is bounded by two. Finally, we remark that our algorithms also produce asymptotically optimal schedules for general degree-d DAGs when d is a constant, while the problem becomes NP-hard even when the out-degree of each node in the DAG is limited to 6.
- Published
- 2019
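The ⌈log_2 n⌉ round count matches the standard doubling bound: each node holding the file can upload it to at most one other node per round, so the informed set at most doubles. A small sketch of that counting argument (not the paper's tree-scheduling algorithm, which must additionally respect the trust edges):

```python
import math

def min_broadcast_rounds(n):
    """Doubling lower bound for single-upload broadcast: starting
    from one informed node (the source), the set of nodes holding
    the file at most doubles each round, so informing all n nodes
    requires at least ceil(log2(n)) rounds."""
    informed, rounds = 1, 0
    while informed < n:
        informed *= 2
        rounds += 1
    return rounds
```

The paper's contribution is showing that, even when uploads are constrained to general binary trust trees, a schedule exists that actually meets this unconstrained lower bound.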
18. Modified ORB Strategies with Threshold Adjusting on Taiwan Futures Market
- Author
-
Jan-Ming Ho, Jia-Hao Syu, Shin-Huah Lee, and Mu-En Wu
- Subjects
050208 finance, Breakout, Index (economics), Computer science, Sharpe ratio, 05 social sciences, 02 engineering and technology, Upper and lower bounds, Technical analysis, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, 020201 artificial intelligence & image processing, Trading strategy, Futures contract, Orb (optics) - Abstract
Opening Range Breakout (ORB) is a fairly popular intraday trading strategy. We set the resistance and support levels based on the price in the opening interval to follow the trend in the futures market. However, such strategies have not been profitable for most commodities in recent years amid changing markets. In this paper, we attempt to improve the original ORB strategy by considering the effect of trend continuity on the breakout event: we adjust the predetermined thresholds for the upper and lower bounds. This strategy is called Threshold Adjusting ORB, or TA_ORB. We implement this modified ORB strategy on Taiwan Index Futures from 2008 to 2012. Compared with the original ORB strategy, we obtained a 145.98% return in 2008 (bear market), an 81.86% return in 2009 (bull market), and a 32.25% annual return over 2008–2012 (the five-year period), which are 4.0, 1.4, and 2.6 times more than the original ORB, respectively. TA_ORB performs outstandingly under large fluctuations, especially in the bear market. These results verify that TA_ORB improves the stability of the breakout signal, enhances returns, and reduces strategic risk. In future work, we plan to use neural networks to make more precise predictions and to implement these strategies on different commodities.
- Published
- 2019
19. Additional file 2: of SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies
- Author
-
Li-An Yang, Yu-Jung Chang, Shu-Hwa Chen, Chung-Yen Lin, and Jan-Ming Ho
- Subjects
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING - Abstract
The detailed workflow of post-assembly read type labelling. (PDF 498 kb)
- Published
- 2019
- Full Text
- View/download PDF
20. Additional file 4: of SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies
- Author
-
Li-An Yang, Yu-Jung Chang, Shu-Hwa Chen, Chung-Yen Lin, and Jan-Ming Ho
- Subjects
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING - Abstract
The details of SQUAT runtime and resource consumption and the results of the wheat dataset. (PDF 364 kb)
- Published
- 2019
- Full Text
- View/download PDF
21. Optimal DASH Video Scheduling over Variable-Bit-Rate Networks
- Author
-
Jan-Ming Ho, Kuan-Jen Wang, and Shin-Hung Chang
- Subjects
Schedule, Computer science, Real-time computing, 020206 networking & telecommunications, 02 engineering and technology, Variable bitrate, Display resolution, Scheduling (computing), Server, Dash, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Quality of experience, Heterogeneous network - Abstract
To accommodate users' heterogeneous network bandwidth, current video service providers, e.g., YouTube and Netflix, generally use DASH technology to deploy their streaming services. An HTTP server handles client-server negotiation and dynamically adjusts a series of specific video streaming rates for each client according to their bandwidth. QoE (Quality of Experience) is an index of users' subjective opinions; we follow previous researchers' quantized QoE indices and introduce an extra index, called lexicographical minimization, as a measure of the quality of a DASH video playback schedule (abbreviated as DASH scheduling) given the available network bandwidth. A video playback schedule with the minimum number of low-resolution video segments is said to have the best QoE. This paper presents a linear-time M-Low scheduling algorithm, which adjusts the video resolution and optimizes the QoE indices in a DASH streaming service. We refer to the following QoE metrics: playback without freezes, minimizing low-resolution video segments, minimizing the number of resolution-switching events, and maximizing video playback bitrate. We prove that the playback schedule generated by the M-Low algorithm is lexicographically minimal when the network bandwidth is in a steady state. Moreover, we extend the M-Low algorithm to the M-LowS algorithm over Variable-Bit-Rate (VBR) networks and show through simulations that our proposed algorithm achieves better QoE indices than previously published algorithms.
- Published
- 2018
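The M-Low objective, minimizing the number of low-resolution segments, can be illustrated with a deliberately simplified toy model. It collapses the bandwidth profile into a single byte budget and ignores per-segment playback deadlines, both my simplifications; the paper's M-Low algorithm handles the full scheduling problem:

```python
def minimize_low_res(n_segments, budget, low_size, high_size):
    """Toy M-Low: given a total download budget, upgrade as many
    segments as possible from low to high resolution, i.e. minimize
    the count of low-resolution segments. Real DASH scheduling must
    also respect when each segment is played, which this ignores."""
    if high_size <= low_size:
        return 0  # degenerate: high is no more expensive than low
    spare = budget - n_segments * low_size
    if spare < 0:
        raise ValueError("budget cannot even sustain all-low playback")
    n_high = min(n_segments, int(spare // (high_size - low_size)))
    return n_segments - n_high  # remaining low-resolution segments
```

Even this stripped-down version shows the trade-off the abstract describes: every unit of spare bandwidth buys one segment upgrade, so minimizing low-resolution segments is equivalent to spending the spare budget on upgrades.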
22. Evolutionary ORB-based model with protective closing strategies
- Author
-
Mu-En Wu, Jia-Hao Syu, Jerry Chun-Wei Lin, and Jan-Ming Ho
- Subjects
Mathematical optimization, Information Systems and Management, Computer science, Sharpe ratio, Management Information Systems, Relevant market, Artificial Intelligence, Technical analysis, Profitability index, Trading strategy, Closing (morphology), Robustness (economics), Software, Orb (optics) - Abstract
Opening range breakout (ORB) is a well-known intraday trading strategy based on technical analysis. ORB lacks robustness against market uncertainties (e.g., information from contradictory sources), and does not consider all relevant market characteristics. Furthermore, the closing strategies in generic ORB are not well defined. In this study, we developed an evolutionary ORB-based model, which utilizes historical data to optimize thresholds in order to enhance profitability, and developed protective closing strategies aimed at preventing unacceptable losses. Selecting appropriate thresholds and parameters for ORB is a non-trivial task, as the search space exceeds sixty-five thousand options. We used evolutionary computation to derive rational strategies and parameters for ORB. The proposed framework, based on a genetic algorithm, optimizes the parameters related to threshold selection and protective closing strategies. In experiments, this resulted in annual returns of 9.3% (a 2.8% improvement over the original strategy) and a Sharpe ratio of 2.5 (an improvement of 1.0), while reducing the maximum drawdown by half. The proposed scheme also reduced computational overhead by 89% compared to a grid search.
- Published
- 2021
23. Optimal QoE Scheduling in MPEG-DASH Video Streaming
- Author
-
Min-Lun Tsai, Meng-Huang Lee, Shin-Hung Chang, and Jan-Ming Ho
- Subjects
Statistics and Probability, Technology, Video streaming, Computer Networks and Communications, Computer science, Real-time computing, MPEG-DASH, Quality of experience (QoE), Computer Science Applications, Scheduling (computing), Artificial Intelligence, Signal Processing, Dash, Motion vector, Computer Vision and Pattern Recognition, Integer programming - Abstract
DASH is a popular technology for video streaming over the Internet. However, the quality of experience (QoE), a measure of humans' perceived satisfaction with the quality of streamed videos, is inherently subjective and thus difficult to evaluate. Previous studies considered only network-based indices and focused on providing smooth video playback rather than improving the true QoE experienced by humans. In this study, we designed a series of click-density experiments to verify whether different resolutions affect the QoE for different video scenes. We observed that, within a single video segment, different scenes at the same resolution can affect the viewer's QoE differently. The user's satisfaction when watching high-resolution video segments is always greater than when watching low-resolution video segments of the same scenes. The most important observation, however, is that low-resolution video segments yield a higher viewing QoE gain in slow-motion scenes than in fast-motion scenes. Thus, including more high-resolution segments in fast-motion scenes and more low-resolution segments in slow-motion scenes would be expected to maximize the user's viewing QoE. In this study, to evaluate the user's true experience, we convert the viewing QoE into a satisfaction quality score, termed the Q-score, for scenes with different resolutions in each video segment. Additionally, we developed an optimal segment assignment (OSA) algorithm for Q-score optimization in environments with constrained network bandwidth. Our experimental results show that applying the OSA algorithm to the playback schedule significantly improves users' viewing satisfaction.
- Published
- 2021
24. A Novel Pipeline Approach for Efficient Big Data Broadcasting
- Author
-
Chi-Jen Wu, Chin-Fu Ku, Jan-Ming Ho, and Ming-Syan Chen
- Subjects
Distributed database, Computer science, Distributed computing, Node (networking), Pipeline (computing), 020206 networking & telecommunications, Cloud computing, 02 engineering and technology, Broadcasting, Computer Science Applications, Data modeling, Upload, Computational Theory and Mathematics, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Information Systems - Abstract
Big-data computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm. Thus, the demand for building a service stack to distribute, manage, and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem for a single source node to broadcast a big chunk of data to a set of nodes with the objective of minimizing the maximum completion time. These nodes may be located in the same datacenter or across geo-distributed datacenters. This problem is one of the fundamental problems in distributed computing and is known to be NP-hard in heterogeneous environments. We model the big-data broadcasting problem as a LockStep Broadcast Tree (LSBT) problem. The main idea of the LSBT model is to define a basic unit of upload bandwidth, r, such that a node with capacity c broadcasts data to a set of ⌊c/r⌋ children at rate r. Note that r is a parameter to be optimized as part of the LSBT problem. We further divide the broadcast data into m chunks, which can then be broadcast down the LSBT in a pipelined manner. In a homogeneous network environment in which each node has the same upload capacity c, we show that the optimal uplink rate r* of LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, we present an O(n log^2 n) algorithm to select an optimal uplink rate r* and to construct an optimal LSBT. Numerical results show that our approach achieves lower maximum completion time and lower computational complexity than other efficient solutions in the literature.
- Published
- 2016
25. An optimal scheduling algorithm for DASH video streaming over variable bit rate networks
- Author
-
Kuan Jen Wang, Jan-Ming Ho, and Shin Hung Chang
- Subjects
Dynamic Adaptive Streaming over HTTP ,Computer Networks and Communications ,Hardware and Architecture ,Computer science ,Dash ,IPTV ,Quality of experience ,Service provider ,Variable bitrate ,Algorithm ,Software ,Heterogeneous network ,Scheduling (computing) - Abstract
With the rapid increase of network bandwidth, it has become popular to watch video over the Internet. To cope with dynamic and heterogeneous network conditions, commercial service providers use DASH streaming technology to serve content to their users. Under a limited bandwidth constraint, previous scheduling algorithms usually arrange as many highest-resolution segments as possible in a DASH streaming service. However, maximizing the number of highest-resolution segments simultaneously forces the largest number of lowest-resolution segments into the same schedule. In this paper, we argue that improving overall video streaming quality should start from the key idea of iteratively minimizing the number of low-resolution segments. We define the M-Low optimisation problem and propose a novel M-Low scheduling algorithm, which adjusts the video resolution and optimises the QoE indices in a DASH streaming service. Moreover, we show through simulations that the proposed M-Low scheduling achieves higher QoE measures than those of previously published algorithms.
- Published
- 2020
26. CloudEC: A MapReduce-based algorithm for correcting errors in next-generation sequencing big data
- Author
-
Chung Yen Lin, Wei-Chun Chung, Jan-Ming Ho, and Der-Tsai Lee
- Subjects
0301 basic medicine ,03 medical and health sciences ,030104 developmental biology ,Contig ,business.industry ,Computer science ,Big data ,Sequence assembly ,Genomics ,Computational problem ,business ,Algorithm ,DNA sequencing - Abstract
Due to advances in next-generation sequencing (NGS) technology, ultra-large datasets have been generated for further studies. These datasets contain a large amount of erroneous data, and detecting and correcting the errors to improve the performance of downstream applications, e.g., de novo assembly, is a computational problem. In this article, we present a MapReduce-based algorithm, CloudEC, to correct sequencing errors in NGS data. We use five datasets and the GAGE benchmark to compare the efficiency and efficacy of CloudEC with those of CloudRS and the error correction module of ALLPATHS-LG. Experiments show that CloudEC is 1.6 times faster than CloudRS. The corrected reads are then assembled by the Velvet assembler, and the assembled results are examined by the GAGE benchmark for correctness of assembly. The corrected N50 of the contigs assembled with CloudEC as the error corrector is larger than that with CloudRS, and is comparable to that with ALLPATHS-LG.
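The abstract does not spell out CloudEC's algorithm; the following single-machine sketch only illustrates the k-mer-spectrum style of correction that tools in this family build on (reads, k, and the solidity threshold are all illustrative; CloudEC distributes this kind of computation with MapReduce).

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across the read set (the 'spectrum')."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts

def correct_read(read, counts, k, solid=3):
    """Substitute single bases so that every k-mer of the read becomes
    'solid' (frequent in the dataset): the core spectrum-based idea."""
    read = list(read)
    for i in range(len(read) - k + 1):
        if counts[''.join(read[i:i + k])] >= solid:
            continue
        best = None                       # (count, position, base)
        for j in range(i, i + k):
            for b in 'ACGT':
                if b == read[j]:
                    continue
                cand = read[:j] + [b] + read[j + 1:]
                score = counts[''.join(cand[i:i + k])]
                if score >= solid and (best is None or score > best[0]):
                    best = (score, j, b)
        if best:
            read[best[1]] = best[2]
    return ''.join(read)

reads = ['ACGTACGTAC'] * 9 + ['ACGTACGTAT']   # last read ends in an error
counts = kmer_counts(reads, 4)
fixed = correct_read('ACGTACGTAT', counts, 4)
```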
- Published
- 2017
27. Resource Delivery Service System for User Engagement Improvement
- Author
-
Chen-Sheng Gu, Cheng-Yu Liu, Ray-I Chang, Hung-Min Hsu, Jan-Ming Ho, and Lee-Tse Ting
- Subjects
Service system ,Open platform ,Multimedia ,Computer science ,business.industry ,Deep learning ,05 social sciences ,050301 education ,Construct (python library) ,computer.software_genre ,Open educational resources ,Metadata ,Identification (information) ,Resource (project management) ,0502 economics and business ,ComputingMilieux_COMPUTERSANDEDUCATION ,Artificial intelligence ,050207 economics ,business ,0503 education ,computer - Abstract
Open educational resources (OER) are important assets that help students and teachers search for useful resources. However, improving user engagement with OER is a challenge. In this paper, we propose a resource delivery service system (RDSS) for the Taiwan Open Platform for Educational Resources (TOPER) that actively recommends educational resources to users. RDSS includes three modules: high-quality resource identification, teaching subject identification, and teacher attribute identification. These modules can be used to recommend resources to users of TOPER. We applied deep learning and support vector machines to construct the modules in RDSS. The experimental results demonstrate that RDSS achieves an accuracy of 86% in high-quality resource identification, over 88% in teaching subject identification, and 86% in teacher attribute identification.
- Published
- 2017
28. An Innovative Framework for Building Agency-Free Credit Rating Systems
- Author
-
Chung-Su Wu, Chia-Hsiang Chang, William W. Y. Hsu, Jan-Ming Ho, Jen-Ying Shih, and Wei-Chen Liou
- Subjects
050208 finance ,Computer science ,media_common.quotation_subject ,05 social sciences ,Stability (learning theory) ,Logistic regression ,Random forest ,Data modeling ,Support vector machine ,Credit rating ,Work (electrical) ,0502 economics and business ,Econometrics ,Quality (business) ,050207 economics ,media_common - Abstract
We develop an intelligent credit rating system that can provide debtors' rating information without involving credit rating agencies. Several models are used for credit scoring in our work, including Duffie's model, logistic regression, and random forest (RF). We compare the performance of these models and build an in-depth understanding of credit rating evaluation. Furthermore, we propose a new framework to evaluate the performance of credit ratings from multiple perspectives. The framework contains two components: defaulter recognition and rating quality. We use generic indices, the area under the receiver operating characteristic (ROC) curve (AUC) and log loss, to evaluate the defaulter recognition ability of credit rating models. Rating quality, however, is more complicated than defaulter recognition. Inspired by rating agencies, we propose indices that reflect the stability and migration of ratings. We also adopt minimum default distance and rating path for the evaluation of rating quality. Experimental results indicate that RF has the best performance on the generic indices, but its stability is 63% lower than that of the other two models. Similar results were found in the rating path to default and the minimum default distance. In this study, we specify a general evaluation framework for credit rating systems and reveal the possibility of agency-free credit rating.
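The two generic indices named in the abstract can be computed directly; a minimal sketch with toy labels and scores (not the paper's data):

```python
import math

def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the probability that a random
    defaulter is scored above a random non-defaulter (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the predicted default probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)     # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

y = [1, 0, 1, 0, 0]                       # 1 = defaulted
p = [0.9, 0.2, 0.7, 0.4, 0.1]             # model's default probabilities
```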
- Published
- 2017
29. Linear-time accurate lattice algorithms for tail conditional expectation
- Author
-
William W. Y. Hsu, Jan-Ming Ho, Bryant Chen, and Ming-Yang Kao
- Subjects
High Energy Physics::Lattice ,Extrapolation ,Trinomial ,Conditional expectation ,Computer Science Applications ,Computational Mathematics ,Interpolation error ,Lattice (order) ,Prefix sum ,Computer Vision and Pattern Recognition ,Algorithm ,Time complexity ,Finance ,Value at risk ,Mathematics - Abstract
This paper proposes novel lattice algorithms to compute the tail conditional expectation of European calls and puts in linear time. We incorporate the technique of prefix sums into tilting, trinomial, and extrapolation algorithms, as well as some syntheses of these algorithms. Furthermore, we introduce fractional-step lattices to help reduce interpolation error in the extrapolation algorithms. We demonstrate the efficiency and accuracy of these algorithms with numerical results. A key finding is that combining the techniques of lattice tilting, extrapolation, and fractional steps substantially increases speed and accuracy.
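The prefix-sum idea can be illustrated on a one-period set of terminal outcomes (a simplification of the paper's lattice setting; the numbers are invented): after one O(n) pass, any tail conditional expectation query is answered in O(1).

```python
from itertools import accumulate

# Terminal prices and risk-neutral probabilities of a small lattice
# (illustrative numbers), sorted in increasing price order.
prices = [60.0, 80.0, 100.0, 125.0, 156.0]
probs = [0.05, 0.20, 0.35, 0.28, 0.12]

# One O(n) pass: prefix sums of probability and probability-weighted price.
cum_p = list(accumulate(probs))
cum_px = list(accumulate(p * x for p, x in zip(probs, prices)))

def lower_tail_expectation(i):
    """E[price | price <= prices[i]], answered in O(1) from prefix sums."""
    return cum_px[i] / cum_p[i]
```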
- Published
- 2014
30. Homomorphic encryption application on FinancialCloud framework
- Author
-
William W. Y. Hsu, Jan-Ming Ho, Hsin-Tsung Peng, and Min-Ruey Yu
- Subjects
Homomorphic secret sharing ,business.industry ,Computer science ,Data_MISCELLANEOUS ,Client-side encryption ,computer.software_genre ,Computer security ,Encryption ,Filesystem-level encryption ,Probabilistic encryption ,Link encryption ,Attribute-based encryption ,On-the-fly encryption ,business ,computer - Abstract
Data security and privacy are major concerns for users of software services on the cloud. When users want to compute on a cloud service, traditional encryption schemes can be applied to encrypt the data and transfer it to the cloud service. However, the service provider must decrypt the data to feed it into its computational model, and thus the data content is exposed. If users do not want service providers to know what they are computing, then privacy-preserving computation on encrypted data becomes an important issue. Homomorphic encryption is an encryption method in which computations can be performed on the ciphertext, and the decrypted result of these computations is the same as if the computations had been performed on the plaintext. However, this approach is currently inefficient. This paper presents an application of homomorphic encryption to an open financial cloud framework (FinancialCloud) that performs calculations on encrypted data, thereby securing the data throughout the whole process. We demonstrate by example that applying improved algorithms can lessen the overhead induced by homomorphic encryption.
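As an illustration of the homomorphic property involved, here is a toy Paillier cryptosystem with tiny primes (insecure, for exposition only; the abstract does not say which scheme the framework uses): multiplying ciphertexts adds the underlying plaintexts.

```python
import math
import random

# Toy Paillier cryptosystem (tiny primes, insecure; illustrative only).
# Additively homomorphic: Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2.
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)              # Python 3.9+

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)       # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:            # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

c_sum = (encrypt(7) * encrypt(12)) % n2   # addition done on ciphertexts
```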
- Published
- 2016
31. Frame Dispatcher: A Multi-frame Classification System for Social Movement by Using Microblogging Data
- Author
-
Ray-I Chang, Dung-Sheng Chen, Wei-Sheng Zeng, Hung-Min Hsu, Jan-Ming Ho, Chen-Shuo Hung, and Shian-Hua Lin
- Subjects
Framing (visual arts) ,050402 sociology ,Information retrieval ,Subconscious ,Microblogging ,Computer science ,media_common.quotation_subject ,05 social sciences ,050801 communication & media studies ,Multi frame ,World Wide Web ,0508 media and communications ,Framing (social sciences) ,0504 sociology ,Framing (construction) ,Phenomenon ,Social media ,Social movement ,media_common - Abstract
Framing is a phenomenon that is studied and debated widely in sociology and political science. It refers to the manner in which audiences interpret information and justify their claims or activities. The subconscious influence of framing may lead to opinion changes and social movements. However, multi-frame classification of microblogging data has not yet been investigated. In this study, we aim to classify a large number of posts into frames, and we describe in detail the implementation of a new algorithm for this multi-frame classification task, called Frame Dispatcher. In our experiments, we extracted over 15,000 posts from approximately 200 Facebook fan pages concerning an anti-curriculum student movement. The experimental results show that Frame Dispatcher can classify microblogging data into frames efficiently and effectively.
- Published
- 2016
32. A Scalable Server Architecture for Mobile Presence Services in Social Network Applications
- Author
-
Jan-Ming Ho, Chi-Jen Wu, and Ming-Syan Chen
- Subjects
Social network ,Computer Networks and Communications ,business.industry ,Computer science ,Distributed computing ,Mobile computing ,Telecommunications service ,Cloud computing ,Search algorithm ,Network address ,Server ,Mobile search ,The Internet ,Mobile telephony ,Electrical and Electronic Engineering ,Presence service ,business ,Mobile device ,Software ,Computer network - Abstract
Social network applications are becoming increasingly popular on mobile devices. A mobile presence service is an essential component of a social network application because it maintains each mobile user's presence information, such as the current status (online/offline), GPS location and network address, and also updates the user's online friends with the information continually. If presence updates occur frequently, the enormous number of messages distributed by presence servers may lead to a scalability problem in a large-scale mobile presence service. To address the problem, we propose an efficient and scalable server architecture, called PresenceCloud, which enables mobile presence services to support large-scale social network applications. When a mobile user joins a network, PresenceCloud searches for the presence of his/her friends and notifies them of his/her arrival. PresenceCloud organizes presence servers into a quorum-based server-to-server architecture for efficient presence searching. It also leverages a directed search algorithm and a one-hop caching strategy to achieve small constant search latency. We analyze the performance of PresenceCloud in terms of the search cost and search satisfaction level. The search cost is defined as the total number of messages generated by the presence server when a user arrives; and search satisfaction level is defined as the time it takes to search for the arriving user's friend list. The results of simulations demonstrate that PresenceCloud achieves performance gains in the search cost without compromising search satisfaction.
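The abstract does not specify PresenceCloud's quorum construction; a classical grid quorum (an illustrative stand-in) shows why quorum-based search works: any two quorums intersect, so the servers notified of a user's arrival always overlap the servers queried by a later friend search.

```python
import math

def grid_quorum(server_id, n):
    """Quorum of a presence server: its entire row and column in a
    sqrt(n) x sqrt(n) grid (n is assumed to be a perfect square here).
    For any two servers a and b, the cell (row_a, col_b) lies in both
    quorums, so every pair of quorums intersects."""
    side = math.isqrt(n)
    row, col = divmod(server_id, side)
    return ({row * side + c for c in range(side)} |
            {r * side + col for r in range(side)})

n = 16
quorums = [grid_quorum(s, n) for s in range(n)]
```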
- Published
- 2013
33. Scheduling of Optimal DASH Streaming
- Author
-
Yu-Chi Liu, Ray-I Chang, Chi-Jen Wu, Wei-Chun Chung, Meng-Huang Lee, Kuan-Jen Wang, Shin-Hung Chang, and Jan-Ming Ho
- Subjects
business.industry ,Computer science ,Real-time computing ,020206 networking & telecommunications ,020207 software engineering ,IPTV ,02 engineering and technology ,Scheduling (computing) ,Dynamic Adaptive Streaming over HTTP ,Digital subscriber line ,Rate change ,Dash ,0202 electrical engineering, electronic engineering, information engineering ,Quality of experience ,business ,Computer network - Abstract
DASH (Dynamic Adaptive Streaming over HTTP) is now the most popular standard in video streaming. To support DASH video transmission over residential networks with small bandwidth variation (such as DSL-based networks for IPTV), we design an optimal transmission schedule, L2H. Given a transmission rate and an initial delay, the schedule can optimize QoE (quality of experience) metrics such as rebuffering, lexicographically maximal resolution, the number of rate-switching events, and the smoothness of rate changes. We further present L2HB, which accounts for the system buffer when applying L2H. Objective QoE evaluations show the benefit of L2HB over other research work. Besides, by introducing a system buffer size constraint, the proposed algorithm can control the transmission schedule so that highest-resolution segments appear as soon as possible, encouraging users to stay tuned.
- Published
- 2016
34. Disambiguating authors in citations on the web and authorship correlations
- Author
-
William W. Y. Hsu, Hsin-Tsung Peng, Cheng-Yu Lu, and Jan-Ming Ho
- Subjects
World Wide Web ,Matching (statistics) ,Information retrieval ,Artificial Intelligence ,Computer science ,Citation analysis ,Similarity (psychology) ,General Engineering ,Digital library ,Degree (music) ,Computer Science Applications - Abstract
Members of the academic community have increasingly turned to digital libraries to search for the latest work of their peers. On account of their role in the academic community, it is very important that these digital libraries collect citations in a consistent, accurate, and up-to-date manner. Yet they fail to compile citations correctly for myriads of authors for various reasons, most notably the ''name ambiguity problem.'' This problem occurs when multiple authors share the same name, and particularly when names are simplified to the first initial and the last name. This paper proposes a reliable and accurate pair-wise similarity approach to disambiguate names using supervised classification on Web correlations and authorship correlations. The approach makes use of Web correlations among citations, assuming that citations which co-occur on publication lists on the Web should refer to the same author. It also makes use of authorship correlations, assuming that citations with the same rare author name refer to the same author, and furthermore, that citations with the same full author names or e-mail addresses likely refer to the same author. These two types of correlations are measured using pair-wise similarity metrics. In addition, a binary classifier, as part of supervised classification, is applied to label matching pairs of citations using the pair-wise similarity metrics, and these labels are then used to group citations into clusters such that each cluster represents an individual author. Results show that our approach greatly improves upon the name disambiguation accuracy and performance of other proposed approaches, especially in name clusters with a high degree of ambiguity.
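A minimal sketch of the clustering step, with plain string similarity standing in for the paper's supervised pair-wise classifier (the citations and threshold below are illustrative):

```python
from difflib import SequenceMatcher

citations = [
    "J. Smith, Learning to rank, WWW 2008",
    "John Smith, Learning to Rank, WWW 2008",
    "J. Smith, Protein folding pathways, Bioinformatics 2007",
]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Union-find over citations: merge pairs whose similarity clears the
# threshold; each resulting cluster then represents one author identity.
parent = list(range(len(citations)))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]     # path halving
        i = parent[i]
    return i

THRESHOLD = 0.8
for i in range(len(citations)):
    for j in range(i + 1, len(citations)):
        if similarity(citations[i], citations[j]) >= THRESHOLD:
            parent[find(i)] = find(j)

clusters = {}
for i in range(len(citations)):
    clusters.setdefault(find(i), []).append(i)
```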
- Published
- 2012
35. Design and Implementation of a Digital Video Archive System
- Author
-
Yen-Chun Lin, Jan-Ming Ho, and Hsiang-An Wang
- Subjects
Multimedia ,Computer Networks and Communications ,Computer science ,Digital video ,Digital library ,computer.software_genre ,computer ,Software - Published
- 2012
36. BibPro: A Citation Parser Based on Sequence Alignment
- Author
-
Chien-Chih Chen, Kai-Hsiang Yang, Jan-Ming Ho, and Chuen-Liang Chen
- Subjects
Parsing ,Information retrieval ,Computer science ,business.industry ,String (computer science) ,computer.software_genre ,Computer Science Applications ,Metadata ,Information extraction ,Computational Theory and Mathematics ,Citation analysis ,The Internet ,Citation ,business ,computer ,Information Systems ,Data integration - Abstract
The dramatic increase in the number of academic publications has led to growing demand for efficient organization of the resources to meet researchers' needs. As a result, a number of network services have compiled databases from the public resources scattered over the Internet. However, different conferences and journals adopt different citation styles, and accurately extracting metadata from a citation string formatted in one of thousands of different styles is an interesting problem that has attracted a great deal of research attention in recent years. In this paper, based on the notion of sequence alignment, we present a citation parser called BibPro that extracts the components of a citation string. To demonstrate the efficacy of BibPro, we conducted experiments on three benchmark data sets. The results show that BibPro achieved over 90 percent accuracy on each benchmark. Even with citations and associated metadata retrieved from the web as training data, our experiments show that BibPro still achieves reasonable performance.
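BibPro's actual encoding and alignment machinery is not described in the abstract; the sketch below only shows the general idea of mapping a citation to a token-type sequence that can then be aligned against known style templates (the type symbols and template are invented for illustration).

```python
import re

def token_types(citation):
    """Map each token of a citation string to a coarse type symbol;
    sequences like these can be aligned against style templates."""
    symbols = []
    for tok in citation.replace(',', ' , ').split():
        if tok == ',':
            symbols.append(',')
        elif re.fullmatch(r'\(?(19|20)\d\d\)?\.?', tok):
            symbols.append('Y')           # year
        elif re.fullmatch(r'[A-Z]\.', tok):
            symbols.append('I')           # author initial
        elif tok[:1].isupper():
            symbols.append('C')           # capitalized word
        else:
            symbols.append('w')           # lowercase word
    return ''.join(symbols)

# A hypothetical template: initial, surname, ',' three title words ',' year.
template = 'IC,Cww,Y'
pattern = token_types('J. Smith, Learning to rank, 2008')
```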
- Published
- 2012
37. A novel content based image retrieval system using K-means/KNN with feature extraction
- Author
-
Jan-Ming Ho, Ray-I Chang, Yu-Chung Wang, Shu-Yu Lin, and Chi-Wen Fann
- Subjects
General Computer Science ,Computer science ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,k-means clustering ,Content-based image retrieval ,computer.software_genre ,Automatic image annotation ,Image texture ,Visual Word ,Data mining ,Cluster analysis ,Image retrieval ,computer - Abstract
Image retrieval has been popular for several years, and there are different system designs for content-based image retrieval (CBIR) systems. This paper proposes a novel system architecture for a CBIR system that combines content-based image and color analysis with data mining techniques. To the best of our knowledge, this is the first work to combine a segmentation and grid module, a feature extraction module, and K-means and k-nearest neighbor clustering algorithms with a neighborhood module to build a CBIR system. The concept of the neighborhood color analysis module, which also recognizes the side of every grid of an image, is first contributed in this paper. The results show that the CBIR system performs well in training, and they also indicate that many interesting issues remain to be optimized in the query stage of image retrieval.
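A toy end-to-end sketch of a K-means/KNN retrieval pipeline of the kind described (the color features, clustering parameters, and synthetic images are all illustrative, not the paper's implementation):

```python
import random

def color_histogram(pixels, bins=4):
    """Toy color feature: a normalized per-channel histogram."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        for ch, v in enumerate((r, g, b)):
            hist[ch * bins + min(v * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(feats, k, iters=10, seed=0):
    """Plain Lloyd iterations over the feature vectors."""
    random.seed(seed)
    centers = random.sample(feats, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in feats:
            groups[min(range(k), key=lambda j: dist(f, centers[j]))].append(f)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def knn_retrieve(query, feats, n=2):
    """Return indices of the n nearest database images to the query."""
    return sorted(range(len(feats)), key=lambda i: dist(query, feats[i]))[:n]

# Four synthetic 16-pixel "images": two reddish, two bluish.
imgs = [[(250, 5, 5)] * 16, [(240, 10, 10)] * 16,
        [(5, 5, 250)] * 16, [(10, 10, 240)] * 16]
feats = [color_histogram(p) for p in imgs]
centers = kmeans(feats, 2)
result = knn_retrieve(color_histogram([(255, 0, 0)] * 16), feats)
```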
- Published
- 2012
38. Complete font generation of Chinese characters in personal handwriting style
- Author
-
Ray-I Chang, Jeng-Wei Lin, Chian-Ya Hong, Yu-Chun Wang, Jan-Ming Ho, and Shu-Yu Lin
- Subjects
business.industry ,Computer science ,Character (computing) ,Speech recognition ,Character encoding ,computer.software_genre ,Style (sociolinguistics) ,ComputingMilieux_GENERAL ,Handwriting ,Font ,Preprocessor ,Artificial intelligence ,User interface ,Chinese characters ,business ,computer ,Natural language processing - Abstract
Since a complete Chinese font typically has several thousand or more Chinese characters and symbols, and most of them are much more complicated than English letters, it takes a great deal of time and effort even for professional font engineers to create a Chinese font. Although several attempts have been made to synthesize Chinese characters from strokes and components, it is still not easy to synthesize so many Chinese characters at one time. In this paper, we present an easy and fast solution for an ordinary user to create a Chinese font in his or her handwriting style. Our approach is to synthesize Chinese characters using components extracted from the user's handwriting. In the preprocessing phase, we built a Web interface through which crowds label the positions and sizes of the components of every Chinese character in the target character set, with the standard Kai font as a reference. We also devised an algorithm to find a small subset of Chinese characters having all the components required to synthesize the other Chinese characters. To create a personal handwriting font covering the 3,914 commonly used traditional Chinese characters, a user only has to handwrite 400 or so Chinese characters on a pad. Character by character, our system tracks every stroke and recognizes and extracts components from the user's handwriting. Every target Chinese character is then synthesized from the extracted components by placing them properly according to their position and size information. The experimental results show that although manual fine-tuning is still required for a few synthesized characters, users can create a Chinese font in their personal handwriting style easily and quickly.
- Published
- 2015
39. Subject-Keyphrase Extraction Based on Definition-Use Chain
- Author
-
Yu-Jung Chang, Ray-I Chang, You-Jyun Wang, Shu-Yu Lin, Jan-Ming Ho, and Hung-Min Hsu
- Subjects
Information retrieval ,Relation (database) ,Computer science ,business.industry ,Feature extraction ,Object (grammar) ,Information processing ,Subject (documents) ,computer.software_genre ,Artificial intelligence ,Computational linguistics ,business ,computer ,Natural language processing ,Sentence - Abstract
In this paper, we propose a new concept called the subject-keyphrase and introduce a method to extract subject-keyphrases from a document. Subject-keyphrases are the words or phrases used to represent the sentence subjects of a document; i.e., the content of a document is organized around its subjects. Each sentence of a document can be expected to be composed of a subject and an object, where the subject is defined in relation to its object. Using "definition" and "use" relations, we may thus identify the subjects of a given document by looking for keyphrases that appear frequently as subjects of its sentences. We present a subject-keyphrase extraction (SKE) algorithm based on the notion of the definition-use chain (DU chain) to identify subject-keyphrases. Experimental results show that SKE can successfully identify subject-keyphrases and effectively capture the main idea of a document.
- Published
- 2015
40. A Content-Based Knowledge and Data Intensive System for Archaeological Motif Recognition
- Author
-
Chao-Lung Ting, Ray-I Chang, Man-Fong Cheng, Lin Shu-Yu, Jan-Ming Ho, and Yu-Chun Wang
- Subjects
Computer science ,Template matching ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Iterative reconstruction ,Motif (music) ,Image retrieval ,Archaeology - Abstract
Archaeological excavations typically turn up large numbers of fragments from different ceramic objects. To preserve fragile historical remains, archaeologists generally rely on digital imaging techniques, which are time consuming and require special technical training. Previous attempts to facilitate this process have concentrated on automatically assessing and recording the shape features of pottery forms to allow for automated reconstruction. In this paper, we present a content-based knowledge- and data-intensive system for archaeological motif recognition. It automatically recognizes motifs on fragments, thus assisting human experts in reconstructing archaeological artifacts and in managing knowledge related to decorative motifs. The system consists of three modules: fragment image preprocessing, motif encoding, and element detection and template matching. The fragment preprocessing module reduces noise and provides a user interface to define a region of interest (ROI). The motif encoding module then introduces the Lapita Pottery Online Database with element definitions. Finally, in the element detection and template matching module, detected elements are labeled and the related motif is recommended. Experimental results indicate that the proposed method not only increases scalability, but also contributes to knowledge management and the discovery of reconstructed archaeological relics.
- Published
- 2015
41. Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework
- Author
-
Wei-Chun Chung, Chih-Hao Fang, Yu-Jung Chang, Jan-Ming Ho, Chung Yen Lin, and Ping-Heng Hsieh
- Subjects
Genetics ,Whole genome sequencing ,Cancer genome sequencing ,Massive parallel sequencing ,Shotgun sequencing ,Research ,2 base encoding ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Sequence assembly ,Hybrid genome assembly ,Sequence Analysis, DNA ,Computational biology ,Biology ,Deep sequencing ,Perciformes ,Contig Mapping ,Bacillus cereus ,Genome Size ,Escherichia coli ,Animals ,Software ,Biotechnology - Abstract
Recent progress in next-generation sequencing technology has afforded several improvements, such as ultra-high throughput at low cost, very high read quality, and substantially increased sequencing depth. State-of-the-art high-throughput sequencers, such as the Illumina MiSeq system, can generate ~15 Gbp of sequencing data per run, with >80% of bases above Q30 and a sequencing depth of up to several thousand fold for small genomes. The Illumina HiSeq 2500 is capable of generating up to 1 Tbp per run, with >80% of bases above Q30 and often >100x sequencing depth for large genomes. To speed up otherwise time-consuming genome assembly and/or to obtain a skeleton of the assembly quickly for scaffolding or progressive assembly, methods for noise removal and redundancy reduction in the original data that yield almost equal or better assembly results are worth studying. We developed two subset selection methods for single-end reads and a method for paired-end reads, based on base quality scores and other read analytics, using the MapReduce framework. We propose two selection strategies: MinimalQ and ProductQ. MinimalQ selects reads whose minimal base quality is above a threshold. ProductQ selects reads whose probability of containing no incorrect base is above a threshold. In the single-end experiments, we used Escherichia coli and Bacillus cereus MiSeq datasets, the Velvet assembler for genome assembly, and the GAGE benchmark tools for result evaluation. In the paired-end experiments, we used the giant grouper (Epinephelus lanceolatus) HiSeq dataset, the ALLPATHS-LG genome assembler, and the QUAST quality assessment tool for comparing genome assemblies of the original set and the subset. The results show that subset selection not only speeds up genome assembly but also produces substantially longer scaffolds. Availability: The software is freely available at https://github.com/moneycat/QReadSelector .
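The two selection rules follow directly from their definitions, using the standard Phred relation p_correct = 1 - 10^(-Q/10); the quality values and thresholds below are illustrative:

```python
def phred_ok(q):
    """Probability that a base with Phred score q is correct: 1 - 10^(-q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)

def minimal_q(quals, threshold):
    """MinimalQ: keep a read iff its minimum base quality clears the bar."""
    return min(quals) >= threshold

def product_q(quals, threshold):
    """ProductQ: keep a read iff the probability that it contains no
    incorrect base (product of per-base probabilities) clears the bar."""
    p = 1.0
    for q in quals:
        p *= phred_ok(q)
    return p >= threshold

# Three reads' per-base Phred scores: the second read passes MinimalQ(20)
# but fails ProductQ(0.95), showing the two rules select different subsets.
reads = [[30] * 10, [21] * 10, [30] * 9 + [10]]
kept_min = [i for i, q in enumerate(reads) if minimal_q(q, 20)]
kept_prod = [i for i, q in enumerate(reads) if product_q(q, 0.95)]
```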
- Published
- 2015
42. Optimal scheduling of QoE-aware HTTP adaptive streaming
- Author
-
Chi-Jen Wu, Wei-Chun Chung, Yu-Chi Liu, Ray-I Chang, Yu-Hsien Chu, and Jan-Ming Ho
- Subjects
Service (systems architecture) ,Task (computing) ,User experience design ,business.industry ,Computer science ,Heuristic (computer science) ,Real-time computing ,Bit rate ,Bandwidth (computing) ,The Internet ,Service provider ,business ,Computer network - Abstract
Recently, HTTP adaptive streaming (HAS) has been leading the trend in the delivery of video content over the Internet. In this technology, a video is segmented into small intervals and encoded at different qualities to adapt to variations in network bandwidth. This method allows the client to adjust the quality of the requested stream dynamically. To the best of our knowledge, most heuristic algorithms proposed for HAS run a risk of freezing video playback and thus induce a poor Quality of Experience (QoE). Therefore, maintaining a good user experience becomes a challenging task for the service provider. In this paper, we propose an optimal scheduling method for QoE-aware HAS that achieves the following QoE requirements: (1) avoiding video streaming freezes (i.e., client buffer underflow), and (2) minimizing the initial delay before the video starts playing. Testing with the video Aladdin, experimental results show that our method can find the best QoE service for HAS under various QoE requirements and resource constraints.
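Requirement (2) has a simple closed form under an idealized constant-bandwidth model (an assumption of this sketch, not the paper's method): the minimal freeze-free startup delay is the largest gap between a segment's cumulative download-finish time and its playback deadline.

```python
def minimal_initial_delay(segment_bits, bandwidth_bps, seg_duration):
    """Smallest startup delay (seconds) with no buffer underflow: segment i
    (0-based) must finish downloading by delay + i * seg_duration."""
    t = delay = 0.0
    for i, bits in enumerate(segment_bits):
        t += bits / bandwidth_bps         # download finish time of segment i
        delay = max(delay, t - i * seg_duration)
    return delay

# Five 2-second segments of varying encoded size over a 1 Mbps link.
segments = [1.5e6, 2.5e6, 2.0e6, 3.0e6, 1.0e6]
d = minimal_initial_delay(segments, 1e6, 2.0)
```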
- Published
- 2015
43. Variant Chinese Domain Name Resolution
- Author
-
Jeng-Wei Lin, Feipei Lai, Jan-Ming Ho, and Li-Ming Tseng
- Subjects
General Computer Science ,Database ,Computer science ,Programming language ,business.industry ,Search engine indexing ,ASCII ,computer.software_genre ,Internet Standard ,Zone file ,Scripting language ,Scalability ,The Internet ,Artificial intelligence ,Equivalence (formal languages) ,business ,computer - Abstract
Many efforts in past years have been made to lower the linguistic barriers for non-native English speakers to access the Internet. Internet standard RFC 3490, referred to as IDNA (Internationalizing Domain Names in Applications), focuses on access to IDNs (Internationalized Domain Names) in a range of scripts that is broader in scope than the original ASCII. However, the use of character variants that have similar appearances and/or interpretations could create confusion. A variant IDL (Internationalized Domain Label), derived from an IDL by replacing some characters with their variants, should match the original IDL; and thus a variant IDN does. In RFC 3743, referred to as JET (Joint Engineering Team) Guidelines, it is suggested that zone administrators model this concept of equivalence as an atomic IDL package. When an IDL is registered, an IDL package is created that contains its variant IDLs generated according to the zone-specific Language Variant Tables (LVTs). In addition to the registered IDL, the name holder can request the domain registry to activate some of the variant IDLs, free or by an extra fee. The activated variant IDLs are stored in the zone files, and thus become resolvable. However, an issue of scalability arises when there is a large number of variant IDLs to be activated. In this article, the authors present a resolution protocol that resolves the variant IDLs into the registered IDL, specifically for Han character variants. Two Han characters are said to be variants of each other if they have the same meaning and are pronounced the same. Furthermore, Han character variants usually have similar appearances. It is not uncommon that a Chinese IDL has a large number of variant IDLs. The proposed protocol introduces a new RR (resource record) type, denoted as VarIdx RR, to associate a variant expression of the variant IDLs with the registered IDL. 
The label of the VarIdx RR, denoted the variant index, is assigned by an indexing function designed to give the same value to all of the variant IDLs enumerated by the variant expression. When one of the variant IDLs is accessed, Internet applications can compute the variant index, look up the VarIdx RRs, and resolve the variant IDL into the registered IDL. The authors examine two sets of Chinese IDLs registered in TWNIC and CNNIC, respectively. The results show that for a registered Chinese IDL, a very small number of VarIdx RRs, usually one or two, are sufficient to activate all of its variant IDLs. The authors also present a Web redirection service that employs the proposed resolution protocol to redirect a URL addressed by a variant IDN to the URL addressed by the registered IDN. The experiment results show that the proposed protocol successfully resolves the variant IDNs into the registered IDNs.
- Published
- 2008
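The indexing function described in the abstract above can be illustrated with a minimal sketch; the variant table below is invented for illustration and is not one of the zone-specific LVTs used by the authors.

```python
# Sketch of a variant-indexing function for Han-character IDLs.
# VARIANT_TABLE is a made-up stand-in for a zone-specific Language Variant
# Table (LVT); it maps each character to one canonical representative of its
# variant class, so every variant of a label maps to the same index string.
VARIANT_TABLE = {
    "臺": "台", "台": "台",   # variants of "tai"
    "灣": "湾", "湾": "湾",   # variants of "wan"
}

def variant_index(label: str) -> str:
    """Return an index string shared by all variants of `label`."""
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in label)

idx = variant_index("臺灣")   # same value for 臺灣, 台湾, 臺湾, and 台灣
```

Because every variant IDL yields the same index, one VarIdx RR keyed on that index can cover the whole variant class, matching the observation that one or two RRs usually suffice.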
44. Proof: A Novel DHT-Based Peer-to-Peer Search Engine
- Author
-
Kai-Hsiang Yang and Jan-Ming Ho
- Subjects
Information retrieval ,Computer Networks and Communications ,Computer science ,Intersection (set theory) ,business.industry ,Peer-to-peer ,computer.software_genre ,Inverted index ,law.invention ,Shared resource ,Search engine ,PageRank ,law ,Overhead (computing) ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer ,Software - Abstract
In this paper we focus on building a large-scale keyword search service over structured Peer-to-Peer (P2P) networks. Current state-of-the-art keyword search approaches for structured P2P systems are based on inverted list intersection. However, the biggest challenge in those approaches is that when the indices are distributed over peers, a simple query may cause a large amount of data to be transmitted over the network. We propose in this paper a new P2P keyword search scheme, called "Proof," which aims to reduce the network traffic generated during the intersection process. We apply three main ideas in Proof to reduce network traffic: (1) using a sorted query flow, (2) storing content summaries in the inverted lists, and (3) setting a stop condition for the checking of content summaries. We also discuss the advantages and limitations of Proof and conduct extensive experiments to evaluate the search performance and the quality of search results. Our simulation results show that, compared with previous solutions, Proof can dramatically reduce network traffic while providing 100% precision and high recall of search results, at the cost of some additional storage overhead.
- Published
- 2007
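The sorted-query-flow idea can be modeled with a small in-memory sketch; in the real system each inverted list resides on a different peer, and the shrinking candidate set below stands in for the data forwarded from peer to peer.

```python
# Minimal local model of sorted-query-flow intersection (assumption: this is
# a single-machine stand-in; in a DHT each inverted list lives on the peer
# responsible for that keyword).
def intersect_sorted_flow(inverted_lists):
    # Visit the shortest list first so the candidate set, which models the
    # data transmitted between peers, stays as small as possible.
    lists = sorted(inverted_lists, key=len)
    result = set(lists[0])
    for postings in lists[1:]:
        result &= set(postings)
        if not result:          # stop condition: nothing left to forward
            break
    return result

docs = intersect_sorted_flow([[1, 2, 3, 4], [2, 3], [2, 3, 9]])
```

Proof additionally checks per-document content summaries stored in the lists, which lets peers discard non-matching candidates even earlier than this plain intersection does.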
45. SOP: Smart offloading proxy service for wireless content uploading over crowd events
- Author
-
Chi-Jen Wu, Wei-Chun Chung, Hung-Ta Tai, Ray-I Chang, and Jan-Ming Ho
- Subjects
Radio access network ,Access network ,Social network ,Computer science ,business.industry ,Wireless network ,Mobile computing ,Wireless WAN ,Mobile Web ,Scheduling (computing) ,Public land mobile network ,Upload ,Server ,Mobile station ,Cellular network ,Mobile database ,Mobile search ,Wireless ,Multi-frequency network ,The Internet ,Mobile telephony ,business ,Municipal wireless network ,Computer network - Abstract
Since the commercialization of the Internet in the mid-1990s, human culture has changed dramatically due to the boom of innovative online services. To cope with the demands of network service providers to effectively and efficiently deliver contents and services to their clients, content distribution network technologies have been developed. These technologies have been optimized to transfer data from servers to users. They are not designed to help mobile users upload and share real-time captured multimedia contents with their peers through social network services. It is even more difficult for the current network to support mobile users in a hot social event who want to share live pictures and videos with their social networks in real time. In this paper, we present our design of a smart offloading proxy (SOP) service for wireless content uploading over crowd events. To test the efficiency of SOP, we simulate a mobile network environment with a wireless access network connected through a long-haul WAN to the target social network server. Preliminary experiments show that with an error rate of 1% in the WAN, a mobile user may experience a long file-uploading time of approximately 100 seconds when uploading a 10 MB file over a 54 Mbps WIFI medium that is simultaneously accessed by users arriving randomly with an inter-arrival time of 10 seconds. In contrast, it takes less than 10 seconds to upload the file if the WIFI is lightly loaded and the WAN is error free. We also show that by properly scheduling WIFI bandwidth among the mobile users, the file-uploading time can be reduced.
- Published
- 2015
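As a rough sanity check on the lightly loaded case, a back-of-envelope model of fair link sharing can be sketched; it ignores protocol overhead, contention, and the WAN error retransmissions that dominate the 100-second case in the abstract.

```python
# Back-of-envelope upload-time model (assumption: `sharers` users fairly
# split one WIFI link; real WIFI throughput is lower due to MAC overhead).
def upload_seconds(file_mbytes, link_mbps, sharers):
    """Time to upload a file over an equal share of the link."""
    return file_mbytes * 8 / (link_mbps / sharers)

t_light = upload_seconds(10, 54, sharers=1)    # lightly loaded link
t_busy = upload_seconds(10, 54, sharers=10)    # ten concurrent uploaders
```

Even this idealized model puts the ten-sharer upload near 15 seconds, which shows why error-induced retransmits over the WAN, not the WIFI alone, account for the 100-second observation.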
46. FOS: A Funnel-Based Approach for Optimal Online Traffic Smoothing of Live Video
- Author
-
Ray-I Chang, Jeng-Wei Lin, Feipei Lai, and Jan-Ming Ho
- Subjects
Heuristic (computer science) ,Computer science ,Frame (networking) ,Real-time computing ,Variable bitrate ,Upper and lower bounds ,Computer Science Applications ,Sliding window protocol ,Signal Processing ,Media Technology ,Bandwidth (computing) ,Electrical and Electronic Engineering ,Communication complexity ,Time complexity ,Smoothing - Abstract
Traffic smoothing is an efficient means to reduce the bandwidth requirement for transmitting a variable-bit-rate video stream. Several traffic-smoothing algorithms have been presented to compute the transmission schedule for a prerecorded video offline. For live video applications, Sen et al. present a sliding-window algorithm, referred to as SLWIN(k), to compute the transmission schedule online, on the fly. SLWIN(k) looks ahead W video frames to compute the transmission schedule for the next k frame times, where k ≤ W. Note that W is upper bounded by the initial delay of the transmission. The time complexity of SLWIN(k) is O(W*N/k) for an N-frame live video. In this paper, we present an O(N) online traffic-smoothing algorithm and two variants, denoted FOS, FOS1, and FOS2, respectively. Note that O(N) is a trivial lower bound on the time complexity of the traffic-smoothing problem; thus, the proposed algorithm is optimal. We compare the performance of our algorithms with SLWIN(k) based on several benchmark video clips. Experiment results show that FOS2, which adopts the aggressive workahead heuristic, further reduces the bandwidth requirement and better utilizes the client buffer for real-time interactive applications in which the initial delays are small.
- Published
- 2006
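A toy online smoother along these lines, not the authors' FOS algorithm, transmits at each frame time at the minimum constant rate that meets every frame deadline within a W-frame lookahead window:

```python
# Toy lookahead smoother (illustrative only; FOS achieves O(N) overall,
# while this naive version recomputes the window rate at every frame).
def min_window_rate(frame_sizes, start, W, buffered):
    """Smallest rate (bytes per frame time) meeting all deadlines in the window."""
    need, rate = -buffered, 0.0
    for i, size in enumerate(frame_sizes[start:start + W], 1):
        need += size                     # cumulative bytes due by deadline i
        rate = max(rate, need / i)       # the tightest deadline dominates
    return rate

def smooth(frame_sizes, W):
    buffered, schedule = 0.0, []
    for t in range(len(frame_sizes)):
        r = min_window_rate(frame_sizes, t, W, buffered)
        schedule.append(r)
        buffered += r - frame_sizes[t]   # send r bytes, consume current frame
    return schedule

schedule = smooth([10, 2, 6], W=2)       # per-frame-time transmission rates
```

The smoothing effect is visible in the output: the bursty frame sizes [10, 2, 6] become the flatter schedule [10, 4, 4], with the client buffer absorbing the difference.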
47. IDN server proxy architecture for Internationalized Domain Name resolution and experiences with providing Web services
- Author
-
Feipei Lai, Jeng-Wei Lin, Jan-Ming Ho, and Li-Ming Tseng
- Subjects
Name server ,Web server ,Computer Networks and Communications ,Application server ,Computer science ,business.industry ,Domain Name System ,computer.software_genre ,Domain (software engineering) ,World Wide Web ,Root name server ,Server ,The Internet ,business ,computer - Abstract
The composition of traditional domain names is restricted to ASCII letters, digits, and hyphens (abbreviated as LDH). This makes it difficult for many users to name and access their Internet hosts in their native languages. The IETF IDN (Internationalized Domain Name) Working Group proposes a mechanism, IDNA (Internationalizing Domain Names in Applications), for internationalized access to multilingual domain names. The proposal uses a preparation process that converts a Unicode IDN into an ACE (ASCII Compatible Encoding) string that uses only LDH. Thus, applications can look up the ACE string by using the existing DNS infrastructure. However, some of the domain name strings embedded in multilingual content do not have any charset tag, so they cannot be appropriately converted into ACE strings. We noticed that many Internet applications allow users to use non-ASCII domain names. We were thus motivated to design an architecture for IDN resolution that minimizes the cost of modifying legacy Internet applications. We specifically focus on designing an IDN server proxy, located on the domain name server side, to handle domain names in multiple encodings. In this article, we study several architecture design issues, including detection of charset encoding, routing of non-ACE IDN lookup requests, and so on. With respect to these design issues, we present an IDN server proxy architecture that stores ACE IDNs in domain name servers. Note that traditional domain name servers can be used without modification. An IDN server proxy, called Octopus, is employed on the domain name server side. Octopus converts a non-ACE IDN string into ACE upon receiving an IDN lookup request from remote users or autonomous systems. The ACE string is then forwarded to backend domain name servers (where the traditional domain names and ACE IDNs are stored) for further processing.
This allows Internet users to access IDNs without having to upgrade their software. Based on the design and implementation of Octopus, we deployed a CDN (Chinese domain name) trial in July 2002. In this article, we present the results of testing Octopus IDN lookup functions as well as our experiences in providing CDN Web services. Several types of errors can occur if applications are unable to handle IDNs adequately. For example, a Web browser may erroneously parse an IDN within a URL. Many legacy Web servers are unable to process the IDN of a virtual host. Web application servers may have trouble completing some actions, such as redirecting Web pages to alternative Web pages. Our studies help service providers understand potential problems when non-ASCII domain names are used, as well as the best common practice at this stage. These experiences also give some guidance to software developers in developing IDNA-compliant Internet applications.
- Published
- 2006
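The ToASCII conversion that such a proxy performs is available in Python's built-in `idna` codec, which implements the RFC 3490 rules label by label; a minimal example:

```python
# A Unicode IDN is converted into its ACE (ASCII Compatible Encoding) form
# before being looked up in the existing DNS infrastructure; Python's
# built-in "idna" codec performs the RFC 3490 ToASCII/ToUnicode operations.
ace = "例子.com".encode("idna")       # ACE form forwarded to name servers
unicode_name = ace.decode("idna")     # round-trips back to the Unicode IDN
```

A proxy like Octopus does this conversion server-side so that legacy clients and backend name servers, which only understand LDH labels, need no modification.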
48. On the design of trading schemes of equity funds based on random traders
- Author
-
Mu-En Wu, Jan-Ming Ho, Chuan-Ju Wang, William W. Y. Hsu, and Ta-Wei Hung
- Subjects
Fund of funds ,Private equity fund ,Private equity secondary market ,Open-end fund ,Econometrics ,Closed-end fund ,Passive management ,Business ,Algorithmic trading ,computer.software_genre ,Stock market index ,computer - Abstract
We propose a novel approach, called random traders, to benchmark equity funds' performance. A random trader adopts an all-in-all-out strategy, buying and selling the market index at random times with capital that is negligible compared with the market size. Given the empirical distribution of random traders' returns, each equity fund is scored by the proportion of random traders with poorer performance. Based on this technique, we develop two trading schemes to show the profitability of random traders. Scheme I achieves an accumulated profit of 104% in a backtest on 396 equity funds investing in the Taiwan market from June 2004 to December 2012. Furthermore, we develop Scheme II to reduce the trading frequency when taking transaction fees into account. The experimental results show that Scheme II achieves an accumulated profit of 111.93% with a transaction fee of 1.5% per trade. Compared to the traditional method of always investing in the funds with top-10% performance, our trading scheme earns an additional 62.32% profit during this period. Moreover, the performance of Scheme II ranks in the top 3% of these 396 funds.
- Published
- 2014
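The random-trader scoring idea can be sketched on a synthetic index series; the random-walk index below is a made-up stand-in for real market data, and the parameters are illustrative.

```python
import random

# Sketch of the random-trader benchmark (assumption: synthetic random-walk
# index; real use would substitute actual market-index prices).
def random_trader_return(index):
    """All-in at a random day, all-out at a random later day."""
    buy, sell = sorted(random.sample(range(len(index)), 2))
    return index[sell] / index[buy] - 1

def score_fund(fund_return, index, n_traders=10_000):
    """Fraction of random traders the fund outperforms (higher is better)."""
    returns = [random_trader_return(index) for _ in range(n_traders)]
    return sum(r < fund_return for r in returns) / n_traders

random.seed(0)
index = [100.0]
for _ in range(500):                   # synthetic index: geometric random walk
    index.append(index[-1] * (1 + random.gauss(0.0003, 0.01)))
score = score_fund(0.15, index)        # score of a fund that returned 15%
```

A score near 1 means the fund beat almost all random traders over the same index, which is the signal the paper's trading schemes act on.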
49. Using geometric structures to improve the error correction algorithm of high-throughput sequencing data on MapReduce framework
- Author
-
Wei-Chun Chung, Jan-Ming Ho, Der-Tsai Lee, and Yu-Jung Chang
- Subjects
Set (abstract data type) ,Scheme (programming language) ,Computer science ,Sequence assembly ,Data mining ,Error detection and correction ,computer.software_genre ,computer ,Throughput (business) ,DNA sequencing ,DNA Resequencing ,computer.programming_language - Abstract
Next-generation sequencing (NGS) data are a rapidly growing example of big data and a source of new knowledge in science. However, sequencing errors remain unavoidable and reduce the quality of NGS data. Error correction, therefore, is a critical step in the successful utilization of NGS data, including de novo genome assembly and DNA resequencing. Since NGS throughput doubles approximately every five months and the length of NGS records (i.e., reads) is increasing, improvements in the efficiency and effectiveness of computational strategies are needed. In this study, we aim to improve the performance of CloudRS, an open-source MapReduce application designed to correct sequencing errors in NGS data. We introduce the read-message (RM) diagram to represent the set of messages, i.e., the key-value pairs generated on each read. We also present the Gradient-number Votes (GNV) scheme to trim off portions of the RM diagram, thereby reducing the total size of messages associated with each read. Experimental results show that the GNV scheme successfully reduces execution time and improves the quality of the de novo genome assembly.
- Published
- 2014
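A simplified sketch of the per-read map step follows; the `(k-mer, position)` message layout and the fixed end-trimming are illustrative stand-ins for CloudRS's read messages and the GNV scheme, not the authors' exact design.

```python
# Sketch of a MapReduce map step that emits key-value messages per read
# (assumption: message = (k-mer, position); trimming a few positions at each
# read end models how GNV-style pruning shrinks the per-read message set).
def read_messages(read, k=5, trim=1):
    """Emit (k-mer, position) pairs, skipping `trim` positions at each end."""
    msgs = []
    for pos in range(trim, len(read) - k + 1 - trim):
        msgs.append((read[pos:pos + k], pos))
    return msgs

msgs = read_messages("ACGTACGTAC", k=5, trim=1)
```

In the reduce step, messages sharing a k-mer key are gathered so that a voting rule can correct a base that disagrees with the consensus; fewer messages per read means less data shuffled between mappers and reducers.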
50. An optimal cache algorithm for streaming VBR video over a heterogeneous network
- Author
-
Shin-Hung Chang, Ray-I Chang, Jan-Ming Ho, and Yen-Jen Oyang
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS ,Real-time computing ,Variable bitrate ,Vbr video ,Wide area network ,Bit rate ,Bandwidth (computing) ,Cache ,Proxy (statistics) ,business ,Cache algorithms ,Algorithm ,Heterogeneous network ,Computer network - Abstract
High-quality video content for on-demand services is usually stored and streamed in a compressed format with a VBR (variable bit rate) property; as a result, the streaming traffic is extremely bursty. If there is no client buffer to regulate the video's delivery, the backbone WAN (wide area network) must allocate bandwidth equal to the video's peak bit rate to guarantee playback quality. To reduce the bandwidth requirement in the backbone WAN, previous researchers have proposed a Video Staging Mechanism that caches portions of the video in a video proxy close to clients. In this paper, we propose a very effective OC (optimal cache) algorithm for the Video Staging Mechanism and prove theoretically that the proxy cache computed by our OC algorithm for each video is minimal when all other resources remain constant. Experiment results show that the OC algorithm caches the least amount of video data in the video proxy while reducing the WAN bandwidth requirement by the same amount as conventional algorithms. Conversely, given equal storage in a video proxy, the OC algorithm reduces the bandwidth requirement in the backbone WAN much more than conventional algorithms.
- Published
- 2005
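A toy version of video staging illustrates the trade-off the paper optimizes; this is a simple cutoff-rate rule, not the authors' OC algorithm: the bytes of each frame above a cutoff rate are cached at the proxy, so the backbone WAN only carries a stream whose peak rate equals the cutoff.

```python
# Toy video-staging rule (illustrative; OC computes the provably minimal
# proxy cache for a target WAN bandwidth, which this cutoff rule does not).
def stage(frame_sizes, cutoff):
    """Return (bytes cached at the proxy, per-frame bytes sent over the WAN)."""
    wan = [min(s, cutoff) for s in frame_sizes]
    cached = sum(s - w for s, w in zip(frame_sizes, wan))
    return cached, wan

cached, wan = stage([8, 3, 12, 5], cutoff=6)
```

Lowering the cutoff reduces the WAN peak rate but enlarges the proxy cache; OC's contribution is proving which cache contents make that cache minimal for a given bandwidth.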